
Comments (6)

adejanovski avatar adejanovski commented on May 27, 2024

Hi @lqid ,

It is possible that Reaper cannot reach some nodes through JMX, especially across DCs.
Could you please run a repair through the GUI, activate it, and click on the repair to open the details panel? What is written on the last event line?

If there's nothing obvious there, we'll need to go through the logs to find what's wrong.
Things you can try to narrow the problem down:

  • Run a full repair instead of an incremental one
  • Run Reaper in memory mode instead of with a database, to check whether the storage backend is the problem

from cassandra-reaper.

zznate avatar zznate commented on May 27, 2024

Cassandra 2.2.5 running on Windows Server 2008 R2

@lqid I strongly recommend updating to 2.2.8, or even to the 2.2.9 tip (we may not formally release 2.2 again), as a number of minor streaming and repair issues were fixed between those versions. The most relevant fix is sane streaming timeouts by default.


lqid avatar lqid commented on May 27, 2024

Hi @adejanovski
Regarding the JMX connection, I've confirmed with JConsole that the server Reaper resides on can connect remotely over JMX to each of the Cassandra nodes.
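
For clusters with more than a handful of nodes, a scripted first pass can complement the JConsole check. The following is only a minimal sketch under stated assumptions: the node names are hypothetical, and 7199 is Cassandra's default JMX port, which may be overridden in a given deployment. It only tests that a TCP connection can be opened. It does not perform a JMX handshake, and JMX over RMI may use an additional port, so a "reachable" result here does not prove that JMX itself works.

```python
import socket

# Hypothetical node names; replace with the real Cassandra hosts.
NODES = ["node1", "node2", "node3"]
# Cassandra's default JMX port; an assumption, adjust if overridden.
JMX_PORT = 7199


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for node in NODES:
        status = "reachable" if port_is_open(node, JMX_PORT) else "UNREACHABLE"
        print(f"{node}:{JMX_PORT} {status}")
```

Running this from the Reaper host, rather than from a workstation, matters because the network path across DCs can differ.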

Last event reads: Triggered repair of segment 4669 via host node1
Repaired progress bar remains at 0/6

Looking in the Reaper logs, the last message relating to a repair is:

DEBUG  [2017-01-04 08:45:48,251] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair completed successfully] 
DEBUG  [2017-01-04 08:45:48,251] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair command #1 finished in 3 minutes 37 seconds]

As per Bhuvan Rawal's email to [email protected], I've also tried setting repairRunThreadCount: 1 in the Reaper configuration, which had no apparent effect.

As for running a full repair instead of an incremental one, I had already tried that, with the same result.
I'll run Reaper in memory mode after I give this some time, but I suspect I'll need to do a cluster restart again.

@zznate I'll definitely take that to heart, and I do agree with upgrading to the latest version just on principle. I'll run with that as soon as the opportunity comes up for us to upgrade.


lqid avatar lqid commented on May 27, 2024

Good news, and bad...

I retract my earlier statement that setting repairRunThreadCount: 1 had no effect.
Before this change, repairs would "hang" (as the title of this issue suggests), with threads doing nothing indefinitely. Now I am seeing normal repair progress messages in the logs on both the Reaper server and the Cassandra nodes, although they are coming through very slowly. (To be expected with such a low thread count, I assume?)

Last event is also being updated as below...
Last event reads: Triggered repair of segment 4667 via host node3

Again, from the Reaper logs (note the timestamp and the delta from my previous comment):

DEBUG  [2017-01-04 10:17:40,320] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair completed successfully] 
DEBUG  [2017-01-04 10:17:40,320] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair command #1 finished in 3 minutes 56 seconds] 
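
To put a number on that delta, the timestamps of the two "finished" notifications can be subtracted. This is a small sketch rather than anything Reaper provides: the regular expression and the `log_time` helper are illustrative names, and the timestamp layout is assumed from the log lines quoted in this thread.

```python
import re
from datetime import datetime

# Matches the first bracketed field, e.g. [2017-01-04 08:45:48,251].
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def log_time(line: str) -> datetime:
    """Extract the timestamp from one log line (hypothetical helper)."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


# The two "finished" lines quoted in this thread.
earlier = (
    "DEBUG  [2017-01-04 08:45:48,251] [ppe_cass1_c1] c.s.r.c.JmxProxy - "
    "Received notification: javax.management.Notification[source=repair:1]"
    "[type=progress][message=Repair command #1 finished in 3 minutes 37 seconds]"
)
later = (
    "DEBUG  [2017-01-04 10:17:40,320] [ppe_cass1_c1] c.s.r.c.JmxProxy - "
    "Received notification: javax.management.Notification[source=repair:1]"
    "[type=progress][message=Repair command #1 finished in 3 minutes 56 seconds]"
)

delta = log_time(later) - log_time(earlier)
print(delta)  # 1:31:52.069000
```

So roughly an hour and a half separates the two completion messages, even though each repair command itself reports finishing in under four minutes.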

Note that the Repaired progress bar still remains at 0/6.
To be honest, I'm not sure how the progress bar denominator is calculated.


adejanovski avatar adejanovski commented on May 27, 2024

@lqid: I'm able to reproduce the problem using a CCM cluster with Cassandra 2.2.5. The acceptance test suite fails because the first segment is never marked as DONE.
Running it with Cassandra 2.2.8 works fine, though.

I've traced the problem back to CASSANDRA-11430: Reaper still uses the deprecated repair methods, which didn't properly handle notifications in Cassandra 2.2 until 2.2.6.

I'd support @zznate's recommendation to upgrade to the latest 2.2 release in order to have properly working repairs.

We have an open issue for switching to the non-deprecated repair methods, but no ETA yet.
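
For anyone auditing several clusters, the version range described above can be expressed as a simple check. This is a rough sketch that encodes only what is stated in this thread, read as "2.2.0 up to but not including 2.2.6"; it says nothing about other release lines, and the function names are made up for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.2.5' into (2, 2, 5), padding missing parts with zeros."""
    parts = [int(part) for part in version.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def affected_by_cassandra_11430(version: str) -> bool:
    """True for 2.2.x releases before 2.2.6 (assumption based on this thread)."""
    return (2, 2, 0) <= parse_version(version) < (2, 2, 6)


if __name__ == "__main__":
    for candidate in ("2.2.5", "2.2.6", "2.2.8"):
        print(candidate, affected_by_cassandra_11430(candidate))
```

The version string of a running node can be read with `nodetool version`; plain numeric versions such as 2.2.5 are assumed, so suffixes like `-SNAPSHOT` would need extra handling.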


lqid avatar lqid commented on May 27, 2024

Understood. Thank you all for the support and clear explanations.

