Comments (6)
Hi @lqid,
it is possible that Reaper cannot reach some nodes through JMX, especially across DCs.
Could you please create a repair through the GUI, activate it, and then click on the repair to open the details panel? What does the last event line say?
If there's nothing obvious here we'll need to go through the logs in order to find what's wrong.
Things you can try to narrow the problem down:
- Run a full repair instead of an incremental one
- Run Reaper in memory mode instead of with a database, to check whether the storage backend is the problem
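Before digging into the logs, a quick way to rule out basic network problems is to check that the JMX port on every node accepts TCP connections from the Reaper host. The sketch below assumes Cassandra's default JMX port, 7199; `jmx_port_reachable` and `check_nodes` are hypothetical helpers, not part of Reaper. It only tests that the port is open: it does not perform a JMX/RMI handshake or check credentials, so a successful result does not guarantee that Reaper can actually use the connection.

```python
import socket
from typing import Dict, Iterable

# Assumption: Cassandra's default JMX port; change it if JMX_PORT is
# overridden in cassandra-env on your nodes.
DEFAULT_JMX_PORT = 7199


def jmx_port_reachable(host: str, port: int = DEFAULT_JMX_PORT,
                       timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only shows the port is open from this machine. It does not perform
    a JMX/RMI handshake or check JMX authentication.
    """
    try:
        # create_connection resolves the host name, then connects;
        # the timeout applies to the connection attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_nodes(hosts: Iterable[str],
                port: int = DEFAULT_JMX_PORT) -> Dict[str, bool]:
    """Map each host to its JMX port reachability so failures stand out."""
    return {host: jmx_port_reachable(host, port) for host in hosts}
```

For example, `check_nodes(["node1", "node2", "node3"])` returns a dict mapping each host to `True` or `False`; any `False` entry points at a node worth testing by hand with JConsole, especially in a remote DC.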
from cassandra-reaper.
Cassandra 2.2.5 running on Windows Server 2008 R2
@lqid I strongly recommend updating to 2.2.8, or even the 2.2.9 tip (we may not formally release 2.2 again), as a number of minor streaming and repair issues were fixed between those versions; the most relevant is sane streaming timeouts by default.
Hi @adejanovski
Regarding the JMX connection, I've confirmed with JConsole that the server Reaper runs on can connect remotely via JMX to each of the Cassandra nodes.
Last event reads: Triggered repair of segment 4669 via host node1
The Repaired progress bar remains at 0/6.
Looking in the Reaper logs, the last message relating to a repair is:
DEBUG [2017-01-04 08:45:48,251] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair completed successfully]
DEBUG [2017-01-04 08:45:48,251] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair command #1 finished in 3 minutes 37 seconds]
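When comparing log excerpts like these, it can help to pull the fields out of each JmxProxy notification line programmatically. The following is a minimal sketch written against the two sample lines in this comment only; the regular expression, `parse_notification`, and `finished_duration_seconds` are hypothetical and may not match the log format of other Reaper or Cassandra versions.

```python
import re
from datetime import datetime

# Written against the sample JmxProxy DEBUG lines in this thread (assumption:
# other Reaper versions may format these lines differently).
LINE_RE = re.compile(
    r"^(?P<level>\w+) \[(?P<ts>[\d\- :,]+)\] \[(?P<cluster>[^\]]+)\] "
    r"(?P<logger>\S+) - Received notification: .*?"
    r"\[source=(?P<source>[^\]]+)\]\[type=(?P<type>[^\]]+)\]"
    r"\[message=(?P<message>[^\]]*)\]$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_notification(line: str):
    """Return a dict of fields from a notification line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # Replace the raw timestamp text with a datetime (",251" -> 251 ms).
    fields["timestamp"] = datetime.strptime(fields.pop("ts"), TS_FORMAT)
    return fields


def finished_duration_seconds(message: str):
    """Convert '... finished in 3 minutes 37 seconds' to seconds, else None."""
    if "finished in" not in message:
        return None
    tail = message.split("finished in", 1)[1]
    pairs = re.findall(r"(\d+) (day|hour|minute|second)s?", tail)
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in pairs)
```

On the second line of this excerpt, `finished_duration_seconds` yields 217 (3 minutes 37 seconds).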
As per Bhuvan Rawal's email to [email protected], I've also tried setting repairRunThreadCount: 1 in the Reaper configuration, which had no apparent effect.
As for running a full repair instead of an incremental one, I had already tried that, with the same result.
I'll run Reaper in memory mode after I give this some time, but I suspect I'll need to do a cluster restart again.
@zznate I'll definitely take that to heart, and I do agree with upgrading to the latest version just on principle. I'll run with that as soon as the opportunity comes up for us to upgrade.
Good news, and bad...
I take back my previous comment that setting repairRunThreadCount: 1 had no effect.
Before I changed this, repairs would "hang" (as the title of this issue suggests), with threads doing nothing indefinitely. Now, however, I am seeing normal repair progress messages in the logs on both the Reaper server and the Cassandra nodes, albeit coming through very slowly (to be expected with such a low thread count, I assume?).
The last event is also being updated, as below:
Last event reads: Triggered repair of segment 4667 via host node3
Again, from the Reaper logs (note the timestamps and the delta from my previous comment):
DEBUG [2017-01-04 10:17:40,320] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair completed successfully]
DEBUG [2017-01-04 10:17:40,320] [ppe_cass1_c1] c.s.r.c.JmxProxy - Received notification: javax.management.Notification[source=repair:1][type=progress][message=Repair command #1 finished in 3 minutes 56 seconds]
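The gap between the two "Repair completed successfully" notifications can be computed directly from the Reaper timestamps. This sketch assumes the `YYYY-MM-DD HH:MM:SS,mmm` format seen in these log lines; `delta_between` is a hypothetical helper, and the gap it measures covers everything that happened between the two excerpts, not only repair work.

```python
from datetime import datetime, timedelta

# Timestamp format as it appears in the Reaper log lines quoted in this thread.
REAPER_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def delta_between(earlier: str, later: str) -> timedelta:
    """Return the time elapsed between two Reaper log timestamps."""
    start = datetime.strptime(earlier, REAPER_TS_FORMAT)
    end = datetime.strptime(later, REAPER_TS_FORMAT)
    return end - start


# Timestamps of the two "Repair completed successfully" lines.
gap = delta_between("2017-01-04 08:45:48,251", "2017-01-04 10:17:40,320")
print(gap)  # 1:31:52.069000
```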
Note that the Repaired progress bar still remains at 0/6.
To be honest, I'm not sure how the progress bar denominator is calculated.
@lqid: I'm able to reproduce the problem using a CCM cluster running Cassandra 2.2.5. The acceptance test suite fails because the first segment is never marked as DONE.
Running it with Cassandra 2.2.8 works fine though.
I've traced the problem back to CASSANDRA-11430: Reaper still uses the deprecated repair methods, for which Cassandra 2.2 did not properly handle notifications until 2.2.6.
I'd second @zznate's recommendation to upgrade to the latest 2.2 release in order to get properly working repairs.
We have an open issue for switching to the non-deprecated repair methods, but no ETA yet.
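For anyone checking several clusters, the affected range described in this comment can be encoded as a small check. This is a hypothetical helper based only on what the thread states (Cassandra 2.2 releases before 2.2.6 are affected); it deliberately returns `None` for other release lines, since the thread says nothing about them. The version string can come from `nodetool version`, which prints a `ReleaseVersion:` line.

```python
import re
from typing import Optional


def affected_by_cassandra_11430(version: str) -> Optional[bool]:
    """Hypothetical check based only on this thread.

    Returns True for 2.2.0-2.2.5, False for 2.2.6 and later 2.2 releases,
    and None for other release lines, which the thread does not cover.
    Accepts either a bare version ("2.2.5") or a line such as
    "ReleaseVersion: 2.2.5".
    """
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    if (major, minor) != (2, 2):
        return None  # outside the 2.2 line discussed here
    return patch < 6
```

For example, `affected_by_cassandra_11430("ReleaseVersion: 2.2.5")` returns `True`, while `"2.2.8"` returns `False`.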
Understood. Thank you all for the support and clear explanations.