Giter VIP home page Giter VIP logo

Comments (57)

nberardi avatar nberardi commented on July 30, 2024

Please mock up a change and submit a pull request this is probably a valuable addition to have.

from fluentcassandra.

lhowlader avatar lhowlader commented on July 30, 2024

I am currently working on implementing the Circuit Breaker pattern along with unit tests. Once it is implemented and tested, we will not need to blacklist a server. I believe the Circuit Breaker pattern is relevant to this issue. Please feel free to comment if you have considered the pattern and have any insight on why it should not be used. Current plan is to be able to configure the number of attempts that trip the breaker. Also, confugrable min (millisec) and max (millsec) params that allows the retries (to see if Breaker in Open state can be Closed) progrssively between min and max. Once max is reached, the Circuit Breaker for the node will publish an event that can be subsribed to indicating that the node require special attention/intervention.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

I think this will be a great addition. I never really liked the blacklist, so I am happy to hear somebody is trying to get rid of it.

from fluentcassandra.

dlbromen avatar dlbromen commented on July 30, 2024

Update:
@Smoles and myself have taken over this functionality from lhowlander. There will probably be a lot of discussion about what we've done so please let me know if you'd like to move it to another issue or a different communication channel.

Some of the design principles:

  1. Do not blacklist a server forever.
  2. Blacklist on a configurable number of failures (not just one).
  3. Automatically retry a server after a configurable amount of time. If it fails again, reblacklist. If it succeeds, re-enable it.

from fluentcassandra.

dlbromen avatar dlbromen commented on July 30, 2024

Sorry, accidently hit submit...

  1. Don't open the floodgates when retrying a server. Just try a single connection
  2. When re-enabling a server, make it available for use (especially true when using Pooling=True)
  3. Don't consider everything a failure. Set of specific I/O related exceptions.

.. and there are probably more.

We've got something staged up, here: https://github.com/W3i/fluentcassandra/tree/circuit_breaker

Let us know how you would like to proceed. Thank you.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

Is this ready for prime time?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

I've implemented something similar that I am currently using in production but also takes it a step further and classify's various exception types as retry/clienthealthy -- so while you might get a connection that's good from CassandraSession.GetClient(), it might go down between the time it was opened successfully and when a command is executed against it. If we have more than one server in the list try to move to another one up to a max retry count.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

eplowe: do you have the code you can submit for it? I;d like to take a look and possibly use it in our scenario as well.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Sure -- When I get home I'll commit into my fork. My retry implementation
is definitely not as elegant as yours, but works like a charm. I just
create a Timer in RoundRobinServerManager that is triggered on
Blacklisting. Also added two properties to CassandraException to signify
retry/healthy-ness of a node. And in Operation I modified TryExecute to
inspect the the type of exception thrown and then build a proper
cassandraexception from it as well as if it should retry and if the node is
healthy. ExecutionOperation() in CassandraSession was then modified to loop
while there was no exception or until a counter reaches the max defined
retry count.

On Tue, Nov 13, 2012 at 4:27 PM, ilushka85 [email protected] wrote:

eplowe: do you have the code you can submit for it? I;d like to take a
look and possibly use it in our scenario as well.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10344412.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

If any of you guys want to send a pull request, I would be happy to take it. I know this is an issue that we need to get resolved soon. Eventually I would like to make the a ServiceManager that is ring aware so that the RoundRobin can be retired.

from fluentcassandra.

tjake avatar tjake commented on July 30, 2024

I was also thinking of the circuit breaker pattern. I'll try out @dlbromen's patch

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

eplowe did you get a chance to upload your changes?

from fluentcassandra.

tjake avatar tjake commented on July 30, 2024

@eplowe ^

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

No sorry been kinda crazy with the day job. Let me get it by today.
On Nov 14, 2012 11:08 AM, "Jake Luciani" [email protected] wrote:

@eplowe https://github.com/eplowe ^


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10371782.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

https://github.com/eplowe/fluentcassandra/tree/blacklisting-enhancements

Let me know what you think. So far from my testing, it appears to be done what I expect. With that said, there may be some edge cases not accounted for.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

Should we really be using RoundRobin anymore? It was created at a time when there were no mechanisms to describe the nodes in a cluster. But now, but we now have that ability with the describe_ring function.

https://github.com/managedfusion/fluentcassandra/blob/master/src/Operations/CassandraClientWrapper.cs#L154

So I guess my question is should we keep putting lipstick on this pig, or move to something that is better suited for Cassandra moving forward?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Agreed 100% --- but for now either dlbromens patch, or my patch can act as a stop-gap until this new mechanism is in place.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

With that said, I'd be more than happy to work on an auto discovery mechanism.

from fluentcassandra.

tjake avatar tjake commented on July 30, 2024

What does autodiscover have todo with round robin? Wouldn't you use autodiscover to get the list of servers to round robin? Also auto discover means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you cut down on coordinators hops. But even then you would still need some kind of round robin across replicas.

We've been running with @dlbromen's patch and so far so good.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

You'd need something similar for sure. The idea is that as long as your
code can connect to at least one c* box it'll update itself with a list of
servers. So you could add/remove servers to your cluster without having to
update your configs/code in most situations.

Hector does this and uses policies to determine which machine to connect to
-- load balanced, round robin , etc
On Nov 15, 2012 8:56 AM, "Jake Luciani" [email protected] wrote:

What does autodiscover have todo with round robin? Wouldn't you use
autodiscover to get the list of servers to round robin? Also auto discover
means you need to be sure you don't connect across DCs.

Some clients like Astynax try to connect to the one of the replicas so you
cut down on coordinators hops. But even then you would still need some kind
of round robin across replicas.

We've been running with @dlbromen https://github.com/dlbromen's patch
and so far so good.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10409128.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe @dlbromen @tjake Are you guys using connection pool in your environment? If yes are you seeing it properly release connections?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

If you don't define a connection lifetime I am seeing the connections left
open, which makes sense, at least to me. The most expensive part of
creating a socket connection is the establish.

On Thu, Nov 15, 2012 at 11:48 AM, ilushka85 [email protected]:

@eplowe https://github.com/eplowe @dlbromenhttps://github.com/dlbromen
@tjake https://github.com/tjake Are you guys using connection pool in
your environment? If yes are you seeing it properly release connections?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10415525.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe well the connections should be left open... the question is if they get returned properly. This is an issue we have had that I have commented on and had a slight but improper fix for this issue.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

@ilushka85 Understood. And by returned properly, you mean from say an
exception? Based on my tests, and this is just from my particular use cases
so YMMV, I am seeing the call to PooledConnectionProvider::Close happen
when CassandraSession::Dispose is called -- which will remove the
connection from the usedconnections list and return it back to the
freeconnection queue. I'll take a look at the issue for further context.

On Thu, Nov 15, 2012 at 12:11 PM, ilushka85 [email protected]:

@eplowe https://github.com/eplowe well the connections should be left
open... the question is if they get returned properly. This is an issue we
have had that I have commented on and had a slight but improper fix for
this issue.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10416529.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

heres a simple test if you debug this and say after the tenth time thru the loop look at the connection pool you will see ten connections are in use and not returned to freed.

var builder = new ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

    var db = new CassandraContext(builder);



    CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


    for (int x = 0; x < 100; x++)
    {
        var rows = usersColumnFamily.Get().TakeKeys(25);
        string lastKey = rows.Last().Key;


        Console.WriteLine(lastKey);
    }

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

You're right -- I did a similar test against a my dev cassandra server. I
think I see a possible solution. I'll post reply again when I have more.

On Thu, Nov 15, 2012 at 12:32 PM, ilushka85 [email protected]:

heres a simple test if you debug this and say after the tenth time thru
the loop look at the connection pool you will see ten connections are in
use and not returned to freed.

var builder = new
ConnectionBuilder(ConfigurationManager.AppSettings["ResultlyDataConnectionString"]);

var db = new CassandraContext(builder);



CassandraColumnFamily usersColumnFamily = new CassandraColumnFamily<UTF8Type>(db, "Users");


for (int x = 0; x < 100; x++)
{
    var rows = usersColumnFamily.Get().TakeKeys(25);
    string lastKey = rows.Last().Key;


    Console.WriteLine(lastKey);
}


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10417317.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe i posted a hack that fixes this in my other message:

My crude attempt at fixing this was inserting the following snippets in each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt really execute code till later.

IN addition i noticed connections having an exception do not get returned to pool or closed in original code.

from fluentcassandra.

dlbromen avatar dlbromen commented on July 30, 2024

I haven't had a chance to catch up with the recent activity here, but can you please take the Connection releasing discussion to issue #29. Thank you.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

also current round robin method and connection pool means that second and subsequent queries you execute are not hit on next server in round robin but on the same one open connection ... so if your code only requires the use of one connection and you are using a pool all executions will happen on that one server regardless of how many you have defined.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

I have a general idea in my head that I am churning over while at lunch
atm. Connections not returning due to exception (server down, invalid
request, etc) should be fixed in my patch that is currently in my fork.
Let's move this thread to the proper issue # moving forward.
On Nov 15, 2012 12:45 PM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe i posted a hack that fixes this in my
other message:

My crude attempt at fixing this was inserting the following snippets in
each of the operation function classes

var temp = output.GetEnumerator();
if (Session.WasDisposed)
{
Session.WasDisposed = false;
Session.Dispose();
}

Any better ideas on how to do the above? it has to do with how .net doesnt
really execute code till later.

IN addition i noticed connections having an exception do not get returned
to pool or closed in original code.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10417842.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe Ok. Waiting for your input there.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe any thoughts on my comment above about your round robin replacement changes not taking into account using other servers when using a pool?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

I have a change that I am still testing/not happy with that does auto
discovery plus returning potentially a diff node that's up for a single
context that's running many different commands so its not just leaning on
the first node in the queue that's up and available
On Nov 16, 2012 11:13 AM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe any thoughts on my comment above
about your round robin replacement changes not taking into account using
other servers when using a pool?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10452221.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe using your blacklisting enhancements with enough timeout exceptions using the pool the connections still do not get reset and all subsequent connections fail as pool is exhausted.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10479535.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe [email protected] wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe using your blacklisting enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHubhttps://github.com//issues/51#issuecomment-10479589.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

I've simulated throwing random timeout executions and from what I am seeing
I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about fluentcassandra throwing this exception?

"No connection could be made because all servers have failed."

If so, that's because I am marking the Client as not healthy for a
timeout, it should actually still technically be healthy, that's an
oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 [email protected] wrote:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe [email protected] wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10479597.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

So for a brief moment all servers will be blacklisted. When the timer runs
it'll re-add to the serverqueue if no exception is thrown when it tries to
open the connection. I modified CassnadraOperationException to mark the
IsHealthy = true, instead of false for a TimedOutException and then set
MaxRetries = 5 in my connection string and re ran the test with random
timeoutexceptions being thrown in Operation::Execute and everything behaved
as expected.

On Sat, Nov 17, 2012 at 5:28 PM, Eric Plowe [email protected] wrote:

I've simulated throwing random timeout executions and from what I am
seeing I am seeing connections go back from _usedConnections to the free
connection queue. Are you talking about cassandra throwing this exception?

"No connection could be made because all servers have failed."
?? If so, thats because I am marking the Client as not healthy for a timeout, it should actually still technically be healthy, that's an oversight on my part

On Sat, Nov 17, 2012 at 5:11 PM, ilushka85 [email protected]:

Don't have an example per say but that's what we see happening in our app
now.

Sent from my iPhone

On Nov 17, 2012, at 4:10 PM, Eric Plowe [email protected]
wrote:

Please provide an example.
On Nov 17, 2012 5:05 PM, "ilushka85" [email protected] wrote:

@eplowe https://github.com/eplowe using your blacklisting
enhancements
with enough timeout exceptions using the pool the connections still do
not
get reset and all subsequent connections fail as pool is exhausted.


Reply to this email directly or view it on GitHub<

https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479535>.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10479589>.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10479597.

from fluentcassandra.

ilushka85 avatar ilushka85 commented on July 30, 2024

@eplowe I have not been able to reproduce since we corrected what was causing the timeout issue in the first place but will continue to test it.

We have found a bug in cassandra with secondary indexes just so you are aware that we reported to datastax who is working on a solution. Basically timeouts occur if you rely on secondary indexes and perform deletes on the column with the secondary index. If you delete enough of them a timeout will occur during querys against the secondary index while it skips over tombstones as they still exist in the secondary index. this is what lead to our timeouts.

In any case.... Were you able to make any more progress on your changes with auto discovery etc?

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

Any chance we can get a patch? Was hoping to wrap up the next release before Thanksgiving.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

I'll push the fix to mark TimedOutException::IsClientHealthy as true vs
false shortly. As for the auto discovery that will not be wrapped up before
the holidays as its a rather big change that I'd like to put more time into
it before pushing the branch for review.
On Nov 20, 2012 9:11 AM, "Nick Berardi" [email protected] wrote:

Any chance we can get a patch? Was hoping to wrap up the next release
before Thanksgiving.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10555661.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

@eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another connection manager, since for the time being we still need to support the current methodology of server lists that are provided. Right now I am favoring if in the connection string the server is just a server:* then we will use auto discovery, anything else will be the old method. How does that sound?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Agreed. You still do need to provide at least one server because you need
something to connect to run describe ring against. I've also added auto
discover as a connection string/builder property as well as the ability to
define how often it should attempt to auto discover.
On Nov 20, 2012 2:36 PM, "Nick Berardi" [email protected] wrote:

@eplowe https://github.com/eplowe thanks for the TimedOutException

RE: auto discovery, this functionality should be moved to another
connection manager, since for the time being we still need to support the
current methodology of server lists that are provided. Right now I am
favoring if in the connection string the server is just a server:* then
we will use auto discovery, anything else will be the old method. How does
that sound?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10569330.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

You are right, I like your method better. I think auto discover should be turned on by default, what is your thoughts?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

That's the route I am going. Also, to give you a heads up I am removing the
concept of normal vs pool provider and moving to a more generalized manager
concept that will manage separate pools of connections per host instead of
one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" [email protected] wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10570168.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

The change for marking TimedOutException as IsClientHealthy = true instead
of false has been committed and pushed.

On Tue, Nov 20, 2012 at 3:03 PM, Eric Plowe [email protected] wrote:

That's the route I am going. Also, to give you a heads up I am removing
the concept of normal vs pool provider and moving to a more generalized
manager concept that will manage separate pools of connections per host
instead of one pool spread out for n servers
On Nov 20, 2012 2:56 PM, "Nick Berardi" [email protected] wrote:

You are right, I like your method better. I think auto discover should be
turned on by default, what is your thoughts?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10570168.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

Did you send a pull request?

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi [email protected]:

Did you send a pull request?


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10572899.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

Thanks for the commit from issue #88, I think that closes the issue for now. At least until we open the code back up for #29, regarding auto searching of servers.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

Enumerator/lazy should be switched to lists since it is an RPC call.
Pooling we will wait and see how the removal of lazy commands effect it. I
would like to keep those commits separate so test they can be evaluated
individually.

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 4:07 PM, Eric Plowe [email protected] wrote:

There you go. Also let me know what you decide to do with the connection
pooling/enumerators/lazy loading issue.

On Tue, Nov 20, 2012 at 4:03 PM, Nick Berardi [email protected]:

Did you send a pull request?


Reply to this email directly or view it on GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10572899>.


Reply to this email directly or view it on
GitHubhttps://github.com//issues/51#issuecomment-10573079.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Just looked at the refactoring -- You should stop the timer when you try to recover, and the recovery time IMO is too long. You could have a situation where you're having some network hiccups and given the amount of traffic to your application you blacklist all your servers and now you have 30 seconds until we attempt to check if they are back up or not. 30 seconds when you're getting getting hammered with traffic feels like years.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

You do realize you weren't actually starting or stopping anything? You were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe [email protected] wrote:

Just looked at the refactoring -- You should stop the timer when you try to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic to
your application you blacklist all your servers and now you have 30 seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHubhttps://github.com//issues/51#issuecomment-10579487.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

http://msdn.microsoft.com/en-us/library/yz1c7148%28v=vs.80%29.aspx

Timer.Change Method (Int32, Int32)

dueTime

The amount of time to delay before the invoking the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to prevent the timer from restarting. Specify zero (0) to restart the timer immediately. 

period

The time interval between invocations of the callback method specified when the Timer was constructed, in milliseconds. Specify Timeout.Infinite to disable periodic signaling. 

Also, there is a slight bug -- the reason I was passing the two states on the exception was to know what to do based on the exception. But you are doing things like blacklisting a connection and retrying for TimedOutExceptions -- which is wrong, a timeout doesn't necessarily indicate an unhealthy client. Just that the particular command timed out. You're also not taking into account IOException and NotFoundException -- which are important. That is a pretty easy fix.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Also, I just noticed this as well, and when we do remove a server from the blacklist we need to add it back to _serverQueue.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

And sorry if I diverged from your style I'll make sure future changes
adhere! With that said thanks for the comments and insight. So far its been
great working with you.
On Nov 20, 2012 6:54 PM, "Nick Berardi" [email protected] wrote:

You do realize you weren't actually starting or stopping anything? You
were
just setting trigger times in the endless loop that runs in the timer
thread, mine essentially does the same thing but uses a random time
between
0 and 30 seconds.

A better one to use for servers according to MSDN is Timers.Timer. But I
was lazy and didn't want to update the code too much.

http://msdn.microsoft.com/en-us/library/system.timers.timer.aspx

Nick Berardi
(484) 302-0125
Sent from my tablet.

On Nov 20, 2012, at 6:41 PM, Eric Plowe [email protected] wrote:

Just looked at the refactoring -- You should stop the timer when you try
to
recover, and the recovery time IMO is too long. You could have a situation
where you're having some network hiccups and given the amount of traffic
to
your application you blacklist all your servers and now you have 30
seconds
until we attempt to check if they are back up or not. 30 seconds when
you're getting getting hammered with traffic feels like years.


Reply to this email directly or view it on
GitHub<
https://github.com/managedfusion/fluentcassandra/issues/51#issuecomment-10579487>.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10579912.

from fluentcassandra.

nberardi avatar nberardi commented on July 30, 2024

I know what they say, but how do you think it actually work under the covers? It is essentially this.

while(true) {
    if (TickCount == CallbackTickCount)
        Callback();
}

I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspx

System.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you might consider using System.Timers.Timer, which raises events and has additional features.

Also here is the difference between System.Timers.Timer and System.Threading.Timer

http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer

RE: IOException and NotFoundException

I didn't actually see you implement them anywhere, so I assumed it was a mistake.

https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs

Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit one entity.

from fluentcassandra.

eplowe avatar eplowe commented on July 30, 2024

Oh I know. I actually believe they sleep to avoid spinning -- my memory
might be a little hazy from looking at the implementation in reflector. I
took them into account in CassandraOperationException. And I agree a 100%
about keeping the code base inline. I just noticed the issue and needed to
rectify it for my environment but didn't take the time to refactor the
code. Bad discipline on my part.
On Nov 20, 2012 7:27 PM, "Nick Berardi" [email protected] wrote:

I know what they say, but how do you think it actually work under the
covers? It is essentially this.

while(true) {
if (TickCount == CallbackTickCount)
Callback();}I do realize this is an over simplification, but essentially this is how all timers work. Also on MSDN they do make this note on http://msdn.microsoft.com/en-us/library/system.threading.timer.aspxSystem.Threading.Timer is a simple, lightweight timer that uses callback methods and is served by thread pool threads. It is not recommended for use with Windows Forms, because its callbacks do not occur on the user interface thread. System.Windows.Forms.Timer is a better choice for use with Windows Forms. For server-based timer functionality, you<
/span> might consider using System.Timers.Timer, which raises events and has additional features.`
Also here is the difference between System.Timers.Timer and System.Threading.Timer
http://stackoverflow.com/questions/1416803/system-timers-timer-vs-system-threading-timer
RE: IOException and NotFoundException
I didn't actually see you implement them anywhere, so I assumed it was a mistake.
https://github.com/eplowe/fluentcassandra/blob/de1be9790a4d27c77a98a692e10916d2be16625e/src/Operations/Operation.cs
Also it has been good to work with you, I just like to maintain some sort of order so the codebase doesn't get overrun with personal one-off changes that only benefit<
/span> one entity.


Reply to this email directly or view it on GitHubhttps://github.com//issues/51#issuecomment-10580764.

from fluentcassandra.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.