Comments (5)

aarondandy commented on May 31, 2024

That sounds pretty strange. Maybe a place to start would be some more details:

  • What do you mean by "fail"?
  • Are you getting an exception or does the result differ from expectations?
  • Does this happen consistently for a specific word? Is it possible to create a minimum reproduction by using a specific word with a specific TimeLimit value?
  • Is the difference with Check or Suggest?
  • For the machines where the results don't meet expectations, does it always not meet expectations or is the result intermittent even on those machines?

My guess is you are running into the time limits but you mentioned you tinkered with the time limits already. The design of Hunspell has, in my opinion, an awkward timing mechanism to prevent overuse of CPU resources. You might have tried this already, but increasing the MinTimer to a larger value and increasing the TimeLimit values might help ensure you get more consistent results during testing. See:

public int MinTimer { get; set; } = 100;
/// <summary>
/// The time limit for some long running steps during suggestion generation.
/// </summary>
/// <remarks>
/// Timelimit: max ~1/4 sec (process time on Linux) for a time consuming function.
/// </remarks>
public TimeSpan TimeLimitSuggestStep { get; set; } = TimeSpan.FromMilliseconds(250);
/// <summary>
/// The time limit for each compound suggestion iteration.
/// </summary>
public TimeSpan TimeLimitCompoundSuggest { get; set; } = TimeSpan.FromMilliseconds(100);
/// <summary>
/// The time limit for each compound word check operation.
/// </summary>
public TimeSpan TimeLimitCompoundCheck { get; set; } = TimeSpan.FromMilliseconds(50);
/// <summary>
/// A somewhat overall time limit for the suggestion algorithm.
/// </summary>
public TimeSpan TimeLimitSuggestGlobal { get; set; } = TimeSpan.FromMilliseconds(250);
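
As a minimal sketch of that suggestion, the limits could be raised for a test run roughly like this. Only the property names and defaults come from the listing above. The `QueryOptions` type name, the `WordList.CreateFromFiles` and `Suggest(word, options)` calls, the dictionary path, the sample word, and the chosen values are assumptions for illustration and should be checked against the installed package version.

```csharp
using System;
using WeCantSpell.Hunspell;

// Hypothetical dictionary path; the matching .aff file is assumed to sit next to it.
var wordList = WordList.CreateFromFiles("en_US.dic");

// Deliberately generous values for testing only (assumed, not recommended defaults).
var options = new QueryOptions
{
    MinTimer = 1000,                                     // default shown above: 100
    TimeLimitSuggestStep = TimeSpan.FromSeconds(3),      // default: 250 ms
    TimeLimitCompoundSuggest = TimeSpan.FromSeconds(3),  // default: 100 ms
    TimeLimitCompoundCheck = TimeSpan.FromSeconds(3),    // default: 50 ms
    TimeLimitSuggestGlobal = TimeSpan.FromSeconds(3),    // default: 250 ms
};

// "teh" is a placeholder misspelling; use a word from the failing test instead.
foreach (var suggestion in wordList.Suggest("teh", options))
{
    Console.WriteLine(suggestion);
}
```

If the results still differ between machines with limits this large, the time limits themselves are probably not the only factor.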

from wecantspell.hunspell.

ADD-eNavarro commented on May 31, 2024

That sounds pretty strange. Maybe a place to start would be some more details:

  • What do you mean by "fail"?

I mean that it fails to use the right algorithm to give an answer.

  • Are you getting an exception or does the result differ from expectations?

Different result, seems to be using the second algorithm.

  • Does this happen consistently for a specific word? Is it possible to create a minimum reproduction by using a specific word with a specific TimeLimit value?

Yes. Our tests use single words and multi-word phrases, but always the same inputs, so we know the expected result. We have played around with the TimeLimit value, all the way from 1ms (to force the second algorithm) to 3000ms (12 times the base time limit, to be sure the first one solves the query). That's how we realized that this issue happens on some machines, and only on those.
I'm not sure if you're asking for minimal reproduction code. If so, please tell me and I'll gladly write it.

  • Is the difference with Check or Suggest?

To be precise, for multi-word input I run a Check on each word first and Suggest only on those not present in the dictionary. Since that part works fine, I haven't looked at the inner workings of Check. Does it use the same timed algorithm-switching mechanism?

  • For the machines where the results don't meet expectations, does it always not meet expectations or is the result intermittent even on those machines?

It seems to be consistent on those machines. I'll check more thoroughly, though.

My guess is you are running into the time limits but you mentioned you tinkered with the time limits already. The design of Hunspell has, in my opinion, an awkward timing mechanism to prevent overuse of CPU resources. You might have tried this already, but increasing the MinTimer to a larger value and increasing the TimeLimit values might help ensure you get more consistent results during testing. See:

public int MinTimer { get; set; } = 100;
/// <summary>
/// The time limit for some long running steps during suggestion generation.
/// </summary>
/// <remarks>
/// Timelimit: max ~1/4 sec (process time on Linux) for a time consuming function.
/// </remarks>
public TimeSpan TimeLimitSuggestStep { get; set; } = TimeSpan.FromMilliseconds(250);
/// <summary>
/// The time limit for each compound suggestion iteration.
/// </summary>
public TimeSpan TimeLimitCompoundSuggest { get; set; } = TimeSpan.FromMilliseconds(100);
/// <summary>
/// The time limit for each compound word check operation.
/// </summary>
public TimeSpan TimeLimitCompoundCheck { get; set; } = TimeSpan.FromMilliseconds(50);
/// <summary>
/// A somewhat overall time limit for the suggestion algorithm.
/// </summary>
public TimeSpan TimeLimitSuggestGlobal { get; set; } = TimeSpan.FromMilliseconds(250);

As I said, I've played quite a bit with the different time configurations available, and nothing changed.


aarondandy commented on May 31, 2024

It sounds like you are saying there is a timing issue: something in the first part of some algorithm runs too slowly on some machines, which prevents the following part of Suggest from returning results. If I got that right, this is starting to make some sense to me. To debug this, you could create a test case for your specific words and dictionaries to find which code in the codebase is returning results and which code is not being executed. Setting breakpoints on or around opLimiter usages might reveal which specific code is slow for your specific words and dictionary.

var opLimiter = new OperationTimedLimiter(Options.TimeLimitSuggestGlobal, _query.CancellationToken);


ADD-eNavarro commented on May 31, 2024

Let me explain a bit better.
In #40 (comment) you said that at some point in the code, the suggestion algorithm switches from the one it starts with (MapRelated) to NGram. I call these the first and second algorithms, respectively.
Now, the issue is that, on some machines, even with a long TimeLimit, I'm getting the same results as when I use a TimeLimit of 1 (to force NGram use internally). The only thing those machines have in common is the processor family, as stated.
I will try debugging opLimiter and let you know the results.
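
The comparison described above could be reproduced with something like the following sketch. It is not taken from the actual test suite: the dictionary path, the sample word, the hypothetical `WithLimit` helper, and the decision to apply one limit to all four time options are assumptions, as is the `Suggest(word, options)` overload.

```csharp
using System;
using System.Linq;
using WeCantSpell.Hunspell;

// Hypothetical dictionary path and test word; substitute the real ones.
var wordList = WordList.CreateFromFiles("en_US.dic");
const string word = "teh";

// Hypothetical helper: applies the same limit to every documented time option.
static QueryOptions WithLimit(TimeSpan limit) => new QueryOptions
{
    TimeLimitSuggestStep = limit,
    TimeLimitCompoundSuggest = limit,
    TimeLimitCompoundCheck = limit,
    TimeLimitSuggestGlobal = limit,
};

var shortRun = wordList.Suggest(word, WithLimit(TimeSpan.FromMilliseconds(1))).ToList();
var longRun = wordList.Suggest(word, WithLimit(TimeSpan.FromMilliseconds(3000))).ToList();

Console.WriteLine($"1 ms:    {string.Join(", ", shortRun)}");
Console.WriteLine($"3000 ms: {string.Join(", ", longRun)}");

// Identical lists on a machine would match the symptom reported here.
Console.WriteLine(shortRun.SequenceEqual(longRun)
    ? "Same suggestions for both limits."
    : "Suggestions differ between limits.");
```

Running the same program on an affected and an unaffected machine would show whether the long limit changes anything on each.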


aarondandy avatar aarondandy commented on May 31, 2024

@ADD-eNavarro, I made a new release that might fix your issue. Give it a try and let me know. I was previously using Environment.TickCount, which wasn't really a great choice. This new release changes that and may behave differently.

https://github.com/aarondandy/WeCantSpell.Hunspell/releases/tag/5.0.0
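
For context on why the clock source can matter: `Environment.TickCount` is a 32-bit millisecond counter whose resolution is tied to the system timer, typically in the 10 to 16 ms range, while `Stopwatch` uses a high-resolution counter where one is available. The following is a minimal sketch of a time limiter built on `Stopwatch`. The `SketchTimeLimiter` class is hypothetical and is not the library's `OperationTimedLimiter`, whose actual implementation is not shown in this thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of WeCantSpell.Hunspell.
public sealed class SketchTimeLimiter
{
    private readonly long _startTimestamp;
    private readonly long _limitTicks;

    public SketchTimeLimiter(TimeSpan limit)
    {
        _startTimestamp = Stopwatch.GetTimestamp();
        // Convert the limit into Stopwatch ticks, whose length depends on Stopwatch.Frequency.
        _limitTicks = (long)(limit.TotalSeconds * Stopwatch.Frequency);
    }

    // True once at least the configured amount of time has elapsed.
    public bool HasExpired => Stopwatch.GetTimestamp() - _startTimestamp >= _limitTicks;
}

public static class Program
{
    public static void Main()
    {
        var limiter = new SketchTimeLimiter(TimeSpan.FromMilliseconds(50));
        long iterations = 0;

        // Busy loop standing in for a long-running suggestion step.
        while (!limiter.HasExpired)
        {
            iterations++;
        }

        Console.WriteLine($"Stopped after {iterations} iterations.");
    }
}
```

With a coarse clock, a limit of a few milliseconds may expire immediately or much later than requested depending on where the timer tick falls, which is one plausible way results could vary between machines.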

