Giter VIP home page Giter VIP logo

Comments (10)

creachadair avatar creachadair commented on July 17, 2024

If the finalizer mechanism isn't sufficient to avoid memory pressure, then the API is incompatible (despite superficial similarity) and shouldn't be treated as a drop-in replacement IMO.

from enry.

kuba-- avatar kuba-- commented on July 17, 2024

you mean incompatible with go's std. library?
This is go binding for c implementation of oniguruma. For every c implementation you have to manually free memory and finalizer in bindings is just a wish. Native go implementation doesn't have this problem.

from enry.

creachadair avatar creachadair commented on July 17, 2024

This is go binding for c implementation of oniguruma. For every c implementation you have to manually free memory and finalizer in bindings is just a wish. Native go implementation doesn't have this problem.

Yes, agreed. My point is that one cannot replace a use of (say) the Go regexp package with Oniguruma C bindings without re-working the control flow of the library that depends on it, if the integration with the GC is not sufficient to manage memory. In particular, if the library demands manual release of allocated memory, it's not sufficient to update the function calls and types, we also need to change the way we track and plumb regexp matching types through the code that uses it.

Which is mostly to say that fixing this may require more changes than just plugging in free calls. The point of using Oniguruma in the first place was supposed to be performance, but that relies in large part on the ability to cache reused expressions.

from enry.

kuba-- avatar kuba-- commented on July 17, 2024

So what do you suggest?

  • get rid of oniguruma and just stick to go's std. library
  • keep oniguruma and just rely on finalizers
  • add free/close/dispose function (next to MustCompile) to enry's API

from enry.

creachadair avatar creachadair commented on July 17, 2024

So what do you suggest?

  • get rid of oniguruma and just stick to go's std. library
  • keep oniguruma and just rely on finalizers
  • add free/close/dispose function (next to MustCompile) to enry's API

Any of those seems viable. My inclination would be to rely on finalizers and not worry about it, but the main issue raises the concern that this isn't sufficient. If we aren't having memory leaks or other runtime issues because of this, I think it should be sufficient to trust the finalizer; otherwise we should do something different.

Adding a free function isn't enough: Everywhere a wrapped object is allocated we would have to match it with a release, which means adding an explicit ownership path through the API. The obvious way to do that is to release eagerly whenever a value goes out of scope. I predict that would result in a performance degradation, since we'd have to recompile the same expressions repeatedly.

I'm not sure why we switched to Oniguruma in the first place, but I suspect it's either a matter of syntax (since regexp supports only RE2, which doesn't handle common PCRE-style extensions) or of performance. If it's for syntax we might be stuck; if it's for performance I think we can do better without needing a C binding.

from enry.

vmarkovtsev avatar vmarkovtsev commented on July 17, 2024

The author of the original oniguruma port here. The goal was to get a performance boost, which appeared to be great - 2x to 4x as far as I remember. I picked it because somebody mentioned it would be fun to try, and it was fun for me to write some code. Nobody knew we would have annoying memory problems beforehand.

oniguruma is a much faster regexp engine, and knowing the nature of the benchmarks/real usage, I doubt that any regexp recompilation will mitigate the benefit. I bet that it will be negligible.

from enry.

kuba-- avatar kuba-- commented on July 17, 2024

So far, we didn't notice memory leaks (that's why I emphasized potential). Mysql has a slightly different use case. For cached pre-compiled matchers we rely on finalizers, otherwise we call dispose. So, I'm ok to keep what we have in enry.
Regarding switch to oniguruma, as far as I know the main reason was performance (oniguruma is at least 4x faster). Also for mysql oniguruma is a default regex engine because of go's regex is kind of special, but in enry it doesn't have to be a case.

from enry.

bzz avatar bzz commented on July 17, 2024

I would advise not to add complexity to this common library until this is proven necessary for all downstream use cases. Personally, so far I have not seen:

if the integration with the GC is not sufficient to manage memory

this to be the case yet and esp. because, as Michale noted above, that would most probably require API surface changes.

What makes me particularly reluctant is

So far, we didn't notice memory leaks (that's why I emphasized potential)

On the performance

The goal was to get a performance boost, which appeared to be great - 2x to 4x as far as I remember

Although that is true for general regexp matching, I would really love to have some data points first to support that claim specifically on enry performance e.g as a part of benchmark suite (#210 that could be extended to potentially spotlight any memory leaks)

from enry.

kuba-- avatar kuba-- commented on July 17, 2024

OK, so I'm closing this issue.

from enry.

abdennour avatar abdennour commented on July 17, 2024

performance must be considered seriously here.
It's really not a joke, namely when i am designed a centralized pipeline template that all teams use it.

Just imagine the miserable pipeline system, one guy push a huge git repo !!

from enry.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.