Comments (3)

jviiret commented on May 30, 2024

This would be a fairly significant change. The Hyperscan compiler can return a lot of different error messages -- consider everything that can be produced by the parser, for example!

It may also require an ABI change, and we try to limit those as much as possible to allow for easy library upgrades.

We view pretty much any error result from compilation as fatal, and generally the only code we see that needs to retry compilation under any circumstances tends to be prefiltering code that tries for full Hyperscan support first, then switches on HS_FLAG_PREFILTER for a second attempt if that fails.

Can you describe your application and what you are trying to do as a result of "Pattern too large"? If you're able to share the patterns that you are seeing issues with, we would love to see them with a view to improving pattern support in general.

from hyperscan.

starius commented on May 30, 2024

> We view pretty much any error result from compilation as fatal, and generally the only code we see that needs to retry compilation under any circumstances tends to be prefiltering code that tries for full Hyperscan support first, then switches on HS_FLAG_PREFILTER for a second attempt if that fails.

I want to distinguish two cases:

  • Pattern is incorrect. Print an error message and abort the program.
  • Pattern is correct, but too large for Hyperscan. Pass it to another regex library that is slower but accepts the pattern.
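
With only the message string to go on, one way to separate these two cases is to match on the message text. The following is a minimal sketch, assuming the size error always contains the words "too large" (it is quoted in this thread as "Pattern is too large"). The helper name `is_size_limit_error` is made up for illustration. Message wording is not a documented, stable interface, so this may break between Hyperscan releases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a compile error message looks like a
 * size-limit failure, 0 otherwise. It relies on the message wording,
 * which is an assumption and not a stable interface. */
int is_size_limit_error(const char *message) {
    if (message == NULL) {
        return 0;
    }
    /* Substring match covers both "Pattern too large" and
     * "Pattern is too large". */
    return strstr(message, "too large") != NULL;
}
```

In a real program the string would come from the `message` field of the `hs_compile_error_t` returned by a failed `hs_compile()` call. A result of 1 would route the pattern to the slower library, and any other error would be printed before aborting.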

> Can you describe your application and what you are trying to do as a result of "Pattern too large"? If you're able to share the patterns that you are seeing issues with, we would love to see them with a view to improving pattern support in general.

I uploaded the pattern and the testing tool to a gist. The pattern matches postal addresses in Russia. The original pattern (written for another library) is stored in the file original-pattern.txt. I then applied a pattern preprocessor, which takes patterns written for that library and produces patterns for Hyperscan, and saved the result to processed-pattern.txt. Hyperscan accepts the first pattern, but fails with "Pattern is too large" on the preprocessed one. I have to preprocess regexps to achieve compatibility with the other regexp library: the syntax and meaning of some character sets in that library differ from those in Hyperscan, so the only reliable way to use Hyperscan in place of that library is to preprocess each pattern. The results of preprocessing reproduce the original behaviour precisely, but are sometimes too large for Hyperscan.

jviiret commented on May 30, 2024

Thanks for the extra detail. (We will also take a look at this pattern and see what we can do to improve support for it.)

Hyperscan is able to generate a very large number of compile errors, and we would prefer not to lose this expressiveness by hardening them into enums.

Assuming the semantics of both libraries are the same (as you say), are you sure you need to distinguish these cases -- can't you just always fall back to your alternative library if a pattern fails compilation with Hyperscan?

One possibility is that you could call hs_expression_info() first, which parses the expression and does some preliminary analysis -- this will generate errors for expressions with invalid syntax without going through the full compile process.
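
The two-stage check suggested above could look roughly like the sketch below. It rests on several assumptions and is not a recommended usage pattern: the `pattern_status` enum and the functions `decide` and `check_pattern` are made-up names, the header is assumed to be installed as `<hs/hs.h>`, block mode is chosen arbitrarily, and the default allocator (`malloc`) is assumed for the `hs_expr_info_t` result. The Hyperscan part is compiled only when the header is found, so the decision logic can be built on its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use Hyperscan only if its header is available (assumed install path). */
#if defined(__has_include)
#if __has_include(<hs/hs.h>)
#include <hs/hs.h>
#define HAVE_HYPERSCAN 1
#endif
#endif

typedef enum {
    PATTERN_OK,       /* compiled by Hyperscan */
    PATTERN_INVALID,  /* failed the first stage: report and abort */
    PATTERN_FALLBACK  /* passed stage one but did not compile: use other library */
} pattern_status;

/* Maps the outcome of the two stages to an action. */
pattern_status decide(int syntax_ok, int compile_ok) {
    if (!syntax_ok) {
        return PATTERN_INVALID;
    }
    if (!compile_ok) {
        return PATTERN_FALLBACK;
    }
    return PATTERN_OK;
}

#ifdef HAVE_HYPERSCAN
/* Hypothetical wrapper: runs hs_expression_info(), then hs_compile().
 * On PATTERN_OK the caller owns *db and frees it with hs_free_database(). */
pattern_status check_pattern(const char *pattern, unsigned int flags,
                             hs_database_t **db) {
    hs_expr_info_t *info = NULL;
    hs_compile_error_t *err = NULL;

    /* Stage 1: parsing and preliminary analysis only. */
    int syntax_ok =
        hs_expression_info(pattern, flags, &info, &err) == HS_SUCCESS;
    free(info);                 /* assumes the default malloc allocator */
    hs_free_compile_error(err); /* a real program would log err->message first */
    if (!syntax_ok) {
        return decide(0, 0);
    }

    /* Stage 2: full compile; a failure here is treated as "unsupported". */
    err = NULL;
    int compile_ok = hs_compile(pattern, flags, HS_MODE_BLOCK, NULL, db,
                                &err) == HS_SUCCESS;
    hs_free_compile_error(err);
    return decide(1, compile_ok);
}
#endif
```

One caveat with this split: the first stage may also reject expressions for reasons other than syntax, such as constructs that Hyperscan does not support, so `PATTERN_INVALID` here means "rejected before the full compile" rather than "definitely malformed".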
