Giter VIP home page Giter VIP logo

Comments (21)

pfalcon avatar pfalcon commented on July 3, 2024

See dpgeorge#9 (comment) for some previous discussion.

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

I like regex, and it's useful, so I think it's definitely worth trying to get it on the MCU. If we go for a cutdown version, then it should be a strict subset of Python's regex, not a variation.

from micropython.

hagenkaye avatar hagenkaye commented on July 3, 2024

as an FYI, the musl c library contains the tre regex code.

https://github.com/laurikari/tre

As well, it contains some bug fixes that were never fixed in the original code. (according to the wiki at musl)

from micropython.

Neon22 avatar Neon22 commented on July 3, 2024

Note that tre does not appear to have the following (which Python re does):

Its not a terrible set of limitiations - providing the other aspects match the same of course.

from micropython.

chipaca avatar chipaca commented on July 3, 2024

Note Python does not use PCRE. The default "re" has not been PCRE-based since at least 2.2, and the PCRE implementation of "re" was only included for (AFAIK undocumented) backwards compatibility until 2.3.

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

FYI, I started writing FFI bindings for libpcre in https://github.com/micropython/micropython-lib/tree/master/re-pcre , with the idea to have more or less complete re module implementation for unix port.

from micropython.

chipaca avatar chipaca commented on July 3, 2024

Not sure how useful it'll be, but you can look at
http://svn.python.org/projects/python/branches/release22-maint/Lib/pre.pyand
the pcre* files in
http://svn.python.org/projects/python/branches/release22-maint/Modules/ for
inspiration.

On Tue, Apr 8, 2014 at 11:20 PM, Paul Sokolovsky
[email protected]:

FYI, I started writing FFI bindings for libpcre in
https://github.com/micropython/micropython-lib/tree/master/re-pcre , with
the idea to have more or less complete re module implementation for unix
port.

Reply to this email directly or view it on GitHubhttps://github.com//issues/13#issuecomment-39908960
.

John Lenton ([email protected]) ::: http://chipaca.com

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

re-pcre now has enough gear to run json module from CPy.

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

You mean, json module copied from CPy can run under uPy, by utilising re-pcre, right?

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Yep, the only change required was this: micropython/micropython-lib@a4833ed (CPy stdlib is full of such dirtiness). Ah, disclaimer: can run on trivial input ;-)

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Ok, I had another pass on this. Let's remind ourselves that regex matching can be implemented either as "recursive-descent" executor with backtracking (essentially, a Turing machine), or NFA/DFA (non-deterministic and deterministic finite automaton). The latter offers some performance guarantees, but harder to understand (and thus modify to add more features), and usually needs some bunch of memory to represent compiled regex, the former is easier to understand/modify and add non-regular features (with modern regex'es full of them).

Technically good contender for "backtracking" category: SLRE . Compiles to under 5K (x86, -Os). Problem spot: has "active" "license life" with several incarnations:

@dpgeorge, do you envision any probs adding MIT version to codebase? (This is certainly better than situation we had with "stm" port, which contained potentially proprietary code.)

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Example of FA-based contender: http://swtch.com/plan9port/unix/ . Messy code from outer space. Support captures, which was big progress for FA impl as http://swtch.com/~rsc/regexp/regexp1.html says. Doesn't support non-greedy quantifers. Compiles to ~7K.

There're bunch of hacks which seem to add non-greedy's and stay unbloated (unlike TLE):

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Generally, grepping for "regex" for github can lead to few interesting, if practical, finds, e.g. https://github.com/fy0/tinyre "A tiny python style regex engine implement.", with nice Chinese readme and comments.

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

do you envision any probs adding MIT version to codebase?

No. There is no problem if it's MIT and we acknowledge the source of the code. Now with the extmod/ directory, we can land it there.

(Of course, I would love to write my own re engine, but that's not a good use of time :)

fy0/tinyre looks quite nice, and the license might even be MIT compatible.

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

FYI, I kinda coded a module for slre, but then next step is refactoring slre to have sane interface (re-refactoring, because it's used to have it such, but latest explicitly MIT-licensed version has screwed it up), optimize memory usage (it just allocates big statically-sized buffer for compiled regexp), etc, etc.

All that seemed kinda boring, so I made another pass over solutions mentioned above. I looked at tinyre, and it actually compiles to small enough code too, but all the Chinese comments doesn't help, and I saw funny programming practices like infinite loop in case of error. So, previous assessment holds: nobody known what exactly is done there and why.

Then I took a more detailed look at https://code.google.com/p/re1/ (I almost skipped it last time). And that's true gem - first of all, it's truly minimal code (but implements all important features like greedy vs non-greedy quantifiers and submatch captures), and then, it separates regexp compiler from execution backends, and then provides few alternative backends (essentially, tiny VMs). Backends size from 400 bytes on. Frontend is implemented using yacc though, so is big, but I already reimplemented parser in adhoc way.

There's issue that backtracking (== requiring least explicit runtime state for execution, but potentially lot of stack space) backends don't have protection against known backtracking killers like (a*)*. I even tried to implement that protection, until I checked how slre deals with it, and it infinite-loops too. So, I guess with our aims, it should be good enough to have just switchable backends to deal with it.

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

Nice find with p/re1. I also would have reimplemented the parser.

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Yeah, it's really nice, spotting VM executor byte sizes from 220 bytes (thumb2). Of course, when you start to add real-world features like NUL-cleanness and assertions, it diminish those impressive figures, but nonetheless, it should be possible to have regexp impl in 2K.

My current stuff is at https://github.com/pfalcon/re1.5 (yeah, already re-dubbed it as "re1.5" ;-) ). Things implemented: exact allocation of internal state, passing input string by length, "^" & "$" assertions. TODO before it'll be basically usable: options (like multiline, which changes semantics of "^/$") and char classes. Bottom line: @dpgeorge, if you wanted to implement regex engine (almost) from scratch, there's still an opportunity ;-D.

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

Ha, sounds like you are almost there! Are you trying to making it CPython compliant?

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Depending on what you mean. To start talking about "CPython compliance", would need to implement named groups. And full "CPython compliance" is not achievable for finite-automaton based implementation (and re1's feature is completely replaceable backends, so FA featureset is a common bound).

from micropython.

pfalcon avatar pfalcon commented on July 3, 2024

Ok, "ure" module based on re1.5 as described above was merged (and actually went thru couple upgrade iterations adding more features).

from micropython.

dpgeorge avatar dpgeorge commented on July 3, 2024

Thanks @pfalcon, really great work!

from micropython.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.