Comments (21)
See dpgeorge#9 (comment) for some previous discussion.
from micropython.
I like regex, and it's useful, so I think it's definitely worth trying to get it on the MCU. If we go for a cutdown version, then it should be a strict subset of Python's regex, not a variation.
from micropython.
as an FYI, the musl c library contains the tre regex code.
https://github.com/laurikari/tre
As well, it contains some bug fixes that were never fixed in the original code. (according to the wiki at musl)
from micropython.
Note that tre does not appear to have the following (which Python re does):
- look-ahead, or look-behind operators
- http://www.regular-expressions.info/lookaround.html
- these are on the list of upcoming features..
- conditionals
- named captures
- more than 9 matches possible
Its not a terrible set of limitiations - providing the other aspects match the same of course.
from micropython.
Note Python does not use PCRE. The default "re" has not been PCRE-based since at least 2.2, and the PCRE implementation of "re" was only included for (AFAIK undocumented) backwards compatibility until 2.3.
from micropython.
FYI, I started writing FFI bindings for libpcre in https://github.com/micropython/micropython-lib/tree/master/re-pcre , with the idea to have more or less complete re module implementation for unix port.
from micropython.
Not sure how useful it'll be, but you can look at
http://svn.python.org/projects/python/branches/release22-maint/Lib/pre.pyand
the pcre* files in
http://svn.python.org/projects/python/branches/release22-maint/Modules/ for
inspiration.
On Tue, Apr 8, 2014 at 11:20 PM, Paul Sokolovsky
[email protected]:
FYI, I started writing FFI bindings for libpcre in
https://github.com/micropython/micropython-lib/tree/master/re-pcre , with
the idea to have more or less complete re module implementation for unix
port.Reply to this email directly or view it on GitHubhttps://github.com//issues/13#issuecomment-39908960
.
John Lenton ([email protected]) ::: http://chipaca.com
from micropython.
re-pcre now has enough gear to run json module from CPy.
from micropython.
You mean, json module copied from CPy can run under uPy, by utilising re-pcre, right?
from micropython.
Yep, the only change required was this: micropython/micropython-lib@a4833ed (CPy stdlib is full of such dirtiness). Ah, disclaimer: can run on trivial input ;-)
from micropython.
Ok, I had another pass on this. Let's remind ourselves that regex matching can be implemented either as "recursive-descent" executor with backtracking (essentially, a Turing machine), or NFA/DFA (non-deterministic and deterministic finite automaton). The latter offers some performance guarantees, but harder to understand (and thus modify to add more features), and usually needs some bunch of memory to represent compiled regex, the former is easier to understand/modify and add non-regular features (with modern regex'es full of them).
Technically good contender for "backtracking" category: SLRE . Compiles to under 5K (x86, -Os). Problem spot: has "active" "license life" with several incarnations:
- http://slre.sourceforge.net/ - "THE BEER-WARE LICENSE""THE BEER-WARE LICENSE", in spirit compatible with MIT/BSD, but certainly not OSI-approved license
- https://code.google.com/p/slre/source/checkout - MIT
- https://github.com/cesanta/slre - GPL2
@dpgeorge, do you envision any probs adding MIT version to codebase? (This is certainly better than situation we had with "stm" port, which contained potentially proprietary code.)
from micropython.
Example of FA-based contender: http://swtch.com/plan9port/unix/ . Messy code from outer space. Support captures, which was big progress for FA impl as http://swtch.com/~rsc/regexp/regexp1.html says. Doesn't support non-greedy quantifers. Compiles to ~7K.
There're bunch of hacks which seem to add non-greedy's and stay unbloated (unlike TLE):
- https://code.google.com/p/re1/source/browse/
- https://github.com/lukesandberg/Regex
- https://github.com/MaskRay/Regex (supports non-greedy)
- https://github.com/openresty/sregex (many features, goes bloat way)
- https://github.com/copenhas/regex
from micropython.
Generally, grepping for "regex" for github can lead to few interesting, if practical, finds, e.g. https://github.com/fy0/tinyre "A tiny python style regex engine implement.", with nice Chinese readme and comments.
from micropython.
do you envision any probs adding MIT version to codebase?
No. There is no problem if it's MIT and we acknowledge the source of the code. Now with the extmod/ directory, we can land it there.
(Of course, I would love to write my own re engine, but that's not a good use of time :)
fy0/tinyre looks quite nice, and the license might even be MIT compatible.
from micropython.
FYI, I kinda coded a module for slre, but then next step is refactoring slre to have sane interface (re-refactoring, because it's used to have it such, but latest explicitly MIT-licensed version has screwed it up), optimize memory usage (it just allocates big statically-sized buffer for compiled regexp), etc, etc.
All that seemed kinda boring, so I made another pass over solutions mentioned above. I looked at tinyre, and it actually compiles to small enough code too, but all the Chinese comments doesn't help, and I saw funny programming practices like infinite loop in case of error. So, previous assessment holds: nobody known what exactly is done there and why.
Then I took a more detailed look at https://code.google.com/p/re1/ (I almost skipped it last time). And that's true gem - first of all, it's truly minimal code (but implements all important features like greedy vs non-greedy quantifiers and submatch captures), and then, it separates regexp compiler from execution backends, and then provides few alternative backends (essentially, tiny VMs). Backends size from 400 bytes on. Frontend is implemented using yacc though, so is big, but I already reimplemented parser in adhoc way.
There's issue that backtracking (== requiring least explicit runtime state for execution, but potentially lot of stack space) backends don't have protection against known backtracking killers like (a*)*
. I even tried to implement that protection, until I checked how slre deals with it, and it infinite-loops too. So, I guess with our aims, it should be good enough to have just switchable backends to deal with it.
from micropython.
Nice find with p/re1. I also would have reimplemented the parser.
from micropython.
Yeah, it's really nice, spotting VM executor byte sizes from 220 bytes (thumb2). Of course, when you start to add real-world features like NUL-cleanness and assertions, it diminish those impressive figures, but nonetheless, it should be possible to have regexp impl in 2K.
My current stuff is at https://github.com/pfalcon/re1.5 (yeah, already re-dubbed it as "re1.5" ;-) ). Things implemented: exact allocation of internal state, passing input string by length, "^" & "$" assertions. TODO before it'll be basically usable: options (like multiline, which changes semantics of "^/$") and char classes. Bottom line: @dpgeorge, if you wanted to implement regex engine (almost) from scratch, there's still an opportunity ;-D.
from micropython.
Ha, sounds like you are almost there! Are you trying to making it CPython compliant?
from micropython.
Depending on what you mean. To start talking about "CPython compliance", would need to implement named groups. And full "CPython compliance" is not achievable for finite-automaton based implementation (and re1's feature is completely replaceable backends, so FA featureset is a common bound).
from micropython.
Ok, "ure" module based on re1.5 as described above was merged (and actually went thru couple upgrade iterations adding more features).
from micropython.
Thanks @pfalcon, really great work!
from micropython.
Related Issues (20)
- os.rmdir() deletes files HOT 3
- network.hostname: max length of hostname HOT 1
- inconsistencies between os.uname, sys.implementation, platform.platform
- RP2: Hard I2C not working HOT 4
- docs: incorrect link on the web page Wiznet W5500-EVB-Pico HOT 3
- Add support for ESP32-C2
- wiznet: spi baudrate declared as 2 and 20 MHz ?! HOT 3
- network.WIZNET5 : link status; isconnected(): the documentation is incorrect?
- if uart receives 0xf0 byte then mcu after entering lightsleep never wakes up HOT 6
- RP2: HOT 4
- Build a esp32 firmware with version 1.22.2 meet a build issuse HOT 1
- PPP.active freezes MicroPython HOT 3
- PICO rp2.PIO(1),remove_program() makes all instruction memory available for python PIO and therefor allows overwrite of CYW43 PIO program
- esp8266, esp32: WiFi: clarification about PowerManagement attributes
- network: wiznet: send_ethernet
- network: wiznet: zeroconf HOT 2
- Please add 'v1.24.0-preview' Git tag for mpy-cross headers generating properly HOT 1
- ESP32: Unable to use SDCard and any other SPI device in the same bus HOT 2
- libhydrogen error when trying to build mboot with packing enabled
- Cryptolib throws ValueError HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from micropython.