iconnect / regex
regex: A Regular Expression Toolkit for regex-base
Home Page: http://regex.uk
License: Other
The contents of the examples directory (i.e., the tutorial/tests/examples) should be split off into regex-examples leaving regex with just the dependencies needed for the library.
See #3
The Matches, Match and Capture types that are generated by the regex match operators, and passed into the replacement functions, always carry the original search text, so each value contains the full context of the match. This is highly convenient, and it works very well for scripting applications that process small-scale texts and for line-oriented applications that match (short) lines of text.
It may not work for applications that are processing large text files en-bloc.
To fix #3 we should probably be using special-purpose data types that may not be so convenient to use in general applications.
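One possible shape for such special-purpose types, as a minimal sketch only: a capture that records an offset and a length, with the source text supplied separately when the captured text is actually needed. The names LeanCapture, leanOffset, leanLength and leanText are hypothetical and are not part of regex.

```haskell
-- Hypothetical sketch: none of these names belong to the regex package.

-- | A capture that records where the match is, not the text around it.
data LeanCapture = LeanCapture
  { leanOffset :: !Int  -- ^ 0-based offset of the capture in the source
  , leanLength :: !Int  -- ^ number of characters captured
  } deriving (Eq, Show)

-- | Recover the captured text on demand from a source the caller still holds.
leanText :: String -> LeanCapture -> String
leanText src (LeanCapture off len) = take len (drop off src)
```

The cost is the loss of convenience noted above: the caller has to keep the source text and pass it in, instead of finding it inside every match value.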
compileRegex should take just a single string argument (the text of the regex) with a variant (compileRegexWithOptions) for specifying options.
See this reddit conversation .
There is no reason why sed' can't be completely generalised, except that we don't have a linesE method for Replace, which is easily fixable.
Only the tutorial has been written up.
Apart from using fast backends very little effort has been applied to making the package efficient on the grounds that:
we want to get it right before making it fast and
the primary motivation is to make RE-based scripting in Haskell more attractive and many of those applications typically aren't performance sensitive (as the filters in the package used to process the literate Haskell programmes and generate the API modules are not performance sensitive).
As the dude says, if you need high-performance filters you should probably be writing them by hand, at least until this issue has been fixed!
The parser in Text.RE.TestBench is almost identical to the parser in Text.RE.Internal.NamedCaptures, which it should use.
This is being held up on a resolution of #2 (for now).
Aiming for the tests to be covering 90% of the code.
ghcjs generally wants native Haskell packages, so we will separate out the PCRE api into a separate regex-with-pcre package.
re-gen-cabal sdist should commit the Hackage release tar archives before generating the tags
add re-gen-cabal bump-version (alias for re-prep bump-version)
Not working :-(
The argument order of replaceAll is inconsistent with replace. replaceAll is correct.
This is a follow-up to the recent re-organization.
We want to:
collect together the exports of the Tools modules into a single RE.Text.Tools module;
export the Parsers module from the TestBench module;
export the reSource, compileRegex, compileRegexWith and escape functions from the API modules instead of the RE modules.
As these technically break the API we need another minor version bump.
re-gen-modules-test is failing on Windows in-place testing (on AppVeyor) with:
re-gen-modules-test.exe: src/Text/RE/TDFA/ByteString/Lazy.hs: openBinaryFile: does not exist (No such file or directory)
Looks like Windows git does not support symbolic links.
This
replaceAll "${d}/${m}/${y}" $ src *=~ [re|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})|]
could be accidentally written as
replaceAll "${d}/${m}/${y}" $ src =~ [re|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})|]
and it would pass the type checker, but behave differently.
The named captures were designed to work with the new operators, which can easily preserve the capture names in the definite result type; this is not so easy in the case of =~.
I can see three options for the 1.0.0.0 release:
1. leave everything as it is;
2. remove the old =~ and =~~ operators from the API;
3. fix them up so that when they yield Match or Matches results they preserve the capture names.
The question answers itself I think: we should do 3 of course.
Most of the pages are presentable but:
The About page needs to say more about the rationale;
The Contact page needs to say more about contributing to the project;
The Tutorial page should provide a little more context.
This is the home page/README for the web site, GitHub and Hackage -- it needs to be right!
It needs to be concise with links out to the relevant website pages.
Text.RE module to include just abstracted Match and Matches;
Text.RE.Summa module for collecting together all of the assets that don't belong to the back ends;
Text.RE.Types collecting all of the Types modules.
They are basically done but we need:
better presentation of the tutorial GHCI tryouts with the expected results shown explicitly;
add literate programme commentary for all of the .lhs modules in the library and examples (see #8).
We should:
move all regex types modules under Text.RE.Types;
move Parsers into TestBench.Parsers;
move Edit into Tools.Edit;
cut down what we export from Text.RE:
do not export Options_, only SimpleOptions;
do not export TestBench;
do not export the Tools.
I think regex can be made to avoid relying on TemplateHaskell+QuasiQuotes for recent GHC versions which provide the TemplateHaskellQuotes extension. This would have the benefit that GHCs which don't have interpreter support would be able to compile regex. Also, the TemplateHaskellQuotes extension is considered "safe" under SafeHaskell, whereas TemplateHaskell is "unsafe".
There are currently 3 modules which rely on TemplateHaskell:
Text/RE/Options.lhs
Text/RE/TDFA/RE.hs
Text/RE/Internal/NamedCaptures.lhs
The first two are trivial to make THQ-compatible; the 3rd one however makes use of heredocs, and thereby actually executes TH code:
import Text.Heredoc

scan :: String -> [Token]
scan = alex' match al oops
  where
    al :: [(Regex,Match String->Maybe Token)]
    al =
      [ mk [here|\$\{([^{}]+)\}\(|] $         ECap . Just . x_1
      , mk [here|\$\(|]             $ const $ ECap Nothing
      , mk [here|\(\?:|]            $ const   PGrp
      , mk [here|\(\?|]             $ const   PCap
      , mk [here|\(|]               $ const   Bra
      , mk [here|\\(.)|]            $         BS    . s2c . x_1
      , mk [here|(.)|]              $         Other . s2c . x_1
      ]
Would it be possible to avoid using heredocs and thus avoid having to execute TH code?
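One way this might be done, sketched under assumptions: drop the regex-driven table altogether and write the scanner by hand with ordinary pattern matching, so no quasi quoter is needed. This is a swapped-in technique, not the package's own code. The Token type below is reconstructed from the constructors used in the snippet above, and its field types are assumed. (A smaller change would be to keep the table and write each pattern as an ordinary string literal with doubled backslashes.)

```haskell
-- Assumed shape of the token type; only the constructor names come from
-- the snippet above.
data Token
  = ECap (Maybe String)  -- "${name}(" or "$("
  | PGrp                 -- "(?:"
  | PCap                 -- "(?"
  | Bra                  -- "("
  | BS Char              -- backslash followed by a character
  | Other Char           -- any other character
  deriving (Eq, Show)

-- | Hand-written scanner: the equations are tried in the same order as
-- the rows of the regex table, so earlier rows take priority.
scan :: String -> [Token]
scan [] = []
scan ('$':'{':t)
  | (nm@(_:_), '}':'(':t') <- break (`elem` "{}") t = ECap (Just nm) : scan t'
scan ('$':'(':t)     = ECap Nothing : scan t
scan ('(':'?':':':t) = PGrp : scan t
scan ('(':'?':t)     = PCap : scan t
scan ('(':t)         = Bra : scan t
scan ('\\':c:t)      = BS c : scan t
scan (c:t)           = Other c : scan t
```

When the guard on the "${" equation fails (for example, an empty name or a missing "}("), matching falls through to the later equations and the "$" is emitted as an Other token.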
This contains TemplateHaskell code that hpc can't measure so it is skewing the coverage stats. This should be noted in a new section on the build-status page.
The current instructions in index.md for loading the tutorial into ghci with cabal repl are incorrect.
We need to fix those and add stack instructions.
The ed quasi quoter exported from Text.RE.TDFA.<t> should be of type
SearchReplace RE <t>
not
IsRegex RE s => SearchReplace RE s
as is the case at the moment.
The Text.RE.TDFA and Text.RE.PCRE modules are currently doing the right thing of course (which could require FlexibleContexts, but these modules are not recommended for simple usage).
We need some margins.
Each backend should provide a function that will 'escape' strings to produce REs that will match those strings.
I don't think this is provided by regex-base so we will have to add functions in Text.RE.TDFA.RE and Text.RE.PCRE.RE to do this.
(Thanks to @ezyang for the suggestion.)
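A minimal sketch of what such an escaping function could look like, assuming that a backslash before a metacharacter makes it literal in the backend (which holds for the usual POSIX-style and PCRE syntaxes). The name escapeLiteral and the exact set of metacharacters are assumptions for illustration, not the API of regex or regex-base.

```haskell
-- | Escape each RE metacharacter so that the resulting RE text matches
-- the input string literally. Hypothetical helper, not the regex API.
escapeLiteral :: String -> String
escapeLiteral = concatMap esc
  where
    esc c
      | c `elem` metaChars = ['\\', c]  -- prefix metacharacters with '\'
      | otherwise          = [c]        -- everything else is copied as-is

    -- Assumed metacharacter set; a real backend may need a different one.
    metaChars = "\\^$.|?*+()[]{}"
```

For example, escapeLiteral "a.b*c" yields the RE text a\.b\*c, which matches only the literal string a.b*c.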
Two problems:
we do not allow ${5} or ${42} to reference numbered captures;
we interpret $11 and $123 as captures.
Obvious fix:
captures can be referenced as ${10}, etc.;
$11 to be interpreted as ${1}1
See PR #69
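The proposed rule can be sketched as a small template parser. This is an illustration only, not the code from the PR: the Part type and the parseTemplate and expand functions are hypothetical names, and named references such as ${d} are outside the scope of the sketch (here they are simply left as literal text).

```haskell
import Data.Char (isDigit)

-- | A piece of a replacement template. Hypothetical type, not the regex API.
data Part
  = Lit String  -- ^ literal text, copied to the output
  | Ref Int     -- ^ reference to a numbered capture
  deriving (Eq, Show)

-- | Parse a template so that ${10} refers to capture 10, while $11 is
-- read as ${1} followed by the literal character '1'.
parseTemplate :: String -> [Part]
parseTemplate = merge . go
  where
    go [] = []
    go ('$':'{':t)
      | (ds@(_:_), '}':t') <- span isDigit t = Ref (read ds) : go t'
    go ('$':d:t)
      | isDigit d = Ref (read [d]) : go t  -- exactly one digit after a bare '$'
    go (c:t) = Lit [c] : go t

    -- join adjacent literals into one
    merge (Lit a : Lit b : ps) = merge (Lit (a ++ b) : ps)
    merge (p : ps)             = p : merge ps
    merge []                   = []

-- | Substitute captures into a parsed template; captures are indexed
-- from 0 (the whole match) and a missing capture expands to "".
expand :: [String] -> [Part] -> String
expand caps = concatMap render
  where
    render (Lit s) = s
    render (Ref n) = case drop n caps of
                       (c:_) -> c
                       []    -> ""
```

With this reading, parseTemplate "$11" gives [Ref 1, Lit "1"] and parseTemplate "${10}" gives [Ref 10].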
The sources are warning clean for GHC 7.8.4; we should make it so for the other versions of the compiler (currently 7.10 and 8.0).
"The 'RE' Type" section erroneously references Text.RE.TDFA.RE.
When re-prep bumps the version it checks that there is a changelog entry; it should also check that the milestone has been ticked off in the roadmap.
Curl is no longer available in the AppVeyor default environment, but we need it.
Any tests relying on external packages like tasty and smallcheck should not be placed in the library but in re-tests so the library doesn't pick up false testing dependencies.
The AppVeyor badge is pointing to the wrong account — I only realise now that the build has started spontaneously failing (see #79).
Preparing to put regex into the stackage nightly build.
It would be great if we could add these but it will need some co-ordination with the regex-pcre maintainers, so it is going into the v2.0.0.0 milestone.
The default GitHub themes are too horrible. I was hoping to use them as a stop-gap but they aren't fit even for that (in the better candidates, the fork-me-on-GitHub device is far too loud).
We just need a template and stylesheet -- otherwise, it is plain GitHub pages.
Some elements in the API are not using standard Haskell conventions (camelCase, etc.) — they should.
The regexSource method of Replace needs to generate the text type of the class to be usable.
The reverse operations for compiling into the RE would also be useful.
It is failing with:
Text/RE/TDFA/ByteString/Lazy.hs: openBinaryFile: does not exist (No such file or directory)
To work from a Hackage tarball (as distinct from a cloned repository) the src modules must locate the Haskell source modules under src.
The Haddocks are lacking introductory material and probably basic documentation in places.
Coveralls was broken by 751198d; the last line of .travis.yml needs to be updated with the new targets.
The library was included with regex-examples — it should be removed.
Unfortunately, the badge that we include in the Cabal tarball will necessarily be using an outdated SVG for the Hackage button. We need to generate our own SVG.
May as well set up the version in the file while we are at it.
Adam Bergmark asked on Haskell Cafe:
Have you considered doing anything fancy to make capture groups safer to use? If i could get a compile error when i'm using the wrong number/wrongly named groups I'd be very excited.
We want to do this in the same way it was done for the PCRE macros, by using (?: ... ) for grouping. For this regex-tdfa will have to be extended to support pure grouping.
Some types are probably best renamed to make it clear they belong to regex:
and the Replace methods should have the E suffix replaced with an R.
Evan Laforge expressed concern in the Haskell Cafe about 'any deviation from "standard" PCRE'. Of course anyone can just decline to use the non-standard construct, so that leaves us with:
a way of disabling the non-standard extensions to ensure they don't creep into a code base (which seems a bit OTT);
ensuring that they don't interfere with any PCRE RE notation.
My understanding is that regex named captures will not interfere with any PCRE extensions, but it would be nice to get a second opinion.
Using regex for the scanners is fine on prototyping principles but we should review them with a view to rewriting in Alex.
(The tutorial collects all of these examples together.)
Travis should test the Hackage release tarball when it corresponds to the current version of regex.