
ocaml-re's Introduction

Description

Re is a regular expression library for OCaml.

Contact

This library was written by Jerome Vouillon. It can be downloaded from https://github.com/ocaml/ocaml-re

Bug reports, suggestions and contributions are welcome.

Features

The following styles of regular expressions are supported:

  • Perl-style regular expressions (module Re.Perl);
  • Posix extended regular expressions (module Re.Posix);
  • Emacs-style regular expressions (module Re.Emacs);
  • Shell-style file globbing (module Re.Glob).

It is also possible to build regular expressions by combining simpler regular expressions (module Re).

The most notable missing features are back-references and look-ahead/look-behind assertions.

There is also a subset of the PCRE interface available in the Re.Pcre module. This makes it easier to port code from that library to Re with minimal changes.

Performance

The matches are performed by lazily building a DFA (deterministic finite automaton) from the regular expression. As a consequence, matching takes linear time in the length of the matched string.

The compilation of patterns is slower than with libraries using back-tracking, such as PCRE. But, once a large enough part of the DFA is built, matching is extremely fast.

Of course, for some combinations of regular expression and string, the part of the DFA that needs to be built is so large that this point is never reached, and matching will be slow. This is not expected to happen often in practice, and in fact many expressions that behave badly with a backtracking implementation are very efficient with this implementation.
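As a rough illustration of why DFA-based matching is linear, here is a minimal sketch of a hand-written automaton for an unanchored search of the pattern aa?b (the one used in the timings). It is not the library's code: the names `step` and `contains_match` are hypothetical, and the transitions are written out by hand rather than built lazily as described above. It relies on the observation that a string contains a match of aa?b exactly when it contains "ab".

```ocaml
(* States: 0 = nothing useful seen, 1 = last character was 'a',
   2 = a match has been found. *)
let step state c =
  match state, c with
  | 2, _ -> 2        (* the accepting state is absorbing *)
  | _, 'a' -> 1      (* any 'a' may start (or continue) a match *)
  | 1, 'b' -> 2      (* 'a' followed by 'b' completes a match *)
  | _, _ -> 0        (* anything else resets the search *)

(* One transition per input character: the scan is linear in the input. *)
let contains_match s =
  let state = ref 0 in
  String.iter (fun c -> state := step !state c) s;
  !state = 2
```

Each character is examined once and never revisited, which is the property a backtracking matcher does not guarantee.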

The library is at the moment entirely written in OCaml. As a consequence, regular expression matching is much slower when the library is compiled to bytecode than when it is compiled to native code.

Here are some timing results (Pentium III 500 MHz):

  • Scanning a 1 MB string containing only the character a, except for the last character, which is a b, searching for the pattern aa?b (repeated 100 times):

    • RE: 2.6s
    • PCRE: 68s
  • Regular expression example from http://www.bagley.org/~doug/shootout/ [1]:

    • RE: 0.43s
    • PCRE: 3.68s
  • The large regular expression (about 2000 characters long) that Unison uses with my preference file to decide whether a file should be ignored or not. This expression is matched against a filename about 20000 times:

    • RE: 0.31s
    • PCRE: 3.7s

    However, RE is only faster than PCRE when there are more than about 300 filenames.

[1] This page is no longer up but is available via the Internet Archive: http://web.archive.org/web/20010429190941/http://www.bagley.org/~doug/shootout/bench/regexmatch/

ocaml-re's People

Contributors

ashinefoster, avsm, bbatsov, bcc32, bfops, bradlangel, c-cube, chetmurthy, drup, dsheets, dwang20151005, edwintorok, ewert-online, glondu, hhugo, idkjs, julow, kit-ty-kate, kmh11, lindig, nbraud, nojb, obscurans, paurkedal, rgrinberg, samoht, v-gb, vbmithr, voodoos, vouillon

ocaml-re's Issues

Installation fails whenever ocamldoc is absent

Some package managers make ocamldoc an optional package (see the CentOS RPMs, for example). On those systems, ocaml-re fails to install:

=-=- Processing actions -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
∗  installed base-bytes.base
[ERROR] The compilation of re failed at "ocaml setup.ml -configure --prefix /home/ec2-user/.opam/system".
Processing  2/2: [re: ocamlfind remove]
#=== ERROR while installing re.1.5.0 ==========================================#
# opam-version 1.2.2
# os           linux
# command      ocaml setup.ml -configure --prefix /home/ec2-user/.opam/system
# path         /home/ec2-user/.opam/system/build/re.1.5.0
# compiler     system (4.02.3)
# exit-code    1
# env-file     /home/ec2-user/.opam/system/build/re.1.5.0/re-20024-d2c37b.env
# stdout-file  /home/ec2-user/.opam/system/build/re.1.5.0/re-20024-d2c37b.out
# stderr-file  /home/ec2-user/.opam/system/build/re.1.5.0/re-20024-d2c37b.err
### stderr ###
# W: Field 'ocamldoc' is not set
# E: Cannot find external tool 'ocamldoc'
# E: Failure("1 configuration error")

There's no need to fail on these.

Cannot compile/install using OPAM 1.2.1

Hello, I am trying to install Core using OPAM, but it seems there is an issue with ocaml-re.
This is the stack trace installing the library:

  % opam install core                                                                   !567
The following actions will be performed:
  ∗  install re          1.7.1                [required by ppx_expect]
  ∗  install ppx_expect  v0.10.0              [required by ppx_jane]
  ∗  install ppx_jane    v0.10.0              [required by core]
  ∗  install core_kernel v0.10.0              [required by core]
  ∗  install core        v0.10.0
===== ∗  5 =====
Do you want to continue ? [Y/n] y

=-=- Gathering sources =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[core] Archive in cache
[core_kernel] Archive in cache
[ppx_expect] Archive in cache
[ppx_jane] Archive in cache
[re] Archive in cache

=-=- Processing actions -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[ERROR] The compilation of re failed at "ocaml setup.ml -build".
Processing  1/5: [re: ocamlfind remove]
#=== ERROR while installing re.1.7.1 ==========================================#
# opam-version 1.2.2
# os           linux
# command      ocaml setup.ml -build
# path         /home/k0pernicus/.opam/system/build/re.1.7.1
# compiler     system (4.05.0)
# exit-code    1
# env-file     /home/k0pernicus/.opam/system/build/re.1.7.1/re-439-d8a4ae.env
# stdout-file  /home/k0pernicus/.opam/system/build/re.1.7.1/re-439-d8a4ae.out
# stderr-file  /home/k0pernicus/.opam/system/build/re.1.7.1/re-439-d8a4ae.err
### stdout ###
# Warning: Won't be able to compile a native plugin
# Failure: Cannot find "ocamlbuild.cmo" in ocamlbuild -where directory.
### stderr ###
# E: Failure("Command ''/home/k0pernicus/.opam/system/bin/ocamlbuild' lib/re.cma lib/re.cmxa lib/re.a lib/re.cmxs lib/re_emacs.cma lib/re_emacs.cmxa lib/re_emacs.a lib/re_emacs.cmxs lib/re_str.cma lib/re_str.cmxa lib/re_str.a lib/re_str.cmxs lib/re_posix.cma lib/re_posix.cmxa lib/re_posix.a lib/re_posix.cmxs lib/re_glob.cma lib/re_glob.cmxa lib/re_glob.a lib/re_glob.cmxs lib/re_perl.cma lib/re_perl.cmxa lib/re_perl.a lib/re_perl.cmxs lib/re_pcre.cma lib/re_pcre.cmxa lib/re_pcre.a lib/re_pcre.cmxs -cflags '-w +a-40-42-44-3-4-48 -warn-error +1..49' -tag debug' terminated with error code 2")



=-=- Error report -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The following actions were aborted
  ∗  install core        v0.10.0
  ∗  install core_kernel v0.10.0
  ∗  install ppx_expect  v0.10.0
  ∗  install ppx_jane    v0.10.0
The following actions failed
  ∗  install re 1.7.1
No changes have been performed

I tried to search for the cause of Failure: Cannot find "ocamlbuild.cmo" in ocamlbuild -where directory., but didn't find anything relevant on the internet...

Negation in Regular Expressions

Can we have support for negation in Ocaml_re regular expressions?

To be clear, I am looking for some way to negate a sub-expression -- not just the negated character class [^..]. I tried ! and ?! and they did not seem to work.

Roshan

Substring matching + lookaround weird interaction

Substring matches are able to "peek" outside of the substring when lookaround assertions sit at the start/beginning of the re. Here's an example of what I mean:

utop[19]> let re = Re.(compile @@ seq [bol ; str "xxx"]);;
val re : Re.re = <abstr>
utop[20]> Re.execp ~pos:1 ~len:3 re "yxxx";;
- : bool = false
utop[21]> Re.execp ~pos:1 ~len:3 re "\nxxx";;
- : bool = true

This means that there's no way for the user to match a substring independently of its surroundings. Not really broken, but counterintuitive IMHO.

cc @Drup
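The behaviour reported above is consistent with a beginning-of-line test that looks at the character just before the start position in the full string. The sketch below models that rule with a hypothetical helper `at_bol`; it is an assumption about the observed behaviour, not the library's implementation. It also shows that copying the substring out with `String.sub` removes the surrounding context, which is presumably the simplest way to match a substring in isolation.

```ocaml
(* Assumed rule: a position is at beginning-of-line if it is position 0,
   or if the preceding character of the *full* string is a newline. *)
let at_bol s pos = pos = 0 || s.[pos - 1] = '\n'

(* Position 1 of "\nxxx" follows a newline; position 1 of "yxxx" does not. *)
let after_newline = at_bol "\nxxx" 1
let after_letter = at_bol "yxxx" 1

(* A fresh copy of the substring has no context before position 0. *)
let isolated = at_bol (String.sub "yxxx" 1 3) 0
```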

1.3.x and 1.4.x aren't compatible with OCaml < 4.01 due to tests

Starting in 1.3.0, the opam file declared build-test which is incompatible with OCaml < 4.01 due to use of @@ and |> in File "lib_test/test_re.ml", line 28, characters 34-36 and File "lib_test/test_pcre.ml", line 12, characters 41-43, respectively.

Could these versions be marked as 4.01+ only or patched? If they aren't patched, should they be marked as 4.01+ only and should a new version (1.4.1) be released with which it is possible to run the tests on OCaml < 4.01?

I don't have a great love for 4.00.1 but it's annoying to have my tests fail because re's tests are using some simple operators.

Re.execp apparently allocating for no reason

Hello,

Re.execp is implemented by calling exec_internal, which passes true to the argument group of match_str. In fact, that argument is always true. Why not pass false to avoid allocating a 10 element array all the time?

`compl` deemed dangerous

The interface says

val compl : t list -> t
(** Complement of union *)

but compl cannot be applied to an arbitrary list of regexps... and, more problematically, it fails at runtime! I think that this ought to be fixed, maybe by adding some phantom type to t or by changing compl's arguments.

Unable to use zero-width assertions

let zero = Re_perl.re "0(?=a)";;  
Exception: Re_perl.Parse_error.

let zero = Re_posix.re "0(?=a)";;
Exception: Re_posix.Parse_error.

I need to use this in my project. What should be done to solve the problem?

use Bytes

many deprecation warnings right now. Depending on base-bytes as provided by ocamlfind should be ok, right?

[Feature Request] Unicode support

At a glance, this whole library seems like a very well-thought-out piece of software (limited scope, defined solution). Unfortunately, it does not support Unicode right now, and Unicode should be the standard in this millennium. So here is my proposal: instead of using chars and strings exclusively, abstract the library over the concrete code-point and input representations. Then someone (me) could simply extend the library by providing suitable Unicode support. I understand that this kind of abstraction might cause some performance regressions, but it would open up a whole batch of new use cases.

Re_str.global_replace is buggy

Repro:

open Printf ;;

let test regexp repl text =
  let text_str =
    Str.global_replace (Str.regexp regexp) repl text
  in
  let text_re =
    Re_str.global_replace (Re_str.regexp regexp) repl text
  in
  printf "str:%s\n" text_str ;
  printf "re_:%s\n" text_re ;
;;

let _ =
  test "\\(X+\\)" "A\\1YY" "XXXXXXZZZZ" ;
;;

Output:

str:AXXXXXXYYZZZZ
re_:AXXXXXXXXZZZZ

It's quite easy to crash it too, by extending the replacement.

Incorrect FSF address in some files

Please correct the FSF address in the following files:
ocaml-re/lib/automata.ml
ocaml-re/lib/cset.ml
ocaml-re/lib/re_glob.ml
ocaml-re/lib/re_perl.ml
ocaml-re/lib/re.ml

And in any other files where it is also incorrect.

latest release (1.7.1) test code is not compatible with -safe-string

I'm not sure whether this is already fixed in master, if so, please draft a new release :)

see the log at https://api.travis-ci.org/v3/job/344786629/log.txt

+ /home/travis/.opam/4.06.0/bin/ocamlfind ocamlc -c -w +a-40-42-44-3-4-48 -warn-error +1..49 -g -annot -bin-annot -I lib -I lib_test -package bytes -package oUnit -package str -I lib_test -I lib -o lib_test/re_match.cmo lib_test/re_match.ml
File "lib_test/re_match.ml", line 44, characters 25-28:
Error: This expression has type string but an expression was expected of type
         bytes
Command exited with code 2.

for current 4.06.0 compatible releases, I removed the build-test rules in ocaml/opam-repository#11477

Port tests from fort to ounit

The fort framework doesn't seem to be particularly well maintained if it's still not on OPAM. It's best to just transition to oUnit.

Should we just port all the tests or create a little compatibility layer with the current tests we have?

Please clarify LICENSE

_oasis has: "License: LGPL-2.0 with OCaml linking exception"
But the individual source files and LICENSE have just the LGPL license, without the exception.
If it is really meant to have the exception could you update the LICENSE file, and the individual files? (assuming the authors will consent to that)

opam installation failing on docs

i get this output from opam

=== ERROR while installing re.1.5.0 ==========================================#
# opam-version 1.2.2
# os           darwin
# command      ocaml setup.ml -doc
# path         /Users/nbecker/.opam/4.02.3/build/re.1.5.0
# compiler     4.02.3
# exit-code    1
# env-file     /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-10285-6d07ae.env
# stdout-file  /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-10285-6d07ae.out
# stderr-file  /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-10285-6d07ae.err
### stdout ###
# Implements the semantics of shells patterns. The returned regular
# [...]
#
#     If [expand_braces] is true, braced sets will expand into multiple globs,
#     e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
#     expansion is purely textual and can be nested. Defaults to false.
# line 23, character 18:
#     e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
#                   ^
#2 error(s) encountered
# Command exited with code 1.
### stderr ###
# E: Failure("Command ''/Users/nbecker/.opam/4.02.3/bin/ocamlbuild' ./lib/re-api.docdir/index.html -tag debug' terminated with error code 10")

the contents of the .out file are as follows

/Users/nbecker/.opam/4.02.3/bin/ocamlfind ocamldoc -dump lib/re.odoc -I lib -I lib -I lib -package bytes -I lib lib/re.mli
+ /Users/nbecker/.opam/4.02.3/bin/ocamlfind ocamldoc -dump lib/re.odoc -I lib -I lib -I lib -package bytes -I lib lib/re.mli
Warning: Module type Set.S not found
Warning: Module type Set.S not found
Warning: Module type Set.S not found
Warning: Element exec_p not found
/Users/nbecker/.opam/4.02.3/bin/ocamlfind ocamldoc -dump lib/re_emacs.odoc -I lib -I lib -I lib -package bytes -I lib lib/re_emacs.mli
/Users/nbecker/.opam/4.02.3/bin/ocamlfind ocamldoc -dump lib/re_glob.odoc -I lib -I lib -I lib -package bytes -I lib lib/re_glob.mli
+ /Users/nbecker/.opam/4.02.3/bin/ocamlfind ocamldoc -dump lib/re_glob.odoc -I lib -I lib -I lib -package bytes -I lib lib/re_glob.mli
/Users/nbecker/.opam/4.02.3/build/re.1.5.0/_build/lib/re_glob.mli : Syntax error in text:
Implements the semantics of shells patterns. The returned regular
    expression is unanchored by default.

    Character '*' matches any sequence of characters and character
    '?' matches a single character.
    A sequence '[...]' matches any one of the enclosed characters.
    A sequence '[^...]' or '[!...]' matches any character *but* the enclosed characters.
    A backslash escapes the following character.  The last character of the string cannot
    be a backslash.

    [anchored] controls whether the regular expression will only match entire
    strings. Defaults to false.

    [pathname]: If this flag is set, match a slash in string only with a slash in pattern
    and not by an asterisk ('*') or a question mark ('?') metacharacter, nor by a bracket
    expression ('[]') containing a slash. Defaults to true.

    [period]: If this flag is set, a leading period in string has to be matched exactly by
    a period in pattern. A period is considered to be leading if it is the first
    character in string, or if both [pathname] is set and the period immediately follows a
    slash. Defaults to true.

    If [expand_braces] is true, braced sets will expand into multiple globs,
    e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
    expansion is purely textual and can be nested. Defaults to false.
line 23, character 18:
    e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
                  ^
/Users/nbecker/.opam/4.02.3/build/re.1.5.0/_build/lib/re_glob.mli : Syntax error in text:
Implements the semantics of shells patterns. The returned regular
    expression is unanchored by default.

    Character '*' matches any sequence of characters and character
    '?' matches a single character.
    A sequence '[...]' matches any one of the enclosed characters.
    A sequence '[^...]' or '[!...]' matches any character *but* the enclosed characters.
    A backslash escapes the following character.  The last character of the string cannot
    be a backslash.

    [anchored] controls whether the regular expression will only match entire
    strings. Defaults to false.

    [pathname]: If this flag is set, match a slash in string only with a slash in pattern
    and not by an asterisk ('*') or a question mark ('?') metacharacter, nor by a bracket
    expression ('[]') containing a slash. Defaults to true.

    [period]: If this flag is set, a leading period in string has to be matched exactly by
    a period in pattern. A period is considered to be leading if it is the first
    character in string, or if both [pathname] is set and the period immediately follows a
    slash. Defaults to true.

    If [expand_braces] is true, braced sets will expand into multiple globs,
    e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
    expansion is purely textual and can be nested. Defaults to false.
line 23, character 18:
    e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
                  ^
2 error(s) encountered
Command exited with code 1.

Streaming Re

This is a (set of) notes after a discussion with @vouillon on how to make re able to stream.

  • We should move pos and last out of the info record and pass them around explicitly, in particular in loop. Important: check spilling in the loop function.
  • Partial would give an abstract type partial containing
    • an Re.state
    • a buffer of some sort
    • the current position in the buffer
  • We would expose two functions:
    • Some function adding some new content to the buffer.
    • Some function taking partial and starting the matching again. This would be implemented using the loop function to match more things and then the Re_automaton.status function.

It should also be possible to say "The streaming is finished, you can match eol/eos/stop".

There are delicate questions of content copying when initializing and refilling the buffer. In particular, copying the matched string to initialize the buffer is clearly not acceptable.
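To make the shape of such an interface concrete, here is a minimal sketch of a streaming matcher. It is purely illustrative: the type `partial` and the functions `create`, `feed` and `matched` are hypothetical names, the automaton is a hand-written one for an unanchored search of "ab" rather than an Re.state, and the sketch sidesteps the buffer question entirely by carrying only the automaton state between chunks.

```ocaml
(* Hypothetical streaming state: only the automaton state is kept. *)
type partial = { mutable state : int }

let create () = { state = 0 }

(* Hand-written DFA searching for "ab":
   0 = nothing seen, 1 = last character was 'a', 2 = match found. *)
let step state c =
  match state, c with
  | 2, _ -> 2
  | _, 'a' -> 1
  | 1, 'b' -> 2
  | _, _ -> 0

(* Consume a new chunk, resuming from where the previous chunk stopped. *)
let feed p chunk = String.iter (fun c -> p.state <- step p.state c) chunk

let matched p = p.state = 2
```

A match that straddles two chunks is still found, because the state survives between calls to `feed`. Reporting match positions or groups would require keeping more than the state, which is where the copying questions raised above come in.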

Problems building Re with MSVC toolchain under Windows

Dear all,

when building Re with OCaml 4.02.2 under Windows (MSVC toolchain used under Cygwin), the first build attempt fails because a file "setup.obj" is created and not removed afterwards. I guess, this is due to the *.obj extension produced by the MSVC compiler.

If one removes "setup.obj" after the build failed and starts again, the compile succeeds.
Installing with "make install" succeeds in installing the library in findlib, but fails afterwards with the following message:

$ make install
./setup.exe -install
Installed C:\ocamlms\lib\site-lib\re\re.mli
Installed C:\ocamlms\lib\site-lib\re\re.cma
Installed C:\ocamlms\lib\site-lib\re\re.cmxa
Installed C:\ocamlms\lib\site-lib\re\re.lib
Installed C:\ocamlms\lib\site-lib\re\re.cmxs
Installed C:\ocamlms\lib\site-lib\re\re.cmi
Installed C:\ocamlms\lib\site-lib\re\re.cmti
Installed C:\ocamlms\lib\site-lib\re\re.cmt
Installed C:\ocamlms\lib\site-lib\re\re.annot
Installed C:\ocamlms\lib\site-lib\re\re_automata.cmx
Installed C:\ocamlms\lib\site-lib\re\re_cset.cmx
Installed C:\ocamlms\lib\site-lib\re\re.cmx
Installed C:\ocamlms\lib\site-lib\re\re_str.mli
Installed C:\ocamlms\lib\site-lib\re\re_str.cma
Installed C:\ocamlms\lib\site-lib\re\re_str.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_str.lib
Installed C:\ocamlms\lib\site-lib\re\re_str.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_str.cmi
Installed C:\ocamlms\lib\site-lib\re\re_str.cmti
Installed C:\ocamlms\lib\site-lib\re\re_str.cmt
Installed C:\ocamlms\lib\site-lib\re\re_str.annot
Installed C:\ocamlms\lib\site-lib\re\re_str.cmx
Installed C:\ocamlms\lib\site-lib\re\re_posix.mli
Installed C:\ocamlms\lib\site-lib\re\re_posix.cma
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_posix.lib
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmi
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmti
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmt
Installed C:\ocamlms\lib\site-lib\re\re_posix.annot
Installed C:\ocamlms\lib\site-lib\re\re_posix.cmx
Installed C:\ocamlms\lib\site-lib\re\re_perl.mli
Installed C:\ocamlms\lib\site-lib\re\re_perl.cma
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_perl.lib
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmi
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmti
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmt
Installed C:\ocamlms\lib\site-lib\re\re_perl.annot
Installed C:\ocamlms\lib\site-lib\re\re_perl.cmx
Installed C:\ocamlms\lib\site-lib\re\re_pcre.mli
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cma
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_pcre.lib
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmi
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmti
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmt
Installed C:\ocamlms\lib\site-lib\re\re_pcre.annot
Installed C:\ocamlms\lib\site-lib\re\re_pcre.cmx
Installed C:\ocamlms\lib\site-lib\re\re_glob.mli
Installed C:\ocamlms\lib\site-lib\re\re_glob.cma
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_glob.lib
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmi
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmti
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmt
Installed C:\ocamlms\lib\site-lib\re\re_glob.annot
Installed C:\ocamlms\lib\site-lib\re\re_glob.cmx
Installed C:\ocamlms\lib\site-lib\re\re_emacs.mli
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cma
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmxa
Installed C:\ocamlms\lib\site-lib\re\re_emacs.lib
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmxs
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmi
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmti
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmt
Installed C:\ocamlms\lib\site-lib\re\re_emacs.annot
Installed C:\ocamlms\lib\site-lib\re\re_emacs.cmx
Installed C:\ocamlms\lib\site-lib\re\META
W: Nothing to install for findlib library 'fort_unit'
Access denied
E: Failure("Command 'md \"C:\\Program Files (x86)\\re\"' terminated with error code 1")
Makefile:19: recipe for target "install" failed
make: *** [install] Error 1

Finally, the INSTALL file mentions the possibility to build with "make opt", which seems to not exist as a build target.

Best regards
Martin

Use unsafe_get in tight loops?

One might get some performance improvements by using unsafe_get to access strings and arrays in Re.loop2 and Re.loop_no_mark.

A quick way to test that might be to compile the library with the -unsafe flag.
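For reference, this is what the change looks like on a small standalone loop, using only the standard library. The functions `count_char_checked` and `count_char_unchecked` are hypothetical examples, not code from Re. The unchecked version is only safe because the loop bounds already guarantee that every index is valid.

```ocaml
(* Bounds-checked access: each String.get verifies the index. *)
let count_char_checked c s =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    if String.get s i = c then incr n
  done;
  !n

(* Unchecked access: valid only because 0 <= i < String.length s here. *)
let count_char_unchecked c s =
  let n = ref 0 in
  for i = 0 to String.length s - 1 do
    if String.unsafe_get s i = c then incr n
  done;
  !n
```

Whether this yields a measurable gain in the matching loops would have to be benchmarked.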

Glob */foo matches /foo

The glob code is explicitly constructed to match shell behavior, e.g. *.foo does not match ".foo". However, */foo matches "/foo", which is incorrect in a shell context.

Question about exec_partial

Hi!

The last few days I've tried to use Re.exec_partial both before the last patch (related to #11 ) and after it.
And I don't think I quite understand how it is supposed to work. Naïvely I thought that you would get the "`Partial" answer when your string is recognized as a prefix of an "accepted" pattern, but it seems that is not the case.

Here is what I had before the last patch :

# let r = Re.compile (Re.str "let");;
val r : Re.re = <abstr>
# Re.exec_partial r "l";;
- : [ `Full | `Mismatch | `Partial ] = `Mismatch
# Re.exec_partial r "let";;
- : [ `Full | `Mismatch | `Partial ] = `Full
# Re.exec_partial r "ileti";;
- : [ `Full | `Mismatch | `Partial ] = `Full 

And here is what I get now :

# let r = Re.compile (Re.str "let");;
val r : Re.re = <abstr>
# Re.exec_partial r "l";;
- : [ `Full | `Mismatch | `Partial ] = `Partial
 # Re.exec_partial r "let";;
- : [ `Full | `Mismatch | `Partial ] = `Partial
# Re.exec_partial r "ileti";;
- : [ `Full | `Mismatch | `Partial ] = `Full
# Re.exec_partial r "abc";;
- : [ `Full | `Mismatch | `Partial ] = `Partial    

So I don't know if it is buggy (I'm particularly worried by the last result) or if I completely misunderstood how it is supposed to work.

I would appreciate some pointers.

cannot install via opam on os x

this is on 4.02.3. error message:

# opam-version 1.2.2
# os           darwin
# command      ocaml setup.ml -doc
# path         /Users/nbecker/.opam/4.02.3/build/re.1.5.0
# compiler     4.02.3
# exit-code    1
# env-file     /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-34919-6d07ae.env
# stdout-file  /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-34919-6d07ae.out
# stderr-file  /Users/nbecker/.opam/4.02.3/build/re.1.5.0/re-34919-6d07ae.err
### stdout ###
# Implements the semantics of shells patterns. The returned regular
# [...]
#
#     If [expand_braces] is true, braced sets will expand into multiple globs,
#     e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
#     expansion is purely textual and can be nested. Defaults to false.
# line 23, character 18:
#     e.g. a{x,y}b{1,2} matches axb1, axb2, ayb1, ayb2.  As specified for bash, brace
#                   ^
#2 error(s) encountered
# Command exited with code 1.
### stderr ###
# E: Failure("Command ''/Users/nbecker/.opam/4.02.3/bin/ocamlbuild' ./lib/re-api.docdir/index.html -tag debug' terminated with error code 10")

Compilation broken on Windows

This is what I get when doing opam install re with the official installer:

#=== ERROR while installing re.1.7.1 ==========================================#
# opam-version 1.2.2
# os           cygwin
# command      ocaml setup.ml -build
# path         /home/mrm/.opam/system/build/re.1.7.1
# compiler     system (4.02.3)
# exit-code    1
# env-file     /home/mrm/.opam/system/build/re.1.7.1/re-2312-c61e7b.env
# stdout-file  /home/mrm/.opam/system/build/re.1.7.1/re-2312-c61e7b.out
# stderr-file  /home/mrm/.opam/system/build/re.1.7.1/re-2312-c61e7b.err
### stdout ###
# Usage C:\OCaml\bin\ocamlbuild.EXE [options] <target>
# [...]
#   -menhir <command>           Set the menhir tool (use it after -use-menhir)
#   -ocamllex <command>         Set the ocamllex tool
#   -ocamlmklib <command>       Set the ocamlmklib tool
#   -ocamlmktop <command>       Set the ocamlmktop tool
#   -ocamlrun <command>         Set the ocamlrun tool
#   --                          Stop argument processing, remaining arguments are given to the user program
#   -help                       Display this list of options
#   --help                      Display this list of options
#
### stderr ###
# E: Failure("Command 'C:\\OCaml\\bin\\ocamlbuild.EXE -classic-display -no-log -no-links -byte-plugin lib/re.cma lib/re.cmxa lib/re.a lib/re.cmxs lib/re_emacs.cma lib/re_emacs.cmxa lib/re_emacs.a lib/re_emacs.cmxs lib/re_str.cma lib/re_str.cmxa lib/re_str.a lib/re_str.cmxs lib/re_posix.cma lib/re_posix.cmxa lib/re_posix.a lib/re_posix.cmxs lib/re_glob.cma lib/re_glob.cmxa lib/re_glob.a lib/re_glob.cmxs lib/re_perl.cma lib/re_perl.cmxa lib/re_perl.a lib/re_perl.cmxs lib/re_pcre.cma lib/re_pcre.cmxa lib/re_pcre.a lib/re_pcre.cmxs -cflags '-w +a-40-42-44-3-4-48 -warn-error +1..49' -tag debug' terminated with error code 1")

I had to pin re.1.5.0, which has no build issues.

Split on first occurrence of a pattern

let re = Re.compile @@ Re.first (Re.char '\n')
let result = Re.split re "a\nb\nc"

result is now ["a"; "b"; "c"] where I would have hoped for/expected ["a"; "b\nc"].

Is there a way to achieve this with Re?
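For a single-character separator, the expected result can be obtained with the standard library alone. This is a minimal sketch with a hypothetical helper `split_first`; it does not use Re and does not answer whether Re itself offers an equivalent.

```ocaml
(* Split s at the first occurrence of sep only.
   Returns [s] unchanged when sep does not occur. *)
let split_first sep s =
  match String.index_opt s sep with
  | None -> [ s ]
  | Some i ->
    [ String.sub s 0 i; String.sub s (i + 1) (String.length s - i - 1) ]
```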

Add match module

I believe that Re.substrings is misnamed. I propose to add an Re.Match module with type t = substrings and shove all the substrings related functions in there. Obviously aliases will be maintained to preserve backwards compatibility. @vouillon thoughts?

cc @c-cube

Get rid of dumb top level dir name in tarball please?

Hi,

In case you use ocaml-$VER in git tags, could you stop using the whole tag as $VER again in tarball please?

The downloaded tarball uses ocaml-re-ocaml-re-1.4.1 as the top-level source directory, which looks pretty dumb; the common convention would be ocaml-re-$VER.

Clearer documentation for Re.eol

I would expect Re.split Re.(compile eol) "a\nb" to return ["a"; "b"] but it returns ["a"; "\nb"] instead. If this is intended, it would be nice to have it stated more clearly in the documentation.

Installation instructions incomplete

README.md mentions make install to install the library but this target no longer exists.

What is the procedure to install the library? (without opam)

Should bos, bol, etc. match characters or positions?

I wanted to write a program that inserted a string at the beginning of every line and I was surprised by the behaviour of replace_string together with bol.

# Re.(replace_string (compile bol)) ~by:"z" "abc";;
- : string = "zbc"

Is consuming the first character intentional? Compare the behaviour of eol, which doesn't consume the last character:

# Re.(replace_string (compile eol)) ~by:"z" "abc";;
- : string = "abcz"

Similarly, other regex engines treat ^ as a position rather than a character, e.g. Python

>>> re.sub('^', 'z', 'abc')
'zabc'

and sed

$ sed 's!^!z!' <<< abc
zabc
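Until the intended semantics are settled, the original goal (inserting a string at the beginning of every line) can be sketched with the standard library only. The helper `prefix_lines` is a hypothetical name and does not involve Re.

```ocaml
(* Prepend prefix to every '\n'-separated line of s. *)
let prefix_lines prefix s =
  String.split_on_char '\n' s
  |> List.map (fun line -> prefix ^ line)
  |> String.concat "\n"
```

Note that a trailing newline produces a final empty line, which also receives the prefix.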

host doc on github

it would be nice to host the generated API documentation somewhere (I'm not aware of it being hosted anywhere right now, please correct me if I'm wrong). github pages should be a simple and convenient way of solving this.

Re.split odd behaviour with separator at beginning/end

Consider the following toplevel session:

# let rex = Re.(compile (alt [char '\n'; str "\r\n"]));;
val rex : Re.re = <abstr>
# Re.split rex "hello\n\nworld";;
- : string list = ["hello"; ""; "world"]
# Re.split rex "\nhello\n\nworld\n";;
- : string list = ["hello"; ""; "world"]
# Re.split rex "\n\nhello\n\nworld\n\n";;
- : string list = [""; "hello"; ""; "world"; ""]

I understand that Re.split's proper behaviour in this case -- when a separator occurs at the very beginning or at the very end of a string -- is open to discussion. Nevertheless, the currently implemented behaviour as shown above strikes me as odd: if the number of separators at the beginning is 0, 1, and 2, the number of empty elements will be 0, 0, and 1, respectively.

I think it makes more sense for a single separator at the start to produce a list whose first element is empty. Likewise, a single separator at the end should produce an empty last element. In other words, if the number of separators at the beginning is 0, 1, and 2, the number of empty elements should also be 0, 1, and 2, respectively.

I've encountered this issue in practice while porting from OCaml-pcre to OCaml-re, and coding around it is a major PITA.
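For comparison, the standard library's `String.split_on_char` follows the convention proposed here: every separator, including a leading or trailing one, delimits a (possibly empty) element. The sketch below only handles the single character '\n', not the "\r\n" alternative from the session above, and `split_keep_empty` is a hypothetical name.

```ocaml
(* Every separator delimits an element, so n leading separators
   yield n leading empty strings. *)
let split_keep_empty s = String.split_on_char '\n' s

let one_each_side = split_keep_empty "\nhello\n\nworld\n"
let two_leading = split_keep_empty "\n\nhello"
```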

Needs functions to do replace/global replace

It is currently very frustrating to try and do string operations in OCaml. Neither stdlib, nor extlib, nor ocaml-re have functions to replace a substring with another string globally.

ocaml-re really should have one.

re.ml; xdigit

let xdigit = alt [digit; rg 'a' 'f'; rg 'A' 'Z']

I assume this is intended as a hex digit?

let xdigit = alt [digit; rg 'a' 'f'; rg 'A' 'F']
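The difference between the two ranges is easy to check with a standalone predicate. `is_xdigit` below is a hypothetical stdlib-only helper mirroring the corrected definition; with the range 'A' to 'Z', letters such as 'G' would wrongly be accepted.

```ocaml
(* Hexadecimal digit: 0-9, a-f, or A-F. *)
let is_xdigit = function
  | '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true
  | _ -> false
```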

Glob pattern '*.ml' matches 'foo.mli' - is this correct?

The pattern '*.ml' compiled with Re_glob.globx matches 'foo.mli' according to Re.execp, much to my surprise. Is this correct (presumably because foo.mli includes a matching substring foo.ml)? If so, how can I make sure that a pattern matches the entire string such that *.ml would match foo.ml but not foo.mli?

*.ml matches:       foo.mli
*.ml doesn't match: foo.m
*.ml matches:       foo.mli.ml
*.ml matches:       x.ml
*.ml doesn't match: .ml 

Here is code testcase.ml to explore this:

let (@@) f x    = f x
let printf      = Printf.printf 

let matches glob str =
   if Re.execp (Re.compile @@ Re_glob.globx glob) str
   then printf "%s matches %s\n" glob str
   else printf "%s doesn't match %s\n" glob str

let () = matches "*.ml" "foo.mli"

Compiling and executing the code results in:

*.ml matches foo.mli
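To illustrate the whole-string semantics being asked for, here is a minimal standalone matcher in which the pattern must cover the entire input. It is a sketch, not Re_glob: `glob_whole` is a hypothetical name, only `*` and `?` are supported, shell rules such as the leading-period and pathname handling are ignored, and the naive backtracking can be slow on patterns with many `*`.

```ocaml
(* Match pattern against the whole of s.
   '*' matches any sequence of characters, '?' matches one character. *)
let glob_whole pattern s =
  let pl = String.length pattern and sl = String.length s in
  let rec go p i =
    if p = pl then i = sl (* pattern exhausted: input must be too *)
    else
      match pattern.[p] with
      | '*' -> go (p + 1) i || (i < sl && go p (i + 1))
      | '?' -> i < sl && go (p + 1) (i + 1)
      | c -> i < sl && s.[i] = c && go (p + 1) (i + 1)
  in
  go 0 0
```

With this definition, *.ml accepts foo.ml and foo.mli.ml but rejects foo.mli, because the trailing "i" is left unmatched.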

difference in behaviour between re_str and re

The following snippet will evaluate differently depending on what the Str module is (Str from findlib or Re_str):

  let partial_match_test () =
    let re = Str.regexp "[0-9]+K" in
    Str.string_partial_match re "12" 0

Str gives true while Re_str gives false

I've actually attempted to assemble a test suite for verifying compatibility with Str but I failed because I could not trick oasis into using str from OCaml. I'm pretty sure this line in the _oasis file creates the problems:

https://github.com/ocaml/ocaml-re/blob/master/_oasis#L28

Any tips for a work around?
