
regex's Issues

`split` should include capture groups in results

As a web developer approaching Elm from JavaScript, I expect regexes and their associated functions to behave the same as in JS.

Issue: In Elm 0.18 Core/Regex, an expression containing capture groups would splice captures into the returned array. In Elm 0.19 Regex, the capture groups are omitted.

Justification: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split#Description

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.

Examples:

---- Elm 0.19.0 ----------------------------------------------------------------
Read <https://elm-lang.org/0.19.0/repl> to learn more: exit, help, imports, etc.
--------------------------------------------------------------------------------
> import Regex
> Regex.split (Maybe.withDefault Regex.never <| Regex.fromString ",") "a,b,c,d"
["a","b","c","d"] : List String
> Regex.split (Maybe.withDefault Regex.never <| Regex.fromString "(,)") "a,b,c,d"
["a","b","c","d"] : List String
---- elm-repl 0.18.0 -----------------------------------------------------------
 :help for help, :exit to exit, more at <https://github.com/elm-lang/elm-repl>
--------------------------------------------------------------------------------
> import Regex exposing (regex, HowMany(All))
> Regex.split All (regex ",") "a,b,c,d"
["a","b","c","d"] : List String
> Regex.split All (regex "(,)") "a,b,c,d"
["a",",","b",",","c",",","d"] : List String
// This was run in Chrome 69
console.log("a,b,c,d,e".split(/,/));
console.log("a,b,c,d,e".split(/(,)/));
VM59:1 (5) ["a", "b", "c", "d", "e"]
VM59:2 (9) ["a", ",", "b", ",", "c", ",", "d", ",", "e"]
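
For reference, the MDN behavior quoted above can be reproduced with the native `split` alone. This is a small sketch of the JavaScript side only, not of Elm's implementation. It also shows that a non-capturing group `(?:...)` groups without splicing anything into the result, and that a group which does not take part in a match contributes `undefined`.

```javascript
// Capturing group: each matched separator is spliced into the result.
const withCaptures = "a,b,c,d".split(/(,)/);

// Non-capturing group: same grouping, but the separators are dropped.
const withoutCaptures = "a,b,c,d".split(/(?:,)/);

// A group that does not participate in the match contributes undefined.
const withUndefined = "a1b".split(/(\d)|(x)/);

console.log(withCaptures);    // [ 'a', ',', 'b', ',', 'c', ',', 'd' ]
console.log(withoutCaptures); // [ 'a', 'b', 'c', 'd' ]
console.log(withUndefined);   // [ 'a', '1', undefined, 'b' ]
```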

Regex.find with too loose regex

I was playing with Regex.find and I wanted to find all matches for [a-z0-9] so I wrote the regex ([a-z0-9]*).

I wrote it in Elm and ran it in the REPL:

> (Regex.find Regex.All (regex "([a-z0-9]*)") "simon er en banan en")
[{ match = "simon", submatches = [Just "simon"], index = 0, number = 1 }]
    : List Regex.Match

So this only returns one match. I got some help, and after changing the regex to [a-z0-9]+ instead of [a-z0-9]* it now works like I wanted.

I started to wonder whether this is a bug or not, so I checked what would happen if I tested it in the browser.

"simon er en banan en".match(/([a-z0-9]*)/g) 
=> ["simon", "", "er", "", "en", "", "banan", "", "en", ""]

This is not exactly what I wanted, but I think the Elm function should return the same.

I checked the source and found that Native/Regex.js#L34 is the line which stops it from finding more matches. This line was added in https://github.com/elm-lang/core/pull/156 to stop an infinite loop.
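
One way to avoid the infinite loop without giving up after the first empty match is to step `lastIndex` forward by hand whenever a match is zero-length. The sketch below is a hypothetical helper (`findAll` is a name invented here, not the code in Native/Regex.js), and it assumes the regex carries the `g` flag.

```javascript
// Hypothetical helper, not the code from Native/Regex.js.
// Collects every match of a global regex, including zero-length ones.
// Assumes `re` has the g flag; without it exec always restarts at 0.
function findAll(re, str) {
  const matches = [];
  re.lastIndex = 0;
  let m;
  while ((m = re.exec(str)) !== null) {
    matches.push(m[0]);
    // A zero-length match leaves lastIndex unchanged, so step forward
    // by hand; otherwise exec would return the same match forever.
    if (m[0] === "") {
      re.lastIndex += 1;
    }
  }
  return matches;
}

console.log(findAll(/([a-z0-9]*)/g, "simon er en banan en"));
// [ 'simon', '', 'er', '', 'en', '', 'banan', '', 'en', '' ]
console.log(findAll(/([a-z0-9]+)/g, "simon er en banan en"));
// [ 'simon', 'er', 'en', 'banan', 'en' ]
```

With this loop the `*` version gives the same list as the browser's `match` call shown above, and the `+` version gives only the words.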

JavaScript heap out of memory error with empty string regex

Splitting on a Regex built from the empty string ("") crashes the page.

> import Regex
> everyCharacter = Maybe.withDefault Regex.never (Regex.fromString "")
{} : Regex.Regex
> Regex.split everyCharacter ""

<--- Last few GCs --->

[14137:0x104002a00]     6566 ms: Mark-sweep 577.3 (584.5) -> 577.3 (581.5) MB, 292.9 / 0.0 ms  (average mu = 0.474, current mu = 0.000) last resort GC in old space requested
[14137:0x104002a00]     6860 ms: Mark-sweep 577.3 (581.5) -> 577.3 (581.5) MB, 294.5 / 0.0 ms  (average mu = 0.307, current mu = 0.000) last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1ca8ebd5c01d]
Security context: 0x1cf1ef21e681 <JSObject>
    1: push [0x1cf1ef2057f1](this=0x1cf1e3004a49 <JSArray[75209227]>,0x1cf12c0029f1 <String[0]: >)
    2: /* anonymous */(aka /* anonymous */) [0x1cf1e3004a69] [/Users/tessakelly/Documents/elmoji-translator/elm-stuff/0.19.0/temp.js:~861] [pc=0x1ca8ebded7f2](this=0x1cf12c0026f1 <undefined>,n=0x1cf1e302d889 <Number inf>,re=0x1cf1e3004891 <JSRegExp <String[4]: ...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0x10003907e node::Abort() [/usr/local/bin/node]
 2: 0x10003924f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x10019064b v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1001905ec v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0x10043fdb4 v8::internal::Heap::UpdateSurvivalStatistics(int) [/usr/local/bin/node]
 6: 0x1004465d2 v8::internal::Heap::SetUp() [/usr/local/bin/node]
 7: 0x10042685d v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 8: 0x10042625d v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [/usr/local/bin/node]
 9: 0x1003e58f2 v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::ConvertElementsWithCapacity(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::FixedArrayBase>, v8::internal::ElementsKind, unsigned int, unsigned int, unsigned int, int) [/usr/local/bin/node]
10: 0x1003e57a5 v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::GrowCapacityAndConvertImpl(v8::internal::Handle<v8::internal::JSObject>, unsigned int) [/usr/local/bin/node]
11: 0x1003e43fc v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedObjectElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)2> >::Add(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, unsigned int) [/usr/local/bin/node]
12: 0x10050ade4 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::ShouldThrow) [/usr/local/bin/node]
13: 0x10061a73d v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode) [/usr/local/bin/node]
14: 0x10061d6db v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
15: 0x1ca8ebd5c01d

I would expect the behavior to match Regex.never:

> import Regex
> Regex.split Regex.never ""
[""] : List String
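
The stack trace shows `push` being called on an array that already holds millions of empty strings, which is what an `exec` loop does when a zero-length match never advances `lastIndex`. Below is a minimal sketch of a guarded split in JavaScript. `safeSplit` is a hypothetical name, and this is an assumption about how such a guard could look, not the package's actual code. It ignores capture groups.

```javascript
// Hypothetical helper: split on a pattern string without letting
// zero-length matches stall the loop. Capture groups are ignored.
function safeSplit(pattern, str) {
  const re = new RegExp(pattern, "g");
  const pieces = [];
  let start = 0;
  let m;
  while ((m = re.exec(str)) !== null) {
    if (m[0] === "") {
      // exec does not advance lastIndex on an empty match; do it by hand.
      re.lastIndex += 1;
      // Ignore empty matches that would not cut off any text.
      if (m.index === start || m.index === str.length) {
        continue;
      }
    }
    pieces.push(str.slice(start, m.index));
    start = m.index + m[0].length;
  }
  pieces.push(str.slice(start));
  return pieces;
}

console.log(safeSplit("", ""));         // [ '' ]
console.log(safeSplit("", "abc"));      // [ 'a', 'b', 'c' ]
console.log(safeSplit(",", "a,b,c,d")); // [ 'a', 'b', 'c', 'd' ]
```

Returning `[""]` for empty input matches the `Regex.never` result above. The native `"".split(new RegExp(""))` returns an empty array instead, so this is a design choice rather than a copy of the JavaScript behavior.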

Regression: capture groups should be able to match empty string

SSCCE:

module Main exposing (main)

import Browser
import Html exposing (Html, text)
import Regex

view : Html Never
view =
    text -- first submatch should be Just "" instead of Nothing
        ( Debug.toString
            ( Regex.find
                ( Maybe.withDefault
                    Regex.never
                    (Regex.fromString "(.*)")
                )
                ""
            )
        )

main : Program () () Never
main =
    Browser.sandbox
        { init = ()
        , view = always view
        , update = always identity
        }
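
On the JavaScript side, a group that matched the empty string and a group that did not take part in the match are distinguishable: the first yields `""`, the second `undefined`. The sketch below shows how a truthiness check would conflate the two. That check is only one plausible cause of the regression, assumed here for illustration and not confirmed by this issue; the helper names and the `{ just: ... }` / `null` encoding of `Maybe` are hypothetical.

```javascript
// Truthiness check: "" is falsy, so an empty capture looks like a missing one.
function submatchesByTruthiness(m) {
  return m.slice(1).map((s) => (s ? { just: s } : null));
}

// Explicit undefined check: only groups that did not participate become null.
function submatchesByUndefined(m) {
  return m.slice(1).map((s) => (s === undefined ? null : { just: s }));
}

const emptyCapture = /(.*)/.exec("");   // group 1 matched ""
const skippedCapture = /(a)?/.exec(""); // group 1 did not participate

console.log(submatchesByTruthiness(emptyCapture)); // [ null ]
console.log(submatchesByUndefined(emptyCapture));  // [ { just: '' } ]
console.log(submatchesByUndefined(skippedCapture)); // [ null ]
```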

Feature request: "escape" function

The old Regex library had a Regex.escape function, which is really handy if you want to, say, find and highlight portions of a text which case-insensitively match a search term. It'd be great if this module could provide such a function! :-)
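
For illustration, such a function can be sketched in JavaScript as a single `replace` that backslash-escapes the regex metacharacters. `escapeRegex` and `highlight` are hypothetical names, and the character list is the commonly used one rather than anything taken from the old Elm library.

```javascript
// Hypothetical helper: escape regex metacharacters so that the input
// is matched literally when passed to new RegExp.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitively wrap every occurrence of a search term in brackets.
function highlight(term, text) {
  const re = new RegExp(escapeRegex(term), "gi");
  return text.replace(re, (found) => "[" + found + "]");
}

console.log(escapeRegex("a.b*c"));                // a\.b\*c
console.log(highlight("(world)", "Hello (World)")); // Hello [(World)]
```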

Publish for 0.18

Could you publish this package, or @evancz's repo, for Elm 0.18?

$ elm --version
0.18.0
$ elm-package install elm-lang/regex
Error: Could not find any packages named elm-lang/regex.
$ elm-package install evancz/regex
Error: Could not find any packages named evancz/regex.

I'm looking for a function String -> Maybe Regex and found this repo from https://github.com/elm-lang/core/issues/722 after the core Regex.regex let me down. I'm very new to Elm so forgive me if this is a misguided request.

Feature request: Some way to get invalid regex error message

My app takes regular expressions from users. There's no affordance for getting errors back when a regular expression is invalid. E.g. "(.*" should communicate an error back from the browser's regex implementation. It would be nice if there were a Regex.fromString version that returned a Result instead of a Maybe.
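
The browser already produces the message: `new RegExp` throws a `SyntaxError` whose `message` describes the problem. A minimal sketch of a Result-like wrapper on the JavaScript side follows. `tryCompile` and the `{ ok, ... }` shape are hypothetical, and the exact wording of the message varies between engines.

```javascript
// Hypothetical helper: compile a pattern and report the engine's error
// message instead of throwing.
function tryCompile(source) {
  try {
    return { ok: true, regex: new RegExp(source) };
  } catch (err) {
    // err is a SyntaxError; its message text is engine-specific.
    return { ok: false, error: err.message };
  }
}

console.log(tryCompile("(.*)").ok); // true
console.log(tryCompile("(.*"));     // { ok: false, error: '...' }
```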

Debug.log for Regexes does not show the regex

(I imagine that this fix would likely be made in elm/core, but I thought it would be more appropriate here. Feel free to move the issue)

Problem

I was just in the middle of debugging why a string didn't match a regex, and I tried Debug.logging the regex, which gave me {}, which I found to be very unhelpful.

Creating a regex inside the Elm REPL gives the same behavior:

> import Regex
> Regex.fromString "hello"
Just {} : Maybe Regex.Regex

Expectation

I expected to see some kind of information in the log message to see what the regex is. The Node.js REPL gives the following feedback, which I believe is more helpful:

> /hello/
/hello/
> new RegExp("hello", "ig")
/hello/gi

Suggested solution

I suggest that elm/core's _Debug_toAnsiString function adds a special-case for regexes to print the regex in a useful way.

I feel like the format of the stringified version aims to be somewhat copy-pastable into Elm code (seeing examples like Dict.fromList), which makes this format a tiny bit tricky. Ideally, the format of the regex should probably be the "JavaScript regex format" (/hello/gi), but that wouldn't be valid Elm code.

A second option would be to print the code needed to create the Regex from Elm code (Regex.fromString "hello") but that would have 2 drawbacks:

  • The code would not be copy-pastable, because there is no function that directly creates a regex without wrapping it in a Maybe.
  • You'd need to add backslashes (/"hello"/ -> Regex.fromString "\"hello\""), which makes the regex harder to read, which might be counterproductive when the intent of the Debug.logging was to make it clearer what the regex is.
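
Both candidate formats can be derived from a JavaScript `RegExp` value. The sketch below uses hypothetical helper names and is not how `_Debug_toAnsiString` is written. The Elm-style variant drops the flags and uses JSON string escaping as a stand-in for Elm's.

```javascript
// "JavaScript regex format": relies on RegExp's own toString.
function showAsJs(re) {
  return String(re);
}

// Elm-style format: quotes and backslashes in the source get escaped,
// which is the readability cost described above. Flags are dropped.
function showAsElm(re) {
  return "Regex.fromString " + JSON.stringify(re.source);
}

console.log(showAsJs(new RegExp("hello", "ig"))); // /hello/gi
console.log(showAsElm(/hello/));                  // Regex.fromString "hello"
console.log(showAsElm(/"hello"/));                // Regex.fromString "\"hello\""
```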
