regexplain's Introduction

RegExplain

Regular expressions are tricky. RegExplain makes it easier to see what you’re doing.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

RegExplain is an RStudio addin slash utility belt for regular expressions. Interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions. And more.

Inspired by RegExr.com and stringr::str_view().

Installation

Installation is easy with remotes:

# install.packages("remotes")
remotes::install_github("gadenbuie/regexplain")

RegExplain in Action

Overview

regexplain selection

Regular Expressions Library

regexplain library

Try the Built-In Examples

regexplain examples

RStudio Addin

The main feature of this package is the RStudio Addin RegExplain Selection. Just select the text or object containing text (such as the variable name of a vector or a data.frame column) and run RegExplain Selection from the RStudio Addins dropdown.

regexplain in the Rstudio Addins dropdown

You can also open the addin with regexplain_gadget(). This allows you to pass text or a regular expression to the gadget, which is useful when you want to work with a regular expression in your code or environment.

regexplain_gadget(text_vector, "\\b(red|blue|green): \\d{3}")
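If you just want to sanity-check a pattern before opening the gadget, or you are working outside RStudio, base R can apply the same regular expression directly. This is a minimal sketch; `text_vector` here is made-up sample data, not something the package provides:

```r
# Hypothetical sample data standing in for your own text_vector
text_vector <- c("red: 123", "blue: 45", "green: 999 and red: 001")
pattern <- "\\b(red|blue|green): \\d{3}"

# regexpr() finds the first match in each element;
# regmatches() returns the matched text and drops elements with no match
first_hits <- regmatches(text_vector, regexpr(pattern, text_vector, perl = TRUE))
print(first_hits)
```

Here "blue: 45" is dropped because `\d{3}` requires three digits.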

The addin will open an interface with 4 panes where you can

  • edit the text you’ve imported
  • build up a regular expression and interactively see it applied to your text
  • test the output of common string matching and replacement functions from base and stringr
  • and refer to a helpful cheatsheet

The panes of regexplain

When you’re done, click Send Regex to Console to send your regular expression to… the console!

> pattern <- "\\b(red|orange|yellow|green|blue|purple|white|brown)(?:\\s(\\w+))?"

Notice that RegExplain handled the extra backslashes needed for storing the RegEx characters \b, \s, and \w. Inside the gadget you can use regular old regular expressions as you found them in the wild (hello, Stack Overflow!).
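To see what the escaped R string actually contains, print it with writeLines(): each \b, \s and \w is stored as a single backslash followed by a letter, and the doubled backslashes exist only in the source code. A short base-R check, using made-up example strings:

```r
pattern <- "\\b(red|orange|yellow|green|blue|purple|white|brown)(?:\\s(\\w+))?"

# writeLines() shows the regex as the engine sees it, with single backslashes
writeLines(pattern)

# perl = TRUE because the pattern uses a non-capturing group, (?:...)
x <- c("a red car", "blue sky", "nothing here")
groups <- regmatches(x, regexec(pattern, x, perl = TRUE))
print(groups)
```

Each list element holds the full match followed by the two capture groups; "nothing here" yields `character(0)`.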

Help and Cheat Sheet

The Help tab is full of resources, guides, and R packages and includes an easy-to-navigate reference of commonly used regular expression syntax.

regexplain help windows

Open RegExplain Cheatsheet from the RStudio Addins dropdown to open the regex reference page in the Viewer pane without blocking your current R session.

Import Your Text

There are two ways to get your text into RegExplain. The first was described above: select an object name or lines of text or code in the RStudio source pane and run RegExplain Selection. The second is to run RegExplain File, which imports text from a file so you can process it with regular expressions.

When importing text, RegExplain automatically reduces the text to the unique entries and limits the number of lines.
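The reduction step might look roughly like the sketch below. The helper name `prepare_text()` and the default of 100 lines are hypothetical; the package's actual line limit is not stated here.

```r
# Hypothetical helper: keep unique entries, then cap the number of lines
prepare_text <- function(x, max_lines = 100L) {
  x <- unique(as.character(x))
  head(x, max_lines)
}

prepare_text(c("a", "b", "a", "c"), max_lines = 2)
```

De-duplicating before truncating means the line budget is spent on distinct examples rather than repeats.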

regexplain addins

Regular Expressions Library

The RegExplain gadget includes a regular expressions library in the RegEx tab. The library features common regular expressions, sourced from qdapRegex and Regex Hub, with several additional patterns.

The full library is stored as a JSON file in inst/extdata/patterns.json; feel free to contribute patterns you find useful or use regularly via pull request.

regexplain library modal

View Static Regex Results

RegExplain provides the function view_regex() that you can use as a stringr::str_view() replacement. In addition to highlighting matched portions of the text, view_regex() colorizes groups and attempts to colorize the regular expression itself as well.

text <- c("breakfast=eggs;lunch=pizza",
          "breakfast=bacon;lunch=spaghetti", 
          "no food here")
pattern <- "((\\w+)=)(\\w+).+(ch=s?p)"

view_regex(text, pattern)

Example view_regex(text, pattern).
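If you need the captured groups as data rather than a colorized view, base R's regexec() plus regmatches() returns the full match followed by each group, in order. A sketch using the same text and pattern:

```r
text <- c("breakfast=eggs;lunch=pizza",
          "breakfast=bacon;lunch=spaghetti",
          "no food here")
pattern <- "((\\w+)=)(\\w+).+(ch=s?p)"

# Element 1 is the whole match; elements 2 to 5 are groups 1 to 4
m <- regmatches(text, regexec(pattern, text, perl = TRUE))
m[[1]]  # "breakfast=eggs;lunch=p" "breakfast=" "breakfast" "eggs" "ch=p"
m[[3]]  # character(0): the third string contains no "=", so nothing matches
```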

t_nested <- "anestedgroupwithingroupexample"
r_nested <- "(a(nested)(group(within(group))(example)))"
view_regex(t_nested, r_nested)

Example of nested groups
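Groups are numbered by the position of their opening parenthesis, so an outer group always gets a lower number than the groups nested inside it. A base-R check of the numbering for this example:

```r
t_nested <- "anestedgroupwithingroupexample"
r_nested <- "(a(nested)(group(within(group))(example)))"

g <- regmatches(t_nested, regexec(r_nested, t_nested))[[1]]
# g[1] is the whole match; g[k + 1] is group k
setNames(g, c("match", paste0("group", 1:6)))
```

Group 3 is "groupwithingroupexample", group 4 is "withingroup", group 5 is the inner "group" and group 6 is "example".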

Notes

Regular expressions are nothing if not a collection of corner cases. Trying to pass regular expressions through Shiny and HTML inputs is a bit of a labyrinth. For now, assume any issues or oddities you experience with this addin are entirely my fault and have nothing to do with the fine packages this addin is built on. If you do find an issue, please file an issue. Pull requests are welcomed!

regexplain's People

Contributors

gadenbuie, gegznav, katrinleinweber

regexplain's Issues

Addin stopped working since RStudio upgrade

Only the Cheatsheet addin launches; the others fail with this error message:

regexplain:::regexplain_addin()
Error in get(x, envir = ns, inherits = FALSE) : object '%AND%' not found

RStudio
Version 1.4.1103

MacBook-Air M1
r --version
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

empty textblock not correctly handled in RegEx panel

Experimenting with the toolbox, I tried to change text between parentheses. A pair of parentheses without any text in between is not handled correctly in the RegEx panel. However, the Output panel shows the expected output. See the end of this post.

As a possible related issue I have not succeeded in copying a regex with the send RegEx to Console button. The regexplain applet (?) just ends without visible indication that a copy is done (to console, editor or clipboard).
This happens even when I try to change '2017' to '2018' with gsub in the applet: in this case the RegEx and Output panels both show the expected display, but the Send RegEx to Console button only closes the applet.

I am using RStudio Version 1.2.502 and R packages as listed below in sessionInfo.

Thanks for bringing this software to the public domain.

RegEx panel:
regex_panel

Output panel:
output_panel

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] shiny_1.0.5 bindrcpp_0.2.2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 rstudioapi_0.7 bindr_0.1.1
[4] magrittr_1.5 tidyselect_0.2.4 xtable_1.8-2
[7] R6_2.2.2 rlang_0.2.0 dplyr_0.7.4
[10] tools_3.4.1 miniUI_0.1.1 htmltools_0.3.6
[13] yaml_2.1.18 assertthat_0.2.0 digest_0.6.15
[16] tibble_1.4.2 regexplain_0.2.1 tidyr_0.8.0
[19] purrr_0.2.4 curl_3.1 glue_1.2.0
[22] mime_0.5 compiler_3.4.1 pillar_1.2.1
[25] jsonlite_1.5 httpuv_1.3.6.2 pkgconfig_2.0.1

Crazy stuff happens when a non-group sits between two groups (guessing)

Test: "(?:The )?([^ ]+) is|was|were ([^ ]+) (([^ ]+) (([^ ]+) ([^ ]+)))"

Should not look like this:

image

Maybe because regexec returns something like this:

[[4]]
[1] 22 22  0  0  0  0  0  0
attr(,"match.length")
[1] 6 3 0 0 0 0 0 0
attr(,"useBytes")
[1] TRUE

[[5]]
[1] 1 1 0 0 0 0 0 0
attr(,"match.length")
[1] 7 4 0 0 0 0 0 0
attr(,"useBytes")
[1] TRUE

Or is the regex fundamentally wrong?
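A likely explanation is alternation precedence: `|` binds more loosely than anything else in a regular expression, so the test pattern is read as three alternatives, `(?:The )?([^ ]+) is`, `was`, and `were ([^ ]+) ...`, and a bare "was" anywhere in the text is already a complete match with every group empty. Wrapping the verbs in a non-capturing group, as in the view_regex() call in the next issue, keeps them in sequence. A base-R comparison on a made-up sentence:

```r
x <- "The sky was dark and very cold"

bad  <- "(?:The )?([^ ]+) is|was|were ([^ ]+) (([^ ]+) (([^ ]+) ([^ ]+)))"
good <- "(?:The )?([^ ]+) (?:is|was|were) ([^ ]+) (([^ ]+) (([^ ]+) ([^ ]+)))"

# Ungrouped alternation: the middle alternative "was" matches on its own
regmatches(x, regexpr(bad, x, perl = TRUE))

# Grouped alternation: the capture groups line up with the words as intended
regmatches(x, regexec(good, x, perl = TRUE))[[1]]
```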

pandoc: theme:bootstrap does not exist

Using rocker/verse c97e22aa5090 (I think), I get the following error

> view_regex(stringr::sentences[1:10], "(?:The )?([^ ]+) (?:is|was|were) ([^ ]+) (([^ ]+) (([^ ]+) ([^ ]+)))")
pandoc: theme:bootstrap: openFile: does not exist (No such file or directory)
Error: pandoc document conversion failed with error 1

Remove dependencies on dplyr and tidyr

Needs more investigation, but I think I only use dplyr and tidyr in one or two places and can easily refactor to remove them as dependencies. See #9 for a reason why this would be a good idea (dplyr imports other packages, etc.).

Possible issue with RStudio version 1.2

As described in #9, there may be an issue coming up with RStudio version 1.2.

Allow gadget to accept regex

The gadget currently accepts text input via regexplain::regex_gadget(text). This works well for the addin where the current selection is passed to the gadget.

In working with a regexp, I often cycle between the source and the gadget, and it would be useful to be able to pass a regexp to the gadget as well, i.e. regexplain::regex_gadget(text, pattern).

Reduce package size

  • Offload screencast gifs to personal website or website repo
  • Optimize pngs
  • Clean up other big files?

Support visual regex graphic?

Hi!

Though I like any initiative to make regular expressions more convenient to use, I find most solutions to lack a visual understanding of the regex itself. Textual explanations of regular expressions don't really cut it IMHO. Probably the best source for visualising regex is Debuggex, which is unfortunately down with a gateway error half of the time. Would it be possible to include (something similar to) their awesome visual graphic as perhaps a new tab to your interface?

What I mean is this:

# example regex for validating email addresses, quite hard to read
^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$

Output by Debuggex (which is mainly powered by their static/js/main.js):

image

If you want to replace or extract, say, the domain, you can immediately tell that you're looking for group 2 (\\2). Might be obvious in this case, but for more complex expressions, this really matters and can save a lot of time.
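In R the same pattern needs its backslashes doubled, and the backreference to group 2 is written "\\2" in the replacement string. A quick check with a made-up address (group 2 captures the domain name without its top-level domain, which lands in group 3):

```r
# Email pattern from above, with backslashes doubled for an R string
email_re <- "^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$"

# Replace the whole match with group 2 to pull out the domain name
sub(email_re, "\\2", "jane.doe@example.com", perl = TRUE)
```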

How awesome would it be, if your Shiny app could convert this:

# example regex for matching IP addresses
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

into this:

image

😄
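Short of a diagram, a handful of test strings is a cheap way to confirm what a pattern like the IP-address one accepts. A sketch in R with made-up inputs:

```r
# IP address pattern from above, backslashes doubled for an R string
ip_re <- paste0(
  "^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}",
  "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

candidates <- c("192.168.0.1", "256.1.1.1", "10.0.0", "1.2.3.4")
setNames(grepl(ip_re, candidates, perl = TRUE), candidates)
```

"256.1.1.1" fails because no alternative accepts an octet above 255, and "10.0.0" fails because `{3}` demands three dot-terminated octets before the last one.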

Show parameters inside function in Send Regex To Console

When hitting "Send Regex To Console" in the Output tab, the console shows something like the following:

pattern <- "\\w+ " # perl=TRUE
replacement <- "NO "

Wouldn't it be better to integrate the parameters into the actual function used? In this case, it could translate into:

sub(pattern = "\\w+ ",
    replacement = "NO ",
    x = readr::read_file("LICENSE"),
    perl = TRUE)

Or maybe:

sub(pattern = "\\w+ ",
    replacement = "NO ",
    x = YOUR_CHARACTER_VECTOR_HERE,
    perl = TRUE)

Thanks for the awesome package!
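One way the requested output could be produced is to build the call as an R language object and deparse it, which also handles the backslash escaping. A sketch; `build_sub_call()` and the `YOUR_CHARACTER_VECTOR_HERE` placeholder are illustrative names, not part of regexplain:

```r
# Hypothetical helper: turn gadget inputs into a ready-to-edit sub() call
build_sub_call <- function(pattern, replacement,
                           x_name = "YOUR_CHARACTER_VECTOR_HERE",
                           perl = TRUE) {
  cl <- call("sub",
             pattern = pattern,
             replacement = replacement,
             x = as.name(x_name),   # a symbol, so it deparses without quotes
             perl = perl)
  # deparse() escapes backslashes, so "\\w+ " stays valid R source
  paste(deparse(cl, width.cutoff = 500L), collapse = " ")
}

code <- build_sub_call("\\w+ ", "NO ")
cat(code, "\n")
```

Building a call object instead of pasting strings together avoids hand-escaping quotes and backslashes in the generated code.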

Allow global searches in gadget text display

The gadget currently shows only the first match in each line, akin to sub(), because that is how regexec() works.

A global search option would be great, most likely implemented by removing the first match and re-running regexec() on the remaining string.
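A minimal sketch of that approach, with `regexec_all()` as a hypothetical name. One limitation of re-running on the leftover substring: anchors and word boundaries such as ^ or \b treat the cut point as the start of the string. Base R's gregexpr() finds every match of the full pattern in a single pass, and recent versions of R also include gregexec().

```r
# Hypothetical helper: collect every match (with groups) by re-running
# regexec() on whatever is left after the previous match
regexec_all <- function(pattern, x, perl = TRUE) {
  matches <- list()
  rest <- x
  repeat {
    m <- regexec(pattern, rest, perl = perl)[[1]]
    if (m[1] == -1L) break                      # no further match
    matches[[length(matches) + 1L]] <- regmatches(rest, list(m))[[1]]
    # Skip past this match; advance one character if it was zero-length
    next_pos <- m[1] + max(attr(m, "match.length")[1], 1L)
    if (next_pos > nchar(rest)) break           # nothing left to search
    rest <- substring(rest, next_pos)
  }
  matches
}

regexec_all("(\\w+)=(\\w+)", "breakfast=eggs;lunch=pizza")
```

Each list element holds one match followed by its capture groups, so the second element here is "lunch=pizza", "lunch", "pizza".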

Link to other packages, apps, resources

Throws error if replacement input is empty

To reproduce:

  • Set text and pattern
  • View str_replace so that replacement input is generated
  • Leave replacement blank
  • Click "Send to console"
  • Get error:
Warning: Error in if: missing value where TRUE/FALSE needed
Stack trace (innermost first):
    68: observeEventHandler
     4: shiny::runApp
     3: runGadget
     2: regex_gadget
     1: regexplain:::regexplain_addin
ERROR: [on_request_read] connection reset by peer

Should be a quick fix to check that input$replacement is truthy.
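A sketch of that guard. `is_truthy_string()` is a hypothetical, dependency-free stand-in that treats NULL, NA and "" as unset, which matches how shiny::isTruthy() treats a single string; inside the app, calling shiny::isTruthy(input$replacement) directly would do the same job.

```r
# Hypothetical guard: TRUE only for a single, non-missing, non-empty string
is_truthy_string <- function(x) {
  !is.null(x) && length(x) == 1L && !is.na(x) && nzchar(x)
}

# In the observer one might then write, for example:
# if (is_truthy_string(input$replacement)) { ...include the replacement... }
vapply(list(NULL, NA_character_, "", "NO "), is_truthy_string, logical(1))
```

Whether a blank replacement should instead be sent to the console as "" (deleting the match) is a design choice; this guard simply skips it.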

Update to shiny may have blocked usage for this addin

Trying to use regexplain via the addin or any of the functions in the package gives the same error indicating %AND% not found.

Per our separate conversation offline, it sounds like this may be relying on a shiny component that has now changed.

regexplain::regexplain_web(text = "https://cms.nhl.bamgrid.com/images/headshots/current/168x168/8478402.jpg", pattern = ".*[/]([^.]+)[.].*")
#> Error in get(x, envir = ns, inherits = FALSE): object '%AND%' not found

Created on 2021-01-27 by the reprex package (v0.3.0)
