SLRE: Super Light Regular Expression library

SLRE is an ISO C library that implements a subset of Perl regular expression syntax. Main features of SLRE are:

  • Written in strict ANSI C'89
  • Small size (compiled x86 code is about 5kB)
  • Uses little stack and does no dynamic memory allocation
  • Provides simple intuitive API
  • Implements most useful subset of Perl regex syntax (see below)
  • Easily extensible. E.g. if one wants to introduce a new metacharacter \i, meaning "IPv4 address", it is easy to do so with SLRE.

SLRE is well suited to tasks such as parsing network requests, configuration files, and user input, where libraries like PCRE are too heavyweight for the job. Developers of embedded systems stand to benefit most.

Supported Syntax

(?i)    Must be at the beginning of the regex. Makes match case-insensitive
^       Match beginning of a buffer
$       Match end of a buffer
()      Grouping and substring capturing
\s      Match whitespace
\S      Match non-whitespace
\d      Match decimal digit
\n      Match new line character
\r      Match carriage return character
\f      Match form feed character
\v      Match vertical tab character
\t      Match horizontal tab character
\b      Match backspace character
+       Match one or more times (greedy)
+?      Match one or more times (non-greedy)
*       Match zero or more times (greedy)
*?      Match zero or more times (non-greedy)
?       Match zero or once (non-greedy)
x|y     Match x or y (alternation operator)
\meta   Match one of the meta characters: ^$().[]*+?|\
\xHH    Match byte with hex value 0xHH, e.g. \x4a
[...]   Match any character from set. Ranges like [a-z] are supported
[^...]  Match any character but ones from set

Under development: Unicode support.

API

int slre_match(const char *regexp, const char *buf, int buf_len,
               struct slre_cap *caps, int num_caps, int flags);

slre_match() matches the string buffer buf of length buf_len against the regular expression regexp, which should conform to the syntax outlined above. If regexp contains brackets, slre_match() can capture the respective substrings into an array of struct slre_cap structures:

/* Stores matched fragment for the expression inside brackets */
struct slre_cap {
  const char *ptr;  /* Points to the matched fragment */
  int len;          /* Length of the matched fragment */
};

The N-th member of the caps array will contain the fragment that corresponds to the N-th opening bracket in the regex; N is zero-based. slre_match() returns the number of bytes scanned from the beginning of the string. If the return value is greater than or equal to 0, there is a match. If the return value is less than 0, there is no match. Negative return codes are as follows:

#define SLRE_NO_MATCH               -1
#define SLRE_UNEXPECTED_QUANTIFIER  -2
#define SLRE_UNBALANCED_BRACKETS    -3
#define SLRE_INTERNAL_ERROR         -4
#define SLRE_INVALID_CHARACTER_SET  -5
#define SLRE_INVALID_METACHARACTER  -6
#define SLRE_CAPS_ARRAY_TOO_SMALL   -7
#define SLRE_TOO_MANY_BRANCHES      -8
#define SLRE_TOO_MANY_BRACKETS      -9
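
When logging failures, it can help to turn these codes into readable text. The following is a minimal sketch, not part of SLRE: slre_error_text() and describe_result() are hypothetical helper names, the message strings are invented for illustration, and the macros are repeated only so the snippet compiles on its own. It also uses snprintf(), which requires C99.

```c
#include <assert.h>
#include <stdio.h>

/* Codes copied from the list above so this sketch compiles standalone.
   In a real program they would come from the SLRE header instead. */
#ifndef SLRE_NO_MATCH
#define SLRE_NO_MATCH               -1
#define SLRE_UNEXPECTED_QUANTIFIER  -2
#define SLRE_UNBALANCED_BRACKETS    -3
#define SLRE_INTERNAL_ERROR         -4
#define SLRE_INVALID_CHARACTER_SET  -5
#define SLRE_INVALID_METACHARACTER  -6
#define SLRE_CAPS_ARRAY_TOO_SMALL   -7
#define SLRE_TOO_MANY_BRANCHES      -8
#define SLRE_TOO_MANY_BRACKETS      -9
#endif

/* Hypothetical helper: maps an slre_match() return value to a message. */
static const char *slre_error_text(int rc) {
  switch (rc) {
    case SLRE_NO_MATCH:              return "no match";
    case SLRE_UNEXPECTED_QUANTIFIER: return "unexpected quantifier";
    case SLRE_UNBALANCED_BRACKETS:   return "unbalanced brackets";
    case SLRE_INTERNAL_ERROR:        return "internal error";
    case SLRE_INVALID_CHARACTER_SET: return "invalid character set";
    case SLRE_INVALID_METACHARACTER: return "invalid metacharacter";
    case SLRE_CAPS_ARRAY_TOO_SMALL:  return "caps array too small";
    case SLRE_TOO_MANY_BRANCHES:     return "too many branches";
    case SLRE_TOO_MANY_BRACKETS:     return "too many brackets";
    default:                         return rc >= 0 ? "match" : "unknown error";
  }
}

/* Hypothetical helper: writes a one-line report about rc into out.
   Returns whatever snprintf() returns. */
static int describe_result(int rc, char *out, size_t out_size) {
  assert(out != NULL && out_size > 0);
  if (rc >= 0) {
    return snprintf(out, out_size, "match, %d bytes scanned", rc);
  }
  return snprintf(out, out_size, "error %d: %s", rc, slre_error_text(rc));
}
```

A caller would pass the value returned by slre_match() straight to describe_result() and print the buffer.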

Example: parsing HTTP request line

const char *request = " GET /index.html HTTP/1.0\r\n\r\n";
struct slre_cap caps[4];

if (slre_match("^\\s*(\\S+)\\s+(\\S+)\\s+HTTP/(\\d)\\.(\\d)",
               request, strlen(request), caps, 4, 0) > 0) {
  printf("Method: [%.*s], URI: [%.*s]\n",
         caps[0].len, caps[0].ptr,
         caps[1].len, caps[1].ptr);
} else {
  printf("Error parsing [%s]\n", request);
}
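
The example above prints captures with %.*s because each capture is described by a pointer and a length rather than a NUL-terminated string. When a terminated copy is needed, a small helper along these lines may do. This is a sketch only: cap_to_str() is a hypothetical name, not an SLRE function, and struct slre_cap is repeated here just so the snippet compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Same layout as the struct slre_cap shown in the API section.
   In a real program, use the definition from the SLRE header instead. */
struct slre_cap {
  const char *ptr;  /* Points to the matched fragment */
  int len;          /* Length of the matched fragment */
};

/* Hypothetical helper: copies a capture into dst as a NUL-terminated
   string, truncating to fit dst_size. Returns the number of bytes copied,
   not counting the terminator. */
static int cap_to_str(const struct slre_cap *cap, char *dst, size_t dst_size) {
  size_t n;

  assert(cap != NULL && dst != NULL && dst_size > 0);
  n = cap->len > 0 ? (size_t) cap->len : 0;
  if (n > dst_size - 1) n = dst_size - 1;   /* leave room for '\0' */
  if (n > 0) memcpy(dst, cap->ptr, n);
  dst[n] = '\0';
  return (int) n;
}
```

With the HTTP example, cap_to_str(&caps[0], method, sizeof(method)) would fill a caller-supplied method buffer with the request method. Truncation is silent in this sketch, so a caller that cares can compare the return value against the capture's len.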

Example: find all URLs in a string

static const char *str =
  "<img src=\"HTTPS://FOO.COM/x?b#c=tab1\"/> "
  "  <a href=\"http://cesanta.com\">some link</a>";

static const char *regex = "(?i)((https?://)[^\\s/'\"<>]+/?[^\\s'\"<>]*)";
struct slre_cap caps[2];
int i, j = 0, str_len = strlen(str);

while (j < str_len &&
       (i = slre_match(regex, str + j, str_len - j, caps, 2, 0)) > 0) {
  printf("Found URL: [%.*s]\n", caps[0].len, caps[0].ptr);
  j += i;
}

Output:

Found URL: [HTTPS://FOO.COM/x?b#c=tab1]
Found URL: [http://cesanta.com]

License

SLRE is released under GNU GPL v.2. Businesses can obtain a non-restrictive, royalty-free commercial license and professional support from Cesanta Software.

Super Light DNS Resolver and Mongoose web server are other projects by Cesanta Software, developed with the same philosophy of functionality and simplicity.

