
go-restructure's Introduction

Struct-based argument parsing for Go



Match regular expressions into struct fields

go get github.com/alexflint/go-restructure

This package allows you to express regular expressions by defining a struct, and then capture matched sub-expressions into struct fields. Here is a very simple email address parser:

import (
	"fmt"

	"github.com/alexflint/go-restructure"
)

type EmailAddress struct {
	_    struct{} `^`
	User string   `\w+`
	_    struct{} `@`
	Host string   `[^@]+`
	_    struct{} `$`
}

func main() {
	var addr EmailAddress
	restructure.Find(&addr, "joe@example.com")
	fmt.Println(addr.User) // prints "joe"
	fmt.Println(addr.Host) // prints "example.com"
}

(Note that the above is far too simplistic to be used as a serious email address validator.)

The regular expression that was executed was the concatenation of the struct tags:

^(\w+)@([^@]+)$

The first submatch was inserted into the User field and the second into the Host field.
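
To make the concatenation concrete, here is a minimal sketch that rebuilds the pattern above from the struct tags using only the standard library. The helper name buildPattern is hypothetical and this is not the library's actual implementation; it only assumes that string fields become capture groups and that all other tags are copied verbatim.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// EmailAddress mirrors the struct from the example above.
type EmailAddress struct {
	_    struct{} `^`
	User string   `\w+`
	_    struct{} `@`
	Host string   `[^@]+`
	_    struct{} `$`
}

// buildPattern (hypothetical helper) concatenates the raw struct tags in
// field order, wrapping the tags of string fields in capture groups.
func buildPattern(v interface{}) string {
	t := reflect.TypeOf(v)
	var sb strings.Builder
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.Type.Kind() == reflect.String {
			sb.WriteString("(" + string(f.Tag) + ")")
		} else {
			sb.WriteString(string(f.Tag))
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(buildPattern(EmailAddress{})) // prints ^(\w+)@([^@]+)$
}
```

Nested structs, optional fields and the regexp: tag key are deliberately left out of this sketch.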

You may also use the regexp: tag key, but keep in mind that you must escape quotes and backslashes:

type EmailAddress struct {
	_    string `regexp:"^"`
	User string `regexp:"\\w+"`
	_    string `regexp:"@"`
	Host string `regexp:"[^@]+"`
	_    string `regexp:"$"`
}

Nested Structs

Here is a slightly more sophisticated email address parser that uses nested structs:

type Hostname struct {
	Domain string   `\w+`
	_      struct{} `\.`
	TLD    string   `\w+`
}

type EmailAddress struct {
	_    struct{} `^`
	User string   `[a-zA-Z0-9._%+-]+`
	_    struct{} `@`
	Host *Hostname
	_    struct{} `$`
}

func main() {
	var addr EmailAddress
	success, _ := restructure.Find(&addr, "joe@example.com")
	if success {
		fmt.Println(addr.User)        // prints "joe"
		fmt.Println(addr.Host.Domain) // prints "example"
		fmt.Println(addr.Host.TLD)    // prints "com"
	}
}

Compare this to using the standard library regexp.FindStringSubmatchIndex directly:

func main() {
	content := "joe@example.com"
	expr := regexp.MustCompile(`^([a-zA-Z0-9._%+-]+)@((\w+)\.(\w+))$`)
	indices := expr.FindStringSubmatchIndex(content)
	if len(indices) > 0 {
		userBegin, userEnd := indices[2], indices[3]
		var user string
		if userBegin != -1 && userEnd != -1 {
			user = content[userBegin:userEnd]
		}

		domainBegin, domainEnd := indices[6], indices[7]
		var domain string
		if domainBegin != -1 && domainEnd != -1 {
			domain = content[domainBegin:domainEnd]
		}

		tldBegin, tldEnd := indices[8], indices[9]
		var tld string
		if tldBegin != -1 && tldEnd != -1 {
			tld = content[tldBegin:tldEnd]
		}

		fmt.Println(user)   // prints "joe"
		fmt.Println(domain) // prints "example"
		fmt.Println(tld)    // prints "com"
	}
}
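
For a fuller comparison, the standard library can also address groups by name rather than by index. The sketch below uses only regexp; the group names and the namedGroups helper are illustrative assumptions, not part of go-restructure.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailExpr is the same expression as above, with named groups added.
var emailExpr = regexp.MustCompile(
	`^(?P<user>[a-zA-Z0-9._%+-]+)@(?P<domain>\w+)\.(?P<tld>\w+)$`)

// namedGroups (hypothetical helper) returns the captured text keyed by
// group name, or nil when the expression does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // unnamed groups and the whole match have empty names
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(emailExpr, "joe@example.com")
	fmt.Println(g["user"], g["domain"], g["tld"]) // prints "joe example com"
}
```

This removes the index arithmetic, but the result is still an untyped map rather than a struct.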

Optional fields

When nesting one struct within another, you can make the nested struct optional by marking it with ?. The following example parses floating point numbers with optional sign and exponent:

// Matches "123", "1.23", "1.23e-4", "-12.3E+5", ".123"
type Float struct {
	Sign     *Sign     `?`      // sign is optional
	Whole    string    `[0-9]*`
	Period   struct{}  `\.?`
	Frac     string    `[0-9]+`
	Exponent *Exponent `?`      // exponent is optional
}

// Matches "e+4", "E6", "e-03"
type Exponent struct {
	_    struct{} `[eE]`
	Sign *Sign    `?`         // sign is optional
	Num  string   `[0-9]+`
}

// Matches "+" or "-"
type Sign struct {
	Ch string `[+-]`
}

When an optional sub-struct is not matched, it will be set to nil:

"1.23" -> {
  "Sign": nil,
  "Whole": "1",
  "Frac": "23",
  "Exponent": nil
}

"1.23e+45" -> {
  "Sign": nil,
  "Whole": "1",
  "Frac": "23",
  "Exponent": {
    "Sign": {
      "Ch": "+"
    },
    "Num": "45"
  }
}
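
The nil behaviour corresponds to how the standard library reports groups that did not take part in a match: their indices are -1. The following sketch shows that mapping with plain regexp. The flattened Float struct, the expression and both helper names are assumptions made for illustration, not the library's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Float is a flattened, hypothetical variant of the struct above:
// the optional parts are pointers so that "absent" can be nil.
type Float struct {
	Sign     *string
	Whole    string
	Frac     string
	Exponent *string
}

var floatExpr = regexp.MustCompile(
	`^([+-])?([0-9]*)\.?([0-9]+)(?:[eE]([+-]?[0-9]+))?$`)

// optional returns nil when the group did not participate in the match,
// which FindStringSubmatchIndex signals with an index of -1.
func optional(s string, idx []int, group int) *string {
	begin, end := idx[2*group], idx[2*group+1]
	if begin < 0 {
		return nil
	}
	v := s[begin:end]
	return &v
}

// parseFloat matches s and fills a Float; ok is false when s does not match.
func parseFloat(s string) (f *Float, ok bool) {
	idx := floatExpr.FindStringSubmatchIndex(s)
	if idx == nil {
		return nil, false
	}
	return &Float{
		Sign:     optional(s, idx, 1),
		Whole:    s[idx[4]:idx[5]],
		Frac:     s[idx[6]:idx[7]],
		Exponent: optional(s, idx, 4),
	}, true
}

func main() {
	f, _ := parseFloat("1.23")
	fmt.Println(f.Sign == nil, f.Whole, f.Frac, f.Exponent == nil) // prints "true 1 23 true"
}
```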

Finding multiple matches

The following example uses Regexp.FindAll to extract all floating point numbers from a string, using the same Float struct as in the example above.

src := "There are 10.4 cats for every 100 dogs in the United States."
floatRegexp := restructure.MustCompile(Float{}, restructure.Options{})
var floats []Float
floatRegexp.FindAll(&floats, src, -1)

To limit the number of matches, set the third parameter to a positive number. A value of -1, as above, returns all matches.
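
The limit parameter follows the convention of the standard library's FindAll family. As a rough illustration, here is the same search with plain regexp.FindAllString; the flattened expression and the findFloats helper are assumptions for this sketch and return plain strings rather than Float structs.

```go
package main

import (
	"fmt"
	"regexp"
)

// floatExpr is a hand-flattened approximation of the Float struct's pattern.
var floatExpr = regexp.MustCompile(`[+-]?[0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?`)

// findFloats (hypothetical helper) returns at most n matches,
// or all matches when n is negative.
func findFloats(src string, n int) []string {
	return floatExpr.FindAllString(src, n)
}

func main() {
	src := "There are 10.4 cats for every 100 dogs in the United States."
	fmt.Println(findFloats(src, -1)) // prints [10.4 100]
	fmt.Println(findFloats(src, 1))  // prints [10.4]
}
```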

Getting begin and end positions for submatches

To get the begin and end position of submatches, use the restructure.Submatch struct in place of string.

Here is an example of matching python imports such as import foo as bar:

type Import struct {
	_       struct{}             `^import\s+`
	Package restructure.Submatch `\w+`
	_       struct{}             `\s+as\s+`
	Alias   restructure.Submatch `\w+`
}

var importRegexp = restructure.MustCompile(Import{}, restructure.Options{})

func main() {
	var imp Import
	importRegexp.Find(&imp, "import foo as bar")
	fmt.Printf("IMPORT %s (bytes %d...%d)\n", imp.Package.String(), imp.Package.Begin, imp.Package.End)
	fmt.Printf("    AS %s (bytes %d...%d)\n", imp.Alias.String(), imp.Alias.Begin, imp.Alias.End)
}

Output:

IMPORT foo (bytes 7...10)
    AS bar (bytes 14...17)
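
The byte offsets above can be cross-checked with the standard library, where FindStringSubmatchIndex returns begin and end pairs for each group. This is a sketch under the assumption that the struct compiles to the expression shown; the span type and importSpans helper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// importExpr is the assumed flattened form of the Import struct's tags.
var importExpr = regexp.MustCompile(`^import\s+(\w+)\s+as\s+(\w+)`)

// span is a hypothetical stand-in for a submatch with byte positions.
type span struct {
	Text       string
	Begin, End int
}

// importSpans returns the package and alias spans, or ok == false on no match.
func importSpans(s string) (pkg, alias span, ok bool) {
	idx := importExpr.FindStringSubmatchIndex(s)
	if idx == nil {
		return span{}, span{}, false
	}
	pkg = span{s[idx[2]:idx[3]], idx[2], idx[3]}
	alias = span{s[idx[4]:idx[5]], idx[4], idx[5]}
	return pkg, alias, true
}

func main() {
	pkg, alias, _ := importSpans("import foo as bar")
	fmt.Printf("IMPORT %s (bytes %d...%d)\n", pkg.Text, pkg.Begin, pkg.End)
	fmt.Printf("    AS %s (bytes %d...%d)\n", alias.Text, alias.Begin, alias.End)
}
```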

Regular expressions inside JSON

To run a regular expression as part of a JSON unmarshal, implement the json.Unmarshaler interface from encoding/json. Here is an example that parses the following JSON string containing a quaternion:

{
	"name": "foo",
	"value": "1+2i+3j+4k"
}

First we define the expressions for matching quaternions in the form 1+2i+3j+4k:

// Matches "1", "-12", "+12"
type RealPart struct {
	Sign string `regexp:"[+-]?"`
	Real string `regexp:"[0-9]+"`
}

// Matches "+123", "-1"
type SignedInt struct {
	Sign string `regexp:"[+-]"`
	Real string `regexp:"[0-9]+"`
}

// Matches "+12i", "-123i"
type IPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"i"`
}

// Matches "+12j", "-123j"
type JPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"j"`
}

// Matches "+12k", "-123k"
type KPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"k"`
}

// matches "1+2i+3j+4k", "-1+2k", "-1", etc
type Quaternion struct {
	Real *RealPart
	I    *IPart `regexp:"?"`
	J    *JPart `regexp:"?"`
	K    *KPart `regexp:"?"`
}

// matches the quoted strings `"-1+2i"`, `"3-4i"`, `"12+34i"`, etc
type QuotedQuaternion struct {
	_          struct{} `regexp:"^"`
	_          struct{} `regexp:"\""`
	Quaternion *Quaternion
	_          struct{} `regexp:"\""`
	_          struct{} `regexp:"$"`
}

Next we implement UnmarshalJSON for the QuotedQuaternion type:

var quaternionRegexp = restructure.MustCompile(QuotedQuaternion{}, restructure.Options{})

func (c *QuotedQuaternion) UnmarshalJSON(b []byte) error {
	if !quaternionRegexp.Find(c, string(b)) {
		return fmt.Errorf("%s is not a quaternion", string(b))
	}
	return nil
}

Now we can define a struct and unmarshal JSON into it:

type Var struct {
	Name  string
	Value *QuotedQuaternion
}

func main() {
	src := `{"name": "foo", "value": "1+2i+3j+4k"}`
	var v Var
	json.Unmarshal([]byte(src), &v)
}

The result is:

{
  "Name": "foo",
  "Value": {
    "Quaternion": {
      "Real": {
        "Sign": "",
        "Real": "1"
      },
      "I": {
        "Magnitude": {
          "Sign": "+",
          "Real": "2"
        }
      },
      "J": {
        "Magnitude": {
          "Sign": "+",
          "Real": "3"
        }
      },
      "K": {
        "Magnitude": {
          "Sign": "+",
          "Real": "4"
        }
      }
    }
  }
}
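
The same hook works without go-restructure, which may help when reasoning about what UnmarshalJSON receives. The sketch below uses only encoding/json and regexp on a simpler, hypothetical Complex type; note that the raw bytes passed to UnmarshalJSON still include the surrounding double quotes, which is why the expression matches them explicitly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Complex is a hypothetical type holding the parts of a string such as "1+2i".
type Complex struct {
	Real string
	Imag string
}

var complexExpr = regexp.MustCompile(`^"([+-]?[0-9]+)([+-][0-9]+)i"$`)

// UnmarshalJSON implements json.Unmarshaler by matching the raw JSON bytes.
func (c *Complex) UnmarshalJSON(b []byte) error {
	m := complexExpr.FindSubmatch(b)
	if m == nil {
		return fmt.Errorf("%s is not a complex number", b)
	}
	c.Real, c.Imag = string(m[1]), string(m[2])
	return nil
}

// Var has the same shape as the struct in the example above.
type Var struct {
	Name  string
	Value *Complex
}

// decode unmarshals src into a Var and returns any error from the decoder.
func decode(src string) (Var, error) {
	var v Var
	err := json.Unmarshal([]byte(src), &v)
	return v, err
}

func main() {
	v, err := decode(`{"name": "foo", "value": "1+2i"}`)
	fmt.Println(v.Name, v.Value.Real, v.Value.Imag, err) // prints "foo 1 +2 <nil>"
}
```

Returning an error from UnmarshalJSON makes json.Unmarshal fail, so callers should check its return value rather than discard it.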

Benchmarks

See benchmarks document

go-restructure's People

Contributors

alexflint, imjasonh


go-restructure's Issues

Add Regexp.FindAll

Should be able to use like this:

pattern := restructure.Compile(Foo{}, Options{})
var matches []Foo
pattern.FindAll(&matches)

Example with email address

I think this is a super useful library. Consider changing the example or adding a note that a regex should not be used to validate an email address. Section 6.1 of RFC 822 touches on this, suggesting that the best you can do is local-part@domain, where the domain only optionally contains a ".". I work with a large email service provider, and the best you can do to validate an email address is to send to it. Inevitably, any regex will fail aside from "does it have an at-sign?". Cheers!

ragel backend with code generation?

Not so much a feature request, but perhaps an interesting direction. From a struct tag, generate a ragel machine definition that matches a []byte and stuffs the values into a struct. This would 1) use the ragel regex engine, which is considerably faster than Go's, and 2) remove reflection.

Repeated sub-structs

Make it possible to write a field Foo []*SomeOtherStruct "*" and get repetitions in the slice.

use `key:value` struct tags?

I think that

type Example struct {
    MyField string `regex:"[^@]+"`
}

instead of

type Example struct {
    MyField string `[^@]+`
}

would mesh more with other libraries, standard style, etc. Worth a discussion (awesome library, btw)

Expose begin and end position of submatches

Make it possible to write a field Foo restructure.Submatch "\w+" where restructure.Submatch contains Begin, End, and Content.

How should begin and end positions for sub-structs be exposed? Perhaps special types restructure.BeginPos and restructure.EndPos that are populated with the current struct's begin/end when recognized?

Optional terminals

It should be possible to write a field Foo *string "(abc)?" and have it wind up nil if the group did not match.

license?

Hi! I couldn't find license information in the repo, could you possibly add it, or make it more prominent if already present? TIA

Capture groups in tags interfere with capture groups from your library

Modify your test example as follows (BTW, \x60 is the backtick, since I was trying to figure out how to use it inside a regex without removing the string literal backticks):

type DotExpr struct {
        _    struct{} `^`
        Head string   `(\w|\x60)+`
        Tail *DotName `?`
        _    struct{} `$`
}

Note that Head now includes a capture group. The associated test will now fail:

$ go test
--- FAIL: TestMatchNameDotName (0.00s)
        Error Trace:    restructure_test.go:28
        Error:          Not equal: "foo" (expected)
                                != "o" (actual)

--- FAIL: TestMatchNameDotNameHeadOnly (0.00s)
        Error Trace:    restructure_test.go:40
        Error:          Not equal: "head" (expected)
                                != "d" (actual)

FAIL
exit status 1
FAIL    github.com/alexflint/go-restructure     0.003s

This shows that the capture groups specified in the struct tags interfere with those from the library. It can be worked around by using non-capture groups: (?:\w|\x60)+. You should address this: either warn the user not to use capture groups, parse the regexps in restructure.Compile for capture groups and return an error if any are found, or figure out how to handle them without breaking your library.
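
One of the options suggested above, rejecting tags that contain capture groups, can be sketched with the standard library alone. The helper name hasCaptureGroup is hypothetical and this is not something the library is known to do; it relies only on Regexp.NumSubexp, which counts parenthesized capturing subexpressions.

```go
package main

import (
	"fmt"
	"regexp"
)

// hasCaptureGroup (hypothetical helper) reports whether expr contains any
// capturing group. Non-capturing groups such as (?:...) are not counted.
func hasCaptureGroup(expr string) (bool, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return false, err
	}
	return re.NumSubexp() > 0, nil
}

func main() {
	for _, expr := range []string{`(\w|\x60)+`, `(?:\w|\x60)+`} {
		found, err := hasCaptureGroup(expr)
		fmt.Println(expr, found, err)
	}
}
```

A check like this could run on each tag before the tags are concatenated, so that the error names the offending field.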

JS port

Hey, I made a quick JS port at https://github.com/benjamingr/js-restructure

I'll add nested properties and other goodies later. Gave you credit and all :)

Just a quick fun experiment while my code was compiling and I was angry at having no internet - nothing serious.

Add int parsing?

Hello,

Was wondering if it's possible to extract int values, here is my code that failed.

type Dummy struct {
	_          struct{} `regexp:"^"`

	DummyInt int      `regexp:"\\d"`

	_          struct{} `regexp:"$"`
}
var dummy Dummy
restructure.Find(&dummy, "1")

fmt.Println(dummy.DummyInt)

Maybe I'm missing some simple way to do it, since I'm still discovering Go.

Thank you.
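
A possible workaround, sketched here with the standard library only, is to capture the digits as a string and convert them afterwards with strconv.Atoi. Whether go-restructure itself can populate int fields is not stated on this page; the parseDigit helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// digitExpr is the flattened form of the Dummy struct's tags.
var digitExpr = regexp.MustCompile(`^(\d)$`)

// parseDigit (hypothetical helper) captures the digit as text first,
// then converts the captured text to an int.
func parseDigit(s string) (int, error) {
	m := digitExpr.FindStringSubmatch(s)
	if m == nil {
		return 0, fmt.Errorf("%q is not a single digit", s)
	}
	return strconv.Atoi(m[1])
}

func main() {
	n, err := parseDigit("1")
	fmt.Println(n, err) // prints "1 <nil>"
}
```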
