nested extraction

The command-line tool nested extraction, abbreviated as ne, was originally designed to extract information from LaTeX, but it can extract the content between any pair of symbols in a string. For now, its functionality is demonstrated only on LaTeX.

Other command-line tools, such as awk, can also extract the content between a pair of symbols, but they do not easily handle nested pairs of symbols, which are common in LaTeX. With ne, the solution is much easier and more intuitive.

Usage

The input of ne can be a single string, a file, or data read through a pipe. The options are:

  • -h: display help information
  • -a: the string to process
  • -s: the pair of symbols used to split the string; the default is '{}'
  • -f: the name of the file to process
  • -p: the part or parts of the string to print; by default the whole string is printed
  • -d: the delimiter used to separate the output; the default is ':'
  • -r: replace the selected content with new content, excluding the pair of symbols
  • -R: replace the selected content with new content, including the pair of symbols

Example

Suppose the single string is

\MS{1991}{\Hyperlink{abc}{H\"{o}bby} }{male}{AMerica}{e^{2x}}

and the content of the input file a.txt is:

\MS{1991}{\Hyperlink{abc}{H\"{o}bby} }{male}{AMerica}{e^{2x}}
\MS{1993}{\Hyperlink{de}{Helen} }{female}{AMerica}{\sum^{n}_{i=1}a_i}

First, decide which kind of input to use:

  • if the input is a single string,
ne -a '''\MS{1991}{\Hyperlink{abc}{H\"{o}bby} }{male}{AMerica}{e^{2x}}'''
  • if the input is the file a.txt,
ne -f a.txt
  • if the input comes through a pipe,
cat a.txt | ne
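
The three input modes can be pictured with a small Python sketch. It is not ne's actual code: the function name read_lines, the program name ne-sketch, and the rule that -a takes priority over -f are assumptions made for illustration only.

```python
import argparse
import sys


def read_lines(argv=None, stdin=None):
    """Return the input as a list of strings, one per line to process.

    Hypothetical helper: -a supplies a single string, -f names a file,
    and with neither option the lines are read from standard input.
    """
    parser = argparse.ArgumentParser(prog="ne-sketch")
    parser.add_argument("-a", dest="string")
    parser.add_argument("-f", dest="filename")
    args = parser.parse_args(argv)

    if args.string is not None:          # ne -a '...'
        return [args.string]
    if args.filename is not None:        # ne -f a.txt
        with open(args.filename, encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle]
    stream = sys.stdin if stdin is None else stdin   # cat a.txt | ne
    return [line.rstrip("\n") for line in stream]
```

Passing stdin explicitly is only there to make the sketch easy to try without a real pipe.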

Once the input is set up, decide which pair of symbols separates the string. By default the pair is { and }, but you can use other symbols, for example (). To set a new pair of symbols, use

ne -s "()"

Next comes the output. ne parses the input and stores the positions of the nested pairs of symbols in a tree. Indexing begins at 0, but index 0 means printing the whole string, so it is rarely used. Take the following string as an example:

\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}

There are five parts, each within a pair of {}. The second and fifth parts contain nested content. Without the -p option, the program prints the whole string. The syntax of -p is:

  • if you want to print the first part, 1991,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}'''  -p 1
  • if you want to print the first, second and third parts,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -p "1;2;3"
  • if you want to print the nested parts abc and H\"{o}bby,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -p "2-1;2-2"
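
To make the tree and the path syntax concrete, here is a minimal Python sketch of one way to store matched pairs in a tree and look up a path such as 2-1. It illustrates the idea only and is not taken from ne's source: the names Node, parse and select are hypothetical, and the sketch assumes every symbol in the string takes part in a pair (no escaped braces).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One matched pair; start and end are the positions of its two symbols."""
    start: int
    end: int = -1
    children: list = field(default_factory=list)


def parse(text, pair="{}"):
    """Build a tree of matched pairs; the root stands for the whole string."""
    open_sym, close_sym = pair
    root = Node(start=-1, end=len(text))
    stack = [root]
    for i, ch in enumerate(text):
        if ch == open_sym:
            node = Node(start=i)
            stack[-1].children.append(node)   # nest under the innermost open pair
            stack.append(node)
        elif ch == close_sym:
            if len(stack) == 1:
                raise ValueError(f"unmatched {close_sym!r} at position {i}")
            stack.pop().end = i
    if len(stack) > 1:
        raise ValueError(f"unmatched {open_sym!r} at position {stack[-1].start}")
    return root


def select(text, root, path):
    """Return the content addressed by a path such as '2-1' (1-based per level)."""
    if path == "0":
        return text                            # index 0 means the whole string
    node = root
    for index in path.split("-"):
        node = node.children[int(index) - 1]
    return text[node.start + 1:node.end]       # content without the symbols


text = r'\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}'
tree = parse(text)
# Join the selected parts with ':' to mirror the documented default delimiter.
print(":".join(select(text, tree, p) for p in "2-1;2-2".split(";")))
```

Passing pair="()" to parse plays the role that -s "()" plays on the command line.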

When there are multiple outputs, the default delimiter is :. You can change the delimiter to ; with

ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -p "1;2;3" -d ";"

Besides printing the selected parts, you can substitute new content for the selected parts with -r and -R, then print the new string. The difference is that -r substitutes only the content within the pair of symbols, whereas -R substitutes the selected content together with the pair of symbols.

  • if you want to substitute 1994 for 1991,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -r "1;1994"
  • if you want to substitute (1994) for {1991},
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -R "1;(1994)"
  • if you want to substitute 1994 for 1991 and female for the third part, male,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -r "1;1994;3;female"
  • if you want to substitute 1994 for 1991 and delete the third part, male,
ne -a '''\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}''' -r "1;1994;3;"
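
The difference between the two kinds of replacement can be sketched in Python as follows. This is an illustration under assumptions, not ne's implementation: top_level_spans and replace_parts are hypothetical names, only top-level parts are handled (no 2-1 style paths), and deleting is assumed to leave an empty pair such as {} behind.

```python
def top_level_spans(text, pair="{}"):
    """Return (open_index, close_index) for each top-level pair, in order."""
    open_sym, close_sym = pair
    spans, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == open_sym:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_sym:
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched {close_sym!r} at position {i}")
            if depth == 0:
                spans.append((start, i))
    if depth != 0:
        raise ValueError(f"unmatched {open_sym!r}")
    return spans


def replace_parts(text, spec, include_symbols=False, pair="{}"):
    """Apply an 'index;new;index;new' spec to top-level parts.

    include_symbols=False follows the documented -r (symbols are kept);
    include_symbols=True follows the documented -R (symbols are replaced too).
    """
    fields = spec.split(";")
    spans = top_level_spans(text, pair)
    edits = []
    for index, new in zip(fields[0::2], fields[1::2]):
        start, end = spans[int(index) - 1]
        if include_symbols:
            edits.append((start, end + 1, new))
        else:
            edits.append((start + 1, end, new))
    # Apply from right to left so earlier positions stay valid.
    for start, end, new in sorted(edits, reverse=True):
        text = text[:start] + new + text[end:]
    return text


text = r'\MS{1991} {\Hyperlink{abc}{H\"{o}bby}} {male} {AMerica} {e^{2x}}'
print(replace_parts(text, "1;1994"))                          # like -r "1;1994"
print(replace_parts(text, "1;(1994)", include_symbols=True))  # like -R "1;(1994)"
print(replace_parts(text, "1;1994;3;"))                       # like -r "1;1994;3;"
```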

Warning: if you use -r to delete content, don't omit the final ;! The program splits the argument 1;1994;3; into 4 parts, the fourth of which is empty, and substituting empty content acts like deletion. Without the final ;, the program cannot tell what to do. It may sometimes work, but it is not safe!
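
The effect of the trailing ; is easy to see with Python's own string splitting. The check below is a hypothetical illustration (pair_up is not a function of ne) of why an argument with an odd number of fields is ambiguous.

```python
def pair_up(spec):
    """Split 'index;new;index;new' into (index, new) tuples; reject odd counts."""
    fields = spec.split(";")
    if len(fields) % 2 != 0:
        raise ValueError(f"{spec!r} has {len(fields)} fields; the last index has no replacement")
    return list(zip(fields[0::2], fields[1::2]))


print("1;1994;3;".split(";"))   # ['1', '1994', '3', '']
print("1;1994;3".split(";"))    # ['1', '1994', '3']
print(pair_up("1;1994;3;"))     # [('1', '1994'), ('3', '')]
```

With the final ;, part 3 is paired with an empty replacement. Without it, index 3 has nothing to pair with, so a strict parser such as this one raises an error instead of guessing.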

To do in the future

  • add a debug module
  • add modern try...except error handling
