specs2016

A rewrite of the specs pipeline stage from CMS, though changed quite a bit.

"specs" is a command line utility for parsing and re-arranging text input. It allows re-alignment of fields, some format conversion, and re-formatting multiple lines into single lines or vice versa. Input comes from standard input, and output flows to standard output.

This version is liberally based on the CMS Pipelines User's Guide and Reference, especially chapters 16, 24, and 20.

News

10-May-2024: Version 0.9.2 is here. What's New:

  • An unthreaded mode of operation
  • Compound SET statements
  • Reduced necessity of quoting complex conditions for if and while
  • Bug fixes

29-JAN-2023: Version 0.9.1 is here. What's New:

  • Allow execution of the output with the --shell or -X command line parameters
  • New function pretty
  • Usability improvements for functions wordcount, fieldcount.
  • Compiler alignment
  • Bug fixes

Sources

To download your copy of specs, you can get it from GitHub in either of two ways:

  1. Using git: git clone https://github.com/yoavnir/specs2016.git
  2. Using http: wget https://github.com/yoavnir/specs2016/archive/dev.zip

Building

If you have cloned the git repository, first make sure to check out a stable tag such as v0.9.2:

git checkout v0.9.2

A simple way to get the latest stable release is to check out the stable branch and rebase to its tip:

git checkout stable
git rebase

After that, cd to the specs/src directory, and run the following three commands:

  • python setup.py
  • make some
  • sudo make install

Note: Windows does not need sudo.

Note: On some Mac machines, sudo make install will cause a warning about being the wrong user.

Known Issues

  • Regular expression grammars other than the default ECMAScript don't work except on Mac OS.
  • On Windows with Python support, the appropriate DLL (like python38.dll) must be in the path.

Contributing

Anyone can contribute. So far, I have written most of the code, but if you want to help, I'll be very happy. Feel free to:

  • Submit bug reports or feature requests at the Issue Tracker.
  • Help solve some existing issue.
  • Submit pull requests

Documentation

The documentation for specs2016 exists in two places:

  • In the manpage installed with the utility on Linux and Mac OS.
  • In the docs directory.

License

specs2016 is licensed under the MIT License.

specs2016's People

Contributors: gawesomer, yoavnir

specs2016's Issues

while-guard

Create some mechanism to stop runaway while loops:

  • Detect when a while statement continues too many times
  • Detect it even if there are nested while loops, the inner one terminating nicely
  • Abort either all processing or just the current record when detected.
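
One way to picture such a guard is a single iteration budget per record that every while statement draws from. Because the budget is shared, an inner loop that terminates nicely does not reset it, so a runaway outer loop is still caught. The sketch below is only an illustration in Python, not the specs implementation; the names WhileGuard, RunawayLoopError, process and run, and the budget of 100, are hypothetical.

```python
class RunawayLoopError(RuntimeError):
    """Raised when the while loops for one record exceed the budget."""


class WhileGuard:
    """Shared iteration budget for all while loops of a single record."""

    def __init__(self, max_iterations=1000):
        self.max_iterations = max_iterations
        self.count = 0

    def reset(self):
        # Called once at the start of every record.
        self.count = 0

    def tick(self):
        # Called on every iteration of every while loop, nested or not.
        self.count += 1
        if self.count > self.max_iterations:
            raise RunawayLoopError(
                f"while loops ran more than {self.max_iterations} iterations"
            )


def process(record, guard):
    """Hypothetical per-record work; the record 'loop' never terminates."""
    guard.reset()
    if record == "loop":
        while True:              # runaway outer loop
            guard.tick()
            inner = 0
            while inner < 3:     # inner loop terminates nicely
                guard.tick()
                inner += 1
    return record.upper()


def run(records, guard):
    """Abort only the offending record and keep processing the rest."""
    results = []
    for record in records:
        try:
            results.append(process(record, guard))
        except RunawayLoopError:
            results.append(None)
    return results
```

Here run(["a", "loop", "b"], WhileGuard(100)) aborts only the middle record. Aborting all processing instead would simply mean letting the exception propagate.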

Initialize ALUValue with const reference std::string

In many places a temporary string variable is initialized just to be used as the (non-const) reference passed to the ALUValue constructor.

Now that ALUValue has a const string constructor, we can skip this temporary variable. Need to go over the many uses of ALUValue.

Control breaks

Both as a keyword and as a function. See pages 195-200 in the manual

Installation issue on Linux Mint 19

Got error message:

cp: cannot create regular file '/usr/local/share/man/man1/': Not a directory

Results of manpath:

/usr/local/man:/usr/local/share/man:/usr/share/man

Error with expression

Here's the command:
cat x | specs print "tf2d(words(1,2),'%d/%m %H:%M:%S.%6f')" 1
Sample input:

29/12 10:08:20.020450 xxxx
29/12 10:08:20.020478 yyyy

Result is zeros.

OTOH the following does work:
cat x | specs a: w1-2 . print "tf2d(a,'%d/%m %H:%M:%S.%6f')" 1

Which produces:

1577606900.02045
1577606900.020478

"make some" does not prepare manpage

The "some" target does not include specs.1.gz.

As a result, the following sequence creates specs.1.gz with root permissions, which is bad:

make some
sudo make install

Error with simpler expression

specs print "2+2" 1
Error while parsing command-line arguments: Error in expression in Token PRINT at index 1 with content <+>

specs print "2-2" 1
Runtime error. Failed assertion: computeStack.size() == 1

Pad character is not respected

Specification: specs word 1 5 pad * word 2 15

Input: First record

Expected: First*****record

Got:     First     record (there are 5 spaces between 'First' and 'record')

Support record formats

  • [--recfm F] --lrecl xx - for fixed records with size xx
  • --recfm v/v8/v16/v61/v32/v23/v64/v46 -- variable records with length field at the start. Should figure out what the default / most common is and what simple v should be
  • --recfm cr
  • --recfm lf
  • --recfm crlf
  • --recfm default (whatever std::getline does by default) -- that will also be the default.
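
To make the difference between the fixed and variable formats concrete, here is a rough Python sketch of the two kinds of reader. It is an illustration only: the function names are hypothetical, and the variable reader assumes a 2-byte big-endian length prefix that does not count itself. The issue does not define what each v variant means, so the width and byte order are left as parameters.

```python
import io


def read_fixed(stream, lrecl):
    """Yield records of exactly lrecl bytes (the last one may be shorter)."""
    while True:
        chunk = stream.read(lrecl)
        if not chunk:
            return
        yield chunk


def read_variable(stream, length_bytes=2, byteorder="big"):
    """Yield records that are each preceded by a length field.

    Assumption: the length field counts only the payload, not itself.
    """
    while True:
        header = stream.read(length_bytes)
        if len(header) < length_bytes:
            return
        size = int.from_bytes(header, byteorder)
        yield stream.read(size)


fixed = list(read_fixed(io.BytesIO(b"abcdefgh"), 4))
variable = list(read_variable(io.BytesIO(b"\x00\x03abc\x00\x02hi")))
```

With the sample inputs above, fixed holds the two 4-byte records and variable holds the payloads "abc" and "hi".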

The parser is overly zealous in detecting delimiters

Examples:

specs print "(5)" 1 --> "Error while parsing command-line arguments: Unhandled token type GROUPEND at argument 2"

specs print "(5+5)" 1 --> "Error while parsing command-line arguments: Bad output placement Token LITERAL at index 2 with content <5+5>"

Allow debugging the ALU

Add two new command-line switches:

  • --debug-alu-comp for debugging the parsing and compiling of expressions.
  • --debug-alu-run for debugging the runtime evaluation of expressions.
    Both are available for now only in debug compilation pending performance checks (around version 0.5)

Support REDO

Takes the in-process output record and makes it the input record.

Check back and forth time conversion

Here on Linux I get a mismatch with the following:

specs w1-2 tf2i "%d/%m %H:%M:%S" a: ID a ti2f "%d-%b-%Y %H:%M:%S"

Origin: 09/10 17:00:02
Result: 09-Oct-2018 18:00:02
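
A round-trip check of this kind can be sketched in Python as below. This is not how specs implements tf2i and ti2f. The internal format is assumed to be microseconds since the Unix epoch, the year is passed in explicitly because the input format has none, and both directions are pinned to UTC so that no time zone or daylight saving rule can shift the result. Whether such a rule is behind the one-hour difference reported above is not established.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_internal(text, fmt, year):
    """Parse text with fmt and return microseconds since the epoch (UTC)."""
    parsed = datetime.strptime(f"{year} {text}", "%Y " + fmt)
    parsed = parsed.replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(microseconds=1)


def from_internal(micros, fmt):
    """Format microseconds since the epoch (UTC) with fmt."""
    return (EPOCH + timedelta(microseconds=micros)).strftime(fmt)


fmt = "%d/%m %H:%M:%S"
origin = "09/10 17:00:02"
round_trip = from_internal(to_internal(origin, fmt, 2018), fmt)
```

When both conversions use the same time zone rules, round_trip equals origin.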

More functions (long-lived issue)

  • rand(x) - returns an integer between 0 and x-1. Zero is a special value returning a random 64-bit unsigned integer.
  • frand() - returns a FP number between 0.0 and 1.0
  • sin/cos/tan/arcsin/arccos/arctan - trig functions. Need to allow both degrees and radians?
  • statistical package (at least sum, min, max, and average)
  • frequency map
  • wordcount / fieldcount / word by index / field by index
  • first() -- true on the first line - necessary for runin
  • eof() -- true in the runout cycle
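
The intended behavior of the two random functions can be sketched in Python as follows. This only illustrates the semantics described in the list and is not the specs implementation; in particular, frand() here returns a value in the half-open range from 0.0 up to but not including 1.0, which is an assumption about the upper bound.

```python
import random


def rand(x):
    """Return an integer between 0 and x-1; 0 means a random 64-bit unsigned."""
    if x == 0:
        return random.getrandbits(64)
    return random.randrange(x)


def frand():
    """Return a floating-point number in [0.0, 1.0)."""
    return random.random()
```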

CPU time on Linux accurate only down to 0.1s

(seen in a virtual machine -- may be a non-issue)
This is using the --stats switch. Sample result:

Read  1008841 lines.
Wrote 1008841 lines.
Run Time: 30.505347 seconds.
CPU Time: 60.700000 seconds.

Trivial string expressions fail to execute

Example:
specs print @version 1
Result:
Error while parsing command-line arguments: Error in expression in Token PRINT at index 1 with content <v0.2-beta>

But if you give it some "meat" to chew on...
specs print "@version||@version" 1
results in:
v0.2-betav0.2-beta

Packaging for Linux

Need an RPM.

How do you get something into the repositories for yum and/or apt?

Improve error reporting for then/do/endif

Current message for missing "then":
Failed assertion: TokenListType__THEN == tokenVec[index+1].Type()

Current message for missing "do":
Failed assertion: TokenListType__DO == tokenVec[index+1].Type()

There is no message for missing endif, but it doesn't work.

Proposed messages:
Missing THEN after if/elseif token at index xx with condition yyy
Missing DO after while token at index xx with condition yyy
Missing endif/done token for if/while token at index xx
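
A check that produces messages in the proposed style could look roughly like the Python sketch below. It is an illustration only: the token representation (a list of kind and condition pairs) and the function name check_structure are hypothetical, and for brevity a closing token pops whatever block is open without verifying that endif closes an if and done closes a while.

```python
def check_structure(tokens):
    """Return a list of error messages for a list of (kind, condition) tokens.

    condition is None for every token except if, elseif and while.
    """
    errors = []
    open_blocks = []  # stack of (kind, index) for unclosed if/while
    for i, (kind, cond) in enumerate(tokens):
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind in ("if", "elseif"):
            if next_kind != "then":
                errors.append(
                    f"Missing THEN after {kind} token at index {i} "
                    f"with condition {cond}"
                )
            if kind == "if":
                open_blocks.append((kind, i))
        elif kind == "while":
            if next_kind != "do":
                errors.append(
                    f"Missing DO after while token at index {i} "
                    f"with condition {cond}"
                )
            open_blocks.append((kind, i))
        elif kind in ("endif", "done"):
            # Simplification: pop without checking that the closer matches.
            if open_blocks:
                open_blocks.pop()
    for kind, i in open_blocks:
        closer = "endif" if kind == "if" else "done"
        errors.append(f"Missing {closer} token for {kind} token at index {i}")
    return errors
```

Running the check once after tokenizing, before any record is read, would let all three messages replace the failed assertions.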

Add a non-threaded mode

Change the code of Reader and Writer so that they can work without a queue:

Reader::get() would directly call Reader::getNextRecord()

Writer::write() would directly call Writer::WriteOut() -- this would need some refactoring, because WriteOut() pulls from the queue

Should have switches --thread and/or --nothread. Need to guess best default based on performance measurements.

Support EOF

end-of-file processing.

Everything after an EOF token is not processed for any line.
It is processed after the last line is processed.

This includes an eof() function
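
The control flow described here can be sketched in Python as follows. It is only an illustration of the idea, not the specs design: spec items are modeled as plain functions taking the line and a state dictionary, EOF is a sentinel object, and the names split_at_eof, run, count_record, echo and print_total are hypothetical. The state flag "eof" stands in for the eof() function.

```python
EOF = object()  # sentinel marking the EOF token in a specification


def split_at_eof(items):
    """Split spec items into the per-record part and the end-of-file part."""
    if EOF in items:
        cut = items.index(EOF)
        return items[:cut], items[cut + 1:]
    return items, []


def run(spec, lines):
    per_record, at_end = split_at_eof(spec)
    state = {"count": 0, "eof": False}
    output = []
    for line in lines:
        for item in per_record:      # items before EOF run for every line
            result = item(line, state)
            if result is not None:
                output.append(result)
    state["eof"] = True              # what an eof() function would report
    for item in at_end:              # items after EOF run once, at the end
        result = item("", state)
        if result is not None:
            output.append(result)
    return output


def count_record(line, state):
    state["count"] += 1


def echo(line, state):
    return line


def print_total(line, state):
    return f"total: {state['count']}"


result = run([count_record, echo, EOF, print_total], ["a", "b"])
```

In this example result contains the two echoed lines followed by a single total line.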

multi-stream support

using select (first/second/stream) and outstream (stream)

Since this is not CMS pipelines, streams will have to be defined as files or pipes in command-line switches.

TBD: input stream 0 is stdin. Second reading will be dubbed "second". output stream 0 is stdout. Will output stream 1 be stderr, or what we define in a file?

treat record number correctly

  1. Number should force record reading
  2. Number should be available as a function.

At least the first part is a bug. Assigning to v0.2

SegFault with complex specification

Crashed before reading any records:

if "words(6,7)=='structures created'" then
        /mh:/ 1
        print "#1" nw
        /count:/ nw
        print "#0" nw
        set "#1:=word(5)"
        set "#0:=0"
endif
if "word(6)=='CombId:'" then
        set "#0+=1"
endif

Running this with ALU debug, we seem to have premature evaluation:

specs -v -f xspc --debug-alu-comp --debug-alu-run
parseAluExpression: Parsing Expression: words(6,7)=='structures created'
Parsed Expression: ALU Vector at 0x7ffee378cb70 with 8 items:
       | FUNC(words)                             |
       | (                                       |
       | Number(6)                               |
       | COMMA                                   |
       | Number(7)                               |
       | )                                       |
       | BOP(==)                                 |
       | Literal(structures created)             |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378cb70 with 8 items:
       | FUNC(words)                             |
       | (                                       |
       | Number(6)                               |
       | COMMA                                   |
       | Number(7)                               |
       | )                                       |
       | BOP(==)                                 |
       | Literal(structures created)             |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e84a90 with 5 items:
       | Number(6)                               |
       | Number(7)                               |
       | FUNC(words)                             |
       | Literal(structures created)             |
       | BOP(==)                                 |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: #1
Parsed Expression: ALU Vector at 0x7ffee378c9d0 with 1 items:
       | Counter(1)                              |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378c9d0 with 1 items:
       | Counter(1)                              |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e84508 with 1 items:
       | Counter(1)                              |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: #0
Parsed Expression: ALU Vector at 0x7ffee378c9d0 with 1 items:
       | Counter(0)                              |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378c9d0 with 1 items:
       | Counter(0)                              |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e84618 with 1 items:
       | Counter(0)                              |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: #1:=word(5)
Parsed Expression: ALU Vector at 0x7ffee378cb70 with 6 items:
       | Counter(1)                              |
       | ASS(:=)                                 |
       | FUNC(word)                              |
       | (                                       |
       | Number(5)                               |
       | )                                       |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378cb70 with 4 items:
       | FUNC(word)                              |
       | (                                       |
       | Number(5)                               |
       | )                                       |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e855e8 with 2 items:
       | Number(5)                               |
       | FUNC(word)                              |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: #0:=0
Parsed Expression: ALU Vector at 0x7ffee378cb70 with 3 items:
       | Counter(0)                              |
       | ASS(:=)                                 |
       | Number(0)                               |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378cb70 with 1 items:
       | Number(0)                               |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e85928 with 1 items:
       | Number(0)                               |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: word(6)=='CombId:'
Parsed Expression: ALU Vector at 0x7ffee378cb70 with 6 items:
       | FUNC(word)                              |
       | (                                       |
       | Number(6)                               |
       | )                                       |
       | BOP(==)                                 |
       | Literal(CombId:)                        |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378cb70 with 6 items:
       | FUNC(word)                              |
       | (                                       |
       | Number(6)                               |
       | )                                       |
       | BOP(==)                                 |
       | Literal(CombId:)                        |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e85a20 with 4 items:
       | Number(6)                               |
       | FUNC(word)                              |
       | Literal(CombId:)                        |
       | BOP(==)                                 |
       +-----------------------------------------+

parseAluExpression: Parsing Expression: #0+=1
Parsed Expression: ALU Vector at 0x7ffee378cb70 with 3 items:
       | Counter(0)                              |
       | ASS(+=)                                 |
       | Number(1)                               |
       +-----------------------------------------+

Expression to Convert to RPN: ALU Vector at 0x7ffee378cb70 with 1 items:
       | Number(1)                               |
       +-----------------------------------------+

RPN Expression: ALU Vector at 0x1e858d8 with 1 items:
       | Number(1)                               |
       +-----------------------------------------+

After parsing, index = 17/17
itemGroup has 13 items:
1. IF(words(6,7)=='structures created')
2. THEN
3. {Source=/mh:/;Dest=1}
4. {Source=Expression:#1;Dest=NextWord}
5. {Source=/count:/;Dest=NextWord}
6. {Source=Expression:#0;Dest=NextWord}
7. #1:=word(5)
8. #0:=0
9. ENDIF
10. IF(word(6)=='CombId:')
11. THEN
12. #0+=1
13. ENDIF

============= evaluateExpression ==============
Expression Progress: ALU Vector at 0x1e84a90 with 5 items:
   ==> | Number(6)                               |
       | Number(7)                               |
       | FUNC(words)                             |
       | Literal(structures created)             |
       | BOP(==)                                 |
       +-----------------------------------------+

Execution Stack: ALU Stack at 0x7ffee378ca30 with 0 items:



Expression Progress: ALU Vector at 0x1e84a90 with 5 items:
       | Number(6)                               |
   ==> | Number(7)                               |
       | FUNC(words)                             |
       | Literal(structures created)             |
       | BOP(==)                                 |
       +-----------------------------------------+

Execution Stack: ALU Stack at 0x7ffee378ca30 with 1 items:
   > 6



Expression Progress: ALU Vector at 0x1e84a90 with 5 items:
       | Number(6)                               |
       | Number(7)                               |
   ==> | FUNC(words)                             |
       | Literal(structures created)             |
       | BOP(==)                                 |
       +-----------------------------------------+

Execution Stack: ALU Stack at 0x7ffee378ca30 with 2 items:
   > 7
   > 6



Segmentation fault (core dumped)

Stack:

(gdb) bt
#0  0x000000000042de54 in ProcessingState::getFromTo(int, int) (this=0x7fffffffdd40, from=0, to=0) at processing/ProcessingState.cc:183
#1  0x000000000044902a in AluFunc_range(ALUInt, ALUInt) (start=0, end=0) at utils/aluFunctions.cc:164
#2  0x000000000044930e in AluFunc_words(ALUValue*, ALUValue*) (pStart=0x672b70, pEnd=0x672a60) at utils/aluFunctions.cc:203
#3  0x000000000043e85a in AluFunction::compute(ALUValue*, ALUValue*) (this=0x6718a0, op1=0x672b70, op2=0x672a60) at utils/alu.cc:881
#4  0x0000000000442500 in evaluateExpression(std::vector<AluUnit*, std::allocator<AluUnit*> >&, ALUCounters*) (expr=..., pctrs=0x66a2e0 <g_counters>) at utils/alu.cc:1443
#5  0x0000000000419af5 in ConditionItem::apply(ProcessingState&, StringBuilder*) (this=0x671a80, pState=..., pSB=0x7fffffffdec0) at specitems/specItems.cc:421
#6  0x0000000000418131 in itemGroup::processDo(StringBuilder&, ProcessingState&, Reader*, Writer*) (this=0x7fffffffdc00, sb=..., pState=..., pRd=0x7fffffffdc20, pWr=0x0) at specitems/specItems.cc:183
#7  0x0000000000405123 in main(int, char**) (argc=0, argv=0x7fffffffe110) at test/specs.cc:144

Moving the words(6,7) outside makes this not crash
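
For reference, the stack evaluation that the trace shows can be modeled in a few lines of Python. This is a simplified sketch, not the code in utils/alu.cc: it handles only the unit kinds that appear in the first expression, and words() here splits on whitespace, which is an assumption about the function's semantics. Note that the second argument sits on top of the stack (7 above 6 in the trace), so it is popped first.

```python
def words(record, start, end):
    """Return words start..end (1-based, inclusive) joined by a space."""
    return " ".join(record.split()[start - 1:end])


def evaluate_rpn(rpn, record):
    """Evaluate a list of (kind, value) units in RPN order against a record."""
    stack = []
    for kind, value in rpn:
        if kind in ("Number", "Literal"):
            stack.append(value)
        elif kind == "FUNC" and value == "words":
            end = stack.pop()        # second argument is on top
            start = stack.pop()
            stack.append(words(record, start, end))
        elif kind == "BOP" and value == "==":
            right = stack.pop()
            left = stack.pop()
            stack.append(left == right)
        else:
            raise ValueError(f"unsupported unit {kind}({value})")
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]


rpn = [
    ("Number", 6),
    ("Number", 7),
    ("FUNC", "words"),
    ("Literal", "structures created"),
    ("BOP", "=="),
]
matched = evaluate_rpn(rpn, "a b c d e structures created")
```

The sketch needs a current record in order to evaluate words(6,7), whereas the crash above was reported before any record had been read.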

Configuration labels not read properly

cat ~/.specs
siodt: %d/%m %H:%M:%S.%6f

echo "13/12 22:16:05.736715" | specs w1-2 tf2i "%d/%m %H:%M:%S.%6f" a: id a BSWAP b: id b C2X 1
00057cecfc03090b

echo "13/12 22:16:05.736715" | specs w1-2 tf2i @siodt a: id a BSWAP b: id b C2X 1
00057ce1e068be80
