Comments (6)

xtradev commented on May 28, 2024

Here is the function:

( extract
= inFileName, outFileName, lineSeparator, startOffset, linePrefix, lineSuffix, hosts
.
!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)
& fil$(!inFileName,r)
& fil$(,STR,!lineSeparator)
& fil$(,SET,!startOffset)
&   whl
  ' ( fil$:(?line.?)
    & `(@(!line:!linePrefix ?host !lineSuffix)
       & !host \n !hosts:?hosts
       )
    )
& (fil$(,SET,-1)|)
& put$(str$!hosts,!outFileName,NEW)
);

from bracmat.

xtradev commented on May 28, 2024

extract'("mvps.txt","mvps.out",\r\n,1417,  "0.0.0.0 ",$(" " ?|));

$ also does not help (though it is not within a pattern...)


BartJongejan commented on May 28, 2024

The pattern for the line suffix is transferred to the extract function. So far so good. If we now look at the body of the extract function, we have a variable arg that is bound to an unevaluated expression:

"mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|)

You want to keep that expression unevaluated, at least until you have safely extracted the patterns for the line prefix and suffix. However, when you do !arg, the resulting expression is evaluated. Worst of all, the subexpression (" " ?|) is evaluated to (" " ?).

What you want to do is pattern matching in the unevaluated expression, so you can't use !arg. You need to use macro expansion, like so:

'($arg)

Use this as the subject of the match operation. '($arg) evaluates to, in this case,

="mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|)

Notice the = operator. The rhs of the = operator is not evaluated and that is what we want.

In the pattern we need to match the = operator as well, so we start with that:

=?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix

Therefore instead of

!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)

you should write

'($arg):(=?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)
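
The difference between the two ways of reading arg can be seen in isolation with a small sketch. The function name show and the argument 1+2 are hypothetical and chosen only for illustration; the sketch assumes, as described above, that !arg evaluates the stored expression while '($arg) wraps it unevaluated behind the = operator.

```bracmat
( show
=
  .   out$("evaluated:" !arg)        { !arg evaluates the stored expression }
    & out$("unevaluated:" '($arg))   { macro expansion keeps it behind = }
);
show'(1+2);
```

The call uses ' so that 1+2 reaches the function body unevaluated; with show$(1+2) the sum would already have been computed before the body runs, and both lines would show the same value.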

Another solution, the one I prefer, is to write

extract$("mvps.txt","mvps.out",\r\n,1417,(="0.0.0.0 "),(=" " ?|));

instead of

extract'("mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|));

and to have this at the start of the body of the extract function:

!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, (=?linePrefix), (=?lineSuffix))

The reason is that this clearly indicates what exactly shouldn't be evaluated and allows the rest to be evaluated. The advantage becomes especially clear in a case like the following, where some of the parameters are complex expressions. If you do

  mvps:?basename
& extract'(str$(!basename ".txt"),str$(!basename ".out"),\r\n,1417,"0.0.0.0 ",(" " ?|));

the expression str$(!basename ".txt") is assigned to inFileName and the expression str$(!basename ".out") is assigned to outFileName. If you then do !inFileName in the body of extract, the expression str$(!basename ".txt") is evaluated. Now, if basename happens to be a locally declared variable in the extract function, you probably get the wrong result. (In your case, where basename isn't a local variable in the extract function and !inFileName isn't evaluated many times, it works fine.)
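
A minimal sketch of that pitfall, assuming the behaviour described above. The function demo, its local variable basename and the value other are hypothetical names introduced only for this illustration.

```bracmat
( demo
=   basename                       { local variable that shadows the caller's }
  .   other:?basename
    & out$!arg                     { an unevaluated arg is evaluated here }
);
mvps:?basename;
demo'(str$(!basename ".txt"));     { arg arrives unevaluated }
demo$(str$(!basename ".txt"));     { arg is evaluated before the call }
```

If the explanation in this thread holds, the first call builds the file name from the local basename inside demo, whereas the second call passes a string that was already built from the caller's basename.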

On the other hand, if you do

  mvps:?basename
& extract$(str$(!basename ".txt"),str$(!basename ".out"),\r\n,1417,(="0.0.0.0 "),(=" " ?|));

the first and second parameters are evaluated to "mvps.txt" and "mvps.out", and these values are passed to the extract function.


xtradev commented on May 28, 2024

Slightly intricate, so it was not readily obvious to me. I overlooked the !arg part and the fact that = prevents evaluation but not unification (it does not prevent ?lineSuffix from unifying with the argument).

And it's doable :-) Thank you for the detailed explanation. To the point and very helpful.
Here is the whole working code:

( extract
= inFileName, outFileName, lineSeparator, startOffset, linePrefix, lineSuffix, hosts
.
!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, (=?lineSuffix))
& fil$(!inFileName,r)
& fil$(,STR,!lineSeparator)
& fil$(,SET,!startOffset)
&   whl
  ' ( fil$:(?line.?)
    & `(@(!line:!linePrefix ?host !lineSuffix)
       & !host \n !hosts:?hosts
       )
    )
& (fil$(,SET,-1)|)
& put$(str$!hosts,!outFileName,NEW)
);
extract$("mvps.txt","mvps.out",\r\n,1417,  "0.0.0.0 ",(=" " ?|));
extract$("sowc.txt","sowc.out",\r\n,3923,"127.0.0.1 ",(=(\t|" ") ?|));
)y


BartJongejan commented on May 28, 2024

Nice!

If speed matters and the input files contain many lines, you may want to "compile" the loop before evaluating it, so the variables linePrefix and lineSuffix are evaluated only once instead of each time a line has been read:

  &     
      ' ( whl
        ' ( fil$:(?line.?)
          & `( @(!line:()$linePrefix ?host ()$lineSuffix)
             & !host \n !hosts:?hosts
             )
          )
        )
    : (=?loop)
  & !loop

Here we have used macro expansion to hardwire the values of linePrefix and lineSuffix into the loop expression. The result of the macro expansion, without the = operator, is assigned to the variable loop. The iteration is performed by evaluating this variable once.
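
The hardwiring step can be inspected on its own with a smaller sketch. The variable name matcher is hypothetical, and the sketch assumes that lst$ lists the expression stored in a variable.

```bracmat
  "0.0.0.0 ":?linePrefix
& '(@(!line:()$linePrefix ?host)):(=?matcher)   { ()$linePrefix is replaced by its value }
& lst$matcher;                                  { list the stored expression for inspection }
```

In the listing, the literal prefix should appear where ()$linePrefix stood, while !line and ?host remain as they were, since only subexpressions of the form ()$name are substituted during macro expansion.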


xtradev commented on May 28, 2024

Good tip, thank you. I achieved a significant improvement with it.

The code was run over 6 hosts files, one of which is quite big:

came.txt   651212 bytes
hpho.txt 21945622 bytes 606290 lines (hpHosts file from hosts-file.net)
mvps.txt   510650 bytes
mwdl.txt    45522 bytes
sowc.txt   339907 bytes
yoyo.txt    64623 bytes

The timings are in milliseconds. The second call is made after the change:

q)\t \bracmat "get$\"extract.bra\""
28171
q)\t \bracmat "get$\"extract.bra\""
18421
