Here is the function:
( extract
=   inFileName, outFileName, lineSeparator, startOffset, linePrefix, lineSuffix, hosts
  .   !arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)
    & fil$(!inFileName,r)
    & fil$(,STR,!lineSeparator)
    & fil$(,SET,!startOffset)
    &   whl
      ' ( fil$:(?line.?)
        & `(@(!line:!linePrefix ?host !lineSuffix)
           & !host \n !hosts:?hosts
           )
        )
    & (fil$(,SET,-1)|)
    & put$(str$!hosts,!outFileName,NEW)
);
from bracmat.
extract'("mvps.txt","mvps.out",\r\n,1417, "0.0.0.0 ",$(" " ?|));
$ also does not help (though it is not within a pattern...)
from bracmat.
The pattern for the line suffix is passed to the extract function. So far so good. If we now look at the body of the extract function, we have a variable arg that is bound to an unevaluated expression:
"mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|)
You want to keep that expression unevaluated, at least until you have brought the patterns for the line prefix and suffix to safety. However, when you do !arg, the resulting expression is evaluated. Worst of all, the subexpression (" " ?|) is evaluated to (" " ?).
What you want to do is pattern matching on the unevaluated expression, so you can't use !arg. You need to use macro expansion, like so:
'($arg)
Use this as the subject of the match operation. In this case, '($arg) evaluates to
="mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|)
Notice the = operator. The rhs of the = operator is not evaluated, and that is what we want.
In the pattern we need to match the = operator as well, so we start with that:
=?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix
Therefore instead of
!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)
you should write
'($arg):(=?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, ?lineSuffix)
Another solution, the one I prefer, is to write
extract$("mvps.txt","mvps.out",\r\n,1417,(="0.0.0.0 "),(=" " ?|));
instead of
extract'("mvps.txt","mvps.out",\r\n,1417,"0.0.0.0 ",(" " ?|));
and to have this at the start of the body of the extract function:
!arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, (=?linePrefix), (=?lineSuffix))
The reason is that this clearly indicates what exactly shouldn't be evaluated and allows the rest to be evaluated. The advantage becomes especially clear in a case like the following, where some of the parameters are complex expressions. If you do
mvps:?basename
& extract'(str$(!basename ".txt"),str$(!basename ".out"),\r\n,1417,"0.0.0.0 ",(" " ?|));
the expression str$(!basename ".txt") is assigned to inFileName and the expression str$(!basename ".out") is assigned to outFileName. If you then do !inFileName in the body of extract, the expression str$(!basename ".txt") is evaluated. Now, if basename happens to be a locally declared variable in the extract function, you probably get the wrong result. (In your case, where basename isn't a local variable in the extract function and !inFileName isn't evaluated many times, it works fine.)
On the other hand, if you do
mvps:?basename
& extract$(str$(!basename ".txt"),str$(!basename ".out"),\r\n,1417,(="0.0.0.0 "),(=" " ?|));
the first and second parameters are evaluated to "mvps.txt" and "mvps.out", and these values are passed to the extract function.
from bracmat.
Slightly intricate, so it was not readily obvious to me. I overlooked the !arg part and the fact that = prevents evaluation but not unification (it does not prevent ?lineSuffix from unifying with the argument).
And it's doable :-) Thank you for the detailed explanation. To the point and very helpful.
Here is the whole working code:
( extract
=   inFileName, outFileName, lineSeparator, startOffset, linePrefix, lineSuffix, hosts
  .   !arg:(?inFileName, ?outFileName, ?lineSeparator, ?startOffset, ?linePrefix, (=?lineSuffix))
    & fil$(!inFileName,r)
    & fil$(,STR,!lineSeparator)
    & fil$(,SET,!startOffset)
    &   whl
      ' ( fil$:(?line.?)
        & `(@(!line:!linePrefix ?host !lineSuffix)
           & !host \n !hosts:?hosts
           )
        )
    & (fil$(,SET,-1)|)
    & put$(str$!hosts,!outFileName,NEW)
);
extract$("mvps.txt","mvps.out",\r\n,1417, "0.0.0.0 ",(=" " ?|));
extract$("sowc.txt","sowc.out",\r\n,3923,"127.0.0.1 ",(=(\t|" ") ?|));
)y
from bracmat.
Nice!
If speed matters and the input files contain many lines, you may want to "compile" the loop before evaluating it, so the variables linePrefix and lineSuffix are evaluated only once instead of each time a line has been read:
&   ' ( whl
      ' ( fil$:(?line.?)
        & `( @(!line:()$linePrefix ?host ()$lineSuffix)
           & !host \n !hosts:?hosts
           )
        )
      )
  : (=?loop)
& !loop
Here we have used macro expansion to hardwire the values of linePrefix and lineSuffix into the loop expression. The result of the macro expansion, without the = operator, is assigned to the variable loop. The iteration is performed by evaluating this variable once.
from bracmat.
Good tip, thank you. I achieved significant improvement with it.
The code was run over 6 hosts files, one of which is quite big:
came.txt 651212 bytes
hpho.txt 21945622 bytes 606290 lines (hpHosts file from hosts-file.net)
mvps.txt 510650 bytes
mwdl.txt 45522 bytes
sowc.txt 339907 bytes
yoyo.txt 64623 bytes
The timings are in milliseconds. The second call is made after the change:
q)\t \bracmat "get$\"extract.bra\""
28171
q)\t \bracmat "get$\"extract.bra\""
18421
from bracmat.