Comments (9)
I identified an O(N^2) algorithm in cl-tree-sitter that slows parsing of files with very long child lists, and submitted a pull request to death/cl-tree-sitter to fix it.
from sel.
A shorthand was added for `from-file` a little while back, and it can be called with the symbol:
(from-file 'c "~/path/to/file.c")
The playground at https://tree-sitter.github.io/tree-sitter/playground can be used to determine whether tree-sitter itself is generating the error. Calling `cl-tree-sitter:parse-string` can be used to the same effect. In this case, tree-sitter is producing the error, so this is expected.
This also appears to be generated with an older version of SEL since the root node should be an error-variation-point:
SEL/SW/TS> (from-file 'c "~/Downloads/sqlite3.c.txt")
#<C ~/Downloads/sqlite3.c.txt>
SEL/SW/TS> (genome *)
#<C-ERROR-VARIATION-POINT 8 :TEXT "/*******************...">
Which variation is used at the variation point can be switched with `sel/sw/ts:*use-variation-point-tree*`.
This file takes a long time to parse, so I'm going to open up an internal issue for profiling it when someone gets a chance.
Awesome.
- Thanks for the `from-file` shorthand, I wasn't aware.
- In general some of these options could use more easy discoverability. In particular I'm thinking of `*use-variation-point-tree*` and `with-attr-table` (which I ran into yesterday). I don't have any good suggestions here 😄, just thought I'd note.
- Also, appreciate looking into the parse time! I'm going to move on to a smaller program for my near-term work.
I have also identified what is likely causing the slow parsing of sqlite3.c: repeated calls to `concatenate` on octet vectors when handling error nodes. In sqlite3.c, almost the entire file ends up in an error node, so N is very large (the file has >200K lines). Nathaniel is looking at this; it should be possible to get it down to linear time.
I will be adding some scalability tests to sel soon. Two of these illustrate the problem.
Having said that, having almost all of the file in an error node is not very useful. This may be more of a tree-sitter issue.
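The cost pattern described above can be illustrated outside of Lisp. The following is a minimal Python sketch, not the cl-tree-sitter or SEL code; the function names are hypothetical and chosen only for illustration. It contrasts rebuilding a byte vector on every append (which copies everything accumulated so far, so total work grows quadratically) with appending into one growable buffer (amortized linear).

```python
def concat_rebuild(chunks):
    """Quadratic pattern: every step allocates a new vector and copies
    all bytes accumulated so far plus the new chunk."""
    out = b""
    for chunk in chunks:
        out = out + chunk  # copies len(out) + len(chunk) bytes
    return out


def concat_buffer(chunks):
    """Linear pattern: append into a single growable buffer, then
    convert to an immutable byte vector once at the end."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)  # amortized cost proportional to len(chunk)
    return bytes(buf)


def bytes_copied_by_rebuild(chunks):
    """Count the bytes written across all intermediate results of
    concat_rebuild, to make the quadratic growth visible."""
    total = 0
    size = 0
    for chunk in chunks:
        size += len(chunk)  # length of the intermediate result
        total += size       # that many bytes are written this step
    return total
```

With N one-byte chunks, `bytes_copied_by_rebuild` returns N(N+1)/2, while the buffered version writes each byte roughly once. Both functions return the same result; only the amount of copying differs.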
The problem in the file, aside from the `#if 0` around `extern "C" {` and `}`, is in the function `winRead`. In particular, there is the following code:
#if SQLITE_OS_WINCE || defined(SQLITE_WIN32_NO_OVERLAPPED)
  if( winSeekFile(pFile, offset) ){
    OSTRACE(("READ pid=%lu, pFile=%p, file=%p, rc=SQLITE_FULL\n",
             osGetCurrentProcessId(), pFile, pFile->h));
    return SQLITE_FULL;
  }
  while( !osReadFile(pFile->h, pBuf, amt, &nRead, 0) ){
#else
  memset(&overlapped, 0, sizeof(OVERLAPPED));
  overlapped.Offset = (LONG)(offset & 0xffffffff);
  overlapped.OffsetHigh = (LONG)((offset>>32) & 0x7fffffff);
  while( !osReadFile(pFile->h, pBuf, amt, &nRead, &overlapped) &&
         osGetLastError()!=ERROR_HANDLE_EOF ){
#endif
Notice the unmatched open braces. Tree-sitter will fail on this.
I think someone could write a program that expands the scope of #ifdef/#else blocks by pulling in the following code (duplicating it) until all the braces, parentheses, etc. are balanced.
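A first step toward such a program would be finding the conditional branches whose delimiters do not balance on their own. Below is a rough Python sketch of that detection step only; it does not perform the expansion, and it is not part of SEL or tree-sitter. The names (`net_depth`, `unbalanced_branches`) are hypothetical. It assumes no delimiters appear inside string literals or comments, and it attributes lines only to the innermost open conditional.

```python
import re

# Map each delimiter to its family and whether it opens (+1) or closes (-1).
DELIMS = {
    "{": ("brace", 1), "}": ("brace", -1),
    "(": ("paren", 1), ")": ("paren", -1),
    "[": ("bracket", 1), "]": ("bracket", -1),
}
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b")


def net_depth(lines):
    """Net opens minus closes per delimiter family over the given lines."""
    net = {"brace": 0, "paren": 0, "bracket": 0}
    for line in lines:
        for ch in line:
            if ch in DELIMS:
                family, delta = DELIMS[ch]
                net[family] += delta
    return net


def unbalanced_branches(source):
    """Return (start_line, directive, net) for every #if/#elif/#else
    branch whose own body leaves some delimiter family unbalanced."""
    results = []
    stack = []  # entries: [directive, start_line, body_lines]
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DIRECTIVE.match(line)
        if not match:
            if stack:
                stack[-1][2].append(line)  # belongs to innermost branch
            continue
        kind = match.group(1)
        if kind in ("if", "ifdef", "ifndef"):
            stack.append([kind, lineno, []])
        elif stack:
            # #elif, #else and #endif all close the current branch.
            directive, start, body = stack.pop()
            net = net_depth(body)
            if any(net.values()):
                results.append((start, directive, net))
            if kind in ("elif", "else"):
                stack.append([kind, lineno, []])
    return results
```

Run on the `winRead` excerpt above, a check like this should flag both the `#if` and the `#else` branch, since each opens a `while` body that is closed only after the `#endif`. An expansion pass could then pull following lines into each flagged branch until every count returns to zero.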
> 2. In general some of these options could use more easy discoverability. In particular I'm thinking of `*use-variation-point-tree*` and `with-attr-table` (which I ran into yesterday). I don't have any good suggestions here 😄, just thought I'd note.
I've opened an issue to add documentation on variation points, since there isn't anything in the manual at the moment. As a side note on discoverability, https://github.com/GrammaTech/sel/blob/master/components/configuration.lisp was recently added to help configure SEL parameters.
Scalability tests have been added to sel. They are not "actual" tests, but functions you can invoke inside `time` or `sb-sprof:with-profiling` forms to see where the time goes when parsing large, variable-sized C programs. See test/scalability.lisp.
The `configuration.lisp` support seems handy, very nice.
W.r.t. the errors, I think bespoke handling for very common problem structures like `#if 0` makes a ton of sense, and it sounds like that might be sufficient to get most (if not all) of sqlite3.c to parse.