Comments (7)
So the reason scc and the other tools do this is that Python uses triple quotes for strings.
>>> '''this is string'''.split()
['this', 'is', 'string']
I don't think it is possible to tell a docstring from an ordinary string in the file without parsing the code into an AST, which is going to be incredibly slow compared to how all of the tools currently work.
I am not sure how Python treats them in the interpreter, but I believe it actually processes them on every run, which would mean that according to Python they are lines of code as well.
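For comparison, here is a rough sketch of what the AST route could look like using Python's standard ast module. The function name docstring_line_count is made up for illustration, and it relies on end_lineno, which assumes Python 3.8 or newer. It is not how scc or any other counter works, only a way to see what "parse it properly" would involve.

```python
import ast

# Node types whose first statement can be a docstring.
_DOCSTRING_OWNERS = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def docstring_line_count(source: str) -> int:
    """Count the physical lines occupied by docstrings in `source`.

    Hypothetical helper: raises SyntaxError on malformed code, which is
    one reason a line counter may not want to depend on a real parser.
    """
    tree = ast.parse(source)
    total = 0
    for node in ast.walk(tree):
        if not isinstance(node, _DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string expression in first position.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            total += first.end_lineno - first.lineno + 1
    return total
```

A triple-quoted string that is assigned to a name is an ast.Assign rather than a bare ast.Expr, so it is not counted here.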
I believe this was raised often enough with tokei that it produced the treat_doc_strings_as_comments = true setting in its TOML config to stem the issue requests. Not sure if I agree with this approach myself.
If you can think of a way to reliably identify when it is a docstring vs an actual string, perhaps with some test cases, I would be happy to implement this though. Here is one thought I have to help with this.
If the triple-quoted string starts after a newline with only whitespace characters in front of it, and ends followed only by whitespace characters or a newline, it is a comment.
Not sure if that is exclusive enough to catch all cases though.
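A minimal sketch of the heuristic above, written in Python only so it can be tried against test cases. The name classify_triple_quoted is hypothetical and this is not scc's implementation. It assumes at most one triple-quoted string starts per line, and it does not try to skip # comments or single-quoted strings that happen to contain triple quotes.

```python
TRIPLE_QUOTES = ('"""', "'''")


def classify_triple_quoted(source: str):
    """Return (start_line, end_line, is_comment) for each triple-quoted string.

    Line numbers are 1-based. is_comment is True when only whitespace
    precedes the opening quotes and only whitespace follows the closing ones.
    """
    lines = source.splitlines()
    results = []
    i = 0
    while i < len(lines):
        line = lines[i]
        openers = [(line.find(q), q) for q in TRIPLE_QUOTES if q in line]
        if not openers:
            i += 1
            continue
        start_col, quote = min(openers)  # earliest opener on this line
        starts_clean = line[:start_col].strip() == ""

        # Look for the matching closing delimiter, starting after the opener.
        j, search_from, end_col = i, start_col + 3, -1
        while j < len(lines):
            end_col = lines[j].find(quote, search_from)
            if end_col != -1:
                break
            j += 1
            search_from = 0
        if j == len(lines):
            break  # unterminated string: stop rather than guess

        ends_clean = lines[j][end_col + 3:].strip() == ""
        results.append((i + 1, j + 1, starts_clean and ends_clean))
        i = j + 1
    return results


SAMPLE = '''def f():
    """
    Doc.
    """
    s = """
    data
    """
    return s
'''

print(classify_triple_quoted(SAMPLE))  # [(2, 4, True), (5, 7, False)]
```

The assigned string is rejected because `s = ` precedes its opening quotes, while the docstring passes both the start and the end check.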
from scc.
The method you describe is indeed the best heuristic I can think of. If you want to be even more strict, you can also add a condition that the previous line does not end with \, which indicates a line continuation. That would prevent this snippet from being considered a docstring.
message = \
"""
hello
world
"""
Here is how I would rank the various strategies, from the worst to the best:
1. Interpret all the triple quoted strings as code (current behavior)
2. Interpret all the triple quoted strings as comments
3. Your proposed heuristic
4. Your proposed heuristic with the additional handling of line continuation
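The extra line-continuation check in the last strategy could look roughly like this. It is a sketch under the assumption that the file has already been split into lines and that the index of the line holding the opening quotes is known. The helper name opens_docstring is invented for the example.

```python
def opens_docstring(lines, index):
    """Return True if the triple quote on lines[index] plausibly opens a docstring.

    Two conditions are checked: only whitespace precedes the quotes, and
    the previous line does not end with a backslash (line continuation).
    """
    if not lines[index].lstrip().startswith(('"""', "'''")):
        return False  # something other than whitespace comes before the quotes
    if index > 0 and lines[index - 1].endswith("\\"):
        return False  # the string continues a statement from the previous line
    return True


snippet = ["message = \\", '"""', "hello", "world", '"""']
print(opens_docstring(snippet, 1))  # False: previous line ends with a backslash

function = ["def f():", '    """Doc."""', "    return 1"]
print(opens_docstring(function, 1))  # True
```

The closing-side check (only whitespace after the closing quotes) is left out here and would still be needed alongside it.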
Of course it's possible to use a real parser, but it needs to work with various versions of Python, and be resilient to malformed code. It would probably need one of these heuristics as a fallback, anyway. And even then, I doubt a real parser would be a significant improvement over strategy number 4.
For my own usage (on my own Python code base), a tool that uses strategy 1 is perfectly useless, while strategy 2 is perfectly fine.
from scc.
I am not a huge fan of option 2 personally. I would rather implement this properly. I just need to figure out how to change the JSON language structure to support this, and then I will start looking at implementing it.
from scc.
Attempting to implement on this branch https://github.com/boyter/scc/tree/issue62
from scc.
Since this is related to #71 I am implementing a more generic solution there and discarding the work on the branch. I will remove it eventually once I have things working.
from scc.
The JSON changes needed to support this are sitting in #76.
Once all 3 pending PRs are merged in I will resume work on this issue.
from scc.
Merged in, should be all good now.
from scc.