jeanralphaviles / comment_parser
Python module to extract comments from source code files of various types.
License: MIT License
Hi @jeanralphaviles
Are there any plans to support Ruby?
In comment_parser.py, the MIME_MAP dictionary only recognises "text/x-python" for Python. On Mac, the MIME type is reported as "text/x-script.python" (at least by the magic module). This ought to be handled too.
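One way this could be handled is to normalise MIME type aliases before the lookup. The sketch below is only an illustration: `MIME_ALIASES` and `normalize_mime` are hypothetical names, not part of comment_parser, and the alias table holds just the one pair reported above.

```python
# Hypothetical alias table (not part of comment_parser): maps a MIME type
# reported by some libmagic builds onto the name the parser already knows.
MIME_ALIASES = {
    "text/x-script.python": "text/x-python",
}


def normalize_mime(mime: str) -> str:
    """Return the canonical MIME type, or the cleaned input if no alias is known."""
    cleaned = mime.strip().lower()
    return MIME_ALIASES.get(cleaned, cleaned)
```

With this in place, the existing dictionary lookup could be done on `normalize_mime(mime)` instead of on the raw value.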
Hey,
Is C# support planned? I'm looking into processing some C# and JavaScript files in order to create some documentation.
If you don't mind, I would like to help.
Thanks in advance!
I only use extract_comments_from_str, and I always know the MIME type and supply it. However, it still complains: "failed to find libmagic".
Could you import and call libmagic only when it is actually needed?
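A deferred import is one way to get this behaviour. The sketch below is an assumption about how it could look, not the library's code: `resolve_mime` is a hypothetical helper, and it relies on python-magic's `magic.from_buffer(..., mime=True)` for the fallback path.

```python
from typing import Optional


def resolve_mime(code: str, mime: Optional[str] = None) -> str:
    """Return the caller's MIME type, loading python-magic only as a fallback."""
    if mime is not None:
        # The caller supplied the type, so libmagic is never touched.
        return mime
    # Deferred import: a missing libmagic only matters on this path.
    import magic
    return magic.from_buffer(code, mime=True)
```

Callers who always pass `mime` would then never trigger the libmagic lookup, even on machines where the shared library is missing.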
Assembly source code usually starts a comment with ";".
I would also suggest adding a simple custom comment rule: a line comment that starts at a character specified by the caller.
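A caller-specified line-comment rule could be sketched as follows. `extract_line_comments` is a hypothetical function, not an existing comment_parser API, and it is deliberately naive: it does not track string literals, so a marker inside a quoted string would be reported as a comment.

```python
from typing import List, Tuple


def extract_line_comments(code: str, marker: str = ";") -> List[Tuple[int, str]]:
    """Return (line_number, text) for every comment introduced by `marker`."""
    comments = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        index = line.find(marker)
        if index != -1:
            # Everything after the marker, up to the end of the line.
            comments.append((lineno, line[index + len(marker):].strip()))
    return comments
```

For example, `extract_line_comments("mov eax, 1 ; load one\nret\n; done")` returns `[(1, "load one"), (3, "done")]`.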
Hi! When I tried this tool to extract C++ comments from the attached file, it only extracted 9 comments. Obviously, there are more than 9 comments....
Expression.txt
If there is a quoted path that contains a * the parser fails with UnterminatedCommentError().
I found this when parsing a CDK file containing a policy statement like:
new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
resources: [`arn:aws:logs:${environment.region}:${environment.account}:log-group/*`]
})
It would be nice if quotes were tracked, so the parser would know that the /* is inside a string rather than the start of a comment.
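Quote tracking of this kind can be sketched as a small character scanner. This is an illustration under stated assumptions, not the library's parser: `extract_c_style_comments` is a hypothetical name, string literals are assumed to be delimited by `"`, `'` or a backtick with backslash escapes, and nested template expressions or regex literals are not handled.

```python
from typing import List


def extract_c_style_comments(code: str) -> List[str]:
    """Collect // and /* */ comment bodies, skipping anything inside string literals."""
    comments = []
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        if ch in "\"'`":
            # Skip the whole literal, honouring backslash escapes.
            i += 1
            while i < n and code[i] != ch:
                i += 2 if code[i] == "\\" else 1
            i += 1  # step past the closing quote
        elif code.startswith("//", i):
            end = code.find("\n", i)
            end = n if end == -1 else end
            comments.append(code[i + 2:end].strip())
            i = end
        elif code.startswith("/*", i):
            end = code.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            comments.append(code[i + 2:end].strip())
            i = end + 2
        else:
            i += 1
    return comments
```

Because the scanner consumes a literal in one step, a `/*` inside a quoted path never reaches the comment branch.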
When parsing
/* abc */ /* abc */
the output is
abc */ /* abc
(?P /*(?P<multi_content>(.|\n)+?)?*/) works in this case. See https://stackoverflow.com/questions/14213848/difference-between-and
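The difference between the greedy and non-greedy forms can be reproduced with a reduced pattern. The snippet below is a minimal sketch using a simplified regex, not the parser's actual expression; `BLOCK_COMMENT` and `block_comments` are names chosen for illustration.

```python
import re

# re.DOTALL lets "." span newlines; the lazy ".*?" stops at the first "*/".
BLOCK_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def block_comments(code: str):
    """Return the stripped body of every /* ... */ comment in `code`."""
    return [body.strip() for body in BLOCK_COMMENT.findall(code)]
```

`block_comments("/* abc */ /* abc */")` returns `["abc", "abc"]`, whereas the same pattern with a greedy `.*` captures `abc */ /* abc` as a single comment. Like any pure regex approach, this still ignores string literals.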
Hello,
Do you have any plans to support extracting comments from PHP files?
Many thanks
Rob
Maybe it comes from python-magic, but text/x-javascript is deprecated. The current media type is application/javascript.
Hi, I ran into the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 99, in extract_comments_from_str
return parser.extract_comments(code)
File "/usr/local/lib/python3.6/dist-packages/comment_parser/parsers/c_parser.py", line 66, in extract_comments
raise common.UnterminatedCommentError()
comment_parser.parsers.common.UnterminatedCommentError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 74, in extract_comments
return extract_comments_from_str(code.read(), mime)
File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 101, in extract_comments_from_str
raise ParseError(str(e))
comment_parser.comment_parser.ParseError
After some analysis and testing, I found that it is caused by the "/*" in this statement:
assertEquals("{\"Version\":\"2008-10-17\",\"Id\":\"Policy4324355464\",\"Statement\":[{\"Sid\":\"Stmt456464646477\",\"Action\":[\"s3:GetObject\"],\"Effect\":\"Allow\",\"Resource\":"
+ "[\"arn:aws:s3:::mybucket/some/path/*\"],\"Principal\":{\"AWS\":[\"*\"]}}]}", endpoint.getConfiguration().getPolicy());
ff = '''
private String extractFileName(Header header) {
if (header != null) {
String value = header.getValue();
int start = value.indexOf(FILENAME_HEADER_PREFIX);
if (start != -1) {
value = value.substring(start + FILENAME_HEADER_PREFIX.length());
int end = value.indexOf('\"');
if (end != -1) {
return value.substring(0, end);
}
}
}
return null;
}'''
xx = comment_parser.extract_comments_from_str(ff,mime='text/x-java-source')
The parser does not seem able to process this method. Specifically, the problematic instruction is the bold one.
Do you have any workarounds?
When running your tool on one of my script files, I got the following error:
AttributeError: 'str' object has no attribute 'decode'
If I remove decode, I then get the following error:
comment_parser.comment_parser.UnsupportedError: Unsupported MIME type
I am using Python 3.5.2.
The README states that this project might see more features and language support soon, and there are a bunch of PRs which would add a lot of value to the library. Still, they haven't been merged or commented on in quite a while.
So what is the status of the project? Can one expect new language support at any point in the future, or should this project be considered feature-frozen?
What is the best way to reactivate, add features, add missing support and ultimately get things into a release?
Is there any way to extract comments from a string? I tried this:
comment_parser.extract_comments(a_list, mime='text/x-java-source')
However, I got:
~/anaconda3/envs/lib/python3.6/site-packages/comment_parser/comment_parser.py in extract_comments(filename, mime)
78 except common.Error as exception:
79 raise ParseError(str(exception))
---> 80 return parser.extract_comments(filename)
81
82
~/anaconda3/envs/lib/python3.6/site-packages/comment_parser/parsers/c_parser.py in extract_comments(filename)
73 return comments
74 except OSError as exception:
---> 75 raise common.FileError(str(exception))
76
FileError: [Errno 36] File name too long:
I guess it would be very handy to be able to extract the comments from a string. In my case, `a_list` is a list which contains the code in a string format.
I'm not sure if triple-quoted strings officially qualify as "comments", but they are often used in the same fashion:
https://github.com/autokey/autokey/blob/develop/lib/autokey/scripting/clipboard_gtk.py
In the files in this repo, all of the real documentation lives in triple-quoted strings (docstrings), not in Python comments. It would be nice if there were some way to configure what the script considers to be a "comment".
Thanks!
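Until something like that exists, docstrings can be collected separately with the standard library's `ast` module. This is a minimal sketch that is independent of comment_parser; `extract_docstrings` is a name made up for the example.

```python
import ast
from typing import List


def extract_docstrings(source: str) -> List[str]:
    """Return the docstrings of the module and of every class and function in it."""
    tree = ast.parse(source)
    documented = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    docstrings = []
    for node in ast.walk(tree):
        if isinstance(node, documented):
            doc = ast.get_docstring(node)  # None when the node has no docstring
            if doc is not None:
                docstrings.append(doc)
    return docstrings
```

This only finds triple-quoted strings that sit in docstring position; a triple-quoted string used elsewhere as a throwaway expression is not reported.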
Hey, sorry to post a non-issue like this, I just wanted to ask: Do you have a libmagic database for recognizing golang code?
Basically I noticed that you support a text/x-go MIME type here (and work at Google haha), and so I thought it might be worth asking. I've looked around online but not managed to find any, and unfortunately I don't think I know Go well enough to write my own.
Anyway, no worries if not, sorry again for the noise
Hi, this tool works well in many cases. But I found two problems.
If a file contains characters in other encodings, e.g., Chinese characters and ½, an exception occurs in the extract_comments method.
I added errors='ignore' to the following statement on my local machine; it then ignores these special characters and continues extracting the remaining characters of the comment.
def extract_comments(filename, mime=None):
with open(filename, 'r', errors='ignore') as code:
So I think this option could be exposed to users, letting them decide whether to ignore such characters.
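Exposing the option could be as small as threading an `errors` argument through to `open()`. The helper below is a hypothetical sketch, not the library's function, and it assumes UTF-8 as the file encoding, which the original statement does not specify.

```python
def read_source(filename: str, errors: str = "strict") -> str:
    """Read a source file as text; `errors` is passed straight to open()."""
    # Assumption: files are UTF-8. "strict" raises on undecodable bytes,
    # "ignore" drops them, "replace" substitutes U+FFFD.
    with open(filename, "r", encoding="utf-8", errors=errors) as code:
        return code.read()
```

Keeping "strict" as the default preserves the current behaviour, so only callers who opt in would lose undecodable bytes.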
The tool throws an exception when parsing this Java file. I found the cause may be the complex string on line 99.
Thanks for your tool, it helps me a lot. Hope better~