jeanralphaviles / comment_parser



Python module to extract comments from source code files of various types.

License: MIT License

Python 100.00%

comment-parser extract-comments python


comment_parser's Issues

Ruby support

Hi @jeanralphaviles,
Are there any plans to support Ruby?

Python MIME type not recognised on Mac

In comment_parser.py, the MIME_MAP dictionary only recognises "text/x-python" for Python. On Mac, the MIME type is reported as "text/x-script.python" (at least by the magic module). This ought to be handled too.
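One way to handle this is to normalise known aliases before the lookup. The sketch below assumes the lookup is a plain dict keyed by MIME type; MIME_ALIASES, normalize_mime and lookup_parser are hypothetical names, and the map values are placeholder strings rather than the library's real parser modules.

```python
# Hypothetical alias table: MIME types reported on some platforms (for
# example by python-magic on macOS) mapped to the canonical type.
MIME_ALIASES = {
    "text/x-script.python": "text/x-python",
}

# Placeholder stand-in for the library's MIME_MAP; values are not real parsers.
MIME_MAP = {
    "text/x-python": "python_parser",
}


def normalize_mime(mime: str) -> str:
    """Return the canonical MIME type, or the input unchanged if unaliased."""
    return MIME_ALIASES.get(mime, mime)


def lookup_parser(mime: str) -> str:
    """Resolve aliases first, then consult the map."""
    canonical = normalize_mime(mime)
    if canonical not in MIME_MAP:
        raise ValueError("Unsupported MIME type: " + mime)
    return MIME_MAP[canonical]


print(lookup_parser("text/x-script.python"))  # python_parser
```

Keeping aliases in a separate table means MIME_MAP stays one entry per language, and new platform-specific spellings only touch the alias table.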

C# Support

Hey,

Do you have C# support in mind? I'm looking into processing some C# and JavaScript files in order to generate documentation.

If you don't mind, I would like to help.

Thanks in advance!

Make libmagic an optional dependency

I only use extract_comments_from_str, and I always know and supply the MIME type myself. However, it still complains about "failed to find libmagic".

Could you make it import and call libmagic only when it is actually needed?
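A minimal sketch of the requested behaviour, assuming detection happens through python-magic's `magic.from_buffer(..., mime=True)`: the import moves inside the detection helper, so it only runs when no MIME type is passed. This is an illustration of how the function *could* be restructured, not the library's actual code; `_guess_mime`, `_hash_comment_parser` and the `PARSERS` registry are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def _guess_mime(code: str) -> str:
    # Deferred import: python-magic (and therefore libmagic) is loaded only
    # when a caller actually asks for MIME detection.
    import magic
    return magic.from_buffer(code, mime=True)


def _hash_comment_parser(code: str) -> List[str]:
    # Toy stand-in parser: returns whole-line '#' comments only.
    return [line.strip()[1:].strip()
            for line in code.splitlines()
            if line.strip().startswith("#")]


# Hypothetical registry standing in for the library's MIME-to-parser map.
PARSERS: Dict[str, Callable[[str], List[str]]] = {
    "text/x-shellscript": _hash_comment_parser,
}


def extract_comments_from_str(code: str, mime: Optional[str] = None) -> List[str]:
    """Extract comments, touching libmagic only if mime was not supplied."""
    if mime is None:
        mime = _guess_mime(code)
    try:
        parser = PARSERS[mime]
    except KeyError:
        raise ValueError("Unsupported MIME type: %s" % mime)
    return parser(code)


# Explicit MIME type: works even where libmagic is not installed.
print(extract_comments_from_str("# setup\necho hi\n", mime="text/x-shellscript"))
# ['setup']
```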

SyntaxError due to missing comma in comment_parser.py

The error is at this line: https://github.com/jeanralphaviles/comment_parser/blob/master/comment_parser/comment_parser.py#L34

Add support for assembly source code and custom comment rules

Assembly source code usually starts comments with ";".
It would also help to add a simple custom comment rule (comments starting with a specified character, which the caller could set).
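A minimal sketch of such a caller-configurable rule: `marker` plays the role of the caller-set character, defaulting to ";" for assembly. The function name is hypothetical. It skips markers inside single- or double-quoted literals, but it assumes quotes are never escaped, which holds for many assemblers but not all.

```python
from typing import List, Tuple


def extract_line_comments(code: str, marker: str = ";") -> List[Tuple[int, str]]:
    """Return (line_number, text) for each comment started by `marker`.

    Markers inside '...' or "..." literals are ignored, so `db ";"` is not
    mistaken for a comment. Escaped quotes are not handled.
    """
    comments = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        quote = None  # the quote character currently open, if any
        for i, ch in enumerate(line):
            if quote:
                if ch == quote:
                    quote = None  # literal closed
            elif ch in ("'", '"'):
                quote = ch  # literal opened
            elif line.startswith(marker, i):
                comments.append((lineno, line[i + len(marker):].strip()))
                break  # the rest of the line belongs to the comment
    return comments


asm = 'mov eax, 1 ; set eax\nmsg db ";not a comment", 0\n; full-line comment\n'
print(extract_line_comments(asm))
# [(1, 'set eax'), (3, 'full-line comment')]
```

Because `marker` is matched with `str.startswith`, a multi-character marker such as "//" or "--" works the same way.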

C++ extraction fails

Hi! When I used this tool to extract C++ comments from the attached file, it extracted only 9 comments, but there are clearly more than 9.
Expression.txt

JavaScript parsing fails with a quoted path

If a quoted path contains /*, the parser fails with UnterminatedCommentError().
I found this when parsing a CDK file containing a policy statement like this:

new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:PutLogEvents"
  ],
  resources: [`arn:aws:logs:${environment.region}:${environment.account}:log-group/*`]
})

It would be nice if quotes were tracked, so the parser would know when /* appears inside a string rather than starting a comment.
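A minimal sketch of quote tracking for C-style syntax, written as a character scanner instead of a single regex. It is not the library's parser; `extract_c_comments` and this `UnterminatedCommentError` class are hypothetical. It skips '...', "..." and `...` literals (honouring backslash escapes), so a /* inside a path string is never treated as a comment opener. It does not understand JavaScript regex literals or nested `${...}` expressions containing backticks.

```python
from typing import List


class UnterminatedCommentError(Exception):
    """Raised when a /* comment has no closing */ (hypothetical name)."""


def extract_c_comments(code: str) -> List[str]:
    """Return // and /* */ comment bodies, ignoring markers inside literals."""
    comments: List[str] = []
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        if ch in "'\"`":
            # Skip the whole literal; a backslash also skips the next char.
            i += 1
            while i < n and code[i] != ch:
                i += 2 if code[i] == "\\" else 1
            i += 1  # step past the closing quote
        elif code.startswith("//", i):
            end = code.find("\n", i)
            end = n if end == -1 else end
            comments.append(code[i + 2:end].strip())
            i = end
        elif code.startswith("/*", i):
            end = code.find("*/", i + 2)
            if end == -1:
                raise UnterminatedCommentError()
            comments.append(code[i + 2:end].strip())
            i = end + 2
        else:
            i += 1
    return comments


js = "resources: [`arn:aws:logs:${region}:log-group/*`] // allow logging\n/* policy */"
print(extract_c_comments(js))  # ['allow logging', 'policy']
```

Skipping literals before looking for comment markers is what makes the order of checks matter: a quote character is consumed as a whole literal, so nothing inside it can reach the `/*` branch.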

Multiline regex should use non-greedy match

comment_parser/comment_parser/parsers/c_parser.py

Line 42 in f57fa77

(?P<multi> /\*(?P<multi_content>(.|\n)*)?\*/) |

When parsing
/* abc */ /* abc */

the output is
abc */ /* abc

(?P<multi> /\*(?P<multi_content>(.|\n)+?)?\*/) works in this case. See https://stackoverflow.com/questions/14213848/difference-between-and for the difference between greedy and non-greedy quantifiers.
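The difference can be reproduced with Python's `re` module. This sketch drops the `(?P<multi> ...)` wrapper and verbose-mode spacing from the library's pattern and keeps only the part that matters: the greedy `(.|\n)*` versus the non-greedy `(.|\n)+?` proposed in the issue.

```python
import re

text = "/* abc */ /* abc */"

# Greedy: (.|\n)* consumes as much as possible, so a single match runs
# from the first /* to the last */.
greedy = re.compile(r"/\*(?P<multi_content>(.|\n)*)?\*/")

# Non-greedy: (.|\n)+? stops at the first */ it can reach. The trailing ?
# on the group still lets an empty comment like /**/ match.
lazy = re.compile(r"/\*(?P<multi_content>(.|\n)+?)?\*/")

print(repr(greedy.search(text).group("multi_content")))  # ' abc */ /* abc '
print([m.group("multi_content") for m in lazy.finditer(text)])  # [' abc ', ' abc ']
```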

PHP Support?

Hello,

Do you have any plans to support extracting comments from PHP files?

Many thanks
Rob

JavaScript media type

This may come from python-magic, but text/x-javascript is deprecated; the current media type is application/javascript.

Cannot handle strings containing "/*"

Hi, I ran into the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 99, in extract_comments_from_str
    return parser.extract_comments(code)
  File "/usr/local/lib/python3.6/dist-packages/comment_parser/parsers/c_parser.py", line 66, in extract_comments
    raise common.UnterminatedCommentError()
comment_parser.parsers.common.UnterminatedCommentError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 74, in extract_comments
    return extract_comments_from_str(code.read(), mime)
  File "/usr/local/lib/python3.6/dist-packages/comment_parser/comment_parser.py", line 101, in extract_comments_from_str
    raise ParseError(str(e))
comment_parser.comment_parser.ParseError

After some analysis and testing, I found that it is caused by the "/*" in this statement:

        assertEquals("{\"Version\":\"2008-10-17\",\"Id\":\"Policy4324355464\",\"Statement\":[{\"Sid\":\"Stmt456464646477\",\"Action\":[\"s3:GetObject\"],\"Effect\":\"Allow\",\"Resource\":"
                + "[\"arn:aws:s3:::mybucket/some/path/*\"],\"Principal\":{\"AWS\":[\"*\"]}}]}", endpoint.getConfiguration().getPolicy());

Infinite Loop

from comment_parser import comment_parser

ff = '''
private String extractFileName(Header header) {
    if (header != null) {
        String value = header.getValue();
        int start = value.indexOf(FILENAME_HEADER_PREFIX);
        if (start != -1) {
            value = value.substring(start + FILENAME_HEADER_PREFIX.length());
            int end = value.indexOf('\"');
            if (end != -1) {
                return value.substring(0, end);
            }
        }
    }
    return null;
}'''

xx = comment_parser.extract_comments_from_str(ff, mime='text/x-java-source')

The script does not seem able to process this method; specifically, the problematic statement is the bold one.
Do you have any workarounds?

Mime Error

When running your tool on one of my script files, I got the following error:
AttributeError: 'str' object has no attribute 'decode'

If I remove the decode call, I get a different error instead:
comment_parser.comment_parser.UnsupportedError: Unsupported MIME type

I am using Python 3.5.2.
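Assuming the AttributeError comes from calling `.decode()` on a MIME type that python-magic already returned as `str` (some python-magic versions return `bytes`, others `str`), a version-tolerant conversion avoids it. `as_text` is a hypothetical helper. The second error is separate: once decoding works, the type detected for the script may simply not be in the supported map, and passing `mime=` explicitly to `extract_comments` sidesteps detection altogether.

```python
from typing import Union


def as_text(value: Union[str, bytes]) -> str:
    """Return value as str, decoding only if it is still bytes.

    An unconditional value.decode() fails with AttributeError when the
    MIME detector already returned a str.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


print(as_text(b"text/x-python"))  # text/x-python
print(as_text("text/x-python"))   # text/x-python
```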

Status of the project

The README states that this project might gain more features and language support soon, and there are several PRs that would add a lot of value to the library, yet they haven't been merged or commented on in quite a while.

So what is the status of the project? Can we expect new language support at some point, or should this project be considered feature-frozen?

What is the best way to reactivate, add features, add missing support and ultimately get things into a release?

Extract comments from string?

Is there any way to extract comments from a string? I tried this:

comment_parser.extract_comments(a_list, mime='text/x-java-source')

However, I got:

~/anaconda3/envs/lib/python3.6/site-packages/comment_parser/comment_parser.py in extract_comments(filename, mime)
     78     except common.Error as exception:
     79         raise ParseError(str(exception))
---> 80     return parser.extract_comments(filename)
     81 
     82 

~/anaconda3/envs/lib/python3.6/site-packages/comment_parser/parsers/c_parser.py in extract_comments(filename)
     73         return comments
     74     except OSError as exception:
---> 75         raise common.FileError(str(exception))
     76 

FileError: [Errno 36] File name too long:

It would be very handy to be able to extract comments from a string. In my case, `a_list` is a list containing the code as strings.

Python docstring support? (triple quotes)

I'm not sure whether triple-quoted strings officially qualify as "comments", but they are often used the same way:

https://github.com/autokey/autokey/blob/develop/lib/autokey/scripting/clipboard_gtk.py

In this repository's files, all of the real documentation lives in triple-quoted strings rather than in Python comments. It would be nice if there were some way to configure what the script considers a "comment".

Thanks!
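Until something like this is configurable, Python's own standard library can already pull out both kinds. The sketch below uses `tokenize` for `#` comments (the tokenizer already ignores `#` inside strings) and `ast.get_docstring` for module, class and function docstrings. It is an alternative technique, not comment_parser's API, and `extract_comments_and_docstrings` is a hypothetical name. Note that it only reports true docstrings, not every triple-quoted string.

```python
import ast
import io
import tokenize
from typing import List, Tuple


def extract_comments_and_docstrings(source: str) -> List[Tuple[str, str]]:
    """Return ("comment" | "docstring", text) pairs from Python source."""
    results: List[Tuple[str, str]] = []

    # '#' comments: the tokenizer emits them as COMMENT tokens.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            results.append(("comment", tok.string.lstrip("#").strip()))

    # Docstrings: the leading string literal of a module, class or function.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                results.append(("docstring", doc))
    return results


src = '"""Module doc."""\n\n# a comment\ndef f():\n    """Function doc."""\n    return "# not a comment"\n'
print(extract_comments_and_docstrings(src))
```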

libmagic database for golang?

Hey, sorry to post a non-issue like this; I just wanted to ask: do you have a libmagic database for recognizing Go code?

I noticed that you support a text/x-go MIME type here (and work at Google, haha), so I thought it was worth asking. I've looked around online but haven't managed to find one, and unfortunately I don't think I know Go well enough to write my own.

Anyway, no worries if not. Sorry again for the noise.
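Without a libmagic rule for Go, one possible workaround is to pick the MIME type from the file extension and pass it explicitly (the `extract_comments(filename, mime=None)` signature accepts one). The sketch assumes the parser accepts the `text/x-go` and `text/x-java-source` strings mentioned in these issues; `EXTENSION_MIME` and `mime_from_extension` are hypothetical names.

```python
import os
from typing import Optional

# Hypothetical extension table, used when libmagic has no rule for a language.
EXTENSION_MIME = {
    ".go": "text/x-go",
    ".java": "text/x-java-source",
}


def mime_from_extension(filename: str) -> Optional[str]:
    """Guess a MIME type from the file extension, or None if unknown."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_MIME.get(ext.lower())


print(mime_from_extension("cmd/main.go"))  # text/x-go
```

A `None` result can then fall back to libmagic detection, so the table only needs entries for languages that libmagic misses.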

Suggestion: add an option to ignore undecodable characters

Hi, this tool works well in many cases, but I found two problems.

Encoding problem

If a file contains characters outside the expected encoding, e.g., Chinese characters or ½, the extract_comments method raises an exception.

On my local machine I added errors='ignore' to the following statement; with that change, the tool skips these characters and extracts the rest of each comment.

def extract_comments(filename, mime=None):
    with open(filename, 'r', errors='ignore') as code:

So I think we could expose this as an option and let users decide whether to ignore such characters.
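A minimal sketch of that option: the caller's choice is forwarded straight to `open()`, whose `errors` argument accepts "strict", "ignore" and "replace". `read_source` is a hypothetical helper, and pinning `encoding="utf-8"` is an assumption made for determinism; the original call relies on the platform's default encoding.

```python
def read_source(filename: str, errors: str = "strict") -> str:
    """Read a source file as UTF-8 text.

    errors is passed to open(): "strict" (default) raises UnicodeDecodeError
    on undecodable bytes, "ignore" drops them, "replace" inserts U+FFFD.
    """
    with open(filename, "r", encoding="utf-8", errors=errors) as code:
        return code.read()
```

Keeping "strict" as the default preserves today's behaviour, so silently dropping characters becomes an explicit opt-in, e.g. `read_source(path, errors="ignore")`.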

Complex string

The tool throws an exception when parsing this Java file. The cause may be the complex string on line 99.

Thanks for your tool, it helps me a lot. I hope it keeps getting better!
