Comments (13)

zmajeed commented on August 21, 2024

You're right - I think gobbling any trailing stars fixes the regex

"/*"([^*]|"*"[^/])*"*"+"/"

or to make it easier to read with whitespace ignored

(?x: "/*" ( [^*] | "*"[^/] )* "*"+"/" )

I'll make the change if you think this is right

from flex.

ogdenpm commented on August 21, 2024

zmajeed commented on August 21, 2024

Sorry - that had a typo - I corrected it above - it should have been

(?x: "/*" ( [^*] | "*"[^/] )* "*"+"/" )

This accepts

/* text ** more text */

But it has another problem - stemming from the original regex - where it accepts invalid comments like

/* comment 1 **/ invalid missing comment start */

This is because for the intermediate **/ input, "*"[^/] consumes ** then [^*] consumes / and the lexer merrily continues to a successful match at the very last */
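
The over-match can be reproduced outside Flex. The following is only a sketch: it assumes a hand translation of the Flex pattern into Python's `re` syntax, and Python's backtracking engine is not Flex's longest-match scanner. The names `LOOSE` and `accepts` are made up for illustration.

```python
import re

# Hand translation of the Flex pattern "/*"([^*]|"*"[^/])*"*"+"/"
LOOSE = re.compile(r"/\*(?:[^*]|\*[^/])*\*+/")

def accepts(text: str) -> bool:
    """Return True if the whole string is matched as one comment."""
    return LOOSE.fullmatch(text) is not None

# A valid comment with stars in the body is accepted ...
print(accepts("/* text ** more text */"))                            # True
# ... but so is the invalid input, because "**" is eaten by \*[^/]
# and the following "/" by [^*], so the first end delimiter is missed.
print(accepts("/* comment 1 **/ invalid missing comment start */"))  # True
```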

zmajeed commented on August 21, 2024

I feel this cannot be done with Flex regex - it seems to require some lookahead to avoid prematurely consuming the star from a comment end delimiter */

I'm actually surprised that every single basic regex solution I found online is wrong! Some of these posts are decades old

The Flex doc also has a FAQ on matching C comments - there it only has a couple of example patterns that are clearly labelled wrong and doesn't purport to offer a working regex - that's probably what needs to be done for this section too

The trailing context "*"/[^/] you tried can't be used inside group parentheses - that's why you got the error - I've always avoided using it for this and other limitations

ogdenpm commented on August 21, 2024

ogdenpm commented on August 21, 2024

ogdenpm commented on August 21, 2024

ogdenpm commented on August 21, 2024

zmajeed commented on August 21, 2024

Yep - this works - now the middle part of the regex "*"+[^*/] only matches a run of stars that is not followed by a slash - this means an end delimiter can only be matched by the last part of the regex - and matching won't continue past the first end delimiter

Can test with grep

grep -Po '(?x: \/\* ([^*] | \*+[^*/])* \*+\/)' <<EOF
/* text ** more text */
/* some text **/
/* comment 1 *//* comment 2 */
/* comment 3 */    /* comment 4 */
/* comment 5 */
/* comment 6 */ invalid comment missing start after comment 6 */
/* comment 7 **/ invalid comment missing start after comment 7 */
/* comment 8 // **/ invalid comment missing start after comment 8 */
EOF
/* text ** more text */
/* some text **/
/* comment 1 */
/* comment 2 */
/* comment 3 */
/* comment 4 */
/* comment 5 */
/* comment 6 */
/* comment 7 **/
/* comment 8 // **/

and multiline comments

grep -z -Po '(?x: \/\* ([^*] | \*+[^*/])* \*+\/)' <<EOF
/* multiline
comment 9 */
EOF
/* multiline
comment 9 */

Incidentally the dotall flag (?s:) is not needed since there's no dot in the regex
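
The same checks can be reproduced without grep. This is a minimal sketch using Python's `re`, assuming a hand translation of the pattern; `C_COMMENT` and `find_comments` are illustrative names, not anything from Flex.

```python
import re

# "/*" ( [^*] | "*"+ [^*/] )* "*"+ "/"
C_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^*/])*\*+/")

def find_comments(text: str) -> list[str]:
    """Return every C comment found in text, in order."""
    # No capturing groups, so findall returns the full matches.
    return C_COMMENT.findall(text)

sample = (
    "/* text ** more text */\n"
    "/* comment 1 *//* comment 2 */\n"
    "/* comment 7 **/ invalid comment missing start after comment 7 */\n"
    "/* multiline\n"
    "comment 9 */\n"
)

for comment in find_comments(sample):
    print(repr(comment))
```

Each match stops at the first end delimiter, and [^*] already matches newlines, which is why no dotall flag is involved.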

zmajeed commented on August 21, 2024

The C++ comment regex doesn't account for newline escapes in the comment body either - it fails for

grep -z -Po '(?x: \/ (\\ \n)* \/ [^\n]* )' <<EOF
not comment before /\\
/ multiline split delimiter \\
comment 1
not comment after
EOF
/\
/ multiline split delimiter \

Adding another match for escaped newlines after comment start fixes it

grep -z -Po '(?x: \/ (\\ \n)* \/ ( (\\ \n) | [^\n] )* )' <<EOF
not comment before /\\
/ multiline split delimiter \\
comment 1
not comment after
EOF
/\
/ multiline split delimiter \
comment 1
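
The before and after behaviour can also be compared side by side. This is a sketch in Python's `re`, assuming a hand translation of both patterns; `WITHOUT_FIX` and `WITH_FIX` are illustrative names.

```python
import re

# "/" (\\ \n)* "/" [^\n]*  : the body stops at the first newline
WITHOUT_FIX = re.compile(r"/(?:\\\n)*/[^\n]*")
# "/" (\\ \n)* "/" ( (\\ \n) | [^\n] )*  : the body may cross escaped newlines
WITH_FIX = re.compile(r"/(?:\\\n)*/(?:\\\n|[^\n])*")

text = (
    "not comment before /\\\n"
    "/ multiline split delimiter \\\n"
    "comment 1\n"
    "not comment after\n"
)

print(repr(WITHOUT_FIX.search(text).group()))
print(repr(WITH_FIX.search(text).group()))
```

One difference from Flex worth keeping in mind: Python tries alternatives left to right, so the escaped-newline branch has to come before [^\n] here. Flex takes the longest match, so the order of the alternatives should not matter there.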

zmajeed commented on August 21, 2024

So the full correct regex for C and C++ comments is

("/*"([^*]|"*"+[^*/])*"*"+"/")|("/"(\\\n)*"/"((\\\n)|[^\n])*)

or

(?x: ( "/*" ( [^*] | "*"+ [^*/] )* "*"+ "/" ) | ( "/" (\\ \n)* "/" ( (\\ \n) | [^\n] )* ) )
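
As a rough check of the combined pattern, here is a sketch that pulls both kinds of comments out of a small snippet. It assumes a hand translation into Python's verbose `re` syntax; `COMMENT` and `comments` are illustrative names.

```python
import re

COMMENT = re.compile(
    r"""
      ( /\* (?: [^*] | \*+ [^*/] )* \*+ / )        # C comment
    | ( / (?: \\ \n )* / (?: \\ \n | [^\n] )* )    # C++ comment
    """,
    re.VERBOSE,
)

def comments(source: str) -> list[str]:
    """Return all C and C++ comments in source, in order of appearance."""
    return [m.group(0) for m in COMMENT.finditer(source)]

source = (
    "int a; /* one **/ int b; // two \\\n"
    "still two\n"
    "int c; /* three */\n"
)
print(comments(source))
```

On its own the pattern knows nothing about string or character literals, so a comment delimiter inside a string would be picked up as well; in a real scanner that is presumably handled by separate rules.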

zmajeed commented on August 21, 2024

The fix is in PR #614

westes commented on August 21, 2024

fixed by #614
