crumb's People

Contributors

liam-ilan

crumb's Issues

Accidentally quadratic lexer (`parseString`)

The following loop is accidentally quadratic: it has O(n^2) complexity, where n is the length of in, because strlen(in) is re-evaluated on every iteration:

for (int i = 0; i < strlen(in); i++) {

Compile the interpreter with no optimizations and try to execute the output of the following Python program:

print("(print\"" + "a" * int(1e6) + "\")")

That takes about 10^12 operations (10^6 iterations, each scanning 10^6 characters in strlen()), so it runs for seconds at the very least. -O2 is typically smart enough to hoist the strlen() call out of the loop, but I would not rely on that.
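
One way to avoid the repeated strlen() call is to compute the length once before the loop. The sketch below is illustrative only: count_backslashes is a hypothetical helper, not a function from crumb, and its loop body merely stands in for whatever parseString does per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from crumb): counts backslashes in `in`.
 * The point is the loop shape: strlen() runs once, so the loop is O(n). */
size_t count_backslashes(const char *in) {
    assert(in != NULL);
    size_t len = strlen(in); /* computed once, outside the loop */
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\\') {
            count++;
        }
    }
    return count;
}
```

Testing in[i] != '\0' as the loop condition would work just as well and avoids the extra pass over the string.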

A bunch of undefined behavior in the lexer

The following snippet can access the array out of bounds:

crumb/src/lex.c

Lines 53 to 66 in 5ff70a2

} else if (c == '"') {
    // record first char in string
    int stringStart = i + 1;
    // go to first char after quotes
    i++;
    // count to last char in string (last quote)
    while (code[i] != '"' || (code[i] == '"' && code[i - 1] == '\\')) {
        i++;
        // error handling
        if (code[i] == '\n') {

Consider input that ends with ". The program enters the first if branch and increments i, so code[i] is now \0. It then enters the while loop (\0 is not a quote) and increments i once more, so code[i] is now out of bounds. It does not currently crash for me, though. You can also see the problem reported by passing -fsanitize=address to gcc/clang.
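
A possible fix is to inspect the current character before advancing, so the index never moves past the terminator. This is a minimal sketch under assumptions: find_closing_quote is a hypothetical name, the buffer is assumed to be NUL-terminated, and escape sequences are ignored to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from crumb). `start` is the index of the first
 * character after the opening quote and must not exceed the string length.
 * Returns the index of the next '"', or -1 if a newline or the
 * terminating '\0' comes first. Escape sequences are not handled here. */
long find_closing_quote(const char *code, size_t start) {
    assert(code != NULL);
    size_t i = start;
    /* Inspect code[i] before advancing, so i never moves past the '\0'. */
    while (code[i] != '"') {
        if (code[i] == '\0' || code[i] == '\n') {
            return -1;
        }
        i++;
    }
    return (long)i;
}
```

For input consisting of a single quote, the scan starts on the terminator and returns -1 without reading any further.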

Another issue is with the isdigit function, which expects a value representable as unsigned char (or EOF), but you give it a char that can be (and typically is) signed. That is undefined behavior as well. I've seen such code crash on some systems in the past because isdigit was accessing an array under the hood, and the negative index was very out-of-bounds.
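
The usual remedy is to convert the character to unsigned char before calling any <ctype.h> function. The helper below is a hypothetical illustration, not code from crumb.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from crumb): counts the digits at the start
 * of `s`. The cast ensures bytes >= 0x80 never reach isdigit() as
 * negative values, even when plain char is signed. */
size_t count_leading_digits(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

The loop stops at the terminating '\0' on its own, because '\0' is not a digit.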

Make project open source?

Currently the project has no open source license, and neither do any contributions in the pull requests. Will this project be open sourced?

No grammar for string literals; impossible to end a string with backslash

It seems to me that the lexer is ad-hoc with no specific grammar for what it's trying to lex:

crumb/src/lex.c

Lines 53 to 85 in 5ff70a2

} else if (c == '"') {
    // record first char in string
    int stringStart = i + 1;
    // go to first char after quotes
    i++;
    // count to last char in string (last quote)
    while (code[i] != '"' || (code[i] == '"' && code[i - 1] == '\\')) {
        i++;
        // error handling
        if (code[i] == '\n') {
            printf("Syntax Error @ Line %i: Unexpected new line before string closed.\n", lineNumber);
            exit(0);
        }
        if (code[i] == '\0') {
            printf("Syntax Error @ Line %i: Unexpected end of file before string closed.\n", lineNumber);
            exit(0);
        }
    }
    // get substring and add token
    char *val = malloc(i - stringStart + 1);
    strncpy(val, &code[stringStart], i - stringStart);
    val[i - stringStart] = '\0';
    Token_push(p_headToken, parseString(val), TOK_STRING, lineNumber);
    free(val);
    tokenCount++;

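In particular, it seems to assume that \" never terminates the string literal, because it only looks at the single character before the quote. That is not the standard behavior in other programming languages, where one can escape the backslash with another backslash, e.g. "\\", so that a string can end with a backslash.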
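
One conventional approach is to treat a backslash as consuming the character after it, instead of looking back at the previous character. The sketch below assumes a grammar that the project does not actually specify, and scan_string_literal is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner (not from crumb) for an assumed grammar:
 *   string := '"' ( '\' non-newline-char | plain-char )* '"'
 * where plain-char is anything except '"', '\' and newline.
 * `start` is the index just after the opening quote. Returns the index
 * of the closing quote, or -1 if the literal is unterminated. */
long scan_string_literal(const char *code, size_t start) {
    assert(code != NULL);
    size_t i = start;
    for (;;) {
        char c = code[i];
        if (c == '\0' || c == '\n') {
            return -1;            /* unterminated literal */
        }
        if (c == '"') {
            return (long)i;       /* unescaped quote closes the literal */
        }
        if (c == '\\') {
            if (code[i + 1] == '\0' || code[i + 1] == '\n') {
                return -1;        /* dangling backslash */
            }
            i += 2;               /* skip the backslash and the escaped char */
        } else {
            i++;
        }
    }
}
```

With this rule, "\\" is a complete literal holding one backslash, while the quote in "a\"b" stays escaped.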

This project is fantastic and I appreciate you for putting this on here

Just as the title says

I only started learning C early this year (late Jan, early Feb), and reading through this code I'm impressed by how much I understand, which obviously comes from the fact that you wrote such concise and easy-to-understand code.

I've picked up a few memory handling tricks along the way that I hadn't thought of, and I will actually be rewriting a simple interpreter-ish project I did a few weeks back (shameless plug here) based on some of the practices I've picked up in this codebase.

Thank you for sharing this project and writing neat, concise code like this. I'm not done reading through everything, but it's like a very good book you can't put down. I'm taking notes as I read in case I forget something.

Anyway, keep being awesome, Liam.

Read past buffer end in lexer

There seems to be a read-past-buffer issue in the lexer. Say you have a file that ends with /: here the lexer peeks past the end of the buffer. This pattern is used several times. I did not look closely into its potential effects, but I assume it might lead to crashes if, by chance, the lexer continues to find valid patterns.
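
A bounds-aware lookahead helper is one way to make every peek safe. The sketch below assumes the source buffer is NUL-terminated; peek_at is a hypothetical name and does not exist in crumb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from crumb): returns code[i + offset], or '\0'
 * if a terminator appears anywhere in code[i .. i + offset]. It never
 * reads beyond the first '\0' at or after index i, so callers can look
 * ahead unconditionally. Requires i to be at most the string length. */
char peek_at(const char *code, size_t i, size_t offset) {
    assert(code != NULL);
    for (size_t k = 0; k < offset; k++) {
        if (code[i + k] == '\0') {
            return '\0';
        }
    }
    return code[i + offset];
}
```

A lexer could then test peek_at(code, i, 1) wherever it currently indexes code[i + 1] directly.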
