aliceaml / tangojuice
Vocabulary extraction project
Home Page: https://tangojuice.herokuapp.com/
Running the app with the text
Das Schloss Neuschwanstein steht oberhalb von Hohenschwangau bei Füssen im südöstlichen bayerischen Allgäu. Der Bau wurde ab 1869 für den bayerischen König Ludwig II. als idealisierte Vorstellung einer Ritterburg aus der Zeit des Mittelalters errichtet. Die Entwürfe stammen von Christian Jank, die Ausführung übernahmen Eduard Riedel und Georg von Dollmann. Der König lebte nur wenige Monate im Schloss, er starb noch vor der Fertigstellung der Anlage.
\s|(([ac]+c?)*)?ca+b|acbcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\s
with source language German and “Remove proper nouns” unchecked will hang the app for a very long time (and make it unresponsive to SIGINT). Longer hangs can be achieved by adding more "a" characters.
Explanation:
In vocab.py, at line 179, you run a regex replacement with a regex that is constructed from partially unsanitized user input (a word form extracted from the input text):
replacement = r"<b>" + form + r"</b>"
regex = r"\b" + form + r"\b"
html_example = re.sub(regex, replacement, word.html_example)
This means that a malicious user can inject regex metacharacters and, with a carefully crafted input, drive the regex engine into catastrophic backtracking. I suspect the unresponsiveness to SIGINT is caused by the backtracking happening in C code that doesn't check for signals (which makes regex matching faster, but in this case works against you).
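As an illustration of the point above, here is a minimal sketch of how the replacement could treat the word form as literal text. The helper name highlight_form and its signature are hypothetical, not taken from vocab.py; the only assumption is that the form and the example are plain strings.

```python
import re


def highlight_form(form: str, html_example: str) -> str:
    """Wrap whole-word occurrences of `form` in <b> tags.

    Hypothetical helper, not the project's actual code: `form` is
    escaped so that every regex metacharacter in it is matched literally.
    """
    pattern = r"\b" + re.escape(form) + r"\b"
    # A function replacement is used so that backslashes in the matched
    # text are never interpreted as group references or escapes.
    return re.sub(pattern, lambda m: "<b>" + m.group(0) + "</b>", html_example)


if __name__ == "__main__":
    print(highlight_form("Schloss", "Das Schloss Neuschwanstein"))
    # The payload from the report is now only a literal string to look for.
    payload = "(([ac]+c?)*)?ca+b"
    print(highlight_form(payload, "acbc" + "a" * 44 + " "))
```

With the form escaped, the nested quantifiers in the payload are no longer active, so the pattern is a plain literal search and the backtracking blow-up should not occur. HTML escaping of the form is a separate concern that this sketch does not address.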
Now it isn't easy to run into this because [...] re.[...]([...]). Nevertheless, this is a security vulnerability. My suggestions are:
re.escape
is_not_alpha could have given you some protection, but it is true as soon as there is at least one alphanumeric character in its input. Tightening that condition could have been a protection, although not an ideal one.

2022-01-01T09:57:45.126577+00:00 heroku[web.1]: Process running mem=617M(120.7%)
2022-01-01T09:57:45.136238+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
To reproduce the error: do several Anki extractions in a row (it doesn't always happen).