reinderien / mimic
[ab]using Unicode to create tragedy
License: MIT License
When giving file input to a *nix command, you can use command < file rather than cat file | command.
(…) whoever kills with the sword must be killed with the sword.
// The Apocalypse of John, 13, 10
mimic will fail to "break" code containing unicode characters.
# coding: utf-8
print 'aaa'
print 'ąęść'
print 'bbb'
$ cat ~/Desktop/bad.py | ./mimic
# coding: utf-8
print 'aaa'
print 'Traceback (most recent call last):
  File "./mimic", line 273, in <module>
    main()
  File "./mimic", line 267, in main
    pipe(options.chance)
  File "./mimic", line 242, in pipe
    out.write(c)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
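The byte 0xc4 in the error is the first UTF-8 byte of the first non-ASCII character in the input. A minimal sketch of the likely cause, written in Python 3 rather than the Python 2.7 shown in the traceback: the raw bytes cannot be decoded as ASCII, but decoding them explicitly as UTF-8 works. This only illustrates the encoding issue; it is not mimic's actual code.

```python
# The non-ASCII line from bad.py, as the bytes that arrive on stdin.
data = "ąęść".encode("utf-8")

first_byte = data[0]  # 0xc4, the byte named in the traceback

# An implicit ASCII decode of these bytes fails, as in the report.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly as UTF-8 gives text that can be processed safely.
text = data.decode("utf-8")
```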
Consider the following example:
def add_prefix(s):
    return "prefix" + s #comment
Running it through mimic would create code that looks totally different under syntax colouring:
dеf add_prefix(ѕ):
    rеturn "prefix" + s #comment
Another similar example is given in the current README.
A context-aware algorithm would leave language keywords untouched (def and return in this case) so that most editors would display the code with identical colours.
In the example above, the only tokens eligible for mangling would be identifiers (add_prefix and s), comment contents and contents of the string constant.
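The token filtering described here could be sketched with Python's standard tokenize module. This is a minimal sketch in Python 3, and eligible_tokens is a hypothetical helper name, not something mimic provides.

```python
import io
import keyword
import tokenize


def eligible_tokens(source):
    """Yield (kind, text) for tokens that could be mangled without touching keywords."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            yield "identifier", tok.string
        elif tok.type == tokenize.COMMENT:
            yield "comment", tok.string
        elif tok.type == tokenize.STRING:
            yield "string", tok.string


code = 'def add_prefix(s):\n    return "prefix" + s #comment\n'
found = list(eligible_tokens(code))
```

For the example function this yields add_prefix, s, the string constant, s again and the comment, while def and return are skipped.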
due to the original character being included in the target mutations.
Original dev machine was OSX. Going to filter out those chars that don't render in Ubuntu.
Highlighting the "wrong" characters (those like <о:U+043E>) in output to stdout would be great, what do you think?
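A rough sketch of what such highlighting could look like, in Python 3. It assumes that every non-ASCII character counts as "wrong", which is a simplification; a real implementation would more likely consult the homoglyph table. The name highlight is hypothetical.

```python
def highlight(text):
    """Replace each non-ASCII character with a <char:U+XXXX> marker."""
    return "".join(
        ch if ord(ch) < 128 else "<{}:U+{:04X}>".format(ch, ord(ch))
        for ch in text
    )


# "f", Cyrillic small o (U+043E), "o"
marked = highlight("f\u043eo")
```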
When installing with pip, the lights in my house flickered, and I heard a faint cackling. The smell of rotten eggs slowly grew until it was all I could focus on. Then the lights shut off, and I felt heavy breathing on the back of my neck. At some point, I lost consciousness. I woke up the next morning with a decapitated goat head on my desk, and my hard disk had gained bad sectors. Recommended fix?
From obfuscated back into ASCII space. This would be really helpful :)
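One possible shape for this, as a Python 3 sketch: invert a homoglyph table so that each look-alike points back to its ASCII original. The rows below are a small hand-picked sample, the row layout (ASCII original first, look-alikes after) is an assumption, and demimic is a hypothetical name.

```python
# Sample rows only: ASCII original first, look-alikes after (assumed layout).
SAMPLE_ROWS = ["a\u0251\u0430", "e\u0435", "o\u03bf\u043e", "s\u0455"]

# Invert the table: every look-alike maps back to its ASCII original.
REVERSE = {
    glyph: row[0]
    for row in SAMPLE_ROWS
    for glyph in row[1:]
}


def demimic(text):
    """Map known homoglyphs back to ASCII and leave everything else alone."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


# Cyrillic e (U+0435) and Cyrillic dze (U+0455) stand in for "e" and "s".
restored = demimic("d\u0435f add_prefix(\u0455):")
```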
Would be nice to pip install mimic..
Add a feature that pipes through some input, and points out any suspicious characters in the output.
Windows redirection to a file breaks because of some lack of support for unicode or utf-8 or something
Traceback (most recent call last):
  File "C:\Program Files\Python37\Scripts\mimic-script.py", line 11, in <module>
    load_entry_point('mimic==0.0.1', 'console_scripts', 'mimic')()
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 443, in main
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 311, in pipe_mimic
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 285, in pipe
  File "C:\Program Files\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
The error message is, of course, slightly different every time because of the different homoglyphs, but in general it consistently fails on Windows.
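The traceback points at the cp1252 codec, which Python picks for redirected output on this Windows setup and which has no mapping for most homoglyphs. A minimal Python 3 sketch of one way around it, assuming the output is written through an explicitly UTF-8 wrapper instead of the default text stream; utf8_writer is a hypothetical helper, not part of mimic.

```python
import io

glyph = "\u2c9f"  # the character from the traceback

# cp1252 cannot represent this character, which reproduces the failure.
try:
    glyph.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is always UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")


buffer = io.BytesIO()  # stands in for sys.stdout.buffer
writer = utf8_writer(buffer)
writer.write(glyph)
writer.flush()
encoded = buffer.getvalue()  # UTF-8 bytes, independent of the console code page
writer.detach()  # leave the underlying stream open for the caller
```

In a real script the wrapper would go around sys.stdout.buffer. Setting the PYTHONIOENCODING environment variable to utf-8 before running the tool is another option that needs no code change.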
./mimic should probably turn into python -m mimic before install. Also, install is shown straight from the repo but not a local dir.
I cannot for the life of me figure out how to use mimic with 2 files, one input and one output.
I have tried every command I can think of. I don't know if I installed it wrong or what, but I'm trying to run it from the command prompt.
I can sort of get it to work by running it using mimic -m 100, but I want to know how to take an input file and output it to another. Can someone, ANYONE, explain to me in simple terms how to work this program from hell, before I actually lose my mind?
For instance, limit modifications to variable names and make sure all instances of mimicked variables get mimicked in the same way. With such a mode, mimicked code could get committed and pass testing without incident. Then perhaps a year later, somebody tries to add another instance of a mimicked variable and all hell breaks loose.
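A hedged sketch of such a mode in Python 3, using the standard tokenize module so that only identifiers are touched and every occurrence of a name gets the same replacement. The LOOKALIKES table is a tiny assumed subset, the substitution is deterministic rather than random, and mimic_names is a hypothetical name.

```python
import io
import tokenize

# Assumed sample of look-alikes that are still valid identifier characters.
LOOKALIKES = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "s": "\u0455"}


def mimic_names(source, names):
    """Rewrite every occurrence of the given identifiers in the same way."""
    mapping = {
        name: "".join(LOOKALIKES.get(ch, ch) for ch in name) for name in names
    }
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are touched, so strings and comments stay intact.
        if tok.type == tokenize.NAME and tok.string in mapping:
            tok = tok._replace(string=mapping[tok.string])
        tokens.append(tok)
    # Replacements keep the same length, so token positions remain valid.
    return tokenize.untokenize(tokens)


original = 'def add_prefix(s):\n    return "prefix" + s\n'
mangled = mimic_names(original, {"add_prefix", "s"})
```

The mangled source still runs and looks the same, but anyone who later types add_prefix in plain ASCII gets a NameError, which is the delayed failure described above.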
Many chars have been removed because they didn't render well. Reintroduce them to a separate index for reverse only.
Traceback (most recent call last):
  File "./mimic", line 287, in <module>
    main()
  File "./mimic", line 277, in main
    explain(unicode(options.char[0], 'utf-8'))
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 0: unexpected end of data
1. Some of them are only detectable by viewing the code in an improper encoding format, or by viewing the hexadecimal data.
2. It adds a good hour's worth of troubleshooting before someone realizes that there is an invisible character in their code.
3. Because why not?
Thank you and have a nice day!
To move most of the fun to runtime rather than compile or IDE issues, offer a mode to only mimic strings.
Changing the base image for Docker may be worthwhile. A summary of the reasons why one may want to do that can be found here.
In my particular case, I may want to use the check functionality as part of automated tests, and a lighter image would result in faster tests across the board.
I'm submitting a pull request along with this issue, and the size gain is significant.
REPOSITORY     TAG         IMAGE ID      CREATED         SIZE
VanAxe/mimic   not-alpine  4796ededa128  2 minutes ago   699 MB
VanAxe/mimic   alpine      3b7247c6515d  12 minutes ago  97.8 MB
python         3           3984f3aafbc9  13 days ago     690 MB
python         3-alpine    b30df2468c80  8 weeks ago     88.6 MB
Homoglyphs from repo:
[" ", "!!ǃⵑ︕﹗", """", "##﹟", "$$﹩", "%%٪⁒﹪", "&&﹠", "''ʹʹ", "((﹙", "))﹚", "**⋆﹡", "++᛭﹢", ",,ˏᛧ‚", "--˗−⎼╴﹣", "..․", "//᜵⁄∕⧸", "2ᒿ", "3Ʒℨ", "4Ꮞ", "6Ꮾ", "9Ꮽ", "::ː˸։፡᛬⁚∶⠆︓﹕", ";;;︔﹔", "<<˂‹≺❮ⵦ﹤", "==═⚌﹦", ">>˃›≻❯﹥", "??︖﹖", "@@﹫", "AΑАᎪ", "BΒВᏴᗷⲂ", "CϹСᏟⅭⲤ", "DᎠᗪⅮ", "EΕЕᎬ", "Fᖴ", "GԌᏀ", "HΗНዘᎻᕼⲎ", "IΙІⅠ", "JЈᎫᒍ", "KΚᏦᛕKⲔ", "LᏞᒪⅬ", "MΜϺМᎷⅯ", "NΝⲚ", "OΟОⲞ", "PΡРᏢⲢ", "QԚⵕ", "RᎡᏒᖇ", "SЅᏚ", "TΤТᎢ", "VᏙⅤ", "WᎳᏔ", "XΧХⅩⲬ", "YΥⲨ", "ZΖᏃ", "[[", "\\∖⧵⧹﹨", "]]", "^^˄ˆᶺ⌃", "__ˍ⚊", "``ˋ`‵", "aɑа", "cϲсⅽ", "dԁⅾ", "eеᥱ", "gɡ", "hһ", "iіⅰ", "jϳј", "lⅼ", "mⅿ", "nᥒ", "oοоഠⲟ", "pрⲣ", "sѕ", "uᥙ∪", "vᴠⅴ∨⋁", "wᴡ", "xхⅹⲭ", "yуỿ", "zᴢ", "{{﹛", "||ǀᛁ⎜⎟⎢⎥⎪⎮│", "}}﹜", "~~˜⁓∼"]
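How a table like this might be applied can be sketched in Python 3 as follows. It assumes each row starts with the ASCII original followed by its look-alikes, uses only three of the rows, and takes a percentage chance per character; mimic_text and its parameters are hypothetical names. Because the original character is excluded from the replacement candidates, a chance of 100 replaces every eligible character.

```python
import random

# Three rows from the list: ASCII original first, look-alikes after (assumed layout).
ROWS = ["a\u0251\u0430", "h\u04bb", "s\u0455"]

# Map each original to its look-alikes only, so the original is never a candidate.
TABLE = {row[0]: row[1:] for row in ROWS}


def mimic_text(text, chance, rng=random):
    """Swap each eligible character for a look-alike with probability chance/100."""
    out = []
    for ch in text:
        if ch in TABLE and rng.random() * 100 < chance:
            out.append(rng.choice(TABLE[ch]))
        else:
            out.append(ch)
    return "".join(out)


mangled = mimic_text("has", 100, random.Random(0))
```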
As you may know, Python 3 is gaining more and more popularity. It would be relatively simple to change the incompatible parts so they are compatible with either version.
Currently, mimic only converts the homoglyphs to explicit Unicode code points when running mimic --ϲheck. To allow automated testing, this behavior should change to allow programmatic detection of homoglyphs.
Having an exit code greater than 0 would allow programmatic tests.
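A minimal sketch of the requested behaviour in Python 3: scan the input, report each hit on stderr, and return a non-zero status when anything suspicious is found. The SUSPICIOUS set is a tiny assumed sample, and check and main are hypothetical names, not mimic's existing functions.

```python
import sys

# Assumed sample of look-alikes to flag; a real check would use the full table.
SUSPICIOUS = set("\u0430\u0435\u043e\u0455\u03f2")


def check(text):
    """Return (line, column, character) for each suspicious character found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                hits.append((line_no, col, ch))
    return hits


def main(stream=sys.stdin):
    """Report hits on stderr and return a status suitable for sys.exit."""
    hits = check(stream.read())
    for line_no, col, ch in hits:
        print("{}:{}: U+{:04X}".format(line_no, col, ord(ch)), file=sys.stderr)
    return 1 if hits else 0  # non-zero status signals that homoglyphs were found
```

A command-line entry point would finish with sys.exit(main()), so a test runner or CI job can fail the build on a non-zero status.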
Not sure if it is a feature or if I am using Mimic completely correctly, but pip install (this repo) triggers the error that the package is expected to test.
$ sudo pip install https://github.com/reinderien/mimic.git
[sudo] password for root:
Downloading/unpacking https://github.com/reinderien/mimic.git
Downloading mimic.git (unknown size): 10kB downloaded
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/local/lib/python2.7/dist-packages/pip/req.py", line 1197, in prepare_files
    do_download,
  File "/usr/local/lib/python2.7/dist-packages/pip/req.py", line 1375, in unpack_url
    self.session,
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 582, in unpack_http_url
    unpack_file(temp_location, location, content_type, link)
  File "/usr/local/lib/python2.7/dist-packages/pip/util.py", line 627, in unpack_file
    and is_svn_page(file_contents(filename))):
  File "/usr/local/lib/python2.7/dist-packages/pip/util.py", line 210, in file_contents
    return fp.read().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
troll-stopper is for Atom editor
Attempting to build the project using its Dockerfile fails as the parent image python:2-onbuild requires a requirements.txt file which is not present (removed in #12).
$ git clone https://github.com/reinderien/mimic.git && cd mimic
$ docker build -t mimic .
Step 0 : FROM python:2-onbuild
# Executing 3 build triggers
Trigger 0, COPY requirements.txt /usr/src/app/
Step 0 : COPY requirements.txt /usr/src/app/
requirements.txt: no such file or directory
Once fixed, this would be great as an automated build on the Docker hub!
Without a LICENSE file in the repo, or one declared in the setup.py, this awesome library falls under the default github license, summarized here: http://stackoverflow.com/a/13669816
This is an awesome library of great utility; this would make it legit for others to use it for serious applications.