reinderien / mimic


[ab]using Unicode to create tragedy

License: MIT License

Python 100.00%


mimic's Issues

Misuse of cat suggested in wiki

When giving file input to a *nix command, you can use command < file rather than cat file | command.

mimic will fail to break code containing unicode characters

(…) qui in gladio occiderit, oportet eum gladio occidi.
// The Apocalypse of John, 13, 10

mimic will fail to "break" code containing unicode characters.

# coding: utf-8


print 'aaa'
print 'ąęść'
print 'bbb'

$ cat ~/Desktop/bad.py | ./mimic 
# coding: utf-8


print 'aaa'
print 'Traceback (most recent call last):
  File "./mimic", line 273, in <module>
    main()
  File "./mimic", line 267, in main
    pipe(options.chance)
  File "./mimic", line 242, in pipe
    out.write(c)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
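
One way to avoid the implicit ASCII decode seen in the traceback is to decode the input explicitly as UTF-8 and encode again on the way out. The following is a minimal Python 3 sketch, not mimic's actual code: the `pipe` helper and its `transform` parameter are hypothetical names chosen for illustration.

```python
import io


def pipe(raw_in, raw_out, transform=lambda ch: ch):
    """Copy UTF-8 bytes from raw_in to raw_out, transforming per character.

    `transform` is a hypothetical stand-in for the homoglyph substitution.
    """
    text = raw_in.read().decode("utf-8")    # bytes -> str, explicit codec
    mangled = "".join(transform(ch) for ch in text)
    raw_out.write(mangled.encode("utf-8"))  # str -> bytes, explicit codec


# Demo with the non-ASCII line from the report above.
source = "print 'ąęść'\n".encode("utf-8")
result = io.BytesIO()
pipe(io.BytesIO(source), result)
```

With the identity transform, the bytes come out unchanged, so existing non-ASCII text passes through without being decoded as ASCII.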

Context-aware replacement algorithms that preserve syntax colouring

Consider the following example:

def add_prefix(s):
    return "prefix" + s #comment

Running it through mimic would create code that looks totally different under syntax colouring:

dеf add_prefix(ѕ):
    rеturn ＂prefix" + s #comment

Another similar example is given in the current README.

A context-aware algorithm would:

preserve keywords (def and return in this case)
preserve quotation marks
preserve parentheses
preserve symbols in contexts in which they cannot be replaced by a homograph

so that most editors would display the code with identical colours.

In the example above, the only tokens eligible for mangling would be identifiers (add_prefix and s), comment contents and contents of the string constant.
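
The identifier part of this proposal could be sketched with Python's standard `tokenize` module. This is only an illustration under stated assumptions: the three-entry `HOMOGLYPHS` table and the `mangle` and `mangle_identifiers` helpers are hypothetical, comments and string contents are left alone for brevity, and the sketch also mangles builtins and imported names, which a real mode would have to skip.

```python
import io
import keyword
import tokenize

# Hypothetical three-entry table, for illustration only.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "s": "\u0455"}


def mangle(text):
    """Swap each character for its look-alike, if the table has one."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def mangle_identifiers(source):
    """Mangle non-keyword NAME tokens only; everything else is untouched."""
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            row, start = tok.start
            end = tok.end[1]
            line = lines[row - 1]
            # One-for-one swaps keep the length, so later columns stay valid.
            lines[row - 1] = line[:start] + mangle(tok.string) + line[end:]
    return "".join(lines)


original = 'def add_prefix(s):\n    return "prefix" + s  # comment\n'
mangled = mangle_identifiers(original)
```

Keywords, quotation marks, parentheses and operators are never touched, so an editor should colour `mangled` the same way as `original`. Because every occurrence of a name is mangled identically, this particular snippet also still runs.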

-h percent is incorrect

due to the original character being included in the target mutations.
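
To see why the percentage comes out low, assume (this is an assumption about the sampling, based only on the description above) that each character is selected with probability `chance` and its replacement is then drawn uniformly from a pool holding the original plus `n_alternatives` homoglyphs. The visible mutation rate is then `chance * n / (n + 1)`. The function name below is hypothetical.

```python
def effective_rate(chance, n_alternatives):
    """Fraction of characters that actually change when the original
    character is one of the (n_alternatives + 1) equally likely picks."""
    return chance * n_alternatives / (n_alternatives + 1)


half_with_three = effective_rate(0.5, 3)  # 0.5 * 3/4
full_with_one = effective_rate(1.0, 1)    # 1.0 * 1/2
```

Under this model, even a requested 100% only changes half of the characters that have a single homoglyph.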

Several chars don't render in Linux

Original dev machine was OSX. Going to filter out those chars that don't render in Ubuntu.

Color output?

Highlighting the "wrong" characters (those like <о:U+043E>) in output to stdout would be great, what do you think?

Upon install, computer turns off.

When installing with pip, the lights in my house flickered, and I heard a faint cackling. The smell of rotten eggs slowly grew until it was all I could focus on. Then the lights shut off, and I felt heavy breathing on the back of my neck. At some point, I lost consciousness. I woke up the next morning with a decapitated goat head on my desk, and my hard disk had gained bad sectors. Recommended fix?

Add a mode which attempts to reverse the mapping.

From obfuscated back into ASCII space. This would be really helpful :)
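
A reverse mode could be sketched by inverting the forward table. The miniature `HOMOGLYPHS` table and the `to_ascii` helper below are hypothetical and stand in for whatever table the tool really uses.

```python
# Hypothetical miniature table: ASCII character -> its look-alikes.
HOMOGLYPHS = {"d": "\u0501", "e": "\u0435", "o": "\u03bf\u043e"}

# Invert it: every look-alike points back to its ASCII original.
REVERSE = str.maketrans(
    {glyph: ascii_ch for ascii_ch, glyphs in HOMOGLYPHS.items() for glyph in glyphs}
)


def to_ascii(text):
    """Map known homoglyphs back to ASCII; leave everything else alone."""
    return text.translate(REVERSE)


restored = to_ascii("d\u0435f f\u043e\u03bf():")
```

Characters that are not in the table, including legitimate non-ASCII text, pass through unchanged.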

Publish to Pip

It would be nice to be able to pip install mimic.

Add --check

Add a feature that pipes through some input, and points out any suspicious characters in the output.
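
A rough sketch of such a check follows. It assumes, for simplicity, that every non-ASCII character counts as suspicious; a real implementation would more likely consult the homoglyph table, since this version also flags legitimate non-ASCII text. `find_suspicious` is a hypothetical name.

```python
import unicodedata


def find_suspicious(text):
    """Yield (line_no, column, char, name) for every non-ASCII character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


report = list(find_suspicious("d\u0435f add(x):\n    return x\n"))
```

Here `report` holds one entry pointing at line 1, column 2, where a Cyrillic letter stands in for the Latin `e`.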

Windows command line encodings fail

Redirecting output to a file on Windows breaks, apparently because the output encoding (cp1252 in the traceback below) cannot represent the Unicode homoglyphs.

Traceback (most recent call last):
  File "C:\Program Files\Python37\Scripts\mimic-script.py", line 11, in <module>
    load_entry_point('mimic==0.0.1', 'console_scripts', 'mimic')()
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 443, in main
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 311, in pipe_mimic
  File "C:\Program Files\Python37\lib\site-packages\mimic-0.0.1-py3.7.egg\mimic\__init__.py", line 285, in pipe
  File "C:\Program Files\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2c9f' in position 0: character maps to <undefined>

The error message is, of course, slightly different every time because different homoglyphs are chosen, but in general it consistently fails on Windows.
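
A possible workaround is to force the output stream to UTF-8 instead of relying on the console code page. The sketch below uses `io.TextIOWrapper.reconfigure`, which is available from Python 3.7; the `force_utf8` helper is a hypothetical name, and the demo uses an in-memory stream in place of a redirected stdout.

```python
import io


def force_utf8(stream):
    """Return a text stream that encodes as UTF-8 regardless of code page."""
    if hasattr(stream, "reconfigure"):  # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(encoding="utf-8")
        return stream
    return io.TextIOWrapper(stream.buffer, encoding="utf-8")


# Demo: a cp1252 stream, like the redirected stdout in the report.
raw = io.BytesIO()
out = force_utf8(io.TextIOWrapper(raw, encoding="cp1252"))
out.write("\u2c9f")  # the character from the traceback
out.flush()
```

For real output the call would be `force_utf8(sys.stdout)`. Setting the `PYTHONIOENCODING=utf-8` environment variable before running the tool is another option.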

Readme needs updating for install/execution

./mimic

should probably turn into

python -m mimic

before install. Also, install is shown straight from the repo but not a local dir.

Repo should be renamed to `ⅿⅰmіс` (instead of `mimic`)

Someone please explain to me in stupid how to use this tool.

I cannot for the life of me figure out how to use mimic with two files, one input and one output.
I have tried every command I can think of; I don't know if I installed it wrong or what. I'm trying to run it from the command prompt.
I can sort of get it to work by running mimic -m 100, but I want to know how to take an input file and write the result to another file. Can someone, ANYONE, explain to me in stupid how to work this program from hell before I actually lose my mind?

"enhancement": add a mode whose output will still work.

For instance, limit modifications to variable names and make sure all instances of mimicked variables get mimicked in the same way. With such a mode, mimicked code could get committed and pass testing without incident. Then perhaps a year later, somebody tries to add another instance of a mimicked variable and all hell breaks loose.

Reintroduce debatable characters for reverse only

Many chars have been removed because they didn't render well. Reintroduce them to a separate index for reverse only.

Cannot decode non-ASCII cmdline options

Traceback (most recent call last):
  File "./mimic", line 287, in <module>
    main()
  File "./mimic", line 277, in main
    explain(unicode(options.char[0], 'utf-8'))
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 0: unexpected end of data

Add in non-printing characters

Non-Printing Unicode Characters are fun!

Reasons why you should add them:

1. Some of them are only detectable by viewing the code in an improper encoding or by viewing the hexadecimal data.
2. It adds a good hour's worth of troubleshooting before someone realizes that there is an invisible character in their code.
3. Because why not?

List of non-printing Unicode characters:

U+200B ZERO WIDTH SPACE
U+200C ZERO WIDTH NON-JOINER
U+200D ZERO WIDTH JOINER
U+200E LEFT-TO-RIGHT MARK
U+202A LEFT-TO-RIGHT EMBEDDING
U+202C POP DIRECTIONAL FORMATTING
U+202D LEFT-TO-RIGHT OVERRIDE
U+2062 INVISIBLE TIMES
U+2063 INVISIBLE SEPARATOR
U+FEFF ZERO WIDTH NO-BREAK SPACE

Thank you and have a nice day!
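
For anyone on the receiving end, all of the code points listed above fall into Unicode general category `Cf` (format), so they can be located without a hex viewer. This is a minimal sketch using the standard `unicodedata` module; `find_invisible` and `LISTED` are hypothetical names.

```python
import unicodedata

# The code points listed above.
LISTED = "\u200b\u200c\u200d\u200e\u202a\u202c\u202d\u2062\u2063\ufeff"


def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each format-category character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


hits = find_invisible("to\u200btal = 1")
```

In the demo, `hits` reports the zero-width space hidden inside what looks like the identifier `total`.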

Mode to mimic strings only

To move most of the fun to runtime rather than compile or IDE issues, offer a mode to only mimic strings.

Changing the base image for Docker may be worthwhile

Changing the base image for Docker may be worthwhile. A summary of the reasons why one may want to do that can be found here.

In my particular case, I may want to use the check functionality as part of automated tests, and a lighter image would result in faster tests across the board.

I'm submitting a pull request along this issue, and the size gain is significant.

REPOSITORY      TAG          IMAGE ID        CREATED           SIZE
VanAxe/mimic    not-alpine   4796ededa128    2 minutes ago     699 MB
VanAxe/mimic    alpine       3b7247c6515d    12 minutes ago    97.8 MB
python          3            3984f3aafbc9    13 days ago       690 MB
python          3-alpine     b30df2468c80    8 weeks ago       88.6 MB

Steganography mode

Some characters do not display in Windows 7

Homoglyphs from repo:

["   ", "!！ǃⵑ︕﹗", ""＂", "#＃﹟", "$＄﹩", "%％٪⁒﹪", "&＆﹠", "'＇ʹʹ", "(（﹙", ")）﹚", "*＊⋆﹡", "+＋᛭﹢", ",，ˏᛧ‚", "-－˗−⎼╴﹣", ".．․", "/／᜵⁄∕⧸", "2ᒿ", "3Ʒℨ", "4Ꮞ", "6Ꮾ", "9Ꮽ", ":：ː˸։፡᛬⁚∶⠆︓﹕", ";；;︔﹔", "<＜˂‹≺❮ⵦ﹤", "=＝═⚌﹦", ">＞˃›≻❯﹥", "?？︖﹖", "@＠﹫", "AΑАᎪ", "BΒВᏴᗷⲂ", "CϹСᏟⅭⲤ", "DᎠᗪⅮ", "EΕЕᎬ", "Fᖴ", "GԌᏀ", "HΗНዘᎻᕼⲎ", "IΙІⅠ", "JЈᎫᒍ", "KΚᏦᛕKⲔ", "LᏞᒪⅬ", "MΜϺМᎷⅯ", "NΝⲚ", "OΟОⲞ", "PΡРᏢⲢ", "QԚⵕ", "RᎡᏒᖇ", "SЅᏚ", "TΤТᎢ", "VᏙⅤ", "WᎳᏔ", "XΧХⅩⲬ", "YΥⲨ", "ZΖᏃ", "[［", "\＼∖⧵⧹﹨", "]］", "^＾˄ˆᶺ⌃", "_＿ˍ⚊", "`｀ˋ`‵", "aɑа", "cϲсⅽ", "dԁⅾ", "eеᥱ", "gɡ", "hһ", "iіⅰ", "jϳј", "lⅼ", "mⅿ", "nᥒ", "oοоഠⲟ", "pрⲣ", "sѕ", "uᥙ∪", "vᴠⅴ∨⋁", "wᴡ", "xхⅹⲭ", "yуỿ", "zᴢ", "{｛﹛", "|｜ǀᛁ⎜⎟⎢⎥⎪⎮￨", "}｝﹜", "~～˜⁓∼"]

This is what I see in Sublime Text:

Python 3 support

As you may know, Python 3 is gaining more and more popularity. It would be relatively simple to change the incompatible parts so they are compatible with either version.

`--check` should exit with code `> 0` when homoglyphs are found

Currently, mimic only converts the homoglyphs to explicit Unicode code points when running mimic --ϲheck. To allow automated testing, it should also signal programmatically that homoglyphs were found.

Having an exit code greater than 0 would allow programmatic tests.
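
The requested behaviour could look roughly like the sketch below. It is not mimic's code: `check` and `main` are hypothetical names, and treating every non-ASCII character as a finding is a simplifying assumption.

```python
import sys


def check(text):
    """Return 1 if text holds any non-ASCII character, otherwise 0."""
    return 1 if any(ord(ch) > 127 for ch in text) else 0


def main():
    # A non-zero exit status lets shells and CI jobs detect a failed check.
    sys.exit(check(sys.stdin.read()))


clean_status = check("def add(x): return x")
dirty_status = check("d\u0435f add(x): return x")
```

A test script can then fail the build whenever the process exits with a status other than 0.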

UnicodeDecodeError on install through pip

Not sure if it is a feature or if I am using Mimic completely correctly, but pip install (this repo) triggers the error that the package is expected to test.

$ sudo pip install https://github.com/reinderien/mimic.git
[sudo] password for root: 
Downloading/unpacking https://github.com/reinderien/mimic.git
  Downloading mimic.git (unknown size): 10kB downloaded
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/local/lib/python2.7/dist-packages/pip/req.py", line 1197, in prepare_files
    do_download,
  File "/usr/local/lib/python2.7/dist-packages/pip/req.py", line 1375, in unpack_url
    self.session,
  File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 582, in unpack_http_url
    unpack_file(temp_location, location, content_type, link)
  File "/usr/local/lib/python2.7/dist-packages/pip/util.py", line 627, in unpack_file
    and is_svn_page(file_contents(filename))):
  File "/usr/local/lib/python2.7/dist-packages/pip/util.py", line 210, in file_contents
    return fp.read().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

Atom Package

troll-stopper is for Atom editor

Docker image won't build (missing requirements.txt)

Attempting to build the project using its Dockerfile fails as the parent image python:2-onbuild requires a requirements.txt file which is not present (removed in #12).

$ git clone https://github.com/reinderien/mimic.git && cd mimic
$ docker build -t mimic .
Step 0 : FROM python:2-onbuild
# Executing 3 build triggers
Trigger 0, COPY requirements.txt /usr/src/app/
Step 0 : COPY requirements.txt /usr/src/app/
requirements.txt: no such file or directory

Once fixed, this would be great as an automated build on the Docker hub!

Mimic needs a license

Without a LICENSE file in the repo, or a license declared in setup.py, this awesome library falls under the default GitHub license, summarized here: http://stackoverflow.com/a/13669816

This is an awesome library of great utility; adding a license would make it legit for others to use it for serious applications.