
trojan-source's Introduction

Hi there 👋   I'm Nicholas Boucher

I'm a PhD candidate at the University of Cambridge. I also work for Microsoft.

About me

Security is my primary interest 🔑, although machine learning is also pretty cool 🧠. You can find my academic site here, and my personal site here.

I've pinned some of my favorite public projects below - enjoy!

trojan-source's People

Contributors

adamchainz, kaminyou, nickboucher, xhdix


trojan-source's Issues

Variations on Stretched String

This is a follow-up to my post on #8, but I am creating a separate issue because it focuses on variations of the stretched string rather than on the early return or comment out.

You were correct that I didn't mention stretched string in #8 because I assumed it would work in Ruby as well. Stretched string doesn't really interest me that much though because as you noted in your paper:

there are other, perhaps simpler, ways that an adversary can cause a string comparison to fail without visual effect....such as the Zero Width Space

What might be nice is if we could use Bidi not only to cause a conditional to fail (just like ZWSP) but also to cause the condition to pass. I don't think it is possible with a string, but we could stretch other things in the language to achieve this. Your paper really only mentions comments and strings as allowing Unicode, but depending on the grammar other tokens can have Unicode characters.
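For comparison, the ZWSP failure mode quoted above can be sketched in a few lines. This is a Python illustration rather than Ruby, and the variable names are made up for the example; it only shows that a zero width space (U+200B) makes two visually identical strings compare unequal.

```python
# Illustration only: names and values are hypothetical.
ZWSP = "\u200b"  # ZERO WIDTH SPACE, renders as nothing in most editors

expected = "admin"
tampered = "ad" + ZWSP + "min"  # looks like "admin" on screen

print(expected == tampered)          # False
print(len(expected), len(tampered))  # 5 6
```

A comparison can be made to fail this way, but not to pass, which is the limitation the variations below try to get around.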

Below are three other types of stretching that allow us to evaluate a conditional to true instead of failing the conditional. I am using Ruby because I know it well, but I am guessing some of these could be applicable to other languages. If they are of interest perhaps we can improve the examples, see how they are applicable to other languages and include them with your examples. With all of these examples the syntax highlighting somewhat gives the issue away but perhaps in a big block of other code it wouldn't be noticed.

Stretched Regex

The most obvious alternative to a stretched string literal is a stretched regular expression. I work on a real-world application where the roles are stored as a comma-separated string for historical reasons. So an admin will have:

user.roles = 'admin,manager,user'

while a regular user might have just:

user.roles = 'user'

I could conceive of a method to see if the user is an admin defined as:

def admin?
  roles =~ /admin/
end

Using that as my example scenario, consider the implementation below:

class User
  attr_accessor :roles

  def admin?
    @roles =~ /admin⁧⁦|user/ #⁩⁦/ # Restrict from ⁩⁩
  end
end

user = User.new
user.roles = 'user,manager'

if user.admin?
  puts 'admin!'
else
  puts 'regular user :('
end

The comment seems a bit odd with the extra |, / and # characters. But none of that should matter since everything after the # should be ignored. If you run the above you would expect it to output regular user :( but instead it outputs admin!.
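One way to see why this happens is to write out what the interpreter receives in logical order. The sketch below is in Python rather than Ruby, and it assumes the invisible characters right after `admin` are U+2067 (RLI) and U+2066 (LRI). Under that assumption the pattern the regex engine sees has two alternatives, and the second is plain `user`.

```python
import re

# Assumption: these are the two invisible characters following "admin".
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE

# What the pattern looks like to a reader vs. what is actually in the file.
apparent = re.compile("admin")
actual = re.compile("admin" + RLI + LRI + "|user")

roles = "user,manager"
print(bool(apparent.search(roles)))  # False
print(bool(actual.search(roles)))    # True
```

The `|user` part only appears to sit inside the comment; in logical order it comes before the closing delimiter of the regex.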

With more effort we might be able to reduce the extra characters in the comment. Also, in Ruby you can choose your regex delimiters if you want. These are all equivalent:

  • /admin/
  • %r[admin]
  • %r!admin!

The ability to choose your delimiter might help you pick a character to appear in the comment that is more believable.

Stretched List

Another thing we can stretch is a list of strings. In Ruby that is defined as:

%w[one two three four five six]

This is just a syntactical upgrade to:

['one', 'two', 'three', 'four', 'five', 'six']

As with regex, we can choose our delimiter, so this is also the same:

%w!one two three four five six!

Now we can inject into our list with Bidi:

role = 'User'
privileged = %w!Admin Manager⁧⁦ User! # ⁩⁦! # Don't include ⁩⁩
if privileged.include? role
  puts 'admin!'
else
  puts 'regular user :('
end

Here I am using that feature to choose my delimiter, picking ! to make the comment more believable. ! is not normally used, so I could have also just made my comment say # Don't include User] # and someone might think it was just an extra character.
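The same logical-order view applies here. Below is a rough Python analogue, assuming again that the hidden characters after `Manager` are U+2067 and U+2066, and using `str.split()` as a stand-in for Ruby's whitespace-separated `%w` literal:

```python
RLI, LRI = "\u2067", "\u2066"  # assumed hidden characters

# Logical content between the two "!" delimiters.
literal_body = "Admin Manager" + RLI + LRI + " User"

privileged = literal_body.split()  # stand-in for %w!...!
print(len(privileged))             # 3
print("User" in privileged)        # True
print("Manager" in privileged)     # False, that element carries the hidden characters
```

In this analogue the injected `User` becomes a third element, and as a side effect the `Manager` entry no longer equals the plain string.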

Stretched Identifiers

My final variation is to stretch an identifier. In Ruby an identifier can be made of Unicode characters. For example:

😡 = 'Some error message'
STDERR.puts 😡

So let's put some of our Bidi control characters in our variable name:

role⁧⁦= 'Admin' #⁩⁦ # Condition will ensure 'User' !⁩⁦ = 'User'⁩⁩
if role⁧⁦ == 'Admin'
  puts 'admin!'
else
  puts 'regular user :('
end

There might be other things you can do with this besides assignment.

Doubt regarding early-return.py example

In the early-return.py example, I had thought

then <RLI> ''' ;return

would end up being rendered as

then nruter; '''

where every character from the end of the line back to the <RLI> is displayed one by one in reverse (there's an implicit <PDI> at the end of the line, right? Or does the end of a line not count as the end of a paragraph?).

But I suppose that's not how it works since it gets displayed as

then return; '''

Could someone help me understand this?
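A rough way to reason about it: inside a right-to-left isolate the runs are laid out from right to left, but a run of strong left-to-right characters (such as the Latin letters in `return`) keeps its internal order. The Python sketch below is a deliberate simplification of the Unicode Bidirectional Algorithm (UAX #9), not an implementation of it. It only handles text made of left-to-right letters and neutral characters, it ignores mirroring, and the function name is made up for this example.

```python
import unicodedata
from itertools import groupby


def approximate_rtl_display(text):
    """Approximate the visual order of `text` inside an RTL isolate.

    Simplification: strong left-to-right runs keep their internal order,
    every other run is reversed, and the runs themselves are laid out
    right to left.
    """
    runs = []
    for is_ltr, chars in groupby(
        text, key=lambda c: unicodedata.bidirectional(c) == "L"
    ):
        run = "".join(chars)
        runs.append(run if is_ltr else run[::-1])
    return "".join(reversed(runs))


print(approximate_rtl_display("''' ;return"))  # return; '''
```

Under this model the quotes, space and semicolon are reordered individually, while `return` moves as a single block, which matches the displayed `return; '''` rather than `nruter; '''`.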

That's a poor editor attack vector, not a compiler/code/interpreter one

[Screenshot: the code as displayed in PyCharm]

That's how IntelliJ PyCharm displays the code. Not even an issue with the right code editor. Vim users should just switch back to CP437 so that codepage does not interfere with the low-speed (300-1200 baud) terminals Vi(m) was developed for and the issue goes away.

invisible-function.c and homoglyph-function.c can't be built with gcc <= 9.1, but build successfully with gcc 11.2

gcc-9.1:

$ gcc-9.1 -v
Using built-in specs.
COLLECT_GCC=gcc-9.1
COLLECT_LTO_WRAPPER=/usr/local/gcc-9.1/libexec/gcc/x86_64-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../gcc-9.1.0/configure -v --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --prefix=/usr/local/gcc-9.1 --enable-checking=release --enable-languages=c,c++ --disable-multilib --program-suffix=-9.1
Thread model: posix
gcc version 9.1.0 (GCC)
$ gcc-9.1 invisible-function.c -o invisible-function
invisible-function.c:8:8: error: stray ‘\342’ in program
    8 | bool is��Admin() {
      |        ^
invisible-function.c:8:9: error: stray ‘\200’ in program
    8 | bool is��Admin() {
      |         ^
invisible-function.c:8:10: error: stray ‘\213’ in program
    8 | bool is�Admin() {
      |          ^
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
    8 | bool is​Admin() {
      |           ^~~~~
invisible-function.c: In function ‘main’:
invisible-function.c:13:11: error: stray ‘\342’ in program
   13 |     if (is��Admin()) {
      |           ^
invisible-function.c:13:12: error: stray ‘\200’ in program
   13 |     if (is��Admin()) {
      |            ^
invisible-function.c:13:13: error: stray ‘\213’ in program
   13 |     if (is�Admin()) {
      |             ^
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
   13 |     if (is​Admin()) {
      |         ^~
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:11: error: expected ‘)’ before ‘Admin’
   13 |     if (is��Admin()) {
      |        ~  ^  ~~~~~
      |           )
$ gcc-9.1 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:9: error: stray ‘\320’ in program
    7 | void say�ello() {
      |         ^
homoglyph-function.c:7:10: error: stray ‘\235’ in program
    7 | void say�ello() {
      |          ^
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
    7 | void sayНello() {
      |           ^~~~
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:8: error: stray ‘\320’ in program
   12 |     say�ello();
      |        ^
homoglyph-function.c:12:9: error: stray ‘\235’ in program
   12 |     say�ello();
      |         ^
homoglyph-function.c:12:5: error: unknown type name ‘say’
   12 |     sayНello();
      |     ^~~

gcc-6.3:

$ gcc-6 -v
Using built-in specs.
COLLECT_GCC=gcc-6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
$ gcc-6 invisible-function.c -o invisible-function
invisible-function.c:8:8: error: stray ‘\342’ in program
 bool is��Admin() {
        ^
invisible-function.c:8:9: error: stray ‘\200’ in program
 bool is��Admin() {
         ^
invisible-function.c:8:10: error: stray ‘\213’ in program
 bool is�Admin() {
          ^
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
 bool is​Admin() {
           ^~~~~
invisible-function.c: In function ‘main’:
invisible-function.c:13:11: error: stray ‘\342’ in program
     if (is��Admin()) {
           ^
invisible-function.c:13:12: error: stray ‘\200’ in program
     if (is��Admin()) {
            ^
invisible-function.c:13:13: error: stray ‘\213’ in program
     if (is�Admin()) {
             ^
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
     if (is​Admin()) {
         ^~
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
     if (is​Admin()) {
              ^~~~~
$ gcc-6 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:9: error: stray ‘\320’ in program
 void say�ello() {
         ^
homoglyph-function.c:7:10: error: stray ‘\235’ in program
 void say�ello() {
          ^
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
 void sayНello() {
           ^~~~
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:8: error: stray ‘\320’ in program
     say�ello();
        ^
homoglyph-function.c:12:9: error: stray ‘\235’ in program
     say�ello();
         ^
homoglyph-function.c:12:5: error: unknown type name ‘say’
     sayНello();
     ^~~

gcc-5.3.1:

$ gcc-5 -v
Using built-in specs.
COLLECT_GCC=gcc-5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.3.1-14ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2)
$ gcc-5 invisible-function.c -o invisible-function
invisible-function.c:8:1: error: stray ‘\342’ in program
 bool is​Admin() {
 ^
invisible-function.c:8:1: error: stray ‘\200’ in program
invisible-function.c:8:1: error: stray ‘\213’ in program
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
 bool is​Admin() {
           ^
invisible-function.c: In function ‘main’:
invisible-function.c:13:5: error: stray ‘\342’ in program
     if (is​Admin()) {
     ^
invisible-function.c:13:5: error: stray ‘\200’ in program
invisible-function.c:13:5: error: stray ‘\213’ in program
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
     if (is​Admin()) {
         ^
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
     if (is​Admin()) {
              ^
$ gcc-5 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:1: error: stray ‘\320’ in program
 void sayНello() {
 ^
homoglyph-function.c:7:1: error: stray ‘\235’ in program
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
 void sayНello() {
           ^
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:5: error: stray ‘\320’ in program
     sayНello();
     ^
homoglyph-function.c:12:5: error: stray ‘\235’ in program
homoglyph-function.c:12:5: error: unknown type name ‘say’

gcc-4.9.2:

$ gcc-4.9 -v
Using built-in specs.
COLLECT_GCC=gcc-4.9
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10+deb8u2' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10+deb8u2) 
$ gcc-4.9 invisible-function.c -o invisible-function
invisible-function.c:8:1: error: stray ‘\342’ in program
 bool is​Admin() {
 ^
invisible-function.c:8:1: error: stray ‘\200’ in program
invisible-function.c:8:1: error: stray ‘\213’ in program
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
 bool is​Admin() {
           ^
invisible-function.c: In function ‘main’:
invisible-function.c:13:5: error: stray ‘\342’ in program
     if (is​Admin()) {
     ^
invisible-function.c:13:5: error: stray ‘\200’ in program
invisible-function.c:13:5: error: stray ‘\213’ in program
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
     if (is​Admin()) {
         ^
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
     if (is​Admin()) {
              ^
$ gcc-4.9 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:1: error: stray ‘\320’ in program
 void sayНello() {
 ^
homoglyph-function.c:7:1: error: stray ‘\235’ in program
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
 void sayНello() {
           ^
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:5: error: stray ‘\320’ in program
     sayНello();
     ^
homoglyph-function.c:12:5: error: stray ‘\235’ in program
homoglyph-function.c:12:5: error: unknown type name ‘say’

gcc-11.2: successful

$ gcc-11.2 -v
Using built-in specs.
COLLECT_GCC=gcc-11.2
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../configure -v --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --enable-checking=release --enable-languages=c,c++ --disable-multilib --program-suffix=-11.2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)
$ gcc-11.2 invisible-function.c -o invisible-function
$ gcc-11.2 homoglyph-function.c -o homoglyph-function
$ ./invisible-function 
You are an admin
$ ./homoglyph-function 
Goodbye, World!
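The `stray` values in the older compilers' output are the raw UTF-8 bytes of the offending character, printed in octal. A short Python check (an illustration added here, not part of the original report; the helper name is hypothetical) decodes them:

```python
import unicodedata


def decode_octal_bytes(*byte_values):
    """Decode a sequence of byte values as UTF-8 text."""
    return bytes(byte_values).decode("utf-8")


invisible = decode_octal_bytes(0o342, 0o200, 0o213)  # from invisible-function.c
homoglyph = decode_octal_bytes(0o320, 0o235)         # from homoglyph-function.c

for ch in (invisible, homoglyph):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+200B ZERO WIDTH SPACE
# U+041D CYRILLIC CAPITAL LETTER EN
```

So the older compilers reject the same two characters that gcc 11.2 accepts as part of an identifier.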

Could we help developers detect and prevent issues of trojan source in their projects?

Hi folks,

I wanted to begin by saying thanks for sharing your research insights and raising awareness of potential threats to open source supply chain security.

Given the fact that many folks might be drawn here to the repository (or the website) and would be seeking a way to mitigate these issues, I was wondering if you'd want to update the README here and include a list of tools that help find and fix these issues?

Over the weekend, I built a package to detect Bidi characters in text, anti-trojan-source, which is handy for Node.js / npm folks to run instead of, say, a regex with grep. More importantly, there is also the eslint-plugin-anti-trojan-source package, which proactively helps developers prevent malicious commits when they happen.

Provide an example of proper use for reference

It would be interesting to see how a simple, non-malicious comment is seen by today's warnings.

I would recommend having at least

  • a copyright notice
  • a function comment à la // ask Joe why this works
  • a multi-paragraph comment explaining a concept.

Python3 returns syntax error:

  • I ran invisible-function.py in python3.7, but I get the following syntax error:

$ python3.7 invisible-function.py
File "invisible-function.py", line 7
def is_​admin():
^
SyntaxError: invalid character in identifier

$ python3.7 --version
Python 3.7.9

Perhaps it should be noted that 'invisible-function.py' does not work with Python 3.7 on macOS.

  • commenting-out.py does work on macOS with Python 3.7 and prints "You are an admin.". However, when looking at the code in the Atom IDE, part of the code is shown greyed out, although that is not a clear indicator of a problem.

[Screenshot: commenting-out.py as displayed in the Atom IDE]
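The SyntaxError reported above is consistent with Python's identifier rules, which can be checked without running the example file. The sketch assumes the invisible character in `is_admin` is U+200B, the same code point the gcc logs in the C issue point to; the Bidi control used for comparison is U+2067.

```python
import unicodedata

candidates = {
    "plain": "is_admin",
    "with ZWSP": "is_\u200badmin",  # assumed hidden character
    "with RLI": "is_\u2067admin",
}

for label, name in candidates.items():
    print(label, name.isidentifier())
# plain True
# with ZWSP False
# with RLI False

print(unicodedata.category("\u200b"))  # Cf (format character)
```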

Early return and comment out in languages without closing comment token?

Do the early return and comment out techniques depend on the language being able to close a comment? I noticed all the examples use languages of that type. Take Ruby for example. The only comments allowed in Ruby are ones that run all the way to the end of the line:

puts 'hello world' # here is a comment

Or, for multi-line comments, you can use this syntax, but the =begin and =end must be at the start of a line, so using =end in the middle of a line would not work:

puts 'hello world'
=begin
a multi
line comment
=end

There are a number of other languages out there that don't support /* comments with a closing token */ and I'm wondering if all those languages would be safe from the early return or comment out attack?

Tools to detect possible attacks

Are there any tools to examine the source code and point out parts where attacks are possible?

For different languages?

Maybe there are linting tools which can catch them as well?

Or tools with under-development features for this?
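As a starting point, a scanner for the Bidi control characters takes only a few lines. The sketch below is a hypothetical helper, not one of the tools mentioned in these issues. It flags the nine explicit directional formatting characters (embeddings, overrides, isolates and their terminators) and does not attempt homoglyph or invisible-character detection.

```python
# Explicit directional formatting characters and their abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(source):
    """Return (line, column, name) for each Bidi control in `source`.

    Lines and columns are 1-based; columns count code points.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits


sample = "role = 'user'\naccess = 'admin\u202e'\n"
for lineno, col, name in find_bidi_controls(sample):
    print(f"line {lineno}, column {col}: {name}")
# line 2, column 16: RLO
```

Legitimate right-to-left text can also contain these characters, so a hit is a prompt for review rather than proof of an attack.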
