nickboucher / trojan-source
Trojan Source: Invisible Vulnerabilities
Home Page: https://trojansource.codes
License: MIT License
This is a follow-up to my post on #8, but I am creating a separate issue because it focuses on variations of the stretched string rather than on the early return or comment out.
You were correct that I didn't mention the stretched string in #8 because I assumed it would work in Ruby as well. The stretched string doesn't interest me that much, though, because, as you noted in your paper:
there are other, perhaps simpler, ways that an adversary can cause a string comparison to fail without visual effect....such as the Zero Width Space
What might be nice is if we could use Bidi not only to cause a conditional to fail (just like ZWSP) but also to cause the conditional to pass. I don't think that is possible with a string, but we could stretch other things in the language to achieve it. Your paper really only mentions comments and strings as allowing Unicode, but depending on the grammar, other tokens can contain Unicode characters.
Below are three other types of stretching that let us make a conditional evaluate to true instead of making it fail. I am using Ruby because I know it well, but I am guessing some of these could apply to other languages. If they are of interest, perhaps we can improve the examples, see how they apply to other languages, and include them with your examples. With all of these examples the syntax highlighting somewhat gives the issue away, but perhaps in a big block of other code it wouldn't be noticed.
The most obvious alternative to a stretched string literal is a stretched regular expression. I work on a real-world application where the roles are stored as a comma-separated string for historical reasons. So an admin will have:
user.roles = 'admin,manager,user'
while a regular user might have just:
user.roles = 'user'
I could conceive of a method to see if the user is an admin defined as:
def admin?
roles =~ /admin/
end
Using that as my example scenario, consider the implementation below:
class User
attr_accessor :roles
def admin?
@roles =~ /admin|user/ #/ # Restrict from
end
end
user = User.new
user.roles = 'user,manager'
if user.admin?
puts 'admin!'
else
puts 'regular user :('
end
The comment seems a bit odd with the extra |, / and # characters. But none of that should matter, since everything after the # should be ignored. If you run the above, you would expect it to output regular user :( but instead it outputs admin!.
With more effort we might be able to reduce the extra characters in the comment. Also, in Ruby you can choose your regex delimiters if you want, so the same pattern can be written in several equivalent ways.
The ability to choose your delimiter might help you pick a character to appear in the comment that is more believable.
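As a minimal sketch of the delimiter choice mentioned above (the particular delimiters and the sample string are my own picks for illustration, not taken from the original post), Ruby's %r syntax accepts other delimiters and produces the same Regexp as the slash form:

```ruby
# Three spellings of the same regular expression literal.
slash  = /admin/
braces = %r{admin}
bang   = %r!admin!

patterns = [slash, braces, bang]

# Regexp#== compares the pattern source and options, not the delimiters used.
all_equal = patterns.all? { |re| re == slash }
puts all_equal

# Each form matches at the same position in a sample roles string.
positions = patterns.map { |re| re =~ 'user,admin' }
puts positions.inspect
```

An attacker only needs the delimiter character to look plausible inside the trailing comment, which is why the freedom to choose it matters here.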
Another thing we can stretch is a list of strings. In Ruby that is written as:
%w[one two three four five six]
This is just syntactic sugar for:
['one', 'two', 'three', 'four', 'five', 'six']
As with regex, we can choose our delimiter, so this is also the same:
%w!one two three four five six!
Now we can inject into our list with Bidi:
role = 'User'
privileged = %w!Admin Manager User! # ! # Don't include
if privileged.include? role
puts 'admin!'
else
puts 'regular user :('
end
Here I am using that feature to choose my delimiter and picking ! to make the comment more believable. ! is not normally used, so I could have also just made my comment say # Don't include User] # and someone might think it was just an extra character.
My final variation is to stretch an identifier. In Ruby an identifier can be made of Unicode characters. For example:
😡 = 'Some error message'
STDERR.puts 😡
So let's put some of our Bidi control characters in our variable name:
role= 'Admin' # # Condition will ensure 'User' ! = 'User'
if role == 'Admin'
puts 'admin!'
else
puts 'regular user :('
end
There might be other things you can do with this besides assignment.
Reported 5 years ago on the Go repository golang/go#20209
I see in your examples that you present homoglyph functions as a threat, but I am not sure that is within the scope of the Trojan Source paper, right?
However, how could it be detected?
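One possible heuristic, offered only as a rough sketch and not as how any existing tool works, is to flag identifier-like tokens that mix Latin letters with letters from another script. The helper name mixed_script_identifiers and the sample strings are hypothetical:

```ruby
# Hypothetical helper: returns identifier-like tokens that contain at least
# one Latin letter and at least one letter from a different script.
def mixed_script_identifiers(source)
  source.scan(/[\p{L}\p{N}_]+/).select do |token|
    token.match?(/\p{Latin}/) && token.match?(/[^\p{Latin}\p{N}_]/)
  end
end

latin    = "void sayHello() {}"
cyrillic = "void say\u041Dello() {}" # U+041D CYRILLIC CAPITAL LETTER EN

clean_hits   = mixed_script_identifiers(latin)
suspect_hits = mixed_script_identifiers(cyrillic)

puts clean_hits.inspect
# Print the code points of each flagged token so the odd character is visible.
suspect_hits.each do |token|
  puts token.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end
```

A check like this would also flag legitimate identifiers that intentionally combine scripts, so it is better suited to a warning than to a hard error.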
Hi folks,
I wanted to begin by saying thanks for sharing your research insights and raising awareness of potential threats to open source supply chain security.
Given the fact that many folks might be drawn here to the repository (or the website) and would be seeking a way to mitigate these issues, I was wondering if you'd want to update the README here and include a list of tools that help find and fix these issues?
Over the weekend, I built anti-trojan-source, a package that detects Bidi characters in text, which is handy for Node.js / npm folks to run instead of, say, a regex with grep. More importantly, I also built the eslint-plugin-anti-trojan-source package, which proactively helps developers prevent malicious commits when they happen.
gcc-9.1:
$ gcc-9.1 -v
Using built-in specs.
COLLECT_GCC=gcc-9.1
COLLECT_LTO_WRAPPER=/usr/local/gcc-9.1/libexec/gcc/x86_64-linux-gnu/9.1.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../gcc-9.1.0/configure -v --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --prefix=/usr/local/gcc-9.1 --enable-checking=release --enable-languages=c,c++ --disable-multilib --program-suffix=-9.1
Thread model: posix
gcc version 9.1.0 (GCC)
$ gcc-9.1 invisible-function.c -o invisible-function
invisible-function.c:8:8: error: stray ‘\342’ in program
8 | bool is��Admin() {
| ^
invisible-function.c:8:9: error: stray ‘\200’ in program
8 | bool is��Admin() {
| ^
invisible-function.c:8:10: error: stray ‘\213’ in program
8 | bool is�Admin() {
| ^
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
8 | bool isAdmin() {
| ^~~~~
invisible-function.c: In function ‘main’:
invisible-function.c:13:11: error: stray ‘\342’ in program
13 | if (is��Admin()) {
| ^
invisible-function.c:13:12: error: stray ‘\200’ in program
13 | if (is��Admin()) {
| ^
invisible-function.c:13:13: error: stray ‘\213’ in program
13 | if (is�Admin()) {
| ^
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
13 | if (isAdmin()) {
| ^~
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:11: error: expected ‘)’ before ‘Admin’
13 | if (is��Admin()) {
| ~ ^ ~~~~~
| )
$ gcc-9.1 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:9: error: stray ‘\320’ in program
7 | void say�ello() {
| ^
homoglyph-function.c:7:10: error: stray ‘\235’ in program
7 | void say�ello() {
| ^
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
7 | void sayНello() {
| ^~~~
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:8: error: stray ‘\320’ in program
12 | say�ello();
| ^
homoglyph-function.c:12:9: error: stray ‘\235’ in program
12 | say�ello();
| ^
homoglyph-function.c:12:5: error: unknown type name ‘say’
12 | sayНello();
| ^~~
gcc-6.3:
$ gcc-6 -v
Using built-in specs.
COLLECT_GCC=gcc-6
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
$ gcc-6 invisible-function.c -o invisible-function
invisible-function.c:8:8: error: stray ‘\342’ in program
bool is��Admin() {
^
invisible-function.c:8:9: error: stray ‘\200’ in program
bool is��Admin() {
^
invisible-function.c:8:10: error: stray ‘\213’ in program
bool is�Admin() {
^
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
bool isAdmin() {
^~~~~
invisible-function.c: In function ‘main’:
invisible-function.c:13:11: error: stray ‘\342’ in program
if (is��Admin()) {
^
invisible-function.c:13:12: error: stray ‘\200’ in program
if (is��Admin()) {
^
invisible-function.c:13:13: error: stray ‘\213’ in program
if (is�Admin()) {
^
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
if (isAdmin()) {
^~
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
if (isAdmin()) {
^~~~~
$ gcc-6 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:9: error: stray ‘\320’ in program
void say�ello() {
^
homoglyph-function.c:7:10: error: stray ‘\235’ in program
void say�ello() {
^
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
void sayНello() {
^~~~
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:8: error: stray ‘\320’ in program
say�ello();
^
homoglyph-function.c:12:9: error: stray ‘\235’ in program
say�ello();
^
homoglyph-function.c:12:5: error: unknown type name ‘say’
sayНello();
^~~
gcc-5.3.1:
$ gcc-5 -v
Using built-in specs.
COLLECT_GCC=gcc-5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.3.1-14ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2)
$ gcc-5 invisible-function.c -o invisible-function
invisible-function.c:8:1: error: stray ‘\342’ in program
bool isAdmin() {
^
invisible-function.c:8:1: error: stray ‘\200’ in program
invisible-function.c:8:1: error: stray ‘\213’ in program
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
bool isAdmin() {
^
invisible-function.c: In function ‘main’:
invisible-function.c:13:5: error: stray ‘\342’ in program
if (isAdmin()) {
^
invisible-function.c:13:5: error: stray ‘\200’ in program
invisible-function.c:13:5: error: stray ‘\213’ in program
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
if (isAdmin()) {
^
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
if (isAdmin()) {
^
$ gcc-5 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:1: error: stray ‘\320’ in program
void sayНello() {
^
homoglyph-function.c:7:1: error: stray ‘\235’ in program
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
void sayНello() {
^
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:5: error: stray ‘\320’ in program
sayНello();
^
homoglyph-function.c:12:5: error: stray ‘\235’ in program
homoglyph-function.c:12:5: error: unknown type name ‘say’
gcc-4.9.2:
$ gcc-4.9 -v
Using built-in specs.
COLLECT_GCC=gcc-4.9
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.9/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.9.2-10+deb8u2' --with-bugurl=file:///usr/share/doc/gcc-4.9/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.9 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.9 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.9.2 (Debian 4.9.2-10+deb8u2)
$ gcc-4.9 invisible-function.c -o invisible-function
invisible-function.c:8:1: error: stray ‘\342’ in program
bool isAdmin() {
^
invisible-function.c:8:1: error: stray ‘\200’ in program
invisible-function.c:8:1: error: stray ‘\213’ in program
invisible-function.c:8:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘Admin’
bool isAdmin() {
^
invisible-function.c: In function ‘main’:
invisible-function.c:13:5: error: stray ‘\342’ in program
if (isAdmin()) {
^
invisible-function.c:13:5: error: stray ‘\200’ in program
invisible-function.c:13:5: error: stray ‘\213’ in program
invisible-function.c:13:9: error: ‘is’ undeclared (first use in this function)
if (isAdmin()) {
^
invisible-function.c:13:9: note: each undeclared identifier is reported only once for each function it appears in
invisible-function.c:13:14: error: expected ‘)’ before ‘Admin’
if (isAdmin()) {
^
$ gcc-4.9 homoglyph-function.c -o homoglyph-function
homoglyph-function.c:7:1: error: stray ‘\320’ in program
void sayНello() {
^
homoglyph-function.c:7:1: error: stray ‘\235’ in program
homoglyph-function.c:7:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ello’
void sayНello() {
^
homoglyph-function.c: In function ‘main’:
homoglyph-function.c:12:5: error: stray ‘\320’ in program
sayНello();
^
homoglyph-function.c:12:5: error: stray ‘\235’ in program
homoglyph-function.c:12:5: error: unknown type name ‘say’
gcc-11.2: successful
$ gcc-11.2 -v
Using built-in specs.
COLLECT_GCC=gcc-11.2
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../configure -v --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --enable-checking=release --enable-languages=c,c++ --disable-multilib --program-suffix=-11.2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)
$ gcc-11.2 invisible-function.c -o invisible-function
$ gcc-11.2 homoglyph-function.c -o homoglyph-function
$ ./invisible-function
You are an admin
$ ./homoglyph-function
Goodbye, World!
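The stray bytes reported by the older GCC versions above are printed in octal. As a small sketch (the helper name decode_stray_bytes is my own, for illustration), decoding each group as UTF-8 shows which character the compiler is rejecting:

```ruby
# Hypothetical helper: takes the octal byte values from GCC's "stray" errors
# and decodes them as one UTF-8 sequence, returning the code points.
def decode_stray_bytes(*octal_strings)
  bytes = octal_strings.map { |s| s.to_i(8) }
  bytes.pack('C*').force_encoding('UTF-8').codepoints.map { |cp| format('U+%04X', cp) }
end

invisible = decode_stray_bytes('342', '200', '213')
homoglyph = decode_stray_bytes('320', '235')

p invisible # ["U+200B"]
p homoglyph # ["U+041D"]
```

U+200B is ZERO WIDTH SPACE and U+041D is CYRILLIC CAPITAL LETTER EN, which is consistent with the invisible-function and homoglyph-function file names.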
Do the early return and comment out techniques depend on the language being able to close a comment in the middle of a line? I noticed all the examples use languages of that type. Take Ruby, for example. The only single-line comments allowed in Ruby are ones that run all the way to the end of the line:
puts 'hello world' # here is a comment
Or for multi-line comments you can use this syntax, but the =begin and =end must be at the start of a line, so using =end in the middle of a line would not work:
puts 'hello world'
=begin
a multi
line comment
=end
There are a number of other languages out there that don't support /* comments with a closing token */ and I'm wondering whether all those languages would be safe from the early return or comment out attack?
Are there any tools to examine the source code and point out parts where attacks are possible?
For different languages?
Maybe there are linting tools which can catch them as well?
Or tools with under-development features for this?
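For illustration only, a very small scanner can be sketched in a few lines. This is a minimal sketch, not a substitute for a real linter: the helper name find_bidi_controls and the sample input are hypothetical, and it only looks for the Bidi embedding, override and isolate characters (U+202A to U+202E and U+2066 to U+2069):

```ruby
# Bidi control characters: embeddings/overrides U+202A..U+202E and
# isolates U+2066..U+2069.
BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/

# Hypothetical helper: returns [line_number, column, code_point] entries
# (both positions 1-based) for every Bidi control character in the text.
def find_bidi_controls(source)
  findings = []
  source.each_line.with_index(1) do |line, line_number|
    line.each_char.with_index(1) do |char, column|
      next unless char.match?(BIDI_CONTROLS)
      findings << [line_number, column, format('U+%04X', char.ord)]
    end
  end
  findings
end

# Sample input with a RIGHT-TO-LEFT OVERRIDE hidden in a comment on line 2.
sample = "x = 1\nif role == 'admin' # \u202E check\n"
findings = find_bidi_controls(sample)
findings.each { |line, col, cp| puts "line #{line}, column #{col}: #{cp}" }
```

The same function could be run over each file in a repository, for example by passing it the result of File.read with UTF-8 encoding.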
In the early-return.py example, I had thought
then <RLI> ''' ;return
would end up being rendered as
then nruter; '''
where every character from the end of the line back to the <RLI> is displayed one by one in reverse (there's an implicit <PDI> at the end of the line, right? Or does this end of line not count as an end of paragraph?).
But I suppose that's not how it works since it gets displayed as
then return; '''
Could someone help me understand this?
$ python3.7 invisible-function.py
File "invisible-function.py", line 7
def is_admin():
^
SyntaxError: invalid character in identifier
$ python3.7 --version
Python 3.7.9
Perhaps it should be noted that 'invisible-function.py' does not work with Python 3.7 on macOS.
It would be interesting to see how a simple, non-malicious comment is treated by today's warnings.
I would recommend having at least
// ask Joe why this works