s-rah / onionscan

OnionScan is a free and open source tool for investigating the Dark Web.
Home Page: https://twitter.com/OnionScan
License: Other
To produce reports in the user's native language, we could upload the SimpleReport strings to e.g. Transifex to get them translated into various languages, then import these translations periodically and map them onto the output when requested, or based on the user's locale.
We output the raw scan data; we should also optionally output SimpleReport as JSON (both as part of the normal JSON output and as an independent blob).
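A minimal sketch of serializing the report as an independent JSON blob. The `SimpleReport` struct and its fields here are illustrative stand-ins, not the project's actual type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SimpleReport is a hypothetical, trimmed-down stand-in for the real
// report type; field names are illustrative only.
type SimpleReport struct {
	HiddenService string   `json:"hiddenService"`
	RiskLevel     string   `json:"riskLevel"`
	Findings      []string `json:"findings"`
}

// ToJSON serializes the report as an independent, pretty-printed JSON blob.
func (r *SimpleReport) ToJSON() (string, error) {
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	r := SimpleReport{
		HiddenService: "example.onion",
		RiskLevel:     "high",
		Findings:      []string{"apache mod_status exposed"},
	}
	out, _ := r.ToJSON()
	fmt.Println(out)
}
```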
I'm running Ubuntu GNOME 15.04 with golang package installed.
~/onionscan$ ./onionscan.go
./onionscan.go: line 1: package: command not found
./onionscan.go: line 3: syntax error near unexpected token `newline'
./onionscan.go: line 3: `import ('
Unlike other darknet hosts, I've been running an ethical/lawful website for years,
and the appearance of this tool makes me... nervous.
The good news is I don't use Apache, and the server is Tor-exclusive.
Do you have a plan to release an online web tool that anyone can check easily (like Qualys SSL Labs)?
Also, what do you recommend as an "image anonymizer" (change time & strip metadata) for Linux?
While running OnionScan today, I noted an unusual edge where a service is not on the expected port.
2016/08/11 09:57:12 ERROR: Get http://xxxxxxxxxxxxxxxx.onion/images: malformed HTTP response "SSH-2.0-OpenSSH_7.2"
Effectively, OnionScan tries to run HTTP fingerprinting on an SSH service due to the SSH daemon being bound to port 80.
The suggested enhancement that I can think of is to do some rudimentary banner parsing on connect and engage the correct fingerprinting engine based on the response.
Things such as X-Powered-By will reveal the PHP version.
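The banner-parsing suggestion above could be sketched roughly as follows. The idea is that speak-first protocols like SSH and SMTP announce themselves on connect, so a short read is enough to pick a fingerprinting engine regardless of the port; the protocol names returned here are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// DetectProtocol does rudimentary banner parsing: servers like SSH and
// SMTP speak first, so the first bytes read on connect often identify
// the service regardless of which port it is bound to.
func DetectProtocol(banner string) string {
	switch {
	case strings.HasPrefix(banner, "SSH-"):
		return "ssh"
	case strings.HasPrefix(banner, "220 "):
		return "smtp"
	case strings.HasPrefix(banner, "HTTP/"), banner == "":
		// HTTP servers wait for the client to speak; an empty banner
		// after a short read timeout suggests a speak-second protocol.
		return "http"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(DetectProtocol("SSH-2.0-OpenSSH_7.2"))
}
```

With this in place the flow becomes: connect, read briefly, then hand off to the matching fingerprinting engine instead of assuming HTTP on port 80.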
OnionScan 0.2 will add a large number of scans and features. These need to be documented somewhere - main features in the README and more detailed usage examples somewhere new (to be defined...maybe readthedocs or something)
There's a 2015 research paper that presented a tool, CARONTE, that attempts to find a number of configuration issues that can cause location leaks:
https://software.imdea.org/~juanca/papers/caronte_ccs15.pdf
In particular, they talk about:
The techniques turned out to be mildly successful:
We apply CARONTE to 1,974 hidden services, fully recovering the IP address of 100 (5%) of them.
Maybe some of it is helpful to onionscan!
I have occasionally observed some onions serving traffic on the wrong port e.g. SSH on port 25 or SSH on port 5900 - these behaviors could be intentional or misconfigurations. Probably need to refactor the flow to Check Port -> Detect Protocol -> Fingerprint -> Scan.
May make sense to look for Bitcoin (etc.) private keys in Wallet Import Format.
These are also represented in base58 but have different prefix bytes and sizes, so this would be a relatively small change to deanonymization/check_bitcoin_addresses.go.
This would be another CRITICAL (or at least HIGH) risk for the simple report.
I have written a testing tool for the Ricochet (https://ricochet.im) protocol, called Recoil (https://github.com/s-rah/recoil). #3 opens the idea of extending this tool past traditional web scanning - to that end it would be nice to add a little bit of protocol level detection (ssh/http/ricochet etc.) and trigger the appropriate scanning based off of that.
Recoil should be fairly easy to integrate since it is written in Go too.
I've heard from more than a couple of people already who have wrapped OnionScan in python/perl/bash just to scan more than 1 domain.
Let's make this easy and provide a -f option: each domain on a single line.
If we come across an armored key in any form we should process it, mostly to pull out email and version string if there is one.
See #15
As a follow-on from #3, it would be interesting to have an option to connect results and scanning up to a database.
Correlating data across multiple onions can lead to some amusing results. Such as finding that a bunch of onions all share the same SSH fingerprint (hosting providers...) where the data by itself is not much help, but when you correlate it all, you end up being able to deanonymize a whole cluster of onions in one go because one of the hosts on the server is leaking something interesting.
This could be connected up to external API's such as Shodan somehow on a schedule with re-checking of things like SSH keys, etc every now and then.
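The clustering step described above could be sketched as a simple inversion of scan results, assuming a map of onion to SSH fingerprint has already been collected:

```go
package main

import "fmt"

// GroupByFingerprint clusters onions by shared SSH key fingerprint,
// the cross-onion correlation described above. The input is a
// hypothetical map of onion address -> fingerprint from earlier scans.
func GroupByFingerprint(scans map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for onion, fp := range scans {
		groups[fp] = append(groups[fp], onion)
	}
	return groups
}

func main() {
	scans := map[string]string{
		"aaa.onion": "SHA256:abc",
		"bbb.onion": "SHA256:abc", // same host key => likely same server
		"ccc.onion": "SHA256:def",
	}
	fmt.Println(len(GroupByFingerprint(scans)["SHA256:abc"]))
}
```

Any group with more than one member is a candidate cluster: a leak on any one of its onions potentially deanonymizes them all.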
This is taking up a lot of space in main.go; we should move it out and refactor it to remove some of the duplication.
bin/onionscan -torProxyAddress 127.0.0.1:9050 -verbose -jsonReport http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 Starting Scan of http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 This might take a few minutes..
2016/04/11 20:48:30 Error running scanner: Get http://http://legionhiden4dqh4.onion/: Can't complete SOCKS5 connection.
tor message:
20:48:30 [NOTICE] Application asked to connect to port 0. Refusing. [18x duplicates hidden]
SocksPort 127.0.0.1:9050 # Default: Bind to localhost:9050 for local connections.
tor is being started by arm
tor is latest git pull on latest yosemite
See #15
This involves another web scanner on port 443 (also detection on port 80).
It would be nice to have some grasp of what the page is about - <title>
should be good enough, and is another possible fingerprinting mechanism.
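A minimal sketch of pulling out the title; a naive regex is enough for a fingerprinting hint, though a real crawler would use a proper HTML parser:

```go
package main

import (
	"fmt"
	"regexp"
)

// titleRegexp grabs the contents of the first <title> element,
// case-insensitively and across newlines. Good enough as a
// fingerprinting hint; not a substitute for real HTML parsing.
var titleRegexp = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// PageTitle returns the page title, or "" if none was found.
func PageTitle(body string) string {
	m := titleRegexp.FindStringSubmatch(body)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(PageTitle("<html><head><title>Hidden Wiki</title></head></html>"))
}
```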
While following directory listings can be fruitful, it can also be fairly expensive in terms of time and bandwidth. Provide a -d option to limit how deep we scan (value 0, the default, meaning scan everything).
OnionScan should support connecting to .i2p eepsites - they can suffer the same opsec issues as Tor hidden services.
Currently we assume all or nothing. It would be nice if we could configure this behavior. I can imagine a few levels (where each level includes the ones before it) like:
See #15
We can collect all external links by checking the HTML source code (optionally JavaScript) and find links to clearnet sites (High Risk) and other onion sites (Low Risk).
and so on.
What do you think about it?
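The classification described above could be sketched as follows; the risk labels and the "relative" bucket are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ClassifyLink labels an extracted href along the lines of the
// proposal: links to clearnet sites are higher risk than links to
// other onions. The label strings are illustrative only.
func ClassifyLink(href string) string {
	u, err := url.Parse(href)
	if err != nil || u.Host == "" {
		// Unparseable or relative link: stays on the same service.
		return "relative"
	}
	if strings.HasSuffix(u.Hostname(), ".onion") {
		return "low-risk-onion"
	}
	return "high-risk-clearnet"
}

func main() {
	fmt.Println(ClassifyLink("http://example.com/a"))
	fmt.Println(ClassifyLink("http://abcdefgh.onion/"))
}
```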
At the moment any fingerprinting would have to be done by an extra process (or by hand) - it would be nice to automate some of this within the scanner - most likely requiring a database of pre-scanned hidden services.
Currently we drop resources larger than 2MB because of limitations with the database - regardless of which backing store we end up with in the future, we are likely always going to need to chunk blobs.
We should
Configure a max object size for downloaded resources (probably defaulting to around 10MB)
Split resources smaller than this into <1MB chunks and store them in the DB in a way that can be put back together later, e.g.
resource: {
    url       url.URL
    data      []byte
    nextChunk int // id of the resource chunk containing the next part of the data
}
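The chunking step could be sketched as below; the 1MB chunk size follows the note above, and the function names are assumptions:

```go
package main

import "fmt"

const chunkSize = 1 << 20 // 1MB per chunk, per the note above

// SplitChunks breaks a downloaded resource into <=1MB pieces that can
// be stored as linked records (each pointing at the next chunk's id)
// and concatenated back together on retrieval.
func SplitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	blob := make([]byte, 2*chunkSize+5)
	fmt.Println(len(SplitChunks(blob))) // 1MB + 1MB + 5 bytes => 3 chunks
}
```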
There have been discussions and suggestions in #3 and #6 about using external services such as TinEye and Shodan to compare collected fingerprints. Since hidden services may be scanned by OnionScan, possibly even by their owners, all clearnet IP accesses made by OnionScan should have the option of being routed through an anonymising service, to reduce the chance of correlating scanners with sites.
Most sites aren't single page, we should crawl the site to find issues e.g. a PGP key being on the /contact page. Depends on #32
Proposed by @JosephGregg in #2
"Have you considered doing reverse image searches against the tineye database using their API? Just an idea... https://services.tineye.com/TinEyeAPI - of course you would be sending images from hidden service to clearnet.."
This looks like it might be an interesting avenue.
Hello, checking specific HTTP headers might be useful. Maybe a new scanner, like http_headers_scanner?
In the future it could be expanded to cover HTTP injection flaws.
See the OWASP recommendations: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
What do you think about it?
I feel it might be helpful to add support for popular XMPP (Extensible Messaging and Presence Protocol) servers.
Fingerprinting popular XMPP servers might yield a significant amount of useful data, and I don't feel it's at all an unrealistic scenario: it's not uncommon for public-facing XMPP servers to also cater for access through non-public TLD special-use suffixes.
I feel it would also help to consider collecting and parsing XMPP server-side X.509 credentials, as they may contain useful identifying information about other hostnames or IP addresses, help ascertain whether other XMPP servers exist, or establish potential co-hosting of other services.
It might also aid identification and correlation of public-facing XMPP servers or other services built upon prior assumptions.
Some of the new improvements, e.g. spider/ and the bitcoin changes, have dramatically increased the expected scan time for certain sites. For example, scanning for onion peers in bitcoin takes a rather long time, and a user configuring that together with a small timeout should probably be warned that it is a bad idea.
On top of that, we need to put some thought into why timeouts exist and how they can be helpful. Some thoughts:
Sometimes we find identifiers like Bitcoin addresses commented out in code - we still extract these because we run a very simple regex across the page snapshot. OnionScan should tell the user when we have found something in plain sight versus when we have discovered it unintentionally.
This will likely involve filtering out the text as part of the spider crawl and storing it in Page - perhaps also filtering out comments into their own section too - that way we don't need the entire page snapshot.
Output should be via SimpleReport.
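The filtering step could be sketched as below; a naive regex over HTML comments is used for illustration, though real parsing would likely use an HTML tokenizer during the spider crawl:

```go
package main

import (
	"fmt"
	"regexp"
)

// commentRegexp pulls HTML comments out of a page snapshot so that
// identifiers found inside them can be flagged as hidden rather than
// in plain sight. A real implementation would use an HTML tokenizer.
var commentRegexp = regexp.MustCompile(`(?s)<!--(.*?)-->`)

// SplitVisible returns the page with comments removed, plus the
// comment bodies, so each can be scanned and reported separately.
func SplitVisible(body string) (visible string, comments []string) {
	for _, m := range commentRegexp.FindAllStringSubmatch(body, -1) {
		comments = append(comments, m[1])
	}
	visible = commentRegexp.ReplaceAllString(body, "")
	return
}

func main() {
	v, c := SplitVisible("<p>hi</p><!-- 1ExampleBitcoinAddr -->")
	fmt.Println(v, c)
}
```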
Expanding OnionScan to check the whole site to, for example, find encryption keys (#20) - means refactoring protocols/standard_page_scan.go to:
It would be good to support dumping a site from the database in WARC format i.e. https://www.iso.org/obp/ui/#iso:std:iso:28500:ed-1:v1:en
There are a bunch of Go libraries in various states of repair for dealing with the WARC format; we should probably first evaluate their suitability.
Since this relies on native libraries being run, I'd very much like to deprecate this in favor of a pure golang solution.
I've noted that some sites seem to trigger a scan hang. I have set the timeout (-timeout 1) but the scan still seems to hang right after the mod_status check.
onionscan -timeout 1 -depth 0 -verbose hafacwgmrntoolno.onion
2016/06/14 11:43:56 Starting Scan of hafacwgmrntoolno.onion
2016/06/14 11:43:56 This might take a few minutes..
2016/06/14 11:43:56 Checking hafacwgmrntoolno.onion http(80)
2016/06/14 11:43:56 Found potential service on http(80)
2016/06/14 11:43:59 HTTP response headers:
2016/06/14 11:43:59 CONTENT-TYPE : text/html
2016/06/14 11:43:59 VARY : Accept-Encoding
2016/06/14 11:43:59 X-FRAME-OPTIONS : sameorigin
2016/06/14 11:43:59 X-XSS-PROTECTION : 1; mode=block
2016/06/14 11:43:59 ACCEPT-RANGES : bytes
2016/06/14 11:43:59 X-CONTENT-TYPE-OPTIONS : nosniff
2016/06/14 11:43:59 DATE : Tue, 14 Jun 2016 18:46:51 GMT
2016/06/14 11:43:59 SERVER : Apache
2016/06/14 11:43:59 LAST-MODIFIED : Wed, 02 Sep 2015 09:26:18 GMT
2016/06/14 11:43:59 ETAG : "ac27-51ec04248c771-gzip"
2016/06/14 11:44:00 Apache mod_status Not Exposed...Good!
Any idea why this might be occurring? I'm running a new version of OnionScan that I just grabbed.
Include images shown on the page using CSS: url() / background / background-image
Check for web font imports, as these can be hosted on external sites, and be used to identify repeat users http://www.itbusiness.ca/news/44120/44120
It would be nice to specify new attacks as JSON files that can be interpreted by the scanner. E.g. something like:
{
"name":"Apache mod_status is Accessible",
"location":"/server-status",
"requirements": [
{"equals": ["http-status-code", 200]},
{"contains":["contents","Server Version: (.*)</dt>"]}
]
}
With extra reporting options and such, this would clean the reporting code up.
user@ubuntu:~/go/src/github.com/s-rah/onionscan$ go run onionscan.go
/home/user/onion/src/golang.org/x/crypto/ed25519/ed25519.go:54:66: error: reference to undefined identifier ‘crypto.SignerOpts’
func (priv PrivateKey) Sign(rand io.Reader, message []byte, opts crypto.SignerOpts) (signature []byte, err error) {
^
lool@ubuntu:~/go/src/github.com/s-rah/onionscan$
A lot of hidden services (close to 3% in my last big scan) are configured so that the .onion address serves all ports.
If SSH is being served, you can grab the key fingerprint and sometimes uncloak the HS by checking it against Shodan or your own database of scans.
Example code (in Python) to do this is here: https://github.com/0x27/ssh_keyscanner
Take the sha1/md5/whatever of each image on the front page of the site. This will feed into the fingerprint later on.
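A minimal sketch of the hashing step, using SHA-1 as one of the suggested digests; the function name is an assumption:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// HashImage computes the SHA-1 digest of image bytes as a hex string.
// Identical images produce identical digests, so these can be compared
// across sites to feed into the fingerprint later on.
func HashImage(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(HashImage([]byte("fake-image-bytes")))
}
```

Note SHA-1 only matches byte-identical files; catching re-encoded or resized copies of the same image would need a perceptual hash instead.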
Use cases:
This would be a critical server failure.
Currently SimpleReport is the only kind of post-analytics we do. This can definitely be expanded.
Some examples of post-processing steps we likely want in the core onionscan base:
Any analytics performed by OnionScan should be modular and configurable. It might make sense for OnionScan to accept a json formatted config file detailing the exact flow that it should undertake.
At the same time, we should try to minimize the amount of code dedicated to analytics that is best performed by other dedicated applications (one example that comes to mind is stylometry that requires ML models and databases of known samples - we likely do not want to support that).
We currently only check known IRC ports; we don't do much in the way of confirmation. We should connect (and, in the case of IRCS, pull the X.509 certificate).
We may want to consider snapshotting the IRC welcome message and channel list also.
One of the dependencies, HouzuoGuo/tiedot, explicitly only works on x86_64 right now, and fails to build on ARM32:
tiedot should be compiled/run on x86-64 systems. If you decide to compile tiedot on 32-bit systems, the following integer-smear algorithm will cause compilation failure due to 32-bit integer overflow; therefore you must modify the algorithm. Do not remove the integer-smear process, and remember to run test cases to verify your mods.
Doesn't seem that difficult to solve (possibly just replacing int by uint64); however, this needs to be resolved before onionscan can run on most Android devices and such.
Edit: There's an upstream issue for this, HouzuoGuo/tiedot#68, and it's being worked on; there is even a 32-bit branch, but it isn't integrated into mainline yet.