s-rah / onionscan

OnionScan is a free and open source tool for investigating the Dark Web.
Home Page: https://twitter.com/OnionScan
License: Other
To produce reports in the user's native language, we could upload the SimpleReport strings to e.g. Transifex to get them translated into various languages, then import these translations periodically and map them onto the output when requested, or based on the user's locale.
We output the raw scan data; we should also optionally output SimpleReport as JSON (both as part of the normal JSON output and as an independent blob).
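A minimal sketch of serializing the report as an independent JSON blob. The `SimpleReport` struct and its fields here are illustrative stand-ins, not the project's actual type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SimpleReport is a hypothetical, trimmed-down stand-in for the real
// report type; field names are illustrative only.
type SimpleReport struct {
	HiddenService string   `json:"hiddenService"`
	RiskLevel     string   `json:"riskLevel"`
	Findings      []string `json:"findings"`
}

// ToJSON serializes the report as an independent, pretty-printed JSON blob.
func (r *SimpleReport) ToJSON() (string, error) {
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	r := SimpleReport{
		HiddenService: "example.onion",
		RiskLevel:     "high",
		Findings:      []string{"apache mod_status exposed"},
	}
	out, _ := r.ToJSON()
	fmt.Println(out)
}
```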
I'm running Ubuntu GNOME 15.04 with golang package installed.
~/onionscan$ ./onionscan.go
./onionscan.go: line 1: package: command not found
./onionscan.go: line 3: syntax error near unexpected token `newline'
./onionscan.go: line 3: `import ('
Unlike other darknet hosts, I've been running an ethical/lawful website for years,
and the appearance of this tool makes me... nervous.
The good news is I don't use Apache, and the server is Tor-exclusive.
Do you have a plan to release an online web tool that anyone can check easily (like Qualys SSL Labs)?
Also, what do you recommend as an "image anonymizer" (change time & strip metadata) for Linux?
While running OnionScan today, I noted an unusual edge where a service is not on the expected port.
2016/08/11 09:57:12 ERROR: Get http://xxxxxxxxxxxxxxxx.onion/images: malformed HTTP response "SSH-2.0-OpenSSH_7.2"
Effectively, OnionScan tries to run HTTP fingerprinting on an SSH service due to the SSH daemon being bound to port 80.
The suggested enhancement that I can think of is to do some rudimentary banner parsing on connect and engage the correct fingerprinting engine based on the response.
Things such as X-Powered-By will reveal the PHP version.
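The banner-parsing suggestion above could be sketched roughly as follows. The idea is that speak-first protocols like SSH and SMTP announce themselves on connect, so a short read is enough to pick a fingerprinting engine regardless of the port; the protocol names returned here are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// DetectProtocol does rudimentary banner parsing: servers like SSH and
// SMTP speak first, so the first bytes read on connect often identify
// the service regardless of which port it is bound to.
func DetectProtocol(banner string) string {
	switch {
	case strings.HasPrefix(banner, "SSH-"):
		return "ssh"
	case strings.HasPrefix(banner, "220 "):
		return "smtp"
	case strings.HasPrefix(banner, "HTTP/"), banner == "":
		// HTTP servers wait for the client to speak; an empty banner
		// after a short read timeout suggests a speak-second protocol.
		return "http"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(DetectProtocol("SSH-2.0-OpenSSH_7.2"))
}
```

With this in place the flow becomes: connect, read briefly, then hand off to the matching fingerprinting engine instead of assuming HTTP on port 80.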
OnionScan 0.2 will add a large number of scans and features. These need to be documented somewhere - main features in the README and more detailed usage examples somewhere new (to be defined...maybe readthedocs or something)
There's a 2015 research paper that presented a tool, CARONTE, that attempts to find a number of configuration issues that can cause location leaks:
https://software.imdea.org/~juanca/papers/caronte_ccs15.pdf
In particular, they talk about:
The techniques turned out to be mildly successful:
We apply CARONTE to 1,974 hidden services, fully recovering the IP address of 100 (5%) of them.
Maybe some of it is helpful to onionscan!
I have occasionally observed some onions serving traffic on the wrong port e.g. SSH on port 25 or SSH on port 5900 - these behaviors could be intentional or misconfigurations. Probably need to refactor the flow to Check Port -> Detect Protocol -> Fingerprint -> Scan.
May make sense to look for Bitcoin (etc.) private keys in Wallet Import Format.
These are also represented in base58 but have different prefix bytes and sizes, so this would be a relatively small change to deanonymization/check_bitcoin_addresses.go.
This would be another CRITICAL (or at least HIGH) risk for the simple report.
I have written a testing tool for the Ricochet (https://ricochet.im) protocol, called Recoil (https://github.com/s-rah/recoil). #3 opens the idea of extending this tool past traditional web scanning - to that end it would be nice to add a little bit of protocol level detection (ssh/http/ricochet etc.) and trigger the appropriate scanning based off of that.
Recoil should be fairly easy to integrate since it is written in Go too.
I've heard from more than a couple of people already who have wrapped OnionScan in python/perl/bash just to scan more than 1 domain.
Let's make this easy and provide a -f option: each domain on a single line.
If we come across an armored key in any form we should process it, mostly to pull out email and version string if there is one.
See #15
As a follow-on from #3, it would be interesting to have an option to connect results and scanning up to a database.
Correlating data across multiple onions can lead to some amusing results. Such as finding that a bunch of onions all share the same SSH fingerprint (hosting providers...) where the data by itself is not much help, but when you correlate it all, you end up being able to deanonymize a whole cluster of onions in one go because one of the hosts on the server is leaking something interesting.
This could be connected up to external API's such as Shodan somehow on a schedule with re-checking of things like SSH keys, etc every now and then.
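The clustering step described above could be sketched as a simple inversion of scan results, assuming a map of onion to SSH fingerprint has already been collected:

```go
package main

import "fmt"

// GroupByFingerprint clusters onions by shared SSH key fingerprint,
// the cross-onion correlation described above. The input is a
// hypothetical map of onion address -> fingerprint from earlier scans.
func GroupByFingerprint(scans map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for onion, fp := range scans {
		groups[fp] = append(groups[fp], onion)
	}
	return groups
}

func main() {
	scans := map[string]string{
		"aaa.onion": "SHA256:abc",
		"bbb.onion": "SHA256:abc", // same host key => likely same server
		"ccc.onion": "SHA256:def",
	}
	fmt.Println(len(GroupByFingerprint(scans)["SHA256:abc"]))
}
```

Any group with more than one member is a candidate cluster: a leak on any one of its onions potentially deanonymizes them all.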
This is taking up a lot of space in main.go; we should move it out and refactor it to remove some of the duplication.
bin/onionscan -torProxyAddress 127.0.0.1:9050 -verbose -jsonReport http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 Starting Scan of http://legionhiden4dqh4.onion/
2016/04/11 20:48:30 This might take a few minutes..
2016/04/11 20:48:30 Error running scanner: Get http://http://legionhiden4dqh4.onion/: Can't complete SOCKS5 connection.
tor message:
20:48:30 [NOTICE] Application asked to connect to port 0. Refusing. [18x duplicates hidden]
SocksPort 127.0.0.1:9050 # Default: Bind to localhost:9050 for local connections.
tor is being started by arm
tor is latest git pull on latest yosemite
See #15
This involves another web scanner on port 443 (also detection on port 80).
It would be nice to have some grasp of what the page is about - <title>
should be good enough, and is another possible fingerprinting mechanism.
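A minimal sketch of pulling out the title; a naive regex is enough for a fingerprinting hint, though a real crawler would use a proper HTML parser:

```go
package main

import (
	"fmt"
	"regexp"
)

// titleRegexp grabs the contents of the first <title> element,
// case-insensitively and across newlines. Good enough as a
// fingerprinting hint; not a substitute for real HTML parsing.
var titleRegexp = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)

// PageTitle returns the page title, or "" if none was found.
func PageTitle(body string) string {
	m := titleRegexp.FindStringSubmatch(body)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(PageTitle("<html><head><title>Hidden Wiki</title></head></html>"))
}
```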
While following directory listings can be fruitful, it can also be fairly expensive in terms of time and bandwidth. Provide a -d option to limit how deep we scan (value 0, the default, meaning scan everything).
OnionScan should support connecting to .i2p eepsites - they can suffer the same opsec issues as Tor hidden services.
Currently we assume all or nothing. It would be nice if we could configure this behavior. I can imagine a few levels (where each level includes the ones before it) like:
See #15
We can collect all external links by checking the HTML source code (optionally JavaScript) and find links to clearnet sites (High Risk) and other onion sites (Low Risk).
and so on.
What do you think about it?
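The classification described above could be sketched as follows; the risk labels and the "relative" bucket are assumptions for illustration:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ClassifyLink labels an extracted href along the lines of the
// proposal: links to clearnet sites are higher risk than links to
// other onions. The label strings are illustrative only.
func ClassifyLink(href string) string {
	u, err := url.Parse(href)
	if err != nil || u.Host == "" {
		// Unparseable or relative link: stays on the same service.
		return "relative"
	}
	if strings.HasSuffix(u.Hostname(), ".onion") {
		return "low-risk-onion"
	}
	return "high-risk-clearnet"
}

func main() {
	fmt.Println(ClassifyLink("http://example.com/a"))
	fmt.Println(ClassifyLink("http://abcdefgh.onion/"))
}
```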
At the moment any fingerprinting would have to be done by an extra process (or by hand) - it would be nice to automate some of this within the scanner - most likely requiring a database of pre-scanned hidden services.
Currently we drop resources larger than 2MB because of limitations with the database - regardless of which backing store we end up with in the future, we are likely always going to need to chunk blobs.
We should
Configure a max object size for downloaded resources (probably defaulting to around 10MB)
Split resources smaller than this into <1MB chunks and store them in the DB in a way that can be put back together later, e.g.
resource: {
    url       url.URL
    data      []byte
    nextChunk int // id of the resource chunk containing the next part of the data
}
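The chunking step could be sketched as below; the 1MB chunk size follows the note above, and the function names are assumptions:

```go
package main

import "fmt"

const chunkSize = 1 << 20 // 1MB per chunk, per the note above

// SplitChunks breaks a downloaded resource into <=1MB pieces that can
// be stored as linked records (each pointing at the next chunk's id)
// and concatenated back together on retrieval.
func SplitChunks(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	blob := make([]byte, 2*chunkSize+5)
	fmt.Println(len(SplitChunks(blob))) // 1MB + 1MB + 5 bytes => 3 chunks
}
```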
There have been discussions and suggestions in #3 and #6 about using external services such as TinEye and Shodan to compare collected fingerprints. Since hidden services may be scanned by OnionScan, possibly even by their owners, all clearnet IP accesses made by OnionScan should have the option of being routed through an anonymising service, to reduce the chance of correlating scanners with sites.
Most sites aren't single page, we should crawl the site to find issues e.g. a PGP key being on the /contact page. Depends on #32
Proposed by @JosephGregg in #2
"Have you considered doing reverse image searches against the tineye database using their API? Just an idea... https://services.tineye.com/TinEyeAPI - of course you would be sending images from hidden service to clearnet.."
This looks like it might be an interesting avenue.
Hello, checking specific HTTP headers might be useful. Maybe a new scanner, like http_headers_scanner?
In the future it could be expanded to cover HTTP injection flaws.
See the OWASP recommendations: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
What do you think about it?
I feel it might be helpful to add support for popular XMPP (Extensible Messaging and Presence Protocol) servers.
Fingerprinting popular XMPP servers might yield a significant amount of useful data, and I don't feel it's at all an unrealistic scenario: it's not uncommon for public-facing XMPP servers to also cater for access through non-public TLD special-use suffixes.
I feel it would also help to consider collecting and parsing XMPP server-side X.509 credentials, as they may contain useful identifying information about other hostnames or IP addresses, help ascertain whether other XMPP servers exist, or establish potential co-hosting of other services.
It might also aid identification and correlation of public-facing XMPP servers or other services built upon prior assumptions.
Some of the new improvements, e.g. spider/ and the bitcoin changes, have dramatically increased the expected scan time for certain sites. For example, scanning for onion peers in bitcoin takes a rather long time, and a user configuring that together with a small timeout should probably be warned that it is a bad idea.
On top of that, we need to put some thought into why timeouts exist and how they can be helpful. Some thoughts:
Sometimes we find identifiers like Bitcoin addresses commented out in code - we still extract these because we run a very simple regex across the page snapshot. OnionScan should tell the user when we have found something in plain sight versus when we have discovered it unintentionally.
This will likely involve filtering out the text as part of the spider crawl and storing it in Page - perhaps also filtering out comments into their own section too - that way we don't need the entire page snapshot.
Output should be via SimpleReport.
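The filtering step could be sketched as below; a naive regex over HTML comments is used for illustration, though real parsing would likely use an HTML tokenizer during the spider crawl:

```go
package main

import (
	"fmt"
	"regexp"
)

// commentRegexp pulls HTML comments out of a page snapshot so that
// identifiers found inside them can be flagged as hidden rather than
// in plain sight. A real implementation would use an HTML tokenizer.
var commentRegexp = regexp.MustCompile(`(?s)<!--(.*?)-->`)

// SplitVisible returns the page with comments removed, plus the
// comment bodies, so each can be scanned and reported separately.
func SplitVisible(body string) (visible string, comments []string) {
	for _, m := range commentRegexp.FindAllStringSubmatch(body, -1) {
		comments = append(comments, m[1])
	}
	visible = commentRegexp.ReplaceAllString(body, "")
	return
}

func main() {
	v, c := SplitVisible("<p>hi</p><!-- 1ExampleBitcoinAddr -->")
	fmt.Println(v, c)
}
```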
Expanding OnionScan to check the whole site to, for example, find encryption keys (#20) - means refactoring protocols/standard_page_scan.go to:
It would be good to support dumping a site from the database in WARC format i.e. https://www.iso.org/obp/ui/#iso:std:iso:28500:ed-1:v1:en
There are a bunch of Go libraries in various states of repair for dealing with the WARC format; we should probably first evaluate their suitability.
Since this relies on native libraries being run, I'd very much like to deprecate this in favor of a pure golang solution.
I've noted that some sites seem to trigger a scan hang. I have set the timeout (-timeout 1) but the scan still seems to hang right after the mod_status check.
onionscan -timeout 1 -depth 0 -verbose hafacwgmrntoolno.onion
2016/06/14 11:43:56 Starting Scan of hafacwgmrntoolno.onion
2016/06/14 11:43:56 This might take a few minutes..
2016/06/14 11:43:56 Checking hafacwgmrntoolno.onion http(80)
2016/06/14 11:43:56 Found potential service on http(80)
2016/06/14 11:43:59 HTTP response headers:
2016/06/14 11:43:59 CONTENT-TYPE : text/html
2016/06/14 11:43:59 VARY : Accept-Encoding
2016/06/14 11:43:59 X-FRAME-OPTIONS : sameorigin
2016/06/14 11:43:59 X-XSS-PROTECTION : 1; mode=block
2016/06/14 11:43:59 ACCEPT-RANGES : bytes
2016/06/14 11:43:59 X-CONTENT-TYPE-OPTIONS : nosniff
2016/06/14 11:43:59 DATE : Tue, 14 Jun 2016 18:46:51 GMT
2016/06/14 11:43:59 SERVER : Apache
2016/06/14 11:43:59 LAST-MODIFIED : Wed, 02 Sep 2015 09:26:18 GMT
2016/06/14 11:43:59 ETAG : "ac27-51ec04248c771-gzip"
2016/06/14 11:44:00 Apache mod_status Not Exposed...Good!
Any idea why this might be occurring? I'm running a new version of OnionScan that I just grabbed.
Include images shown on the page using CSS: url() / background / background-image
Check for web font imports, as these can be hosted on external sites, and be used to identify repeat users http://www.itbusiness.ca/news/44120/44120
It would be nice to specify new attacks as JSON files that can be interpreted by the scanner. E.g. something like:
{
"name":"Apache mod_status is Accessible",
"location":"/server-status",
"requirements": [
{"equals": ["http-status-code", 200]},
{"contains":["contents","Server Version: (.*)</dt>"]}
]
}
With extra reporting options and such, this would clean the reporting code up.
user@ubuntu:~/go/src/github.com/s-rah/onionscan$ go run onionscan.go
/home/user/onion/src/golang.org/x/crypto/ed25519/ed25519.go:54:66: error: reference to undefined identifier ‘crypto.SignerOpts’
func (priv PrivateKey) Sign(rand io.Reader, message []byte, opts crypto.SignerOpts) (signature []byte, err error) {
^
lool@ubuntu:~/go/src/github.com/s-rah/onionscan$
A lot of hidden services (close to 3% in my last big scan) are configured so that the .onion address serves all ports.
If SSH is being served, you can grab the key fingerprint and sometimes uncloak the HS by checking it against Shodan or your own database of scans.
Example code (in Python) to do this is here: https://github.com/0x27/ssh_keyscanner
Take the sha1/md5/whatever of each image on the front page of the site. This will feed into the fingerprint later on.
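A minimal sketch of the hashing step, using SHA-1 as one of the suggested digests; the function name is an assumption:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// HashImage computes the SHA-1 digest of image bytes as a hex string.
// Identical images produce identical digests, so these can be compared
// across sites to feed into the fingerprint later on.
func HashImage(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(HashImage([]byte("fake-image-bytes")))
}
```

Note SHA-1 only matches byte-identical files; catching re-encoded or resized copies of the same image would need a perceptual hash instead.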
Use cases:
This would be a critical server failure.
Currently SimpleReport is the only kind of post-analytics we do. This can definitely be expanded.
Some examples of post-processing steps we likely want in the core onionscan base:
Any analytics performed by OnionScan should be modular and configurable. It might make sense for OnionScan to accept a json formatted config file detailing the exact flow that it should undertake.
At the same time, we should try to minimize the amount of code dedicated to analytics that is best performed by other dedicated applications (one example that comes to mind is stylometry that requires ML models and databases of known samples - we likely do not want to support that).
We currently only check known IRC ports; we don't do much in the way of confirmation. We should connect (and, in the case of IRCS, pull the X.509 certificate).
We may want to consider snapshotting the IRC welcome message and channel list also.
One of the dependencies, HouzuoGuo/tiedot, explicitly only works on x86_64 right now, and fails to build on ARM32:
tiedot should be compiled/run on x86-64 systems. If you decide to compile tiedot on 32-bit systems, the following integer-smear algorithm will cause compilation failure due to 32-bit integer overflow; therefore you must modify the algorithm. Do not remove the integer-smear process, and remember to run test cases to verify your mods.
Doesn't seem that difficult to solve (possibly just replacing int by uint64); however, this needs to be resolved before onionscan can run on most Android devices and such.
Edit: There's an upstream issue for this, HouzuoGuo/tiedot#68, and it's being worked on; there is even a 32-bit branch, but it isn't integrated into mainline yet.