eth0izzle / needl

Take back your privacy. Lose yourself in the haystack.

Home Page: https://www.darkport.co.uk

License: MIT License

Python 100.00%

privacy python

needl's Introduction

Needl

Take back your privacy. Lose yourself in the haystack.

Your ISP is most likely tracking your browsing habits and selling them to marketing agencies (albeit anonymised). Or worse, making your browsing history available to law enforcement at the hint of a subpoena. Needl generates random Internet traffic in an attempt to conceal your legitimate traffic, essentially making your data the needle in the haystack and thus harder to find. The goal is to make it harder for your ISP, government, etc. to track your browsing history and habits.

It's not perfect. But it's a start. Have an idea? Get involved!

Implemented modules:

Google: generates a random search string, searches Google and clicks on a random result.
Alexa: visits a website from the Alexa Top 1 Million list. (warning: contains a lot of porn websites)
Twitter: generates a popular English name and visits their profile; performs random keyword searches
DNS: produces random DNS queries from the Alexa Top 1 Million list.
Spotify: random searches for Spotify artists
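
As a rough illustration of what the DNS module describes, the sketch below picks random domains from a list and resolves them. It is not Needl's actual code: the function name `random_dns_queries`, its `resolver` and `rng` parameters and the idea of passing in a plain list of domains are assumptions made for this example. The real module draws from the Alexa Top 1 Million list.

```python
import random
import socket


def random_dns_queries(domains, count, resolver=socket.gethostbyname, rng=random):
    """Resolve `count` randomly chosen domains and return (domain, result) pairs.

    Hypothetical helper, not part of Needl. Lookup failures are recorded
    as None instead of being raised, because the point is only to emit
    the query, not to use the answer.
    """
    results = []
    for _ in range(count):
        domain = rng.choice(domains)
        try:
            results.append((domain, resolver(domain)))
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results.append((domain, None))
    return results
```

Passing the resolver in as a parameter keeps the sketch testable without network access; by default it uses the system resolver through `socket.gethostbyname`.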

Module ideas:

WhatsApp
Facebook Messenger

Installation

Needl should work on pretty much any Linux system with Python 3.0+ installed.

cd /opt
git clone https://github.com/eth0izzle/needl.git
cd needl
pip3 install -r requirements.txt
Download ChromeDriver for your platform (requires Chrome) and place it in ./data.
python3 needl.py

Usage

Needl runs as a daemon and will happily sit in the background chomping away 24/7, 365. Each module (task) has scheduled actions; for example, random DNS queries happen every 1 to 3 minutes. You can configure the intervals in ./data/settings.yaml.

usage: needl.py [-h] [--datadir DATADIR] [-d] [-v] [--logfile LOGFILE]
                [--pidfile PIDFILE]

Take back your privacy. Lose yourself in the haystack.

optional arguments:
  -h, --help         show this help message and exit
  --datadir DATADIR  Data directory
  -d, --daemon       Run as a deamon
  -v, --verbose      Increase logging
  --logfile LOGFILE  Log to this file. Default is stdout.
  --pidfile PIDFILE  Save process PID to this file. Default is /tmp/needl.pid.
                     Only valid when running as a daemon.

F.A.Qs

Why not just use a VPN/Tor? And you should! Needl does not protect your legitimate traffic in any way. It simply generates more.
By using Needl will my legitimate traffic be hidden/protected/safe? No. This isn't the goal of Needl. Its purpose is to generate more traffic to make it harder to identify your legitimate traffic. There's no evidence to suggest this actually works; it's a proof of concept.
Can [insert service here] differentiate between Needl and my legitimate requests? In theory, yes. [insert service here] can track you with Cookies, Session data or algorithms. Needl will tackle this in the future.
Where are your tests?!? Submit a pull request. Please.

Contributing

Check out the issue tracker and see what tickles your fancy.

Fork it, baby!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request

License

MIT. See LICENSE

needl's Issues

Reddit module

Module that clicks on random content on /r/all.

Dockerize Needl

I'm not familiar with Docker but this looks like this would be a good starting point: https://runnable.com/docker/python/dockerize-your-python-application

error when launching

Traceback (most recent call last):
  File "needl.py", line 2, in <module>
    import daemon, daemon.pidfile
ModuleNotFoundError: No module named 'daemon.pidfile'; 'daemon' is not a package

• Win10 x64 1809 ltsc
• Python 3.9 x64
• i installed Chrome 87.0.4280.66 + put ChromeDriver from https://chromedriver.storage.googleapis.com/index.html?path=87.0.4280.20/ (chromedriver_win32.zip) into \needl\data\

Write upstart/init scripts

Currently Needl will daemonize when passing in the --daemon flag. We should create the appropriate scripts so the OS can handle running it as a service.

We need to explore the different options and their pros/cons. Upstart, init.d or service?

Randomise when tasks run

Currently tasks run on a fixed schedule, e.g. every 2 minutes. The user should be able to configure a random time window for a task, e.g. every 2 to 4 minutes, in order to reduce the predictability of the tool.
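
A minimal sketch of the idea, assuming the interval is given as a minimum and maximum number of minutes. The function name `next_delay` and its parameters are hypothetical and do not reflect how Needl's scheduler is actually written.

```python
import random


def next_delay(min_minutes, max_minutes, rng=random):
    """Return a delay in seconds drawn uniformly between the two bounds.

    Hypothetical helper: a scheduler would call this after each run to
    decide how long to wait before the next one.
    """
    if min_minutes > max_minutes:
        raise ValueError("min_minutes must not exceed max_minutes")
    return rng.uniform(min_minutes * 60, max_minutes * 60)
```

For the example in this issue, `next_delay(2, 4)` returns a value between 120 and 240 seconds, so consecutive runs no longer land on a fixed 2-minute grid.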

Implement Windows 10 telemetry module

Error with webdriver

I followed all the steps, but I get this when executing:

(ERROR) [__init__->_run_job]: Task failed: Traceback (most recent call last):
  File "/opt/needl/needl/schedule/__init__.py", line 84, in _run_job
    ret = job.run()
  File "/opt/needl/needl/schedule/__init__.py", line 336, in run
    ret = self.job_func()
  File "/opt/needl/needl/tasks/google.py", line 17, in search
    browser = utils.get_browser()
  File "/opt/needl/needl/utils.py", line 60, in get_browser
    return webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /opt/needl/data/chromedriver unexpectedly exited. Status code was: 127

Is there anything else that I should do to the driver?
Thanks!

Disable HTTPS?

As Needl's sole purpose is to generate random traffic so it can be captured and stored, shall we disable HTTPS project-wide, e.g. when searching Google?

Thoughts?

Filter unwanted/inappropriate websites

Moved from #13.

Some of the modules have a possibility of generating traffic that could be harmful if not outright incriminating. Without some kind of "safe mode", users could be putting themselves at real risk.

Possible methods:

Filter by profanity in URL and other descriptors.
Use already safe methods like Google Safe search.
Filter by top-level domains (.org, .gov, .tech tend to be safer bets)

This could have the effect of making traffic less believable, and making any real "unsafe" traffic stand out more. This should certainly not be the default.
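
Two of the methods above (TLD allow-listing and keyword filtering) could be combined roughly as follows. This is only a sketch: `is_safe_url`, `ALLOWED_TLDS` and `BLOCKED_WORDS` are hypothetical names, the TLD set is copied from the suggestion in this issue, and the word list is a placeholder rather than a real profanity list.

```python
from urllib.parse import urlparse

ALLOWED_TLDS = {"org", "gov", "tech"}  # taken from the suggestion in this issue
BLOCKED_WORDS = {"porn", "xxx"}  # placeholder list, an assumption for illustration


def is_safe_url(url, allowed_tlds=ALLOWED_TLDS, blocked_words=BLOCKED_WORDS):
    """Return True if the URL's TLD is allow-listed and no blocked word appears."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        # Not a parseable absolute URL, so treat it as unsafe.
        return False
    tld = host.rsplit(".", 1)[-1]
    if tld not in allowed_tlds:
        return False
    lowered = url.lower()
    return not any(word in lowered for word in blocked_words)
```

A task could run candidate URLs through `is_safe_url` before visiting them when an opt-in safe mode is enabled, which fits the point above that this should not be the default.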

Write documentation

Mostly to explain what each module does and the options under data/settings.yaml.

RFC: PEP 370

PEP: https://www.python.org/dev/peps/pep-0370/
Explanation: https://stackoverflow.com/questions/21055859/what-are-the-risks-of-running-sudo-pip

Thoughts?

Simulate a web browser more effectively

When Needl requests a website (e.g. Google) it downloads the requested URL only. We should make every effort to behave like a web browser by also downloading the website's resources, such as images, CSS and JavaScript files.

Should we consider using a headless browser testing framework such as PhantomJS?
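
Short of a headless browser, the resource-downloading idea in this issue could be approximated by parsing the returned HTML for sub-resources. The sketch below uses only the standard library; `ResourceCollector` and `find_resources` are hypothetical names, and it only lists the URLs a browser would fetch rather than fetching them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only follow <link> tags that declare a stylesheet.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def find_resources(html, base_url):
    """Return the sub-resource URLs found in `html`, resolved against `base_url`."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

This does not execute JavaScript, so resources loaded dynamically are missed; that gap is the main argument for a real headless browser.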

brainstorming limitations and features

which may or may not be existing/need refining/in the works...

randomized (but not TOO randomized) intervals... as below, general pattern mimicry would be ideal; exactly randomly between 1-10 seconds is not; humans are not just gravel, also rocks and boulders.
customizable word lists
  - as crazy as it sounds, a Chrome plugin to record actual searches and thereby use real starting data for mimicry might be effective (again, obfuscation vs privation)
variety of request types, ie POST, PATCH, DELETE... more tricky, but filtering vs GETS would be the first thing I'd do looking for real human logs
controlled variety of 'quest' depth. google+1click and then google something completely unrelated+1click is not convincing.

eg, my computer visiting 1000 random websites per day at 5 pages per minute is not going to be anywhere near convincing, given i visit a handful of sites in bursts normally (with that pattern already having been logged)

abstracted:

usage patterns that are not static randomness, but sporadic and clumpy, reasonably nonlinear
mimicry of actual/personalizeable trends in content
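
One way to read "sporadic and clumpy" is a schedule of short bursts separated by long idle gaps. The generator below is a sketch of that reading only: the name `bursty_delays` and all default ranges are made-up values for illustration, not measurements of real browsing behaviour.

```python
import random


def bursty_delays(n, burst_size=(3, 8), short_gap=(2, 20),
                  long_gap=(600, 3600), rng=random):
    """Yield n delays in seconds: clusters of short gaps split by long idle gaps.

    Each burst starts after one long gap and then continues with short
    gaps until its randomly chosen size is used up.
    """
    remaining_in_burst = 0
    for _ in range(n):
        if remaining_in_burst == 0:
            # Start a new burst after a long idle period.
            remaining_in_burst = rng.randint(*burst_size)
            yield rng.uniform(*long_gap)
        else:
            yield rng.uniform(*short_gap)
        remaining_in_burst -= 1
```

Compared with a single uniform interval, this produces a handful of closely spaced requests followed by silence, which is closer to the pattern described above.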

really abstracted:

better to make a handful of knitting needles than a busload of thumbtacks

I've said enough. Please close issue and destroy Github after reading.
🍺

Google task css selectors

I think the css selectors for the Google task might need updating to match their new Google results page.

On search, this is the error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="tsf"]"}

If clickthrough is enabled, this is the error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="resultStats"]"}

This is with selenium/chromedriver 88.0.4324.187.