Needl

Take back your privacy. Lose yourself in the haystack.

Your ISP is most likely tracking your browsing habits and selling them to marketing agencies (albeit anonymised). Worse, it may make your browsing history available to law enforcement at the hint of a subpoena. Needl generates random Internet traffic in an attempt to conceal your legitimate traffic, essentially making your data the needle in the haystack and thus harder to find. The goal is to make it harder for your ISP, government, etc. to track your browsing history and habits.

It's not perfect. But it's a start. Have an idea? Get involved!

Implemented modules:

  • Google: generates a random search string, searches Google and clicks on a random result.
  • Alexa: visits a website from the Alexa Top 1 Million list. (warning: contains a lot of porn websites)
  • Twitter: generates a popular English name and visits the matching profile; also performs random keyword searches.
  • DNS: produces random DNS queries from the Alexa Top 1 Million list.
  • Spotify: runs random searches for Spotify artists.
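To make the DNS module's idea concrete, here is a minimal sketch of issuing lookups for randomly chosen domains. It is not Needl's actual implementation: the function name `random_dns_queries` and its parameters are hypothetical, and the domain list is assumed to have been loaded already (for example from the Alexa list).

```python
import random
import socket


def random_dns_queries(domains, count, resolve=socket.gethostbyname, rng=random):
    """Resolve `count` randomly chosen domains; return (domain, address) pairs.

    `resolve` and `rng` are injectable so the function can be exercised
    without touching the network. All names here are illustrative.
    """
    results = []
    for domain in rng.sample(domains, k=min(count, len(domains))):
        try:
            results.append((domain, resolve(domain)))
        except OSError:
            # A failed lookup still produced a DNS query, which is the point.
            results.append((domain, None))
    return results
```

The answers themselves are irrelevant to the goal; only the fact that a query left the machine matters, which is why failed lookups are recorded rather than raised.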

Module ideas:

  • WhatsApp
  • Facebook Messenger

Installation

Needl should work on pretty much any Linux system with Python 3.0+ installed.

  1. cd /opt
  2. git clone https://github.com/eth0izzle/needl.git && cd needl
  3. pip3 install -r requirements.txt
  4. Download ChromeDriver for your platform (requires Chrome) and place in ./data.
  5. python3 needl.py

Usage

Needl can run as a daemon (pass --daemon) and will happily sit in the background chomping away 24/7, 365. Each module (task) has scheduled actions; for example, random DNS queries happen every 1 to 3 minutes. You can configure the intervals in ./data/settings.yaml.

usage: needl.py [-h] [--datadir DATADIR] [-d] [-v] [--logfile LOGFILE]
                [--pidfile PIDFILE]

Take back your privacy. Lose yourself in the haystack.

optional arguments:
  -h, --help         show this help message and exit
  --datadir DATADIR  Data directory
  -d, --daemon       Run as a deamon
  -v, --verbose      Increase logging
  --logfile LOGFILE  Log to this file. Default is stdout.
  --pidfile PIDFILE  Save process PID to this file. Default is /tmp/needl.pid.
                     Only valid when running as a daemon.
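For readers who want to see how such a command line is typically wired up, the following is a sketch of an argparse parser that accepts the same options as the help text above. It is reconstructed from that help text only and is not taken from needl.py; the function name `build_parser` is hypothetical, and the default for --datadir is left unset because the help output does not state it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="needl.py",
        description="Take back your privacy. Lose yourself in the haystack.",
    )
    parser.add_argument("--datadir", help="Data directory")
    parser.add_argument("-d", "--daemon", action="store_true",
                        help="Run as a daemon")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Increase logging")
    parser.add_argument("--logfile",
                        help="Log to this file. Default is stdout.")
    parser.add_argument("--pidfile", default="/tmp/needl.pid",
                        help="Save process PID to this file. "
                             "Only valid when running as a daemon.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```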

F.A.Qs

  1. Why not just use a VPN/Tor? You should! Needl does not protect your legitimate traffic in any way. It simply generates more.

  2. By using Needl will my legitimate traffic be hidden/protected/safe? No. This isn't the goal of Needl. Its purpose is to generate more traffic to make it harder to identify your legitimate traffic. There's no evidence to suggest this actually works; it's a proof of concept.

  3. Can [insert service here] differentiate between Needl and my legitimate requests? In theory, yes. [insert service here] can track you with cookies, session data or algorithms. Needl will tackle this in the future.

  4. Where are your tests?!? Submit a pull request. Please.

Contributing

Check out the issue tracker and see what tickles your fancy.

  1. Fork it, baby!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request

License

MIT. See LICENSE

needl's People

Contributors

blitzkraft, eth0izzle, foobarquaxx, zitryss

needl's Issues

Write upstart/init scripts

Currently Needl daemonizes itself when passed the --daemon flag. We should create the appropriate scripts so the OS can handle running it as a service.

We need to explore the different options and their pros/cons. Upstart, init.d or service?

Randomise when tasks run

Currently tasks run on a fixed schedule, e.g. every 2 minutes. The user should be able to configure a time range within which a task runs at a random point, e.g. every 2 to 4 minutes, in order to reduce the predictability of the tool.
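One simple way to realise this is to draw each wait from a configured range. The sketch below assumes a uniform distribution and a hypothetical helper named `next_delay`; neither is part of Needl's scheduler.

```python
import random


def next_delay(min_minutes, max_minutes, rng=random):
    """Return a delay in seconds drawn uniformly from [min, max] minutes."""
    if min_minutes > max_minutes:
        raise ValueError("min_minutes must not exceed max_minutes")
    return rng.uniform(min_minutes * 60, max_minutes * 60)


if __name__ == "__main__":
    # A task configured for "every 2 to 4 minutes" would wait this long
    # before its next run, then draw a fresh delay afterwards.
    print(round(next_delay(2, 4)), "seconds until the next run")
```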

Error with webdriver

I followed all the steps, but I get this when executing:

(ERROR) [__init__->_run_job]: Task failed: Traceback (most recent call last):
  File "/opt/needl/needl/schedule/__init__.py", line 84, in _run_job
    ret = job.run()
  File "/opt/needl/needl/schedule/__init__.py", line 336, in run
    ret = self.job_func()
  File "/opt/needl/needl/tasks/google.py", line 17, in search
    browser = utils.get_browser()
  File "/opt/needl/needl/utils.py", line 60, in get_browser
    return webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/chrome/webdriver.py", line 62, in __init__
    self.service.start()
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/common/service.py", line 96, in start
    self.assert_process_still_running()
  File "/home/hernan/.local/lib/python3.5/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    % (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service /opt/needl/data/chromedriver unexpectedly exited. Status code was: 127

Is there anything else that I should do to the driver?
Thanks!
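On Linux, an exit status of 127 often means the binary could not be started at all, for example because a shared library it depends on is missing. That is a guess about this report, not a confirmed diagnosis. A small check like the one below, run outside Needl, may help narrow it down; `diagnose_driver` is a hypothetical helper, and the path is the one from the traceback.

```python
import os
import subprocess


def diagnose_driver(path):
    """Return a short description of whether the driver binary can start."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as exc:
        # For example, a binary built for a different architecture.
        return "could not start: {}".format(exc)
    if result.returncode != 0:
        return "exited with {}: {}".format(result.returncode, result.stderr.strip())
    return result.stdout.strip()


if __name__ == "__main__":
    print(diagnose_driver("/opt/needl/data/chromedriver"))
```

If the binary exits with an error, the captured stderr usually names what is missing.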

Disable HTTPS?

As Needl's sole purpose is to generate random traffic so it can be captured and stored, shall we disable HTTPS project-wide, e.g. when searching Google?

Thoughts?

Filter unwanted/inappropriate websites

Moved from #13.

Some of the modules have a possibility of generating traffic that could be harmful if not outright incriminating. Without some kind of "safe mode", users could be putting themselves at real risk.

Possible methods:

  • Filter by profanity in URL and other descriptors.
  • Use already safe methods like Google Safe search.
  • Filter by top-level domains (.org, .gov, .tech tend to be safer bets)

This could have the effect of making traffic less believable, and of making any real "unsafe" traffic stand out more. This should certainly not be the default.
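As a rough illustration of the first and third methods, the sketch below rejects URLs containing blocked words and, optionally, restricts visits to an allow-list of top-level domains. The word list, the function name `is_probably_safe`, and its parameters are placeholders chosen for this example; a real filter would need a far better list.

```python
from urllib.parse import urlsplit

# Placeholder values for illustration only.
BLOCKED_WORDS = {"porn", "xxx"}
SAFE_TLDS = {"org", "gov", "tech"}


def is_probably_safe(url, blocked_words=BLOCKED_WORDS, safe_tlds=None):
    """Return True if the URL passes the word filter and, if given, the TLD filter."""
    # Accept bare hostnames such as "example.org" as well as full URLs.
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    text = host + parts.path.lower()
    if any(word in text for word in blocked_words):
        return False
    if safe_tlds is not None:
        return host.rsplit(".", 1)[-1] in safe_tlds
    return True
```

Leaving `safe_tlds` unset keeps the TLD restriction opt-in, in line with the point that a safe mode should not be the default.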

Write documentation

Mostly to explain what each module does and the options under data/settings.yaml.

Simulate a web browser more effectively

When Needl requests a website (e.g. Google) it downloads the requested URL only. We should make every effort to behave like a web browser by also downloading the website's resources, such as images, CSS and JavaScript files.

Should we consider using a headless browser testing framework such as PhantomJS?
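Short of a headless browser, a first step could be to parse the returned HTML and collect the resource URLs a browser would fetch next. The sketch below uses only the standard library; `ResourceCollector` and `find_resources` are hypothetical names, and it only finds statically declared resources (nothing loaded by JavaScript).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only stylesheets; other <link> types are rarely fetched eagerly.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def find_resources(html, base_url):
    """Return the resource URLs referenced by `html`, resolved against `base_url`."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Each returned URL could then be requested in turn so that the traffic pattern looks closer to a real page load.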

brainstorming limitations and features

which may or may not be existing/need refining/in the works...

  • randomized (but not TOO randomized) intervals... as below, general pattern mimicry would be ideal; exactly randomly between 1-10 seconds is not; humans are not just gravel, also rocks and boulders.
  • customizable word lists
    • as crazy as it sounds, a chrome plugin to record actual searches and thereby use real starting data for mimicry might be effective (again, obfuscation vs privation)
  • variety of request types, ie POST, PATCH, DELETE... more tricky, but filtering vs GETS would be the first thing I'd do looking for real human logs
  • controlled variety of 'quest' depth. google+1click and then google something completely unrelated+1click is not convincing.

eg, my computer visiting 1000 random websites per day at 5 pages per minute is not going to be anywhere near convincing, given i visit a handful of sites in bursts normally (with that pattern already having been logged)


abstracted:

  • usage patterns that are not static randomness, but sporadic and clumpy, reasonably nonlinear
  • mimicry of actual/personalizable trends in content

really abstracted:

  • better to make a handful of knitting needles than a busload of thumbtacks

I've said enough. Please close issue and destroy Github after reading.
🍺
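The "sporadic and clumpy" pattern described above could be sketched as bursts of closely spaced requests separated by long idle gaps. Everything here is illustrative: the function name `bursty_delays`, the burst sizes, and the time ranges are made-up values, not measurements of real browsing behaviour.

```python
import random


def bursty_delays(bursts, rng=random, burst_size=(3, 8),
                  within_burst=(5, 60), between_bursts=(20 * 60, 3 * 60 * 60)):
    """Return a list of delays (seconds) forming `bursts` clumps of activity.

    Each clump starts after a long idle gap and is followed by a few
    short gaps, rather than spreading requests evenly over the day.
    """
    delays = []
    for _ in range(bursts):
        # Long pause before a new burst of activity begins.
        delays.append(rng.uniform(*between_bursts))
        # Short pauses between the remaining requests inside the burst.
        for _ in range(rng.randint(*burst_size) - 1):
            delays.append(rng.uniform(*within_burst))
    return delays


if __name__ == "__main__":
    for delay in bursty_delays(2):
        print("wait {:.0f}s, then make a request".format(delay))
```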

Google task CSS selectors

I think the CSS selectors for the Google task might need updating to match the new Google results page.

On search, this is the error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="tsf"]"}

If clickthrough is enabled, this is the error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="resultStats"]"}

This is with selenium/chromedriver 88.0.4324.187.
