
lookyloo / lookyloo


Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

Home Page: https://www.lookyloo.eu

License: Other

Python 69.07% CSS 0.99% JavaScript 6.07% HTML 23.69% Dockerfile 0.14% Shell 0.04%
information-security privacy web-security dfir capture scraping lookyloo

lookyloo's People

Contributors

adrima01, adulau, antoniabk, arhamyss, buildbricks, cudeso, dependabot[bot], docarmorytech, dssecret, fafnerkeyzee, felalex57, numbuh474, of-cag, rafiot, steveclement, sw-mschaefer, th4nat0s, vmdhhh


lookyloo's Issues

CSV export

It would be nice (yes, again)...

...to be able to export the data. JSON is an option, but most of the time CSV is the format most people can actually use.

HIT, Called by, [type... javascript, cookie, etc.]

voilà :)
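A minimal sketch of such an export, assuming a flattened list of capture nodes with the columns the issue suggests (the node structure and field names here are hypothetical, not lookyloo's actual data model):

```python
import csv
import io

# Hypothetical flattened capture nodes; columns follow the issue's
# suggestion: the hit URL, the node that called it, and its type.
nodes = [
    {"hit": "https://example.com/", "called_by": "", "type": "html"},
    {"hit": "https://example.com/app.js", "called_by": "https://example.com/", "type": "javascript"},
    {"hit": "https://tracker.example/px", "called_by": "https://example.com/", "type": "cookie"},
]

def to_csv(rows):
    """Serialize the node list to CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["hit", "called_by", "type"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```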

BS4 missing from requirements

In a pristine Debian stable Python 3 installation, lookyloo is not able to start since the Beautiful Soup 4 Python module is missing from the requirements.

Integration of URL Abuse

The goal is to asynchronously fire requests to URL Abuse after the scraping is over and while the tree is displayed:

  • Every URL will be sent to every relevant endpoint
  • Every domain will be resolved and sent to every relevant endpoint

Export all domains

It would be nice to export all the domains at once to compare them between runs.
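The comparison between runs could be as simple as a set difference over the exported domain lists; a sketch (function name and inputs are illustrative):

```python
def diff_domains(run_a, run_b):
    """Return (added, removed) domains between two capture runs."""
    a, b = set(run_a), set(run_b)
    return sorted(b - a), sorted(a - b)
```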

Mockups

  • Heritable display of tree node (two types: URL & type) -> need to represent inheritance from host-name node
  • Confirmation box for save

Collapse/expand tree/pop up window ambiguity

Expanding/collapsing the tree currently links to windows, but the text controls the pop-up window. Put the text and the tree circle on the same horizontal rule, give them both a similar border, and drop the inheritance line from between them (or possibly from the right-hand side of the new border?).

Scraping improvements

  • Proxy support
  • Pass a pre-generated cookie
  • Initial referrer
  • Locale of the browser
  • Login creds <= how to pass them properly in the webpage will be challenging (solved by passing a valid cookie)

Search box for UUID (hostname or url node)

Each node (hostname tree and URL tree) has a UUID; add a search box on the main page to enter a UUID -> load the tree and put a red box around the node.

Dependencies:

  • Dump a pickled tree to keep the UUIDs after first generation
  • For each pickle, dump the list of all UUIDs (Hostname/URL) in the directory for searching later

Requirements:

  • Force delete pickle for a tree (needs confirm box)
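The two dependency items could be sketched like this, assuming one directory per capture (the file names `tree.pickle` and `uuids.json` and both function names are hypothetical):

```python
import json
import pickle
from pathlib import Path

def dump_tree(capture_dir, tree, uuids):
    """Dump the pickled tree plus a sidecar list of its node UUIDs.

    `tree` is whatever tree object the capture produced; `uuids` are the
    UUIDs of its hostname/URL nodes.
    """
    capture_dir = Path(capture_dir)
    with (capture_dir / "tree.pickle").open("wb") as f:
        pickle.dump(tree, f)
    (capture_dir / "uuids.json").write_text(json.dumps(sorted(uuids)))

def find_capture(root, uuid):
    """Return the first capture directory whose UUID index contains `uuid`."""
    for index in Path(root).glob("*/uuids.json"):
        if uuid in json.loads(index.read_text()):
            return index.parent
    return None
```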

Errors when setting up lookyloo.service

Hello,
Is anyone able to share their copy of /etc/systemd/system/lookyloo.service ?

Here is mine:

[Unit]
Description=uWSGI instance to serve lookyloo
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/lookyloo
Environment=PATH="/usr/bin/python"
ExecStart=/opt/lookyloo/bin/start.py
Environment=LOOKYLOO_HOME=/opt/lookyloo

[Install]
WantedBy=multi-user.target

And I'm getting the following error:

# sudo systemctl status lookyloo
● lookyloo.service - uWSGI instance to serve lookyloo
   Loaded: loaded (/etc/systemd/system/lookyloo.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2019-04-04 13:47:44 CEST; 2min 48s ago
  Process: 3857 ExecStart=/opt/lookyloo/bin/start.py (code=exited, status=126)
 Main PID: 3857 (code=exited, status=126)

Apr 04 13:47:44 server systemd[1]: Started uWSGI instance to serve lookyloo.
Apr 04 13:47:44 server systemd[1]: lookyloo.service: Main process exited, code=exited, status=126/n/a
Apr 04 13:47:44 server start.py[3857]: /usr/bin/env: ‘python3’: Not a directory
Apr 04 13:47:44 server systemd[1]: lookyloo.service: Failed with result 'exit-code'.
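The log pinpoints the problem: `Environment=PATH="/usr/bin/python"` sets PATH to a single file rather than a directory list, so `/usr/bin/env` fails to resolve `python3` ("Not a directory") and systemd reports exit status 126 (command found but not executable). A sketch of a corrected unit, keeping the paths from the unit above and assuming the interpreter lives under the usual `/usr/bin`:

```ini
[Unit]
Description=uWSGI instance to serve lookyloo
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/lookyloo
; PATH must be a colon-separated directory list, not the interpreter itself
Environment="PATH=/usr/local/bin:/usr/bin:/bin"
Environment=LOOKYLOO_HOME=/opt/lookyloo
ExecStart=/opt/lookyloo/bin/start.py

[Install]
WantedBy=multi-user.target
```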

SVG interactions

Main hostname tree:

  • click on icon (i.e. JS) -> displays box with all URLs loading a JS
  • click on hostname -> display all the related URLs (same format as hostnames: line 1: URL, Line 2: icons)

Overlay box:

  • click on icon (i.e. JS) -> download the content

Add basic user agent support

A few user agents, and free text box for folks who want to shoot themselves in the foot. (with a link to info on user agents so they can avoid their feet if they like)

show redirects vertically rather than horizontally?

Because they don't return resources to the browser I think redirects are qualitatively different from other reference types like script and css sources and iframes, but they currently manifest in the same way as depth in the tree. Since redirects typically happen before resources are loaded there would generally be lots of extra vertical space available in the earlier parts of the tree, so perhaps they could be oriented vertically to emphasize this difference? For example cnn.com (https://lookyloo.circl.lu/tree/5ea5cebb-9223-42db-bdeb-34543b237b05) shows

cnn.com --> www.cnn.com --> www.cnn.com --> edition.cnn.com --> ... resources ...

would it be possible to get them to render more like this

cnn.com
   V
www.cnn.com
   V
www.cnn.com
   V
edition.cnn.com --> ... resources ...

Docker-compose fails on initializing AsyncScraper

Hi,

today I wanted to set up a docker container and faced the following issue. The first 16 of 19 steps went well. Could someone have a look and advise how to fix it? Thank you.

Step 17/19 : run nohup pipenv run async_scrape.py
---> Running in 0197ffd4a2bc
Loading .env environment variables…
09:06:05 AsyncScraper INFO:Initializing AsyncScraper
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 538, in connect
sock = self._connect()
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 861, in _connect
sock.connect(self.path)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/bin/async_scrape.py", line 7, in
exec(compile(f.read(), file, 'exec'))
File "/root_lookyloo/lookyloo/bin/async_scrape.py", line 36, in
m = AsyncScraper()
File "/root_lookyloo/lookyloo/bin/async_scrape.py", line 24, in init
self.lookyloo = Lookyloo(loglevel=loglevel, only_global_lookups=only_global_lookups)
File "/root_lookyloo/lookyloo/lookyloo/lookyloo.py", line 45, in init
if not self.redis.exists('cache_loaded'):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/client.py", line 1307, in exists
return self.execute_command('EXISTS', *names)
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/client.py", line 836, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 1071, in get_connection
connection.connect()
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 543, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 2 connecting to unix socket: /root_lookyloo/lookyloo/cache/cache.sock. No such file or directory.
ERROR: Service 'lookyloo' failed to build: The command '/bin/sh -c nohup pipenv run async_scrape.py' returned a non-zero code: 1
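The traceback shows the scraper failing to reach the Redis unix socket at `/root_lookyloo/lookyloo/cache/cache.sock`. A `RUN` instruction executes at image build time, when no long-running service (such as the Redis cache) is up, so the connection can only fail. One possible fix, sketched as a Dockerfile fragment (assuming `redis-server` is available in the image; the socket path is taken from the traceback, the flags are standard redis-server options), is to start both at *container* start instead:

```dockerfile
# Sketch: start the cache, then the scraper, when the container runs --
# not during the build, where no services can be up.
CMD redis-server --unixsocket /root_lookyloo/lookyloo/cache/cache.sock \
                 --port 0 --daemonize yes \
    && pipenv run async_scrape.py
```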

Option to disable or rename session cookies

LookyLoo sets a session cookie (boringly named session). This is an issue if LookyLoo is being used behind a reverse proxy with an access authorization system that also happens to set a cookie named session -- the effect is that:

  1. request comes to the reverse proxy; reverse proxy does its magic and sets its session cookie to persist the authorization status;
  2. request is sent further to the upstream (i.e. LookyLoo).
  3. LookyLoo sets its own session cookie, since the one set by the reverse proxy does not conform to whatever LookyLoo expects
  4. response is returned to the client -- with the LookyLoo session cookie overwriting the reverse proxy cookie
  5. upon the next request, the whole dance starts over

This results in no session persistence and LookyLoo not working properly behind such a reverse proxy. It would be swell if it were possible to change the name of the session cookie set by LookyLoo so that it does not clash with the reverse proxy's.

The cookie seems not necessary -- blocking Set-Cookie on the reverse proxy (so that it does not reach the browser) does not seem to result in loss of functionality.


For the record, a quick and dirty workaround for nginx is:

  1. make sure the reverse proxy session cookie is not sent back to LookyLoo upstream;
  2. make sure that any Set-Cookie header set by LookyLoo is blocked from reaching the user browser.

There does not seem to be a way of modifying cookie headers sent to upstreams directly in the nginx config, so point 1 would either have to use Lua (like in our case) or some other method; point 2 can be done with the proxy_hide_header Set-Cookie; nginx config directive.
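A sketch of the workaround as an nginx fragment (the upstream address is illustrative; `proxy_hide_header` is the directive named above, and blanking the inbound `Cookie` header is a cruder stand-in for the Lua approach that only works if the upstream needs no cookies at all). Alternatively, if the UI's web framework exposes a session-cookie-name setting (Flask, for instance, has a `SESSION_COOKIE_NAME` config value), renaming the cookie would avoid the clash without proxy tricks:

```nginx
location / {
    proxy_pass http://localhost:5100/;
    # 2. keep LookyLoo's Set-Cookie from reaching the browser
    proxy_hide_header Set-Cookie;
    # 1. strip all inbound cookies so the reverse proxy's session
    #    cookie never reaches LookyLoo (cruder than the Lua method)
    proxy_set_header Cookie "";
}
```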

A Folding search

Hello,

It would be nice to have a "search" which will find and unfold only the relevant path to the result of the search.

MISP Integration

Lookups:

  • Domains
  • URLs & Part of URL
  • Hashes of JS/exe, ...
  • Cookies

Push:

  • Domains
  • URLs & Part of URL
  • Any content (JS/exe, ...)
  • Cookies

Link overlay box to source node

When the user clicks on a hostname, or an icon, it loads an overlay box that can be moved around.

The box needs to be connected to the originating node.

Documentation: where does LookyLoo keep the scraped data

It would be helpful to have information on where LookyLoo keeps the scraped data -- this would be required, for example, to set up volume mounts in the docker volume so that scraped data persists across containers being recreated.

Screenshots

It would be an amazing improvement if screenshots of each of the HTML pages retrieved in the process of scraping were available via the interface for inspection (this would be very informative when researching a targeted phishing attack, for instance).

Duplicates

  • Same cookies set by multiple websites
  • Same JavaScript / Executable / Json / ...
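Duplicate content across nodes could be detected by grouping resources by a content hash; a sketch over hypothetical `(url, body)` pairs:

```python
import hashlib
from collections import defaultdict

def find_duplicates(resources):
    """Group resources (hypothetical (url, body) pairs) by content hash
    and keep only the hashes seen at more than one URL."""
    by_hash = defaultdict(list)
    for url, body in resources:
        by_hash[hashlib.sha256(body).hexdigest()].append(url)
    return {h: urls for h, urls in by_hash.items() if len(urls) > 1}
```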

Missing icons

File types:

  • Text
  • Audio
  • Empty content
  • POSTed in request
  • CSS
  • JSON
  • HTML
  • EXE
  • Image
  • Font
  • octet-stream
  • Video
  • Livestream
  • Link comes from an Iframe
  • No Mimetype (empty string)
  • No known type (no corresponding icon)
  • Suspected phishing (#190) -> fish + question mark?

Buttons:

  • Download URL content
  • Display URLs related to the domain

Nginx Gateway Timeout

Hello,

I am running Lookyloo in Production, and have nginx running.

Whenever I submit a URL for scanning, I get a page returned saying:

504 Gateway Time-out
nginx/1.14.0 (Ubuntu)

Here are the settings in /etc/nginx/sites-enabled/lookyloo:

server {
    listen 80;
    server_name lookyloo;

    location / {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://localhost:5100/;
    }
}

I can't find a solution to this issue; are you able to assist?
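A likely culprit is the `proxy_read_timeout 10;` (and `proxy_connect_timeout 10;`) in the config above: capturing a page can easily take longer than 10 seconds, after which nginx gives up on the upstream and returns the 504. A sketch with raised timeouts (the exact values are illustrative, not a recommendation):

```nginx
location / {
    proxy_pass_header Server;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Scheme $scheme;
    proxy_connect_timeout 60;
    proxy_read_timeout 300;   # captures can take minutes
    proxy_send_timeout 300;
    proxy_pass http://localhost:5100/;
}
```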

Add collections

The possibility to "group" scan results.

Perhaps via tags or similar.

e.g.: cdn.foo.example could be a group of all the sites using that CDN.

But perhaps thinking about "real" correlations would be more efficient.

Report lookup redirects to index despite tree_uuid created

I observed the following behavior using https://www.circl.lu/urlabuse/

  1. Go to https://www.circl.lu/urlabuse/
  2. Insert a Link and hit Run lookup
  3. Click the Link 'See on Lookyloo'
  4. You are redirected to the index

The link contains a valid tree_uuid, but it seems that lookup_report_dir doesn't return a valid report_dir and thus redirects you to the index.

After some moments the report is viewable.

Expected behavior:
Show an "in progress" notice while keeping the URL intact to enable a manual refresh (F5), or redirect to the finished report once it is done.
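The expected behaviour can be sketched with plain Python; `report_dirs` stands in for whatever lookup_report_dir consults, and all names here are hypothetical, not lookyloo's actual view code:

```python
# Map of tree_uuid -> report directory; empty until a capture finishes.
report_dirs = {}

def view_tree(tree_uuid):
    """Return (status, body) for a tree request, never redirecting away."""
    report_dir = report_dirs.get(tree_uuid)
    if report_dir is None:
        # Capture exists but the report is not ready: answer 200 with a
        # notice instead of redirecting, so F5 on the same URL eventually
        # shows the finished tree.
        return 200, f"Capture {tree_uuid} still in progress, refresh to retry."
    return 200, f"Rendering tree from {report_dir}"
```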

Anonymous submit.

It would be nice to have a "don't remember me" button which allows the scanned website not to be published. (PORN^WGDPR need)
