docs's Introduction

Lookyloo is a web interface that captures a webpage and then displays a tree of the domains that call each other.

What's in a name?!

Lookyloo ...

Same as Looky Lou; often spelled as Looky-loo (hyphen) or lookylou

1. A person who just comes to look.
2. A person who goes out of the way to look at people or something, often causing crowds and disruption.
3. A person who enjoys watching other people's misfortune. Oftentimes onlookers who stare at car accidents.

In L.A., usually the lookyloos cause more accidents by not paying full attention to what is ahead of them.

Source: Urban Dictionary

No, really, what is Lookyloo?

Lookyloo is a web interface that allows you to capture and map the journey of a website page.

Find all you need to know about Lookyloo on our documentation website.

Here's an example of a Lookyloo capture of the site github.com:

[Screenshot of Lookyloo capturing GitHub]

REST API

The API is self-documented with Swagger. You can play with it on the demo instance.

Installation

Please refer to the install guide.

Python client

pylookyloo is the recommended client to interact with a Lookyloo instance.

It is available on PyPI, so you can install it using the following command:

pip install pylookyloo

For more details on pylookyloo, read the overview docs, the documentation of the module itself, or the code in this GitHub repository.

Notes regarding using S3FS for storage

Directory listing

TL;DR: it is slow.

If you have many captures (say more than 1000/day) and store them in an S3 bucket mounted with s3fs-fuse, a directory listing in bash (ls) will most probably lock the I/O for every process trying to access any file in the whole bucket. The same is true if you access the filesystem using Python methods (iterdir, scandir...).

A workaround is to use the Python s3fs module, as it does not go through the mounted filesystem to list directories. You can configure the s3fs credentials under the s3fs key in config/generic.json.

Warning: this will not save you if you run ls on a directory that contains a lot of captures.
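As a rough sketch of what the s3fs entry in config/generic.json needs to contain, the check below looks for the four settings that the Python snippet later in this README reads (key, secret, endpoint_url, bucket_name). The layout is inferred from that snippet only, so the real file may hold additional keys. The function name missing_s3fs_keys and all values are hypothetical placeholders.

```python
import json

# Settings read elsewhere in this README via s3fs_config['config'][...]
REQUIRED_KEYS = ("key", "secret", "endpoint_url", "bucket_name")


def missing_s3fs_keys(generic_config: dict) -> list:
    """Return the required s3fs settings that are absent or empty."""
    settings = generic_config.get("s3fs", {}).get("config", {})
    return [name for name in REQUIRED_KEYS if not settings.get(name)]


# Placeholder values only (assumed layout); real credentials belong in
# config/generic.json.
example = json.loads("""
{
  "s3fs": {
    "config": {
      "key": "ACCESS_KEY",
      "secret": "SECRET_KEY",
      "endpoint_url": "http://localhost:9000",
      "bucket_name": "lookyloo"
    }
  }
}
""")

print(missing_s3fs_keys(example))  # prints []
```

Running a check like this before starting Lookyloo can make a missing credential visible early instead of surfacing as an I/O error later.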

Versioning

By default, a MinIO bucket (backend for s3fs) will have versioning enabled, which means it keeps a copy of every version of every file you're storing. This becomes a problem if you have a lot of captures: the index files are updated on every change, and the maximum number of versions is 10,000. So by the time you have more than 10,000 captures in a directory, you'll get I/O errors when you try to update the index file. And you absolutely do not care about that versioning in Lookyloo.

To check if versioning is enabled (can be either enabled or suspended):

mc version info <alias_in_config>/<bucket>

The command below will suspend versioning:

mc version suspend <alias_in_config>/<bucket>
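If you would rather script these two steps, a minimal sketch could wrap the same mc commands with subprocess. It assumes the mc binary is on the PATH and already has the alias configured. The function names and the example alias and bucket are hypothetical.

```python
import subprocess


def mc_version_command(action: str, alias: str, bucket: str) -> list:
    """Build the argument list for `mc version <action> <alias>/<bucket>`."""
    if action not in ("info", "suspend"):
        raise ValueError(f"unsupported action: {action}")
    return ["mc", "version", action, f"{alias}/{bucket}"]


def suspend_versioning(alias: str, bucket: str) -> None:
    """Show the current versioning state, then suspend it on the bucket."""
    for action in ("info", "suspend"):
        # check=True raises CalledProcessError if mc exits with a non-zero status
        subprocess.run(mc_version_command(action, alias, bucket), check=True)


# Hypothetical alias and bucket names, shown without running mc.
print(mc_version_command("suspend", "myminio", "lookyloo"))
# prints ['mc', 'version', 'suspend', 'myminio/lookyloo']
```

Passing the arguments as a list, rather than a shell string, avoids quoting problems if the alias or bucket name contains unusual characters.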

I'm stuck, my file is raising I/O errors

This happens once your index has been updated 10,000 times while versioning was enabled.

This is how to check you're in this situation:

  • Error message from bash (unhelpful):
$ (git::main) rm /path/to/lookyloo/archived_captures/Year/Month/Day/index
rm: cannot remove '/path/to/lookyloo/archived_captures/Year/Month/Day/index': Input/output error
  • Check with python
from lookyloo.default import get_config
import s3fs

s3fs_config = get_config('generic', 's3fs')
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])

s3fs_bucket = s3fs_config['config']['bucket_name']
s3fs_client.rm_file(s3fs_bucket + '/Year/Month/Day/index')
  • Error from python (somewhat more helpful):
OSError: [Errno 5] An error occurred (MaxVersionsExceeded) when calling the DeleteObject operation: You've exceeded the limit on the number of versions you can create on this object
  • Solution: run this command to remove all older versions of the file
mc rm --non-current --versions --recursive --force <alias_in_config>/<bucket>/Year/Month/Day/index
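The check and the fix above can be tied together in a small helper. This is a sketch under assumptions: it recognises the error only by the errno and the MaxVersionsExceeded text shown in the message above, and it only builds the mc rm command rather than running it. Both function names are hypothetical.

```python
import errno


def is_max_versions_error(exc: BaseException) -> bool:
    """True if exc looks like the MaxVersionsExceeded I/O error shown above."""
    return (
        isinstance(exc, OSError)
        and exc.errno == errno.EIO  # Errno 5, as in the message above
        and "MaxVersionsExceeded" in str(exc)
    )


def purge_command(alias: str, bucket: str, index_path: str) -> list:
    """Build the `mc rm` call from the solution above for one index file."""
    return ["mc", "rm", "--non-current", "--versions", "--recursive", "--force",
            f"{alias}/{bucket}/{index_path}"]


# Simulated error, using the wording of the message above.
demo = OSError(errno.EIO, "An error occurred (MaxVersionsExceeded) "
                          "when calling the DeleteObject operation")
print(is_max_versions_error(demo))  # prints True
```

In practice you would call is_max_versions_error in an except OSError block around s3fs_client.rm_file(...) and, when it returns True, run the command returned by purge_command for the affected index.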

Contributing to Lookyloo

To learn more about contributing to Lookyloo, see our contributor guide.

Code of Conduct

At Lookyloo, we pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. You can access our Code of Conduct here or on the Lookyloo docs site.

Support

  • To engage with the Lookyloo community contact us on Gitter.
  • Let us know how we can improve Lookyloo by opening an issue.
  • Follow us on Twitter.

Security

To report vulnerabilities, see our Security Policy.

Credits

Thank you very much Tech Blog @ willshouse.com for the up-to-date list of UserAgents.

License

See our LICENSE.

docs's People

Contributors

adrima01, buildbricks, dependabot[bot], fafnerkeyzee, felalex57, keleranv, rafiot, wikijm

docs's Issues

Merging the install guide on one single page

What is changing?

I find that having two pages for the install guide is confusing. We should merge the components and lookyloo itself on one single page.

How will this impact users?

Improve clarity.

To which release does this apply?

  • Current release
  • Future release

Update Templates

Describe the change

(A clear and concise description of what you want to happen)
Remove suggested text. Comment it out

How will this impact users?

Make it easier to review

To which release does this apply?

  • Current release
  • Future release

Additional context

(Add any other context or screenshots about the feature request here)

Add page for scraping tutorial and testing

What is changing?

Nothing for the user of the platform, but there are now two new repositories (both are very WiP):

How will this impact users?

It won't.

To which release does this apply?

  • Current release
  • Future release

Check for typos and inconsistencies in docs

Expected behavior

Documentation is error-free.

Actual behavior

Some typos in the documentation.

Steps to reproduce the problem

  1. Navigate to lookyloo docs site

Additional information

Update links in README

Expected behavior

Links in README direct to correct page

Actual behavior

Links lead to a 404 page

Steps to reproduce the problem

  1. Navigate to README.
  2. Click link to page.

Additional information

(Add any other context about the problem here.)

Fix broken links in README

Expected behavior

Links in README direct to correct page

Actual behavior

Links lead to a 404 page

Steps to reproduce the problem

  1. Navigate to README.
  2. Click links to pages.

Additional information

(Add any other context about the problem here.)

Add update page

Describe the change

There is now an update script (bin/update.py) that an admin can use to update a Lookyloo instance.

How will this impact users?

It makes it easier to update everything at once.

The way to call it is

poetry run update

Note: --yes will update everything without user interaction.

Note: it needs to be called at the end of a fresh install: 32cfd76

Add Style Guide for Docs

What is changing?

(Please include as many details as possible)
Add style guide for documentation changes

How will this impact users?

It makes writing docs for the site consistent. The style guide should be added to the contributor guide.

To which release does this apply?

  • Current release
  • Future release

Context

Link to associated PRs or issues from other repos here.

Additional information

[general term usage] Always use capture instead of scrape

Describe the change

We need to make sure that we always say "capture" everywhere in the documentation and in the application: Lookyloo does much more than scraping a page, and it is misleading to use scrape/scraping.

Right now, we interchangeably use capture and scrape.

Create documentation outlining how lookyloo works for different use cases

What is changing?

Introducing people to Lookyloo with a few example use cases:

  1. Investigating malicious site activity
  2. Looking for tracking/ad infrastructure
  3. Investigating any kind of unified infrastructure across disparate sites
  4. Educating students on how the web works, "under the hood"
  5. Tracking the behavior and maintenance of a complex website

How will this impact users?

Context

Additional information

Rename main branch from master to main

What is changing?

Every other repository in the Lookyloo org uses main as the main branch. To be consistent, we should move the docs to the same scheme.

How will this impact users?

They will need to rename their branch locally, as described here: https://www.hanselman.com/blog/EasilyRenameYourGitDefaultBranchFromMasterToMain.aspx

To which release does this apply?

  • Current release

Additional information

We need to update the configuration of the repository on github too, and remove the existing master branch to avoid confusion.

Fix edit this page link (top right of the page)

Describe the change

It currently points to file://<repo_root>/modules/ROOT/pages/index.adoc, but it should point to the relevant page on GitHub.

How will this impact users?

Will make this link usable.

To which release does this apply?

  • Current release
  • Future release
