Crawls web pages and prints any link it can find.
- fast html SAX-parser (powered by `golang.org/x/net/html`)
- small (<3000 SLOC), idiomatic, 100% test-covered codebase
- grabs most useful resource urls (pics, videos, audios, forms, etc.)
- found urls are streamed to stdout and guaranteed to be unique (with fragments omitted)
- scan depth (limited by starting host and path, 0 by default) can be configured
- can crawl rules and sitemaps from `robots.txt`
- `brute` mode - scans html comments for urls (this can lead to bogus results)
- makes use of `HTTP_PROXY` / `HTTPS_PROXY` environment values
- directory-only scan mode (aka `fast-scan`)
- user-defined cookies, in curl-compatible format, i.e. `-cookie "ONE=1; TWO=2" -cookie "ITS=ME" -cookie @cookie-file`
- user-defined headers, same as curl: `-header "ONE: 1" -header "TWO: 2" -header @headers-file` (see the example after this list)
- binaries for Linux, FreeBSD, macOS and Windows
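
For instance, the proxy environment variables and the curl-style cookie/header flags can be combined in a single run. A minimal sketch; the proxy address, cookie value, header name and target url below are placeholders:

```sh
# crawl through a local proxy, sending a session cookie and an extra header
HTTP_PROXY=http://127.0.0.1:8080 \
crawley -cookie "SESSION=abc123" -header "X-Api-Key: secret" https://example.com
```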
Crawley is available in the AUR, so Linux distributions with access to it can install the package from there. You can also use your favourite AUR helper, e.g. `paru -S crawley-bin`.
```
crawley [flags] url

possible flags:

-brute
      scan html comments
-cookie value
      extra cookies for request, can be used multiple times, accept files with '@'-prefix
-delay duration
      per-request delay (0 - disable) (default 150ms)
-depth int
      scan depth (-1 - unlimited)
-dirs string
      policy for non-resource urls: show / hide / only (default "show")
-header value
      extra headers for request, can be used multiple times, accept files with '@'-prefix
-headless
      disable pre-flight HEAD requests
-help
      this flags (and their defaults) description
-robots string
      policy for robots.txt: ignore / crawl / respect (default "ignore")
-silent
      suppress info and error messages in stderr
-skip-ssl
      skip ssl verification
-user-agent string
      user-agent string
-version
      show version
-workers int
      number of workers (default - number of CPU cores)
```
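
A couple of typical invocations, combining only the flags documented above (the target urls are placeholders):

```sh
# unlimited depth, respect robots.txt rules, show only directory-like urls
crawley -depth -1 -robots respect -dirs only https://example.com

# brute mode (scan html comments) with a slower request rate, messages silenced
crawley -brute -delay 500ms -silent https://example.com > urls.txt
```

Since found urls go to stdout and diagnostics go to stderr, the output can be redirected or piped to other tools as in the second example.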