Giter VIP home page Giter VIP logo

crawley's Introduction

Build Go Report Card Maintainability Test Coverage

License Go Version Release Downloads Mentioned in Awesome Go

crawley

Crawls web pages and prints any link it can find.

features

  • fast html SAX-parser (powered by golang.org/x/net/html)
  • small (<3000 SLOC), idiomatic, 100% test covered codebase
  • grabs most of useful resources urls (pics, videos, audios, forms, etc...)
  • found urls are streamed to stdout and guranteed to be unique (with fragments omitted)
  • scan depth (limited by starting host and path, by default - 0) can be configured
  • can crawl rules and sitemaps from robots.txt
  • brute mode - scan html comments for urls (this can lead to bogus results)
  • make use of HTTP_PROXY / HTTPS_PROXY environment values
  • directory-only scan mode (aka fast-scan)
  • user-defined cookies, in curl-compatible format (i.e. -cookie "ONE=1; TWO=2" -cookie "ITS=ME" -cookie @cookie-file)
  • user-defined headers, same as curl: -header "ONE: 1" -header "TWO: 2" -header @headers-file

installation

  • binaries for Linux, FreeBSD, macOS and Windows

Archlinux User Repository

Crawley is available in the AUR. Linux distributions with access to it can obtain the package from here. You can also use your favourite AUR helper to install it, e. g. paru -S crawley-bin.

usage

crawley [flags] url

possible flags:

-brute
    scan html comments
-cookie value
    extra cookies for request, can be used multiple times, accept files with '@'-prefix
-delay duration
    per-request delay (0 - disable) (default 150ms)
-depth int
    scan depth (-1 - unlimited)
-dirs string
    policy for non-resource urls: show / hide / only (default "show")
-header value
    extra headers for request, can be used multiple times, accept files with '@'-prefix
-headless
    disable pre-flight HEAD requests
-help
    this flags (and their defaults) description
-robots string
    policy for robots.txt: ignore / crawl / respect (default "ignore")
-silent
    suppress info and error messages in stderr
-skip-ssl
    skip ssl verification
-user-agent string
    user-agent string
-version
    show version
-workers int
    number of workers (default - number of CPU cores)

crawley's People

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.