web-wanderer

Go Web Scraper

This Go application scrapes web content from one or more URLs. It retrieves the HTML of each URL and saves it to a local file, together with the page's associated resources (images, stylesheets, etc.). In metadata mode, it also reports information extracted from the HTML, such as the number of links and images on the page.

Prerequisites

Before running this application, make sure you have the following installed:

  • Go (version 1.16 or higher)

Usage

To use this application, follow the steps below:

  1. Clone the repository or copy the code into a local file.

  2. Open a terminal or command prompt and navigate to the directory containing the Go file.

  3. Build the application by running the following command:

    go build

  4. Run the application with the desired command-line arguments. The supported options are:

    • --metadata: Enables metadata mode, which extracts and displays additional information from the HTML, such as the number of links and images.

    • <URLs>: One or more URLs to scrape, separated by spaces.

    Example usage:

    ./web-scraper --metadata https://www.example.com https://www.another-example.com

    Replace web-scraper with the name of the built executable file. By default, go build names the executable after the module or the containing directory.

  5. The application will retrieve the HTML content of the specified URLs, save it to local files, and display metadata (if enabled). The HTML content will be saved as <hostname>.html, and the associated resources (images, stylesheets, etc.) will be saved in a folder named <hostname>_content.
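The fetch-and-save flow described in the steps above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names outputNames and savePage are hypothetical, and the real implementation may handle errors, redirects, and resource downloads differently.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// outputNames (hypothetical helper) derives the <hostname>.html file name
// and the <hostname>_content folder name from a URL.
func outputNames(rawURL string) (htmlFile, contentDir string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	host := u.Hostname()
	if host == "" {
		return "", "", fmt.Errorf("no hostname in %q", rawURL)
	}
	return host + ".html", host + "_content", nil
}

// savePage (hypothetical helper) downloads rawURL and writes the response
// body to <hostname>.html in the current directory.
func savePage(rawURL string) (string, error) {
	htmlFile, _, err := outputNames(rawURL)
	if err != nil {
		return "", err
	}
	resp, err := http.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	f, err := os.Create(htmlFile)
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := io.Copy(f, resp.Body); err != nil {
		return "", err
	}
	return htmlFile, nil
}

func main() {
	// The flag package accepts both -metadata and --metadata.
	metadata := flag.Bool("metadata", false, "print metadata for each page")
	flag.Parse()

	// Everything after the flags is treated as a URL to scrape.
	for _, rawURL := range flag.Args() {
		name, err := savePage(rawURL)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", rawURL, err)
			continue
		}
		fmt.Printf("saved %s to %s\n", rawURL, name)
		if *metadata {
			fmt.Println("metadata mode enabled (extraction not shown in this sketch)")
		}
	}
}
```

In this sketch a URL without a scheme (for example www.example.com) is rejected because no hostname can be parsed from it, so URLs should be passed in full, as in the example usage above.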

Features

  • Fetches HTML content from one or more URLs.

  • Saves HTML content to local files.

  • Extracts metadata from HTML, including the number of links and images.

  • Automatically downloads associated resources (images, stylesheets, etc.) and updates the HTML with the correct local URLs.
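As an illustration of the metadata feature, counting links and images might look something like the sketch below. It assumes a simple regular-expression approach so that it runs with the standard library alone; the project itself may well use a proper HTML parser, which is more robust. The Metadata type and extractMetadata function are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Metadata (hypothetical type) holds the counts mentioned in the feature list.
type Metadata struct {
	Links  int
	Images int
}

var (
	// Matches opening <a> tags that carry an href attribute.
	linkTag = regexp.MustCompile(`(?i)<a\s[^>]*\bhref\s*=`)
	// Matches opening <img> tags.
	imageTag = regexp.MustCompile(`(?i)<img\b`)
)

// extractMetadata (hypothetical helper) counts links and images in raw HTML.
// Regular expressions are an approximation: they will also count tags that
// appear inside comments or scripts.
func extractMetadata(html string) Metadata {
	return Metadata{
		Links:  len(linkTag.FindAllStringIndex(html, -1)),
		Images: len(imageTag.FindAllStringIndex(html, -1)),
	}
}

func main() {
	page := `<html><body>
<a href="/one">One</a> <a href="/two">Two</a>
<img src="logo.png"> <IMG SRC="photo.jpg">
</body></html>`

	m := extractMetadata(page)
	fmt.Printf("links: %d, images: %d\n", m.Links, m.Images)
}
```

Running this prints "links: 2, images: 2" for the sample page.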

Customization

You can customize the behavior of the application by modifying the Go code. For example, you can add additional metadata extraction logic or enhance the resource downloading process.
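One place where such a customization could start is the mapping from a resource reference in the HTML to a local path inside the <hostname>_content folder. The sketch below is an assumption about how that mapping might be written, not the project's actual logic; localResourcePath is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// localResourcePath (hypothetical helper) resolves ref against the page URL
// and returns the absolute URL to download plus the relative path that the
// saved HTML could point to. Forward slashes are used because the result is
// meant to be written back into HTML attributes.
func localResourcePath(pageURL, ref string) (absolute, local string, err error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", "", err
	}
	abs := base.ResolveReference(r)

	// Keep only the file name; fall back to "index" for directory-like paths.
	name := path.Base(abs.Path)
	if name == "." || name == "/" {
		name = "index"
	}
	return abs.String(), path.Join(base.Hostname()+"_content", name), nil
}

func main() {
	abs, local, err := localResourcePath("https://www.example.com/blog/post", "/img/logo.png")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(abs)
	fmt.Println(local)
}
```

Flattening every resource to its base name keeps the folder simple, but two resources with the same file name in different directories would overwrite each other. Preserving the directory structure or adding a hash to the name are possible refinements.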

License

This application is open-source and distributed under the MIT License.

Disclaimer

This application is provided as-is without any warranty. Use it at your own risk.

Contributors

  • ecostack
