0999ad / docker-telegram-scraper-over-tor

This Dockerized Python project is tailored for scraping content from designated Telegram channels using predefined keywords, with Tor network integration for enhanced privacy. It highlights Selenium for robust web scraping and Flask for a dynamic web dashboard.

Python 73.50% HTML 24.93% Shell 1.57%

TeleScrape

Dockerised - Enhanced Telegram Channel Scraper using TOR and a Flask Dashboard for results

Currently tested and run on Ubuntu 22.04.4 LTS.

Legal Disclaimer

This software is designed solely for educational and research purposes and should be used with ethical considerations in mind. Users are responsible for ensuring their activities comply with local laws and regulations. The authors of this software bear no responsibility for any misuse or potential damages arising from its use. It's imperative to adhere to the terms of service of any platforms interacted with through this tool.

Overview

TeleScrape extracts content from public Telegram channels. It routes scraping through Tor to protect user privacy and presents results in real time on a Flask dashboard. Because it scrapes the web interface with Selenium, it does not need Telegram's API.
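As a rough illustration of what routing a browser through Tor involves, the sketch below builds the Firefox proxy preferences for a local SOCKS5 proxy. It assumes a Tor daemon listening on its default SOCKS port (127.0.0.1:9050) and a Firefox-based driver; the README does not say which browser the project drives, and `tor_firefox_prefs` is a hypothetical helper name, not part of the project.

```python
def tor_firefox_prefs(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Return Firefox preferences that send traffic through a SOCKS5 proxy.

    Assumption: Tor is running locally on its default SOCKS port (9050).
    """
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve DNS through the proxy too
    }


prefs = tor_firefox_prefs()
for name, value in prefs.items():
    print(f"{name} = {value!r}")
```

With Selenium installed, each pair could be applied through `Options.set_preference(name, value)` from `selenium.webdriver.firefox.options` before the driver is created. Sending DNS lookups through the proxy matters here, since otherwise hostnames would still be resolved outside Tor.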

Key Features

  • Enhanced Privacy: Routes all scraping through the Tor network to protect user anonymity.
  • Keyword-Driven Scraping: Fetches channel content based on user-defined keywords, focusing on relevant data extraction.
  • Interactive Web Dashboard: Utilizes Flask to present scraping results dynamically, with real-time updates and insights.
  • Efficient Parallel Processing: Employs concurrent scraping to expedite data collection from multiple channels simultaneously.
  • User-Friendly Customization: Designed for easy adaptability to specific requirements, supporting straightforward modifications and extensions.
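The keyword filtering and parallel processing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `fetch_messages` is a hypothetical stand-in for the Selenium page scrape and returns canned data, and the use of a thread pool is an assumption about how "concurrent scraping" might be done.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_messages(channel: str) -> list[str]:
    """Hypothetical stand-in for the Selenium-based scrape of one channel."""
    sample = {
        "channel_a": ["Breach reported at example corp", "Weather update"],
        "channel_b": ["New malware sample shared", "Hello world"],
    }
    return sample.get(channel, [])


def matching_messages(channel: str, keywords: list[str]) -> list[str]:
    """Keep only messages containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [m for m in fetch_messages(channel)
            if any(k in m.lower() for k in lowered)]


def scrape_all(channels: list[str], keywords: list[str],
               max_workers: int = 4) -> dict[str, list[str]]:
    """Scrape several channels concurrently and map each to its matches."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with channels.
        results = pool.map(lambda c: matching_messages(c, keywords), channels)
        return dict(zip(channels, results))


results = scrape_all(["channel_a", "channel_b"], ["breach", "malware"])
for channel, messages in results.items():
    print(channel, messages)
```

Threads suit this kind of workload because each scrape spends most of its time waiting on the network, which is slower still over Tor.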

Features

  • Automated scraping of specified Telegram channels.
  • Privacy-focused scraping through the Tor network.
  • Interactive web dashboard to display scraping results and update configurations.
  • Continuous scraping with the ability to start and stop processes through the dashboard.

Prerequisites

Before you begin, ensure you have Docker installed on your system. Visit Docker's official installation guide to get started.

Installation

Follow these steps to get TeleScrape up and running on your machine:

  1. Clone the repository:
    git clone https://github.com/yourusername/Docker-Telegram-Scraper-over-TOR
    cd Docker-Telegram-Scraper-over-TOR/

Rename your Dockerfile to reflect your OS distribution.

  2. Build the Docker image:

    sudo docker build -t telescraper .
  3. Run the Docker container:

    sudo docker run -p 8081:8081 telescraper

If you encounter an issue, try mounting the current directory into the container at /app:

    sudo docker run -p 8081:8081 -v $(pwd):/app telescraper

This command runs the application and makes it accessible via localhost on port 8081.

Usage

Once the application is running, you can access the dashboard by navigating to http://localhost:8081 in your web browser. Here you can:

  • View scraped data from Telegram channels.
  • Update keywords and channels for scraping.
  • Restart the scraping process.

Configuration

To customize the scraping process:

  • Edit the keywords.txt to modify or add new keywords.
  • Update the bespoke_channels.txt to add or remove Telegram channels.
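The README does not specify the format of these two files. The sketch below assumes one entry per line, and additionally assumes that blank lines and lines starting with `#` should be ignored; `parse_entries` and `load_entries` are hypothetical helper names used only for illustration.

```python
from pathlib import Path


def parse_entries(text: str) -> list[str]:
    """Return non-empty, stripped lines, skipping '#' comments (an assumption)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def load_entries(path: str) -> list[str]:
    """Read a config file such as keywords.txt and parse it into a list."""
    return parse_entries(Path(path).read_text(encoding="utf-8"))


# Parsing an in-memory example in place of a real keywords.txt.
example = "ransomware\n\n# temporary note\n  data leak  \n"
print(parse_entries(example))
```

Under the same assumptions, `load_entries("keywords.txt")` and `load_entries("bespoke_channels.txt")` would return the keyword and channel lists.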

Contributing

Contributions are welcome! If you have suggestions for improvements or bug fixes, please feel free to:

  • Fork the repository.
  • Create a new branch (git checkout -b feature-branch).
  • Make your changes.
  • Submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Support

If you encounter any problems or have any queries about deploying the application, please open an issue in the GitHub repository.
