0999ad / telegram-scraper-over-tor

A Python project for scraping content from specified Telegram channels using predefined keywords, with traffic routed through the Tor network for privacy. It uses Selenium for the scraping and Flask to serve a web-based dashboard.

TeleScrape

A Telegram channel scraper that runs over Tor, with a Flask dashboard for results

Legal Disclaimer

This software is designed solely for educational and research purposes and should be used with ethical considerations in mind. Users are responsible for ensuring their activities comply with local laws and regulations. The authors of this software bear no responsibility for any misuse or potential damages arising from its use. It's imperative to adhere to the terms of service of any platforms interacted with through this tool.

Overview

TeleScrape extracts content from public Telegram channels. It routes its traffic through Tor for privacy and shows results as they arrive in a Flask dashboard. It does not use Telegram's API; instead, it scrapes the channels' web pages with Selenium.

Key Features

  • Enhanced Privacy: Routes all scraping through the Tor network to protect user anonymity.
  • Keyword-Driven Scraping: Fetches channel content that matches user-defined keywords, so only relevant data is extracted.
  • Interactive Web Dashboard: Uses Flask to present scraping results, with real-time updates.
  • Parallel Processing: Scrapes multiple channels concurrently to speed up data collection.
  • Easy Customization: Designed to be straightforward to modify and extend for specific requirements.
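
The concurrent scraping mentioned above could be arranged in several ways; the sketch below uses a thread pool from the standard library. It is an illustration only, not the code in TeleScrape.py: `scrape_channel` is a hypothetical stand-in for the real Selenium-based scraper, and `scrape_all` and `max_workers` are names assumed for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_channel(channel: str, keywords: list[str]) -> dict:
    """Hypothetical placeholder for the real Selenium-based scraper."""
    # A real implementation would load the channel page and collect posts.
    # This stub returns an empty result so the example runs offline.
    return {"channel": channel, "keywords": list(keywords), "matches": []}


def scrape_all(channels: list[str], keywords: list[str], max_workers: int = 4) -> list[dict]:
    """Scrape several channels concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs,
        # even though the calls themselves run in parallel threads.
        return list(pool.map(lambda ch: scrape_channel(ch, keywords), channels))


if __name__ == "__main__":
    for result in scrape_all(["channel_a", "channel_b"], ["example"]):
        print(result["channel"], len(result["matches"]))
```

Threads suit this workload because each scrape spends most of its time waiting on the network. `max_workers` bounds how many browser sessions would be open at once.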

Technical Details

Prerequisites

  • Python 3.x
  • Flask
  • BeautifulSoup4
  • Selenium
  • Requests
  • PySocks (lets Requests use a SOCKS proxy)

Setting Up

  1. Python 3.x Installation: Verify Python 3.x is installed on your system.
  2. Dependencies: Install the required Python packages using pip.
    pip install flask beautifulsoup4 selenium requests pysocks
  3. Tor Configuration: Install Tor locally and ensure it's configured to run a SOCKS proxy on localhost:9050.
  4. WebDriver Setup: Ensure the Chrome WebDriver is installed and properly configured in the script's path settings.
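
Before running the scraper, it can help to confirm that Tor is listening where step 3 expects it. The sketch below is a minimal check using only the standard library, plus two helpers that build the proxy settings a script could pass to Requests and to Chrome. The function names are assumptions made for this example, and TeleScrape.py may configure its proxy differently.

```python
import socket

TOR_HOST = "127.0.0.1"  # localhost, as in step 3
TOR_PORT = 9050         # SOCKS port from step 3


def tor_port_open(host: str = TOR_HOST, port: int = TOR_PORT, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tor_proxies(host: str = TOR_HOST, port: int = TOR_PORT) -> dict:
    """Proxy mapping in the form Requests accepts (needs PySocks installed)."""
    # The socks5h scheme asks the proxy to resolve hostnames as well,
    # so DNS lookups also go through Tor.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def chrome_proxy_argument(host: str = TOR_HOST, port: int = TOR_PORT) -> str:
    """Command-line switch that points Chrome at the SOCKS proxy."""
    return f"--proxy-server=socks5://{host}:{port}"


if __name__ == "__main__":
    print("Tor SOCKS port reachable:", tor_port_open())
    print("Requests proxies:", tor_proxies())
    print("Chrome argument:", chrome_proxy_argument())
```

The mapping can be passed as `requests.get(url, proxies=tor_proxies())`, and the Chrome switch can be added with `options.add_argument(...)` on a Selenium `ChromeOptions` object. An open port only shows that something is listening there, not that the traffic actually leaves through Tor.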

Project Structure

  • TeleScrape.py: The main script, encapsulating the scraping logic, Flask application, and Tor setup.
  • keywords.txt: Text file listing the keywords for content scraping.
  • /templates: Folder containing HTML templates for the Flask-based dashboard.

Getting Started

  1. Keyword Configuration: Populate keywords.txt with your desired keywords.
  2. Script Execution: Launch TeleScrape.py to start scraping and activate the Flask dashboard.
    python TeleScrape.py
  3. Dashboard Navigation: Open http://127.0.0.1:8081/ in your browser to view the scraping progress and results live.
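
The format of keywords.txt is not spelled out above. The sketch below assumes one keyword per line and case-insensitive matching; both are assumptions for illustration, and `load_keywords` and `matching_keywords` are hypothetical names rather than functions taken from TeleScrape.py.

```python
from pathlib import Path


def load_keywords(path: str = "keywords.txt") -> list[str]:
    """Read keywords, assuming one per line; skip blanks and duplicates."""
    keywords: list[str] = []
    seen: set[str] = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        keyword = line.strip()
        # Compare in lowercase so "Leak" and "leak" count as one keyword.
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


def matching_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that occur in text, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]
```

Substring matching is the simplest choice and will also match inside longer words; a stricter variant could match on word boundaries instead.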

Dashboard Highlights

  • Real-Time Refresh: Automatically updates to display the latest scraping data.
  • Keyword Visualization: Keywords and matches are highlighted within the content for better clarity.
  • Adaptive Design: Ensures a consistent experience across various devices and resolutions.
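
One way to produce the keyword highlighting described above is to wrap each match in a `<mark>` element before the text reaches the template. The sketch below is an assumption about how this could be done, not the dashboard's actual code; `highlight_keywords` is a hypothetical helper.

```python
import html
import re


def highlight_keywords(text: str, keywords: list[str]) -> str:
    """Return HTML-escaped text with keyword matches wrapped in <mark>."""
    terms = [re.escape(kw) for kw in keywords if kw]
    if not terms:
        return html.escape(text)
    # Try longer keywords first so overlapping terms prefer the longer match.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("(" + "|".join(terms) + ")", re.IGNORECASE)
    pieces = []
    # With one capturing group, re.split alternates between
    # non-matching text (even indices) and matched keywords (odd indices).
    for index, part in enumerate(pattern.split(text)):
        escaped = html.escape(part)
        pieces.append(f"<mark>{escaped}</mark>" if index % 2 else escaped)
    return "".join(pieces)
```

Each piece is escaped separately, after splitting, so scraped content cannot inject markup and a keyword never matches inside an HTML entity. Because the result already contains markup, a Jinja template would need to render it with the `safe` filter.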

Contributing

Contributions are welcome. If you have improvements or suggestions, fork this repository, commit your changes, and submit a pull request for review.

License

This project is distributed under the MIT License, which permits use, modification, and redistribution.
