pymediadump's Introduction

All of this started as a simple script to download flash games from newgrounds...

Description:

This is a python-powered tool to download media files from websites. Don't set your hopes too high though: it only supports grabbing stuff that's accessible via the page's html source (for more advanced needs check youtube-dl or something like that), and only if you feed it search rules to find the data you are looking for. I've already provided some example rules and written a guide on how to write your own, so it shouldn't be difficult to get started. If I don't lose interest, someday this may get more features and maybe even a gui, but for now that's what you get: just a mere cli web scraper, configurable with .ini files.
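To make the "search rules in an .ini file" idea a bit more concrete, here is a minimal sketch. Everything specific in it is made up for illustration: the section name, the `url_pattern` and `find` keys, the `load_rules` and `find_media` helpers, and the sample page. The real rule format is whatever the project's own guide describes and may look quite different.

```python
import configparser
import re

# Hypothetical rule file: the section and key names are invented for this
# sketch and are NOT pymediadump's actual rule format.
RULE_TEXT = r"""
[example-site]
url_pattern = https?://example\.com/view/\d+
find = <a class="download" href="([^"]+)"
"""

# Stand-in for page source that would normally be fetched over HTTP.
PAGE_HTML = (
    '<html><body>'
    '<a class="download" href="https://example.com/files/clip.mp4">Get</a>'
    '</body></html>'
)


def load_rules(text):
    """Parse INI text into a {rule_name: {key: value}} dictionary."""
    # Interpolation is disabled so "%" inside a regexp is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def find_media(page_url, html, rules):
    """Return links captured by the first rule whose url_pattern matches."""
    for rule in rules.values():
        if re.match(rule["url_pattern"], page_url):
            return re.findall(rule["find"], html)
    return []


rules = load_rules(RULE_TEXT)
links = find_media("https://example.com/view/42", PAGE_HTML, rules)
print(links)
```

A page whose address matches no rule simply yields an empty list, which is one plausible way to signal "no rules for this site".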

Dependencies:

  • python 3.8+
  • python-requests

Example Usage:

pmd-cli https://www.newgrounds.com/portal/view/746618 -d ./ng-games

QA:

Q: - BUT WHY ARE YOU USING REGULAR EXPRESSIONS? THEY ARE SO BAD, USE BS4 INSTEAD

A: - Well... As I said at the top, this project started as a tool to download flash games from newgrounds (coz adobe broke their browser extension, and the only way to play spicy pixel stuff right now is via the standalone flash player... or at least that's how things are on linux). Direct download links to these games are hidden inside the in-line javascript of the game's webpage. And since bs4 can't parse js, I'd need to use regular expressions anyway (or something like slimit, which would bloat the list of external dependencies even more). Sure, I could go for it, but the thing is, I have no clue how to design download rules for that. Thus I went with "if it works - it works". I may change my opinion later if necessary, but for now this nasty tool is fueled with regexp power.
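The point from the answer above can be sketched roughly like this. The stdlib `html.parser` is used here as a stand-in for bs4 (so the example needs no extra packages), and the page fragment, the `embed` variable and the URL are invented placeholders, not what newgrounds actually serves. An HTML parser hands back the script body as one opaque string, so a regexp is still needed to get the link out of it.

```python
import re
from html.parser import HTMLParser

# Invented page fragment: variable name and URL are placeholders only.
PAGE_HTML = """
<div id="game"></div>
<script>
var embed = {"url": "https://cdn.example.com/games/demo.swf", "width": 640};
</script>
"""


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts.append(data)


collector = ScriptCollector()
collector.feed(PAGE_HTML)

# The HTML parser does not look inside the javascript: it is just text.
script_text = "".join(collector.scripts)

# So a regular expression still has to pull the link out of that text.
match = re.search(r'"url":\s*"([^"]+\.swf)"', script_text)
swf_url = match.group(1) if match else None
print(swf_url)
```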

License:

GPLv3

pymediadump's People

Contributors

alex-eg, moonburnt

Forkers

alex-eg

pymediadump's Issues

Code Refactoring

This bot was written an eternity ago and looks like crap. It would be nice to at least restyle it to match PEP 8, and probably also fix some rudimentary things here and there.

Plugin-based scrapers

The current approach to parsing is extremely rudimentary, coz regexp-driven parsers are somewhat limited and don't work with some sites.
Thus it would be nice to rework the whole data processing mechanism into a plugin-driven system, where each download rule can specify which exact parser (default regexp, bs4, maybe selenium) it wants to use.
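One possible shape for such a system, as a rough sketch only: a registry that maps parser names to functions, with each rule naming the parser it wants and falling back to regexp when it names none. All names here (`PARSERS`, `run_rule`, the `parser`, `pattern`, `tag` and `attr` rule keys) are hypothetical, and the stdlib `html.parser` stands in for bs4 to keep the example dependency-free.

```python
import re
from html.parser import HTMLParser


def regexp_parser(html, rule):
    """Default parser: return every capture of the rule's pattern."""
    return re.findall(rule["pattern"], html)


class _AttrCollector(HTMLParser):
    """Collect one attribute's values from every occurrence of one tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag, self.attr, self.found = tag, attr, []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.found.extend(v for k, v in attrs if k == self.attr and v)


def html_parser(html, rule):
    """Structured parser: read the rule's attribute off the rule's tag."""
    collector = _AttrCollector(rule["tag"], rule["attr"])
    collector.feed(html)
    return collector.found


# Registry of available parsers; a new plugin would add an entry here.
PARSERS = {"regexp": regexp_parser, "html": html_parser}


def run_rule(html, rule):
    """Dispatch to the parser named by the rule (regexp when unspecified)."""
    name = rule.get("parser", "regexp")
    if name not in PARSERS:
        raise ValueError(f"unknown parser: {name!r}")
    return PARSERS[name](html, rule)


SAMPLE = (
    '<video src="https://example.com/a.mp4"></video>'
    '<script>var f = "https://example.com/b.swf";</script>'
)
from_regexp = run_rule(SAMPLE, {"pattern": r'"(https://[^"]+\.swf)"'})
from_html = run_rule(SAMPLE, {"parser": "html", "tag": "video", "attr": "src"})
print(from_regexp, from_html)
```

Keeping regexp as the fallback means existing rules would keep working unchanged, while heavier parsers such as selenium could be registered later without touching the dispatch code.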
