
Comments (3)

eSoares commented on July 17, 2024

I have been looking at the different implementation possibilities and have some notes that may help someone who wants to implement this.
Unfortunately, I have neither the skills nor the time at the moment to implement a good solution in the current project.

Some notes on possible implementations:

  1. One possible approach is to implement a behaviour, similar to ExtractLinks, that extracts M3U8 media segments. The JS in the browser could use something like [this parser](https://github.com/globocom/m3u8).

  2. An alternative approach: when processing the content downloaded by the browser, if the content is an M3U8 playlist, parse it and download the media content it lists.

  3. Add an extra step/tool that reads WARC files and, for each M3U8 playlist found there, downloads the media content and appends it to the WARC (or generates a new WARC with the same requests as the original WARC plus the media content).

In my opinion, solution 1 is the cleanest. Solution 3 is the dirtiest, since it downloads related content at two different points in time.
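
All three approaches start from the same step: getting the segment URLs out of a playlist. A minimal sketch of that step, assuming a simple media playlist in which every non-blank line that does not start with `#` is a segment URI. The function name `media_segment_urls` is hypothetical and not part of crocoite. In a master playlist the same lines point to variant playlists instead, and URIs carried inside tag attributes (for example encryption keys) are not picked up here.

```python
from urllib.parse import urljoin


def media_segment_urls(playlist_text, base_url):
    """Return absolute URLs for every URI line of an M3U8 playlist.

    Hypothetical helper: handles only plain URI lines, not URIs that
    appear inside tag attributes.
    """
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines are ignored; lines starting with '#' are tags or comments.
        if not line or line.startswith("#"):
            continue
        # Relative URIs are resolved against the playlist's own URL.
        urls.append(urljoin(base_url, line))
    return urls


example = (
    "#EXTM3U\n"
    "#EXT-X-TARGETDURATION:10\n"
    "#EXTINF:9.0,\n"
    "seg0.ts\n"
    "#EXTINF:9.0,\n"
    "https://cdn.example/seg1.ts\n"
    "#EXT-X-ENDLIST\n"
)
print(media_segment_urls(example, "https://example.com/video/index.m3u8"))
```

A full parser such as the one linked above would be the safer choice for real playlists; this only illustrates how little is needed for the simplest case.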

from crocoite.

PromyLOPh commented on July 17, 2024

I wouldn’t consider option 3 “dirty”. In fact, it’s pretty clean, and you can easily add a conversion record to the WARC containing the full video downloaded by, say, “youtube-dl”, and referencing the original M3U8. Another option would be to click all play buttons for <video> and <audio> tags, wait until every one of those finishes playing, limit the network speed, rinse and repeat.


eSoares commented on July 17, 2024

Option 3 can easily be implemented with FFmpeg, which handles both downloading the individual elements and producing the complete media file. The only missing piece is appending that content to the WARC file.
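
For that missing piece, here is a rough sketch of what appending a conversion record could look like, using only the Python standard library. It assumes the WARC is stored as a `.warc.gz` with one gzip member per record, so a new record can be appended as one more member. The helper names `conversion_record` and `append_record` are hypothetical, the referenced record ID has to be looked up from the original M3U8 record, and a dedicated WARC library would normally take care of the serialization.

```python
import gzip
import uuid
from datetime import datetime, timezone


def conversion_record(target_uri, refers_to_id, payload, content_type):
    """Serialize one WARC/1.0 'conversion' record as bytes (hypothetical helper)."""
    headers = [
        ("WARC-Type", "conversion"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        # Points back at the record holding the original M3U8 playlist.
        ("WARC-Refers-To", refers_to_id),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record ends with two CRLFs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


def append_record(warc_gz_path, record_bytes):
    """Append the record as its own gzip member to an existing .warc.gz file."""
    with open(warc_gz_path, "ab") as f:
        f.write(gzip.compress(record_bytes))
```

Here `payload` would be the bytes of the media file produced by FFmpeg, and `refers_to_id` the `WARC-Record-ID` of the playlist's response record.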

The last option, "clicking all play buttons", is the closest to normal web browsing. But it is also the slowest, since the browser downloads media on demand, so it would take as long as the slowest media on the page.
On the upside, it would be generic across all media types.

Of course, all options need to be careful about encountering live streams that never end.
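
For M3U8 specifically, one cheap guard is possible: in HLS, a media playlist that lacks the `#EXT-X-ENDLIST` tag may still receive new segments, so it can be treated as potentially live. A minimal sketch of that check follows; the function name `may_be_live` is hypothetical, and other media types would still need their own limit, such as a maximum duration or size.

```python
def may_be_live(playlist_text):
    """Return True if a media playlist has no end marker and may keep growing."""
    lines = [line.strip() for line in playlist_text.splitlines()]
    # Only media playlists list segments (#EXTINF); master playlists do not.
    is_media_playlist = any(line.startswith("#EXTINF") for line in lines)
    has_endlist = any(line.startswith("#EXT-X-ENDLIST") for line in lines)
    return is_media_playlist and not has_endlist
```

A crawler could skip such playlists or download only the segments listed at the time of the crawl.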

