

Youtube Data Scraper Java Library

IMPORTANT NOTE: I no longer have time to support this project. As of September 2021 everything appears to work, but any change to the YouTube API may break some functions without warning.

This library is a Java 11 HTTP client for YouTube's public API. It is designed to retrieve every publicly available comment on a YouTube channel, given the channel ID, and store those comments in a database.

Links to related repositories:
Youtube Scraper SpringBoot Web App.
Youtube Scraper Web App Angular Client.

Main features:

  • supported endpoints:
    • youtube.com/channel/%s
    • youtube.com/%s/videos
    • youtube.com/watch
    • youtube.com/browse_ajax
    • youtube.com/comment_service_ajax
    • youtube.com/youtubei/v1/browse
  • supported entities:
    • channel metadata
    • channel microformat
    • channel video
    • video comment / reply
  • fetching data:
    • channel metadata by channel ID
    • videos list by channel ID
    • comments and replies by video ID
    • comments and replies by channel ID
  • supported comment fetching modes:
    • top comments first
    • newest comments first
  • store data:
    • H2 database / Hibernate
    • filesystem
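
The endpoint patterns above are format strings with a %s placeholder. The sketch below is a hypothetical helper, not part of this library. It shows how such templates can be filled in and turned into a java.net.http request without sending it. The "https://www." prefix is an assumption; the /watch?v= query form is YouTube's standard watch-page URL. YouTube channel and video IDs use only URL-safe characters, so no extra encoding is applied.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: fills an ID into two of the endpoint patterns listed
// in the README. The "https://www." base is an assumption.
class YoutubeEndpoints {
    static final String BASE = "https://www.youtube.com";

    // youtube.com/channel/%s
    static URI channelPage(String channelId) {
        return URI.create(String.format(BASE + "/channel/%s", channelId));
    }

    // youtube.com/watch, addressed with the standard "v" query parameter
    static URI watchPage(String videoId) {
        return URI.create(BASE + "/watch?v=" + videoId);
    }

    // Builds, but does not send, a plain GET request for the given URI.
    static HttpRequest get(URI uri) {
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = get(channelPage("UCksTNgiRyQGwi2ODBie8HdA"));
        // prints "GET https://www.youtube.com/channel/UCksTNgiRyQGwi2ODBie8HdA"
        System.out.println(request.method() + " " + request.uri());
    }
}
```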

Usage examples

See the youtubescraper.examples package for up-to-date examples.

Get channel metadata:

    String channelId = "UCksTNgiRyQGwi2ODBie8HdA";
    YoutubeChannelMetadataClient channelHttpClient = new YoutubeChannelMetadataClient(channelId);
    System.out.println(channelHttpClient.getChannelMetadata());
    System.out.println(channelHttpClient.getChannelMicroformat());
    System.out.println(channelHttpClient.getChannelHeader());
    System.out.println(channelHttpClient.getChannelMetadata().getVanityChannelUrl());
    System.out.println(channelHttpClient.getChannelHeader().getSubscriberCountText());
    System.out.println(channelHttpClient.getChannelVanityName());

Get video list by channel ID:

    String channelId = "UCksTNgiRyQGwi2ODBie8HdA";
    ChannelVideosCollector collector = new ChannelVideosCollector(channelId);
    ChannelVideosDTO channel = collector.call();
    List<VideoDTO> videos = channel.getVideos();
    for (int i = 0; i < videos.size(); i++) {
        VideoDTO video = videos.get(i);
        System.out.println(String.format("%s [%s] %s", i + 1, video.getVideoId(), video.getTitle()));
    }

Print comments for a video ID to the console:

    String videoId = "ipAnwilMncI";
    Runnable runner = CommentRunnerFactory.newInstance(
            videoId,
            new CommentConsolePrinter(new CommentHumanReadableFormatter()),
            CommentOrderCfg.TOP_FIRST,
            CommentIteratorCfg.newInstance(1000, 10)
    );
    runner.run();
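
The comment runner walks through a video's comments page by page. CommentIteratorCfg.newInstance(1000, 10) takes two numeric limits that this README does not document, so the sketch below does not try to reproduce them. It only illustrates the general continuation-token pattern, assuming each page carries a batch of items plus a token for the next page (null on the last one), with a cap on the total number of items returned. Page, PageSource and CappedPagingIterator are hypothetical names, and the page source is an in-memory stub rather than a network call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical page: a batch of items plus a token for the next batch.
final class Page<T> {
    final List<T> items;
    final String nextToken; // null when this is the last page

    Page(List<T> items, String nextToken) {
        this.items = items;
        this.nextToken = nextToken;
    }
}

// Hypothetical source; token is null when requesting the first page.
interface PageSource<T> {
    Page<T> fetch(String token);
}

// Iterates item by item, fetching pages lazily, and stops after maxItems.
final class CappedPagingIterator<T> implements Iterator<T> {
    private final PageSource<T> source;
    private final int maxItems;
    private Iterator<T> current;
    private String nextToken;
    private int returned;

    CappedPagingIterator(PageSource<T> source, int maxItems) {
        this.source = source;
        this.maxItems = maxItems;
        Page<T> first = source.fetch(null);
        this.current = first.items.iterator();
        this.nextToken = first.nextToken;
    }

    @Override
    public boolean hasNext() {
        if (returned >= maxItems) return false;
        // Follow continuation tokens past empty pages until an item appears.
        while (!current.hasNext() && nextToken != null) {
            Page<T> page = source.fetch(nextToken);
            current = page.items.iterator();
            nextToken = page.nextToken;
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        returned++;
        return current.next();
    }
}

class PagingDemo {
    // In-memory stand-in for a paginated endpoint: three pages of two items.
    static final List<List<String>> PAGES = List.of(
            List.of("c1", "c2"), List.of("c3", "c4"), List.of("c5", "c6"));

    static List<String> collect(int maxItems) {
        // The token is simply the index of the next page, as a string.
        PageSource<String> source = token -> {
            int index = token == null ? 0 : Integer.parseInt(token);
            String next = index + 1 < PAGES.size() ? String.valueOf(index + 1) : null;
            return new Page<>(PAGES.get(index), next);
        };
        List<String> out = new ArrayList<>();
        new CappedPagingIterator<>(source, maxItems).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(5)); // [c1, c2, c3, c4, c5]
    }
}
```

The cap is checked before any further page is fetched, so hitting the limit never triggers an extra request.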

Get comments for a list of video IDs:

    String[] ids = {
            "D2bB1bz9Z9s", "LqihfRVj8hM", "_oaSgmoy9aA", "lIlSNpLkO-A", "XQ_cQ9I7_YA",
            "Dtk2xgBZTec", "pEr1TtCB7_Y", "NMg6DQSO5VE", "bhE2RaN4VcI", "pJJE7R8xteQ"
    };
    CustomExecutorService executor = CustomExecutorService.newInstance();
    Arrays.stream(ids)
            .map(videoId -> newDefaultFileAppender(videoId, CommentOrderCfg.NEWEST_FIRST))
            .forEach(executor::submit);
    executor.awaitAndTerminate();
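
CustomExecutorService and newDefaultFileAppender are library classes whose internals are not shown here. As a rough JDK-only analogue, assuming each appender is simply a Runnable that writes one video's output to its own file, the same submit-then-await flow can be written with a plain ExecutorService. PerVideoFileJobs and fileAppender are hypothetical, and each task writes a placeholder line instead of real comments.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PerVideoFileJobs {
    // Hypothetical per-video task: appends a placeholder line to <outDir>/<videoId>.txt.
    static Runnable fileAppender(Path outDir, String videoId) {
        return () -> {
            Path file = outDir.resolve(videoId + ".txt");
            List<String> lines = List.of("comments for " + videoId);
            try {
                Files.write(file, lines, StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // One task per video ID on a small fixed pool, then wait for all of them.
    static void runAll(Path outDir, String[] ids) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        for (String id : ids) {
            executor.submit(fileAppender(outDir, id));
        }
        executor.shutdown(); // stop accepting new tasks
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            executor.shutdownNow(); // interrupt tasks still running after the timeout
        }
    }

    public static void main(String[] args) throws Exception {
        Path outDir = Files.createTempDirectory("comments");
        runAll(outDir, new String[] {"D2bB1bz9Z9s", "LqihfRVj8hM", "_oaSgmoy9aA"});
        try (var files = Files.list(outDir)) {
            files.forEach(System.out::println);
        }
    }
}
```

Exceptions thrown by a Runnable passed to submit are captured in the returned Future and are never reported unless get() is called on it. If failures matter, keep the futures and check them after the pool terminates.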

Get all channel comments by channel ID:

    String channelId = "UCksTNgiRyQGwi2ODBie8HdA";
    ChannelVideosCollector collector = new ChannelVideosCollector(channelId);
    ChannelVideosDTO channelVideos = collector.call();
    CustomExecutorService executor = CustomExecutorService.configure()
            .numberOfThreads(10).timeout(Duration.ofMinutes(10)).toBuilder().build();
    channelVideos.getVideos().stream().map(
            v -> newDefaultFileAppender(v.getVideoId(), CommentOrderCfg.NEWEST_FIRST)
    ).forEach(executor::submit);
    executor.awaitAndTerminate();
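
The configure() chain sets a thread count and an overall timeout. Below is a minimal sketch of how such a builder can wrap a fixed thread pool, assuming awaitAndTerminate means "shut down, wait up to the timeout, then interrupt whatever is still running". ConfiguredExecutor is hypothetical. The .toBuilder() step from the original chain is omitted because its role is not documented here.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class; its builder only mirrors the shape of
// CustomExecutorService.configure().numberOfThreads(..).timeout(..).
final class ConfiguredExecutor {
    private final ExecutorService pool;
    private final Duration timeout;

    private ConfiguredExecutor(int threads, Duration timeout) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.timeout = timeout;
    }

    static Builder configure() {
        return new Builder();
    }

    static final class Builder {
        private int threads = Runtime.getRuntime().availableProcessors();
        private Duration timeout = Duration.ofMinutes(1);

        Builder numberOfThreads(int threads) {
            if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
            this.threads = threads;
            return this;
        }

        Builder timeout(Duration timeout) {
            this.timeout = timeout;
            return this;
        }

        ConfiguredExecutor build() {
            return new ConfiguredExecutor(threads, timeout);
        }
    }

    void submit(Runnable task) {
        pool.submit(task);
    }

    // Returns true if every task finished within the timeout; otherwise
    // interrupts the remaining tasks and returns false.
    boolean awaitAndTerminate() throws InterruptedException {
        pool.shutdown();
        if (pool.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
            return true;
        }
        pool.shutdownNow();
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        ConfiguredExecutor executor = ConfiguredExecutor.configure()
                .numberOfThreads(10).timeout(Duration.ofMinutes(10)).build();
        for (int i = 0; i < 25; i++) {
            executor.submit(done::incrementAndGet);
        }
        boolean finished = executor.awaitAndTerminate();
        System.out.println(finished + " " + done.get()); // true 25
    }
}
```

A single deadline for the whole batch, rather than one per task, keeps a channel-wide crawl bounded in wall-clock time even when some videos have very long comment threads.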

Store channel comments in a database:

    String channelId = "UCksTNgiRyQGwi2ODBie8HdA";
    HibernateChannelRunner.newBuilder(channelId)
            .withExecutor(20, Duration.ofHours(1))
            .processAllChannelComments().build().call();

Technology Stack

  • Runtime: Java 11
  • HTTP client: java.net.http.HttpClient, Brotli decoder
  • Data mapping: Jackson, ModelMapper
  • Data persistence: Hibernate 5, H2 database, PostgreSQL
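
java.net.http.HttpClient sends whatever Accept-Encoding header the caller sets, but it does not decompress response bodies itself. That is presumably why a separate Brotli decoder appears in the stack. The sketch below handles gzip with java.util.zip and deliberately rejects br, because Brotli decoding needs a third-party library whose API is not assumed here. ContentDecoding is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ContentDecoding {
    // Asks the server for gzip; the caller must decode the body afterwards.
    static HttpRequest gzipRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept-Encoding", "gzip")
                .GET()
                .build();
    }

    // Decodes a raw body according to its Content-Encoding value.
    // "br" (Brotli) would need a third-party decoder, so it is rejected here.
    static String decode(byte[] body, String contentEncoding) throws IOException {
        String enc = contentEncoding == null ? "identity" : contentEncoding.trim();
        if (enc.equalsIgnoreCase("identity")) {
            return new String(body, StandardCharsets.UTF_8);
        }
        if (enc.equalsIgnoreCase("gzip")) {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
    }

    // Convenience overload for a response read with BodyHandlers.ofByteArray().
    static String decode(HttpResponse<byte[]> response) throws IOException {
        String encoding = response.headers().firstValue("Content-Encoding").orElse(null);
        return decode(response.body(), encoding);
    }

    public static void main(String[] args) throws IOException {
        // Offline round trip: gzip a string, then decode it as a server response body.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write("{\"ok\":true}".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(decode(buffer.toByteArray(), "gzip")); // {"ok":true}
    }
}
```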
