sokomishalov / skraper

Kotlin/Java library and cli tool for scraping posts and media from various sources with neither authorization nor full page rendering (Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, Coub, Vimeo, IFunny, VK, Odnoklassniki, Pikabu)

License: Apache License 2.0

Kotlin 98.93% Shell 0.47% Batchfile 0.01% Procfile 0.08% Java 0.51%

scraper facebook twitter instagram reddit youtube 9gag pinterest ifunny pikabu

skraper's Introduction

Skraper

~~Here should be some fancy logo~~

Overview

A Kotlin/Java library and CLI tool for scraping and downloading posts, attachments, and other metadata from more than 10 sources without any authorization or full page rendering. It is based on jsoup, jackson and kotlin-coroutines.

Repository contains:

Cli tool
Kotlin library
Telegram bot

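Current list of implemented sources: Facebook, Instagram, Twitter, Youtube, Tiktok, Telegram, Twitch, Reddit, 9GAG, Pinterest, Flickr, Tumblr, Coub, Vimeo, IFunny, VK, Odnoklassniki, Pikabu.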

Bugs

Unfortunately, any of these websites can change without notice, which may cause the tool to stop working correctly. If that happens, please let me know via an issue.

Cli tool

The CLI tool allows you to:

download media from almost all supported sources with the --media-only flag
scrape post metadata

Requirements:

Java: 1.8+
Maven (optional)

Build tool

./mvnw clean package -DskipTests=true

Usage:

./skraper --help

usage: [-h] PROVIDER PATH [-n LIMIT] [-t TYPE] [-o OUTPUT] [-m]
       [--parallel-downloads PARALLEL_DOWNLOADS]

optional arguments:
  -h, --help                                show this help message and exit

  -n LIMIT, --limit LIMIT                   posts limit (50 by default)

  -t TYPE, --type TYPE                      output type, options: [log, csv, json, xml, yaml]

  -o OUTPUT, --output OUTPUT                output path

  -m, --media-only                          scrape media only

  --parallel-downloads PARALLEL_DOWNLOADS   amount of parallel downloads for media items if
                                            enabled flag --media-only (4 by default)


positional arguments:
  PROVIDER                                  skraper provider, options: facebook, instagram,
                                            twitter, youtube, tiktok, telegram, twitch, reddit,
                                            9gag, pinterest, flickr, tumblr, ifunny, vk, pikabu,
                                            vimeo, odnoklassniki, coub

  PATH                                      path to user/community/channel/topic/trend

Examples:

./skraper 9gag /hot 
./skraper reddit /r/memes -n 5 -t csv -o ./reddit/posts
./skraper instagram /explore/tags/memes -t json
./skraper flickr /photos/harrythehawk -t yaml
./skraper pinterest /levato/meme -t xml
./skraper youtube /user/JetBrainsTV/videos --media-only -n 2

Kotlin Library

Distribution

Maven:

<dependency>
    <groupId>ru.sokomishalov.skraper</groupId>
    <artifactId>skrapers</artifactId>
    <version>x.y.z</version>
</dependency>

Gradle kotlin dsl:

implementation("ru.sokomishalov.skraper:skrapers:x.y.z")

Usage

Instantiate specific scraper

Each source listed above has its own provider implementation.

Instantiating one is as simple as:

val skraper = InstagramSkraper(client = OkHttpSkraperClient())

Important: it is highly recommended not to use DefaultBlockingSkraperClient. There are more efficient, non-blocking and resource-friendly implementations of SkraperClient. To use one of them, you just have to put the required dependencies on the classpath.

Current http-client implementation list:

DefaultBlockingSkraperClient: simple java.net.* blocking api implementation
OkHttpSkraperClient: okhttp3 implementation
SpringReactiveSkraperClient: spring-webflux client implementation
KtorSkraperClient: ktor-client-jvm implementation

Available methods

Each scraper is a class that implements the Skraper interface:

interface Skraper {
    val client: SkraperClient
    fun getPosts(path: String): Flow<Post>
    suspend fun getPageInfo(path: String): PageInfo?
    fun supports(media: Media): Boolean
    suspend fun resolve(media: Media): Media
}

There are also some provider-specific Kotlin extensions for the implementations. You can find them in each provider's implementation package.

Usage from plain Java

There is an out-of-the-box Java interop utility class, ru.sokomishalov.skraper.util.JavaInterop:

class Example {
    public static void main(String[] args) {
        Skraper skraper = new InstagramSkraper();
        List<Post> posts = JavaInterop.limitedFlow(skraper.getPosts("/memes.video"), 10);
        PageInfo info = JavaInterop.callBlocking(cont -> skraper.getPageInfo("/memes.video", cont));
    }
}

Scrape user/community/channel/topic/trend posts

To scrape the latest posts for a specific user, channel or trend, use skraper like this:

suspend fun main() {
    val skraper = FacebookSkraper()
    val posts = skraper.getUserPosts(username = "memes").take(2).toList() // extension for getPosts()
    // or 
    val postsDetected = Skrapers.getPosts(url = "https://facebook.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(posts))
}

The returned data structure is similar for every provider. Output data example:

[
  {
    "id": "5029851093699104",
    "text": "gotta love em!",
    "publishedAt": 1580744400000,
    "statistics": {
      "likes": 79,
      "comments": 3
    },
    "media": [
      {
        "url": "https://facebook.com/memes/posts/5029851093699104?__xts__%5B0%5D=68.ARA2yRI2YnlXQRKX7Pdphh8ztgvnP11aYE_bZFPNmqLpJZLhwJaG24gDPUTiKDLv-J_E09u2vLjCXalpmEuGSmVR0BkVtcng_i6QV8x5e-aZUv0Mkn1wwKLlhp5NNH6zQWKlqDqRjZrwvcKeUi0unzzulRCHRvDIrbz2leM6PLescFySwMYbMmKFc7ctqaC_F7nJ09Ya0lz9Pqaq_Rh6UsNKom6fqdgHAuoHV894a3QRuyY0BC6fQuXZLOLbRIfEVK3cF9Z5UQiXUYruCySF-WpQEV0k72x6DIjT6B3iovYFnBGHaji9VAx2PByZ-MDs33D1Hz96Mk-O1Pj7zBwO6FvXGhkUJgepiwUOVd0q-pV83rS5EhjtPFDylNoNO2xkDUSIi483p49vumVPWtmab8LX1V6w2anf55kh6pedCXcH3D8rBjz8DaTBnv995u9kk5im-1-HdAGQHyKrCZpaA0QyC-I4oGsCoIJGck3RO8u_SoHcfe2tKjTgPe6j9p1D&__tn__=-R",
        "aspectRatio": 0.864,
        "duration": 10860.000000000
      }
    ]
  },
  {
    "id": "4990218157662398",
    "text": "Interesting",
    "publishedAt": 1580742000000,
    "statistics": {
      "likes": 3092,
      "comments": 514
    },
    "media": [
      {
        "url": "https://scontent.fhrk1-1.fna.fbcdn.net/v/t1.0-0/p526x296/52333452_10157743612509879_529328953723191296_n.png?_nc_cat=1&_nc_ohc=oNMb8_mCbD8AX-w9zeY&_nc_ht=scontent.fhrk1-1.fna&oh=ca8a719518ecfb1a24f871282b860124&oe=5E910D0C",
        "aspectRatio": 0.8960573476702509
      }
    ]
  }
]
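The `publishedAt` values in this sample look like Unix timestamps in milliseconds. That is an assumption based on their magnitude, since the model itself is not shown here. Under that assumption, a minimal sketch of turning them into readable instants with `java.time` could look like this (`publishedAtToInstant` is a hypothetical helper, not part of skraper):

```kotlin
import java.time.Instant

// Assumption: `publishedAt` in the sample output is a Unix timestamp in milliseconds.
// Hypothetical helper, not part of the skraper API.
fun publishedAtToInstant(epochMillis: Long): Instant = Instant.ofEpochMilli(epochMillis)

fun main() {
    // Values copied from the sample output.
    println(publishedAtToInstant(1580744400000L)) // 2020-02-03T15:40:00Z
    println(publishedAtToInstant(1580742000000L)) // 2020-02-03T15:00:00Z
}
```

If a provider reports seconds instead of milliseconds, `Instant.ofEpochSecond` would be the matching call.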

You can see the full model structure for posts and others here

Scrape user/community/channel/topic/trend info

You can also scrape info about a user, channel or trend:

suspend fun main() {
    val skraper = TwitterSkraper()
    val pageInfo = skraper.getUserInfo(username = "memes") // extension for `getPageInfo()`
    // or 
    val pageInfoDetected = Skrapers.getPageInfo(url = "https://twitter.com/memes") // aggregating singleton
    println(JsonMapper().writerWithDefaultPrettyPrinter().writeValueAsString(pageInfo))
}

Output:

{
  "nick": "memes",
  "name": "Memes.com",
  "description": "http://memes.com is your number one website for the funniest content on the web. You will find funny pictures, funny memes and much more.",
  "statistics": {
    "posts": 10848,
    "followers": 154718
  },
  "avatar": {
    "url": "https://pbs.twimg.com/profile_images/824808708332941313/mJ4xM6PH_normal.jpg"
  },
  "cover": {
    "url": "https://abs.twimg.com/images/themes/theme1/bg.png"
  }
}

Resolve provider relative url

Sometimes you need to know the direct media link:

suspend fun main() {
    val skraper = InstagramSkraper()
    val info = skraper.resolve(Video(url = "https://www.instagram.com/p/B-flad2F5o7/"))
    val serializer = JsonMapper().writerWithDefaultPrettyPrinter()
    println(serializer.writeValueAsString(info))
}

Output:

{
  "url": "https://scontent-amt2-1.cdninstagram.com/v/t50.2886-16/91508191_213297693225472_2759719910220905597_n.mp4?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=104&_nc_ohc=27bC52qar_oAX-7J2Zh&oe=5EC0BC52&oh=0aafee2860c540452b76e7b8e336147d",
  "aspectRatio": 0.8010012515644556,
  "thumbnail": {
    "url": "https://scontent-amt2-1.cdninstagram.com/v/t51.2885-15/e35/91435498_533808773845524_5302421141680378393_n.jpg?_nc_ht=scontent-amt2-1.cdninstagram.com&_nc_cat=100&_nc_ohc=8gPAcByc6YAAX_kDBWm&oh=5edf6b9d90d606f9c0e055b7dbcbfa45&oe=5EC0DDE8",
    "aspectRatio": 0.8010012515644556
  }
}

Download media

There is a "static" method that allows you to download any media from all known implemented sources:

suspend fun main() {
    val tmpDir = Files.createTempDirectory("skraper").toFile()

    val testVideo = Skrapers.download(
        media = Video("https://youtu.be/fjUO7xaUHJQ"),
        destDir = tmpDir,
        filename = "Gandalf"
    )

    val testImage = Skrapers.download(
        media = Image("https://www.pinterest.ru/pin/89509111320495523/"),
        destDir = tmpDir,
        filename = "Do_no_harm"
    )

    println(testVideo)
    println(testImage)
}

Output:

/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Gandalf.mp4
/var/folders/sf/hm2h5chx5fl4f70bj77xccsc0000gp/T/skraper8377953374796527777/Do_no_harm.jpg

Telegram bot

To use the bot follow the link.


skraper's Issues

twitter

Hi,
it seems the Twitter module cannot scrape the followers and following lists. Is that right?

FacebookSkraper grabs media, but no text from a shared post

I am not sure if this is a bug or it is intended to be like that but, for example, take this post:

skraper returns this:

{
  "id" : "4346072205438814",
  "text" : "What a wonderful moment this morning at Waller Elementary. We LOVE our military kids and are grateful for their service and the service of their parents!",
  "publishedAt" : 1618250078,
  "rating" : 161,
  "commentsCount" : 2,
  "viewsCount" : 0,
  "media" : [ {
    "url" : "https://external.ficn4-1.fna.fbcdn.net/safe_image.php?d=AQGoUDzXwoNh4UmX&w=476&h=249&url=https%3A%2F%2Fwww.mypanhandle.com%2Fwp-content%2Fuploads%2Fsites%2F88%2F2021%2F04%2FSOTVO.00_00_19_21.Still001.jpg%3Fw%3D1280&cfs=1&upscale=1&fallback=news_d_placeholder_publisher&ccb=3-4&_nc_hash=AQGw4aGBDe67H8FP",
    "aspectRatio" : 1.9112903225806452
  } ]
}

The grabbed media is from the re-shared post, but there is no text - in this case that would be the partial url. There is also no way to tell if a post has media that they uploaded vs media from something that they re-shared.

Downloading VK limited functionality.

Downloading VK communities isn't working properly. Sometimes no posts are detected, and sometimes at most four or five are downloaded.

The getUserPosts from InstagramSkraper

Got this error
Tried with username : thedoodlestories
Exception in thread "main" com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: (byte[])"<meta id="viewport" name="viewport" content="[truncated 264379 bytes]; line: 1, column: 2]

Number of posts always the same for facebook scrape

When I run this command:

.\skraper facebook MuziekcentrumKinkyStar -n 200 -t json -o ..\SocialMediaArchvingMeemoo\skraper\

it still only produces a JSON file with data from just 19 posts. How do I increase this?
This happens on both Windows and Linux.

Youtube front-page breaking changes

Implement Vimeo

TikTok Scraper not working returning null data

val client = OkHttpSkraperClient()
val sktr = TikTokSkraper(client)
Log.d("TAG", "getProfileFromTiktok: ${sktr.getPageInfo(path = "/@$userName")}")

I tried the code above, but it returned null data. I tried other types of data too and got the same issue.

Implement imgur skraper

https://imgur.com/

Can't resolve import ru.sokomishalov.skraper.model.AttachmentType.IMAGE

Hello, I'm trying to run the example, but I can't seem to find AttachmentType (neither IMAGE nor VIDEO).

Has this changed? Thanks for the help

Issue when scraping from VK Channels / Groups

Hi!

I'm having some issues with scraping from VK channels / groups.

Currently, I'm getting the following:

Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Unsupported field: Year (through reference chain: java.util.ArrayList[0])
        at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
        at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:373)
        at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:338)
        at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:123)
        at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:79)
        at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serialize(IndexedListSerializer.java:18)
        at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
        at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
        at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1572)
        at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.java:1273)
        at com.fasterxml.jackson.databind.ObjectWriter.writeValueAsString(ObjectWriter.java:1140)
        at ru.sokomishalov.skraper.cli.Serialization$CSV.serialize(Serialization.kt:99)
        at ru.sokomishalov.skraper.cli.Main.persistMeta(Main.kt:97)
        at ru.sokomishalov.skraper.cli.Main.access$persistMeta(Main.kt:1)
        at ru.sokomishalov.skraper.cli.Main$main$1.invoke(Main.kt:54)
        at ru.sokomishalov.skraper.cli.Main$main$1.invoke(Main.kt:39)
        at com.xenomachina.argparser.SystemExitExceptionKt.mainBody(SystemExitException.kt:74)
        at com.xenomachina.argparser.SystemExitExceptionKt.mainBody$default(SystemExitException.kt:72)
        at ru.sokomishalov.skraper.cli.Main.main(Main.kt:39)
Caused by: java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: Year
        at java.base/java.time.Instant.getLong(Instant.java:604)
        at java.base/java.time.format.DateTimePrintContext$1.getLong(DateTimePrintContext.java:205)
        at java.base/java.time.format.DateTimePrintContext.getValue(DateTimePrintContext.java:308)
        at java.base/java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2763)
        at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402)
        at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402)
        at java.base/java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402)
        at java.base/java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1849)
        at java.base/java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1823)
        at ru.sokomishalov.skraper.cli.Serialization$CSV$serialize$csvModule$1$1.serialize(Serialization.kt:72)
        at ru.sokomishalov.skraper.cli.Serialization$CSV$serialize$csvModule$1$1.serialize(Serialization.kt:66)
        at com.fasterxml.jackson.databind.ser.impl.IndexedListSerializer.serializeContents(IndexedListSerializer.java:119)
        ... 15 more

Scraping from users is fine, with no errors. But I'm unsure whether I am invoking the wrong command, as VK groups/channels do not have an ID string.

Thanks in advance.
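The root cause in the trace, `UnsupportedTemporalTypeException: Unsupported field: Year`, is what `java.time` throws when a formatter that needs calendar fields is applied to a bare `Instant` with no time zone. The sketch below reproduces that behaviour and shows the usual remedy of attaching a zone to the formatter. The formatter actually used in `Serialization.kt` is not shown in the issue, so `DateTimeFormatter.ISO_LOCAL_DATE_TIME` and both helper names are assumptions chosen for illustration.

```kotlin
import java.time.Instant
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

// Assumption: any formatter that needs a year field; the CLI's real formatter is not shown.
val formatter: DateTimeFormatter = DateTimeFormatter.ISO_LOCAL_DATE_TIME

// Throws UnsupportedTemporalTypeException: an Instant has no year without a zone.
fun formatWithoutZone(instant: Instant): String = formatter.format(instant)

// Works: the formatter's override zone is used to derive calendar fields from the Instant.
fun formatWithZone(instant: Instant): String =
    formatter.withZone(ZoneOffset.UTC).format(instant)

fun main() {
    val publishedAt = Instant.ofEpochMilli(1580744400000L)
    println(formatWithZone(publishedAt)) // 2020-02-03T15:40:00
    val failure = runCatching { formatWithoutZone(publishedAt) }.exceptionOrNull()
    println(failure?.javaClass?.simpleName) // UnsupportedTemporalTypeException
}
```

UTC is used here only to keep the output deterministic; `ZoneId.systemDefault()` is the alternative when local time is wanted.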

Facebook video can't be displayed in VideoView

Thumbnail is null and url refers to an FB page with the video rather than the video itself. Not sure this can be fixed though (or am I missing something?).

Facebook "shared" posts

Shared posts do not carry enough information. It needs to be researched whether this can be extended.

TwitterSkraper returns HTTP response code: 403

TwitterSkraper returns response code 403.

Twitter And Youtube

Thank you for creating a good library.

I need to check a few things that didn't work well.

First, it seems that both getUserInfo() and getPageInfo() do not work with TwitterSkraper. PageInfo always becomes null.

Also, while YoutubeSkraper was successful in retrieving PageInfo, all properties of PageStatistics were null.

Is this something that can be addressed?

Instagram Hashtag Search

Can we get data from Instagram hashtag search?

Cil

Facebook page doesn't return any posts

Hello,

been wondering if you could help me out with this. When I run your FB scraper on this page: https://www.facebook.com/arrowsostrava/ it doesn't return any posts. Any thought on where the problem could be or how to fix it?
Thanks.

Skraper

Is there a way to use a proxy?

Either via the CLI tool or by passing arguments to OkHttpSkraperClient?

Event image not available in 0.4.2

Hey, in previous versions post like these provided Image medium:

Currently both text and medium are null.

Question about integration

Hi,

thanks for such great work.
Reading about the possibilities of this plugin, I see that it can do a lot!

Can you tell me what would be the best way to connect it with my app written in PHP?

Regards!

0.8.0 Roadmap

Docker

How can I run this inside a Docker container?
What are the requirements?
I can spend some time creating a Docker image if anyone can help me with the requirements to run this.

How to use this library in Java, and how to use a proxy?

Thank you.

error

Exception in thread "main" java.lang.NoClassDefFoundError: io/ktor/client/HttpClientJvmKt
	at ru.sokomishalov.skraper.client.ktor.KtorSkraperClient.<clinit>(KtorSkraperClient.kt:86)
	at org.example.Main.main(Main.java:11)
Caused by: java.lang.ClassNotFoundException: io.ktor.client.HttpClientJvmKt
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
	... 2 more

Process finished with exit code 1

main.java

package org.example;

import com.google.gson.Gson;
import kotlinx.coroutines.flow.Flow;
import ru.sokomishalov.skraper.client.ktor.KtorSkraperClient;
import ru.sokomishalov.skraper.model.Post;
import ru.sokomishalov.skraper.provider.facebook.FacebookSkraper;

public class Main {
    public static void main(String[] args) {
        KtorSkraperClient httpClient = new KtorSkraperClient();
        FacebookSkraper skraper = new FacebookSkraper(httpClient);
        Flow<Post> posts = skraper.getPosts("memes");

        System.out.println(new Gson().toJson(posts));
    }
}

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>spider-java-test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>21</maven.compiler.source>
        <maven.compiler.target>21</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>ru.sokomishalov.skraper</groupId>
            <artifactId>skrapers</artifactId>
            <version>0.12.1</version>
        </dependency>

        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.10.1</version>
        </dependency>
    </dependencies>
</project>

InstagramSkraper

The getUserPosts from InstagramSkraper doesn't work anymore. I didn't have any problems last week, but since today I get this error:
Exception in thread "main" com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false') at [Source: (byte[])"<!DOCTYPE html><html class="_9dls" lang="en" dir="ltr"><head><link data-default-icon="https://static.cdninstagram.com/rsrc.php/v3/yI/r/VsNE-OHk_8a.png" rel="icon" sizes="192x192" href="https://static.cdninstagram.com/rsrc.php/v3/yI/r/VsNE-OHk_8a.png" /><meta name="robots" content="noarchive, noimageindex" /><meta charset="utf-8" /><meta name="apple-mobile-web-app-status-bar-style" content="default" /><meta name="mobile-web-app-capable" content="yes" /><meta id="viewport" name="viewport" content="[truncated 264379 bytes]; line: 1, column: 2]
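The message says the parser met `<` as the first character, and the quoted source starts with `<!DOCTYPE html>`, so the body handed to Jackson was an HTML page rather than JSON. Why Instagram served HTML is not stated in the issue. A minimal sketch of a guard that detects this case before parsing, using a hypothetical helper that is not part of skraper:

```kotlin
// Hypothetical helper, not part of skraper: guesses whether a response body is
// an HTML page instead of the JSON payload the parser expects.
fun looksLikeHtml(body: String): Boolean = body.trimStart().startsWith("<")

fun main() {
    // Shortened from the error message in the issue.
    println(looksLikeHtml("<!DOCTYPE html><html class=\"_9dls\" lang=\"en\">")) // true
    // A made-up JSON body for contrast.
    println(looksLikeHtml("{\"data\": []}")) // false
}
```

A check like this only turns the failure into a clearer error; it does not make the scrape succeed.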

Can't import via gradle

Hey, I have an issue importing the library into my Android app.

When using
implementation("com.github.sokomishalov.skraper:skrapers:0.4.0")

I get the following error:

Checked jitpack.io which says I should use implementation 'com.github.SokoMishaLov:skraper:0.4.0' but when I do that, it can't find any package.
Thanks for the help!

0.9.0 Roadmap

add tags field to the post
add author and scrape it (where possible)
provide interface replacement for uri/path

Scrape single post from instagram

How can I scrape a single post from Instagram and other supported social media sites? For example, from the following URL:
https://www.instagram.com/foodfusionpk/p/CglWx2Ttsyo/

can not download video tiktok

suspend fun main() {
    val tmpDir = Files.createTempDirectory("skraper").toFile()

    val testVideo = Skrapers.download(
        media = Video("https://www.tiktok.com/@binsaqr/video/7216643985916415233"),
        destDir = tmpDir,
        filename = "Gandalf"
    )

    println(testVideo)
}

consider including the facebook account id in the scraped posts

It is useful to get certain things, e.g.

https://graph.facebook.com/{facebook_account_id}/picture?type=large returns the profile picture (no auth) and https://www.facebook.com/{facebook_account_id}/videos/{post_id}/ returns a video
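The two URL templates above can be expressed as small string-building helpers. This is only a sketch: the function names are hypothetical, and whether these endpoints still respond without authorization is taken from the issue text rather than verified.

```kotlin
// Hypothetical helpers built from the URL templates quoted in this issue.
fun profilePictureUrl(accountId: String): String =
    "https://graph.facebook.com/$accountId/picture?type=large"

fun videoUrl(accountId: String, postId: String): String =
    "https://www.facebook.com/$accountId/videos/$postId/"

fun main() {
    // "123" and "456" are placeholder ids used only for illustration.
    println(profilePictureUrl("123")) // https://graph.facebook.com/123/picture?type=large
    println(videoUrl("123", "456"))   // https://www.facebook.com/123/videos/456/
}
```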

Twitter scraper

Is this still working for Twitter? It recently stopped third-party applications from using its API, and none of the Python libraries for scraping Twitter data work now. Do let me know your point of view on this!

9Gag can fetch only 10 posts

Hello,

Thanks for great tool.

I have an issue: the 9gag parser fetches only 10 posts each time. I checked different paths with -n 20 and --limit 20, and it makes no difference.

E.g.

./skraper ninegag /hot -t log -n 20
./skraper ninegag /coronavirus -t log -n 30
./skraper ninegag /coronavirus -t log --limit 50

All return:
Fetched 10 posts. Saved to: /home/developer/skraper-master/ninegag/coronavirus_04062020_123420.log

Serializing Post class to disk

Hey, just curious, is it possible to save fetched Post class to disk? Either via serialization, or as JSON or some such.
Thanks!


Custom scrapers

Hello, is it/would it be possible to write your own scrapers, or to extend existing scrapers?
Currently trying to inherit from FacebookSkraper but this doesn't seem possible right now.
Thank you!

FacebookSkraper returns no data

Hello, I've been trying to update skraper version as the older one I was using (without Flow to fetch posts) wasn't working anymore.

I added 0.11.0 version to my project, however, getPosts method currently doesn't emit any data.
As far as I can tell here's something to reproduce the issue.

path = "dracibrno"

The full path being fetched is https://m.facebook.com/dracibrno/posts. getPosts finishes during the first iteration at FacebookSkraper.kt:57, on the line nextPath = nextPage ?: break, because nextPage == null.

fetchResult is unfortunately too long to copy here.
document variable seems to be filled correctly.

Any help on how to fix the issue would be highly appreciated!
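The behaviour described in this issue follows from how a "next page" loop works in general: when the first page yields no link to a following page, iteration ends, whatever limit the caller asked for. The sketch below illustrates that pattern with a plain Kotlin `Sequence`. It is not the code from `FacebookSkraper.kt`; the `Page` type, `allPosts` and the in-memory page map are all hypothetical.

```kotlin
// Hypothetical page model: the posts found on one page plus the path of the next page, if any.
data class Page(val posts: List<String>, val nextPath: String?)

// Lazily walks pages until one has no next path, mirroring `nextPath = nextPage ?: break`.
fun allPosts(startPath: String, fetch: (String) -> Page): Sequence<String> = sequence {
    var path = startPath
    while (true) {
        val page = fetch(path)
        yieldAll(page.posts)
        path = page.nextPath ?: break
    }
}

fun main() {
    // In-memory stand-in for real HTTP requests.
    val pages = mapOf(
        "/p1" to Page(listOf("a", "b"), "/p2"),
        "/p2" to Page(listOf("c"), null)
    )
    // The limit is only an upper bound: asking for 10 yields the 3 posts that exist.
    println(allPosts("/p1") { pages.getValue(it) }.take(10).toList()) // [a, b, c]
}
```

If the next-page link cannot be extracted from the first response, a loop of this shape returns only the posts of that first page, or none at all when the page's posts cannot be parsed either.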

Twitter front-page banned

We should bypass it somehow

Pulling Twitter profile doesn't seem to work

I might be doing something stupid, but I just tried building skraper and pulling a Twitter profile:

skraper [master]$ ./skraper twitter ElonMusk -n 5
Skraper 0.9.1-SNAPSHOT started

.../sokomishalov/skraper/twitter/ElonMusk_10052022_105216.log
skraper [master]$ ./skraper twitter /ElonMusk -n 5
Skraper 0.9.1-SNAPSHOT started

.../sokomishalov/skraper/twitter/ElonMusk_10052022_105231.log
skraper [master]$

In both cases (ElonMusk vs. /ElonMusk), the log is empty and I get no output and no errors.

Am I missing something basic about how to use the tool?

Gradle problem.

Hello, I have a problem with Gradle.
