hitomarukonpaku / twspace-crawler


184.0 13.0 40.0 614 KB

Script to monitor & download Twitter Spaces 24/7

TypeScript 97.92% JavaScript 1.88% Dockerfile 0.20%

twitter twitter-spaces twitter-api ffmpeg nodejs discord typescript

twspace-crawler's Introduction

HitomaruKonpaku

twspace-crawler's Issues

One-time download on an ended Space not properly downloading

I noticed that for a Space that had already ended but still had a recording available, the following command did not download the recording: twspace-crawler --env ./.env --config ./config.json --id <space_id>. Specifically, it did nothing after successfully getting the chat access token.

I managed to download the recording without any problem after passing --force. Is there no check of the Space's state to see whether it has Ended? Without --force, it seems to go straight to monitoring the Space instead of retrieving the recording.
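A minimal sketch of the check being asked about, assuming the state strings "Running" and "Ended" that appear in the AudioSpace metadata quoted elsewhere on this page. `decideAction` and the `hasReplay` flag are hypothetical, not part of twspace-crawler.

```typescript
// Hypothetical helper, not part of twspace-crawler: decide what to do with a Space.
// Assumes the state strings "Running" / "Ended" seen in AudioSpace metadata.
type Action = "download" | "watch" | "skip";

function decideAction(state: string, force: boolean, hasReplay: boolean): Action {
  // --force always downloads immediately, whatever the state.
  if (force) return "download";
  if (state === "Ended") {
    // An ended Space can be fetched right away if a replay exists;
    // without a replay there is nothing left to wait for.
    return hasReplay ? "download" : "skip";
  }
  // Live or not-yet-started Spaces keep being monitored.
  return "watch";
}
```

With this logic, `decideAction("Ended", false, true)` returns "download", so an ended Space with a recording would not need --force.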

Request failed with status code 403

Hi, I get this error:

[ERROR] [UserManager] getUsersByUsernames: Request failed with status code 403 | {"requestId":"667a0f54-c312-4e81-91b5-67792c142147","response":{"data":{"client_id":"27283493","detail":"When authenticating requests to the Twitter API v2 endpoints, you must use keys and tokens from a Twitter developer App that is attached to a Project. You can create a project via the developer portal.","registration_url":"https://developer.twitter.com/en/docs/projects/overview","title":"Client Forbidden","required_enrollment":"Appropriate Level of API Access","reason":"client-not-enrolled","type":"https://api.twitter.com/2/problems/client-forbidden"}}}

The error started today. It worked before and I tried to use a new project.
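The response body already names the cause: `reason: "client-not-enrolled"`, meaning the keys belong to an App that is not attached to a Project with the required access level. A small sketch, assuming the problem-body shape shown in the log above, that turns this into a readable hint; `explainForbidden` is a hypothetical name.

```typescript
// Shape of the problem body seen in the 403 log (only the fields used here).
interface ProblemBody {
  title?: string;
  reason?: string;
  detail?: string;
}

// Hypothetical helper: map a Twitter API v2 problem body to a human-readable hint.
function explainForbidden(body: ProblemBody | undefined): string {
  if (body?.reason === "client-not-enrolled") {
    // Key/token pair comes from an App that is not attached to a Project
    // with the required level of API access.
    return "App is not attached to a Project with the required API access level";
  }
  // Otherwise fall back to whatever the server said.
  return body?.detail ?? body?.title ?? "Forbidden (no details in response)";
}
```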

Selectively download only [Audio] Files

Hello there,
Is it possible to make the script download only the audio files?
I don't need the text chat or the .jsonl file.

How to get data ?

I'm currently trying out your tool.
I installed the application and simply ran node ./dist/index.js --user myTwitterAccount,
then opened a Twitter Space and closed it. But nothing was downloaded and nothing happened. Any ideas?

no longer supported

Can't install it.

Space Downloading

I am quite upset about what Qelon has done to Twitter. I used to be able to download Spaces, but I no longer can. I wanted this to work, so I followed the setup instructions. Is it just me, or are there real issues? When I start a Space download, I get this: getSpaceMetadata: Invalid value "undefined" for header "x-csrf-token" | {"requestId" Any help would be appreciated. Thank you kindly.
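Node's HTTP client refuses to send a header whose value is `undefined`, which is what the message reports for `x-csrf-token`. On twitter.com's web client that header normally mirrors the `ct0` cookie, so a missing value most likely means the cookie in the credentials lacks `ct0`. A sketch, assuming the user supplies a browser cookie string (`buildAuthHeaders` is hypothetical), that fails early with a clearer message:

```typescript
// Parse "a=1; b=2" into a name -> value map.
function parseCookies(cookie: string): Map<string, string> {
  const jar = new Map<string, string>();
  for (const part of cookie.split(";")) {
    const eq = part.indexOf("=");
    if (eq > 0) jar.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return jar;
}

// Hypothetical helper. Assumption: x-csrf-token must equal the ct0 cookie.
function buildAuthHeaders(cookie: string): Record<string, string> {
  const ct0 = parseCookies(cookie).get("ct0");
  if (!ct0) {
    // Fail early instead of letting Node reject an undefined header value.
    throw new Error("Cookie has no ct0 value; cannot set x-csrf-token");
  }
  return { cookie, "x-csrf-token": ct0 };
}
```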

Feature Request: Record Space By Attended Not Necessarily Hosted

I know I can get notifications when someone is in a Space, so is there any way to record Spaces by attendance rather than hosting? For example, record every Space I show up in as an attendee?

Suggestion

Suggestion.

Add Node 14 install commands to the documentation.

Thousands of us have never used Node.js before, and it is easy to get the installation wrong.

Install Node 14:

sudo apt install curl
curl -sL https://deb.nodesource.com/setup_14.x | sudo bash -
sudo apt install -y nodejs

Feature request: Use UserID instead of username

The username can change, while the user ID always stays the same.
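A sketch of the data structure this request implies: watched users keyed by the numeric user ID (the `rest_id` field in the metadata elsewhere on this page), with the screen name kept only as the latest known label. `UserRegistry` is hypothetical.

```typescript
// Hypothetical registry: the stable user ID is the key,
// the screen name is just the most recently seen label.
class UserRegistry {
  private names = new Map<string, string>();

  // Record the screen name currently seen for a user ID.
  // Returns the previous name if the user was renamed, otherwise undefined.
  update(userId: string, screenName: string): string | undefined {
    const previous = this.names.get(userId);
    this.names.set(userId, screenName);
    return previous !== undefined && previous !== screenName ? previous : undefined;
  }

  nameOf(userId: string): string | undefined {
    return this.names.get(userId);
  }
}
```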

why

2023-10-27T12:43:34.602Z | [INFO] [UserListWatcher] Watching...
2023-10-27T12:43:35.668Z | [INFO] [SpaceWatcher@1ZkJzjPZPzqJv] Watching...
2023-10-27T12:43:35.670Z | [INFO] [SpaceWatcher@1ZkJzjPZPzqJv] Space url: https://twitter.com/i/spaces/1ZkJzjPZPzqJv
2023-10-27T12:43:36.072Z | [ERROR] [SpaceWatcher@1ZkJzjPZPzqJv] getAudioSpaceByRestId: Request failed with status code 404

401 error

I'm using --env .env with proper credentials, but I'm getting a 401 error:

getUserByScreenName: Request failed with status code 401

Undefined Space Title

In the Discord webhook, the title of the space is being displayed as undefined. This happens when a space does not have a title and therefore no title key within the space's data.

See

twspace-crawler/src/utils/SpaceUtil.ts

Line 18 in f1428fc

return audioSpace?.metadata?.title

Twitter's front-end sets the space's title in such cases to the display name of the space owner. We could do the same by falling back on

twspace-crawler/src/utils/SpaceUtil.ts

Line 26 in f1428fc

return audioSpace?.metadata?.creator_results?.result?.legacy?.name
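A minimal sketch of that fallback, assuming the two property paths quoted above from SpaceUtil.ts. The `AudioSpace` interface is trimmed to just those fields and is not the project's actual type.

```typescript
// Trimmed-down shape covering only the two paths quoted from SpaceUtil.ts.
interface AudioSpace {
  metadata?: {
    title?: string;
    creator_results?: { result?: { legacy?: { name?: string } } };
  };
}

// Prefer the Space title; when it is missing or empty, fall back to the
// host's display name, as Twitter's front-end does.
function getTitle(audioSpace: AudioSpace): string | undefined {
  const title = audioSpace?.metadata?.title;
  if (title) return title;
  return audioSpace?.metadata?.creator_results?.result?.legacy?.name;
}
```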

Error while fetching twitter spaces

When I fetch audio Spaces from https://api.twitter.com/graphql/xjTKygiBMpX44KU8ywLohQ/AudioSpaceById

The api returns:

{"errors":[{"message":"The following features cannot be null: responsive_web_uc_gql_enabled, responsive_web_enhance_cards_enabled, responsive_web_edit_tweet_api_enabled, spaces_2022_h2_spaces_communities, tweetypie_unmention_optimization_enabled, verified_phone_label_enabled, longform_notetweets_consumption_enabled, vibe_api_enabled, spaces_2022_h2_clipping, interactive_text_enabled, responsive_web_graphql_timeline_navigation_enabled, tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled, view_counts_everywhere_api_enabled, responsive_web_text_conversations_enabled, graphql_is_translatable_rweb_tweet_is_translatable_enabled, standardized_nudges_misinfo, responsive_web_twitter_blue_verified_badge_is_enabled, view_counts_public_visibility_enabled","extensions":{"name":"BadRequestError","source":"Client","code":336,"kind":"Validation","tracing":{"trace_id":"5aa0e1ad62b688ff"}},"code":336,"kind":"Validation","name":"BadRequestError","source":"Client","tracing":{"trace_id":"5aa0e1ad62b688ff"}}]}

But the parameters that are mentioned are not null.
I'm calling:

const { data } = await axios.get(url, {
        headers,
        params: {
          variables: {
            id: spaceId,
            isMetatagsQuery: true,
            withSuperFollowsUserFields: true,
            withDownvotePerspective: false,
            withReactionsMetadata: false,
            withReactionsPerspective: false,
            withSuperFollowsTweetFields: true,
            withReplays: true,
          },
          features: {
            spaces_2022_h2_clipping: true,
            spaces_2022_h2_spaces_communities: true,
            responsive_web_twitter_blue_verified_badge_is_enabled: true,
            verified_phone_label_enabled: false,
            view_counts_public_visibility_enabled: true,
            longform_notetweets_consumption_enabled: false,
            tweetypie_unmention_optimization_enabled: true,
            responsive_web_uc_gql_enabled: true,
            vibe_api_enabled: true,
            responsive_web_edit_tweet_api_enabled: true,
            graphql_is_translatable_rweb_tweet_is_translatable_enabled: true,
            view_counts_everywhere_api_enabled: true,
            standardized_nudges_misinfo: true,
            tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled: false,
            responsive_web_graphql_timeline_navigation_enabled: true,
            interactive_text_enabled: true,
            responsive_web_text_conversations_enabled: false,
            responsive_web_enhance_cards_enabled: false,
          },
        },
      })
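The error lists every feature flag as null even though all of them are set, which suggests the server never received `features` as a single parameter. One possible cause (an assumption, not confirmed in this thread) is query-string serialization: some HTTP clients, including newer axios releases, may encode a nested object in `params` with bracket notation (`features[vibe_api_enabled]=true`) instead of as JSON. A sketch that builds the URL explicitly with JSON-encoded `variables` and `features`, using only the standard `URL` API:

```typescript
// Build a GraphQL GET URL where `variables` and `features` are each sent
// as one JSON-encoded query parameter.
function buildGraphqlUrl(
  endpoint: string,
  variables: Record<string, unknown>,
  features: Record<string, boolean>,
): string {
  const url = new URL(endpoint);
  url.searchParams.set("variables", JSON.stringify(variables));
  url.searchParams.set("features", JSON.stringify(features));
  return url.toString();
}
```

The resulting string can be passed to `axios.get(url, { headers })` without a `params` object. Passing `JSON.stringify(...)` values inside `params` should have the same effect.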

Transcription cc.txt

Hello,
you have probably noticed that the subtitles file usually stays empty (0 KB). To get the captions, you have to open the Space and enable them via the "..." menu ("activate subtitles"); that unlocks the problem for the current Space. Is it possible to modify the code so that it asks the API to enable captions without having to do it in the Space? Thanks

Is there a way to download just the audio?

Is there a way to download just the audio? Monitoring and master URL extraction work fine, but it stops at the chat access token.

Not overwriting existing file when monitoring and using `--force`

I am still trying to figure out how to use this tool properly, as I was under the impression that it downloads the Space in real time. When I first ran twspace-crawler --env ./.env --config ./config.json --id <space_id>, it found the Space but did not start downloading it. After reading the README in full, I realized I had misunderstood the functionality.

The README says "Monitor user(s) indefinitely, wait for live Space and download when Space ended". I passed --force to make it download the Space, which it did. However, it kept fetching the Space and attempting to re-download it, since the process kept monitoring the same user. Furthermore, the downloaded file did not seem to be rewritten after all those calls.

I believe the reason is the ffmpeg command: by default ffmpeg does not overwrite an existing file and prompts before doing so unless the -y flag is passed, which twspace-crawler does not use.

It would be nice to have either of these options:

--overwrite
- Allow ffmpeg to overwrite the file, although I would guess this could have side effects, e.g. terminating mid-rewrite
- If a file with the same name exists, append "temp" to the new file's name, download it, compare the sizes, and keep the larger file
--download-segments
- Download the Space's stream in segments and concatenate them into one file when the Space has ended, then delete the segments once concatenated (optional?)
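A sketch of the "download to a temp file, keep the larger one" idea from the first option, using Node's `fs` module. `shouldReplace` and `keepLarger` are hypothetical helpers, and file size is only a rough proxy for completeness.

```typescript
import { existsSync, renameSync, statSync, unlinkSync } from "node:fs";

// Pure decision: replace when there is no existing file or the new one is larger.
function shouldReplace(existingSize: number | undefined, newSize: number): boolean {
  return existingSize === undefined || newSize > existingSize;
}

// Hypothetical helper: after a fresh download to `tempPath`, keep the larger of the
// two files at `targetPath` and delete the other. Returns true if the new file won.
function keepLarger(targetPath: string, tempPath: string): boolean {
  const existingSize = existsSync(targetPath) ? statSync(targetPath).size : undefined;
  if (shouldReplace(existingSize, statSync(tempPath).size)) {
    renameSync(tempPath, targetPath); // overwrites targetPath if it exists
    return true;
  }
  unlinkSync(tempPath); // the existing file is at least as large; drop the new one
  return false;
}
```

Keeping the size comparison in a pure function makes the policy easy to change (for example, to compare durations instead) without touching the file handling.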

[bug] Only captions download if space has a comma in name.

Due to bad ffmpeg metadata formatting. I have a patch, this is a tracking issue.

List ffmpeg requirement in readme.md

or use @ffmpeg-installer/ffmpeg
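Whichever route is chosen, the crawler could check for ffmpeg at startup and print a clear error. A sketch using Node's `child_process.spawnSync`, assuming ffmpeg is expected on `PATH` under the name `ffmpeg`; `hasFfmpeg` is hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Returns true if `<binary> -version` can be spawned and exits with status 0.
function hasFfmpeg(binary: string = "ffmpeg"): boolean {
  const result = spawnSync(binary, ["-version"], { stdio: "ignore" });
  // `error` is set (e.g. ENOENT) when the binary cannot be spawned at all.
  return result.error === undefined && result.status === 0;
}
```

At startup, `if (!hasFfmpeg()) console.error("ffmpeg not found on PATH")` would surface the missing dependency before any download is attempted.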

Request failed with status code 503

node -v
v16.11.1

2021-12-17T08:19:13.781Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading captions to 211217081913 (1YqKDqaobPAGV) CC.jsonl
2021-12-17T08:19:13.815Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 1
2021-12-17T08:19:15.466Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 2
2021-12-17T08:19:16.061Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 3
2021-12-17T08:19:16.668Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 4
2021-12-17T08:19:17.314Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 5
2021-12-17T08:19:18.092Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 6
2021-12-17T08:19:18.464Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 7
2021-12-17T08:19:18.978Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 8
2021-12-17T08:19:19.531Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 9
2021-12-17T08:19:20.125Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 10
2021-12-17T08:19:20.800Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 11
2021-12-17T08:19:21.437Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 12
2021-12-17T08:19:22.047Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 13
2021-12-17T08:19:22.748Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 14
2021-12-17T08:19:23.333Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 15
2021-12-17T08:19:23.971Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 16
2021-12-17T08:19:24.373Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 17
2021-12-17T08:19:25.020Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 18
2021-12-17T08:19:25.616Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 19
2021-12-17T08:19:26.347Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 20
2021-12-17T08:19:26.917Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 21
2021-12-17T08:19:27.366Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 22
2021-12-17T08:19:28.792Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 23
2021-12-17T08:19:29.727Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 24
2021-12-17T08:19:30.539Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 25
2021-12-17T08:19:31.157Z | [INFO] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Downloading chunk 26
2021-12-17T08:19:32.357Z | [ERROR] [SpaceCaptionsDownloader@1YqKDqaobPAGV] Request failed with status code 503

Is this normal? It always happens for live Spaces, and sometimes for ended ones.
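A 503 means the server is temporarily unavailable, so retrying the failed chunk after a delay is a reasonable first step. A generic sketch of retry with exponential backoff; `withRetry`, the default attempt count and the delays are assumptions, not values from twspace-crawler.

```typescript
// Retry an async operation with exponential backoff: waits baseMs, 2*baseMs,
// 4*baseMs, ... between attempts. `shouldRetry` decides which errors are retryable.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 5,
  baseMs: number = 1000,
  shouldRetry: (err: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or after the last attempt.
      if (!shouldRetry(err) || attempt === maxAttempts - 1) break;
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

In practice `shouldRetry` would check the HTTP status so that, for example, a 404 fails immediately while a 503 is retried.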

(HELP) TypeError: Cannot read property '0' of null

twspace-crawler --url https://twitter.com/i/spaces/1ZkJzbWyoqZJv
2022-02-13T20:15:48.881Z | [INFO] ================================================================================
2022-02-13T20:15:48.883Z | [INFO] Version: 1.11.1
2022-02-13T20:15:48.884Z | [INFO] Starting in url mode | {"url":"https://twitter.com/i/spaces/1ZkJzbWyoqZJv"}
2022-02-13T20:15:48.884Z | [VERBOSE] [SpaceDownloader] Playlist path: "/home/USR/twdl/download/2202132015.m3u8"
2022-02-13T20:15:48.885Z | [VERBOSE] [SpaceDownloader] Audio path: "/home/USR/twdl/download/2202132015.m4a"
(node:30337) UnhandledPromiseRejectionWarning: TypeError: Cannot read property '0' of null
at Function.getFinalPlaylistName (/usr/lib/node_modules/twspace-crawler/dist/utils/PeriscopeUtil.js:9:42)
at Function.getFinalPlaylistUrl (/usr/lib/node_modules/twspace-crawler/dist/apis/PeriscopeApi.js:21:71)
at processTicksAndRejections (internal/process/task_queues.js:95:5)
at async SpaceDownloader.download (/usr/lib/node_modules/twspace-crawler/dist/modules/SpaceDownloader.js:33:32)
(Use node --trace-warnings ... to show where the warning was created)
(node:30337) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:30337) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

I don't know how to fix this problem.
If someone has run into it, or if I installed something wrong, please tell me how to fix it.
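The stack trace suggests `getFinalPlaylistName` reads `[0]` from the result of a regex match that returned `null`, i.e. the fetched playlist did not contain the expected pattern. A sketch of a null-safe version. The regex and function body are assumptions for illustration; the actual pattern in PeriscopeUtil.js is not shown here.

```typescript
// Hypothetical null-safe variant: return the first *.m3u8 file name found in a
// master playlist, or throw a descriptive error instead of
// "Cannot read property '0' of null".
function getFinalPlaylistName(masterPlaylist: string): string {
  const match = masterPlaylist.match(/[^\s/]+\.m3u8/);
  if (match === null) {
    throw new Error("No .m3u8 entry found in master playlist (the Space may be unavailable)");
  }
  return match[0];
}
```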

Captions not downloading - likely due to rate limiting

For a while I had to try 4 or 5 times before the captions would actually download; now they never do and always fail with 503. When I looked deeper, I think I remember it had something to do with rate limiting, so the code may need to be updated to honor the rate limits.
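Honoring rate limits usually means reading the server's hint before retrying. The standard HTTP `Retry-After` header holds either a number of seconds or an HTTP date; whether the captions endpoint actually sends it is an assumption. A sketch that converts either form into a delay in milliseconds (`retryAfterMs` is hypothetical):

```typescript
// Convert a Retry-After header value (delta-seconds or HTTP-date) into a delay in ms.
// Returns undefined when the header is absent or cannot be parsed.
function retryAfterMs(header: string | undefined, now: number = Date.now()): number | undefined {
  if (header === undefined) return undefined;
  const trimmed = header.trim();
  // Delta-seconds form, e.g. "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date form, e.g. "Fri, 17 Mar 2023 09:53:11 GMT".
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return undefined;
  return Math.max(0, date - now);
}
```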

Problem with monitoring ID

Hi,

When I run twspace-crawler --user Name, I have a problem:

root@raspberrypi:/home/pi# twspace-crawler --user test_test
2022-06-04T00:21:05.142Z | [INFO] ================================================================================
2022-06-04T00:21:05.149Z | [INFO] Version: 1.11.8
2022-06-04T00:21:05.654Z | [INFO] Starting in user mode | {"userCount":1,"users":["test_test"]}
2022-06-04T00:21:05.728Z | [INFO] [UserWatcher@test_test] Watching...

and after that, nothing...

I am using a real name (here I replaced it with test for the question) :)

Why do I never see when people start a Space?

Thank you

Cannot read properties of undefined (reading 'config')

Just installed it but couldn't get it to download my own Twitter Spaces. What am I doing wrong?

$ twspace-crawler --id 1lPKqbYrAonGb
2024-03-13T01:16:55.448Z | [INFO] ================================================================================
2024-03-13T01:16:55.448Z | [INFO] Version: 1.12.9
2024-03-13T01:16:56.036Z | [INFO] Starting in space id mode | {"id":"1lPKqbYrAonGb"}
2024-03-13T01:16:56.036Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Watching...
2024-03-13T01:16:56.036Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Space url: https://twitter.com/i/spaces/1lPKqbYrAonGb
2024-03-13T01:16:56.039Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] getAudioSpaceById: Cannot read properties of undefined (reading 'config')
2024-03-13T01:16:57.572Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] getAudioSpaceByRestId: Request failed with status code 404
2024-03-13T01:16:57.573Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] AudioSpace metadata not found
2024-03-13T01:16:57.579Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Retry watch in 10000ms
2024-03-13T01:17:07.580Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Watching...
2024-03-13T01:17:07.581Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Space url: https://twitter.com/i/spaces/1lPKqbYrAonGb
2024-03-13T01:17:07.585Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] getAudioSpaceById: Cannot read properties of undefined (reading 'config')
2024-03-13T01:17:08.513Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] getAudioSpaceByRestId: Request failed with status code 404
2024-03-13T01:17:08.514Z | [ERROR] [SpaceWatcher@1lPKqbYrAonGb] AudioSpace metadata not found
2024-03-13T01:17:08.515Z | [INFO] [SpaceWatcher@1lPKqbYrAonGb] Retry watch in 10000ms

Unable to monitor users with an underscore in their Twitter handle

Hi,

I noticed that users with underscores in their handles are not scanned by the crawler.

I would be happy to help (but I'm not sure how to set up a dev environment or how to capture data for unit testing).

Thibault
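Twitter handles are 1 to 15 characters drawn from ASCII letters, digits and underscores, so any validation or extraction pattern needs to allow `_`. A sketch of such a check, with hypothetical names; where the crawler actually filters usernames is not shown in this report.

```typescript
// Twitter handles: 1-15 characters, ASCII letters, digits or underscore.
const HANDLE_PATTERN = /^[A-Za-z0-9_]{1,15}$/;

function isValidHandle(handle: string): boolean {
  return HANDLE_PATTERN.test(handle.replace(/^@/, ""));
}

// Split a comma-separated --user value, stripping "@" and keeping handles with underscores.
function parseUserList(arg: string): string[] {
  return arg
    .split(",")
    .map((s) => s.trim().replace(/^@/, ""))
    .filter(isValidHandle);
}
```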

Status 200 with message Forbidden

    data: { code: 200, message: 'Forbidden.' }

When I'm hitting the https://api.twitter.com/1.1/guest/activate.json with the bearer token AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs=1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA

It returns forbidden. Can I get some help? I might be missing something.
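One detail worth checking is the HTTP method: `guest/activate.json` is normally called with POST and returns a JSON body containing `guest_token`. A sketch that only describes the request (it does not send it), using `fetch`-style options; `buildGuestActivateRequest` is hypothetical, and this is not a confirmed fix for the Forbidden response.

```typescript
// Hypothetical helper: describe the guest-token request without sending it.
interface GuestActivateRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

function buildGuestActivateRequest(bearerToken: string): GuestActivateRequest {
  return {
    url: "https://api.twitter.com/1.1/guest/activate.json",
    init: {
      method: "POST", // sent as POST, not GET
      headers: { authorization: `Bearer ${bearerToken}` },
    },
  };
}
```

With Node 18+, `const { url, init } = buildGuestActivateRequest(token); const res = await fetch(url, init);` sends it, and `(await res.json()).guest_token` reads the token.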

^ SyntaxError: Unexpected token . at Module._compile (internal/modules/cjs/loader.js:723:23)

Hello there 🌵

Firstly, thanks for all your hard work 🧑‍💼

I suspect a typo, or something that was commented out, is causing the error.

I'm not a pro at programming yet, so I may be wrong though.

I followed the install instructions and installed Node with sudo apt install node (the Ubuntu console suggested the command).

Ubuntu 20.04
Python 3.9
node --version: v10.19.0
ffmpeg version 4.4.1-0ubuntu1~20.04.sav0 Copyright (c) 2000-2021 the FFmpeg developers built with gcc 9 (Ubuntu 9.3.0-17ubuntu1~20.04)
New anaconda environment
Git clone repo a few hours ago.

The error:
Person_1 etc. = left out for the purpose of OPSEC

(twspace-crawler) judo@judo-VM:~/twspace-crawler$ node ./dist/index.js --user Person_1,Person_2,Person_3,Person_4
/home/judo/twspace-crawler/dist/utils/PeriscopeUtil.js:18
        return data.match(chunkIndexPattern)?.map((v) => Number(v)) || [];
                                             ^

SyntaxError: Unexpected token .
    at Module._compile (internal/modules/cjs/loader.js:723:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Module.require (internal/modules/cjs/loader.js:692:17)
    at require (internal/modules/cjs/helpers.js:25:18)
    at Object.<anonymous> (/home/judo/twspace-crawler/dist/apis/PeriscopeApi.js:8:25)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
(twspace-crawler) judo@judo-VM:~/twspace-crawler$ node ./dist/index.js --user RichardHeartWin,FundingGym,Hexologist31,HexOrca
/home/judo/twspace-crawler/dist/utils/PeriscopeUtil.js:18
        return data.match(chunkIndexPattern)?map((v) => Number(v)) || [];
                                                                        ^
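The failing token is optional chaining (`?.`), which Node.js supports only from version 14; the report shows Node v10.19.0, so the compiled code cannot even be parsed. A sketch of a startup guard, assuming 14 as the minimum (matching the Node 14 install commands suggested elsewhere on this page). To be useful, such a check has to live in a small entry file that is loaded before any module using newer syntax.

```typescript
// Extract the major version from a string like "10.19.0".
function nodeMajorVersion(version: string): number {
  return Number(version.split(".")[0]);
}

// Returns an error message if the version is too old, otherwise null.
function checkNodeVersion(version: string, minMajor: number = 14): string | null {
  if (nodeMajorVersion(version) < minMajor) {
    return "Node.js " + version + " is too old; version " + minMajor + " or newer is required";
  }
  return null;
}

// Exit early with a clear message instead of a SyntaxError deep in a dependency.
const problem = checkNodeVersion(process.versions.node);
if (problem !== null) {
  console.error(problem);
  process.exit(1);
}
```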

Captions 0Kb

Hey,

For the past few days, all the CC caption files have been empty. Do you know what the problem is?
Thank you

User undefined after Space ended

The user shows as undefined because the host switched their account to protected in the middle of the stream.

Space metadata right after started

{"rest_id":"1vAxRkbpqXkKl","state":"Running","media_key":"28_1507007548024516614","created_at":1648133572933,"started_at":1648133575936,"updated_at":1648133576596,"is_employee_only":false,"is_locked":false,"conversation_controls":0,"total_participated":0,"total_replay_watched":0,"creator_results":{"result":{"__typename":"User","id":"VXNlcjoxMzQ5NzA1MTI2Mzg3MjI0NTc5","rest_id":"1349705126387224579","affiliates_highlighted_label":{},"has_nft_avatar":false,"legacy":{"created_at":"Thu Jan 14 13:09:26 +0000 2021","default_profile":true,"default_profile_image":false,"description":"ひなゆです。illust。基本創作たまに版権。ﾊﾟｰｶｰとｻﾒとｼｬﾁとﾈｺ多いです。お仕事のご連絡は[email protected] かDMまで！サブ垢→@tsukino_63 skeb→ https://t.co/9QFh1EwGPq","entities":{"description":{"urls":[{"display_url":"skeb.jp/@tsukino_hinayu","expanded_url":"https://skeb.jp/@tsukino_hinayu","url":"https://t.co/9QFh1EwGPq","indices":[100,123]}]},"url":{"urls":[{"display_url":"pixiv.net/users/76920612","expanded_url":"https://www.pixiv.net/users/76920612","url":"https://t.co/dTUqj7Ozjg","indices":[0,23]}]}},"fast_followers_count":0,"favourites_count":5997,"followers_count":5192,"friends_count":221,"has_custom_timelines":true,"is_translator":false,"listed_count":77,"location":"水族館と猫カフェ","media_count":35,"name":"雛夕 
月ﾉ。@固ツイ把握お願いします💦","normal_followers_count":5192,"pinned_tweet_ids_str":["1494054740388290563"],"profile_banner_extensions":{"mediaColor":{"r":{"ok":{"palette":[{"percentage":67.93,"rgb":{"blue":232,"green":232,"red":232}},{"percentage":23.75,"rgb":{"blue":128,"green":123,"red":117}},{"percentage":2.27,"rgb":{"blue":58,"green":48,"red":44}},{"percentage":1.98,"rgb":{"blue":178,"green":162,"red":117}},{"percentage":0.62,"rgb":{"blue":36,"green":37,"red":92}}]}}}},"profile_banner_url":"https://pbs.twimg.com/profile_banners/1349705126387224579/1641301487","profile_image_extensions":{"mediaColor":{"r":{"ok":{"palette":[{"percentage":51.92,"rgb":{"blue":218,"green":219,"red":220}},{"percentage":27.24,"rgb":{"blue":65,"green":63,"red":62}},{"percentage":9.61,"rgb":{"blue":131,"green":132,"red":119}},{"percentage":4.97,"rgb":{"blue":110,"green":114,"red":125}},{"percentage":0.42,"rgb":{"blue":135,"green":135,"red":173}}]}}}},"profile_image_url_https":"https://pbs.twimg.com/profile_images/1484483261548023810/GXFXn3VG_normal.jpg","profile_interstitial_type":"","protected":false,"screen_name":"tsukino_hinayu","statuses_count":7468,"translator_type":"none","url":"https://t.co/dTUqj7Ozjg","verified":false,"withheld_in_countries":[]}}}}

Space metadata after ended

{"rest_id":"1vAxRkbpqXkKl","state":"Ended","title":"無言多めさぎょー。ｽﾋﾟｰｶｰ相互さんのみ","media_key":"28_1507007548024516614","created_at":1648133572933,"started_at":1648133575936,"ended_at":"1648148556421","updated_at":1648148557700,"is_employee_only":false,"is_locked":false,"conversation_controls":0,"total_participated":58,"total_replay_watched":0,"creator_results":{"result":{"__typename":"UserUnavailable","reason":"Protected"}}}
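The two payloads differ in `creator_results.result.__typename`: `"User"` while the Space was live, and `"UserUnavailable"` with `reason: "Protected"` after the account went private. A sketch that handles both, with types trimmed to the fields shown above; `creatorLabel` is hypothetical.

```typescript
// Trimmed types covering only the fields visible in the two payloads above.
type CreatorResult =
  | { __typename: "User"; rest_id: string; legacy: { name: string; screen_name: string } }
  | { __typename: "UserUnavailable"; reason: string };

// Produce a label for logs and webhooks that never renders as "undefined".
function creatorLabel(result: CreatorResult | undefined, fallback: string = "unknown user"): string {
  if (result === undefined) return fallback;
  if (result.__typename === "User") {
    return `${result.legacy.name} (@${result.legacy.screen_name})`;
  }
  return `${fallback} (${result.reason})`;
}
```

Caching the creator from the first "Running" fetch would also keep the real name available after the account becomes protected.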

Add webhooks for dozens of platforms via matterbridge

Matterbridge is a tool that works as a non-platform-specific chat bridge between many communications platforms. It preserves threading/attachments/etc as much as it can.

I see that you support Discord, but you could use the Matterbridge API to support every platform it supports :)

https://github.com/42wim/matterbridge/wiki/Api

Thanks for the great tool!
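Matterbridge's API (see the wiki page linked above) accepts messages as JSON sent to `POST /api/message`, with `text`, `username` and `gateway` fields. A sketch of the payload a notifier might build. The gateway name and the `localhost:4242` address in the usage note are example values for a local Matterbridge setup, not settings of this project, and `buildSpaceNotification` is hypothetical.

```typescript
// Payload for Matterbridge's API message endpoint.
interface MatterbridgeMessage {
  text: string;
  username: string;
  gateway: string;
}

// Hypothetical notifier: announce a Space on every platform bridged by `gateway`.
function buildSpaceNotification(spaceId: string, title: string, gateway: string): MatterbridgeMessage {
  return {
    text: `Space started: ${title} https://twitter.com/i/spaces/${spaceId}`,
    username: "twspace-crawler",
    gateway,
  };
}
```

Sending it would look like `fetch("http://localhost:4242/api/message", { method: "POST", headers: { "content-type": "application/json" }, body: JSON.stringify(msg) })`.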

After the new update I have an issue I can't fix myself

I start twspace-crawler and it only prints messages like this, please help:
2023-07-01T08:33:20.986Z | [ERROR] [UserManager] getUserByScreenName: Invalid value "undefined" for header "x-csrf-token" | {"username":"achan_UGA"}

NO CAPTIONS

As of right now, the script that extracts the captions needs to be fixed. This might be because the JSON from which the captions are usually extracted no longer contains the captions.

Support passing extra arguments to ffmpeg

Title
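A sketch of where user-supplied arguments could go in an ffmpeg command line: after the input and before the output path, so they act as output options. `buildFfmpegArgs` and its whitespace splitting (no support for quoted arguments) are assumptions; the actual command twspace-crawler builds is not shown here.

```typescript
// Build an ffmpeg argument list: input, user extras, then the output file.
// Extras are split on whitespace; quoted arguments are NOT supported in this sketch.
function buildFfmpegArgs(input: string, output: string, extra: string = ""): string[] {
  const trimmed = extra.trim();
  const extraArgs = trimmed === "" ? [] : trimmed.split(/\s+/);
  return ["-i", input, ...extraArgs, output];
}
```

For example, `buildFfmpegArgs("playlist.m3u8", "out.m4a", "-c copy")` yields `["-i", "playlist.m3u8", "-c", "copy", "out.m4a"]`, which can be passed to `child_process.spawn("ffmpeg", args)`.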

Spaces viewable only in the Twitter app

I've created a Space that can only be viewed in the Twitter app: on mobile web and desktop web it looks like a deleted Space, and I can only hear and see it in the app. Is there a solution?

Error 403 for the past 2 days

Hello, for the past 2 days I have been getting this message...
Do you know what the problem is? I had no problems for a year, but for the past 2 days I've been getting this message...
(I tried another account, but it's the same.)
Will it be impossible with free developer accounts?
Thank you.

[ERROR] [UserManager] getUsersByUsernames: Request failed with status code 403 | {"requestId":"16fbde9b-5874-423c-bfc5-7528f0eqa215","response":{"data":{"client_id":"27841039","detail":"When authenticating requests to the Twitter API v2 endpoints, you must use keys and tokens from a Twitter developer App that is attached to a Project. You can create a project via the developer portal.","registration_url":"https://developer.twitter.com/en/docs/projects/overview","title":"Client Forbidden","required_enrollment":"Appropriate Level of API Access","reason":"client-not-enrolled","type":"https://api.twitter.com/2/problems/client-forbidden"}}}

I can run "twspace-crawler --id 1ldxxxxxxxx" to catch one Space manually, but I can't use "--env --config" with config.json to catch them automatically... :((

Keeping getting 403 after watching for 3 hours

Node version: 14.18.2
twspace-crawler version: dd2dada (v1.8.0)

After watching a user for 3 hours, UserWatcher started getting 403s.

Log:

$ node dist/index.js -d --user akaihaato
2022-01-13T07:57:08.916Z | [INFO] ================================================================================
2022-01-13T07:57:08.919Z | [DEBUG] Args | {"debug":true,"user":"akaihaato"}
2022-01-13T07:57:08.919Z | [INFO] Starting in user mode | {"users":["akaihaato"]}
2022-01-13T07:57:08.920Z | [DEBUG] [UserManager] add | {"usernames":["akaihaato"]}
2022-01-13T07:57:08.920Z | [DEBUG] [UserManager] --> fetchUsersByScreenName
2022-01-13T07:57:08.928Z | [DEBUG] [ConfigManager] --> getGuestToken
2022-01-13T07:57:09.552Z | [DEBUG] [ConfigManager] <-- getGuestToken | {"guestToken":"1481535776458690564"}
2022-01-13T07:57:09.557Z | [DEBUG] [UserManager] --> getUserByScreenName 1 | {"username":"akaihaato"}
2022-01-13T07:57:10.163Z | [DEBUG] [UserManager] <-- getUserByScreenName 1 | {"username":"akaihaato"}
2022-01-13T07:57:10.164Z | [DEBUG] [UserManager] <-- fetchUsersByScreenName
2022-01-13T07:57:10.164Z | [INFO] [UserWatcher@akaihaato] Watching...
2022-01-13T07:57:10.165Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:57:11.520Z | [DEBUG] [UserWatcher@akaihaato] --> getAudioSpaceById | {"id":"1mrxmalLejgxy"}
2022-01-13T07:57:11.520Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T07:57:12.160Z | [DEBUG] [UserWatcher@akaihaato] <-- getAudioSpaceById | {"id":"1mrxmalLejgxy","state":"Ended"}
2022-01-13T07:57:41.520Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:57:42.843Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T07:58:12.844Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:58:14.160Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T07:58:44.161Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:58:45.674Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T07:59:15.674Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:59:17.072Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T07:59:47.073Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T07:59:48.414Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T08:00:18.414Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T08:00:19.860Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
...
2022-01-13T10:55:08.454Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:55:09.827Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T10:55:39.827Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:55:41.289Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T10:56:11.289Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:56:12.605Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T10:56:42.605Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:56:44.087Z | [DEBUG] [UserWatcher@akaihaato] <-- getUserTweets | {"spaceIds":["1mrxmalLejgxy"]}
2022-01-13T10:57:14.088Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:57:14.635Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T10:57:44.636Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:57:45.231Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T10:58:15.231Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:58:15.786Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T10:58:45.786Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:58:46.333Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T10:59:16.333Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:59:16.887Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T10:59:46.889Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T10:59:47.452Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T11:00:17.452Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T11:00:17.997Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T11:00:47.999Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T11:00:48.537Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T11:01:18.538Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T11:01:19.075Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T11:01:49.075Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T11:01:49.637Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
...
2022-01-13T12:53:48.594Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T12:53:49.115Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
2022-01-13T12:54:19.116Z | [DEBUG] [UserWatcher@akaihaato] --> getUserTweets
2022-01-13T12:54:19.649Z | [ERROR] [UserWatcher@akaihaato] getSpaces: Request failed with status code 403
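The log shows the guest token being fetched once at startup (`getGuestToken`) and every request failing with 403 about three hours later. One plausible explanation, not confirmed here, is that the guest token expired. A sketch of a wrapper that drops the cached token on a 403, fetches a new one and retries once; `TokenStore`, `isForbidden` and `withTokenRefresh` are hypothetical.

```typescript
// Caches a guest token and can be told to drop it.
class TokenStore {
  private token: string | undefined;
  private readonly fetchToken: () => Promise<string>;

  constructor(fetchToken: () => Promise<string>) {
    this.fetchToken = fetchToken;
  }

  async get(): Promise<string> {
    let token = this.token;
    if (token === undefined) {
      token = await this.fetchToken();
      this.token = token;
    }
    return token;
  }

  invalidate(): void {
    this.token = undefined;
  }
}

// True when an error carries HTTP status 403 (directly or axios-style on `response`).
function isForbidden(err: unknown): boolean {
  const e = err as { status?: number; response?: { status?: number } } | null;
  return e !== null && typeof e === "object" && (e.status === 403 || e.response?.status === 403);
}

// Run a request with the current token; on 403, refresh the token and retry once.
async function withTokenRefresh<T>(store: TokenStore, request: (token: string) => Promise<T>): Promise<T> {
  try {
    return await request(await store.get());
  } catch (err) {
    if (!isForbidden(err)) throw err;
    store.invalidate();
    return request(await store.get());
  }
}
```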

Space Ended

Hello,

When a Space has ended, is it possible to retrieve it? I want to retrieve a Space, but I get a 404 error...

Thank you.

Twitter error

Twitter updated their APIs again, and now nothing is publicly visible.

Will try to fix later

UPDATED: PLEASE READ INSTALLATION for more detail

Issue: Some Sort of API Time Out?

I've been getting error code 429 after a day or two; I still need to nail down the exact time frame. I'm guessing there's some sort of API token timeout? Is there any way to make it restart the service automatically when it detects this? Error below...

I get a bunch of these, and eventually it hangs on a Space URL and starts throwing 404s.

2023-03-17T09:52:11.733Z | [ERROR] [UserListWatcher] getSpaces: Request failed with status code 429 | {"requestId":"7a580cad-6d39-449b-a6ce-04e939221224","response":{"data":{"title":"Too Many Requests","detail":"Too Many Requests","type":"about:blank","status":429},"headers":{"date":"Fri, 17 Mar 2023 09:52:11 UTC","perf":"7626143928","server":"tsa_b","set-cookie":["guest_id_marketing=v1%3A167904673173096689; Max-Age=63072000; Expires=Sun, 16 Mar 2025 09:52:11 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None","guest_id_ads=v1%3A167904673173096689; Max-Age=63072000; Expires=Sun, 16 Mar 2025 09:52:11 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None","personalization_id="v1_dy3wQXs5Llnzrao1YoIj5g=="; Max-Age=63072000; Expires=Sun, 16 Mar 2025 09:52:11 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None","guest_id=v1%3A167904673173096689; Max-Age=63072000; Expires=Sun, 16 Mar 2025 09:52:11 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None"],"api-version":"2.61","content-type":"application/json; charset=utf-8","cache-control":"no-cache, no-store, max-age=0","content-length":"92","x-access-level":"read","x-frame-options":"SAMEORIGIN","x-transaction-id":"033183acfc66aa59","x-xss-protection":"0","x-rate-limit-limit":"300","x-rate-limit-reset":"1679047260","content-disposition":"attachment; filename=json.json","x-content-type-options":"nosniff","x-rate-limit-remaining":"275","strict-transport-security":"max-age=631138519","x-response-time":"11","x-connection-hash":"e8213dd39ff82c155977688a21641e14cabe54b07e281f20631a29bfa2c0a7fb","connection":"close"}}}
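The 429 response carries rate-limit headers: `x-rate-limit-limit: 300`, `x-rate-limit-remaining: 275` and `x-rate-limit-reset: 1679047260`, the last being a Unix time in seconds. Rather than restarting the service, the watcher could sleep until that reset time. A sketch; `msUntilReset` is hypothetical and the one-second safety margin is an arbitrary choice.

```typescript
// How long to wait before retrying, from a Unix-seconds x-rate-limit-reset header.
// Returns undefined when the header is missing or invalid.
function msUntilReset(
  headers: Record<string, string | undefined>,
  nowMs: number = Date.now(),
  marginMs: number = 1000,
): number | undefined {
  const reset = Number(headers["x-rate-limit-reset"]);
  if (!Number.isFinite(reset) || reset <= 0) return undefined;
  // Never negative: if the window already reset, wait only the margin.
  return Math.max(0, reset * 1000 - nowMs) + marginMs;
}
```

For the response above (sent at 09:52:11 UTC, i.e. Unix time 1679046731), the reset is 529 seconds away, so the sketch returns 530000 ms.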

Broken on Ubuntu 22.04 LTS

I've probably run into some problems with Ubuntu 22.04 LTS 🤔
The crawler can grab the metadata but is not able to download any m4a audio file.

Env	Ver
pnpm	7.18.0
Node.js	18.2.1
twspace-crawler	1.11.10
FFmpeg	5.0.1

I have tested this and confirmed that it happens on my Ubuntu machines (both x86_64 and aarch64, 22.04.1 LTS).
Strangely, everything works fine on my MacBook (Apple Silicon, 2021, Ventura 13.0.1).

Here's the log from my Ubuntu machine:

$ twspace-crawler --url https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/master_playlist.m3u8 -d

2022-12-05T12:12:04.784Z | [INFO] ================================================================================
2022-12-05T12:12:04.788Z | [INFO] Version: 1.11.10
2022-12-05T12:12:05.601Z | [DEBUG] Args | {"url":"https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/master_playlist.m3u8","debug":true}
2022-12-05T12:12:05.603Z | [DEBUG] env.TWITTER_AUTHORIZATION=
2022-12-05T12:12:05.604Z | [DEBUG] env.TWITTER_AUTH_TOKEN=
2022-12-05T12:12:05.605Z | [INFO] Starting in playlist url mode | {"url":"https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/master_playlist.m3u8"}
2022-12-05T12:12:05.607Z | [DEBUG] [SpaceDownloader] constructor | {"originUrl":"https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/master_playlist.m3u8","filename":"2212051212","subDir":""}
2022-12-05T12:12:05.607Z | [VERBOSE] [SpaceDownloader] Playlist path: "/home/aozaki/download/2212051212.m3u8"
2022-12-05T12:12:05.608Z | [VERBOSE] [SpaceDownloader] Audio path: "/home/aozaki/download/2212051212.m4a"
2022-12-05T12:12:05.609Z | [DEBUG] [SpaceDownloader] download | {"originUrl":"https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/master_playlist.m3u8"}
2022-12-05T12:12:06.091Z | [INFO] [SpaceDownloader] Final playlist url: https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/playlist_16776578390500222618.m3u8
2022-12-05T12:12:06.093Z | [VERBOSE] [SpaceDownloader] Audio is saving to "/home/aozaki/download/2212051212.m4a"
2022-12-05T12:12:06.093Z | [VERBOSE] [SpaceDownloader] ffmpeg -protocol_whitelist file,https,tls,tcp -i https://prod-fastly-ap-northeast-1.video.pscp.tv/Transcoding/v1/hls/SvvyS_Hr2uetdvBKLMcK-BWeT2I8BHq-3c5ZpoABD5tHCNCjaWZus3Uufrxb9BJFhQH9jS-6LDz5Nm4uh4mxrg/non_transcode/ap-northeast-1/periscope-replay-direct-prod-ap-northeast-1-public/audio-space/playlist_16776578390500222618.m3u8 -c copy /home/aozaki/download/2212051212.m4a
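The log stops right after the FFmpeg command is printed, so one way to narrow this down is to run the same command outside the crawler and read FFmpeg's own error output. A minimal sketch using Node's `child_process.spawn`; `buildFfmpegArgs` and `runFfmpeg` are hypothetical helpers that mirror the argument list in the log above, and the code assumes `ffmpeg` is on the `PATH`.

```typescript
import { spawn } from 'node:child_process'

// Mirrors the logged command:
// ffmpeg -protocol_whitelist file,https,tls,tcp -i <playlist> -c copy <output>
function buildFfmpegArgs(playlistUrl: string, outputPath: string): string[] {
  return ['-protocol_whitelist', 'file,https,tls,tcp', '-i', playlistUrl, '-c', 'copy', outputPath]
}

// Runs ffmpeg and resolves with its exit code and everything it wrote to stderr
function runFfmpeg(args: string[]): Promise<{ code: number | null; stderr: string }> {
  return new Promise((resolve, reject) => {
    const proc = spawn('ffmpeg', args)
    let stderr = ''
    proc.stderr.on('data', (chunk: Buffer) => {
      stderr += chunk.toString()
    })
    proc.on('error', reject) // e.g. the ffmpeg binary was not found
    proc.on('close', (code) => resolve({ code, stderr }))
  })
}
```

For example, `runFfmpeg(buildFfmpegArgs(finalPlaylistUrl, '/tmp/test.m4a')).then((r) => console.log(r.code, r.stderr))`, where `finalPlaylistUrl` is the "Final playlist url" from the log, should show whether FFmpeg itself fails on the Ubuntu machines.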

Unexpected token '.' in Util.js

Hi, thanks for making this tool. I am giving it a try but encountered the following error on both Windows and Ubuntu:

I tried with --user, --config or --id, all result in the same error.
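The error text itself is not included above. If it points at code that uses optional chaining (`?.`), a common cause of `Unexpected token '.'` is a Node.js version older than 14, which cannot parse that syntax. A quick, hypothetical check follows; the minimum major version of 14 is an assumption, not a documented twspace-crawler requirement.

```typescript
// Hypothetical helper: true if a version string like "v18.2.0" has major >= minMajor
function nodeSupportsMajor(version: string, minMajor: number): boolean {
  const major = Number(version.replace(/^v/, '').split('.')[0])
  return Number.isInteger(major) && major >= minMajor
}

// Assumption: optional chaining (`a?.b`) needs Node.js 14 or newer
if (!nodeSupportsMajor(process.version, 14)) {
  console.error(`Node.js ${process.version} is too old; try upgrading to v14 or newer.`)
}
```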

Feature Request: Batch Ended Spaces from Command

Is it possible, or could it be made possible, to batch Spaces, or to process them as a queue where you could enter twspace-crawler --id spaceoneid,spacetwoid,space3id etc.? I'm not sure how it multithreads, although I know it has captured more than one Space at once for me. I've been doing them as one-offs, and I'm not sure that's the right way.
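A comma-separated `--id` value could be turned into a small work queue. This is a hypothetical sketch: it is not confirmed that twspace-crawler's `--id` accepts multiple IDs, and the `task` callback stands in for whatever per-Space download routine would be used.

```typescript
// Splits "id1,id2, id3" into unique, non-empty Space IDs, preserving order
function parseSpaceIds(arg: string): string[] {
  const ids = arg
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
  return Array.from(new Set(ids))
}

// Runs `task` for every ID with at most `concurrency` tasks in flight
async function runQueue(
  ids: string[],
  task: (id: string) => Promise<void>,
  concurrency = 2,
): Promise<void> {
  let next = 0
  const worker = async (): Promise<void> => {
    while (next < ids.length) {
      const id = ids[next++]
      try {
        await task(id)
      } catch (err) {
        // One failed Space should not stop the rest of the batch
        console.error(`Space ${id} failed:`, err)
      }
    }
  }
  const workerCount = Math.min(concurrency, ids.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
}
```

With a concurrency of 1 this behaves like the one-at-a-time queue described above; higher values let several ended Spaces download in parallel.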

error 404 / getAudioSpaceById

Hello,

Do you know what the problem is?
Thank you.

2023-08-15T10:56:04.897Z | [ERROR] [SpaceWatcher@1dRKZMdLMrvxB] getAudioSpaceById: Request failed with status code 404
2023-08-15T10:56:04.937Z | [ERROR] [SpaceWatcher@1dRKZMdLMrvxB] getAudioSpaceByRestId: Request failed with status code 404
2023-08-15T10:56:04.939Z | [ERROR] [SpaceWatcher@1dRKZMdLMrvxB] AudioSpace metadata not found
2023-08-15T10:56:04.942Z | [INFO] [SpaceWatcher@1dRKZMdLMrvxB] Retry watch in 10000ms
2023-08-15T10:56:04.953Z | [ERROR] [SpaceWatcher@1eaKbraQLArKX] getAudioSpaceById: Request failed with status code 404
2023-08-15T10:56:04.977Z | [ERROR] [SpaceWatcher@1eaKbraQLArKX] getAudioSpaceByRestId: Request failed with status code 404
2023-08-15T10:56:04.982Z | [ERROR] [SpaceWatcher@1eaKbraQLArKX] AudioSpace metadata not found

New @ tracking system

Since yesterday I've noticed that on my Raspberry Pi, the @ accounts followed in config.json trigger a capture even when they are not the host of the Space (previously the @ had to be the host to trigger the capture). On my PC, however, nothing has changed: it only captures if the @ starts the Space.
Do you know why ?

PS: FYI, I prefer the new @ tracking system (capturing the Space even if the @ doesn't host it).
I'm on 1.11.13 on both the Raspberry Pi and the PC.

Thank you

Feature request: Add the possibility to define post-processing scripts

It would be cool if there were a way to trigger scripts after a Space has been downloaded completely. That way people could write additional plugins. For example, I would want to write a plugin that automatically uploads the files to GDrive via rclone; others might want to move the files to a path served by a web server...
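One possible shape for such a hook: once a download finishes, run a user-configured command whose arguments may contain placeholders such as `{file}`. This is a hypothetical sketch; the placeholder syntax and helper names are assumptions, not an existing twspace-crawler option. Passing an argument array to `execFile` avoids sending file names through a shell.

```typescript
import { execFile } from 'node:child_process'

// Replaces {name} placeholders in each argument; unknown placeholders are left as-is
function buildHookArgs(template: string[], vars: Record<string, string>): string[] {
  return template.map((arg) =>
    arg.replace(/\{(\w+)\}/g, (match, key: string) =>
      Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : match,
    ),
  )
}

// Runs the hook command; resolves with stdout, rejects on a non-zero exit or spawn error
function runPostProcessHook(
  command: string,
  argTemplate: string[],
  vars: Record<string, string>,
): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    execFile(command, buildHookArgs(argTemplate, vars), (error, stdout) => {
      if (error) reject(error)
      else resolve(stdout)
    })
  })
}
```

For the GDrive case, a call such as `runPostProcessHook('rclone', ['copy', '{file}', 'gdrive:spaces'], { file: downloadedPath })` would upload each finished file, assuming an rclone remote named `gdrive` exists.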

Protected accounts saving/organization and missing webhook

If the host of a Space has a private (protected) account, no Discord webhook is sent, and the file(s) are not organized properly either. I tested this by hosting a Space with a private account and using the auth of my other account, which follows it.

twspace-crawler/src/modules/SpaceWatcher.ts (line 63 in 430247d):

public get userScreenName(): string {

calls twspace-crawler/src/utils/SpaceUtil.ts (line 22 in 430247d):

return audioSpace?.metadata?.creator_results?.result?.legacy?.screen_name

but for protected accounts the value of creator_results is

"creator_results": {
  "result": {
    "__typename": "UserUnavailable",
    "reason": "Protected"
  }
}

I think we should modify SpaceUtil's functions to handle protected accounts, for example by adding a check to SpaceUtil#getHostUsername and other relevant functions. When creator_results is unavailable, we could use audiospace.participants.admins[0] instead since, as far as I can tell, that is always the host. Even though the account is protected, we can still pull relevant data such as the username and display name, as well as the user's ID and avatar (src/interfaces/Twitter.interface.ts#L69).

With these checks in place, we could derive the host information, send the webhook without problems, and organize the files accordingly.

I can help work on this, but I will be busy for the next 12 hours or so.
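A minimal sketch of the proposed check, based on the response shapes quoted above. The interfaces are trimmed, hypothetical stand-ins for the real ones in `src/interfaces/Twitter.interface.ts`, and the participant field names `twitter_screen_name` and `display_name` are assumptions to verify against that file. The fallback relies on the issue's observation that `admins[0]` is the host.

```typescript
// Trimmed, hypothetical shapes; verify field names against src/interfaces/Twitter.interface.ts
interface UserResult {
  __typename?: string
  reason?: string
  legacy?: { screen_name?: string }
}

interface Participant {
  twitter_screen_name?: string // assumed field name
  display_name?: string // assumed field name
}

interface AudioSpace {
  metadata?: { creator_results?: { result?: UserResult } }
  participants?: { admins?: Participant[] }
}

// Returns the host's username, falling back to the first admin when the
// creator is reported as UserUnavailable (e.g. a protected account)
function getHostUsername(audioSpace: AudioSpace): string | undefined {
  const creator = audioSpace?.metadata?.creator_results?.result
  if (creator && creator.__typename !== 'UserUnavailable' && creator.legacy?.screen_name) {
    return creator.legacy.screen_name
  }
  // Assumption from the issue: admins[0] is always the host
  return audioSpace?.participants?.admins?.[0]?.twitter_screen_name
}
```

The same pattern (prefer `creator_results`, fall back to `admins[0]`) would apply to the other SpaceUtil getters, such as the display name or avatar.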

[INFO] Version: 1.11.3

Hello, I'm trying to set up the script to monitor users and record Spaces without having to keep monitoring them myself, so I followed the guide you posted and ran this command:
twspace-crawler --env ./.env --config ./config.json
All I can see in the log file inside the logs folder is this:

2022-02-22T18:11:40.010Z | [INFO] ================================================================================
2022-02-22T18:11:40.019Z | [INFO] Version: 1.11.3
2022-02-22T18:11:43.370Z | [INFO] ================================================================================
2022-02-22T18:11:43.380Z | [INFO] Version: 1.11.3
2022-02-22T18:11:46.723Z | [INFO] ================================================================================
2022-02-22T18:11:46.732Z | [INFO] Version: 1.11.3
2022-02-22T18:11:50.067Z | [INFO] ================================================================================
2022-02-22T18:11:50.076Z | [INFO] Version: 1.11.3
2022-02-22T18:11:53.411Z | [INFO] ================================================================================
2022-02-22T18:11:53.420Z | [INFO] Version: 1.11.3
2022-02-22T18:11:56.757Z | [INFO] ================================================================================
2022-02-22T18:11:56.766Z | [INFO] Version: 1.11.3
2022-02-22T18:12:00.105Z | [INFO] ================================================================================
2022-02-22T18:12:00.115Z | [INFO] Version: 1.11.3

So how can I know for sure that it's working?

Feature request: Possibility to set the download destination

It would be nice if the download directory weren't hardcoded to [workdir]/download but could be set to any folder via the .env file (defaulting to [workdir]/download when unset would still be important).
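A sketch of the requested behavior: read the destination from an environment variable and fall back to `[workdir]/download`. The variable name `DOWNLOAD_DIR` and the helper are hypothetical; twspace-crawler is not known to read such a setting.

```typescript
import * as path from 'node:path'

// Resolves the download directory: DOWNLOAD_DIR if set (relative paths are
// resolved against cwd), otherwise <cwd>/download
function resolveDownloadDir(
  env: Record<string, string | undefined> = process.env,
  cwd: string = process.cwd(),
): string {
  const configured = env['DOWNLOAD_DIR']?.trim()
  return configured ? path.resolve(cwd, configured) : path.join(cwd, 'download')
}
```

Treating a blank value the same as an unset one keeps the `[workdir]/download` default when the `.env` line exists but is left empty.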

Feature request: Optional trigger for "Space XXXX ended" Webhook-Posts

It would be cool to have the bot edit its "Space live" posts to "Space ended", perhaps with a custom "End Text" field where people could add information on how to obtain the finished Space (like: "Head over to some.website/spaces to get the archive").

Info about editing Webhook messages: https://discord.com/developers/docs/resources/webhook#edit-webhook-message
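Per the Discord documentation linked above, a webhook message is edited with `PATCH /webhooks/{webhook.id}/{webhook.token}/messages/{message.id}`. To know the message ID, the original post has to be sent with the `wait=true` query parameter so that Discord returns the created message. A hedged sketch using the global `fetch` available in Node.js 18+; the helper names and the "Space ended." wording are illustrative, not existing twspace-crawler code.

```typescript
// Builds the edit URL from a webhook URL such as https://discord.com/api/webhooks/<id>/<token>
function buildEditMessageUrl(webhookUrl: string, messageId: string): string {
  const url = new URL(webhookUrl)
  url.search = '' // drop any query string such as ?wait=true
  url.pathname = `${url.pathname.replace(/\/+$/, '')}/messages/${encodeURIComponent(messageId)}`
  return url.toString()
}

// Posts the "Space live" message and returns its ID (wait=true makes Discord return the message)
async function postWebhook(webhookUrl: string, content: string): Promise<string> {
  const url = new URL(webhookUrl)
  url.searchParams.set('wait', 'true')
  const res = await fetch(url.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content }),
  })
  if (!res.ok) throw new Error(`Webhook POST failed: ${res.status}`)
  const message = (await res.json()) as { id: string }
  return message.id
}

// Edits the earlier message once the Space has ended
async function markSpaceEnded(webhookUrl: string, messageId: string, endText: string): Promise<void> {
  const res = await fetch(buildEditMessageUrl(webhookUrl, messageId), {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: `Space ended. ${endText}` }),
  })
  if (!res.ok) throw new Error(`Webhook PATCH failed: ${res.status}`)
}
```

The message ID returned by `postWebhook` would need to be kept per Space (in memory or on disk) until the Space ends.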

Feature Request/Question: Stream Isolation

OK, another random question. Since I believe the audio consists of individual streams from each participant, is it possible to capture the audio of just a single participant? For example, if I wanted to record my own Space and isolate my own voice for sampling/cuts without others talking over me. Sometimes I want to make subclips, but someone talks over or buries something I said, and I want to pull just my own or one other person's audio for editing.
