nilfoer / gwaripper

Stars: 28, Watchers: 28, Forks: 4, Size: 2.14 MB

Tool for conveniently downloading audios from r/gonewildaudio and similar subreddits

License: MIT License

Languages: Batchfile 0.01%, Python 83.15%, Jupyter Notebook 7.46%, CSS 5.02%, HTML 4.01%, JavaScript 0.34%

Topics: downloader, gonewildaudio, reddit, scraping

gwaripper's People

iamthenilu chirag127 princess-rainbow

gwaripper's Issues

Issues displaying certain audios

I've had an issue where certain audios, despite downloading fine (metadata and audio files downloaded, audio is not corrupted), are not recognised and cannot be embedded in the web player ("Local file couldn't be found" error).

Two audios with which I reproduced the issue on a clean install:
aHR0cHM6Ly93d3cucmVkZGl0LmNvbS9yL2dvbmV3aWxkYXVkaW8vY29tbWVudHMvb3lxNzJ0L2Y0bV9mcm9tX2Rpc29iZWRpZW50X3RlZW5fdG9fc3VibWlzc2l2ZV9mdWNrLw==
aHR0cHM6Ly9yZWRkaXQuY29tL3IvZ29uZXdpbGRhdWRpby9jb21tZW50cy9wcWJ3M3kvZjRtX2NsYXNzbWF0ZV9naXZlc195b3Vfc2VydmljZV9jaGlsZGhvb2Qv

I might well be doing something wrong here, in which case I apologise. Thank you for the useful tool.

Documentation

This seems to lack complete documentation of each command. I'd like to work with you and see if we can map out a complete list of commands and their capabilities/uses. Thank you for the great software; it has saved me a lot of time.

Any plans to add support for ripping from Chirbit as well?

Hey, by the way, are you on IRC or Discord by any chance? I'm also a fan of GWA :)

Can't get it to work on Linux anymore

I used to have no problem using gwaripper on Linux. However, for the past six months or so, I haven't been able to use it for some reason.

It's giving me an "Uncaught exception" error (screenshot: https://files.catbox.moe/zqgml1.png).

Installed via:
python -m pip install -r requirements.txt
and run with:
python gwaripper-runner.py

WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired

A friend of mine is looking to back up her content before nuking everything.

I'm currently using the 0.6.8_single-folder version and running .\gwaripper.exe redditor 200 <username> (is there a way to avoid limiting the number of posts?), and I get nothing but errors like these:

2023-05-15 12:19:39,177 - gwaripper.download - WARNING - URL Error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1125) (url: <entirely valid soundgasm URL that works if I copy/paste it into a browser>)
2023-05-15 12:19:39,179 - gwaripper.extractors.base - WARNING - ERROR - NO_RESPONSE - Request timed out or no response received! (URL was <entirely valid soundgasm URL that works if I copy/paste it into a browser>)

I also get this error if I just try to grab a post from the subreddit's current front page. For example:

.\gwaripper.exe links <any comments link from GWA, or any file link from Soundgasm>

gives me the same errors.

(Also, PRAW is complaining that a new version is out, but I don't think that matters here?)

If it helps, I've checked the certificate Soundgasm sends in a web browser, and it definitely hasn't expired yet.
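Since the browser accepts Soundgasm's certificate, one common cause of this error (an assumption here, not confirmed for gwaripper) is an outdated set of trusted root certificates on the Python side rather than an expired site certificate. For example, when Let's Encrypt's old DST Root CA X3 root expired in September 2021, clients with stale trust stores began reporting "certificate has expired" for otherwise valid sites. A minimal sketch, assuming the third-party `certifi` package is installed, of pointing `urllib` at its CA bundle; `make_ssl_context` and `fetch` are hypothetical names, not part of gwaripper:

```python
import ssl
import urllib.request

try:
    import certifi  # third-party CA bundle, updated independently of Python
except ImportError:  # fall back to the system trust store if certifi is missing
    certifi = None


def make_ssl_context() -> ssl.SSLContext:
    """Build a verifying SSL context, preferring certifi's CA bundle."""
    cafile = certifi.where() if certifi is not None else None
    # create_default_context enables CERT_REQUIRED and hostname checking;
    # with cafile=None it loads the platform's default certificates instead.
    return ssl.create_default_context(cafile=cafile)


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: GET a URL using the certifi-backed context."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout, context=make_ssl_context()) as resp:
        return resp.read()
```

Note that plain `urllib` never consults `certifi` on its own; the bundle only takes effect when a context like this is passed in explicitly.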

Imgur: unhandled exception, "?" in filename after extension

22:23:18 - CRITICAL - Uncaught exception:
Traceback (most recent call last):
File "gwaripper-runner.py", line 23, in
File "cli.py", line 237, in main
File "cli.py", line 293, in _cl_redditor
File "cli.py", line 273, in download_all_subs
File "gwaripper.py", line 256, in download_all
File "gwaripper.py", line 171, in parse_and_download_submission
File "gwaripper.py", line 262, in download
File "gwaripper.py", line 426, in _download_collection
File "gwaripper.py", line 304, in _download_file
File "gwaripper.py", line 345, in _download_file_http
File "download.py", line 107, in download_in_chunks
OSError: [Errno 22] Invalid argument: 'C:\filepath\redditor\filenamewiththefollowingextension -> ._1yp52c4.jpg?1'

[11736] Failed to execute script 'gwaripper-runner' due to unhandled exception!

The error occurs with kshib.
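The path in the traceback ends in `.jpg?1`, which suggests the extension was taken from the raw URL including its query string; `?` is not allowed in Windows file names, hence `[Errno 22] Invalid argument`. A minimal sketch of deriving the extension from the URL path only and sanitizing the rest of the name; `extension_from_url` and `safe_filename` are hypothetical helpers, not gwaripper functions:

```python
import os
import re
from urllib.parse import urlsplit

# Characters Windows rejects in file names, plus ASCII control characters.
_WIN_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def extension_from_url(url: str) -> str:
    """Return the extension of the URL's path, ignoring ?query and #fragment."""
    path = urlsplit(url).path          # "https://i.imgur.com/abc.jpg?1" -> "/abc.jpg"
    return os.path.splitext(path)[1]   # -> ".jpg"


def safe_filename(name: str, max_len: int = 200) -> str:
    """Replace Windows-invalid characters, cap the length, trim trailing dots/spaces."""
    cleaned = _WIN_INVALID.sub("_", name)
    # Windows also refuses names that end in '.' or ' '
    return cleaned[:max_len].rstrip(". ")
```

Stripping the query with `urlsplit` before `os.path.splitext` keeps the plain `.jpg`; the character replacement is a second line of defence for post titles that themselves contain `?`, `:` and similar characters.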

Import audio

I guess this is more of a feature/documentation request, but is there a simple way to manually add files to the database? This would be useful for, e.g., Patreon audios.

Incomplete downloads from erocast.me

The following audio (NSFW) https://erocast.me/track/3958/shy-vampire-roomate, taken from this reddit post https://old.reddit.com/r/gonewildaudio/comments/wu0csy/f4m_your_shy_vampire_roommate_wants_to_suck_your/ downloads an mp4 file that's 4 minutes long when the real audio length is 14:01 minutes.

Command issued:

python3 gwaripper-runner.py links https://erocast.me/track/3958/shy-vampire-roomate

Download log:

2022-08-23 04:00:54,139 - gwaripper - INFO   - Processing URL 1 of 1: https://erocast.me/track/3958/shy-vampire-roomate
2022-08-23 04:00:55,066 - gwaripper.extractors.base - DEBUG  - Getting html done!
2022-08-23 04:00:55,067 - gwaripper - INFO   - Downloading: shy vampire roomate.mp4..., File 1 of 1
2022-08-23 04:00:55,067 - gwaripper - INFO   - Wating for ffmpeg to finish...
2022-08-23 04:01:08,588 - gwaripper - INFO   - Download report was written to folder _reports

HTML report says DOWNLOADED and NO_ERRORS.

I tried downloading the m3u8 playlist served by the website manually with youtube-dl. It seems the website returns a 429 error (Too Many Requests) whenever the m3u8 fragments are requested too quickly: the first three download correctly, and then these errors start appearing:

ERROR: [generic] Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>);

Additional note: I tried it with a US VPN, and this time it downloaded 13 minutes before the 429 errors started; however, a few minutes in the middle were silent. So the behaviour seems inconsistent and dependent on the source IP. It would probably be better to catch these errors and retry the failed downloads in some way.
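A minimal sketch of the catch-and-retry idea for HTTP 429, assuming the segments are fetched one by one with plain `urllib`: it waits with exponential backoff (1 s, 2 s, 4 s, ...) between attempts and prefers the server's `Retry-After` header when it is given in seconds. `backoff_delay` and `fetch_with_retry` are hypothetical names; when ffmpeg fetches the HLS segments itself, as the log above suggests, the retries would have to live wherever the segments are actually requested.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 1.0) -> float:
    """Seconds to wait before the next try (attempt is 1-based)."""
    # Prefer the server's own hint when it is a plain number of seconds.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return base * 2 ** (attempt - 1)  # 1s, 2s, 4s, 8s, ...


def fetch_with_retry(url: str, max_attempts: int = 5, timeout: float = 30.0) -> bytes:
    """GET `url`, retrying only on HTTP 429 (Too Many Requests)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts:
                raise  # not rate limiting, or out of retries: give up
            hint = err.headers.get("Retry-After") if err.headers else None
            time.sleep(backoff_delay(attempt, hint))
    raise AssertionError("unreachable: every attempt returns or raises")
```

Retrying only on 429 keeps genuine failures (404, 403) fast, while the growing delay gives a per-IP rate limit time to reset.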

Missing audio when concurrently downloading and using webgui

I had an issue recently where I downloaded an audio while the webgui was running in another window. The audio file was downloaded, and gwaripper recognises that it's there (it refuses to grab it again), but it doesn't appear in the webgui. I'm not sure whether this issue is somehow unique to this audio or a more general bug. (https://www.reddit.com/r/gonewildaudio/comments/xzzgfh/tm4m_m4m_pupcup_cafe_would_you_like_the_special/)

Uncaught exception: download_hls_ffmpeg

I'm not sure why this happens, but both audio links in the post at least seem to play; I can help debug to the best of my ability. Sadly, downloading of subsequent posts halts after this because the app exits.

03:00:12 - INFO - Starting download of collection: https://www.reddit.com/r/GoneWildAudioGay/comments/w0q4bb/mm4moc_a_shy_twin_and_a_devious_twin_and_they/
03:00:35 - INFO - File was already downloaded, skipped URL: https://soundgasm.net/u/PerchanceToDream/MM4MOC-A-shy-twin-and-a-devious-twin-and-they-both-want-you-Mdom-Msub-speakers-Power-bottom-Incest-Twins-Blowjob-Anal-Threesome-Blackmail-Dubcon-Rape
03:00:47 - INFO - Downloading: [MM4M][OC] A shy twin and a devious twin, and they both want you_ [Mdo_02_A shy twin and a devious twin, and they both want you_.mp4..., File 2 of 2
Downloading TS-parts of the m3u8 playlist:
49/64
03:02:28 - INFO - Download report was written to folder _reports
03:02:28 - INFO - The last backup date is not yet 5.0 days old! The next backup will be in  4.82 days!
03:02:28 - CRITICAL - Uncaught exception:
Traceback (most recent call last):
  File "/app/gwaripper-runner.py", line 23, in <module>
    main()
  File "/app/gwaripper/cli.py", line 239, in main
    args.func(args)
  File "/app/gwaripper/cli.py", line 309, in _cl_sub
    download_all_subs(sublist, args)
  File "/app/gwaripper/cli.py", line 275, in download_all_subs
    gw.download_all(sublist)
  File "/app/gwaripper/gwaripper.py", line 258, in download_all
    self.parse_and_download_submission(sub)
  File "/app/gwaripper/gwaripper.py", line 173, in parse_and_download_submission
    self.download(info)
  File "/app/gwaripper/gwaripper.py", line 264, in download
    self._download_collection(info, None)
  File "/app/gwaripper/gwaripper.py", line 433, in _download_collection
    self._download_file(
  File "/app/gwaripper/gwaripper.py", line 312, in _download_file
    return self._download_file_hls(info, author_name, top_collection,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/gwaripper/gwaripper.py", line 393, in _download_file_hls
    success = dl.download_hls_ffmpeg(info.direct_url, os.path.abspath(os.path.join(mypath, filename)))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/gwaripper/download.py", line 239, in download_hls_ffmpeg
    raise e
  File "/app/gwaripper/download.py", line 229, in download_hls_ffmpeg
    download_in_chunks(url, full_fn, headers=DEFAULT_HEADERS)
  File "/app/gwaripper/download.py", line 104, in download_in_chunks
    with urllib.request.urlopen(req) as response:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
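The 403 on a single HLS segment propagates all the way up through `download_all`, so one failed file ends the whole run. A minimal sketch, with hypothetical names, of containing errors per file so the remaining posts still download and the failures can be reported at the end; `download_one` stands in for whatever function downloads a single URL:

```python
import logging
import urllib.error
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("download_batch")


def download_batch(
    urls: Iterable[str], download_one: Callable[[str], None]
) -> List[Tuple[str, str]]:
    """Run `download_one` for every URL; log and collect failures instead of aborting."""
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            download_one(url)
        # HTTPError subclasses URLError (itself an OSError), so it must come first.
        except urllib.error.HTTPError as err:  # e.g. 403 Forbidden on an HLS segment
            log.warning("HTTP %s for %s, skipping", err.code, url)
            failures.append((url, f"HTTP {err.code}"))
        except (urllib.error.URLError, OSError) as err:  # network or filesystem errors
            log.warning("Failed %s: %s", url, err)
            failures.append((url, str(err)))
    return failures
```

Anything other than network or filesystem errors still propagates, so genuine programming bugs are not silently swallowed.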

Title of audio file is duplicated

Hello. The program works, thanks. One thing I don't know how to fix is the audio filenames. The description is saved as "{title}.txt", as expected, but the audio is saved as "{title} {title}.m4a". So in the audio filename the title appears twice and the rest of it is cut off (the name is incomplete because of the duplicate), while the .txt file has the full title with no duplicate. How can I get the same full title on the audio file?

Thanks.

Cannot create the config file

First of all, great idea with the Gone Wild scraper. Thanks!

I'd gladly use this tool too, but I can't get it to run on my PC.
I have Python 3.11 and 3.9 installed on Windows 11, and I have also installed the packages from requirements.txt.
When I start it, the config file is not written to the root folder.
Do you have any idea how I could fix this?
Thanks for any help ♥

PS C:\Users***\Downloads\GWARipper-0.6.8> python gwaripper-runner.py
No arguments passed! Call this script from the command line with -h to show available commands.
Simulating command line input!!

Type in command line args:

root_path not set in gwaripper_config.ini, use command config -p "C:\absolute\path" to specify where the files will be downloaded to
PS C:\Users*\Downloads\GWARipper-0.6.8> python gwaripper-runner.py config -p
usage: gwaripper-runner.py config [-h] [-p PATH] [-bf FREQUENCY] [-bn N-BACKUPS] [-tf TAG [TAG ...]]
[-tco TAGCOMBO [TAGCOMBO ...]] [-smr {0,1}] [-rci client_id] [-rcs client_secret]
[-ici imgur_client_id] [--only-one-mirror ZERO_OR_ONE] [--host-priority]
gwaripper-runner.py config: error: argument -p/--path: expected one argument
PS C:\Users*\Downloads\GWARipper-0.6.8>
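The last error is argparse rejecting `-p` without a value: the path has to follow the flag, as the earlier `root_path not set` message suggests, and it should be quoted if it contains spaces. A minimal, hypothetical mirror of that option (not gwaripper's actual parser; `C:\Users\me\gwa_downloads` is a placeholder path) shows the expected shape:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, stripped-down mirror of the `config -p PATH` option."""
    parser = argparse.ArgumentParser(prog="gwaripper-runner.py")
    sub = parser.add_subparsers(dest="command", required=True)
    config = sub.add_parser("config")
    # An optional argument like this consumes exactly one value, so a bare
    # `-p` fails with "argument -p/--path: expected one argument".
    config.add_argument("-p", "--path", metavar="PATH")
    return parser


args = build_parser().parse_args(["config", "-p", r"C:\Users\me\gwa_downloads"])
print(args.path)  # C:\Users\me\gwa_downloads
```

On the command line, the equivalent would be `python gwaripper-runner.py config -p "C:\Users\me\gwa_downloads"`.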

API Changes and the future

Are there any plans to continue development of GWARipper following the API changes? Are there workarounds?

I'm not well versed in Reddit's previous or current API implementation, but is it possible for each of us to get our own API keys for personal use, or is that out of the question?

Figure this discussion needed to be made. I appreciate the work you've done and hope we can find solutions for the future, as of now I believe the script is borked.

error: argument POST_LIMIT: invalid int value: 'POST_LIMIT'

Hello. I was trying to download a specific user's posts (using the redditor command), and no matter how I entered the number of posts and the username, it always showed the same error:
error: argument POST_LIMIT: invalid int value: 'POST_LIMIT'

Thanks.

Edit: the words POST_LIMIT and USERNAME shouldn't be typed into the command; just pass the number and the username.
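In argparse usage lines, upper-case names such as `POST_LIMIT` are placeholders for values, so typing the word itself makes the `int` conversion fail with exactly the error quoted above. A minimal, hypothetical mirror of the `redditor` subcommand (not gwaripper's actual parser; `some_user` is a placeholder), following the `redditor 200 <username>` form used in an earlier report:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of a `redditor POST_LIMIT USERNAME` subcommand."""
    parser = argparse.ArgumentParser(prog="gwaripper-runner.py")
    sub = parser.add_subparsers(dest="command", required=True)
    redditor = sub.add_parser("redditor")
    # metavar only sets the placeholder shown in usage and error messages;
    # the value itself must be a number, converted by type=int.
    redditor.add_argument("limit", type=int, metavar="POST_LIMIT")
    redditor.add_argument("username", metavar="USERNAME")
    return parser


# Correct: pass the number and the name themselves.
args = build_parser().parse_args(["redditor", "200", "some_user"])
print(args.limit, args.username)  # 200 some_user

# Passing the literal word "POST_LIMIT" instead exits with an error ending in:
# argument POST_LIMIT: invalid int value: 'POST_LIMIT'
```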

Possible to download an entire subreddit at once?

Thanks for this project! It looks great :)

Is it possible to download every post from a subreddit? I was thinking something like subreddit 999999999 GoneWildAudioGay might work, but I'm not sure whether you'd have to use something like Pushshift for that.

HTTP 302 / ERR_CONNECTION_REFUSED when connecting to WebGUI from other computer in LAN

When I connect to the WebGUI from another computer in my LAN, the site doesn't load and I get an ERR_CONNECTION_REFUSED error. I tested this with Firefox and the Brave browser.

Whenever this happens, I get this message in the terminal that launched the WebGUI: "GET / HTTP/1.1" 302 -

I run GWARipper through this build: GWARipper-v0.8.0_single-folder_lin-x64
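The logged 302 shows that the request from the other computer does reach the server, so one possibility worth ruling out (an assumption, not a confirmed diagnosis) is a redirect whose `Location` points at `localhost` or `127.0.0.1`. The remote browser would then resolve that to itself and get ERR_CONNECTION_REFUSED. A minimal standard-library sketch for checking this from the other machine; `redirect_target` and `points_to_loopback` are hypothetical helpers:

```python
import http.client
import ipaddress
from typing import Optional, Tuple
from urllib.parse import urlsplit


def redirect_target(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """GET `url` without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def points_to_loopback(location: Optional[str]) -> bool:
    """True if a redirect Location names localhost or a loopback IP address."""
    if not location:
        return False
    host = urlsplit(location).hostname  # None for relative redirects like "/login"
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # an ordinary host name rather than an IP literal
        return False
```

Run `redirect_target("http://<server-lan-ip>:<port>/")` from the other computer, filling in the WebGUI's LAN address and port. If `points_to_loopback` is true for the returned `Location`, the redirect target rather than the network is the likely culprit.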