nianeyna / ao3downloader

Utility for downloading fanfiction in bulk from the Archive of Our Own

License: GNU General Public License v3.0

Batchfile 0.36% Python 97.49% HTML 2.16%

ao3downloader's Introduction

What is this?

This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to work with links to the Archive of Our Own itself, but has a secondary function of downloading any Pinboard bookmarks that link to the Archive of Our Own. You can ignore the Pinboard functionality if you don't know what Pinboard is or don't use Pinboard.

PSA: The Troubleshooting section of this readme exists and I swear to you it's not bullshit. If you encounter problems with the script DO THE TROUBLESHOOTING STEPS before giving up and/or sending a bug report. Thank you! 🙏

Announcements: List of changes that may be of note for returning users (not a complete changelog).
Instructions: Complete instructions for downloading and starting ao3downloader on Windows and Mac (running ao3downloader on Linux is left as an exercise for the reader). I have tried to make this as easy to follow as possible, even for those who have little experience with computers. If any of it is confusing, or you have a suggestion to improve the instructions, please contact me.
Menu Options: Explanation of the options you will see when you start ao3downloader and what they do. Note that most of these options will in turn present you with a series of prompts. These should largely be self-explanatory; however, if you are confused by any of the prompts, your question may be answered in the Notes.
Notes: Explanation of some of ao3downloader's features and quirks that may not be immediately obvious. I recommend reading this.
Known Issues: List of bugs that I know about but haven't yet been able to fix. If you encounter strange behavior, there may be a workaround here.
Troubleshooting: If you encounter a problem running the script, please read this section carefully and do all of the steps in order to the best of your ability before sending a bug report.
Contact: How to get in contact with me. Don't be shy!

Announcements

Sometimes python version updates break the script, so be careful which version of python you use. See Troubleshooting if you don't know how to check your python version. The most recent version of python confirmed to work with ao3downloader is: Python 3.11.4

Filename customization is here! You can change the filename pattern by editing the file settings.ini (instructions are in the file). If you don't wish to customize filenames, you can just not change anything and the program will continue to work the same way.

As of March 8, 2022 I have changed how file names are generated to allow for the inclusion of non-alphanumeric characters (cnovel fans rejoice). If you have a Process going on which relies on file names for the same fic being the same, please take note of this if/when you download the new version of the code.

As of May 14, 2022 I have reduced the maximum length of file and folder names generated by the script from 100 characters to 50 characters. This is to reduce the incidence of download failures caused by exceeding the maximum Windows file path length. Once again, note that this may cause the same fic to be saved under a different name than when it was downloaded previously.

As of September 16, 2022 I have very regretfully removed the series subfolders option, due to the fact that it was causing a huge amount of unnecessary repeated downloads even for people who weren't using it.

As of January 17, 2023 I have changed how file names are generated (again). All file names will now be prefixed with the work id. This is to fix the problem where fics with the same title and author would sometimes overwrite each other in the downloads folder. I have also removed the fandom from the file name, because it was usually getting cut off by the path length restriction anyway.
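
To illustrate how the naming changes above fit together, here is a minimal sketch. It is not the script's actual code: the function name `build_filename`, the title/author ordering, and the exact set of stripped characters are assumptions made for illustration, and the real pattern is whatever is configured in settings.ini.

```python
import re

MAX_NAME_LENGTH = 50  # limit described in the May 14, 2022 announcement


def build_filename(work_id: str, title: str, author: str) -> str:
    """Hypothetical helper: prefix the name with the work id, then truncate."""
    raw = f"{work_id} {title} - {author}"
    # Remove only characters that Windows forbids in file names,
    # so non-alphanumeric and non-ASCII titles survive intact.
    cleaned = re.sub(r'[<>:"/\\|?*]', "", raw)
    # Collapse runs of whitespace left behind by the removal.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Truncate last, so the work id at the front is always kept.
    return cleaned[:MAX_NAME_LENGTH].strip()
```

Because the work id comes first, two fics with the same title and author still get different names, and truncation only ever eats the end of the name.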

Instructions

install python from this link. do not install the latest version of python, or a version of python lower than 3.9.0.
- if on Windows, make sure you get the "installer" and not the "embeddable package" (if you are not sure which of the installers you need, get the 64-bit one)
- during installation, choose "Customize installation" when prompted, and check the "Add Python to environment variables" checkbox when it appears. (this option was previously called "add to PATH"). everything else can be left as default.
download the repository as a zip file. the "repository" means the folder containing the code.
- if you are reading this on github, you can download the repository by clicking on the "Code" button in github and selecting "Download ZIP"
- if you are reading this on my website, you can download the repository by clicking the button at the top of the page that says "Click to Download"
unzip the zip file you just downloaded. this will create a folder. open it. if you see a file called "ao3downloader.py" then you're in the right place.
- to unzip the file, you must right-click on it and select the option that says something like "Extract All" - don't just double-click it! this may appear to open the folder, but it's really just a preview that won't work correctly.
run the script using the instructions for your operating system:
- windows: double-click on "ao3downloader.cmd" (if you can't see the file extensions: this is the file named "ao3downloader" which does not have a python logo)
  - note: don't use the search bar to find the right file - the script will not work properly when run from the search results pane
- mac:
  - open a terminal window pointed to the folder containing "ao3downloader.py".
    - You can do this by right-clicking on the folder, going to Services at the bottom of the menu, and clicking "New Terminal at Folder". Alternatively, you can type "cd " and drag the folder to the terminal to copy the folder path.
  - enter the following commands one by one:
```
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
python3 ao3downloader.py
```
  - after this initial setup, when you want to run the program you only need to enter:
```
source venv/bin/activate
python3 ao3downloader.py
```
  - note that if you delete the "venv" folder for any reason you will need to do the initial setup again.
- other platforms: ao3downloader should work on any platform that supports python; however, you will need to do your own research into how to run python programs on your system.

Menu Options Explanation

'download from ao3 link' - this works for most links to ao3. for example, you can use this to download a single work, a series, or any ao3 page that contains links to works or series (such as your bookmarks or an author's works). the program will download multiple pages automatically without the need to enter the next page link manually.
'get all work links from an ao3 listing (saves links only)' - instead of downloading works, this will simply get a list of all the work links on the page you specify (as well as subsequent pages) and save them in a .txt file inside the downloads folder (one link on each line). this is useful if you prefer to download fics through FanFicFare or some other method, rather than using the ao3 download buttons. this option is much, much faster than a full download - usually only a few seconds per page. when using this option you can also choose to download a csv (spreadsheet) file containing detailed work metadata, instead of a plain text file containing links only.
'download links from file' - allows downloading links from a text file with one work or series link on each line. good if you have already harvested the links you want to download via some other method.
'download latest version of incomplete fics' - you can use this to check a folder on your computer (and any subfolders) for files downloaded from ao3 that are incomplete works. for each incomplete fic found, the program will check ao3 to see if there are any new chapters, and if so, will download the new version to the downloads folder.
'download missing fics from series' - checks for files downloaded from ao3 that are part of a series, and for each series found, checks the series page on ao3 and downloads any fics in the series that are not already in your library.
're-download fics saved in one format in a different format' - checks for all files downloaded from ao3 and redownloads every fic it finds (if possible - failed downloads due to deletion or other reasons will be logged). good if you change your mind about what format you want your library to be in. (file type choices for this option are not saved to settings.)
'download marked for later list and mark all as read (requires login)' - for those who like to use their marked for later as a download queue, this option takes the headache out of clearing the list after a download. note that this option does not generate 'starting page x' notifications in the console, but will still download all pages.
'download bookmarks from pinboard' - download ao3 bookmarks from pinboard. ignore this if you don't use pinboard. to get the api token go to settings -> password on the pinboard website.
'convert logfile into interactable html' - all downloads from ao3 (and some other actions) are logged in a file called log.jsonl in the 'logs' folder (if this folder does not exist it means no logs have been generated yet), along with information such as whether or not the download was successful, details about errors encountered, and so on. this option converts log.jsonl into a much more human-readable, searchable and sortable (click on the column headers to sort) html file that can be opened in any browser. the file is called 'logvisualization.html' (filename will also include some numbers indicating the timestamps of the first and last log messages it contains) and is saved in the same place as log.jsonl. If your log file is particularly large, it may get split up across several html files. Note that the searching and sorting functionality (searchbox, filters, etc) may take some time to load in after the page opens. (If it never loads, you can try refreshing the page in your browser.)
'configure ignore list (list of links to never try to download)' - creates (if it does not already exist) a file in the main script folder which allows you to specify links to works or series that you never want the script to attempt to download. particularly good if the work or series update option is perpetually grabbing junk you don't want. this option also gives you a chance to auto-add links to the ignore list if they were previously tagged in the log file as failed downloads due to deletion.

Notes

IMPORTANT: some of your input choices are saved in a file called settings.json (in the same folder as ao3downloader.py). In some cases you will not be able to change these choices unless you clear your settings by deleting settings.json (or editing it, if you are comfortable with json). In addition, please note that saved settings include passwords and keys and are saved in plain text. Use appropriate caution with this file.
You may change certain behaviors of the script by editing the file settings.ini. Current configurable options are:
- Whether the script should save your password - if set to 'false', you will need to re-enter your password every time you log in via the script.
- How many seconds to pause between requests to Ao3 - the default is 0 seconds, which means that pauses will only be initiated when Ao3 requests them. Normally you should not need to adjust this, but it can be useful if you are running into odd behavior related to the rate limit.
The purpose of entering your ao3 login information is to download archive-locked works or anything else that is not visible when you are not logged in. If you don't care about that, there is no need to enter your login information.
Ao3 limits the number of requests a single user can make to the site in a given time period. When this limit is reached, the script will pause for the amount of time (usually a few minutes) that Ao3 requests. When this happens, the start time, end time, and length of the pause in seconds will be printed to the console. If you try to access Ao3 from your browser during this period, you will see a "Retry later" message. Don't be alarmed by this - it's normal, and you aren't in trouble. Simply wait for the specified amount of time and then refresh the page. Other than during these required pauses, you can use Ao3 as normal while the script is running.
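
The pause behavior described above could be sketched roughly like this. This is an illustration only: it assumes the rate limit arrives as an HTTP 429 response carrying a `Retry-After` header in whole seconds, and the names `pause_length` and `describe_pause` and the 300 second fallback are hypothetical.

```python
from datetime import datetime, timedelta


def pause_length(status_code: int, headers: dict, default_seconds: int = 300) -> int:
    """Return how many seconds to wait before the next request (0 means no pause)."""
    if status_code != 429:
        return 0
    value = headers.get("Retry-After", "")
    # Assumption: the header holds a whole number of seconds.
    return int(value) if value.isdigit() else default_seconds


def describe_pause(start: datetime, seconds: int) -> str:
    """Build the kind of console message described above: start, end, and length."""
    end = start + timedelta(seconds=seconds)
    return f"pause started {start:%H:%M:%S}, ends {end:%H:%M:%S} ({seconds} seconds)"
```

A caller would sleep for `pause_length(...)` seconds and then retry the same request, which is why the rest of the download can simply carry on afterwards.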
If you choose to 'get works from all encountered series links' then if the script encounters a work that is part of a series, it will also download the entire series that the work is a part of. This can dramatically extend the amount of time the script takes to run. If you don't want this, choose 'n' when you get this prompt. (Series that you have bookmarked directly will always be fully downloaded, regardless of what you choose here.)
If you choose to 'download embedded images' the script will look for image links on all works it downloads and attempt to save those images to an 'images' subfolder. Images will be titled with the name of the fic + 'imgxxx' to distinguish them.
- Note that this feature does not encode any association between the downloaded images and the fic file aside from the file name.
- Most file formats will include embedded image files anyway, regardless of whether you choose this option. I have confirmed this for PDF, EPUB, MOBI, and AZW3 file formats. (If you saw me contradict this in an earlier version of this readme... no you didn't)
- Should an image download fail, the details of the failure will be logged in the log file with the message 'Problem getting image' along with the work link and the image link. It's a good idea to check the log file for these messages, since you may still be able to download the image manually or track it down some other way.
If you need to stop a download in the middle, you can just close the window. When you restart the script:
- If you are using the option 'download from ao3 link', you will be given an option to restart the download from the page you left off on. The program will attempt to avoid re-downloading works that are already in the downloads folder.
- If you are using the option 'download bookmarks from pinboard' or 're-download fics saved in one format in a different format', the list of fics to download will be retrieved as normal but will then be filtered to remove work links that meet the following conditions:
  - A record of a download attempt for that link is present in the log file AND
    - There is a fic with the same title already in the downloads folder OR
    - The download was marked as unsuccessful
- If you are using the option 'download latest version of incomplete fics' or 'download missing fics from series', just make sure to add any fics you don't want to download again to your library (that is, the folder you entered when prompted 'input path to folder containing files you want to check for updates') and clean up any old versions before re-starting the download.
- Most methods of avoiding repeat downloads rely on a file called log.jsonl which is generated by the script. Make sure not to move, delete, or modify log.jsonl if you want these features to work. (Using the option to generate the log visualization file is fine.)
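
For the pinboard and re-download options, the filter conditions listed above amount to something like the following sketch. The field names `link`, `title` and `success` are hypothetical (check your own log.jsonl for the real ones), and the function names are made up for this example.

```python
import json


def load_log(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def links_to_download(links, log_entries, titles_in_folder):
    """Drop links that were already attempted and are either saved or failed."""
    skip = set()
    for entry in log_entries:  # every entry is a record of a download attempt
        already_saved = entry.get("title") in titles_in_folder
        failed = entry.get("success") is False
        if already_saved or failed:
            skip.add(entry.get("link"))
    # Links with no log record at all are always kept.
    return [link for link in links if link not in skip]
```

In practice `lines` would come from opening log.jsonl and `titles_in_folder` from listing the downloads folder, which is why moving or editing the log file breaks this kind of check.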
When checking for incomplete fics, the code makes certain assumptions about how fic files are formatted. I have tried to make this logic as flexible as possible, but there is still some possibility that not all incomplete fics will be properly identified by the updater, especially if the files are old (since ao3 may have made changes to how they format fics for download over time) or have been edited.
Custom work skins are not preserved in downloaded files. I don't currently have a way around that; however, when a work is downloaded the log entry for the download will contain a column (called 'workskin') indicating whether the work had a custom skin or not, so you can at least know which fics are in danger of looking garbled.
If you need to keep a different version of python on your system for some other purpose, please note that these instructions may not work as expected if you have multiple versions of python installed. However, I can point you toward the following resources:
- Windows: the py launcher may be helpful to you
- Mac and Linux: pyenv may be helpful to you

Known Issues

With the exception of series links, if you enter a link to an ao3 page that contains links to works or series, but does not support multiple pages of results, the script will loop infinitely. Most notably, this applies to user dashboard pages. If this happens, you can close the window to get out of the loop.
When downloading missing fics from series, if you are logged in, and the downloader finds a link to a series that is inaccessible because you do not have permission to access the series page, the downloader will download all of the works linked on your user dashboard page, instead. Yes... really.
Links containing more than 4095 characters may cause issues on Mac and Linux. To work around this (on Mac and Linux only!) enter stty -icanon into your terminal before running ao3downloader. When you are finished running ao3downloader, enter stty icanon to restore the default behavior. H/t github user verotheelf for this workaround.
Links containing more than 8191 characters will cause problems on Windows. There is no workaround, other than using a different link. Thankfully, it is unlikely you will run into this problem, as 8191 characters is quite a lot.

Troubleshooting

Make sure your python version number is not lower than 3.9.0 and is not higher than the most recent version confirmed to work with this script (this number is listed in the Announcements section). If your python version is too low or too high, uninstall python, then install the version linked in Announcements. To check which version of python you are using:
- Windows: open a command prompt and enter "python --version"
- Mac: open a terminal window and enter "python3 --version"
If you are able to create logvisualization.html (menu option 'v'), take a look through the logs to see if there are any helpful error messages.
If there are no logs or the logs are unhelpful, look for a folder called "venv" inside the repository. Delete "venv" and try re-running the script. (Re-running the script will re-create "venv" - that's fine. You only need to do this step once.)
If deleting venv doesn't work, try deleting the entire repository and re-downloading from github (but remember to save your existing downloads and log files if you have any!)
If re-downloading the repository doesn't work, try uninstalling and reinstalling python.
- Make sure you install a compatible version of python as described in the first troubleshooting step.
- Choose "Customize installation" when prompted, and check the "Add Python to environment variables" checkbox when it appears. (This option was previously called "add to PATH"). Everything else can be left as default.
If reinstalling python doesn't work, and you are on Windows, see this stackoverflow answer.
If you have tried all of the above and it still doesn't work, see below for how to send me a bug report.
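
If you would rather check the version range from the first troubleshooting step with python itself, a small sketch like this works. The bounds are the ones given in this readme (3.9.0 and 3.11.4); the function name `version_supported` is made up for the example and is not part of ao3downloader.

```python
import sys

MIN_VERSION = (3, 9, 0)   # lowest version the instructions allow
MAX_TESTED = (3, 11, 4)   # most recent version listed in Announcements


def version_supported(version=None) -> bool:
    """Return True if the version is inside the range the readme describes."""
    current = tuple(version if version is not None else sys.version_info[:3])
    # Tuples compare element by element, so (3, 10, 2) sorts between the bounds.
    return MIN_VERSION <= current <= MAX_TESTED


if __name__ == "__main__":
    print(sys.version.split()[0], "ok" if version_supported() else "outside tested range")
```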

Questions? Comments? Bug reports?

Feel free to head over to the discussion board and make a post, or create an issue. I prefer to communicate through the above channels if possible, however I understand many of my users don't have github accounts and may not want to make one just for this, so you can also email me at [email protected] if you prefer. Please include "ao3downloader" in the subject line of emails about the downloader. If you are reporting a bug, please describe exactly what you did to make the bug happen to the best of your ability. (More is more! Be as detailed as possible.)

(Please note that while I will absolutely do my best to get back to you, I can't make any promises - I have a job, etc.)

ao3downloader's Issues

Find some way to integrate with ao3-stylish-downloader

https://github.com/niacdoial/AO3-stylish-downloader I covet this functionality and want it for myself! not sure how to make the leap from command-line though, also it seems to rely on binaries from calibre that would need to be included somehow.

Refactor to object-oriented architecture

This is a big task but it should go a long way toward making future development easier. The functional programming approach has its charms but I'm starting to run into its limits - the parameter list for download_recursive is just getting silly, for example.

Make file naming scheme configurable

Ignore Muted Users

Discussed in #82

Originally posted by readingfan April 29, 2023
Hello,

I was wondering if the downloader ignores Muted users? I have several, but it seems that unfortunately some keep showing up when I update a series. Does the most recent version of the downloader ignore these users?

Also, I found instructions on how to block a tag via CSS by a user named najo. Would the program recognize a skin if it was set to be on when logged in?

Thank you for your help!

Auto-update bookmarks with work metadata

In particular work link, title, and author - to facilitate tracking down deleted works via wayback or other means. Would want to ensure any existing bookmark description data is preserved, but not endlessly append the same data on subsequent runs.

Downloaded New Versions of Python and Downloader Now Have a ModuleNotFoundError: no module named "requests"

Hello,

Thank you for making the AO3 downloader. I have an issue. I decided to download the newest versions of the downloader and python but now have a ModuleNotFoundError: No module named "requests" error. I think I downloaded everything correctly but I don't know what to do now.

Thank you for reading.

Unable To Download Individual Fics From Collections

Hello, nianeyna! First and foremost, thank you so much for creating this wonderful program. I have 0 experience in programming and I really appreciate your instructions and this program in general. (I sent an ask previously if this question is familiar to you.)

I have run into an issue while using it. I can't download individual stories from collections for example Big Bangs. If I include stories in a series, they are able to be downloaded, however the individual fics do not show up in the downloads folder.

I have tried deleting the settings and re-downloading the program itself, and I tried to see if the files were hidden, but still nothing. I checked for this issue in several separate collections as well as tried to see if maybe my filters affected the download but the result is the same.

I have checked the logs and there is only the link of the collection listed but then no stories listed as being downloaded if I don't select to download from a series.

I don't have this issue if I am downloading from my favorites, from a creators page or from a works tag, just the collections.

I do not know if other people also ran into this issue or if it is just me. I understand if you do not have the time or if it is not possible to fix this issue but I wanted to bring this to your attention.

Thank you so much for making this awesome program! Me and my library are very grateful to you.

Fix exception when work text contains the string "This work could have adult content"

example: https://archiveofourown.org/works/8018749

Also investigate other exception checks (locked, deleted, etc) to avoid this issue.

Reduce maximum file name length

This is causing errors for people whose root folder path is more than ~50 characters

The Downloader won't work for some reason.

I was following the instructions but it didn't work. I already have Python 3.10 as far as I know. I've included a screenshot of the folder where it is. When I try to do the windows: double-click on "ao3downloader.cmd" step it gives me the error in the second screenshot.

Move logs into root folder

This will make the log files easier to find and harder to accidentally delete. Also, while most log entries are going to be about a download event, not all of them are so it's weird to have such a strong link between the log file and the concept of "downloads"

Allow checking for updates to series

Probably going to be tricky! So worth it though.

Add support for downloading from manually entered links lists

Discussed in #41

Originally posted by coolioniki August 23, 2022
Hello! This is an amazing piece of work.

I wanted to ask if there's a way to input multiple links (directly of the fic) into the program and download it that way? Like FanFicFare (which is amazing and what I have been using so far, but have also been having formatting issues with).

...

Ao3 reformatted pdf stats?

When I update incomplete fics from folder, I have the program search through my PDFs, but realized the other day that some of them weren't downloading despite there being a new update. Looked through the pdfs causing issues and it appears the stats are now formatted Published, Updated, Words, Chapters. Because 'Words' is no longer at the end, unless there is no 'Updated' stat, the actual chapter numbers get thrown onto the second line. Since the logic for finding pdf stats assumes they are on the same line as 'Chapters: ', it's now broken.

I also download EPUBs so I tried searching for incomplete fics using that format instead. Seems to still be working, but not sure if any other formats are also affected.

Improve interoperability with FanFicFare

Check for already downloaded fics when downloading from an ao3 link

This functionality already exists for other functions but needs to be expanded to cover downloading from ao3 links as well.

Add option to download multiple formats

Use logging or events instead of print() in certain places

Specifically:

inside repo.py when alerting the user about status 429 events
inside soup.py when alerting the user that a new page has started downloading

Reasoning: separation of concerns between the logic layer and the UI layer

Add option to download links list as a csv with metadata

Find deleted works in library

Would work similarly to the redownload option in that it would scan every work found in a folder, but instead of downloading anything would simply spit out a list of works that return a 404 when pinged

Use local jquery and datatables libs

Currently log visualization will break if there's a problem with the cdn or the user's internet connection, and there's no performance reason to use cdns in this case. Not doing this to begin with was mainly laziness on my part.

Let users toggle password save

Would want to do this in some persistent way that doesn't require running the program in order to change. Possibly an ini file. Should probably be turned off by default, rather than on like it currently is.

Find some way to pause non-paginated downloads

The pinboard and redownload functions are a bit scary because it is expected that they might need to download very large numbers of fics - potentially taking many hours - and there is absolutely no way to pause them in the middle. There has GOT to be some way to save state on these guys.

Add "ignorelist" file to perma-skip links you don't want to download

include option to auto-add any links logged as "Deleted" in the log file
allow arbitrary comments to allow users to keep track of what the link is and why they added it to the list
syntax: https://archiveofourown.org/works/12345; Deleted - one link and (optional) comment per line
links must begin with https://archiveofourown.org
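
A parser for the syntax proposed above might look something like this. It is only a sketch of the idea: the function name `parse_ignorelist_line` is hypothetical and the real implementation may treat edge cases differently.

```python
AO3_ROOT = "https://archiveofourown.org"


def parse_ignorelist_line(line: str):
    """Return (link, comment) for a valid entry, or None if the line is not one."""
    # Everything before the first ';' is the link, everything after is the comment.
    link, _, comment = line.partition(";")
    link = link.strip()
    if not link.startswith(AO3_ROOT):
        return None  # links must begin with the ao3 root url
    return link, comment.strip()
```

Using `str.partition` means the comment is optional and may itself contain semicolons without breaking the split.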

Request: Ignore Incomplete Fic That Did Not Update When Running Updating

Hello,

Thank you for creating the AO3 Downloader! It has been incredibly helpful in creating my library. I was wondering however if there is a way to make updating the folder faster?

I have 55000 works and of those according to the program I have 4950 in progress works. This takes some 9+ hours to run the program for that. There is basically a 5 min downtime between every 70 works downloaded. But obviously since not all of those would be updating I generally get maybe between 150 of the 4950 works or so get updated at most after the 9 hours of the program running.

I was wondering if there is a way to improve the speed by ignoring works that were last updated in 2015 and just update the ones that had chapters added since the last update?

Ex: Ignore: The Sea - Fandom - 35/? - last Update: 1/1/2015 but take The Sea - Fandom -36? - Last Update 4/23/2023 should it update?

Even if it would take some time to sort out the works I think if this was possible it would improve the speed greatly since the older WIP works may get updated but sometimes are just discontinued, so why re-download if there is no update?

P.S. A question: if a work is marked complete but gets new chapters does it get updated too? I noticed that some authors mark a work as complete, but sometimes add on in the same story as extras. Does that get updated too?

Thank you for your hard work on this program. I understand if this is not possible but still had to ask.

I hope you have a great day.

Make logvisualization look nicer

Pure vanity

Work links in summaries should not be included in the download

Regression caused by #43. Previously, the downloader only recognized internal/relative work links (of the form /works/12345). Now, work links of the form [...]/works/12345 are included also. This has the unintended side effect of including absolute work links (https://archiveofourown.org/works/12345) in the list. This can happen when an author manually links to another work on ao3 in their summary. Aside from not being intended behavior, the ao3 root url is still prepended to the absolute links (https://archiveofourown.orghttps://archiveofourown.org/works/12345) causing mayhem.

tl;dr amend the work and series patterns to recognize internal (starting with "/") links only.
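
One way the amended patterns could look, as a sketch: the regular expressions and the helper `to_absolute` below are guesses at the shape of the fix, not the patterns actually used in the code. The point is the `^/` anchor, which accepts internal links (including ones with a prefix before /works/) and rejects absolute ones.

```python
import re

AO3_ROOT = "https://archiveofourown.org"

# Anchored at the start: the href must begin with "/", optionally followed by
# a prefix path, and then works/<id> or series/<id>.
WORK_PATTERN = re.compile(r"^/(?:\S*/)?works/\d+")
SERIES_PATTERN = re.compile(r"^/(?:\S*/)?series/\d+")


def to_absolute(href: str):
    """Prepend the ao3 root to internal work/series links; ignore everything else."""
    if WORK_PATTERN.match(href) or SERIES_PATTERN.match(href):
        return AO3_ROOT + href
    return None
```

Since absolute links never match, the root url can no longer be prepended twice.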

Initial set-up MacOS error code return

Let me preface by saying I don't know coding at all and am following the guide to the best of my ability. I run a Mac OS and I'm getting a return error in the initial set-up run command window following the command "pip install -r requirements.txt". I think what the error means is that I'm missing a key code package, but I figure it would be best to ask the developer to be sure. Code pasted below:

Collecting beautifulsoup4==4.9.3
Using cached beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
Collecting certifi==2020.12.5
Using cached certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
Collecting cffi==1.15.0
Using cached cffi-1.15.0-cp310-cp310-macosx_11_0_arm64.whl (173 kB)
Collecting chardet==4.0.0
Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Collecting colorama==0.4.4
Using cached colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Collecting cryptography==36.0.1
Using cached cryptography-36.0.1-cp36-abi3-macosx_10_10_universal2.whl (4.8 MB)
Collecting cssselect==1.1.0
Using cached cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting EbookLib==0.17.1
Using cached EbookLib-0.17.1.tar.gz (111 kB)
Preparing metadata (setup.py) ... done
Collecting idna==2.10
Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting loguru==0.5.3
Using cached loguru-0.5.3-py3-none-any.whl (57 kB)
Collecting lxml==4.6.4
Using cached lxml-4.6.4.tar.gz (3.2 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
Building lxml version 4.6.4.
Building without Cython.
Error: Please make sure the libxml2 and libxslt development packages are installed.
[end of output]

Save embedded images

Nothing fancy here. Probably just gonna label them with the fic name and dump them in a subfolder in downloads (presuming I can get it to work in the first place, but theoretically it should just be a matter of parsing the soup for img tags)
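A rough sketch of the "find the img tags" step. The note above talks about parsing the soup (Beautiful Soup); to keep this example self-contained it uses the standard library's html.parser instead, and the class and function names are made up for illustration:

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that have no src attribute
                self.sources.append(src)


def find_image_sources(html: str) -> list:
    """Return the image urls found in a chunk of work html, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.sources
```

Downloading each url and naming the file after the fic would be a separate step and is not shown here.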

Wayback machine integration

Add an option to save a snapshot of downloaded fics to the internet archive. Programmatically saving a snapshot of a page can be achieved by GETting 'https://web.archive.org/save/' + the url to snapshot.

Considerations:

Properly save explicit works by adding 'view_adult=true' to the query string. Detecting when this is necessary may be difficult. One option is to include it in the query string of all saved works, but this is clumsy and may cause confusion.
The internet archive imposes a rate limit on saving snapshots. At last check I think this is five seconds between calls. This eliminates the option that I can just sneak the wayback calls in synchronously, because the timing won't line up.
- Option a: do the wayback saving in a second step (scraping the urls from the log file - or create a dedicated file just for this purpose). This might involve either some code duplication or a refactor. Also, just kind of a clunky user experience.
- Option b: use a queue. This is a cool idea and it would be nice to say I did it, but my previous attempts at it have been disasters. Also, it would interact oddly with the concept of stopping the download in the middle, which is an important feature.
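A sketch of the two mechanical pieces described above: building the save url (optionally with view_adult=true) and working out how long to wait between calls. The function names are hypothetical, and the five-second interval is only the unconfirmed figure mentioned above. No request is actually sent:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def with_view_adult(url: str) -> str:
    """Add view_adult=true to the query string, keeping any existing parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["view_adult"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def wayback_save_url(url: str, adult: bool = False) -> str:
    """Build the url to GET in order to request a snapshot of `url`."""
    target = with_view_adult(url) if adult else url
    return WAYBACK_SAVE_PREFIX + target


def seconds_until_next_save(last_save: float, now: float, interval: float = 5.0) -> float:
    """How long to wait before the next save call (interval is an assumed value)."""
    return max(0.0, interval - (now - last_save))
```

Whether the waiting happens in a second pass or in a queue is the open design question; this only covers the parts both options would share.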

Make update folder setting editable

...

Updated Fic Not Properly Detected

Noticed some inconsistencies in the program's ability to detect updated works, no matter which format I had it check. Found that it was correctly determining the chapter count for both my local copy of the fic and its counterpart on ao3.

Isolated the problem to try_download() in ao3.py. When it compares currentchapters to chapters, both values are still strings, so the comparison is lexicographic: in my case, "10" <= "9" was coming back True. Wrapping the values in int() during the comparison seems to resolve the issue.
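To illustrate the difference, here is a small standalone example. The helper name and its parameters are hypothetical and do not reflect the actual code in try_download():

```python
# Strings compare character by character, so "1" < "9" decides the result
# before the second digit of "10" is ever looked at.
string_result = "10" <= "9"          # True (wrong for chapter counts)
number_result = int("10") <= int("9")  # False (what was intended)


def no_new_chapters(remote_chapters: str, local_chapters: str) -> bool:
    """Hypothetical helper: compare chapter counts as numbers, not text."""
    return int(remote_chapters) <= int(local_chapters)
```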

Creating a link list with metadata is throwing an error.

I tried twice and got the same error at the same point.

If you want the specific link I used without having to type it out: https://archiveofourown.org/works?commit=Sort+and+Filter&work_search%5Bsort_column%5D=kudos_count&work_search%5Bother_tag_names%5D=&work_search%5Bexcluded_tag_names%5D=&work_search%5Bcrossover%5D=&work_search%5Bcomplete%5D=&work_search%5Bwords_from%5D=&work_search%5Bwords_to%5D=&work_search%5Bdate_from%5D=&work_search%5Bdate_to%5D=&work_search%5Bquery%5D=&work_search%5Blanguage_id%5D=&tag_id=Wednesday+%28TV+2022%29

Improve handling of custom exceptions

Make all custom exceptions inherit from a base class
Don't bother logging stack trace for custom exceptions... I know where they came from
Instead of "message" field, just pass the relevant message into the exception constructor (no idea why I didn't do this in the first place...)
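One possible shape for this, as a sketch only. The class names and the log_entry helper are invented for illustration and are not the project's actual exceptions or logging code:

```python
import traceback


class DownloaderError(Exception):
    """Hypothetical base class for all of the program's own exceptions."""


class LoginError(DownloaderError):
    """Hypothetical example of one specific custom exception."""


def log_entry(exc: Exception) -> dict:
    """Build a log record; only unexpected exceptions get a stack trace."""
    entry = {"error": str(exc)}  # message comes straight from the constructor
    if not isinstance(exc, DownloaderError):
        entry["stacktrace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return entry
```

Because the message is passed to the constructor (for example `raise LoginError("login failed")`), `str(exc)` returns it directly and no separate "message" field is needed.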

Add option to re-download fics saved in one format as a different format

Log file locations and urls of fics found to be incomplete

Currently a log is only written at the download step, and that only happens if the fic is found to have new chapters. It would be better to have visibility into all incomplete fics, including the ones that haven't been updated.

Add option to save a list of work links only

For the FanFicFare folks. Might do a csv with some rich data like titles etc

Add slowmode

Currently the program does its best to skate just under the ao3 rate limit, which makes it go as fast as possible but tends to foul up casual ao3 browsing while it's running. Add an option to halve the speed so you can browse in peace.

Fix edge case where the downloader erroneously thinks there are no more pages

This happens when all the links on a page have already been downloaded via series expansion.

Split up huge log files

Add option to update multiple formats

Update Folder Doesn't Work

Issue occurs while using: 'download latest version of incomplete fics'

The no option for 'check the same folder as last time' does not work. Doesn't prompt for a new folder and still utilizes the one originally saved in settings.json.

Problem seems to be in update_folder() in shared.py. The same code is used for the initial folder save and updating, but this relies on fileops.setting(). The method only prompts for user input/saves if the setting does not yet exist or does not have a value already attached.
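A sketch of one way the behavior described above could be changed: give the settings lookup a flag that forces a fresh prompt. This is not the real fileops.setting(); the signature, the dict-based storage, and the force_prompt parameter are all assumptions made for the example:

```python
def setting(settings: dict, key: str, prompt: str, ask=input, force_prompt: bool = False) -> str:
    """Return a saved setting, prompting only when it is missing or when forced.

    `settings` stands in for the contents of settings.json, and `ask` is
    injectable so the function can be exercised without a terminal.
    """
    value = settings.get(key)
    if force_prompt or not value:
        value = ask(prompt)
        settings[key] = value  # the caller would write this back to disk
    return value
```

Answering "no" to "check the same folder as last time" would then map to a call with force_prompt=True.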

ModuleNotFound: No Module named 'bs4' Error

Hello, nianeyna

I hope everything is going well for you. Thank you as always for your hard work. I downloaded the most current version of the downloader and ran into this issue:

ModuleNotFound: No Module named 'bs4' Error

I tried to research it myself and found out it has something to do with a thing called Beautiful Soup but could not install it by opening python and inputting the command that was listed in it. One of the advice forums said to open a virtualenv? Is that the python command screen?

Any help would be greatly appreciated.

Update: After spending some time troubleshooting, I was able to install 'bs4' via "pip install". But after doing that, the program said it was missing several other modules: ebook, mobi, and something else, I think.

For reference, my Python version is 3.10 and I was trying to install the most recent version of the program.

Thank you for creating this program! I really appreciate it.

"download links from file" keeps looking for more pages of series without stopping if the link has a space at the end

Ignore everything below the line I figured it out!!!

When leaving a space at the end of the link ( \n) the script will try to download pages when fetching the other works in a series. This is the result of user error not a bug, because of the way I generated the list of links. This does not need to be fixed if a warning is added for users.
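For anyone generating their own list of links, a small sketch of cleaning the lines before they are used. The function name is hypothetical and this is not how the script currently reads the file:

```python
def clean_links(lines):
    """Strip surrounding whitespace (including a trailing " \n") and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]
```

Applied to the lines of a link file, this turns "https://archiveofourown.org/works/7269544 \n" into "https://archiveofourown.org/works/7269544", so nothing stray ends up in the url when "?page=2" is appended.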

But I do have a suggestion: a setting for a page limit. I keep sending ao3 100 requests for empty pages and getting temp banned all the time.

I encountered an error when downloading a list of links.
Here is what my log file said:

%%[idk where the log starts, this is mid execution]%%
{"starting": "https://archiveofourown.org/works/7269544 \n", "timestamp": "02/05/2023, 22:34:43"}
{"starting": "https://archiveofourown.org/works/3204764 \n", "timestamp": "02/05/2023, 22:34:43"}
{"link": "https://archiveofourown.org/works/3126128", "title": "3126128 Way Better Than Flowers - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:34:45"}
{"series": "Sterek Tumblr Prompts", "link": "https://archiveofourown.org/works/2537945", "title": "2537945 French Silk Pie, Baby - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:34:49"}
%%[the rest of the series]%%
{"link": "https://archiveofourown.org/works/3469925", "title": "3469925 Down the Rabbit Hole - orphan_account", "workskin": false, "success": true, "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=2", "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=3", "timestamp": "02/05/2023, 22:35:10"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=4", "timestamp": "02/05/2023, 22:35:11"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=5", "timestamp": "02/05/2023, 22:35:11"}
{"starting": "https://archiveofourown.org/works/3204764 \n?page=6", "timestamp": "02/05/2023, 22:35:12"}
%%[i had to quit the program]%%

um don't judge the works i just had random links in my OneTab for some reason, i just exported them and fixed the links with csv so i could merge them all in calibre

This happened twice with these two links before I gave up and manually downloaded all the works. ;(

I tried to look through the code to fix this myself but having never seen python code I gave up after an hour of trying to figure it out... but I don't really understand why it did this? My test with two random works seemed to be fine; the second work was second in the series and the first one got downloaded too. I think it was simply because the series was too long?

{"link": "https://archiveofourown.org/works/4371455\n", "title": "4371455\n hello goodbye ('twas nice to know you) - tamerofdarkstars", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:17:33"}
{"link": "https://archiveofourown.org/works/234222", "title": "234222 Then Comes a Mist and a Weeping Rain - Faith Wood (faithwood)", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:17:36"}
{"link": "https://archiveofourown.org/works/12507420", "title": "12507420 right hand to god - rohkeutta", "workskin": false, "success": true, "timestamp": "02/05/2023, 23:51:58"}

But what I attempted to do to fix the issue was to get the prompt from "download from ao3 link" to show where I could set a limit on the pages downloaded, but that would be a janky solution anyways. Downloading a series shouldn't try to download pages in the first place.... wait what's that new line character doing there...