I used this to download a ton of my data from reddit. Figured I'd give some feedback on how I used it, what worked, and what didn't.
"[X] It looks like you are not authenticated well ..."
Sometimes I would get "[X] It looks like you are not authenticated well. [X] Please check your credentials and retry.". Upon further inspection it was not an authentication issue: it was the result of querying a link_id that returned a 403 response. As an example, link_id 8vkhv8 returns a 403 because the subreddit is now private. A better error message for that exception (or a separate except case) would probably help.
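One way to separate the two cases, sketched as a small helper (an assumption on my part about how the script could be structured; in PRAW a 403 surfaces as prawcore.exceptions.Forbidden, so that except block could call something like this with the response's status code):

```python
def classify_fetch_error(status_code):
    # Map an HTTP status to a user-facing error category so the script can
    # print "subreddit unavailable" instead of "check your credentials".
    if status_code == 401:
        return "authentication"   # genuinely bad/expired credentials
    if status_code in (403, 404):
        return "unavailable"      # private/banned subreddit or deleted post
    return "unknown"
```

The names here are hypothetical; the point is just that 403 and 401 deserve different messages.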
JSON format
I found the html format to be nice but not machine-digestible. I added a pretty botched json export alongside my html export. After "Submission downloaded", I added:
@jsonpickle.handlers.register(praw.models.reddit.submission.Submission, base=True)
class SubmissionHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        return {}

@jsonpickle.handlers.register(praw.reddit.Reddit, base=True)
class RedditHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        return {}

write_json(jsonpickle.encode(submission.comments[:]), submission, submission_id, now, args.output)
I also added jsonpickle as a dependency to make this work. Someone might find this useful; I liked having both exports.
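For reference, the write_json helper above is my own; a minimal sketch of what it does (the filename scheme and simplified signature here are assumptions, not RedditArchiver's actual conventions):

```python
import os

def write_json(encoded, submission_id, timestamp, output_dir):
    # Write the jsonpickle-encoded string next to the HTML export,
    # named after the submission id and the run timestamp.
    path = os.path.join(output_dir, f"{submission_id}-{timestamp}.json")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(encoded)
    return path
```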
-i take an array?
It would be nice if -i could take an array. I used jq and xargs to pipe all link_ids from my rexport dump into RedditArchiver-standalone, in case you're interested. The command was:

jq '[.submissions[].id, .saved[].id, .upvoted[].id, .comments[].link_id[3:]] | unique | .[]' ~/Seafile/archive/ExportedServiceData/reddit-apiexport/export-username-2023-06-11.json | xargs -i python ~/Seafile/projects/FORKED/RedditArchiver-standalone/RedditArchiver.py -c ./config-username.yml -i {} -o /home/username/Seafile/archive/ExportedServiceData/redditarchiver

This collects all submission ids, saved ids, upvoted ids, and the link_id from comments, and pipes them through xargs to RedditArchiver.py. It would have been nice to send the entire array to redditarchiver in one go; this worked, but it would probably be faster without spinning up the Python interpreter for every invocation.
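A sketch of how -i could accept multiple ids via argparse's nargs (an assumption about how RedditArchiver parses arguments, which I haven't checked):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--id", dest="ids", nargs="+", required=True,
                    help="one or more submission ids to archive")

# nargs="+" collects every value after -i into a single list,
# so one invocation can cover a whole jq dump.
args = parser.parse_args(["-i", "8vkhv8", "abc123"])
# → args.ids == ["8vkhv8", "abc123"]; the download loop then runs once per id.
```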
praw.ini / config.yml
I had to make a praw.ini to use refresh_token.py. It's kind of annoying to archive multiple accounts, because now I have three praw.inis and config.ymls. It would be nice if this repo just read praw.ini for all client-specific secrets, or ran refresh_token for you. Probably more work than it's worth, though.
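One stock PRAW feature that might ease the multi-account pain (this is standard praw.ini behavior, though I haven't tried wiring it into refresh_token.py): a single praw.ini can hold one section per account, selected with praw.Reddit("section_name") in Python.

```ini
; one praw.ini, one section per account;
; select with praw.Reddit("account_alice")
[account_alice]
client_id=...
client_secret=...
refresh_token=...

[account_bob]
client_id=...
client_secret=...
refresh_token=...
```

That would at least collapse the three praw.inis into one, even if the config.ymls stay separate.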
Thanks
Thanks for the library! It really saved my weekend. No pressure to implement any of this; I just thought I'd share my pain points.