
redditarchiver-standalone's People

Contributors

ailothaen


redditarchiver-standalone's Issues

getting config error

After installing the requirements, I followed the guide to set up config.yml (filling in the client ID, secret, and refresh token), but running the script fails:

python3 RedditArchiver.py  -i https://www.reddit.com/r/LifeProTips/comments/1cprqz2/lpt_packing_stressing_you_out_itemize_a/
[x] Cannot load config file. Make sure it exists and the syntax is correct.

What am I doing wrong?
Please help.
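Since the error message only says the file must exist and be syntactically correct, a quick pre-flight check can narrow things down. The sketch below is not part of RedditArchiver: `lint_config` is a hypothetical helper that uses only the standard library and looks for a missing file plus two common YAML slips (tab indentation and a missing space after the colon). It does not do a full YAML parse, so a clean result does not guarantee that the file is valid.

```python
import re
from pathlib import Path


def lint_config(path):
    """Return a list of human-readable problems found in a YAML config file.

    Hypothetical helper: checks existence, tab indentation, and 'key:value'
    entries written without a space after the colon. Not a full YAML parser.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist (is the script run from the right directory?)"]

    problems = []
    lines = p.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments cannot break the mapping
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {number}: tab used for indentation (YAML requires spaces)")
        if re.match(r"^[A-Za-z_][\w-]*:\S", stripped):
            problems.append(f"line {number}: missing space after ':'")
    return problems


if __name__ == "__main__":
    for problem in lint_config("config.yml"):
        print(problem)
```

If the file lives somewhere other than the current directory, the `-c` option used elsewhere on this page (`-c ./config-username.yml`) points the script at it explicitly.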

colored throwing exception on Windows 11

The colored library, used for coloring the console feedback, throws an exception on Windows 11:

[x] Uncaught problem: function 'SetsConsoleMode' not found
Traceback (most recent call last):
  File "RedditArchiver.py", line 392, in <module>
    myprint(f'[i] {len(submission_id_list)} submissions to download', 14)
  File "RedditArchiver.py", line 301, in myprint
    print(f"{colored.fg(color)}{message}{colored.attr(0)}")
             ^^^^^^^^^^^^^^^^^
  File "AppData\Local\Programs\Python\Python311\Lib\site-packages\colored\colored.py", line 276, in fg
    return Colored(name).foreground()
           ^^^^^^^^^^^^^
  File "AppData\Local\Programs\Python\Python311\Lib\site-packages\colored\colored.py", line 48, in __init__
    self.enable_windows_terminal_mode()
  File "AppData\Local\Programs\Python\Python311\Lib\site-packages\colored\colored.py", line 145, in enable_windows_terminal_mode
    ok = windll.kernel32.SetsConsoleMode(wintypes.HANDLE(hStdout), mode)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "AppData\Local\Programs\Python\Python311\Lib\ctypes\__init__.py", line 389, in __getattr__
    func = self.__getitem__(name)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "AppData\Local\Programs\Python\Python311\Lib\ctypes\__init__.py", line 394, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: function 'SetsConsoleMode' not found. Did you mean: 'GetConsoleMode'?

I fixed it by switching to the colorama lib, but I have not tested it on other OSes, and my linter prettified the hell out of the file, so I didn't create a pull request. If anyone is interested in the fix, it consists of installing colorama and then making these changes in RedditArchiver.py:

import sys  # needed by the stderr branch of myprint, if not already imported

from colorama import Fore, Style, init
init(autoreset=True)

Then substitute the myprint function as follows:

def myprint(message, color, stderr=False):
    """
    Easy wrapper for print
    """
    # The script passes numeric color codes (e.g. 14); fold them onto
    # colorama's 8 basic foreground colors.
    color_dict = {
        0: Fore.BLACK,
        1: Fore.RED,
        2: Fore.GREEN,
        3: Fore.YELLOW,
        4: Fore.BLUE,
        5: Fore.MAGENTA,
        6: Fore.CYAN,
        7: Fore.WHITE
    }
    colored_message = f"{color_dict[color % 8]}{message}{Style.RESET_ALL}"
    if stderr:
        # Errors are always shown, even in quiet mode.
        print(colored_message, file=sys.stderr)
    elif not args.quiet:  # args is the script's global argparse namespace
        print(colored_message)

Result: [image not preserved]
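For anyone who would rather avoid an extra dependency: the traceback fails on a lookup of `SetsConsoleMode`, while the function that kernel32 actually exports is `SetConsoleMode` (hence Python's "Did you mean" hint). Below is a rough, standard-library-only sketch of the same idea, assuming the goal is simply to turn on ANSI escape handling in the Windows console and fall back to plain text when that fails. `enable_ansi` and `colorize` are hypothetical names, not part of colored, colorama, or RedditArchiver, and the Windows branch has not been tested here.

```python
import ctypes
import os
import sys

STD_OUTPUT_HANDLE = -11
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004


def enable_ansi():
    """Return True if ANSI escape sequences should work on stdout."""
    if os.name != "nt":
        # POSIX terminals generally understand ANSI escapes already.
        return sys.stdout.isatty()

    from ctypes import wintypes

    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.LPDWORD]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # stdout is not a console (e.g. redirected to a file)
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))


def colorize(message, code, enabled):
    """Wrap message in a 256-color foreground escape, or return it unchanged."""
    if not enabled:
        return message
    return f"\033[38;5;{code}m{message}\033[0m"


if __name__ == "__main__":
    ansi_ok = enable_ansi()
    print(colorize("[i] 3 submissions to download", 14, ansi_ok))
```

Unlike the modulo-8 mapping above, this keeps the original numeric color codes, at the cost of maintaining the console-mode call yourself.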

Some feedback and thanks!

I used this to download a ton of my data from Reddit. Figured I'd give some feedback on how I used it, what worked, and what didn't.

"[X] It looks like you are not authenticated well ..."

Sometimes I would get "[X] It looks like you are not authenticated well. [X] Please check your credentials and retry." Upon further inspection, this was the result of querying a link_id that returned a 403 response, not an issue with authentication. One example is link_id 8vkhv8, whose subreddit is now private. That exception probably needs a better error message, or its own except case.
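A minimal sketch of what a separate case could look like, assuming the exception raised by the API layer carries a `response` attribute with a `status_code` (as far as I know, prawcore's response exceptions do, but treat that as an assumption). `describe_api_error` is a hypothetical helper, and the 403 and 404 messages are made up for illustration.

```python
def describe_api_error(exc):
    """Map an API exception to a user-facing message based on its HTTP status.

    Hypothetical helper: reads exc.response.status_code if present and
    falls back to a generic message otherwise.
    """
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    if status == 401:
        return ("[x] It looks like you are not authenticated well. "
                "Please check your credentials and retry.")
    if status == 403:
        return ("[x] Access forbidden (403): the submission may be in a "
                "private or quarantined subreddit.")
    if status == 404:
        return "[x] Submission not found (404)."
    return f"[x] Uncaught problem: {exc}"
```

The point of the design is only that 401 and 403 produce different messages, so a private subreddit no longer looks like a credentials problem.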

JSON format

I found the HTML format to be nice but not machine-digestible. I added a pretty botched JSON export alongside my HTML export. After "Submission downloaded", I added

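    import jsonpickle  # extra dependency: pip install jsonpickle

    # Flatten back-references to the Submission and the Reddit client to {}
    # so the comment tree can be encoded without dragging them along.
    @jsonpickle.handlers.register(praw.models.reddit.submission.Submission, base=True)
    class SubmissionHandler(jsonpickle.handlers.BaseHandler):
        def flatten(self, obj, data):
            return {}
    @jsonpickle.handlers.register(praw.reddit.Reddit, base=True)
    class RedditHandler(jsonpickle.handlers.BaseHandler):
        def flatten(self, obj, data):
            return {}
    # write_json is my own helper (not shown), modeled on the HTML writer.
    write_json(jsonpickle.encode(submission.comments[:]), submission, submission_id, now, args.output)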

I also had to install jsonpickle to make this work. Someone might find this useful; I liked having both exports.
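For a less botched variant without jsonpickle, one option is to walk the comment tree and keep only plain fields. This is a sketch under assumptions: it relies on comment objects exposing `id`, `author`, `body`, and `replies` (the attribute names PRAW uses for comments), reads them defensively with `getattr`, and the function names are hypothetical.

```python
import json


def comment_to_dict(comment):
    """Convert one comment (and its replies, recursively) to plain dicts."""
    author = getattr(comment, "author", None)
    return {
        "id": getattr(comment, "id", None),
        # Deleted accounts have no author; str() is assumed to yield the name.
        "author": str(author) if author is not None else None,
        "body": getattr(comment, "body", None),
        "replies": [comment_to_dict(reply) for reply in getattr(comment, "replies", [])],
    }


def comments_to_json(comments):
    """Serialize an iterable of top-level comments to a JSON string."""
    tree = [comment_to_dict(comment) for comment in comments]
    return json.dumps(tree, indent=2, ensure_ascii=False)
```

In the script this would be called on something like `submission.comments`, and the output is ordinary nested JSON rather than jsonpickle's object encoding.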

-i take an array?

It would be nice if -i could take an array. I used jq and xargs to pipe all link_ids from my rexport dump into RedditArchiver-standalone. The command was

jq '[.submissions[].id, .saved[].id, .upvoted[].id, .comments[].link_id[3:]] | unique | .[]' ~/Seafile/archive/ExportedServiceData/reddit-apiexport/export-username-2023-06-11.json | xargs -i python ~/Seafile/projects/FORKED/RedditArchiver-standalone/RedditArchiver.py -c ./config-username.yml -i {} -o /home/username/Seafile/archive/ExportedServiceData/redditarchiver

if you're interested. It gets all submission ids, saved ids, upvoted ids, and the link_id from comments, and pipes them to xargs against RedditArchiver.py. It would have been nice if I could have sent the entire array to RedditArchiver, but this worked. Taking an array would probably also make it faster, since the Python interpreter would not have to spin up for every invocation.
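For reference, the jq filter can be approximated in Python, which would be the first step if the list were ever consumed in-process. This sketch assumes the export is a JSON object with the same four keys the filter reads (`submissions`, `saved`, `upvoted`, `comments`); `collect_submission_ids` is a hypothetical name, and unlike jq it tolerates a missing key instead of failing.

```python
import json


def collect_submission_ids(export):
    """Return the sorted, de-duplicated submission ids found in an export dict."""
    ids = set()
    for section in ("submissions", "saved", "upvoted"):
        ids.update(item["id"] for item in export.get(section, []))
    # Comment link_ids look like "t3_8vkhv8"; drop the 3-character prefix,
    # mirroring .link_id[3:] in the jq filter.
    ids.update(comment["link_id"][3:] for comment in export.get("comments", []))
    return sorted(ids)


def load_export(path):
    """Read an export file from disk (assumed to be a single JSON document)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `collect_submission_ids(load_export(path))` yields one list that could be handed over in a single run, if the script accepted it.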

praw.ini / config.yml

I had to make a praw.ini to use refresh_token.py. It's kind of annoying to archive multiple accounts, because now I have 3 praw.ini files and 3 config.yml files. It would be nice if this repo just read praw.ini for all client-specific secrets, or ran refresh_token for you. Probably more work than it's worth.
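Reading the secrets from praw.ini would not take much code, since it is a plain INI file. A minimal sketch with the standard library's `configparser`, assuming one section per account containing `client_id`, `client_secret`, and `refresh_token`; the function name and the idea of selecting an account by section name are assumptions, not something the repo does today.

```python
import configparser

SECRET_KEYS = ("client_id", "client_secret", "refresh_token")


def read_praw_site(path, site):
    """Return the client secrets stored under one section of a praw.ini file."""
    # interpolation=None keeps '%' characters in secrets from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    if site not in parser:
        raise KeyError(f"no [{site}] section in {path}")
    section = parser[site]
    # Missing keys come back as None so the caller can report them.
    return {key: section.get(key) for key in SECRET_KEYS}
```

With something like this, several accounts could share one praw.ini and be chosen by section name instead of keeping a config.yml per account.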

Thanks

Thanks for the library! It really saved my weekend. No pressure to implement any of this; I just thought I'd share my pain points.
