statement-dl's Introduction

statement-dl

This tool automatically downloads all documents from online banks and brokers that don't offer batch download options themselves, since downloading each file individually is usually quite cumbersome. The automatic download is especially useful in conjunction with portfolio tracking tools like Portfolio Performance that can subsequently import the downloaded PDFs. The downloaded files are sorted into separate subdirectories based on the document type.

Currently, only flatex is supported. PRs are welcome!

Installation

This Python tool currently uses Firefox and geckodriver to find and download your files. You can download Firefox from the Mozilla homepage: https://www.mozilla.org/de/.

You can download geckodriver from GitHub: https://github.com/mozilla/geckodriver/releases.

Then, install statement-dl using pip: pip install -U statement-dl or simply clone the repository and install from source using pip install -U <repository-path>. If you don't want to install the tool and its dependencies into your global Python environment, I recommend using pipx instead of pip directly.
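
Before the first run, it can help to confirm that the prerequisites are actually reachable. The following is a minimal sketch, not part of statement-dl: it assumes the executables are named firefox and geckodriver and are expected on the PATH (on some systems Firefox is installed elsewhere, so a miss for firefox is not necessarily a problem). The helper name missing_prerequisites is made up for this example.

```python
import shutil


def missing_prerequisites(names=("firefox", "geckodriver")):
    """Return the executable names that cannot be found on the PATH.

    The default names are assumptions; adjust them for your platform.
    """
    return [name for name in names if shutil.which(name) is None]


def describe_prerequisites(names=("firefox", "geckodriver")):
    """Build a short human-readable report about the lookup result."""
    missing = missing_prerequisites(names)
    if not missing:
        return "All prerequisites found on the PATH."
    return "Not found on the PATH: " + ", ".join(missing)
```

If geckodriver is reported as missing, you can still pass its location explicitly with the -g/--geckodriver option.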

Usage

To start off, you probably want to download all your previous documents. By default, the tool only downloads the unread files. To download all files from flatex, use the command

>>> statement_dl flatex <destination dir> --all-files

If you don't specify the --username and --password options, you will be prompted to enter them yourself in the browser.
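
If you want to trigger the download from a script (for example on a schedule), the command above can be assembled in Python. This is only a sketch: build_command and run_download are hypothetical helper names, and the only thing taken from statement-dl is the command line itself (statement_dl flatex <destination dir> with --all-files, --username and --password).

```python
import subprocess


def build_command(dest, all_files=False, username=None, password=None):
    """Assemble the statement_dl command line as an argument list."""
    cmd = ["statement_dl", "flatex", dest]
    if all_files:
        cmd.append("--all-files")
    # Without these two options the tool asks you to log in in the browser.
    if username is not None:
        cmd += ["--username", username]
    if password is not None:
        cmd += ["--password", password]
    return cmd


def run_download(dest, **options):
    """Run statement_dl and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(dest, **options), check=True)
```

Calling run_download("documents", all_files=True) would then correspond to the command shown above. Passing the arguments as a list avoids shell quoting problems with spaces in the destination path.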

To see all options, type

>>> statement_dl --help

To get all options for a specific broker/bank, type e.g.

>>> statement_dl flatex --help
usage: statement_dl flatex [-h] [-f DATE] [-t DATE] [-g PATH] [-u USERNAME]
                           [-p PASSWORD] [--wsl] [--headless] [-a] [-k] [--de]
                           dest

positional arguments:
  dest                  Directory in which your downloaded files will be saved

optional arguments:
  -h, --help            show this help message and exit
  -f DATE, --from-date DATE
                        Date from which you want to download your files (in
                        the format YYYY-MM-DD or 'today'). Defaults to
                        '2010-01-01'
  -t DATE, --to-date DATE
                        Date until which you want to download your files (in
                        the format YYYY-MM-DD or 'today'). Defaults to 'today'
  -g PATH, --geckodriver PATH
                        Path to geckodriver executable. If not specified, it
                        will look for it in the Path
  -u USERNAME, --username USERNAME
                        Username for automatic login
  -p PASSWORD, --password PASSWORD
                        Password for automatic login
  --wsl                 Set this option when running the script in WSL while
                        using a geckodriver executable that was installed on
                        Windows
  --headless            Launch browser in headless mode. This only works if
                        username and password are set
  -a, --all-files       Automatically download all files instead of only
                        unread
  -k, --keep-filenames  Keep the original filenames instead of renaming them
                        to a more useful format
  --de                  Use 'de' domain instead of 'at' (experimental)
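
The date options accept either YYYY-MM-DD or the literal 'today'. As an illustration of how such an argument can be handled with argparse, here is a small sketch. It is not the tool's actual implementation; parse_date_argument and build_parser are names invented for this example, and only the two date options are modeled.

```python
import argparse
from datetime import date, datetime


def parse_date_argument(value):
    """Convert 'today' or a YYYY-MM-DD string into a date object."""
    if value == "today":
        return date.today()
    # strptime raises ValueError on malformed input; argparse turns that
    # into a regular usage error when this function is used as `type`.
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_parser():
    """Create a parser with the two date options and the documented defaults."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("-f", "--from-date", metavar="DATE",
                        type=parse_date_argument, default="2010-01-01")
    parser.add_argument("-t", "--to-date", metavar="DATE",
                        type=parse_date_argument, default="today")
    return parser
```

Because argparse also applies the type function to string defaults, both options always end up as date objects, whether or not they were given on the command line.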

statement-dl's People

Contributors

pspeter

Forkers

dmateiu krensi

statement-dl's Issues

Experimental --de argument not working - form button ids wrong

Running with the --de argument stops the script and returns:
"selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: //input[@id="loginForm_userId"]"

The form input IDs for the user ID, password and login button on the DE login page are different, so the --de argument would need to switch these IDs as well.
For DE they must be "loginForm_txtUserId", "loginForm_txtPassword_txtPassword" and "loginForm_loginButton".

After changing these IDs in lines 128, 129, 144 and 146 accordingly, the script worked perfectly for my DE account.

Attached my working flatex.py file.
EDIT: I just built in the switch myself, and it works. I changed the _login() function to take the de argument, and the download_documents() function passes it when calling it. Inside _login(), the form IDs are chosen based on the de argument.
Feel free to review:
flatex.txt
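
The switch described above could look roughly like the following sketch. It is not the attached file and not the project's code: LOGIN_FORM_IDS and login_field_xpath are hypothetical names. The DE IDs are the ones listed in this issue, and the only AT ID included is the one from the error message; the AT password and button IDs are not given here, so they are deliberately left out.

```python
# Element IDs per top-level domain. Only values quoted in this issue are used.
LOGIN_FORM_IDS = {
    "de": {
        "user_id": "loginForm_txtUserId",
        "password": "loginForm_txtPassword_txtPassword",
        "login_button": "loginForm_loginButton",
    },
    "at": {
        "user_id": "loginForm_userId",
        # The AT password and login button IDs are not stated in this issue.
    },
}


def login_field_xpath(tld, field):
    """Return an XPath that matches the login form element for the given domain."""
    try:
        element_id = LOGIN_FORM_IDS[tld][field]
    except KeyError:
        raise KeyError(f"no known id for {field!r} on the {tld!r} login page") from None
    # Match on the id only, without assuming the element's tag name.
    return f'//*[@id="{element_id}"]'
```

Keeping the IDs in one table means a future change on either login page only needs an update in a single place.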

Thanks a lot for this tool and your work! Exactly what I was looking for.

P.S. (Tool just finished downloading all files!)

flatex.py fails: TypeError: _download_current_pdfs() missing 1 required positional argument: 'tld'

Lines 197 and 203 of flatex.py should both read:

_download_current_pdfs(driver, download_path, dest, all_files, keep_filenames, tld)

instead of
_download_current_pdfs(driver, download_path, dest, all_files, keep_filenames)

So the parameter ", tld" is missing. This leads to the error:

C:\Users\XXXX>statement_dl flatex D:\Documents\ --all-files -g C:\Users\XXXX\geckodriver.exe
Please login in the browser
Downloading files to D:\Documents
More than 100 files, paging through results
Traceback (most recent call last):
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\Scripts\statement_dl.exe\__main__.py", line 7, in <module>
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\statement_dl\__init__.py", line 92, in main
    args.func(args)
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\statement_dl\flatex.py", line 42, in download_documents_from_args
    download_documents(
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\statement_dl\flatex.py", line 93, in download_documents
    _download_pdfs(
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python310\lib\site-packages\statement_dl\flatex.py", line 197, in _download_pdfs
    _download_current_pdfs(driver, download_path, dest, all_files, keep_filenames)
TypeError: _download_current_pdfs() missing 1 required positional argument: 'tld'

and the script would not work.
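
The failure mode can be reproduced in isolation. In the sketch below, _download_current_pdfs is a stand-in with the same parameter list as in the traceback; its body and the placeholder argument values are invented for the demonstration and have nothing to do with the real function.

```python
def _download_current_pdfs(driver, download_path, dest, all_files, keep_filenames, tld):
    """Stand-in with the signature from the traceback; the body is a dummy."""
    return tld


def call_without_tld():
    """Call the function the way line 197 does and return the error message."""
    try:
        _download_current_pdfs("driver", "tmp", "dest", True, False)
    except TypeError as err:
        return str(err)
    return None


def call_with_tld(tld="at"):
    """The corrected call passes tld as the sixth positional argument."""
    return _download_current_pdfs("driver", "tmp", "dest", True, False, tld)
```

call_without_tld() returns the "missing 1 required positional argument" message, while call_with_tld() succeeds, which matches the fix proposed above.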

Thanks for the great work!

`ValueError: time data '***221 Cashkonto - My Name' does not match format '%d.%m.%Y'`

Hi, after executing statement_dl flatex documents --all-files, I get the following error:

Please login in the browser
Downloading files to /my/path/to/the/documents
More than 100 files, paging through results
Traceback (most recent call last):
  File "/Users/username/Library/Python/3.8/bin/statement_dl", line 8, in <module>
    sys.exit(main())
  File "/Users/username/Library/Python/3.8/lib/python/site-packages/statement_dl/__init__.py", line 92, in main
    args.func(args)
  File "/Users/username/Library/Python/3.8/lib/python/site-packages/statement_dl/flatex.py", line 42, in download_documents_from_args
    download_documents(
  File "/Users/username/Library/Python/3.8/lib/python/site-packages/statement_dl/flatex.py", line 93, in download_documents
    _download_pdfs(
  File "/Users/username/Library/Python/3.8/lib/python/site-packages/statement_dl/flatex.py", line 194, in _download_pdfs
    last_date = _parse_list_date(last_date_string)
  File "/Users/username/Library/Python/3.8/lib/python/site-packages/statement_dl/flatex.py", line 357, in _parse_list_date
    return datetime.strptime(date_string, "%d.%m.%Y").date()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '***221 Cashkonto - First Last Name' does not match format '%d.%m.%Y'

I am on a MacBook with the latest software. Today I installed Firefox from its webpage, geckodriver (via Homebrew), and statement_dl via pip3.

The error is with this line:

return datetime.strptime(date_string, "%d.%m.%Y").date()

My documents are called ***221 Cashkonto - ... and not some datetime. I think that's the issue. Do you know what I could do to download all pdfs?
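
One possible workaround, sketched here under assumptions, is to make the date parsing tolerant so that a non-date string does not abort the whole run. This is not the maintainer's fix and does not address why a document name ends up where a date is expected; parse_list_date_or_none and last_valid_date are hypothetical helpers, and only the "%d.%m.%Y" format is taken from the traceback.

```python
from datetime import datetime


def parse_list_date_or_none(date_string):
    """Parse a DD.MM.YYYY string, returning None instead of raising on bad input."""
    try:
        return datetime.strptime(date_string.strip(), "%d.%m.%Y").date()
    except ValueError:
        return None


def last_valid_date(strings):
    """Return the last entry that parses as a date, or None if there is none."""
    for text in reversed(strings):
        parsed = parse_list_date_or_none(text)
        if parsed is not None:
            return parsed
    return None
```

With this approach, an entry such as '***221 Cashkonto - My Name' is skipped and the nearest preceding real date is used instead.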

Thanks by the way for sharing your code!

Best
