
qs_ledger's Introduction

Quantified Self (QS) Ledger

A Personal Data Aggregator and Dashboard for Self-Trackers and Quantified Self Enthusiasts

Quantified Self (QS) Ledger aggregates and visualizes your personal data.

The project has two primary goals:

  1. Download all of your personal data from various tracking services (see below for the list of integrations) and store it locally.
  2. Provide a starting point for personal data analysis, data visualization, and a personal data dashboard.

At present, the main objective is to provide working data downloaders and simple data analysis for each of the integrated services.

Some initial work has begun on using these data streams for predictive analytics and forecasting with machine learning, with the intention to focus increasingly on modeling in future iterations.

Code / Dependencies:

  • The code is written in Python 3.
  • Shared and distributed via Jupyter Notebooks.
  • Most services depend on Pandas and NumPy for data manipulation and Matplotlib and Seaborn for data analysis and visualization.
  • To get started, we recommend downloading and using the Anaconda Distribution.
  • For initial installation and setup help, see documentation below.
  • For setup and usage of individual services, see documentation provided by each integration.

Current Integrations:

  • Apple Health: fitness and health tracking, data analysis and dashboard from iPhone or Apple Watch (includes an example Elasticsearch integration and Kibana health dashboard).
  • AutoSleep: iOS sleep tracking data analysis of sleep per night and rolling averages.
  • Fitbit: fitness and health tracking and analysis of Steps, Sleep, and Heart Rate from a Fitbit wearable.
  • GoodReads: book reading tracking and data analysis for GoodReads.
  • Google Calendar: past events, meetings and times for Google Calendar.
  • Google Sheets: get data from any Google Sheet, which is useful for pulling in data logged by IFTTT integrations.
  • Habitica: habit and task tracking with Habitica's gamified approach to task management.
  • Instapaper: articles read and highlighted passages from Instapaper.
  • Kindle Highlights: parser and highlight extractor for Kindle clippings, along with a sample data analysis and a tool to export highlights to separate Markdown files.
  • Last.fm: music tracking and analysis of music listening history from Last.fm.
  • Oura: Oura ring activity, sleep, and wellness data.
  • RescueTime: track computer usage and analysis of computer activities and time with RescueTime.
  • Pocket: articles read and read count from Pocket.
  • Strava: activities downloader (runs, cycling, swimming, etc.) and analysis from Strava.
  • Todoist: task tracking and analysis of todos and completed-task history from the Todoist app.
  • Toggl: time tracking and analysis of manual timelog entries from Toggl.
  • WordCounter: extract WordCounter app history and visualize word counts over recent periods.


How to use this project: Installation and Setup Locally

Until we provide a working version for Google Colab or other online Jupyter notebook setups, we recommend getting started by downloading and using the Anaconda Distribution, which is free and open source. This gives you a local working installation of NumPy, Pandas, Jupyter Notebook, and other Python data science tools.

After installation, we recommend creating and activating a virtual environment, either with Anaconda or manually:

python3 -m venv ~/.virtualenvs/qs_ledger

source ~/.virtualenvs/qs_ledger/bin/activate

Then clone the current github repo:

git clone https://github.com/markwk/qs_ledger.git

Using your activated virtual environment, install dependencies:

pip install -r requirements.txt

Then navigate into the project directory and launch an individual notebook or the full project with jupyter notebook or jupyter lab:

jupyter lab

Code Organization

Best practices and organization are still a work in progress, but in general:

  • Each integration has a NAME_downloader and a NAME_data_analysis notebook.
  • Some projects include a helper function for data pulling.
  • Optionally, some projects have useful notebooks for specific use cases, like weekly reviews.

Useful Shortcuts

You can run Jupyter notebooks directly from the command line and, in the case of Papermill, pass parameters:

With nbconvert:

  • pip install nbconvert
  • jupyter nbconvert --to notebook --execute --inplace rescuetime/rescuetime_downloader.ipynb

With Papermill:

  • pip install papermill

  • papermill rescuetime_downloader.ipynb data/output.ipynb -p start_date '2019-08-14' -p end_date '2019-10-14'

  • NOTE: You first need to parameterize your notebook in order to pass parameters in from the command line.
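To parameterize a notebook, tag one cell with the parameters tag and give its variables defaults; Papermill then injects a new cell right after it that overrides those values with the -p flags. A minimal sketch of such a cell (the variable names mirror the flags used above):

```python
# Contents of the cell tagged "parameters" (in Jupyter: View > Cell
# Toolbar > Tags, then add the tag "parameters"). These are defaults;
# papermill injects a new cell after this one overriding them.
start_date = '2019-08-14'
end_date = '2019-10-14'

print(start_date, end_date)
```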

Creators and Contributors:

Want to help? Fork the project and provide your own data analysis, integration, etc.

Questions? Bugs? Feature Requests? Need Support?

Post a ticket in the QS Ledger Issue Queue

qs_ledger's People

Contributors

markwk, mbbroberg, michalszczecinski


qs_ledger's Issues

How to get started?

Hey all, I'm hoping to get this project up and running, and I'm starting from a fresh macOS environment. I don't yet understand the toolchain necessary to do so.

Notes:

  • I'm running Python 3.7, with python aliased to python3.7 and pip aliased to pip3

I'm specifically:

  1. Beginning by installing Anaconda, which installed successfully
    • verification on the CLI using conda -V shows conda 4.6.11
    • from the GUI of Anaconda-Navigator.app
    • I also installed PyCharm with the Anaconda plugin
  2. Starting from the Todoist downloader, I run pip install todoist-python
    • no errors on installation
  3. I set up my credentials in credentials.json
  4. I then try to view todoist_downloader.ipynb
    • in PyCharm, it only shows a JSON object due to it being the community edition
    • through Anaconda-Navigator, I launch Jupyter Notebook, choose the todoist_downloader.ipynb file, then get stuck.
  5. Jupyter Notebook says ModuleNotFoundError: No module named 'todoist' when I attempt to run it.
    • conda install todoist doesn't work, but I did re-run pip install todoist-python to verify it was installed and used pip list | grep todoist to verify it was there.

What's the best path forward? I'm out of my Python depths 🐍 😄
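A common cause of this symptom (a guess, but a frequent one with Anaconda plus aliased pip) is that pip installed the package into a different Python environment than the one the Jupyter kernel runs in. A quick way to see which interpreter the kernel uses, and to install against it:

```python
import sys

# The interpreter the running kernel uses; if the `pip` on your shell
# PATH belongs to a different Python, packages it installs won't be
# importable here.
print(sys.executable)

# Inside a notebook cell, installing via the kernel's own interpreter
# sidesteps the mismatch (not executed here):
#     !{sys.executable} -m pip install todoist-python
```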

Error on running Todoist analysis

I'm working from the Jupyter Notebook launched by Anaconda Navigator and I'm running each section one-by-one. I pulled the data using your script at https://github.com/markwk/todoist_export. After importing and running the other earlier tasks, I then grab the data from that location.

tasks = pd.read_csv("/Users/mbbroberg/Develop/todoist_export/data/todost-tasks-completed.csv")
len(tasks)

Once I reach year_data = tasks['year'].value_counts().sort_index(), I receive the error below. When I look at the tasks object, I don't see a column for year (see screenshot). Is it possible I'm looking at the wrong data or am I missing something?

[Screenshot: the tasks DataFrame has no 'year' column]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'year'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-24-a30ed33e8ded> in <module>
----> 1 year_data = tasks['year'].value_counts().sort_index()

/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'year'
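For reference, the analysis expects a year column derived from a parsed date column; if that derivation step was skipped, the KeyError above follows. A hedged sketch of deriving it with pandas (the sample data and the column name completed_date are assumptions; substitute whatever date column your CSV actually has):

```python
import pandas as pd

# Hypothetical miniature of the completed-tasks export; the real file's
# date column may be named differently (e.g. 'completed_date' here).
tasks = pd.DataFrame({
    'content': ['task a', 'task b', 'task c'],
    'completed_date': ['2019-01-05', '2019-03-14', '2018-12-31'],
})

# Derive the 'year' column the analysis notebook expects.
tasks['completed_date'] = pd.to_datetime(tasks['completed_date'])
tasks['year'] = tasks['completed_date'].dt.year

year_data = tasks['year'].value_counts().sort_index()
print(year_data)
```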

strava - authorization flow change

hi Mark, thanks for sharing the Strava data downloader code. All works smoothly until I try to pull activities. I believe they have changed the API settings, and a newly generated general token is not able to access activities (it has only the read scope, whereas the read_all scope is required). You can read more about this here:

https://developers.strava.com/docs/oauth-updates/

and here is a Stack Overflow post with details of a potential solution:
https://stackoverflow.com/questions/52880434/problem-with-access-token-in-strava-api-v3-get-all-athlete-activities

Also, stravalib seems to include some code that should help with this:
https://github.com/hozn/stravalib#authentication

Thanks!
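For reference, a hedged sketch of the updated flow: request the activity:read_all scope during authorization, then exchange the returned code for an access token. CLIENT_ID and REDIRECT_URI below are placeholders for your own Strava API application's values, not values from this repo:

```python
from urllib.parse import urlencode

# Placeholders; use your own Strava API application's values.
CLIENT_ID = '12345'
REDIRECT_URI = 'http://localhost/exchange_token'

# Step 1: open this URL in a browser and approve access. The
# activity:read_all scope is what the updated API requires for
# listing activities.
auth_url = 'https://www.strava.com/oauth/authorize?' + urlencode({
    'client_id': CLIENT_ID,
    'response_type': 'code',
    'redirect_uri': REDIRECT_URI,
    'approval_prompt': 'auto',
    'scope': 'activity:read_all',
})
print(auth_url)

# Step 2: exchange the `code` from the redirect for an access token
# (requires the requests package; not executed here):
# resp = requests.post('https://www.strava.com/oauth/token', data={
#     'client_id': CLIENT_ID, 'client_secret': '<secret>',
#     'code': '<code>', 'grant_type': 'authorization_code'})
# access_token = resp.json()['access_token']
```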

interpreting apple watch timestamps

Hi!

I am using this awesome work to look at my health data, and I have a question about how to interpret the timestamps/datetimes.

If I open the CSV file I am interested in, I see that the creationDate of the first row is 2017-01-28 10:20:52 +0200, while if I read the file with pandas.read_csv(...) using the parse_dates argument, the corresponding value is 2017-01-28 08:20:52.

In the blog post it says that the data contain UTC timestamps, and that's ok; my problem is more about the timezone info. This summer I spent August in the US (I live in Europe), but I don't see this reflected in the data. Here is what the creationDate of a row on August 8th looks like: 2019-08-08 02:03:20 +0200.

Shall I assume that the timestamp (i.e., the pure date and time, discarding the timezone info) is still UTC, and that I have to figure out the daily time zone myself? Or what else?
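A small illustration of what pandas does here (the CSV snippet below is a made-up miniature of the export): parsing with utc=True turns each offset-stamped string into an unambiguous UTC instant, and any local-time view has to be supplied by you, since the export's fixed offset doesn't track travel:

```python
from io import StringIO

import pandas as pd

# Made-up miniature of the export: creationDate carries a fixed UTC
# offset, which does not change when you travel.
csv_text = (
    "creationDate,value\n"
    "2017-01-28 10:20:52 +0200,61\n"
    "2019-08-08 02:03:20 +0200,64\n"
)
df = pd.read_csv(StringIO(csv_text))

# utc=True converts every offset-stamped string to an unambiguous
# UTC instant.
df['creationDate'] = pd.to_datetime(df['creationDate'], utc=True)
print(df['creationDate'].iloc[0])  # 2017-01-28 08:20:52+00:00

# A local-time view has to be supplied explicitly, e.g. for the US trip:
df['local'] = df['creationDate'].dt.tz_convert('America/New_York')
```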

I wonder if there is a bug in apple-health-data-parser.py?

qs_ledger is really helpful. It would be hard for me to use the XML version

I wonder if I found a bug in apple-health-data-parser.py. I noticed that in HeartRate.csv the header had 8 fields while the data rows had 14; tools like Unix cut fail to parse it.

I looked at the XML, and I think this is what a heart rate record looks like. It looks like Apple includes ',' inside the value strings. Once I figured this out, I was able to work around it.

<Record type="HKQuantityTypeIdentifierHeartRate" sourceName="andrew e.’s Apple Watch" sourceVersion="5.1.3" device="&lt;&lt;HKDevice: 0x281e21c20&gt;, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch4,4, software:5.1.3&gt;" unit="count/min" creationDate="2019-07-29 16:13:30 -0800" startDate="2019-07-29 16:10:29 -0800" endDate="2019-07-29 16:10:29 -0800" value="53">

This is the Header from the HeartRate.csv file

sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value

This is a record from HeartRate.csv. I broke it up to figure out what was going on:

"andrew e.’s Apple Watch",
"5.1.3",
"<<HKDevice: 0x281e36490>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch4,4, software:5.1.3>",
"HeartRate",
"count/min",
2019-07-29 15:28:48 -0800,
2019-07-29 15:26:47 -0800,
2019-07-29 15:26:47 -0800,
62

Andy
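For what it's worth, the extra commas sit inside a quoted field, so quote-aware CSV readers parse the row correctly while plain comma-splitting (e.g. unix cut) shreds it. A small demonstration with Python's csv module, using a record like the one above:

```python
import csv
from io import StringIO

# A record like the one above: the device field is quoted and
# legitimately contains commas.
row = ('"andrew e.’s Apple Watch","5.1.3",'
       '"<<HKDevice: 0x281e36490>, name:Apple Watch, manufacturer:Apple, '
       'model:Watch, hardware:Watch4,4, software:5.1.3>",'
       '"HeartRate","count/min",'
       '2019-07-29 15:28:48 -0800,'
       '2019-07-29 15:26:47 -0800,'
       '2019-07-29 15:26:47 -0800,'
       '62')

naive = row.split(',')                     # shreds the quoted device field
fields = next(csv.reader(StringIO(row)))   # respects the quoting
print(len(naive), len(fields))
```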

import dashboard to Kibana issue

Hi @markwk, nice work! I have an issue with importing apple_health/apple_health_elastic_dashboard into Kibana:
[Screenshot: Kibana import error dialog]

There is an error in the dev console.
Kibana version 6.8.15
Elastic: 7.12.0

Using iOS 16.6, health data extraction does not work

Having the discussed problems with iOS 16.2 in mind, and your ideas for a workaround, I've tried apple_health-extractor.ipynb.

During the reading/parsing procedure you get the error message "Unexpected node of type Correlation" after a few seconds.

None of the subsequent notebooks works; this error is the showstopper.

apple_health_data2elastic NOT WORKING :(

Hi everyone, I'd appreciate your help. I tried to run this code in JupyterLab, and it is not working. The main issue is in cell [18]:

# Create Customized Index Mappings     
es.indices.put_mapping(index=INDEX, doc_type=TYPE, body=d, include_type_name=True)  

Apparently elasticsearch 8.1 made some changes, and there are issues with the arguments to put_mapping.
I uninstalled 8.1 and installed Elasticsearch 7.1, and now there is a ConnectionError.

Dear Mark, could you please update the file? Thank you so much!
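Until the notebook is updated, a hedged sketch of what the call might look like against elasticsearch-py 8.x, where mapping types (doc_type) were removed and put_mapping takes the field definitions via properties. The index name and field mappings below are assumptions, not the notebook's actual values:

```python
# Sketch for elasticsearch-py 8.x, where mapping types (doc_type) were
# removed. INDEX and the field mappings here are assumptions.
INDEX = 'apple_health'

properties = {
    '@timestamp': {'type': 'date'},
    'type': {'type': 'keyword'},
    'value': {'type': 'float'},
}

# With a running cluster and an 8.x client (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch('http://localhost:9200')
# es.indices.put_mapping(index=INDEX, properties=properties)
print(INDEX, sorted(properties))
```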

apple health: parse error after upgrading to iOS 16

Upgrading to iOS 16 apparently also updated the HealthKit protocol:
HealthKit Export Version: 11 to HealthKit Export Version: 12

Parsing export.xml now throws a parse error:

File "/.../qs_ledger/apple_health/apple-health-data-parser.py", line 118, in __init__
    self.data = ElementTree.parse(f)
File "/home/usr/.pyenv/versions/3.10.4/lib/python3.10/xml/etree/ElementTree.py", line 1229, in parse
    tree.parse(source, parser)
File "/home/usr/.pyenv/versions/3.10.4/lib/python3.10/xml/etree/ElementTree.py", line 580, in parse
    self._root = parser._parse_whole(source)
**xml.etree.ElementTree.ParseError: syntax error: line 156, column 0**

The problem has already been reported and seems to be ignored by Apple:
problem with import of XML Apple HealthKit Export Version: 12
Any thoughts on the suggested workaround with "patch.txt"?
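One workaround that has been suggested for the Export Version 12 problem (untested here, and hedged accordingly) is to strip the malformed inline DTD before handing the XML to ElementTree. A sketch:

```python
import re
import xml.etree.ElementTree as ElementTree
from io import StringIO

def strip_dtd(xml_text: str) -> str:
    # Drop the inline DOCTYPE declaration, internal subset included.
    return re.sub(r'<!DOCTYPE[^[]*\[.*?\]>', '', xml_text, flags=re.DOTALL)

# Tiny stand-in document with an internal DTD subset (the real export is
# far larger, and its DTD is reportedly what trips the parser):
sample = ('<?xml version="1.0"?>\n'
          '<!DOCTYPE HealthData [\n'
          '<!ELEMENT HealthData ANY>\n'
          ']>\n'
          '<HealthData/>')

tree = ElementTree.parse(StringIO(strip_dtd(sample)))
print(tree.getroot().tag)  # HealthData
```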

Apple Health Extractor not working?

Hi!

Might be a mistake I'm making, but the extractor doesn't seem to be working. I keep getting

FileNotFoundError Traceback (most recent call last)

whenever I try to run the extractor.

Not sure where I'm going wrong?

I did notice a note earlier in the file that says:

NOTE: Currently there are a few minor errors based on additional data from Apple Health that require some updates.

Any idea where I'm going wrong? I'm a bit of a noob so could be user error!

thanks!

Tom
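A FileNotFoundError at this stage usually just means the path the notebook expects doesn't point at your export file. A tiny pre-flight check; data/export.xml is an assumption, so match it to the path variable set near the top of the extractor notebook:

```python
from pathlib import Path

# 'data/export.xml' is an assumed location; match it to the path variable
# the extractor notebook actually uses.
export_path = Path('data/export.xml')
if not export_path.exists():
    print(f"Missing {export_path}: copy export.xml from your Apple Health "
          "export into place before running the extractor.")
```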
