notion-df's Introduction

notion-df: Seamlessly Connecting Notion Database with Pandas DataFrame

Please note: This project is currently in the pre-alpha stage. The code is not yet properly documented or tested. Please report any issues you find. Thanks!

Installation

pip install notion-df

Usage

  • Before starting, please follow the instructions to create a new integration and add it to your Notion page or database.

    • We refer to the Internal Integration Token as api_key below.
  • Pandas-flavored APIs: just add two lines of code:

    import notion_df
    notion_df.pandas() #That's it!
    
    page_url = "paste your page url from Notion"
    api_key = "paste your api key (internal integration key)"
    
    import pandas as pd
    df = pd.read_notion(page_url, api_key=api_key)
    df.to_notion(page_url, api_key=api_key)
  • Download your Notion table as a pandas DataFrame

    import notion_df
    df = notion_df.download(notion_database_url, api_key=api_key)
    # Equivalent to: df = pd.read_notion(notion_database_url, api_key=api_key)
    df.head()
    Download only the first `nrows` rows of a database:
    df = notion_df.download(notion_database_url, nrows=10, api_key=api_key)
    What if your table has a relation column?
    df = notion_df.download(notion_database_url,
                            resolve_relation_values=True,
                            api_key=api_key)

    Setting resolve_relation_values=True automatically resolves the links in every relation column whose target database is accessible to the current Notion integration.

    In detail, suppose the "test" column in df is a relation column in Notion.

    1. When resolve_relation_values=False, each value in that column is a list of the target pages' UUIDs: ['65e04f11-xxxx', 'b0ffcb4b-xxxx', ...].
    2. When resolve_relation_values=True, each value in that column is a list of plain strings taken from the name (title) column of the target pages: ['page1', 'page2', ...].
  • Append a local df to a Notion database:

    import notion_df
    notion_df.upload(df, notion_database_url, title="page-title", api_key=api_key)
    # Equivalent to: df.to_notion(notion_database_url, title="page-title", api_key=api_key)
  • Upload a local df to a newly created database in a Notion page:

    import notion_df
    notion_df.upload(df, notion_page_url, title="page-title", api_key=api_key)
    # Equivalent to: df.to_notion(notion_page_url, title="page-title", api_key=api_key)
  • Tired of typing api_key=api_key each time?

    import notion_df
    notion_df.config(api_key=api_key) # Or set an environment variable `NOTION_API_KEY`
    df = notion_df.download(notion_database_url)
    notion_df.upload(df, notion_page_url, title="page-title")
    # Similarly in pandas APIs: df.to_notion(notion_page_url, title="page-title")
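Conceptually, resolving relation values means replacing each target-page UUID with that page's title. The sketch below illustrates the idea in plain Python; the helper resolve_relations and its inputs (the raw relation column plus parallel lists of target page IDs and titles) are hypothetical and not notion-df's internal API.

```python
from typing import Dict, List, Sequence


def resolve_relations(
    relation_column: Sequence[List[str]],
    target_ids: Sequence[str],
    target_titles: Sequence[str],
) -> List[List[str]]:
    """Replace target-page UUIDs with the titles of those pages (hypothetical helper)."""
    # Map each target page ID to its title, e.g. {'65e04f11-xxxx': 'page1'}.
    id_to_title: Dict[str, str] = dict(zip(target_ids, target_titles))
    # Keep the original UUID when the target page is unknown (for example, not
    # shared with the integration), so no information is silently dropped.
    return [[id_to_title.get(page_id, page_id) for page_id in row] for row in relation_column]


column = [["65e04f11-xxxx", "b0ffcb4b-xxxx"], ["b0ffcb4b-xxxx"], []]
print(resolve_relations(column, ["65e04f11-xxxx", "b0ffcb4b-xxxx"], ["page1", "page2"]))
# [['page1', 'page2'], ['page2'], []]
```

Falling back to the raw UUID is a design choice for this sketch; raising an error on unknown targets would be an equally reasonable alternative.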

Development

  1. Clone the repo and install the dependencies:
    git clone git@github.com:lolipopshock/notion-df.git
    cd notion-df
    pip install -e .[dev]
  2. How to run tests?
    NOTION_API_KEY="<the-api-key>" pytest tests/
    The tests depend on a set of Notion databases, specified by the following environment variables:
    Environment Variable         Description
    NOTION_API_KEY               The API key for your Notion integration
    NOTION_ROLLUP_DF             -
    NOTION_FILES_DF              -
    NOTION_FORMULA_DF            -
    NOTION_RELATION_DF           -
    NOTION_RELATION_TARGET_DF    -
    NOTION_LONG_STRING_DF        -
    NOTION_RICH_TEXT_DF          -
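Because the tests read their databases from the environment, it can help to check which variables are missing before invoking pytest. A minimal standard-library sketch; the helper name missing_test_env_vars is hypothetical, and the variable names are copied from the table in this section.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the test table in this README.
TEST_ENV_VARS = [
    "NOTION_API_KEY",
    "NOTION_ROLLUP_DF",
    "NOTION_FILES_DF",
    "NOTION_FORMULA_DF",
    "NOTION_RELATION_DF",
    "NOTION_RELATION_TARGET_DF",
    "NOTION_LONG_STRING_DF",
    "NOTION_RICH_TEXT_DF",
]


def missing_test_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the test variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in TEST_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_test_env_vars()
    if missing:
        print("Missing test variables:", ", ".join(missing))
    else:
        print("All test variables are set.")
```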

TODOs

  • Add tests for
    • load
    • upload
    • values.py
    • configs.py
    • base.py
  • Better class organization and naming for the *Configs and *Values classes

notion-df's People

Contributors

lolipopshock, prateekiiest

notion-df's Issues

Uploading pages url issue

I like this package and I am starting to use it to automate some of my note taking. I noticed that when adding a page to a database, only the full database URL works, not the short database ID. For downloading, both the short ID and the full URL work. This is a minor issue; I mention it in case you want to fix it.

Thank you!
Eric
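One way to accept both forms on upload is to normalize any input, full URL or bare ID, to a dashed UUID before calling the API. The sketch below assumes that a Notion database URL ends its path with the 32-character hexadecimal ID (optionally preceded by a title slug) and carries the view in the ?v= query string; the function to_database_id is hypothetical and not part of notion-df.

```python
import re
import uuid
from urllib.parse import urlparse

# 32 hex digits: the undashed form of a Notion ID.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def to_database_id(url_or_id: str) -> str:
    """Normalize a Notion database URL or bare ID to a dashed UUID (hypothetical helper)."""
    # urlparse separates the "?v=<view id>" query even without a scheme,
    # so only the path is inspected.
    path = urlparse(url_or_id.strip()).path
    segment = path.rstrip("/").split("/")[-1]
    compact = segment.replace("-", "")
    if _HEX32.fullmatch(compact):
        # Bare ID, with or without dashes.
        return str(uuid.UUID(compact))
    # Slug form "Some-Title-<32 hex digits>": the ID is the trailing 32 characters.
    tail = segment[-32:]
    if _HEX32.fullmatch(tail):
        return str(uuid.UUID(tail))
    raise ValueError(f"No Notion ID found in {url_or_id!r}")


print(to_database_id("https://www.notion.so/myworkspace/a8aec43384f447ed84390e8e42c2e089?v=0123456789abcdef0123456789abcdef"))
# a8aec433-84f4-47ed-8439-0e8e42c2e089
```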

get user information from database

When I read a database from Notion, the column showing the teammate assigned to a task is displayed as a code instead of the name, while the last-edited property shows the name. Does anyone know what is happening?

Thanks

Additional check for title lengths

notion_client.errors.APIResponseError: body failed validation: body.properties.docket_text.title[0].text.content.length should be ≤ `2000`, instead was `2043`.
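The error shows that Notion rejects a single text object longer than 2,000 characters. A pre-upload check could truncate the value or, as sketched here, split it into several text objects of at most 2,000 characters each, assuming (as the title[0] index in the error path suggests) that a title or rich-text property holds a list of such objects. The helper split_rich_text is hypothetical, and the dictionaries follow the title[0].text.content shape from the error.

```python
from typing import Dict, List

NOTION_TEXT_LIMIT = 2000  # limit reported in the error message above


def split_rich_text(content: str, limit: int = NOTION_TEXT_LIMIT) -> List[Dict]:
    """Split a long string into text objects of at most `limit` characters (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # An empty string still yields one (empty) text object.
    chunks = [content[i : i + limit] for i in range(0, len(content), limit)] or [""]
    return [{"type": "text", "text": {"content": chunk}} for chunk in chunks]


title = split_rich_text("x" * 2043)
print([len(obj["text"]["content"]) for obj in title])  # [2000, 43]
```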

Pydantic requirement too strict

Could you relax the pydantic requirement to allow version 1.10? Note that pydantic 2.0 was released with breaking changes, so supporting all versions might be difficult. Thanks in advance!

KeyError: 'status'

Hello,

New to this library and very interested!

I tried to import my database into pandas but got a KeyError: 'status' error.
It seems that the status property type is not supported. Is there a workaround?

Regards

Here is the full error log:

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/_pandas.py in read_notion(notion_url, nrows, resolve_relation_values, errors, api_key)
37 pd.DataFrame: the loaded dataframe.
38 """
---> 39 return download(
40 notion_url,
41 nrows=nrows,

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/agent.py in wrapper(*args, **kwargs)
52 api_key = _load_api_key(kwargs.pop("api_key", None))
53 client = Client(auth=api_key)
---> 54 out = func(client=client, *args, **kwargs)
55
56 if orig_client is None:

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/agent.py in download(notion_url, nrows, resolve_relation_values, errors, api_key, client)
195 client: Client = None,
196 ):
--> 197 df = download_df_from_database(
198 notion_url=notion_url,
199 nrows=nrows,

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/agent.py in download_df_from_database(notion_url, client, nrows, errors)
138 try:
139 retrieve_results = client.databases.retrieve(database_id=database_id)
--> 140 schema = DatabaseSchema.from_raw(retrieve_results["properties"])
141 except HTTPStatusError:
142 error_msg = (

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/configs.py in from_raw(cls, configs)
298 def from_raw(cls, configs: Dict) -> "DatabaseSchema":
299
--> 300 configs = {key: parse_single_config(config) for key, config in configs.items()}
301 return cls(configs)
302

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/configs.py in <dictcomp>(.0)
298 def from_raw(cls, configs: Dict) -> "DatabaseSchema":
299
--> 300 configs = {key: parse_single_config(config) for key, config in configs.items()}
301 return cls(configs)
302

~/anaconda3/envs/py39_env/lib/python3.9/site-packages/notion_df/configs.py in parse_single_config(data)
230
231 def parse_single_config(data: Dict) -> BasePropertyConfig:
--> 232 return parse_obj_as(CONFIGS_MAPPING[data["type"]], data)
233
234

KeyError: 'status'
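The last frame shows the cause: parse_single_config looks up CONFIGS_MAPPING[data["type"]], and "status" has no entry in that mapping. A more tolerant parser could skip, and warn about, property types it does not know instead of failing on the whole schema. The sketch below is a plain-Python illustration only: TextConfig, the toy mapping, and the skip-with-warning behavior are assumptions, and pydantic's parse_obj_as is replaced by a simple constructor call.

```python
import warnings
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TextConfig:
    """Toy stand-in for one of notion-df's *Config classes (hypothetical)."""
    data: Dict


# Toy mapping from Notion property type to config class; "status" is deliberately absent.
CONFIGS_MAPPING: Dict[str, Callable[[Dict], object]] = {
    "title": TextConfig,
    "rich_text": TextConfig,
}


def parse_single_config(data: Dict) -> Optional[object]:
    """Parse one property config, returning None for unsupported types."""
    config_cls = CONFIGS_MAPPING.get(data["type"])
    if config_cls is None:
        warnings.warn(f"Unsupported Notion property type {data['type']!r}; skipping it.")
        return None
    return config_cls(data)


def parse_schema(properties: Dict[str, Dict]) -> Dict[str, object]:
    """Parse a database schema, dropping properties whose type is unsupported."""
    parsed = {key: parse_single_config(config) for key, config in properties.items()}
    return {key: config for key, config in parsed.items() if config is not None}


schema = parse_schema({"Name": {"type": "title"}, "Stage": {"type": "status"}})
print(list(schema))  # ['Name']
```

Skipping keeps the rest of the table downloadable; the trade-off is that the unsupported column silently disappears from the DataFrame apart from the warning.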

Error when resolve_relation_values=True

Hey!
In contrast to #26, I can see rollups and relations when resolve_relation_values=False. When I set resolve_relation_values=True, I get the following error:

Traceback (most recent call last):
  File "C:\Users\m_thi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Beschreibung'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "a:\xxx\notion_fin_test.py", line 10, in <module>
    df = notion_df.download(page_url, resolve_relation_values=True)
  File "C:\Users\m_thi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\notion_df\agent.py", line 54, in wrapper
    out = func(client=client, *args, **kwargs)
  File "C:\Users\m_thi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\notion_df\agent.py", line 216, in download
    relation_df.notion_ids, relation_df[rel_title_col]
  File "C:\Users\m_thi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\m_thi\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'Beschreibung'
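The traceback fails at relation_df[rel_title_col]: the name chosen as the target database's title column ("Beschreibung") is not a column of the downloaded relation DataFrame. The report alone does not show why. One defensive option is to locate the title property by its type rather than by name, since every Notion database has exactly one property of type "title", and to raise a readable error if that column is missing. The sketch assumes the schema is the "properties" object returned by client.databases.retrieve, as in the earlier traceback; find_title_column and the toy schema are hypothetical.

```python
from typing import Dict, Iterable


def find_title_column(properties: Dict[str, Dict], columns: Iterable[str]) -> str:
    """Return the title property's name and check it was downloaded (hypothetical helper)."""
    titles = [name for name, prop in properties.items() if prop.get("type") == "title"]
    if len(titles) != 1:
        raise ValueError(f"Expected exactly one title property, found {titles}")
    title = titles[0]
    available = set(columns)
    if title not in available:
        # Surface a descriptive error instead of a bare pandas KeyError.
        raise KeyError(f"Title column {title!r} not found; available columns: {sorted(available)}")
    return title


schema = {"Beschreibung": {"type": "title"}, "Betrag": {"type": "number"}}
print(find_title_column(schema, ["Beschreibung", "Betrag"]))  # Beschreibung
```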

Get Page ID?

Is it possible to get the page id of a database item into the data frame? This way I can map relations without setting resolve_relation_values=True and I won't have a problem with relation values that are named the same.
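Each page object returned by a database query includes id and url fields (the snippet in the next issue reads exactly these), so one option is to keep them as ordinary columns next to the parsed property values. The sketch below works on plain dictionaries; the helper rows_with_page_ids, the notion_id/notion_url column names, and the toy data are assumptions. The resulting list of dicts can be passed to pd.DataFrame, and raw relation UUIDs can then be matched against the notion_id column.

```python
from typing import Dict, List


def rows_with_page_ids(query_results: List[Dict], row_values: List[Dict]) -> List[Dict]:
    """Attach each page's ID and URL to its parsed row (hypothetical helper)."""
    if len(query_results) != len(row_values):
        raise ValueError("query_results and row_values must have the same length")
    return [
        {"notion_id": page["id"], "notion_url": page["url"], **values}
        for page, values in zip(query_results, row_values)
    ]


# Toy query results; the IDs and URLs are placeholders.
results = [
    {"id": "65e04f11-xxxx", "url": "https://www.notion.so/placeholder-1"},
    {"id": "b0ffcb4b-xxxx", "url": "https://www.notion.so/placeholder-2"},
]
rows = rows_with_page_ids(results, [{"Name": "page1"}, {"Name": "page2"}])
print(rows[0])
```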

Better ways for storing notion_urls

The current way of saving notion_urls in notion_df is not perfect:

df.notion_urls = pd.Series([ele["url"] for ele in database_query_results])
df.notion_ids = pd.Series([ele["id"] for ele in database_query_results])

For example, these values disappear after running df.copy(). A better way to store them is needed.
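Two pandas features could serve here; this is a sketch of the options, not a decided design, and the key and index names are assumptions. Using the page IDs as the index keeps them aligned with rows through copying and filtering. DataFrame.attrs, which pandas documents as experimental, holds table-level metadata that df.copy() carries over; storing the URLs as a dict keyed by page ID avoids relying on row positions, which a plain list in attrs would not track after filtering.

```python
import pandas as pd

# Toy query results; in notion-df these would come from the database query.
database_query_results = [
    {"id": "65e04f11-xxxx", "url": "https://www.notion.so/placeholder-1"},
    {"id": "b0ffcb4b-xxxx", "url": "https://www.notion.so/placeholder-2"},
]
df = pd.DataFrame({"Name": ["page1", "page2"]})

# Option 1: page IDs as the index, so they stay aligned with rows.
df.index = pd.Index([ele["id"] for ele in database_query_results], name="notion_id")

# Option 2: table-level metadata in DataFrame.attrs (experimental in pandas),
# keyed by page ID so lookups do not depend on row order.
df.attrs["notion_urls"] = {ele["id"]: ele["url"] for ele in database_query_results}

copied = df.copy()
filtered = copied[copied["Name"] == "page2"]
print(list(copied.index))
print(copied.attrs["notion_urls"][filtered.index[0]])
```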
