workday / prism-python

Python client library for interacting with Workday’s Prism API.

License: Apache License 2.0

Python 100.00%

prism-python's People

Contributors

curtlh jcorbett-wday johnjdailey mwaldronii isabella232 ghas-results jojo10smith wd-mgreynolds

prism-python's Issues

Workday CloudQuery Plugin

Hi Team, hopefully this is the right place to ask; if not, I'd appreciate it if you could direct me.

I'm the founder of cloudquery.io, a high-performance, open-source ELT framework.

Our users are interested in a Workday plugin, but as we cannot maintain all the plugins ourselves, I was curious whether this would be an interesting collaboration: we would help implement an initial source plugin, and you would help maintain it.

This would give your users the ability to easily sync Workday data to their data lakes, data warehouses, or databases using any of the growing list of CQ destination plugins.

Best,
Yevgeny

Update v2 Python example

A quick fix, but this:

# create an empty API table with your schema
table = prism.create_table("my_new_table", schema=schema["fields"])

Should be:

table = prism.create_table(p, "my_new_table", schema=schema["fields"])

Add version to class docstring

Documentation is missing for the class attribute version.

prism-python/prism/prism.py

Line 71 in dd9a7c0

self.version = version
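
A possible docstring, sketched in NumPy style. The class name Prism, the base_url and tenant_name parameters, and the "v2" default are assumptions for illustration, not the current signature.

```python
class Prism:
    """Client for the Workday Prism API (illustrative sketch).

    Parameters
    ----------
    base_url : str
        The URL of the Workday REST API (assumed parameter).
    tenant_name : str
        The name of the Workday tenant (assumed parameter).
    version : str, optional
        The version of the Prism API to use, e.g. "v2" (assumed default).

    Attributes
    ----------
    version : str
        The version of the Prism API this client targets.
    """

    def __init__(self, base_url, tenant_name, version="v2"):
        self.base_url = base_url
        self.tenant_name = tenant_name
        # The attribute the issue asks to document.
        self.version = version
```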

Extra white space

Hi, I'm thinking of contributing to this repo and am familiarizing myself with the code. I noticed some extra whitespace (found with flake8) in two files: _version.py and versioneer.py. An example of the flake8 output for the current file is attached; see the E203 warnings (whitespace before ':').

While this is a relatively small issue and the changes would not affect performance, I would still like to submit a PR for it and will do so. If it is decided that the changes will not be merged, I can move on to the other issues.

Deprecate support for V1 of the Prism API

Version 2 is the future of the Prism API. In order to evolve prism-python to incorporate more features of V2 of the Prism API, we need to deprecate support for V1.

In an effort to make prism-python as intuitive as possible, we will rename many of the existing functions to better align with the nomenclature of the Prism API V2 (e.g.: tables instead of datasets).

While these changes will be considered "breaking", we believe the benefit of increased functionality outweighs the cost of some minor refactoring after the new version is released.
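
One way to soften the break is to keep the old dataset-style names as thin aliases that emit a DeprecationWarning for a release before they are removed. A minimal sketch; create_table here is a stand-in that only echoes its input, and deprecated_alias is a hypothetical helper:

```python
import functools
import warnings


def create_table(name):
    """Stand-in for the renamed V2-style function (hypothetical body)."""
    return {"name": name}


def deprecated_alias(new_func, old_name):
    """Wrap new_func so that calling it under old_name emits a DeprecationWarning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # functools.wraps copied the new name; report the old one instead.
    wrapper.__name__ = old_name
    return wrapper


# The old, dataset-oriented name keeps working for now but warns on every call.
create_dataset = deprecated_alias(create_table, "create_dataset")
```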

Raise exception instead of log message

When calling prism.upload_file(), an error can occur, yet the process continues. Instead of just logging the error, we should raise an exception to stop the process.

prism-python/prism/prism.py

Line 224 in df35c67

logging.warning(f"HTTP status code {r.status_code}: {r.content}")
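
A sketch of raising instead of only logging. PrismAPIError and check_response are hypothetical names; check_response accepts anything with status_code and content attributes, such as a requests.Response.

```python
import logging

logger = logging.getLogger(__name__)


class PrismAPIError(Exception):
    """Raised when the Prism API returns an unsuccessful status code (hypothetical)."""

    def __init__(self, status_code, content):
        super().__init__(f"HTTP status code {status_code}: {content}")
        self.status_code = status_code
        self.content = content


def check_response(r):
    """Return r unchanged if it succeeded; otherwise log the error and raise."""
    if not 200 <= r.status_code < 300:
        logger.error(f"HTTP status code {r.status_code}: {r.content}")
        raise PrismAPIError(r.status_code, r.content)
    return r
```

requests also provides Response.raise_for_status(), which raises requests.HTTPError for 4xx and 5xx responses, although its message does not include the response body.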

Enable create_dataset() to work with V2 of the Prism API

Add an optional parameter named fields that contains the JSON schema for the table:

https://github.com/CurtLH/prism-python/blob/b561848988e10c9f8e98ba41351e07ee755b38f3/prism/prism.py#L132
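
A minimal sketch of threading the optional fields parameter into the request body. The helper name and the assumption that V2 expects the schema under a top-level "fields" key are illustrative and should be checked against the Prism API V2 reference.

```python
def build_create_payload(name, fields=None):
    """Build the JSON body for a create-table request (hypothetical helper).

    name:   the table name.
    fields: optional list of field definitions (the table's JSON schema).
            When omitted, the body contains only the name, as before.
    """
    payload = {"name": name}
    if fields is not None:
        # Copy so later changes to the caller's list don't alter the payload.
        payload["fields"] = list(fields)
    return payload
```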

Do not hard code bucket schema

Hard-coding the bucket schema means that no other file configuration can be used (e.g.: a different delimiter, a different number of header rows to skip, etc.).

Instead, we should look to see if we can dynamically pick up the existing schema from the API.

prism-python/prism/prism.py

Lines 390 to 402 in 276b5ed

# The "header" for the load schema
bucket_schema = {
    "parseOptions": {
        "fieldsDelimitedBy": ",",
        "fieldsEnclosedBy": '"',
        "headerLinesToIgnore": 1,
        "charset": {"id": "Encoding=UTF-8"},
        "type": {"id": "Schema_File_Type=Delimited"},
    }
}

# The footer for the load schema
schema_version = {"id": "Schema_Version=1.0"}
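
The preferred fix is to pick up the existing schema from the API. As an interim step (a different technique from the one proposed), the hard-coded values could become keyword arguments whose defaults reproduce the current behavior. The function name and the encoding parameter are hypothetical, and whether the API accepts charset ids other than "Encoding=UTF-8" is not confirmed here.

```python
def build_bucket_schema(delimiter=",", enclosed_by='"', header_lines=1,
                        encoding="UTF-8"):
    """Return the parse-options "header" for a wBucket load schema.

    The defaults match the values currently hard-coded in prism.py; callers
    can override them for other delimited file layouts.
    """
    return {
        "parseOptions": {
            "fieldsDelimitedBy": delimiter,
            "fieldsEnclosedBy": enclosed_by,
            "headerLinesToIgnore": header_lines,
            "charset": {"id": f"Encoding={encoding}"},
            "type": {"id": "Schema_File_Type=Delimited"},
        }
    }
```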

Change logging reference from dataset to table

With the changes made in db7635b, logging messages should also reference tables instead of datasets.

prism-python/prism/prism.py

Line 345 in db7635b

logging.info("Successfully obtained information about your datasets")

NameError: name 'prism_endpoint' is not defined

Within the file prism.py, the following:

prism-python/prism/prism.py

Line 265 in 790a0f8

url = prism_endpoint + "/wBuckets"

Should be changed to: url = self.prism_endpoint + "/wBuckets"
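
A stripped-down illustration of the fix; the class name and constructor are hypothetical. Because prism_endpoint is stored on the instance, a bare prism_endpoint inside a method is looked up as a local or global name, finds neither, and raises NameError.

```python
class Prism:
    """Minimal stand-in for the client class (hypothetical)."""

    def __init__(self, prism_endpoint):
        self.prism_endpoint = prism_endpoint

    def buckets_url(self):
        # Instance attributes must be read through self.
        return self.prism_endpoint + "/wBuckets"
```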

AttributeError: module 'prism' has no attribute 'upload'

Error message returned when using prism upload

Add PyTest

Add a simple unit test to make sure the package successfully imports. Unit tests can be expanded in the future to offer greater coverage.

Change name of wBucket

When creating a new wBucket, the name is created as follows:

prism-python/prism/prism.py

Line 179 in 89f7902

"name": "bucket_" + str(random.randint(100000, 999999)),

This name should be changed to something like prismpython_123456. This change would enable better auditing of wBuckets created using this package.
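
A sketch of the proposed naming. make_bucket_name and its prefix parameter are hypothetical; the six-digit suffix mirrors the existing random.randint(100000, 999999) call.

```python
import random


def make_bucket_name(prefix="prismpython"):
    """Return a wBucket name such as prismpython_123456.

    A package-specific prefix makes buckets created by prism-python easy to
    spot when auditing.
    """
    return f"{prefix}_{random.randint(100000, 999999)}"
```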

Add documentation

Documentation should be built for this project using Sphinx. The documentation should be built and uploaded to gh-pages using the GitHub action Sphinx to GitHub Pages V3. To use this feature, we must enable the GitHub action to build and deploy the Sphinx documentation as described here.
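
For reference, a minimal docs/conf.py that such a build could start from. The path and the choice of extensions are assumptions: sphinx.ext.autodoc generates API pages from docstrings, and sphinx.ext.napoleon lets it parse NumPy- and Google-style docstrings.

```python
# docs/conf.py (hypothetical path)
import os
import sys

# Make the package importable for autodoc when building from docs/.
sys.path.insert(0, os.path.abspath(".."))

project = "prism-python"

extensions = [
    "sphinx.ext.autodoc",   # generate API pages from docstrings
    "sphinx.ext.napoleon",  # parse NumPy- and Google-style docstrings
]

html_theme = "alabaster"  # Sphinx's built-in default theme
```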

Create a function to search all tables by name

Current Issue

One way of finding the table_wid of an existing table is as follows:

all_tables = p.list_tables()
for table in all_tables['data']:
    if table['name'] == "my_new_table":
        print(table)
        break

This snippet of code is not only wordy but also error-prone if you have more than 100 tables, as that is the maximum number returned by a single call to list_tables().

Proposed Solution

I propose we create a new function that is something like find_table("table_name") that will search all of your existing tables, on all available pages of results, and if the search string is found, that table will be returned. If multiple tables contain the same search string, maybe we return them all in a list? The function should also indicate what has or has not been found through logging messages to the user.

Example Usage

# find the table
> table = p.find_table("table_name_BDS")
2020-12-07 01:07:39 INFO: Found 1 table(s) containing "table_name_BDS"

# inspect table data
> type(table)
dict

# find the table
> table = p.find_table("BDS")
2020-12-07 01:07:39 INFO: Found 10 table(s) containing "BDS"

# inspect table data
> type(table)
list
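
A sketch of the proposed behavior as a standalone function rather than a method on p. It assumes a hypothetical list_page(offset, limit) callable wrapping a paginated list_tables() call, and that each page is a dict with "data" and "total" keys; both should be checked against the real API response.

```python
import logging

logger = logging.getLogger(__name__)


def find_table(list_page, search, page_size=100):
    """Search every page of tables for names containing `search`.

    Returns a single dict for one match, a list for several, None for none.
    """
    matches = []
    offset = 0
    while True:
        page = list_page(offset, page_size)
        tables = page.get("data", [])
        matches.extend(t for t in tables if search in t.get("name", ""))
        offset += len(tables)
        # Stop on an empty page or once every reported row has been seen.
        if not tables or offset >= page.get("total", 0):
            break

    logger.info('Found %d table(s) containing "%s"', len(matches), search)
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else matches
```

Returning a dict for one match and a list for several mirrors the example usage above, though it means callers have to check the type; always returning a list would be simpler to consume.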

Retry failed requests

If a request fails, we should retry it before moving on. The parameters for retry (e.g.: number of attempts, back-off time, etc.) should be configurable by the user, but we should choose sensible defaults.

Check out #35636367 on StackOverflow for an example of how the Retry class can be mounted to a Requests session. For more information about the Retry class, refer to the documentation.
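
A sketch of the approach referenced above, mounting urllib3's Retry on a requests Session through HTTPAdapter. make_session is a hypothetical helper, and its defaults are placeholders rather than recommendations.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total=3, backoff_factor=0.5,
                 status_forcelist=(429, 500, 502, 503, 504)):
    """Return a requests.Session that retries failed requests.

    total:            maximum number of retries per request.
    backoff_factor:   scales the exponential sleep between attempts.
    status_forcelist: HTTP status codes that trigger a retry.
    """
    retry = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the retry policy to every URL with these schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

By default, Retry only retries status-code and read failures for idempotent methods, which excludes POST; retrying uploads or other POST calls would need the allowed_methods argument (urllib3 1.26 and later).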

Support V3 of the Prism API

Remove $ from examples in README.md

The following section, among others:

$ export workday_base_url=<INSERT WORKDAY BASE URL HERE>
$ export workday_tenant_name=<INSERT WORKDAY TENANT NAME HERE>
$ export prism_client_id=<INSERT PRISM CLIENT ID HERE>
$ export prism_client_secret=<INSERT PRISM CLIENT SECRET HERE>
$ export prism_refresh_token=<INSERT PRISM REFRESH TOKEN HERE>

Should be changed to:

export workday_base_url=<INSERT WORKDAY BASE URL HERE>
export workday_tenant_name=<INSERT WORKDAY TENANT NAME HERE>
export prism_client_id=<INSERT PRISM CLIENT ID HERE>
export prism_client_secret=<INSERT PRISM CLIENT SECRET HERE>
export prism_refresh_token=<INSERT PRISM REFRESH TOKEN HERE>
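
Once exported, the variables can be read back in Python. A minimal sketch; load_credentials is a hypothetical helper, and only the variable names come from the README.

```python
import os

# Names match the variables exported in the README example.
REQUIRED_VARS = (
    "workday_base_url",
    "workday_tenant_name",
    "prism_client_id",
    "prism_client_secret",
    "prism_refresh_token",
)


def load_credentials(environ=os.environ):
    """Return the Prism settings as a dict, failing loudly if any are unset."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```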

Add verbose flag when using prism list CLI

If the verbose flag isn't passed, only print out the table name and table id. If the verbose flag is passed, print out all the details.
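
A sketch of the proposed flag using the standard library's argparse (the project's CLI may use a different framework). It assumes each table is a dict with "name" and "id" keys; format_tables and main are hypothetical, and main takes the tables as an argument instead of fetching them from the API.

```python
import argparse


def format_tables(tables, verbose=False):
    """Return one output line per table.

    Without verbose, only the table name and id are shown; with verbose,
    every key/value pair is included.
    """
    lines = []
    for table in tables:
        if verbose:
            lines.append(", ".join(f"{k}={v}" for k, v in table.items()))
        else:
            lines.append(f"{table['name']}\t{table['id']}")
    return lines


def main(argv=None, tables=()):
    parser = argparse.ArgumentParser(prog="prism list")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print all details for each table")
    args = parser.parse_args(argv)
    for line in format_tables(tables, verbose=args.verbose):
        print(line)
```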

Use Versioneer to manage version numbers

What is Versioneer?

This is a tool for managing a recorded version number in distutils-based python projects. The goal is to remove the tedious and error-prone "update the embedded version string" step from your release process. Making a new release should be as easy as recording a new tag in your version-control system, and maybe making new tarballs.
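
Versioneer is configured through a [versioneer] section in setup.cfg, and setup.py then passes versioneer.get_version() and versioneer.get_cmdclass() to setup(). A sketch of that section for this repository; the prism/_version.py location, the empty tag prefix, and the prism-python- parent-directory prefix are assumptions.

```ini
[versioneer]
VCS = git
style = pep440
versionfile_source = prism/_version.py
versionfile_build = prism/_version.py
tag_prefix =
parentdir_prefix = prism-python-
```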

Update CLI

With the recent shift in focus from datasets to tables, the CLI has become outdated and needs attention.

If a request is not successful, return the reason

Currently, if a request is not successful, only the status code is returned. However, additional information could be returned to better identify the issue.

For example, if p.complete_bucket(bucket["id"]) returns a 400 status code, we should also return the content of the response.

> r.content
b'{"error":"invalid request: validation errors","errors":[{"error":"A different wBucket is in the Processing stage and is currently loading data to the same target table."}]}'
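
A sketch of turning a body like the one above into readable messages. extract_error_messages is hypothetical and assumes the "error"/"errors" shape shown in this single example; anything else falls back to the raw text.

```python
import json


def extract_error_messages(content):
    """Collect human-readable messages from a Prism error response body."""
    if isinstance(content, bytes):
        text = content.decode("utf-8", errors="replace")
    else:
        text = content
    try:
        body = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: return the raw body so no information is lost.
        return [text]
    messages = []
    if isinstance(body, dict):
        if body.get("error"):
            messages.append(body["error"])
        for item in body.get("errors") or []:
            if isinstance(item, dict) and item.get("error"):
                messages.append(item["error"])
    return messages or [text]
```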
