prism-python's People

Contributors

curtlh, jcorbett-wday, mwaldronii, wd-mgreynolds

prism-python's Issues

Workday CloudQuery Plugin

Hi Team, hopefully this is the right place to ask. If not, I'd appreciate it if you could direct me.

I'm the founder of cloudquery.io, a high performance open source ELT framework.

Our users are interested in a Workday plugin, but as we cannot maintain all the plugins ourselves, I was curious if this would be an interesting collaboration, where we would help implement an initial source plugin, and you will help maintain it.

This will give your users the ability to sync Workday data to any of their datalakes/data-warehouses/databases easily using any of the growing list of CQ destination plugins.

Best,
Yevgeny

Update v2 Python example

A quick fix, but this:

# create an empty API table with your schema
table = prism.create_table("my_new_table", schema=schema["fields"])

Should be:
table = prism.create_table(p, "my_new_table", schema=schema["fields"])

Extra white space

Hi, I'm thinking of contributing to this repo and I'm familiarizing myself with the code. I noticed a few whitespace issues (found with flake8) in two of the files: _version.py and versioneer.py. An example of the flake8 output for the current file is attached; see the E203 (whitespace before ':') warnings.

[image: flake8 output]

While this is a relatively small issue and the changes would not affect the performance, I would still like to submit a PR for this and will do so. If it is decided that the changes will not be merged I can move on to trying to look at the other issues.

Deprecate support for V1 of the Prism API

Version 2 is the future of the Prism API. In order to evolve prism-python to incorporate more features of V2 of the Prism API, we need to deprecate support for V1.

In an effort to make prism-python as intuitive as possible, we will be renaming many of the existing functions to better align with the nomenclature of the Prism API V2 (e.g., tables instead of datasets).

While these changes will be considered "breaking", we believe the benefit of increased functionality outweighs the cost of some minor refactoring after the new version is released.

Do not hard code bucket schema

Hard-coding the bucket schema means that no other file configuration can be used (e.g., a different delimiter or a different number of skipped header rows).

Instead, we should look to see if we can dynamically pick up the existing schema from the API.

prism-python/prism/prism.py

Lines 390 to 402 in 276b5ed

# The "header" for the load schema
bucket_schema = {
    "parseOptions": {
        "fieldsDelimitedBy": ",",
        "fieldsEnclosedBy": '"',
        "headerLinesToIgnore": 1,
        "charset": {"id": "Encoding=UTF-8"},
        "type": {"id": "Schema_File_Type=Delimited"},
    }
}
# The footer for the load schema
schema_version = {"id": "Schema_Version=1.0"}
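
Short of reading the schema back from the API, one possible interim step is to expose the parse options as arguments whose defaults match the hard-coded values above. This is only a sketch: `build_bucket_schema` and its parameter names are hypothetical and are not part of prism-python.

```python
def build_bucket_schema(delimiter=",", enclosed_by='"', header_lines_to_ignore=1):
    """Return the "header" of the load schema with configurable parse options.

    Hypothetical helper: the defaults reproduce the currently hard-coded values.
    """
    return {
        "parseOptions": {
            "fieldsDelimitedBy": delimiter,
            "fieldsEnclosedBy": enclosed_by,
            "headerLinesToIgnore": header_lines_to_ignore,
            "charset": {"id": "Encoding=UTF-8"},
            "type": {"id": "Schema_File_Type=Delimited"},
        }
    }
```

Calling `build_bucket_schema()` with no arguments keeps today's behavior, while `build_bucket_schema(delimiter="|", header_lines_to_ignore=0)` would cover a pipe-delimited file without a header row.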

Add PyTest

Add a simple unit test to make sure the package successfully imports. Unit tests can be expanded in the future to offer greater coverage.

Change name of wBucket

When creating a new wBucket, the name is created as follows:

"name": "bucket_" + str(random.randint(100000, 999999)),

This name should be changed to something like prismpython_123456. This change will enable better auditing of wBuckets created using this package.
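
A small sketch of the renamed bucket name, keeping the same six-digit random suffix. The function name `make_bucket_name` is hypothetical and is not part of the package.

```python
import random


def make_bucket_name(prefix="prismpython"):
    """Build a wBucket name such as prismpython_123456 (hypothetical helper)."""
    return prefix + "_" + str(random.randint(100000, 999999))
```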

Add documentation

Documentation should be built for this project using Sphinx. The documentation should be built and uploaded to gh-pages using the GitHub action Sphinx to GitHub Pages V3. To use this feature, we must enable the GitHub action to build and deploy the Sphinx documentation as described here.

Create a function to search all tables by name

Current Issue

One way of finding the table_wid of an existing table is as follows:

all_tables = p.list_table()
for table in all_tables['data']:
    if table['name'] == "my_new_table":
        print(table)
        break

This snippet is not only wordy but also error-prone if you have more than 100 tables: that is the maximum number returned by a single call to list_table(), so a table beyond the first 100 would never be found.

Proposed Solution

I propose we create a new function that is something like find_table("table_name") that will search all of your existing tables, on all available pages of results, and if the search string is found, that table will be returned. If multiple tables contain the same search string, maybe we return them all in a list? The function should also indicate what has or has not been found through logging messages to the user.

Example Usage

# find the table
> table = p.find_table("table_name_BDS")
2020-12-07 01:07:39 INFO: Found 1 table(s) containing "table_name_BDS"
# inspect table data
> type(table)
dict
# find the table
> table = p.find_table("BDS")
2020-12-07 01:07:39 INFO: Found 10 table(s) containing "BDS"
# inspect table data
> type(table)
list
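
The proposal could be sketched roughly as follows. Everything about paging here is an assumption: `fetch_page(offset, limit)` is a hypothetical callable that returns a dict with a "data" list (the same shape the snippet above reads), and a page shorter than `limit` is taken to mean the last page. Unlike the example usage, this sketch always returns a list, which sidesteps the dict-or-list question at the cost of one extra index for the single-match case.

```python
import logging

logger = logging.getLogger(__name__)


def find_tables(fetch_page, search, page_size=100):
    """Collect every table whose name contains `search`, across all pages.

    `fetch_page(offset, limit)` is a hypothetical callable returning a dict
    like {"data": [{"name": ...}, ...]}.
    """
    matches = []
    offset = 0
    while True:
        rows = fetch_page(offset, page_size).get("data", [])
        matches.extend(t for t in rows if search in t.get("name", ""))
        if len(rows) < page_size:
            # Assumption: a short (or empty) page means there is nothing left.
            break
        offset += page_size
    logger.info('Found %d table(s) containing "%s"', len(matches), search)
    return matches
```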

Retry failed requests

If a request fails, we should retry it before moving on. The parameters for retry (e.g.: number of attempts, back-off time, etc.) should be configurable by the user, but we should choose sensible defaults.

Check out #35636367 on StackOverflow for an example of how the Retry class can be mounted to a Requests session. For more information about the Retry class, refer to the documentation.
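
For reference, mounting the Retry class on a Requests session usually looks something like the sketch below. The function name `build_session` and the default values are assumptions chosen for illustration, not settings taken from prism-python.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total=3, backoff_factor=0.5,
                  status_forcelist=(429, 500, 502, 503, 504)):
    """Return a requests.Session that retries failed requests.

    The defaults are illustrative guesses; callers can override each one.
    """
    retry = Retry(
        total=total,                        # maximum number of retries
        backoff_factor=backoff_factor,      # controls the sleep between attempts
        status_forcelist=status_forcelist,  # HTTP status codes that trigger a retry
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Note that, by default, Retry only retries on status codes for idempotent methods, so POST requests would need to be opted in explicitly if that behavior is wanted.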

Remove $ from examples in README.md

The following section, among others:

$ export workday_base_url=<INSERT WORKDAY BASE URL HERE>
$ export workday_tenant_name=<INSERT WORKDAY TENANT NAME HERE>
$ export prism_client_id=<INSERT PRISM CLIENT ID HERE>
$ export prism_client_secret=<INSERT PRISM CLIENT SECRET HERE>
$ export prism_refresh_token=<INSERT PRISM REFRESH TOKEN HERE>

Should be changed to:

export workday_base_url=<INSERT WORKDAY BASE URL HERE>
export workday_tenant_name=<INSERT WORKDAY TENANT NAME HERE>
export prism_client_id=<INSERT PRISM CLIENT ID HERE>
export prism_client_secret=<INSERT PRISM CLIENT SECRET HERE>
export prism_refresh_token=<INSERT PRISM REFRESH TOKEN HERE>

Use Versioneer to manage version numbers

What is Versioneer?

This is a tool for managing a recorded version number in distutils-based python projects. The goal is to remove the tedious and error-prone "update the embedded version string" step from your release process. Making a new release should be as easy as recording a new tag in your version-control system, and maybe making new tarballs.

Update CLI

With the recent shift in focus from datasets to tables, the CLI has become outdated and needs attention.

If a request is not successful, return the reason

Currently, if a request is not successful, the error code is returned. However, additional information could be returned to better identify the issue.

For example, if p.complete_bucket(bucket["id"]) returns a 400 status code, we should also return the content of the response.

> r.content
b'{"error":"invalid request: validation errors","errors":[{"error":"A different wBucket is in the Processing stage and is currently loading data to the same target table."}]}'
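
One way this might look is sketched below, assuming error bodies follow the JSON shape shown above (a top-level "error" plus an "errors" list). The helper name `describe_failure` is hypothetical, and non-JSON bodies fall back to the decoded raw content.

```python
import json


def describe_failure(status_code, content):
    """Turn a failed response's status code and raw bytes into a readable message."""
    try:
        body = json.loads(content)
    except ValueError:
        # Not JSON (or not decodable): report the raw content instead.
        return "HTTP %d: %s" % (status_code, content.decode("utf-8", errors="replace"))
    if not isinstance(body, dict):
        return "HTTP %d: %s" % (status_code, body)
    message = body.get("error", "unknown error")
    details = [e["error"] for e in body.get("errors", [])
               if isinstance(e, dict) and e.get("error")]
    if details:
        message += " (" + "; ".join(details) + ")"
    return "HTTP %d: %s" % (status_code, message)
```

With a Requests response `r`, this would be called as `describe_failure(r.status_code, r.content)` and the result logged or raised alongside the status code.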
