tap-rest-api's Issues

`No such file or directory` found when running discovery

I'm running the tap within Meltano, specifically with the command meltano invoke tap-rest-api --infer_schema or meltano select --list --all tap-rest-api. I have the following meltano.yml:

version: 1
send_anonymous_usage_stats: false
elt.buffer_size: 52428800
plugins:
  extractors:
  - name: tap-rest-api
    pip_url: tap-rest-api
    namespace: tap_rest_api
    executable: tap-rest-api
    capabilities:
      - catalog
      - config
      - state
      - discover
    settings:
      - name: streams
      - name: url
      - name: catalog_dir
      - name: schema_dir
      - name: schema
      - name: auth_method
    config:
      url: http://<whatever>.com
      auth_method: no_auth
      catalog_dir: ./extract
      schema_dir: ./extract
      streams: test_stream
      schema: test_schema

Here's the full text of the error. The tap appears to be trying to read a file that has not been created yet.

Catalog discovery failed: command ['/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api', '--config', '/Users/.../meltano/.meltano/run/tap-rest-api/tap.config.json', '--discover'] returned 1: INFO Loading Schemas
INFO Loading schema for test_stream
CRITICAL [Errno 2] No such file or directory: './extract/test_stream.json'
Traceback (most recent call last):
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api", line 8, in <module>
    sys.exit(main())
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/singer/utils.py", line 229, in wrapped
    return fnc(*args, **kwargs)
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/__init__.py", line 188, in main
    discover(CONFIG, STREAMS)
  File "/Users.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 64, in discover
    config["schema"])
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 54, in _discover_schemas
    stream)})
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 39, in load_discovered_schema
    schema = load_schema(schema_dir, stream.tap_stream_id)
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 33, in load_schema
    schema = utils.load_json(os.path.join(schema_dir, "{}.json".format(entity)))
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/singer/utils.py", line 108, in load_json
    with open(path) as fil:
FileNotFoundError: [Errno 2] No such file or directory: './extract/test_stream.json'

One concern is that the command Meltano generates uses --discover instead of --infer_schema. So maybe this is a bug in Meltano, or it just demonstrates an incompatibility with Meltano?
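
The traceback shows load_schema opening <schema_dir>/<stream>.json unconditionally. A minimal sketch of a friendlier failure, assuming the schema file is expected to exist before discovery (for example, produced by an earlier --infer_schema run). The function name load_schema_or_explain is hypothetical and is not part of the tap:

```python
import json
import os


def load_schema_or_explain(schema_dir, entity):
    """Load <schema_dir>/<entity>.json, or fail with a hint about the likely cause.

    Hypothetical helper: mirrors the path construction seen in the traceback,
    but checks for the file before opening it.
    """
    path = os.path.join(schema_dir, "{}.json".format(entity))
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Schema file {} not found. Discovery reads an existing schema; "
            "it may need to be generated first, e.g. with --infer_schema.".format(path)
        )
    with open(path) as schema_file:
        return json.load(schema_file)
```

This does not change what discovery does; it only makes the missing prerequisite visible in the error message.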

Use digest for dup check instead of raw record

tap-rest-api keeps a copy of the last extracted record in the bookmark (aka state), together with the last recorded index or timestamp. Because the start index/time of the bookmark is inclusive, the next run uses this copy to skip the record it has already extracted.

However, it is not good practice to include a raw record in the bookmark, mainly for security reasons.

So we should:

  1. Sort the last record by key (excluding the extraction time key).
  2. Create an MD5 digest of it.
  3. Include the digest instead of the raw record in the bookmark.

To ensure backward compatibility, the dup check is made against both the raw record and the digest.
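
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the tap's implementation: the extraction time key name (_etl_tstamp) and the bookmark keys (last_digest, last_record) are hypothetical placeholders.

```python
import hashlib
import json

# Hypothetical name for the extraction time key; the tap's real key may differ.
EXTRACTION_TIME_KEY = "_etl_tstamp"


def record_digest(record, exclude_keys=(EXTRACTION_TIME_KEY,)):
    """Return an MD5 hex digest of the record that does not depend on key order."""
    filtered = {k: v for k, v in record.items() if k not in exclude_keys}
    # sort_keys=True serializes keys in a fixed order;
    # default=str is a fallback for values json cannot encode (e.g. datetime).
    serialized = json.dumps(filtered, sort_keys=True, default=str)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


def is_duplicate(record, bookmark):
    """Compare a record against a bookmark holding a digest or a legacy raw record."""
    if "last_digest" in bookmark:
        return record_digest(record) == bookmark["last_digest"]
    if "last_record" in bookmark:
        # Backward compatibility: older bookmarks still carry the raw record.
        return record_digest(record) == record_digest(bookmark["last_record"])
    return False
```

The digest here only serves as a compact fingerprint for equality checks; it keeps field values out of the state file but is not meant as a cryptographic protection.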

UnboundLocalError: local variable 'streams' referenced before assignment

I'm using this tap within a Meltano pipeline. When I run meltano select --list --all tap-rest-api, I get the following error:

Cannot list the selected attributes: Catalog discovery failed: command ['/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api', '--config', '/Users/.../meltano/.meltano/run/tap-rest-api/tap.config.json', '--discover'] returned 1: CRITICAL local variable 'streams' referenced before assignment
Traceback (most recent call last):
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api", line 8, in <module>
    sys.exit(main())
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/singer/utils.py", line 229, in wrapped
    return fnc(*args, **kwargs)
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/__init__.py", line 177, in main
    for stream in streams:
UnboundLocalError: local variable 'streams' referenced before assignment

Operating System: macOS Catalina

meltano.yml:

version: 1
send_anonymous_usage_stats: false
elt.buffer_size: 52428800
plugins:
  extractors:
  - name: tap-rest-api
    pip_url: tap-rest-api
    namespace: tap_rest_api
    executable: tap-rest-api
    capabilities:
      - catalog
      - config
      - state
      - discover
    settings:
      - name: url
    config:
      url: http://<something>.com

Wrong params for singer.utils.backoff decorator

The Python backoff module lets code retry at a specified interval, up to a given maximum number of tries, when it encounters an exception such as an HTTP server error (5xx).

tap-rest-api uses singer's wrapped version, referenced as singer.utils.backoff.

But the parameters are set as though it were using the native backoff:

@utils.backoff((backoff.expo, requests.exceptions.RequestException), _giveup)
@utils.ratelimit(20, 1)
def generate_request(stream_id, url, auth_method="no_auth", headers=None,
                     username=None, password=None):

https://github.com/anelendata/tap-rest-api/blob/master/tap_rest_api/helper.py#L301

This results in a TypeError (catching classes that do not inherit from BaseException is not allowed) in the backoff routine:

CRITICAL catching classes that do not inherit from BaseException is not allowed
--
Traceback (most recent call last):
File "/app/workspace/proc_01/lib/python3.6/site-packages/backoff/_sync.py", line 94, in retry
ret = target(*args, **kwargs)
File "/app/workspace/proc_01/lib/python3.6/site-packages/singer/utils.py", line 95, in wrapper
return func(*args, **kwargs)
File "/app/workspace/proc_01/lib/python3.6/site-packages/tap_rest_api/helper.py", line 321, in generate_request
resp.raise_for_status()
File "/app/workspace/proc_01/lib/python3.6/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://xxxx
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/workspace/proc_01/bin/tap-rest-api", line 8, in <module>
sys.exit(main())
File "/app/workspace/proc_01/lib/python3.6/site-packages/singer/utils.py", line 229, in wrapped
return fnc(*args, **kwargs)
File "/app/workspace/proc_01/lib/python3.6/site-packages/tap_rest_api/__init__.py", line 191, in main
auth_method, raw=args.raw, filter_by_schema=filter_by_schema)
File "/app/workspace/proc_01/lib/python3.6/site-packages/tap_rest_api/sync.py", line 211, in sync
raise e
File "/app/workspace/proc_01/lib/python3.6/site-packages/tap_rest_api/sync.py", line 208, in sync
filter_by_schema=filter_by_schema)
File "/app/workspace/proc_01/lib/python3.6/site-packages/tap_rest_api/sync.py", line 98, in sync_rows
config.get("password"))
File "/app/workspace/proc_01/lib/python3.6/site-packages/backoff/_sync.py", line 95, in retry
except exception as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
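
The failure can be reproduced without any third-party package. The sketch below uses a simplified, hypothetical stand-in (retry_on) for a wrapper that takes (exceptions, giveup); it is not singer's or backoff's code and does not sleep between tries. It shows that putting a wait generator inside the exceptions tuple only fails once an exception actually has to be caught, which matches the traceback above.

```python
import functools


def expo():
    """Stand-in for backoff.expo: a wait generator, not an exception class."""
    n = 0
    while True:
        yield 2 ** n
        n += 1


def retry_on(exceptions, giveup, max_tries=5):
    """Hypothetical retry wrapper taking (exceptions, giveup); no sleeping."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    # A non-exception object in `exceptions` raises TypeError here.
                    if giveup(exc) or attempt == max_tries:
                        raise
        return wrapper
    return decorator


def never_give_up(exc):
    return False


def make_flaky(failures):
    """Return a function that raises ConnectionError `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("503 Service Unavailable")
        return state["calls"]
    return flaky


# Mirrors the reported call: the wait generator sits inside the exceptions tuple.
broken = retry_on((expo, ConnectionError), never_give_up)(make_flaky(2))

# Only exception classes in the tuple.
fixed = retry_on((ConnectionError,), never_give_up)(make_flaky(2))
```

Assuming singer.utils.backoff takes (exceptions, giveup) and supplies the exponential wait itself (worth checking against the installed singer-python version), the decorator in helper.py would presumably become @utils.backoff((requests.exceptions.RequestException,), _giveup).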

Infer schema mode produces null record that causes "CRITICAL list index out of range" error

--infer_schema mode produces a null-only type in the JSON schema:

.....
"tz": {
    "type": [
        "null"
    ]
},....

and it causes the sync to crash with

...  File "/home/danyel/.virtualenvs/tap-rest-api/lib/python3.7/site-packages/getschema/impl.py", line 300, in fix_type
    on_invalid_property)
  File "/home/danyel/.virtualenvs/tap-rest-api/lib/python3.7/site-packages/getschema/impl.py", line 283, in fix_type
    obj_type = obj_type[1]
IndexError: list index out of range

The sync runs fine when the schema is manually fixed to include a non-null type:

.....
"tz": {
    "type": [
        "null",
        "string"
    ]
},....
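
The manual fix can be automated as a post-processing step on the inferred schema. A minimal sketch, assuming that falling back to "string" is acceptable for fields that were null in every sampled record; the function name widen_null_only_types is hypothetical and this is not what getschema does internally:

```python
def widen_null_only_types(schema, fallback="string"):
    """Return a copy of the schema where "type": ["null"] becomes ["null", fallback].

    The input is not modified; nested "properties" and "items" are handled
    by recursing through every dict and list in the schema.
    """
    if isinstance(schema, list):
        return [widen_null_only_types(item, fallback) for item in schema]
    if not isinstance(schema, dict):
        return schema
    fixed = {}
    for key, value in schema.items():
        if key == "type" and value in ("null", ["null"]):
            # A field that was null in every sample carries no usable type.
            fixed[key] = ["null", fallback]
        else:
            fixed[key] = widen_null_only_types(value, fallback)
    return fixed
```

The choice of fallback is a guess about the data; a field that later turns out to hold numbers or objects would still need a manual correction.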

Invalid type on json schema causes crash

While manually editing the JSON schema for the example, I changed all variables of type "number" to "sting" (rather than "string"). This causes the tap to fail with the message:

UnboundLocalError: local variable 'filtered' referenced before assignment

This is because json2schema.py handles types between lines 172 and 213, with no failsafe for invalid types.

I think the most helpful way to handle this would be adding something like the following before line 213:

else:
  raise Exception("Schema file X contains invalid type Y")

Let me know if you want me to go ahead and make that PR.
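
An alternative to the else branch is to check the schema once, up front, before any record is processed. A rough sketch under assumptions: the function name check_schema_types is hypothetical, and the set of valid names is the seven primitive types of JSON Schema.

```python
VALID_JSON_SCHEMA_TYPES = {
    "array", "boolean", "integer", "null", "number", "object", "string",
}


def check_schema_types(schema, schema_file="schema.json", path="$"):
    """Raise ValueError on the first unknown type name found in the schema."""
    declared = schema.get("type", [])
    if isinstance(declared, str):
        declared = [declared]
    for type_name in declared:
        if type_name not in VALID_JSON_SCHEMA_TYPES:
            raise ValueError(
                "Schema file {} contains invalid type {!r} at {}".format(
                    schema_file, type_name, path
                )
            )
    # Recurse into object properties and array items.
    for name, sub_schema in schema.get("properties", {}).items():
        check_schema_types(sub_schema, schema_file, "{}.{}".format(path, name))
    items = schema.get("items")
    if isinstance(items, dict):
        check_schema_types(items, schema_file, path + "[]")
```

With the typo from this report, the error would name both the offending type ("sting") and the property it belongs to, instead of surfacing as an UnboundLocalError.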

Support POST as HTTP method

This particular API I'm dealing with (aXcelerate) requires that some requests be sent as POSTs. Could the request type be set in the config?
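
One possible shape for this, sketched under assumptions: http_method is a hypothetical config key that does not exist in the tap today, and urllib from the standard library is used only to keep the example self-contained (the tap itself uses requests).

```python
import json
import urllib.request

ALLOWED_METHODS = {"GET", "POST"}


def build_request(config, url, body=None):
    """Build a request whose HTTP method comes from the config (default GET)."""
    method = config.get("http_method", "GET").upper()  # hypothetical config key
    if method not in ALLOWED_METHODS:
        raise ValueError("Unsupported http_method: {}".format(method))
    headers = dict(config.get("headers", {}))
    data = None
    if method == "POST" and body is not None:
        # Send the body as JSON; other encodings would need another option.
        data = json.dumps(body).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Defaulting to GET would keep existing configs working unchanged.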

Fully support multiple streams

Things to do:

  1. Extend the config format to accommodate multiple endpoint templates. I'm thinking of making the config's url field accept either a string or a dictionary (key=stream, value=url).
  2. Same for the index, datetime, and timestamp keys.
  3. Same for record list and record level.
  4. Loop through the streams during infer_schema and sync (schema files are generated per stream, but the catalog should be consolidated).
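
Items 1 to 3 share one pattern: a config value is either a single value applied to every stream or a dictionary keyed by stream. A minimal sketch of that normalization; the helper name resolve_per_stream is hypothetical:

```python
def resolve_per_stream(value, streams):
    """Expand a config value into a {stream: value} dictionary.

    A plain value is applied to every stream; a dictionary must
    provide an entry for each stream.
    """
    if isinstance(value, dict):
        missing = [stream for stream in streams if stream not in value]
        if missing:
            raise KeyError("No config entry for stream(s): {}".format(", ".join(missing)))
        return {stream: value[stream] for stream in streams}
    return {stream: value for stream in streams}
```

Normalizing once at startup would let the infer_schema and sync loops look values up by stream without caring which form the config used.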

OAuth refresh token support

Currently, tap-rest-api doesn't fully support OAuth. It can customize the header, so if the developer can manually obtain the token and the token does not expire, the tap still works with OAuth-protected APIs. That isn't the case for most services.
OAuth usually provides a refresh token with which we can obtain a new access token after the old one expires.

It's probably possible to implement a generic OAuth refresh token flow. If we can achieve this, we just need to set a refresh token in the config, then run the flow to obtain the token.

I recently implemented such a flow in the PyPardotSF project, and I'm hoping to reuse some of the logic from it.
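
A generic refresh could look roughly like the sketch below, which follows the refresh_token grant of OAuth 2.0 (RFC 6749, section 6). It is not taken from PyPardotSF. The function names are hypothetical, and it assumes the provider accepts client credentials in the form body; some providers expect HTTP Basic authentication instead.

```python
import json
import time
import urllib.parse
import urllib.request


def build_refresh_request(token_url, refresh_token, client_id, client_secret):
    """Build the POST request that exchanges a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def parse_token_response(raw_body, now=None):
    """Return (access_token, expires_at) from the token endpoint's JSON body."""
    payload = json.loads(raw_body)
    now = time.time() if now is None else now
    # expires_in is optional in the spec; treat a missing value as "expired now".
    return payload["access_token"], now + payload.get("expires_in", 0)


def fetch_access_token(token_url, refresh_token, client_id, client_secret):
    """Perform the refresh over the network and return (access_token, expires_at)."""
    request = build_refresh_request(token_url, refresh_token, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_token_response(response.read())
```

The tap could call fetch_access_token before the first request and again whenever expires_at has passed, then place the token in the Authorization header it already knows how to customize.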
