Giter VIP home page Giter VIP logo

target-snowflake's Introduction

Target Snowflake

A Singer Snowflake target, for use with Singer streams generated by Singer taps.

Snowflake Connector

Docs

Install

pip install target-snowflake

Usage

  1. Follow the Singer.io Best Practices for setting up separate tap and target virtualenvs to avoid version conflicts.

  2. Create a config file at ~/singer.io/target_snowflake_config.json with Snowflake connection information and target Snowflake schema and warehouse.

    {
      "snowflake_account": "https://XXXXX.snowflakecomputing.com",
      "snowflake_username": "myuser",
      "snowflake_role": "myrole",
      "snowflake_password": "1234",
      "snowflake_database": "my_analytics",
      "snowflake_schema": "mytapname",
      "snowflake_warehouse": "dw"
    }

1. Run `target-snowfkajke` against a [Singer](https://singer.io) tap.

 ```bash
 ~/.virtualenvs/tap-something/bin/tap-something \
   | ~/.virtualenvs/target-snowflake/bin/target-snowflake \
     --config ~/singer.io/target_snowflake_config.json >> state.json

If you are running windows, the following is equivalent:

venvs\tap-exchangeratesapi\Scripts\tap-exchangeratesapi.exe | ^
venvs\target-snowflake\Scripts\target-snowlfake.exe ^
--config target_snowflake_config.json

Config.json

The fields available to be specified in the config file are specified here.

Field Type Default Details
snowflake_account ["string"] N/A ACCOUNT might require the region and cloud platform where your account is located, in the form of: <your_account_name>.<region_id>.<cloud> (e.g. xy12345.east-us-2.azure) Refer to Snowflake's documentation about Account
snowflake_username ["string"] N/A
snowflake_password ["string", "null"] null
snowflake_role ["string"] null If not specified, Snowflake will use the user's default role.
snowflake_database ["string"] N/A
snowflake_authenticator ["string"] "snowflake" Speifies the authentication provider for snowflake to use. Valud options are the internal one ("snowflake"), a browser session ("externalbrowser"), or Okta ("https://<your_okta_account_name>.okta.com"). See the snowflake docs for more details.
snowflake_schema ["string", "null"] "PUBLIC"
snowflake_warehouse ["string"] N/A
invalid_records_detect ["boolean", "null"] true Include false in your config to disable crashing on invalid records
invalid_records_threshold ["integer", "null"] 0 Include a positive value n in your config to allow at most n invalid records per stream before giving up.
disable_collection ["string", "null"] false Include true in your config to disable Singer Usage Logging.
logging_level ["string", "null"] "INFO" The level for logging. Set to DEBUG to get things like queries executed, timing of those queries, etc. See Python's Logger Levels for information about valid values.
persist_empty_tables ["boolean", "null"] False Whether the Target should create tables which have no records present in Remote.
max_batch_rows ["integer", "null"] 200000 The maximum number of rows to buffer in memory before writing to the destination table in Postgres
max_buffer_size ["integer", "null"] 104857600 (100MB in bytes) The maximum number of bytes to buffer in memory before writing to the destination table in Postgres
max_buffer_seconds ["integer", "null"] 900 (15 minutes in seconds) The maximum number of seconds to buffer in memory before writing to the destination table in Postgres
batch_detection_threshold ["integer", "null"] 5000, or 1/40th max_batch_rows How often, in rows received, to count the buffered rows and bytes to check if a flush is necessary. There's a slight performance penalty to checking the buffered records count or bytesize, so this controls how often this is polled in order to mitigate the penalty. This value is usually not necessary to set as the default is dynamically adjusted to check reasonably often.
batch_force_flush ["boolean", "null"] False Whether all buffered data should be force flushed every batch_detection_threshold, effectively making that a global cap below max_batch_rows. The reason for doing this is that smaller schemas from earlier in the stream that never exceed the batch size and stop getting new records can completely block state emission for larger schemas that come after. Setting this forces everything that's buffered to be flushed and unblock state emission.
state_support ["boolean", "null"] True Whether the Target should emit STATE messages to stdout for further consumption. In this mode, which is on by default, STATE messages are buffered in memory until all the records that occurred before them are flushed according to the batch flushing schedule the target is configured with.
target_s3 ["object", "null"] N/A When included, use S3 to stage files. See S3 below

S3 Config.json

Field Type Default Details
aws_access_key_id ["string"] N/A
aws_secret_access_key ["string"] N/A
bucket ["string"] N/A Bucket where staging files should be uploaded to.
key_prefix ["string", "null"] "" Prefix for staging file uploads to allow for better delineation of tmp files

Limitations

  • Snowflake SQL Identifiers:
    • Although Snowflake supports quoted identifiers to have non-alphanumeric values, target-snowflake limits identifiers to uppercase alphanumerics, and underscores
    • This is done to make querability/useability in Snowflake simpler, so as to not require users to have to use sometimes cumbersome quotes to query their data
  • Requires a JSON Schema for every stream.
  • Only string, string with date-time format, integer, number, boolean, object, and array types with or without null are supported. Arrays can have any of the other types listed, including objects as types within items.
    • Example of JSON Schema types that work
      • ['number']
      • ['string']
      • ['string', 'null']
      • ['string', 'integer']
      • ['integer', 'number']
    • Exmaple of JSON Schema types that DO NOT work
      • ['any']
      • ['null']
  • JSON Schema combinations such as anyOf and allOf are not supported.
  • JSON Schema $ref is partially supported:
    • NOTE: The following limitations are known to NOT fail gracefully
    • Presently you cannot have any circular or recursive $refs
    • $refs must be present within the schema:
      • URI's do not work
      • if the $ref is broken, the behaviour is considered unexpected
  • Any values which are the string \\N will be streamed to Snowflake as the literal null

target-snowflake's People

Contributors

alexandermann avatar nsmith22 avatar aaboyd avatar kborders01 avatar awm33 avatar ers81239 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.