
csv-component-v3's Introduction


CSV Component


Description

A component to read and write Comma Separated Values (CSV) files.

How it works

The component can read the CSV file from a remote URL or from the message attachment. It can also write a CSV file from the incoming events.

Requirements

Environment variables

| Name | Mandatory | Description | Values |
|---|---|---|---|
| EIO_REQUIRED_RAM_MB | false | Memory allocated to the component, in MB | Recommended: 512/1024 |
| REQUEST_TIMEOUT | false | HTTP request timeout, in milliseconds | Default value: 10000 |
| REQUEST_RETRY_DELAY | false | Delay between retry attempts, in milliseconds | Default value: 7000 |
| REQUEST_MAX_RETRY | false | Number of HTTP request retry attempts | Default value: 7 |
| REQUEST_MAX_CONTENT_LENGTH | false | Maximum size of an HTTP request, in bytes | Default value: 10485760 (10 MB) |
| TIMEOUT_BETWEEN_EVENTS | false | Time, in milliseconds, the write action waits for the next event before closing the current CSV file; later events go into a separate attachment | Default value: 10000 |
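To make the defaults above concrete, here is a minimal Python sketch of how a client could read these variables and apply the retry settings. It is not the component's own code: the helper names `load_settings` and `call_with_retry` are hypothetical, and treating REQUEST_MAX_RETRY as the number of retries after the first attempt is an assumption.

```python
import os
import time
from typing import Callable, Dict, Mapping, TypeVar

T = TypeVar("T")

# Documented defaults from the table above (milliseconds, attempts, bytes).
DEFAULTS: Dict[str, int] = {
    "REQUEST_TIMEOUT": 10000,
    "REQUEST_RETRY_DELAY": 7000,
    "REQUEST_MAX_RETRY": 7,
    "REQUEST_MAX_CONTENT_LENGTH": 10485760,
    "TIMEOUT_BETWEEN_EVENTS": 10000,
}


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read each variable from the environment, falling back to its default."""
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}


def call_with_retry(request: Callable[[], T], settings: Dict[str, int],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical retry loop: one first attempt plus up to REQUEST_MAX_RETRY retries."""
    retries = max(settings["REQUEST_MAX_RETRY"], 0)
    for attempt in range(retries + 1):
        try:
            return request()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(settings["REQUEST_RETRY_DELAY"] / 1000)  # ms -> seconds
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the loop testable without real delays.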

Credentials

The component does not require credentials to function.

Actions

Read CSV attachment

This action reads the CSV attachment of the incoming message, or the file at the specified URL, and outputs the data as JSON objects. The following fields can be used to configure it:

Config Fields

  • Emit Behavior (dropdown, required) - configures the output behavior of the component:
    • Fetch All - the component emits a single message containing an array of all rows;
    • Emit Individually - the component emits one message per row;
    • Emit Batch - the component emits a series of messages, each containing an array of at most Batch Size rows;
  • Skip empty lines (checkbox, optional) - by default, empty lines are parsed; if checked, they are skipped
  • Comment char (string, optional) - if specified, lines starting with this string are skipped

Input Metadata

  • URL - the component fetches this URL and parses the content as a CSV file

  • Contains headers - if true, the first row of parsed data will be interpreted as field names, false by default.

  • Delimiter - The delimiting character. Leave blank to auto-detect it from a list of the most common delimiters, or provide your own.

    For example, with "$" as the delimiter and Contains headers unchecked, this CSV:
    a$b$c$d

    is parsed into this JSON:

    {
     "column0": "a",
     "column1": "b",
     "column2": "c",
     "column3": "d"
    }

  • Convert Data types - numeric and boolean values are converted to their native types instead of remaining strings; false by default.

  • Batch Size - appears only when Emit Behavior is set to Emit Batch: the maximum length of the array in each message.
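The parsing options above, together with Skip empty lines and Comment char, can be approximated with Python's standard csv module. This sketch is illustrative only and is not the component's code: `parse_csv` and `convert` are hypothetical names, auto-detection uses csv.Sniffer restricted to an assumed list of common delimiters, and because the text is split on lines first, quoted fields containing line breaks are not supported.

```python
import csv
from typing import Any, Dict, Iterator, Optional


def convert(value: str) -> Any:
    """Convert Data types: 'true'/'false' -> bool, numeric strings -> int/float."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_csv(text: str, delimiter: Optional[str] = None, contains_headers: bool = False,
              convert_types: bool = False, skip_empty_lines: bool = False,
              comment_char: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row, keyed by header names or column0, column1, ..."""
    lines = text.splitlines()
    if comment_char:
        lines = [line for line in lines if not line.startswith(comment_char)]
    if skip_empty_lines:
        lines = [line for line in lines if line]
    if not delimiter:
        # Blank Delimiter: guess from an assumed list of common delimiters.
        sample = "\n".join(lines[:10])
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|$").delimiter
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None) if contains_headers else None
    for record in reader:
        values = [convert(v) for v in record] if convert_types else record
        keys = header or [f"column{i}" for i in range(len(values))]
        yield dict(zip(keys, values))
```

For example, `parse_csv("a$b$c$d", delimiter="$")` yields the same object as the JSON example above.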

Output Metadata

  • For Fetch All and Emit Batch: an object with the key result, whose value is an array of rows
  • For Emit Individually: each row object fills the entire message body
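The three Emit Behavior options and their output shapes can be modelled as a small generator over already-parsed rows. This is a hedged Python sketch, not the component's code; the option identifiers `fetchAll`, `emitIndividually` and `emitBatch` are hypothetical stand-ins for the dropdown values.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def emit(rows: Iterable[Row], behavior: str, batch_size: int = 0) -> Iterator[Row]:
    """Yield outgoing message bodies for the chosen Emit Behavior."""
    if behavior == "fetchAll":
        # One message with every row: the whole file ends up in memory.
        yield {"result": list(rows)}
    elif behavior == "emitIndividually":
        # Each row object is the entire message body.
        yield from rows
    elif behavior == "emitBatch":
        if batch_size < 1:
            raise ValueError("Batch Size must be a positive integer")
        batch: List[Row] = []
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                yield {"result": batch}
                batch = []
        if batch:  # last, possibly shorter, batch
            yield {"result": batch}
    else:
        raise ValueError(f"unknown Emit Behavior: {behavior}")
```

Fetch All is the only branch that materialises every row at once, which is why the Limitations below warn about its memory usage; Emit Batch holds at most Batch Size rows at a time.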

Limitations

  • With Fetch All, the component must hold the whole file and all resulting objects in memory, which leads to high memory usage
  • With Emit Batch, choose Batch Size carefully: larger values lead to higher memory usage
  • The exception [ERR_STREAM_PREMATURE_CLOSE] may be thrown if the flow is stopped before the component finishes emitting all data in the file, because the stream is closed early

Create CSV From Message Stream

This action combines multiple incoming events into a CSV file until there is a gap of more than 10 seconds between events (the default value of TIMEOUT_BETWEEN_EVENTS). The CSV file is then closed and attached to the outgoing message.
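The grouping rule can be sketched as a pure function over timestamped events. This Python model is an illustration under stated assumptions, not the component's code: it takes (timestamp in ms, event) pairs in arrival order, uses the documented 10000 ms default, and `group_by_inactivity` is a hypothetical name.

```python
from typing import Any, List, Optional, Sequence, Tuple


def group_by_inactivity(events: Sequence[Tuple[int, Any]],
                        timeout_ms: int = 10000) -> List[List[Any]]:
    """Split time-ordered (timestamp_ms, event) pairs into separate CSV files.

    A new file starts whenever the gap since the previous event is more than
    timeout_ms; otherwise the event is appended to the current file.
    """
    files: List[List[Any]] = []
    last_ts: Optional[int] = None
    for ts, event in events:
        if last_ts is None or ts - last_ts > timeout_ms:
            files.append([])  # previous file (if any) is closed here
        files[-1].append(event)
        last_ts = ts
    return files
```

A gap of exactly 10000 ms keeps both events in the same file, since only a gap of more than 10 seconds closes it.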

Config Fields

  • Upload CSV as file to attachments (checkbox, optional) - If checked, the generated CSV data is stored as an attachment. If unchecked, the CSV is placed as a string in the outbound message.

  • Separator (string, optional) - A single character used to delimit the CSV file. Defaults to ","; any other single character can be used.

    For example, with "$" as the separator and Column Order set to column0$column1$column2$column3, this input object:

    {
     "column0": "a",
     "column1": "b",
     "column2": "c",
     "column3": "d"
    }

    is written as this CSV data row:
    a$b$c$d

  • Column Order (string, optional) - A string, delimited with the separator, listing which columns appear in the resulting file and in what order. If omitted, the column order in the resulting file is not deterministic. Column names are trimmed (leading and trailing spaces are removed), for example: 'col 1,col 2 ,col 3, col 4' => ['col 1', 'col 2', 'col 3', 'col 4']

  • New line delimiter (string, optional, defaults to \r\n) - The character sequence used to end each line.

  • Escape formulae (checkbox, optional) - If checked, field values that begin with =, +, -, @, \t, or \r are prepended with a single quote (') to defend against injection attacks, because Excel and LibreOffice automatically parse such cells as formulae.
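The write-side options (separator, trimmed Column Order, new line delimiter, header row and formula escaping) can be sketched with Python's csv.writer. This is a hypothetical illustration rather than the component's implementation: `parse_column_order` and `write_rows` are invented names, and prefixing escaped values with a single quote is assumed from the usual spreadsheet convention.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

# Leading characters that spreadsheets may interpret as a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def parse_column_order(column_order: str, separator: str = ",") -> List[str]:
    """Split Column Order on the separator and trim each column name."""
    return [name.strip() for name in column_order.split(separator)]


def write_rows(rows: Iterable[Dict[str, Any]], columns: List[str], separator: str = ",",
               newline: str = "\r\n", include_headers: bool = True,
               escape_formulae: bool = False) -> str:
    """Write dict rows as CSV text, one line per row, in the given column order."""

    def cell(value: Any) -> str:
        text = "" if value is None else str(value)
        if escape_formulae and text.startswith(FORMULA_PREFIXES):
            text = "'" + text  # assumed escape: prefix with a single quote
        return text

    out = io.StringIO()
    writer = csv.writer(out, delimiter=separator, lineterminator=newline)
    if include_headers:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([cell(row.get(col)) for col in columns])
    return out.getvalue()
```

With Column Order "column0$column1$column2$column3" and "$" as the separator, the object {"column0": "a", ..., "column3": "d"} becomes the row a$b$c$d.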

Input Metadata

  • Include Headers - Indicates whether a header row should be included in the generated file.
  • Input Object - Object to be written as a row in the CSV file. If Column Order is specified, the individual properties (columns) can be specified.

Output Metadata

  • If Upload CSV as file to attachments is checked:

    • attachmentUrl - A URL to the CSV output
    • type - Always set to .csv
    • size - Size in bytes of the resulting CSV file
    • attachmentCreationTime - When the attachment was generated
    • attachmentExpiryTime - When the attachment is set to expire
    • contentType - Always set to text/csv
  • If Upload CSV as file to attachments is not checked:

    • csvString - The output CSV as a string inline in the body
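The two checkbox states map onto the output fields as follows, taking the Upload CSV as file to attachments description at face value (checked stores an attachment, unchecked returns a string). In this hedged Python sketch the `upload` callable is a hypothetical stand-in for the platform's attachment storage, and the UTF-8 encoding and ISO 8601 timestamp format are assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Tuple

# Hypothetical uploader: takes the file bytes, returns (URL, expiry time).
Uploader = Callable[[bytes], Tuple[str, datetime]]


def build_output(csv_text: str, upload_as_attachment: bool, upload: Uploader) -> Dict[str, Any]:
    """Shape the outgoing message body for the two checkbox states."""
    if not upload_as_attachment:
        # Unchecked: the CSV travels inline in the message body.
        return {"csvString": csv_text}
    data = csv_text.encode("utf-8")
    created = datetime.now(timezone.utc)
    url, expires = upload(data)
    return {
        "attachmentUrl": url,
        "type": ".csv",
        "size": len(data),  # bytes, not characters
        "attachmentCreationTime": created.isoformat(),
        "attachmentExpiryTime": expires.isoformat(),
        "contentType": "text/csv",
    }
```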

Create CSV From JSON Array

This action converts an incoming array of objects into a CSV file.

Config Fields

  • Upload CSV as file to attachments (checkbox, optional) - If checked, the generated CSV data is stored as an attachment. If unchecked, the CSV is placed as a string in the outbound message.

  • Separator (string, optional) - A single character used to delimit the CSV file. Defaults to ","; any other single character can be used.

    Example with the default separator:
    a,b,c,d

    using ";" as the separator:
    a;b;c;d

  • Column Order (string, optional) - A string, delimited with the separator, listing which columns appear in the resulting file and in what order. If omitted, the column order in the resulting file is not deterministic. Column names are trimmed (leading and trailing spaces are removed), for example: 'col 1,col 2 ,col 3, col 4' => ['col 1', 'col 2', 'col 3', 'col 4']

  • New line delimiter (string, optional, defaults to \r\n) - The character sequence used to end each line.

  • Escape formulae (checkbox, optional) - If checked, field values that begin with =, +, -, @, \t, or \r are prepended with a single quote (') to defend against injection attacks, because Excel and LibreOffice automatically parse such cells as formulae.

Input Metadata

  • Include Headers - Indicates whether a header row should be included in the generated file.
  • Input Array - Array of objects to be written as rows in the CSV file (one row per object, plus the header row if Include Headers is set). If Column Order is specified, the individual properties (columns) can be specified. The component throws an error if the array is empty.
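The row and header rules, and the empty-array error, can be sketched with csv.DictWriter. This is illustrative Python, not the component's code: `json_array_to_csv` is a hypothetical name, and falling back to first-seen key order when Column Order is omitted is an assumption (the component itself makes no ordering guarantee in that case).

```python
import csv
import io
from typing import Any, Dict, List, Optional


def json_array_to_csv(items: List[Dict[str, Any]], include_headers: bool = True,
                      column_order: Optional[List[str]] = None) -> str:
    """One CSV row per object, plus an optional header row; empty input is an error."""
    if not items:
        raise ValueError("Input Array must not be empty")
    # Assumed fallback: without Column Order, use first-seen key order.
    fieldnames = column_order or list(dict.fromkeys(key for item in items for key in item))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            extrasaction="ignore", lineterminator="\r\n")
    if include_headers:
        writer.writeheader()
    writer.writerows(items)  # missing keys become empty cells, extra keys are dropped
    return out.getvalue()
```

Note that Python's csv module writes a nested object as its Python repr, whereas the component writes [object Object], so flat objects are expected either way.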

Output Metadata

  • If Upload CSV as file to attachments is checked:

    • attachmentUrl - A URL to the CSV output
    • type - Always set to .csv
    • size - Size in bytes of the resulting CSV file
    • attachmentCreationTime - When the attachment was generated
    • attachmentExpiryTime - When the attachment is set to expire
    • contentType - Always set to text/csv
  • If Upload CSV as file to attachments is not checked:

    • csvString - The output CSV as a string inline in the body

Triggers

Read CSV file from URL

This trigger reads the CSV file from the URL provided in the configuration fields and outputs the result as JSON objects. It works much like the Read CSV attachment action; the difference is that all settings are provided in the configuration fields rather than in the message body, because triggers do not have input messages.

Config Fields

  • Emit Behavior (dropdown, required) - configures the output behavior of the component:
    • Fetch All - the component emits a single message containing an array of all rows;
    • Emit Individually - the component emits one message per row;
    • Emit Batch - the component emits a series of messages, each containing an array of at most Batch Size rows;
  • Skip empty lines (checkbox, optional) - by default, empty lines are parsed; if checked, they are skipped
  • Comment char (string, optional) - if specified, lines starting with this string are skipped

Input Metadata

  • URL (string, required) - URL of the CSV file to parse

  • Contains headers (boolean, optional) - If true, the first row of parsed data will be interpreted as field names, false by default.

  • Delimiter (string, optional) - The delimiting character. Leave blank to auto-detect it from a list of the most common delimiters, or provide your own.

    For example, with "$" as the delimiter and Contains headers unchecked, this CSV:
    a$b$c$d

    is parsed into this JSON:

    {
     "column0": "a",
     "column1": "b",
     "column2": "c",
     "column3": "d"
    }

  • Convert Data types (boolean, optional) - Numeric and boolean values are converted to their native types instead of remaining strings; false by default.

Output Metadata

  • For Fetch All and Emit Batch: an object with the key result, whose value is an array of rows
  • For Emit Individually: each row object fills the entire message body

Limitations

General

  • You may get the error "Component run out of memory and terminated." at run time. This means the component needs more memory: add the EIO_REQUIRED_RAM_MB environment variable to the component with an appropriate value (e.g. 1024 allocates 1024 MB).
  • The maximum possible size of an attachment is 10 MB.
  • The attachments mechanism does not work with Local Agent Installation.
  • The inbound message in Create CSV From Message Stream, and each element of the array in Create CSV From JSON Array, should be a plain (flat) object: any property whose value is not a primitive type is written as [object Object].

csv-component-v3's People

Contributors

a3a3e1, denyshld, emptyinfinity, if0s, jhorbulyk, khanzadyan, kirill-levitskiy, nazar910, pnedelko, shkarupanick, shulkaolka, stas-fomenko, uaarsen, umkaline, zubairov


csv-component-v3's Issues

Failed to create CSV file with 20000 rows

Below is issue elasticio/csv-component#47.
We need to check whether it is still a bug.


I've built a simple integration flow.

Then, using a simple load-test tool, I sent 20000 requests to it:

#!/bin/bash
loadtest -c 5 -n 20000 "https://in.elastic.io/hook/5de5199b20ea507fdaf1a542?message=Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum"

It all ran well:

[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Max requests:        20000
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Concurrency level:   5
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Agent:               none
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Completed requests:  20000
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Total errors:        0
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Total time:          837.860013091 s
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Requests per second: 24
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Mean latency:        209.4 ms
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO Percentage of the requests served within a certain time
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO   50%      201 ms
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO   90%      213 ms
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO   95%      235 ms
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO   99%      400 ms
[Mon Dec 02 2019 15:34:23 GMT+0100 (Central European Standard Time)] INFO  100%      723 ms (longest request)

At the end, I expected the file uploaded to SFTP to have 20000 rows in it:

[2019-12-02 14:34:33.287]: The resulting CSV file contains 20000 rows
[2019-12-02 14:34:33.287]: Closing the stream due to inactivity
[2019-12-02 14:34:33.290]: Emitting message {"id":"f705ff7c-b18a-4e91-83ac-5a879745d92b","attachments":{"f705ff7c-b18a-4e91-83ac-5a879745d92b.csv":{"content-type":"text/csv","url":"http://steward-service.platform.svc.cluster.local:8200/files/0375788e-5fcd-4931-8208-811c28d1ac3d"}},"body":{"rowCount":20000},"headers":{},"metadata":{}}

Unfortunately, the resulting file was only 1 MB in size and had 1795 rows in it, not 20000.

Attaching the result and the execution log as a ZIP file.

Archive.zip

Add Trigger 'Read CSV from URL'

We are about to deprecate the CSV component.
To make customers' lives easier, we should add this trigger to the new component: the old component has it, so this one should too.

The existing Read CSV attachment action can be used as the base for this trigger. The URL should be moved from the input metadata to the config fields.

Verify that the CSV component can read files of 1GB with default memory config

Other Feature Request

Description

Currently, the max supported attachment size is 100MB (by the platform). It would be nice if the component can handle files of up to 1GB in size without the need to increase the allocated component memory beyond 250 MB. Specifically, we should check the read action when the emit mode is either emit individually or emit batch (for reasonably small batch sizes).

Definition of Done

One of:

  • The component can manipulate files of 1 GB without needing to alter the component code or the available memory (only change the max file size env var.)
  • We make the needed code changes so that the component can manipulate files of 1 GB without needing to increase the available component memory
  • We identify a reason why files of 1 GB can not be manipulated without increasing the component memory.

Emit batch strategy does not render on the UI and does not work in the code as well

Component Bug Report

Description

Description from the Infobip team:
Hi Team,
we are trying to use the Emit Batch configuration for the CSV component but it doesn't have a way to input the batch size and it sends a null to the next step, can you advise?

Component Version

3.1.2

Steps to Reproduce

Actual Result

[What actually happened]

Expected Result

[What you expected to happen]

Workaround(s)

[Any known workaround(s) to circumvent the above issue]

Allow Read CSV to emit rows in fixed block sizes

Other Feature Request

Description

Currently, the Read CSV Attachment action has two options for the Emit Behavior config field: Fetch All & Emit Individually. It lacks an Emit Batch option like the one present in the MongoDB component.

Definition of Done

Ideally, there should be a third option: Emit in Batches. When this option is selected, a new input should be added to the metadata to store Batch Size. The component should then produce a series of messages, where each message has an array of max length equal to the Batch Size. This behavior should be similar to the MongoDB component.

`Batch Size` field in metadata

Internal Issue/Enhancement

Description

When the Read CSV attachment action is selected with Emit Behavior set to Emit Batch, there is no Batch Size field in the input metadata.

Definition of Done

A Batch Size field appears in the input metadata when Emit Behavior is set to Emit Batch.

Improve the component

Other Feature Request

Description

We got feedback from the poss client:
"Partially very limited connectors - e.g. CSV: hardly any setting options for date, separator formats, position coded text formats"

Definition of Done

Define new additional actions applicable to the component.
