arp-g / csv2sql Goto Github PK
A blazing fast, fully-automated CSV to database importer
License: MIT License
%MyXQL.Error{connection_id: 186, message: "(1153) (ER_NET_PACKET_TOO_LARGE) Got a packet bigger than 'max_allowed_packet' bytes", mysql: %{code: 1153, name: :ER_NET_PACKET_TOO_LARGE}, statement: "INSERT INTO GUILD_PROD
.Image
(CreatedBy
,CreatedOn
,Description
,FileName
,ImageData
,ModifiedBy
,ModifiedOn
,Name
,RowTimestamp
,idImage
,idImageCategory
) VALUES (?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?)
,(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?)"}
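A common workaround for ER_NET_PACKET_TOO_LARGE, assuming the importer cannot easily shrink its insert batches, is to raise the server-side packet limit. This is only a sketch: the value below is an illustrative choice, and the statement needs SUPER (or SYSTEM_VARIABLES_ADMIN) privileges on the MySQL server.

```sql
-- Raise the maximum packet size the server accepts from a client.
-- 268435456 bytes = 256 MB; pick a value larger than the biggest batch.
SET GLOBAL max_allowed_packet = 268435456;
```

Alternatively, csv2sql could reduce the number of rows per multi-row INSERT so each statement stays under the server's existing limit.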
While importing CSVs we often come across arbitrary date or datetime formats; such data is currently imported as VARCHAR by csv2sql.
Could we have some way of specifying custom date/datetime formats so that such data can be imported as DATE/DATETIME instead of VARCHAR?
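The requested option could work by trying each user-supplied format in turn during schema inference. A minimal sketch in Python (csv2sql itself is Elixir; the format list and function name here are illustrative, not an existing csv2sql API):

```python
from datetime import datetime

# Hypothetical user-supplied formats; csv2sql does not expose this option yet.
CUSTOM_FORMATS = ["%d/%m/%Y", "%Y-%m-%d %H:%M:%S", "%d-%b-%Y"]

def infer_datetime_format(value, formats=CUSTOM_FORMATS):
    """Return the first format that parses `value`, or None (import as VARCHAR)."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None
```

A column whose sampled values all match one format would then be created as DATE/DATETIME; anything else falls back to VARCHAR as today.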
When the CSV header has an unsupported character such as a double quote ("), the app crashes.
For example, if the header fields are wrapped in " ", we need to remove the " characters from the header:
"Id","IsDeleted","ParentId","OwnerId","CreatedDate","CreatedById"
An option to switch the CSV file's encoding before importing.
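One way such an option could work is to re-encode the file to UTF-8 before the importer reads it. A sketch, assuming the user names the source encoding via a hypothetical `--encoding` option (the function name and defaults are illustrative):

```python
def reencode(src_path, dst_path, src_encoding="latin-1"):
    """Rewrite src_path (in src_encoding) as UTF-8 at dst_path.

    errors="replace" substitutes undecodable bytes instead of crashing.
    """
    with open(src_path, encoding=src_encoding, errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
```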
AN ERROR OCCURED AND FURTHER PROCESSING WAS STOPPED:
%MyXQL.Error{connection_id: 175, message: "(1118) (ER_TOO_BIG_ROWSIZE) Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs", mysql: %{code: 1118, name: :ER_TOO_BIG_ROWSIZE}, statement: "CREATE TABLE accur_recruiting.Contact (Id
VARCHAR(100), IsDeleted
BIT, MasterRecordId
VARCHAR(100), AccountId
VARCHAR(100), Salutation
VARCHAR(100), FirstName
VARCHAR(100), LastName
VARCHAR(100), RecordTypeId
VARCHAR(100), OtherStreet
VARCHAR(100), OtherCity
VARCHAR(100), OtherState
VARCHAR(100), OtherPostalCode
VARCHAR(100), OtherCountry
VARCHAR(100), OtherLatitude
VARCHAR(100), OtherLongitude
VARCHAR(100), OtherGeocodeAccuracy
VARCHAR(100), MailingStreet
VARCHAR(100), MailingCity
VARCHAR(100), MailingState
VARCHAR(100), MailingPostalCode
VARCHAR(100), MailingCountry
VARCHAR(100), MailingLatitude
VARCHAR(100), MailingLongitude
VARCHAR(100), MailingGeocodeAccuracy
VARCHAR(100), Phone
VARCHAR(100), Fax
VARCHAR(100), MobilePhone
VARCHAR(100), HomePhone
VARCHAR(100), OtherPhone
VARCHAR(100), AssistantPhone
VARCHAR(100), ReportsToId
VARCHAR(100), Email
VARCHAR(100), Title
TEXT, Department
VARCHAR(100), AssistantName
VARCHAR(100), LeadSource
VARCHAR(100), Birthdate
VARCHAR(100), Description
TEXT, OwnerId
VARCHAR(100), HasOptedOutOfEmail
BIT, HasOptedOutOfFax
BIT, DoNotCall
BIT, CreatedDate
VARCHAR(100), CreatedById
VARCHAR(100), LastModifiedDate
VARCHAR(100), LastModifiedById
VARCHAR(100), SystemModstamp
VARCHAR(100), LastActivityDate
VARCHAR(100), LastCURequestDate
VARCHAR(100), LastCUUpdateDate
VARCHAR(100), EmailBouncedReason
TEXT, EmailBouncedDate
VARCHAR(100), Jigsaw
VARCHAR(100), JigsawContactId
VARCHAR(100), IndividualId
VARCHAR(100), ts2__Picture_Id__c
VARCHAR(100), Legacy_ID__c
VARCHAR(100), Legacy_File_ID__c
VARCHAR(100), Legacy_File_Resume_ID__c
VARCHAR(100), ts2__Hide_Photo__c
BIT, ts2__Legacy_ContactID__c
VARCHAR(100), ts2__Legacy_DocumentID__c
VARCHAR(100), ts2__ReferrerEmailMessage__c
VARCHAR(100), ts2__ReferrerEmailSubject__c
VARCHAR(100), ts2__SessionKey__c
VARCHAR(100), ts2__Education_School_1__c
VARCHAR(100), ts2__Education_School_2__c
VARCHAR(100), ts2__JobNotificationsSummary__c
BIT, ts2__EEO_Disabled__c
VARCHAR(100), ts2__EEO_Gender__c
VARCHAR(100), ts2__EEO_Race__c
VARCHAR(100), ts2__EEO_Veteran_Status__c
VARCHAR(100), ts2__Referral_Lookup__c
VARCHAR(100), ts2__Verified_Key__c
VARCHAR(100), ts2__Verified__c
BIT, ts2__Geo_Location_Passed__c
BIT, ts2__Latitude__c
DOUBLE, ts2__Longitude__c
DOUBLE, ts2__Resume_Last_Updated__c
VARCHAR(100), ts2__Verified_Date__c
VARCHAR(100), test__c
VARCHAR(100), accur_candidates_industries__c
VARCHAR(100), accur_candidates_functions__c
VARCHAR(100), accur_candidates_linkedin_viadeo__c
TEXT, accur_candidates_facebook__c
VARCHAR(100), accur_candidates_web__c
TEXT, accur_candidates_Travel_Retail_Duty_Free__c
BIT, accur_candidates_Salary_Range__c
VARCHAR(100), accur_candidates_language_list__c
VARCHAR(100), accur_candidates_Excel_Level__c
VARCHAR(100), accur_candidates_Traveling_ability__c
VARCHAR(100), accur_candidates_US_Work_Authorization__c
VARCHAR(100), accur_candidates_Video_Resume__c
VARCHAR(100), accur_currency__c
VARCHAR(100), accur_CV__c
INT, accur_Industry_1__c
VARCHAR(100), accur_Industry_2__c
VARCHAR(100), accur_Industry_3__c
VARCHAR(100), accur_Function_1__c
VARCHAR(100), accur_Function_2__c
VARCHAR(100), accur_Function_3__c
VARCHAR(100), Salary_Range_2__c
VARCHAR(100), accur_Recommended_by__c
TEXT, accur_Relocation__c
TEXT, accur_Email_Pro__c
VARCHAR(100), Percentile_Rankings__c
VARCHAR(100), Skype_username__c
VARCHAR(100), Contact__c
VARCHAR(100), Candidate_Contact_Profile__c
VARCHAR(100), Excel_Test_Result__c
VARCHAR(100), General_Comments__c
VARCHAR(100), Scenario_Date__c
VARCHAR(100), TheDecisionMakers_co_uk__c
VARCHAR(100), Excel_Test_Date__c
VARCHAR(100), Direct_Office_Line__c
VARCHAR(100), Office_Mobile__c
VARCHAR(100), Company_Extension__c
VARCHAR(100), Internal_ITW__c
BIT, ts2__Facebook_Profile__c
VARCHAR(100), ts2__LUID__c
VARCHAR(100), ts2__LinkedIn_Profile__c
TEXT, ts2__MailingCountryText__c
VARCHAR(100), ts2_" <> ...}
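The message points at the remedy: with utf8mb4, each VARCHAR(100) can reserve up to roughly 400 bytes toward the 65,535-byte row limit, so a table this wide overflows it. A hedged sketch of the fix, using names from the error above (which columns to convert is a judgment call):

```sql
-- Converting wide VARCHAR columns to TEXT moves their payload off-row,
-- so they no longer count fully against the 65,535-byte row-size limit.
ALTER TABLE accur_recruiting.Contact MODIFY OtherStreet TEXT;
```

csv2sql could detect when the sum of inferred column widths approaches the limit and fall back to TEXT automatically.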
If an error is encountered during the insertion of data, we should not fail the entire process.
Ideally, we should localize the error to the file in question, or better still, continue with that file after logging the error.
If a file logs more than X errors, we can choose to stop processing that file.
Errors can be written to an error log file saved in the user's CSV directory.
The CSV directory can have a results folder containing the schema SQL file and any error logs.
On the processing end, we can inform users that errors were encountered in the given files and point them to the log file for details.
Changes required here:
In case of errors during schema inference, that particular file can be skipped.
Otherwise the app errors when the number of files is too large and it tries to establish a large number of connections.
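The skip-and-log policy proposed above can be sketched as follows; MAX_ERRORS, the insert callback, and the log list are all illustrative names, not csv2sql internals:

```python
MAX_ERRORS = 100  # hypothetical per-file threshold ("X" above)

def import_file(rows, insert, log):
    """Insert rows one by one; log failures; give up on this file after MAX_ERRORS.

    Other files continue regardless, so one bad file never fails the whole run.
    """
    errors = 0
    for i, row in enumerate(rows):
        try:
            insert(row)
        except Exception as exc:
            errors += 1
            log.append((i, str(exc)))  # row index + reason, for the error log file
            if errors >= MAX_ERRORS:
                break  # stop this file only
    return errors
```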
The order of CSV headers and database columns is not maintained.
This would be a fantastic tool dockerized. If I knew more about creating Docker images I would do it. But it's just a suggestion.
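A minimal multi-stage Dockerfile sketch for an Elixir release is below; the base images, project layout, and release name are all assumptions that would need adjusting to the actual repo:

```dockerfile
# Build stage: compile a prod release inside an official Elixir image.
FROM elixir:1.12 AS build
WORKDIR /app
COPY . .
RUN mix local.hex --force && mix local.rebar --force \
    && mix deps.get && MIX_ENV=prod mix release

# Runtime stage: a slim image that only carries the built release.
FROM debian:bullseye-slim
COPY --from=build /app/_build/prod/rel/csv2sql /csv2sql
ENTRYPOINT ["/csv2sql/bin/csv2sql"]
```

A containerized build would also sidestep the shared-library portability issue described below, since the runtime image pins the C library the release was built against.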
When running the application sometimes MySQL gives an error like:
%MyXQL.Error{connection_id: 9, message: "(1067) (ER_INVALID_DEFAULT) Invalid default value for 'DateModified'",
mysql: %{code: 1067, name: :ER_INVALID_DEFAULT}, statement: "CREATE TABLE csvsql_test.ActivityStream (`ItemId` VARCHAR(100),
`ItemVersion` BIT, `_ItemTimestamp` VARCHAR(100), `ActivityStreamType` INT, `OriginalValue` TEXT, `NewValue` VARCHAR(100),
`CreatorId` VARCHAR(100), `DateCreated` TIMESTAMP, `ModifierId` VARCHAR(100), `DateModified` TIMESTAMP, `OwnerId` VARCHAR(100));"}
Surprisingly this error disappears most of the time, and the app works fine when stopped and started again just after the error occurs.
As evident from the error message, it has something to do with default values for timestamp columns.
I suspect setting some MySQL modes might have something to do with this error. However, I am not sure, since this error occurs randomly even when testing with the same CSV files.
SET GLOBAL SQL_MODE="NO_BACKSLASH_ESCAPES,NO_ENGINE_SUBSTITUTION,NO_ZERO_IN_DATE";
See the MySQL docs regarding the NO_ZERO_IN_DATE mode.
The release for csv2sql is currently built on Ubuntu 20.04; when a user attempted to use it on Ubuntu 18.04 it gave an error like:
../csv2sqk_web/erts-12.0.3/bin/beam.smp: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
It seems the shared library libtinfo.so.6 might not be available on Ubuntu 18.04.
An Elixir release doesn't just depend on your OS flavor: it depends on your processor architecture and C library version (the glibc package). This is because system dependencies remain in place even though Erlang bytecode is platform-independent.
Ideally, we want to support different operating systems like Linux, Windows, and macOS, and the release should run independently of the user's processor architecture.
So far I have found two ways to achieve this:
Use Docker to build the release for different platforms and operating systems. We can have a Docker setup that builds the release for all the operating systems we want to target; however, I am not sure whether this can bypass the target-architecture limitation, and I am also not sure how it will work out for Windows-based Docker containers.
The second approach is using something like Burrito, which uses Bakeware.
Mix release docs about release requirements
This discussion on elixirforum
This blog
Hey,
Loving this project. It's an absolute gem. I can see in your readme that you have plans to add support for Postgres, and I can see a branch for it that looks promising. Just wondering if there is an ETA on Postgres support, as this is exactly what I need to import time-series CSVs from S3 buckets into TimescaleDB?
Keep up the top work!
When a CSV has text data that exceeds the TEXT limit of 65,535 bytes, the overflowing characters are silently cut off while inserting into the DB.
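MySQL offers larger types that avoid the cut-off; a sketch, with a hypothetical table and column name:

```sql
-- MEDIUMTEXT holds up to 16,777,215 bytes and LONGTEXT up to 4 GB,
-- versus 65,535 bytes for TEXT; a larger type avoids silent truncation.
ALTER TABLE my_table MODIFY long_column MEDIUMTEXT;
```

csv2sql's schema inference could pick MEDIUMTEXT or LONGTEXT when a sampled value exceeds the TEXT limit.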
Add an option which, when enabled, will attempt to automatically fix CSV headers if they are not valid SQL column names or if there are duplicate column names.
If a column name has trailing spaces or is an empty string, handle that and also log it.
Currently, it gives an error like
MyXQL.Error{connection_id: 78, message: "(1166) (ER_WRONG_COLUMN_NAME) Incorrect column name 'Sub-Skill Areas '", mysql: %{code: 1166, name: :ER_WRONG_COLUMN_NAME},
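A sketch of such an auto-fix pass in Python (the normalization rules here are assumptions for illustration, not csv2sql behavior): trim whitespace, replace characters that are invalid in column names, name empty columns by position, and suffix duplicates.

```python
import re

def fix_headers(headers):
    """Normalize CSV headers into safe, unique SQL column names (sketch)."""
    seen, fixed = {}, []
    for i, h in enumerate(headers):
        # Trim, then replace anything outside [A-Za-z0-9_]; empty -> positional name.
        name = re.sub(r"[^A-Za-z0-9_]", "_", h.strip()) or f"column_{i}"
        if not re.match(r"[A-Za-z_]", name[0]):  # names can't start with a digit
            name = f"c_{name}"
        # Deduplicate by suffixing the occurrence count.
        seen[name] = seen.get(name, 0) + 1
        fixed.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return fixed
```

For the error above, "Sub-Skill Areas " would become "Sub_Skill_Areas", and each rewrite could be logged so the user can map columns back to the original headers.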
The following rows lead to an error like
%FunctionClauseError{
args: nil,
arity: 2,
clauses: nil,
function: :fixed_integer_impl,
kind: nil,
module: Combine.Parsers.Text
}
"a080W00001U6ghZQAR","0","SCH-0317-116982","2017-03-22 11:08:42","005d0000001KYMqAAO","2017-03-22 11:08:42","005d0000001KYMqAAO","2017-03-22 11:08:42","0030W00003Kn7kTQAR","0","Master","","","�vora University","masters","","Contemporary Art History","1"
This happens due to a bad character (�) in the data "�vora University".
It fails in the call to Timex.parse(item, pattern) in apps/csv2sql/lib/csv2sql/schema_maker.ex, lines 209 and 213.
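A possible mitigation, sketched in Python rather than the project's Elixir: wrap the date check so that any value the parser rejects, including ones carrying invalid bytes, degrades to plain text instead of crashing the run. The function name and pattern are illustrative.

```python
from datetime import datetime

def safe_date_check(value, pattern="%Y-%m-%d %H:%M:%S"):
    """Return True only if `value` cleanly parses as a date; never raise."""
    try:
        datetime.strptime(value, pattern)
        return True
    except (ValueError, TypeError):
        return False
```

A value like "�vora University" would then simply be typed as VARCHAR/TEXT during schema inference.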
When we switch between tabs, the progress gets lost.