arp-g / csv2sql Goto Github PK
A blazing fast, fully-automated CSV to database importer
License: MIT License
%MyXQL.Error{connection_id: 186, message: "(1153) (ER_NET_PACKET_TOO_LARGE) Got a packet bigger than 'max_allowed_packet' bytes", mysql: %{code: 1153, name: :ER_NET_PACKET_TOO_LARGE}, statement: "INSERT INTO GUILD_PROD
.Image
(CreatedBy
,CreatedOn
,Description
,FileName
,ImageData
,ModifiedBy
,ModifiedOn
,Name
,RowTimestamp
,idImage
,idImageCategory
) VALUES (?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?)
,(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?,?,?)"}
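A common workaround for ER_NET_PACKET_TOO_LARGE, assuming the importer cannot easily shrink its insert batches, is to raise the server-side packet limit. This is only a sketch: the value below is an illustrative choice, and the statement needs SUPER (or SYSTEM_VARIABLES_ADMIN) privileges on the MySQL server.

```sql
-- Raise the maximum packet size the server accepts from a client.
-- 268435456 bytes = 256 MB; pick a value larger than the biggest batch.
SET GLOBAL max_allowed_packet = 268435456;
```

Alternatively, csv2sql could reduce the number of rows per multi-row INSERT so each statement stays under the server's existing limit.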
While importing CSVs we often come across arbitrary date or datetime formats; such data is currently imported as VARCHAR by csv2sql.
Could we have some way of specifying custom date/datetime formats so that such data can be imported as DATE/DATETIME instead of VARCHAR?
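The requested option could work by trying each user-supplied format in turn during schema inference. A minimal sketch in Python (csv2sql itself is Elixir; the format list and function name here are illustrative, not an existing csv2sql API):

```python
from datetime import datetime

# Hypothetical user-supplied formats; csv2sql does not expose this option yet.
CUSTOM_FORMATS = ["%d/%m/%Y", "%Y-%m-%d %H:%M:%S", "%d-%b-%Y"]

def infer_datetime_format(value, formats=CUSTOM_FORMATS):
    """Return the first format that parses `value`, or None (import as VARCHAR)."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None
```

A column whose sampled values all match one format would then be created as DATE/DATETIME; anything else falls back to VARCHAR as today.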
When the CSV header has an unsupported character such as a double quote ("), the app crashes.
For example, if the header fields are wrapped in " ", we need to remove the " characters from the header:
"Id","IsDeleted","ParentId","OwnerId","CreatedDate","CreatedById"
An option to switch the CSV file's encoding before importing.
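One way such an option could work is to re-encode the file to UTF-8 before the importer reads it. A sketch, assuming the user names the source encoding via a hypothetical `--encoding` option (the function name and defaults are illustrative):

```python
def reencode(src_path, dst_path, src_encoding="latin-1"):
    """Rewrite src_path (in src_encoding) as UTF-8 at dst_path.

    errors="replace" substitutes undecodable bytes instead of crashing.
    """
    with open(src_path, encoding=src_encoding, errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
```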
AN ERROR OCCURED AND FURTHER PROCESSING WAS STOPPED:
%MyXQL.Error{connection_id: 175, message: "(1118) (ER_TOO_BIG_ROWSIZE) Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs", mysql: %{code: 1118, name: :ER_TOO_BIG_ROWSIZE}, statement: "CREATE TABLE accur_recruiting.Contact (Id
VARCHAR(100), IsDeleted
BIT, MasterRecordId
VARCHAR(100), AccountId
VARCHAR(100), Salutation
VARCHAR(100), FirstName
VARCHAR(100), LastName
VARCHAR(100), RecordTypeId
VARCHAR(100), OtherStreet
VARCHAR(100), OtherCity
VARCHAR(100), OtherState
VARCHAR(100), OtherPostalCode
VARCHAR(100), OtherCountry
VARCHAR(100), OtherLatitude
VARCHAR(100), OtherLongitude
VARCHAR(100), OtherGeocodeAccuracy
VARCHAR(100), MailingStreet
VARCHAR(100), MailingCity
VARCHAR(100), MailingState
VARCHAR(100), MailingPostalCode
VARCHAR(100), MailingCountry
VARCHAR(100), MailingLatitude
VARCHAR(100), MailingLongitude
VARCHAR(100), MailingGeocodeAccuracy
VARCHAR(100), Phone
VARCHAR(100), Fax
VARCHAR(100), MobilePhone
VARCHAR(100), HomePhone
VARCHAR(100), OtherPhone
VARCHAR(100), AssistantPhone
VARCHAR(100), ReportsToId
VARCHAR(100), Email
VARCHAR(100), Title
TEXT, Department
VARCHAR(100), AssistantName
VARCHAR(100), LeadSource
VARCHAR(100), Birthdate
VARCHAR(100), Description
TEXT, OwnerId
VARCHAR(100), HasOptedOutOfEmail
BIT, HasOptedOutOfFax
BIT, DoNotCall
BIT, CreatedDate
VARCHAR(100), CreatedById
VARCHAR(100), LastModifiedDate
VARCHAR(100), LastModifiedById
VARCHAR(100), SystemModstamp
VARCHAR(100), LastActivityDate
VARCHAR(100), LastCURequestDate
VARCHAR(100), LastCUUpdateDate
VARCHAR(100), EmailBouncedReason
TEXT, EmailBouncedDate
VARCHAR(100), Jigsaw
VARCHAR(100), JigsawContactId
VARCHAR(100), IndividualId
VARCHAR(100), ts2__Picture_Id__c
VARCHAR(100), Legacy_ID__c
VARCHAR(100), Legacy_File_ID__c
VARCHAR(100), Legacy_File_Resume_ID__c
VARCHAR(100), ts2__Hide_Photo__c
BIT, ts2__Legacy_ContactID__c
VARCHAR(100), ts2__Legacy_DocumentID__c
VARCHAR(100), ts2__ReferrerEmailMessage__c
VARCHAR(100), ts2__ReferrerEmailSubject__c
VARCHAR(100), ts2__SessionKey__c
VARCHAR(100), ts2__Education_School_1__c
VARCHAR(100), ts2__Education_School_2__c
VARCHAR(100), ts2__JobNotificationsSummary__c
BIT, ts2__EEO_Disabled__c
VARCHAR(100), ts2__EEO_Gender__c
VARCHAR(100), ts2__EEO_Race__c
VARCHAR(100), ts2__EEO_Veteran_Status__c
VARCHAR(100), ts2__Referral_Lookup__c
VARCHAR(100), ts2__Verified_Key__c
VARCHAR(100), ts2__Verified__c
BIT, ts2__Geo_Location_Passed__c
BIT, ts2__Latitude__c
DOUBLE, ts2__Longitude__c
DOUBLE, ts2__Resume_Last_Updated__c
VARCHAR(100), ts2__Verified_Date__c
VARCHAR(100), test__c
VARCHAR(100), accur_candidates_industries__c
VARCHAR(100), accur_candidates_functions__c
VARCHAR(100), accur_candidates_linkedin_viadeo__c
TEXT, accur_candidates_facebook__c
VARCHAR(100), accur_candidates_web__c
TEXT, accur_candidates_Travel_Retail_Duty_Free__c
BIT, accur_candidates_Salary_Range__c
VARCHAR(100), accur_candidates_language_list__c
VARCHAR(100), accur_candidates_Excel_Level__c
VARCHAR(100), accur_candidates_Traveling_ability__c
VARCHAR(100), accur_candidates_US_Work_Authorization__c
VARCHAR(100), accur_candidates_Video_Resume__c
VARCHAR(100), accur_currency__c
VARCHAR(100), accur_CV__c
INT, accur_Industry_1__c
VARCHAR(100), accur_Industry_2__c
VARCHAR(100), accur_Industry_3__c
VARCHAR(100), accur_Function_1__c
VARCHAR(100), accur_Function_2__c
VARCHAR(100), accur_Function_3__c
VARCHAR(100), Salary_Range_2__c
VARCHAR(100), accur_Recommended_by__c
TEXT, accur_Relocation__c
TEXT, accur_Email_Pro__c
VARCHAR(100), Percentile_Rankings__c
VARCHAR(100), Skype_username__c
VARCHAR(100), Contact__c
VARCHAR(100), Candidate_Contact_Profile__c
VARCHAR(100), Excel_Test_Result__c
VARCHAR(100), General_Comments__c
VARCHAR(100), Scenario_Date__c
VARCHAR(100), TheDecisionMakers_co_uk__c
VARCHAR(100), Excel_Test_Date__c
VARCHAR(100), Direct_Office_Line__c
VARCHAR(100), Office_Mobile__c
VARCHAR(100), Company_Extension__c
VARCHAR(100), Internal_ITW__c
BIT, ts2__Facebook_Profile__c
VARCHAR(100), ts2__LUID__c
VARCHAR(100), ts2__LinkedIn_Profile__c
TEXT, ts2__MailingCountryText__c
VARCHAR(100), ts2_" <> ...}
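The message points at the remedy: with utf8mb4, each VARCHAR(100) can reserve up to roughly 400 bytes toward the 65,535-byte row limit, so a table this wide overflows it. A hedged sketch of the fix, using names from the error above (which columns to convert is a judgment call):

```sql
-- Converting wide VARCHAR columns to TEXT moves their payload off-row,
-- so they no longer count fully against the 65,535-byte row-size limit.
ALTER TABLE accur_recruiting.Contact MODIFY OtherStreet TEXT;
```

csv2sql could detect when the sum of inferred column widths approaches the limit and fall back to TEXT automatically.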
If an error is encountered during the insertion of data, we should not fail the entire process.
Ideally, we should localize the error to the file in question, or better still, continue with that file after logging the error.
If a file logs more than X errors, we can choose to stop processing that file.
Errors can be written to an error log file saved in the user's CSV directory.
The CSV directory can have a results folder containing the schema SQL file and any error logs.
On the processing end, we can inform users that errors were encountered in the given files and point them to the log file for details.
Changes required here:
In case of errors during schema inference, that particular file can be skipped.
Otherwise the app errors when the number of files is too large and it tries to establish a large number of connections.
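The skip-and-log policy proposed above can be sketched as follows; MAX_ERRORS, the insert callback, and the log list are all illustrative names, not csv2sql internals:

```python
MAX_ERRORS = 100  # hypothetical per-file threshold ("X" above)

def import_file(rows, insert, log):
    """Insert rows one by one; log failures; give up on this file after MAX_ERRORS.

    Other files continue regardless, so one bad file never fails the whole run.
    """
    errors = 0
    for i, row in enumerate(rows):
        try:
            insert(row)
        except Exception as exc:
            errors += 1
            log.append((i, str(exc)))  # row index + reason, for the error log file
            if errors >= MAX_ERRORS:
                break  # stop this file only
    return errors
```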
The order of CSV headers and database columns is not maintained.
This would be a fantastic tool dockerized. If I knew more about creating Docker images I would do it. But it's just a suggestion.
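A minimal multi-stage Dockerfile sketch for an Elixir release is below; the base images, project layout, and release name are all assumptions that would need adjusting to the actual repo:

```dockerfile
# Build stage: compile a prod release inside an official Elixir image.
FROM elixir:1.12 AS build
WORKDIR /app
COPY . .
RUN mix local.hex --force && mix local.rebar --force \
    && mix deps.get && MIX_ENV=prod mix release

# Runtime stage: a slim image that only carries the built release.
FROM debian:bullseye-slim
COPY --from=build /app/_build/prod/rel/csv2sql /csv2sql
ENTRYPOINT ["/csv2sql/bin/csv2sql"]
```

A containerized build would also sidestep the shared-library portability issue described below, since the runtime image pins the C library the release was built against.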
When running the application sometimes MySQL gives an error like:
%MyXQL.Error{connection_id: 9, message: "(1067) (ER_INVALID_DEFAULT) Invalid default value for 'DateModified'",
mysql: %{code: 1067, name: :ER_INVALID_DEFAULT}, statement: "CREATE TABLE csvsql_test.ActivityStream (`ItemId` VARCHAR(100),
`ItemVersion` BIT, `_ItemTimestamp` VARCHAR(100), `ActivityStreamType` INT, `OriginalValue` TEXT, `NewValue` VARCHAR(100),
`CreatorId` VARCHAR(100), `DateCreated` TIMESTAMP, `ModifierId` VARCHAR(100), `DateModified` TIMESTAMP, `OwnerId` VARCHAR(100));"}
Surprisingly this error disappears most of the time, and the app works fine when stopped and started again just after the error occurs.
As evident from the error message, it has something to do with default values for timestamp columns.
I suspect setting some MySQL modes might have something to do with this error. However, I am not sure, since this error occurs randomly even when testing with the same CSV files.
SET GLOBAL SQL_MODE="NO_BACKSLASH_ESCAPES,NO_ENGINE_SUBSTITUTION,NO_ZERO_IN_DATE";
See the MySQL docs regarding the NO_ZERO_IN_DATE mode.
The release for csv2sql is currently built on Ubuntu 20.04; when a user attempted to use it on Ubuntu 18.04 it gave an error like:
../csv2sqk_web/erts-12.0.3/bin/beam.smp: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
It seems the shared library libtinfo.so.6 might not be available on Ubuntu 18.04.
An Elixir release doesn't just depend on your OS flavor: it depends on your processor architecture and C library version (the glibc package). This is because system dependencies remain in place even though Erlang bytecode is platform-independent.
Ideally, we want to support different operating systems like Linux, Windows, and macOS, and the release should run independently of the user's processor architecture.
So far I have found two ways to achieve this:
Use Docker to build the release for different platforms and operating systems. We can have a Docker setup that builds the release for all the operating systems we want to target; however, I am not sure whether this can bypass the target-architecture limitation, and I am also not sure how it will work out for Windows-based Docker containers.
The second approach is using something like Burrito, which uses Bakeware.
Mix release docs about release requirements
This discussion on elixirforum
This blog
Hey,
Loving this project. It's an absolute gem. I can see in your readme that you have plans to add support for Postgres, and I can see a branch for it that looks promising. Just wondering if there is an ETA on Postgres support, as this is exactly what I need to import time-series CSVs from S3 buckets into TimescaleDB?
Keep up the top work!
When a CSV has text data that exceeds the TEXT limit of 65,535 bytes, the overflowing characters are silently cut off while inserting into the DB.
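MySQL offers larger types that avoid the cut-off; a sketch, with a hypothetical table and column name:

```sql
-- MEDIUMTEXT holds up to 16,777,215 bytes and LONGTEXT up to 4 GB,
-- versus 65,535 bytes for TEXT; a larger type avoids silent truncation.
ALTER TABLE my_table MODIFY long_column MEDIUMTEXT;
```

csv2sql's schema inference could pick MEDIUMTEXT or LONGTEXT when a sampled value exceeds the TEXT limit.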
Add an option which, when enabled, will attempt to automatically fix CSV headers if they are not valid SQL column names or if there are duplicate column names.
If a column name has trailing spaces or is an empty string, handle that and also log it.
Currently, it gives an error like
MyXQL.Error{connection_id: 78, message: "(1166) (ER_WRONG_COLUMN_NAME) Incorrect column name 'Sub-Skill Areas '", mysql: %{code: 1166, name: :ER_WRONG_COLUMN_NAME},
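A sketch of such an auto-fix pass in Python (the normalization rules here are assumptions for illustration, not csv2sql behavior): trim whitespace, replace characters that are invalid in column names, name empty columns by position, and suffix duplicates.

```python
import re

def fix_headers(headers):
    """Normalize CSV headers into safe, unique SQL column names (sketch)."""
    seen, fixed = {}, []
    for i, h in enumerate(headers):
        # Trim, then replace anything outside [A-Za-z0-9_]; empty -> positional name.
        name = re.sub(r"[^A-Za-z0-9_]", "_", h.strip()) or f"column_{i}"
        if not re.match(r"[A-Za-z_]", name[0]):  # names can't start with a digit
            name = f"c_{name}"
        # Deduplicate by suffixing the occurrence count.
        seen[name] = seen.get(name, 0) + 1
        fixed.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return fixed
```

For the error above, "Sub-Skill Areas " would become "Sub_Skill_Areas", and each rewrite could be logged so the user can map columns back to the original headers.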
The following rows lead to an error like
%FunctionClauseError{
args: nil,
arity: 2,
clauses: nil,
function: :fixed_integer_impl,
kind: nil,
module: Combine.Parsers.Text
}
"a080W00001U6ghZQAR","0","SCH-0317-116982","2017-03-22 11:08:42","005d0000001KYMqAAO","2017-03-22 11:08:42","005d0000001KYMqAAO","2017-03-22 11:08:42","0030W00003Kn7kTQAR","0","Master","","","�vora University","masters","","Contemporary Art History","1"
This happens due to a bad character (�) in the data "�vora University".
It fails in the call to Timex.parse(item, pattern) in apps/csv2sql/lib/csv2sql/schema_maker.ex, lines 209 and 213.
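A possible mitigation, sketched in Python rather than the project's Elixir: wrap the date check so that any value the parser rejects, including ones carrying invalid bytes, degrades to plain text instead of crashing the run. The function name and pattern are illustrative.

```python
from datetime import datetime

def safe_date_check(value, pattern="%Y-%m-%d %H:%M:%S"):
    """Return True only if `value` cleanly parses as a date; never raise."""
    try:
        datetime.strptime(value, pattern)
        return True
    except (ValueError, TypeError):
        return False
```

A value like "�vora University" would then simply be typed as VARCHAR/TEXT during schema inference.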
When we switch between tabs, the progress gets lost.