
pgloader's Introduction

PGLoader


pgloader is a data loading tool for PostgreSQL, using the COPY command.

Its main advantage over using COPY or \copy directly, and over using a Foreign Data Wrapper, is its transaction behaviour: pgloader keeps a separate file of rejected data while continuing to copy the good data into your database.

The default PostgreSQL behaviour is transactional, which means that any erroneous line in the input data (file or remote database) will stop the entire bulk load for the table.

pgloader also implements data reformatting. A typical example is the transformation of the MySQL datestamps 0000-00-00 and 0000-00-00 00:00:00 to the PostgreSQL NULL value (because our calendar never had a year zero).
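Such a rule can also be spelled out explicitly in a load command file; a minimal sketch, assuming a MySQL source and illustrative connection strings (zero-dates-to-null is one of pgloader's built-in transformation functions, see the CAST clause documentation):

```
LOAD DATABASE
     FROM mysql://user@localhost/mydb
     INTO postgresql:///mydb

  CAST type datetime to timestamptz using zero-dates-to-null,
       type date to date using zero-dates-to-null;
```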

Documentation

Full documentation is available online, including manual pages of all the pgloader sub-commands. Check out https://pgloader.readthedocs.io/.

$ pgloader --help
pgloader [ option ... ] SOURCE TARGET
  --help -h                       boolean  Show usage and exit.
  --version -V                    boolean  Displays pgloader version and exit.
  --quiet -q                      boolean  Be quiet
  --verbose -v                    boolean  Be verbose
  --debug -d                      boolean  Display debug level information.
  --client-min-messages           string   Filter logs seen at the console (default: "warning")
  --log-min-messages              string   Filter logs seen in the logfile (default: "notice")
  --summary -S                    string   Filename where to copy the summary
  --root-dir -D                   string   Output root directory. (default: #P"/tmp/pgloader/")
  --upgrade-config -U             boolean  Output the command(s) corresponding to .conf file for v2.x
  --list-encodings -E             boolean  List pgloader known encodings and exit.
  --logfile -L                    string   Filename where to send the logs.
  --load-lisp-file -l             string   Read user code from files
  --dry-run                       boolean  Only check database connections, don't load anything.
  --on-error-stop                 boolean  Refrain from handling errors properly.
  --no-ssl-cert-verification      boolean  Instruct OpenSSL to bypass verifying certificates.
  --context -C                    string   Command Context Variables
  --with                          string   Load options
  --set                           string   PostgreSQL options
  --field                         string   Source file fields specification
  --cast                          string   Specific cast rules
  --type                          string   Force input source type
  --encoding                      string   Source expected encoding
  --before                        string   SQL script to run before loading the data
  --after                         string   SQL script to run after loading the data
  --self-upgrade                  string   Path to pgloader newer sources
  --regress                       boolean  Drive regression testing

Usage

You can either give a command file to pgloader or run it all from the command line; see the pgloader quick start at https://pgloader.readthedocs.io for more details.

$ ./build/bin/pgloader --help
$ ./build/bin/pgloader <file.load>

For example, for a full migration from SQLite:

$ createdb newdb
$ pgloader ./test/sqlite/sqlite.db postgresql:///newdb

Or for a full migration from MySQL, including schema definition (tables, indexes, foreign keys, comments) and parallel loading of the corrected data:

$ createdb pagila
$ pgloader mysql://user@localhost/sakila postgresql:///pagila

LICENCE

pgloader is available under The PostgreSQL Licence.

INSTALL

Please see full documentation at https://pgloader.readthedocs.io/.

If you're using Debian, it's already available:

$ apt-get install pgloader

If you're using Docker, you can use the latest version built by the CI at each commit to the master branch:

$ docker pull ghcr.io/dimitri/pgloader:latest
$ docker run --rm -it ghcr.io/dimitri/pgloader:latest pgloader --version

pgloader's People

Contributors

alexbaretta, benesch, cbandy, chanmix51, christophkaser, df7cb, dimitri, disco-stu, gvangool, jamim, jdufresne, krzysiekj, mtyson01, pborreli, phoe, pingram3030, pmatseykanets, pylaligand, rdunklau, richardkmichael, rodo, rudi-bruchez, seamusabshere, simonachmueller, smarkwell, svantevonerichsen6906, swt30, vkryukov, willyhakim, zickzackv


pgloader's Issues

Python version with threads enhanced

I think the Python version of pgloader could be improved in combination with gevent, to make its threading more efficient and non-blocking.
The Python version is more suitable for integrating into apps than the Lisp one, so here is my vote for not dropping or replacing it in favor of the Lisp version.
+1

truncate cascade option

It would be nice to TRUNCATE ... CASCADE automatically when requested. Unfortunately, it's impossible to use TRUNCATE on tables referenced by foreign keys when that option is not available.
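For reference, the behaviour being asked for corresponds to PostgreSQL's CASCADE form of TRUNCATE, which also empties any tables holding foreign-key references to the target (the table name here is hypothetical):

```sql
TRUNCATE TABLE parent_table CASCADE;
```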

pgloader full stack trace OR list index out of range with bogus csv files

Hey everyone,

I am running into an error (logically) when I am reading CSV files with pgloader. If there are lines with fewer columns than expected for that table, pgloader throws a "list index out of range" error, after which the whole loading process stops.
Although the error is to be expected with junk CSV files, I would like to make pgloader a bit more resilient here, so that if it detects a line with fewer columns than expected, it just writes that line to reject.log and skips it.

The problem is, I don't know exactly where in pgloader the error occurs. I am looking at readlines in pgloader.csvreader.CSVReader, but can't pinpoint the spot.
Is there a way to show the full stack trace in the pgloader logs where the "list index out of range" exception occurs? That would make looking for that line a lot easier. Or can somebody point me at the correct location? Then I could change this behaviour.
Cheers,

Dolf.
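The behaviour asked for above can be sketched in plain Python (the function name and the reject-log shape are illustrative, not pgloader's actual internals):

```python
import csv

def load_rows(lines, expected_cols, reject_log):
    """Keep well-formed rows; divert short rows to reject_log instead of aborting."""
    good = []
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if len(row) < expected_cols:
            reject_log.append((lineno, row))  # record the bad line and skip it
        else:
            good.append(row)
    return good

rejects = []
rows = load_rows(["a,b,c", "x,y", "1,2,3"], 3, rejects)
# rows keeps the two complete lines; rejects records line 2
```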

[patch] better exception handling for python <= 2.4

This patch fixes SystemExit exception being handled as a regular exception when python <= 2.4 is in use.

    diff --git a/pgloader.py b/pgloader.py
    index 494e57d..a00f93e 100755
    --- a/pgloader.py
    +++ b/pgloader.py
    @@ -771,6 +771,12 @@ def load_data():
     if __name__ == "__main__":
         try:
             ret = load_data()
    +
    +    # SystemExit inherits from Exception in python < 2.5, so we have to
    +    # handle it separately.
    +    except SystemExit, e:
    +        sys.exit(e.code)
    +
         except Exception, e:
             from pgloader.options import DEBUG
             if DEBUG:

Copy 1 column to 2 other columns and cast it #question

Hello Dimitri,
I've been using pgloader for migration from mysql to postgresql and it helped me a lot.
This is not an issue but more of a question, and I didn't know where else to post it, but maybe it will be helpful for somebody else as well.

Is it possible in pgloader to copy one column from mysql to two different columns in postgresql?
For example:

column (mysql)   type
time             int(11)

column (psql)    type
time             int(11)
day              date / timestamp

Sadly, the time column has to stay the same, but I also need to copy the value from the time column into the day column in psql and cast it to date or timestamp.

Thank you in advance.
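One way to approach this with pgloader is to load the time column as-is and derive the second column afterwards with an AFTER LOAD DO block; a sketch, assuming time holds epoch seconds and that the table and column names are hypothetical (cast the result as needed for a date column):

```
LOAD DATABASE
     FROM mysql://user@localhost/mydb
     INTO postgresql:///newdb

  AFTER LOAD DO
  $$ UPDATE mytable SET day = to_timestamp("time"); $$;
```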

Parameter "work_mem" requires an integer value

Got this error when trying the csv.load example:

An unhandled error condition has been signalled:
   Database error 22023: parameter "work_mem" requires an integer value
QUERY: SET work_mem TO '12MB'

Date/time: 2014-01-21-01:49An unhandled error condition has been signalled:
                              Database error 22023: parameter "work_mem" requires an integer value
QUERY: SET work_mem TO '12MB'


Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {D9F7C51}>
0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {DFFCBBD}>)
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {DFFCBA5}>)
3: (PRINT-BACKTRACE :STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {91E2F09}> :START 0 :FROM :DEBUGGER-FRAME :COUNT 536870911 :PRINT-THREAD T :PRINT-FRAME-SOURCE NIL :METHOD-FRAME-STYLE NIL)
4: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {91E2F09}>)
5: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<CL-POSTGRES-ERROR:DATA-EXCEPTION {DFF7A31}> :OUTPUT #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDOUT* {91E2F09}> :IF-EXISTS :APPEND :VERBOSE T)
6: (SIGNAL #<CL-POSTGRES-ERROR:DATA-EXCEPTION {DFF7A31}>)
7: (ERROR CL-POSTGRES-ERROR:DATA-EXCEPTION :CODE "22023" :MESSAGE "parameter \"work_mem\" requires an integer value" :DETAIL NIL :HINT NIL :CONTEXT NIL :POSITION NIL)
8: (CL-POSTGRES::GET-ERROR #<SB-SYS:FD-STREAM for "socket 127.0.0.1:33609, peer: 127.0.0.1:5432" {DFF24A1}>)
9: (CL-POSTGRES::LOOK-FOR-ROW #<unavailable argument>)
10: (CL-POSTGRES::SEND-QUERY #<unavailable argument> #<unavailable argument> #<unavailable argument>)
11: ((LABELS #:G508 :IN CL-POSTGRES:EXEC-QUERY))
12: (PGLOADER.PGSQL::SET-SESSION-GUCS (("client_encoding" . "latin1") ("work_mem" . "12MB") ("standard_conforming_strings" . "on")) :TRANSACTION NIL :DATABASE NIL)
13: ((LAMBDA ()))
14: (PGLOADER.PARSER:RUN-COMMANDS #P"csv.load" :START-LOGGER NIL :LOG-FILENAME NIL :LOG-MIN-MESSAGES NIL :CLIENT-MIN-MESSAGES NIL)
15: ((FLET #:CLEANUP-FUN-124 :IN PGLOADER::MAIN)) [cleanup]
16: (PGLOADER::MAIN ("../build/pgloader.exe" "-d" "csv.load"))
17: ((LAMBDA NIL :IN "/root/pgloader-master/dumper-2SKVI5f7.lisp"))
18: ((FLET #:WITHOUT-INTERRUPTS-BODY-44 :IN SAVE-LISP-AND-DIE))
19: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))

2014-01-21T01:49:03.408000-06:00 DEBUG CONNECT
2014-01-21T01:49:03.408000-06:00 DEBUG SET client_encoding TO 'latin1'
2014-01-21T01:49:03.408000-06:00 DEBUG SET work_mem TO '12MB'
2014-01-21T01:49:03.408000-06:00 FATAL We have a situation here.
2014-01-21T01:49:03.408000-06:00 INFO Stopping monitor

debugger invoked on a CL-POSTGRES-ERROR:DATA-EXCEPTION in thread
#<THREAD "main thread" RUNNING {D9F7C51}>:
  Database error 22023: parameter "work_mem" requires an integer value
QUERY: SET work_mem TO '12MB'
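The 22023 error suggests a PostgreSQL server too old to accept unit suffixes in configuration values; on such servers the setting must be a bare integer (kilobytes, in the case of work_mem). A hedged workaround, assuming that is the cause:

```sql
-- Accepted by PostgreSQL 8.2 and later:
SET work_mem TO '12MB';
-- Older servers only take an integer number of kilobytes:
SET work_mem TO 12288;
```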

[patch] pgloader hangs on connection errors (python 2.4)

Hi,

It seems that pgloader hangs after encountering many types of errors, e.g., database connection errors (but possibly other kinds of errors, too). It looks like it hangs after catching (and printing) the exception. I think it may have something to do with multi-threading.

Here's the output of a sample invocation when the specified DB server is down:

$ ./pgloader.py -c pgloader.conf -d
pgloader     INFO     Logger initialized
pgloader     WARNING  path entry '/usr/share/python-support/pgloader/reformat' does not exists, ignored
pgloader     INFO     Reformat path is []
pgloader     INFO     Will consider following sections:
pgloader     INFO       data
pgloader     INFO     Will load 1 section at a time
data         INFO     Loading threads: 1
data         ERROR    could not connect to server: Connection refused
    Is the server running on host "localhost" and accepting
    TCP/IP connections on port 54321?

Exception in thread data:
Traceback (most recent call last):
  File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/home/dpanech/pgloader_git/pgloader/pgloader/pgloader.py", line 831, in run
    self._postinit()
  File "/home/dpanech/pgloader_git/pgloader/pgloader/pgloader.py", line 212, in _postinit
    self.db.reset()
  File "/home/dpanech/pgloader_git/pgloader/pgloader/db.py", line 196, in reset
    raise PGLoader_Error, "Can't connect to database"
PGLoader_Error: Can't connect to database

At this point it hangs and ignores SIGINT (KeyboardInterrupt).

I'm using python 2.4, psycopg 2.0.13, libpq 8.1.21 on RedHat EL 5 (update 5).

Thanks,
D.

Trouble with mysql data migration and enum type

Allow using a transformation function when casting enum data.

Ref: when migrating a MySQL database which contains enum data types, pgloader fails to copy the data when a source value is not among the enum's accepted values.

Example: a column in a table from a MySQL database is of type enum('0','1'), but some rows have "" as the value.
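A cast rule along these lines might work around the problem, mapping the empty string to NULL before the enum check. This is only a sketch: the connection strings and table/column names are hypothetical, and the exact column-cast syntax should be checked against the CAST clause documentation (empty-string-to-null is one of pgloader's built-in transforms):

```
LOAD DATABASE
     FROM mysql://user@localhost/mydb
     INTO postgresql:///newdb

  CAST column mytable.flag using empty-string-to-null;
```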

Encoding problem at compile time

I have an issue when trying to build pgloader on a freshly installed, brand new, minimal Debian.

make fails with an odd error message. If I try to launch ./pgloader.lisp, it seems to download sources from some repositories and then fails with the following stack trace:

; Upgrading ASDF from version 2.011 to version 2.26
Loading quicklisp and the pgloader project and its dependencies...
unhandled LOAD-SYSTEM-DEFINITION-ERROR in thread #<SB-THREAD:THREAD
                                                   "initial thread" RUNNING
                                                   {AAE58D1}>:
  Error while trying to load definition for system qmynd from pathname
  /home/greg/quicklisp/local-projects/qmynd/qmynd.asd:
     decoding error on stream
     #<SB-SYS:FD-STREAM
       for "file /home/greg/quicklisp/local-projects/qmynd/qmynd.asd"
       {B1641B9}>
     (:EXTERNAL-FORMAT :ASCII):
       the octet sequence (195) cannot be decoded.

0: (SB-DEBUG::MAP-BACKTRACE #<CLOSURE (LAMBDA #) {B16A725}>)[:EXTERNAL]
1: (SB-DEBUG:BACKTRACE 128 #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDERR* {90D48F9}>)
2: (SB-DEBUG::DEBUGGER-DISABLED-HOOK
    #<LOAD-SYSTEM-DEFINITION-ERROR {B166E21}>
    #<unavailable argument>)
3: (SB-DEBUG::RUN-HOOK
    SB-EXT:*INVOKE-DEBUGGER-HOOK*
    #<LOAD-SYSTEM-DEFINITION-ERROR {B166E21}>)
4: (INVOKE-DEBUGGER #<LOAD-SYSTEM-DEFINITION-ERROR {B166E21}>)
5: (ERROR LOAD-SYSTEM-DEFINITION-ERROR)[:EXTERNAL]
6: ((FLET #:LAMBDA2060) #<SB-INT:STREAM-DECODING-ERROR {B166BD1}>)
7: (SIGNAL #<SB-INT:STREAM-DECODING-ERROR {B166BD1}>)[:EXTERNAL]
8: (ERROR SB-INT:STREAM-DECODING-ERROR)[:EXTERNAL]
9: (SB-INT:STREAM-DECODING-ERROR
    #<SB-SYS:FD-STREAM
      for "file /home/greg/quicklisp/local-projects/qmynd/qmynd.asd" {B1641B9}>
    (195))
10: (SB-IMPL::STREAM-DECODING-ERROR-AND-HANDLE
     #<SB-SYS:FD-STREAM
       for "file /home/greg/quicklisp/local-projects/qmynd/qmynd.asd"
       {B1641B9}>
     1)
11: (SB-IMPL::FD-STREAM-READ-N-CHARACTERS/ASCII
     #<SB-SYS:FD-STREAM
       for "file /home/greg/quicklisp/local-projects/qmynd/qmynd.asd"
       {B1641B9}>"

There are also undocumented dependencies; I am going to send a PR for your install doc about that.

An unhandled error condition has been signalled

I am getting following error :

2014-01-07T16:04:31.319000+05:30 NOTICE COPY widget_widgetsettings
An unhandled error condition has been signalled:
   There is no applicable method for the generic function
     #<STANDARD-GENERIC-FUNCTION QMYND-IMPL:MYSQL-CONNECTION-CHARACTER-SET (1)>
   when called with arguments
     (NIL).
2014-01-07T16:04:31.519000+05:30 INFO COPY widget_widgetsettings done.

Build fails on Ubuntu12.04 LTS 64bit

Hi Dimitri,

the build on 12.04 LTS still fails. Let me know if you need more info about my system.

[package pgloader.docs]
touch build/libs.stamp
sbcl --no-sysinit --no-userinit --load build/quicklisp/setup.lisp                \
             --eval '(ql:write-asdf-manifest-file "build/manifest.ql")'  \
             --eval '(quit)'
This is SBCL 1.0.55.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
mkdir -p build/bin
sbcl --no-sysinit --no-userinit --load build/quicklisp/setup.lisp               \
             --eval '(ql:quickload "buildapp")'                   \
             --eval '(buildapp:build-buildapp "build/bin/buildapp.sbcl")'              \
             --eval '(quit)'
This is SBCL 1.0.55.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
To load "buildapp":
  Install 1 Quicklisp release:
    buildapp
; Fetching #<URL "http://beta.quicklisp.org/archive/buildapp/2014-02-11/buildapp-1.5.2.tgz">
; 15.91KB
==================================================
16,287 bytes in 0.00 seconds (15905.27KB/sec)
; Loading "buildapp"
[package buildapp];
; compilation aborted because of fatal error:
;   Symbol "EXIT" not found in the SB-EXT package.
;
;     Line: 43, Column: 22, File-Position: 1709
;
;     Stream: #<SB-SYS:FD-STREAM
;               for "file /home/acidjunk/GIT/pgloader/build/quicklisp/dists/quicklisp/software/buildapp-1.5.2/utils.lisp"
;               {1002B2C763}>
;   

debugger invoked on a ASDF:COMPILE-ERROR in thread
#<THREAD "initial thread" RUNNING {1002999803}>:
  Error while invoking #<COMPILE-OP (:VERBOSE NIL) {100659CC13}> on
  #<CL-SOURCE-FILE "buildapp" "utils">

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RETRY   ] Retry compiling #<CL-SOURCE-FILE "buildapp" "utils">.
  1: [ACCEPT  ] Continue, treating
                compiling #<CL-SOURCE-FILE "buildapp" "utils"> as having been
                successful.
  2: [ABORT   ] Give up on "buildapp"
  3: [CONTINUE] Ignore runtime option --eval "(ql:quickload \"buildapp\")".
  4:            Skip rest of --eval and --load options.
  5:            Skip to toplevel READ/EVAL/PRINT loop.
  6: [QUIT    ] Quit SBCL (calling #'QUIT, killing the process).

((SB-PCL::FAST-METHOD ASDF:PERFORM (ASDF:COMPILE-OP ASDF:CL-SOURCE-FILE))
 #<unavailable argument>
 #<unavailable argument>
 #<ASDF:COMPILE-OP (:VERBOSE NIL) {100659CC13}>
 #<ASDF:CL-SOURCE-FILE "buildapp" "utils">)
0] 

pgloader build failure

After running 'make pgloader' I get a "System "pgloader" not found" message. It seems that quicklisp for some reason doesn't look for the system definition in the current directory. I removed my existing quicklisp installation and ~/.sbcl, then re-ran make, with the same result.

Please, find the complete output here: http://pastebin.com/avpyjus2
ii sbcl 2:1.1.13-1 amd64

Any suggestions?

Issue with pgloader installation (lisp v.) on ubuntu server

After installing sbcl, libmysqlclient-dev, libsqlite3-dev and quicklisp as stated in the README.md file, I ran into an error while trying to install pgloader using make pgloader.

Error from installation:

 make pgloader
sbcl --load ~/quicklisp/setup.lisp                             \
             --eval '(ql:quickload "pgloader")'                        \
             --eval '(quit)'
This is SBCL 1.0.55.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
To load "pgloader":
  Load 1 ASDF system:
    pgloader
; Loading "pgloader"
........................
debugger invoked on a TYPE-ERROR in thread
#<THREAD "initial thread" RUNNING {1002999483}>:
  The value NIL is not of type STRING.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [TRY-RECOMPILING] Recompile params and try loading it again
  1: [RETRY          ] Retry
                       loading FASL for #<CL-SOURCE-FILE "pgloader" "src" "params">.
  2: [ACCEPT         ] Continue, treating
                       loading FASL for #<CL-SOURCE-FILE "pgloader" "src" "params">
                       as having been successful.
  3: [ABORT          ] Give up on "pgloader"
  4: [CONTINUE       ] Ignore runtime option --eval "(ql:quickload \"pgloader\")".
  5:                   Skip rest of --eval and --load options.
  6:                   Skip to toplevel READ/EVAL/PRINT loop.
  7: [QUIT           ] Quit SBCL (calling #'QUIT, killing the process).

(PARSE-INTEGER NIL)[:OPTIONAL]
0] 2
.
touch build/libs.stamp
touch: cannot touch `build/libs.stamp': No such file or directory
make: *** [build/libs.stamp] Error 1

Hope this helps in anyway.

Loading hstore columns

I'm trying to load a CSV file custom-written for loading into Postgres with an hstore column.
I'm getting errors similar to the following from pgloader:

debugger invoked on a CL-POSTGRES-ERROR:INTERNAL-ERROR in thread
#<THREAD "lparallel" RUNNING {1009552F23}>:
  Database error XX000: Unexpected end of string
CONTEXT: COPY eventstmp, line 4521, column data: ""email"=>"
2014-02-18T01:42:04.711000+02:00 ERROR Database error XX000: Unexpected end of string

Can pgloader be made to load hstores or is it an issue with the Postmodern library?

MATERIALIZE VIEWS parse error should be more informative

I had MATERIALIZE VIEWS after the INCLUDING ONLY TABLE NAMES MATCHING clause and received the following:

STYLE-WARNING: Undefined alien: "SSLv2_client_method"
An unhandled error condition has been signalled:
load database
   Could not parse subexpression ";" when parsing

 Expression PGLOADER.PARSER::COMMANDS
    Subexpression (+ PGLOADER.PARSER::COMMAND)
    Subexpression PGLOADER.PARSER::COMMAND
    Subexpression (AND
                   (OR PGLOADER.PARSER::LOAD-ARCHIVE
                       PGLOADER.PARSER::LOAD-CSV-FILE
                       PGLOADER.PARSER::LOAD-FIXED-COLS-FILE
                       PGLOADER.PARSER::LOAD-DBF-FILE
                       PGLOADER.PARSER::LOAD-MYSQL-DATABASE
                       PGLOADER.PARSER::LOAD-SQLITE-DATABASE
                       PGLOADER.PARSER::LOAD-SYSLOG-MESSAGES)
                   PGLOADER.PARSER::END-OF-COMMAND)
    Subexpression PGLOADER.PARSER::END-OF-COMMAND
    Subexpression (AND PGLOADER.PARSER::IGNORE-WHITESPACE #\;
                       PGLOADER.PARSER::IGNORE-WHITESPACE)
    Subexpression ";"

  Encountered at:
    load database
     from
    ^ (Line 1, Column 0, Position 0)


;
; compilation unit aborted
;   caught 1 fatal ERROR condition

Took me a bit to figure out the issue, but once I swapped them it worked.

MySQL foreign key support doesn't always quote identifiers

I'm using the quote identifiers and foreign keys options. One of our tables has, perhaps foolishly, used mixed case in the column names.

CREATE TABLE `event` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `eventType_id` smallint(6) NOT NULL,
  `eventDate` datetime NOT NULL,
  `user_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK_event_user_id` (`user_id`),
  KEY `IX_event_eventtypeid_user_id` (`eventType_id`,`user_id`),
  CONSTRAINT `FK_event_eventtype_id` FOREIGN KEY (`eventType_id`) REFERENCES `eventtype` (`id`),
  CONSTRAINT `FK_event_user_id` FOREIGN KEY (`user_id`) REFERENCES `auth_axialuser` (`user_ptr_id`)
) ENGINE=InnoDB AUTO_INCREMENT=924197 DEFAULT CHARSET=utf8;

When executing the migration, I get an error:

2014-05-22T02:46:45.367000+04:00 ERROR Database error 42703: column "eventtype_id" referenced in foreign key constraint does not exist
QUERY: ALTER TABLE "event" ADD CONSTRAINT "FK_event_eventtype_id" FOREIGN KEY(eventType_id) REFERENCES "eventtype"(id)

I'm assuming this is because the eventType_id is not quoted in the ALTER TABLE.
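Indeed, with mixed-case identifiers the generated statement can only work if the referencing column is quoted too; the expected form would be:

```sql
ALTER TABLE "event"
  ADD CONSTRAINT "FK_event_eventtype_id"
  FOREIGN KEY ("eventType_id") REFERENCES "eventtype" ("id");
```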

CSV loading broken

Hi,

CSV loading seems to be entirely broken, as even running the tests fails.

FYI, running on Slackware Linux with just compiled SBCL 1.1.15.

All tests and configuration based on the git documentation raise the following:

An unhandled error condition has been signalled:
   The value (#P"the-pathname-from-the-config-file")
   is not of type
     (OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING PATHNAME FILE-STREAM).

Some DB connection errors cause infinite polling loop on DB connection.

I had a problem where pgloader would start up and seem to hang, with CPU at 100%, when I was trying to test the 'sakila' conversion. I finally traced this to a DB connection issue: if the connection to the MySQL DB has an incorrect or missing password, the loader goes into an infinite loop.

ascii character set breaks pgloader

If a column on a MySQL table has ASCII as the character set AND is using ascii_bin as the collation, the thread copying that data over is halted with a type-error.
An example of a table that will halt pgloader:

create table t (s text character set ascii collate ascii_bin);
insert into t values('a'), ('b'), ('c');

Need options to handle poor MySQL encoding

I've got a MySQL 5.0 server with tables using a latin1 collation, but the data is encoded as UTF-8 on the client side.
When loading the data into PostgreSQL with pgloader, the encoding is destroyed and the strings end up being encoded twice (e.g. André ==> AndrÃ©).
I tried hacking the charset settings in the MySQL connection, but it didn't change anything so far:
(mysql-query "SET NAMES 'latin1'")
(mysql-query "SET character_set_results = 'latin1';")
(mysql-query "SET character_set_results = NULL;")

I noticed the data sent on the wire is proper UTF-8, so I guess the double encoding happens on the pgloader side.
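The symptom described matches a classic UTF-8-stored-as-latin1 round-trip, which is easy to reproduce (and reverse) in a few lines of Python; this only illustrates the mechanism, not pgloader's internals:

```python
# UTF-8 bytes mis-labelled as latin1 get re-encoded on output,
# producing the doubled encoding seen in the migrated data.
original = "André"
mangled = original.encode("utf-8").decode("latin-1")  # what ends up in PostgreSQL
# Reversing the accidental round-trip recovers the original text:
recovered = mangled.encode("latin-1").decode("utf-8")
```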

Problem with CSV

Hi,

pgloader has problems with csv-values like this one:

"C:\\some\\nasty\\windows\\path\\"

It responds with:

ERROR We finished reading a quoted value and got more characters before a separator or EOL 69

I used the following WITH options:

WITH truncate,
    fields optionally enclosed by '"',
    fields escaped by backslash-quote,
    fields terminated by ','

The csv was created with mysqldump:

mysqldump -u root --fields-optionally-enclosed-by='"' --fields-terminated-by=',' --fields-escaped-by='\' --tab /tmp/ dbname tablename

Seems like there is a bug in pgloader's CSV parser?

Report after using pgloader for migrating a mantisbt db from mysql to postgresql

Hi @dimitri,

I used pgloader to translate a MySQL @mantisbt database to postgresql. pgloader has been very useful. However, some difficulties came up that may be worth describing.

I followed this post. It advises using mysql2postgres, but that tool fails to connect to my MySQL server. So I tried pgloader.

Good points:

  • In the end it works, and the makefile downloads every dependency.
  • Instead of patching mysql2postgres to convert tinyint(4) to boolean, the cast can be described in the configuration.

The problems:

  • sbcl (1.0.57.0.debian) refuses to run with address randomization on x86_64, so I needed to deactivate it during the conversion with echo 0 > /proc/sys/kernel/randomize_va_space
  • can't connect to PostgreSQL over the unix socket
  • the syntax error messages are really not user friendly
  • pgloader.exe --help needs a usage line, otherwise the user doesn't know where to put the configuration
  • the longblob columns of mantis are not correctly converted. It is the same with mysql2postgres, so I used the solution from the post (except with more \):
echo "TRUNCATE TABLE mantis_bug_file_table;" > mantis_bug_file_table.sql

mysqldump -umantis_user -ppassword --database mantis  --table mantis_bug_file_table --skip-extended-insert --complete-insert --hex-blob \
| sed -n -e "/INSERT /s/\`//g" -e "s/',0x\([0-9A-Fa-f\]*\),/',E'\\\\\\\\x\1\',/g p" >> mantis_bug_file_table.sql

sudo -u www-data psql bugtracker < mantis_bug_file_table.sql

The pgloader configuration:

LOAD DATABASE
     FROM      mysql://mantis_user:password@localhost/mantis
     INTO postgresql://mantis:password@localhost/bugtracker

     WITH include no drop, truncate, create no tables, create no indexes, reset sequences, no foreign keys

     CAST type tinyint when (= precision 4) to boolean using tinyint-to-boolean

     EXCLUDING TABLE NAMES MATCHING mantis_bug_file_table
;

Thank you a lot for this nice tool,

Cannot load from stdin

LOAD CSV
     FROM stdin
     INTO postgresql://u:p@localhost/strand?activity_audit
WITH truncate,
        skip header = 0,
        fields terminated by '}'
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';

And I get the following error message:

STYLE-WARNING: Undefined alien: "SSLv2_client_method"
2014-04-26T20:46:06.055000+01:00 LOG Starting pgloader, log system is ready.
2014-04-26T20:46:06.095000+01:00 INFO Starting monitor
2014-04-26T20:46:06.101000+01:00 LOG Main logs in '/tmp/pgloader/pgloader.log'
2014-04-26T20:46:06.101000+01:00 LOG Data errors in '/tmp/pgloader/'

debugger invoked on a TYPE-ERROR in thread
#<THREAD "lparallel" RUNNING {10075A64A3}>:
  The value #<SYNONYM-STREAM :SYMBOL SB-SYS:*STDIN* {1000217283}>
  is not of type
    (OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING PATHNAME FILE-STREAM).
2014-04-26T20:46:06.211000+01:00 NOTICE COPY strand.activity_audit
2014-04-26T20:46:06.211000+01:00 NOTICE TRUNCATE activity_audit;

Latest version from git

Heap exhausted

Hi Dimitri,
I am migrating a database from MySQL to PostgreSQL, and there is a table that contains 250000 rows and is 2.9 GB; it's mostly generated HTML pages, so there is one TEXT column. pgloader crashes on this one with the following:

Heap exhausted during garbage collection: 65536 bytes available, 96080 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB   LUB  !move  Alloc  Waste   Trig    WP  GCs Mem-age
   0:     0     0     0     0     0     0     0     0     0        0     0 42949672    0   0  0,0000
   1:     0     0     0     0     0     0     0     0     0        0     0 42949672    0   0  0,0000
   2:     0     0     0     0     0     0     0     0     0        0     0 42949672    0   0  0,0000
   3: 14393 124113     0     0  1267 86366     0  2636     0 2494998992 462935600  2000000    0   0  1,1623
   4:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0,0000
   5:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0,0000
   6:     0     0     0     0  2468   978     0     0     0 112918528     0  2000000 2227   0  0,0000
   Total bytes allocated    = 3615294096
   Dynamic-space-size bytes = 4294967296
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 4008(tid 41946624):
Heap exhausted, game over.

Maybe the MySQL table content should be fetched unbuffered for large tables, if that's possible? Do you have a solution for this?

I am using the latest versions of pgloader and SBCL.

Thank you.
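The unbuffered reading suggested above can be sketched with a plain Python DB-API cursor (this is an illustrative sketch, not pgloader's actual code): fetching in fixed-size batches via `fetchmany` keeps memory proportional to the batch size, and with MySQLdb a server-side `SSCursor` avoids buffering the whole result set on the client.

```python
def stream_rows(cursor, batch_size=10000):
    """Yield rows from a DB-API cursor in fixed-size batches.

    Memory use stays proportional to batch_size rather than to
    the table size, provided the cursor itself is unbuffered
    (e.g. MySQLdb's server-side SSCursor).
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

# Hypothetical usage against MySQL (requires the MySQLdb package):
#   import MySQLdb, MySQLdb.cursors
#   conn = MySQLdb.connect(db="source",
#                          cursorclass=MySQLdb.cursors.SSCursor)
#   cur = conn.cursor()
#   cur.execute("SELECT * FROM big_table")
#   for row in stream_rows(cur):
#       ...  # feed the row to COPY
```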

pgloader.exe returns an error after compilation

I compiled SBCL 1.1.15. I had to change the call to buildapp in the Makefile to make it pass:

  • changed the "--dynamic-space-size" value to 1560M to fit in the LXC container
  • removed the "--compress-core" unsupported by this build of sbcl

Compilation finally went fine and produced a "pgloader.exe" in the build directory. But when I run it from the command line, it crashes as shown below:

$ ./build/pgloader.exe --debug
FATAL: can't find root-dir: #P"//tmp/pgloader//"
sb-impl::*default-external-format* :ANSI_X3.4-1968
tmpdir: #P"/tmp/pgloader/"

Did I do something wrong?

[patch] better connection parameters

This patch makes all DB connection parameters optional, which is more in line with the psycopg/libpq API. All of these parameters default to reasonable values, the same as with other Postgres client tools: the "user" and "base" names default to the OS user id, the port defaults to 5432, and omitting "host" causes a Unix-domain socket to be used.

    diff --git a/pgloader/db.py b/pgloader/db.py
    index 7fb737e..f036d6a 100644
    --- a/pgloader/db.py
    +++ b/pgloader/db.py
    @@ -51,16 +51,13 @@ elif PSYCOPG_VERSION == 2:
     class db:
         """ a db connexion and utility class """
         def __init__(self,
    -                 host, port, base, user, passwd,
    +                 dsn,
                      client_encoding = PG_CLIENT_ENCODING,
                      copy_every = 10000, commit_every = 1000, connect = True):
             """ Connects to the specified database """
             self.log     = log
             self.dbconn  = None
    -        self.dsn     = "host=%s port=%d user=%s dbname=%s password=%s" \
    -                       % (host, port, user, base, passwd)
    -        self.connect = "-h %s -p %s -U %s" % (host, port, user)
    -        self.base    = base
    +        self.dsn     = dsn

             # those parameters can be overwritten after db init
             # here's their default values
    diff --git a/pgloader/pgloader.py b/pgloader/pgloader.py
    index 5b1becd..afbdfa2 100644
    --- a/pgloader/pgloader.py
    +++ b/pgloader/pgloader.py
    @@ -157,12 +157,21 @@ class PGLoader(threading.Thread):
                 return

             try:
    -            self.db = db(config.get(section, 'host'),
    -                         config.getint(section, 'port'),
    -                         config.get(section, 'base'),
    -                         config.get(section, 'user'),
    -                         config.get(section, 'pass'),
    -                         connect = False)
    +            dsn = [];
    +            if config.has_option(section, 'host'):
    +                dsn.append ("host=%s" % config.get(section, 'host'))
    +            if config.has_option(section, 'port'):
    +                dsn.append ("port=%d" % config.getint(section, 'port'))
    +            if config.has_option(section, 'base'):
    +                dsn.append ("dbname=%s" % config.get(section, 'base'))
    +            if config.has_option(section, 'user'):
    +                dsn.append ("user=%s" % config.get(section, 'user'))
    +            if config.has_option(section, 'pass'):
    +                dsn.append ("password=%s" % config.get(section, 'pass'))
    +            if config.has_option(section, 'sslmode'):
    +                dsn.append ("sslmode=%s" % config.get(section, 'sslmode'))
    +
    +            self.db = db(" ".join (dsn), connect = False)

                 for opt in ['client_encoding', 'datestyle', 'lc_messages']:
                     if config.has_option(section, opt):
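For illustration, the DSN-building logic of the patch boils down to the following standalone sketch (function name and option keys are illustrative, not pgloader's actual API):

```python
def build_dsn(options):
    """Assemble a libpq-style keyword DSN from whichever
    connection options are present; omitted keys fall back to
    libpq defaults (OS user name, port 5432, Unix socket)."""
    keys = [("host", "host"), ("port", "port"), ("base", "dbname"),
            ("user", "user"), ("pass", "password"), ("sslmode", "sslmode")]
    return " ".join("%s=%s" % (dsn_key, options[opt])
                    for opt, dsn_key in keys if opt in options)
```

With every key omitted the DSN is empty and libpq applies all its defaults, which is exactly the "secure local socket" configuration the patch enables.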

Performance issue with pgloader 2.3.2-1

Hi,
I use pgloader v2.3.2-1 because the company I work for does not want to use v3.
I use PostgreSQL 9.2 on Red Hat 6.3, 64-bit.

I have a performance issue when I compare COPY and pgloader.
With a CSV file of 1,784,954 lines on the server:
TRUNCATE + COPY takes 10 seconds
pgloader (with a localhost database) takes 4m32.154s
(There are no rejects.)

I tried changing parameters in my conf file, but nothing really changes, and the gap is so large that I don't think it's a matter of tuning. I think I'm doing something wrong, but I don't see what it could be...

Can you help me please?

Thank you!

My config file :
[pgsql]
host = localhost
port = 5432
base = ***
user = ***
pass = ***
log_file = /_/_/logs/pgloader.log
log_min_messages = INFO
client_min_messages = WARNING

lc_messages = C
pg_option_client_encoding = 'unicode'
pg_option_standard_conforming_strings = on
pg_option_work_mem = 1024MB

copy_every = 16000

null = ""
empty_string = "\ "

max_parallel_sections = 6

section_threads = 4

split_file_reading = true

rrqueue_size = 5000

[common_tmpl]
template = True
format = csv
field_sep = |
trailing_sep = False

[mytable]
use_template = common_tmpl
columns = *
filename = /mytable.csv
table = mytable

For the COPY command:
COPY mytable FROM '/mytable' WITH (DELIMITER '|', ENCODING 'unicode');

Sqlite: Migration goes wrong on camelcased columns and/or reserved keywords

SQLITE FILE:
https://dl.dropboxusercontent.com/u/20756661/storage.sqlite.bz2

COMMAND FILE

load database
    from 'storage.sqlite'         
    into postgresql:///migrated

WITH include drop, create tables, create indexes, reset sequences 

SET work_mem to '16MB', maintenance_work_mem to '512 MB';

When running:

pgloader sqlite.loader 
2014-04-26T04:42:38.047000+04:00 LOG Starting pgloader, log system is ready.
2014-04-26T04:42:38.078000+04:00 LOG Main logs in '///tmp/pgloader///pgloader.log'
2014-04-26T04:42:38.084000+04:00 LOG Data errors in '//tmp/pgloader//'
2014-04-26T04:42:38.211000+04:00 WARNING Postgres warning: table "auth_user" does not exist, skipping
2014-04-26T04:42:38.412000+04:00 WARNING Postgres warning: table "auth_group" does not exist, skipping
2014-04-26T04:42:38.412000+04:00 WARNING Postgres warning: table "auth_membership" does not exist, skipping
2014-04-26T04:42:38.412000+04:00 WARNING Postgres warning: table "auth_permission" does not exist, skipping
2014-04-26T04:42:38.412000+04:00 WARNING Postgres warning: table "auth_event" does not exist, skipping
2014-04-26T04:42:38.412000+04:00 WARNING Postgres warning: table "auth_cas" does not exist, skipping
2014-04-26T04:42:38.613000+04:00 WARNING Postgres warning: table "district" does not exist, skipping
2014-04-26T04:42:38.613000+04:00 WARNING Postgres warning: table "newsletter_type" does not exist, skipping
2014-04-26T04:42:38.613000+04:00 WARNING Postgres warning: table "newsletter" does not exist, skipping
2014-04-26T04:42:38.613000+04:00 WARNING Postgres warning: table "newsletter_items" does not exist, skipping
2014-04-26T04:42:38.814000+04:00 WARNING Postgres warning: table "newsletter_boxes" does not exist, skipping
2014-04-26T04:42:38.814000+04:00 WARNING Postgres warning: table "newsletter_subscribers" does not exist, skipping
2014-04-26T04:42:38.814000+04:00 WARNING Postgres warning: table "newsletter_membership" does not exist, skipping
2014-04-26T04:42:38.814000+04:00 WARNING Postgres warning: table "newsletter_statistics" does not exist, skipping
2014-04-26T04:42:38.814000+04:00 WARNING Postgres warning: table "newsletter_queue" does not exist, skipping
2014-04-26T04:42:39.014000+04:00 WARNING Postgres warning: table "subscribers_queue" does not exist, skipping
2014-04-26T04:42:39.015000+04:00 WARNING Postgres warning: table "organisation_category" does not exist, skipping
2014-04-26T04:42:39.015000+04:00 WARNING Postgres warning: table "organisation" does not exist, skipping
2014-04-26T04:42:39.015000+04:00 WARNING Postgres warning: table "organisation_target" does not exist, skipping
2014-04-26T04:42:39.015000+04:00 WARNING Postgres warning: table "organisation_target_link" does not exist, skipping
2014-04-26T04:42:39.015000+04:00 WARNING Postgres warning: table "organisation_contact" does not exist, skipping
2014-04-26T04:42:39.215000+04:00 WARNING Postgres warning: table "organisation_imported" does not exist, skipping
2014-04-26T04:42:39.216000+04:00 WARNING Postgres warning: table "district_codes_imported" does not exist, skipping
2014-04-26T04:42:39.216000+04:00 WARNING Postgres warning: table "aggregator_feed" does not exist, skipping
2014-04-26T04:42:39.216000+04:00 WARNING Postgres warning: table "aggregator_rss" does not exist, skipping
2014-04-26T04:42:39.416000+04:00 WARNING Postgres warning: table "aggregator_log" does not exist, skipping
2014-04-26T04:42:39.416000+04:00 WARNING Postgres warning: table "scheduler_task" does not exist, skipping
2014-04-26T04:42:39.416000+04:00 WARNING Postgres warning: table "scheduler_run" does not exist, skipping
2014-04-26T04:42:39.417000+04:00 WARNING Postgres warning: table "scheduler_worker" does not exist, skipping
2014-04-26T04:42:39.617000+04:00 WARNING Postgres warning: table "events" does not exist, skipping
2014-04-26T04:42:39.617000+04:00 WARNING Postgres warning: table "contact" does not exist, skipping
2014-04-26T04:42:39.617000+04:00 WARNING Postgres warning: table "queue" does not exist, skipping
2014-04-26T04:42:39.817000+04:00 WARNING Postgres warning: table "page" does not exist, skipping
2014-04-26T04:42:39.817000+04:00 WARNING Postgres warning: table "page_archive" does not exist, skipping
2014-04-26T04:42:39.818000+04:00 WARNING Postgres warning: table "page_item" does not exist, skipping
2014-04-26T04:42:40.018000+04:00 WARNING Postgres warning: table "page_text" does not exist, skipping
2014-04-26T04:42:40.018000+04:00 WARNING Postgres warning: table "page_text_archive" does not exist, skipping
2014-04-26T04:42:40.019000+04:00 WARNING Postgres warning: table "page_image" does not exist, skipping
2014-04-26T04:42:40.019000+04:00 WARNING Postgres warning: table "page_file" does not exist, skipping
2014-04-26T04:42:40.219000+04:00 WARNING Postgres warning: table "page_picasa" does not exist, skipping
2014-04-26T04:42:40.219000+04:00 WARNING Postgres warning: table "page_youtube" does not exist, skipping
2014-04-26T04:42:40.219000+04:00 WARNING Postgres warning: table "page_facebook" does not exist, skipping
2014-04-26T04:42:40.220000+04:00 WARNING Postgres warning: table "page_twitter" does not exist, skipping
2014-04-26T04:42:40.420000+04:00 WARNING Postgres warning: table "page_link" does not exist, skipping
2014-04-26T04:42:40.420000+04:00 WARNING Postgres warning: table "page_faq" does not exist, skipping
2014-04-26T04:42:40.420000+04:00 WARNING Postgres warning: table "page_dealer" does not exist, skipping
2014-04-26T04:42:40.420000+04:00 WARNING Postgres warning: table "page_slider" does not exist, skipping
2014-04-26T04:42:40.621000+04:00 WARNING Postgres warning: table "page_form" does not exist, skipping
2014-04-26T04:42:40.621000+04:00 WARNING Postgres warning: table "page_fbcomments" does not exist, skipping
2014-04-26T04:42:40.621000+04:00 WARNING Postgres warning: table "home" does not exist, skipping
2014-04-26T04:42:40.621000+04:00 WARNING Postgres warning: table "page_item_custom" does not exist, skipping
2014-04-26T04:42:40.621000+04:00 WARNING Postgres warning: table "plugin_ckeditor_upload" does not exist, skipping
2014-04-26T04:42:40.821000+04:00 WARNING Postgres warning: table "plugin_tagging_tag" does not exist, skipping
2014-04-26T04:42:40.821000+04:00 WARNING Postgres warning: table "plugin_tagging_link" does not exist, skipping
2014-04-26T04:42:40.822000+04:00 WARNING Postgres warning: table "plugin_tagging_subscription" does not exist, skipping
2014-04-26T04:42:40.822000+04:00 WARNING Postgres warning: table "box" does not exist, skipping
2014-04-26T04:42:41.022000+04:00 WARNING Postgres warning: table "aggregator_twitter" does not exist, skipping

Then it freezes and the data seems partially imported.

Add to PyPI

As this is a Python package, it would be great to have it on PyPI and installable, especially in virtual environments, with a simple bin/pip install pgloader, and then runnable as bin/pgloader.

Allow specifying custom row terminator

A file format I need to load uses SOH (0x01) for field separation and STX (0x02) for row separation. Can you please implement the ability to specify a custom row terminator?
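Until such an option exists, one workaround is to rewrite the file into a conventional CSV before loading it; a minimal sketch (the delimiters match this report, but the function name and paths are assumptions):

```python
import csv

def soh_stx_to_csv(src_path, dst_path):
    """Convert a file using SOH (0x01) field separators and
    STX (0x02) row terminators into a standard CSV file that
    pgloader (or COPY) can read directly."""
    with open(src_path, "rb") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        data = src.read().decode("utf-8")
        for record in data.split("\x02"):
            if record:  # skip the empty tail after the last STX
                writer.writerow(record.split("\x01"))
```

Note this reads the whole file into memory; for very large inputs an incremental variant would be needed.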

Sqlite type DOUBLE not supported

When importing from SQLite, some types give a fatal error.

An unhandled error condition has been signalled:
Database error 42704: type "double" does not exist
QUERY: CREATE TABLE activity
(
id INTEGER,
name CHAR(70) not null default '',
ga_id CHAR(70),
g_id CHAR(70),
g_label CHAR(70),
url CHAR(120),
location CHAR(40),
max_participants INTEGER,
nr_prefs INTEGER,
nr_sessions INTEGER,
combined_prefs CHAR(1),
amount DOUBLE,
d_amount DOUBLE,
d_threshold INTEGER,
d_allin DOUBLE,
w_start TIMESTAMP,
w_end TIMESTAMP,
w_register TIMESTAMP,
register_info CHAR(250),
email CHAR(30),
phone CHAR(30),
booking_feedback CHAR(1),
email_required CHAR(1),
autopay_required CHAR(1),
pay_options INTEGER,
ps_orderprefix CHAR(512),
ps_appid CHAR(512),
is_sequence CHAR(1),
asktitle CHAR(1),
forced_language CHAR(6)
);
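A plausible workaround, assuming pgloader's CAST clause applies to SQLite sources the same way it does to MySQL ones (untested sketch; the file name is an assumption), is to map the type explicitly in the load command:

```
load database
     from 'activity.sqlite'
     into postgresql:///activity

cast type double to "double precision";
```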

Include compiled man pages

I'm writing a Homebrew formula to install pgloader on OS X, but the dependency on pandoc makes installing a properly-formatted man page difficult. Any chance we could get pre-compiled documentation version-controlled in the repository?

(Thanks for pgloader!)

[feature request] progress estimation

It would be good to have a feature whereby, in normal/verbose mode, pgloader would report how much data has been processed and what remains, along with a best-guess estimate.
With the naive pgsql "\copy" I've used pv -i mydata.csv | psql -c "\copy ..." which gave me crude progress, but due to the STDIN issues in #53 this doesn't work.

Issues using pgloader lisp version on Ubuntu

I realize that this repository is not for the Lisp version, but I could not find any other way to reach you. I was following the instructions from your blog post, Loading Geolocation Data, to load the GeoLite data, but am facing certain errors.

When I run the program I get the following error:

2013-10-17T20:28:31.041000+05:30 NOTICE drop index if exists geolite.blocks_ip4r_idx;
2013-10-17T20:28:31.042000+05:30 NOTICE truncate table geolite.blocks, geolite.location cascade;
The value
  #S(LPARALLEL.BIASED-QUEUE:BIASED-QUEUE
     :LOCK #<SB-THREAD:MUTEX "Anonymous lock" (free)>
     :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1002BCC8E3}>
     :HIGH (NIL)
     :LOW (NIL))
is not of type
  LPARALLEL.KERNEL::SCHEDULER.

The tables are getting created, but then the loading of the CSV does not start. Could you please point me in the right direction? I have no prior knowledge of Lisp to debug this.

Sqlite BLOB column not supported.

When trying to convert a SQLite DB with some BLOB columns I run into a fatal error:
ERROR Database error 42704: type "blob" does not exist.

It now fails with:

isOyqMSSOyKymoyC+DmHFgy8eMIz9UD6pXDF08OnH0s59c0/NlwDDB/gxlE+n/3xP8A/qduy6/WcTPXf/3s27/DlAnKW2Acmy5SW3IyMomJaUWIunU6lyxdBAE0YoC6DV/5Qby8kG/K6BSToaKDGRVG3EoB3KDObHzQYxN5W0Qbsg2vdBkGogFgsAsmqH5K6DTVRWq7INp+6DKA/miA/dVW0UBdAPC6qMFnRmKm1WWoNeqqN5IrFBg11IMSPNUb8BEY2UUGH9VQDdBrINr3Qa3miMit3KiMyKBVDR+CgPzVG7IjN5IrWooP/9k="
is not of type
  (OR NULL (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (*))).

I've added a SQLite file with 'files' table including a 'file_data' BLOB field. You'll find two records, one PDF and one JPEG file.

https://dl.dropboxusercontent.com/u/20756661/storage.sqlite.tgz

Unix sockets

Hi,
Trying to set up pgloader with a secured configuration (only socket connections may log in to modify data) fails with different errors related to parsing the INTO line.
Config example:

LOAD CSV
 FROM '/full/file/path'
 INTO postgresql://unix:/tmp/.s.PGSQL.5432/stats?a_table
;

This errors out with 'Could not parse subexpression "/" when parsing'
Trying to add ':5432' (or even just a colon) after the path errors out with 'The value "/tmp/.s.PGSQL.5432" is not of type LIST.'

Same thing if I omit the filename itself.
Am I missing something?

Thanks
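For reference, libpq-based tools take the socket directory (not the socket file itself) as the host, either explicitly or by omitting the host entirely; whether pgloader's URI parser accepts an equivalent spelling is exactly what this report is about. A keyword-DSN illustration (requires a running server, so shown only as a fragment):

```
# libpq keyword DSN: host is the directory containing .s.PGSQL.5432
psql "host=/tmp port=5432 dbname=stats"

# omitting host falls back to the default socket directory
psql "dbname=stats"
```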

missing charset.lisp, asdf version

Using 35ca492 under Debian stable

During compilation ("make pgloader"), "charsets.lisp" is missing, so I symlinked it:

cd src
ln -s ../build/quicklisp/local-projects/qmynd/src/common/charsets.lisp

... and it resumed

Then a real problem (probably unrelated) arises:

mkdir -p build/bin
build/bin/buildapp      --logfile /tmp/build.log                \
                         --require sb-posix                      \
                         --require sb-bsd-sockets                \
                         --require sb-rotate-byte                \
                         --asdf-path .                           \
                         --asdf-tree build/quicklisp/local-projects     \
                         --manifest-file build/manifest.ql             \
                         --asdf-tree build/quicklisp/dists              \
                         --asdf-path .                           \
                         --load-system pgloader               \
                         --load src/hooks.lisp                   \
                         --entry pgloader:main                   \
                         --dynamic-space-size 4096               \
                         --compress-core                    \
                         --output build/bin/pgloader
;; loading system "pgloader"
Fatal MISSING-DEPENDENCY-OF-VERSION:
  Component "asdf" does not match version 2.22.3, required by
  #<SYSTEM "asdf-finalizers">

The local (Debian-packaged) asdf is v2.22-1, but AFAIK it is not used during compilation.

Thx :-)

Some characters make import to fail

debugger invoked on a SB-FORMAT:FORMAT-ERROR in thread
#<THREAD "lparallel" RUNNING {DA3C7A1}>:
  error in FORMAT: no more arguments
  > "143421     [email protected]   arese~@$$       2011-07-31 09:22:04
"
                                         ^
while processing indirect format string:
  ~a ~a ~?~&
         ^

It may be a logger problem, since I do not have this issue when invoking pgloader without --debug. I could reproduce the bug on another record with a text field containing the sequence '~ '. The debugger complains that the format space directive is not recognized.

empty strings seem to make fields shift

I have an issue with a big MySQL table containing empty strings. They make the CSV fields shift and no longer match the expected PostgreSQL columns.

By the way, here is an example of a failing row:

CONTEXT: COPY accommodation, line 1, column slideshow: "2007-05-07 09:39:11"
2014-03-27T17:02:26.306000Z INFO error recovery at 298/25000, processing bad row
2014-03-27T17:02:26.306000Z DEBUG BEGIN
2014-03-27T17:02:26.306000Z DEBUG SET LOCAL maintenance_work_mem TO '128MB'
2014-03-27T17:02:26.306000Z DEBUG SET LOCAL work_mem TO '12MB'
2014-03-27T17:02:26.306000Z DEBUG SET LOCAL search_path TO 'public'
2014-03-27T17:02:26.306000Z INFO error recovery at 298/25000, trying 24702 rows
2014-03-27T17:02:26.507000Z ERROR Database error 22P02: invalid input syntax for type boolean: "2007-05-07 09:39:11"

The field following the column 'slideshow' is a date, but an empty string somewhere made the fields shift left, so the date is now aligned with the slideshow column.

Is there a way to see the whole record that failed, so that it would be possible to search for the problem in the database?
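While waiting for an answer, one way to hunt for shifted rows yourself is to scan the exported CSV for records whose field count doesn't match the expected column count (sketch; the function name, delimiter, and expected count are assumptions to adapt):

```python
import csv

def find_bad_rows(path, expected_fields, delimiter=","):
    """Return (line_number, row) pairs whose field count differs
    from the expected number of columns, i.e. the rows most
    likely to have shifted because of an unquoted empty string."""
    bad = []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        for lineno, row in enumerate(reader, start=1):
            if len(row) != expected_fields:
                bad.append((lineno, row))
    return bad
```

The returned line numbers can then be matched against the source table to find the offending record.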

Could not parse subexpression ";" when parsing

Using master with a config file copied straight from the tests (only the db names were altered):

vagrant@precise64:~$ cat /vagrant/my.load
load database from mysql://localhost/horde
              into postgresql:///horde

with drop tables, truncate, create tables, create indexes,
     reset sequences,
     downcase identifiers

set work_mem to '128MB', maintenance_work_mem to '512 MB'

cast type datetime to timestamptz drop default using zero-dates-to-null,
     type date drop not null drop default using zero-dates-to-null,
     type tinyint to boolean using tinyint-to-boolean;

I tried it with and without the semicolon at the end; the results are identical.

...I get the following error message:

vagrant@precise64:~$ ./pgloader/pgloader.lisp /vagrant/my.load --debug
Loading quicklisp and the pgloader project and its dependencies...
An unhandled error condition has been signalled:
   Could not parse subexpression ";" when parsing

 Expression PGLOADER.PARSER::COMMANDS
    Subexpression (+ PGLOADER.PARSER::COMMAND)
    Subexpression PGLOADER.PARSER::COMMAND
    Subexpression (AND
                   (OR PGLOADER.PARSER::LOAD-ARCHIVE
                       PGLOADER.PARSER::LOAD-CSV-FILE
                       PGLOADER.PARSER::LOAD-FIXED-COLS-FILE
                       PGLOADER.PARSER::LOAD-DBF-FILE
                       PGLOADER.PARSER::LOAD-MYSQL-DATABASE
                       PGLOADER.PARSER::LOAD-SQLITE-DATABASE
                       PGLOADER.PARSER::LOAD-SYSLOG-MESSAGES)
                   PGLOADER.PARSER::END-OF-COMMAND)
    Subexpression PGLOADER.PARSER::END-OF-COMMAND
    Subexpression (AND PGLOADER.PARSER::IGNORE-WHITESPACE #\;
                       PGLOADER.PARSER::IGNORE-WHITESPACE)
    Subexpression ";"

  Encountered at:
    load database from mysql
    ^ (Line 1, Column 0, Position 0)


2014-02-17T11:41:07.025000Z LOG Starting pgloader, log system is ready.
2014-02-17T11:41:07.044000Z LOG Main logs in '/tmp/pgloader/pgloader.log'
2014-02-17T11:41:07.054000Z LOG Data errors in '/tmp/pgloader/'

/tmp/pgloader/pgloader.log actually says nothing:

vagrant@precise64:~$ cat /tmp/pgloader/pgloader.log 
2014-02-17T11:41:07.025000Z LOG Starting pgloader, log system is ready.
2014-02-17T11:41:07.044000Z LOG Main logs in '/tmp/pgloader/pgloader.log'
2014-02-17T11:41:07.054000Z LOG Data errors in '/tmp/pgloader/'

Halp!

lisp pgloader build failure

After running the install commands shown (including the fixed echo line echo (:tree pwd) ...) I get

The name "PGLOADER.LOGS" does not designate any package.

when building. This is the first time I've used Lisp, so I have no idea what's going on, how Lisp handles packages, etc.

READ error during COMPILE-FILE: Package MYSQL does not exist.

The full error we get:

vagrant@psql-master:~$ pgloader -l pgloader.conf 
; 
; caught ERROR:
;   READ error during COMPILE-FILE:
;   
;     Package MYSQL does not exist.
;   
;       Line: 2, Column: 34, File-Position: 47
;   
;       Stream: #<SB-SYS:FD-STREAM for "file /home/vagrant/pgloader.conf"
;                 {1007092013}>
; 
; compilation unit aborted
;   caught 1 fatal ERROR condition
;   caught 1 ERROR condition

caused by this configuration file:

LOAD DATABASE
        FROM    mysql://root@localhost/horde
        TO      postgresql://vagrant@localhost/horde

I'm assuming this may be an issue with how I built pgloader, since there are no symbols in the resulting binary that would hint at MySQL support.
I built it on an Ubuntu precise system with the help of debian-bootstrap.sh.
