
Comments (17)

firecat53 avatar firecat53 commented on August 19, 2024

FYI, I just migrated to MariaDB 5.5.30 and it throws the exact same error as MySQL does when using mysql-python 1.2.4. Everything else that I've tested works fine and exactly the same :) 1.2.3 works perfectly in all my use cases.

Scott


farcepest avatar farcepest commented on August 19, 2024

Any idea why it seems to be doing a rollback in your code?


firecat53 avatar firecat53 commented on August 19, 2024

Here is the code. The rollback is just done when a database error is
encountered.

# 'dbapi' below is the imported driver module (e.g. import MySQLdb as dbapi)
def sql_statement(conn, stmnt, sqlargs=None):
    cursor = conn.cursor()
    try:
        if sqlargs is not None and \
           isinstance(sqlargs[0], (tuple,list)) and \
           (stmnt.upper().startswith('INSERT') or
            stmnt.upper().startswith('REPLACE') or
            stmnt.upper().startswith('UPDATE')):
            # If sqlargs is passed as a list of list(or tuple of tuples) then
            # use the executemany method (check if the first item in the tuple
            # is a list or tuple)
            cursor.executemany(stmnt, sqlargs)
        elif sqlargs is not None and "%s" in stmnt:
            cursor.execute(stmnt, sqlargs)
        else:
            cursor.execute(stmnt)
    except dbapi.Error as e:
        conn.rollback()
        print(stmnt)
        raise Exception("Database Error {}: {}".format(
                                            e.args[0], e.args[1]))
    else:
        conn.commit()
    return cursor

Thanks!
Scott


farcepest avatar farcepest commented on August 19, 2024

I suggest printing the statement and exception info before performing the rollback and re-raising the exception. The rollback is covering up the real issue.
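One way that change might look, folding the suggestion back into the sql_statement helper quoted above (a sketch only, simplified to the single-execute path; it assumes the MySQLdb module is imported as dbapi):

import MySQLdb as dbapi

def sql_statement(conn, stmnt, sqlargs=None):
    cursor = conn.cursor()
    try:
        cursor.execute(stmnt, sqlargs)
    except dbapi.Error as e:
        # Print the statement and the driver's own error *before* rolling
        # back, then re-raise, so the underlying problem is not masked.
        print(stmnt)
        print("Database Error {}: {}".format(e.args[0], e.args[1]))
        conn.rollback()
        raise
    else:
        conn.commit()
    return cursor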


firecat53 avatar firecat53 commented on August 19, 2024

Sorry about that...I see what you mean! Here's the traceback without the rollback:

Traceback (most recent call last):
  File "utilities/csv_to_database.py", line 310, in <module>
    get_csv(args.filename, args.tablename, args.template)
  File "utilities/csv_to_database.py", line 299, in get_csv
    return process_contents(conn, csv, table, cols, types)
  File "utilities/csv_to_database.py", line 198, in process_contents
    cur.executemany(stmnt, rows(csv, converters))
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 252, in executemany
    r = self._query('\n'.join([query[:p], ',\n'.join(q), query[e:]]))
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 344, in _query
    rowcount = self._do_query(q)
  File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 308, in _do_query
    db.query(q)
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

Several weeks ago I tried setting max_allowed_packet = 256M in /etc/mysql/my.cnf instead of the default 1M (per some research I did) and that initially seemed to help when running it on my laptop (6GB RAM), but it didn't help on my server (1 GB RAM). I have no idea if that's related at all, but I thought you should know.

Thanks!
Scott


farcepest avatar farcepest commented on August 19, 2024

I guess then this is either a very large query, or a very large result set, or both.

There's an attribute on the cursor which has the last executed statement, after parameters have been added. Probably too large to print but perhaps you could print the type and len of it? Sorry but the exact name escapes me. The query should be str when it is sent, but the intermediate one will be unicode.


firecat53 avatar firecat53 commented on August 19, 2024

Yes, it's a very large CSV text file (~75,000 lines and 337 MB) that is being imported into the database. Here is the info you requested. The actual INSERT statement used is in my first post. Again, it calls executemany to do the actual data insertion.

ipdb> zz=cur._last_executed
ipdb> len(zz)
329637016
ipdb> type(zz)
<type 'str'>
ipdb> zz[0:650]
"INSERT INTO temp
(GTIN,ItemNumber,Status,TrxCode,Discontinued,Created,ItemDescription,ColorGroup,ColorCode,Color,HexValue,SizeCode,Size,PieceWhsPrice,DozenWhsPrice,PackQty,CaseWhsPrice,CusWhsPrice,CusStdPrice,CaseQty,Weight,ModelImgXLarge,ModelImgLarge,ModelImgMedium,ModelImgSmall,ModelImgThumb,StillImgXLarge,StillImgLarge,StillImgMedium,StillImgSmall,StillImgThumb,Detail1ImgXLarge,Detail1ImgLarge,Detail1ImgMedium,Detail2ImgXLarge,Detail2ImgLarge,Detail2ImgMedium,Page,Care,MillCode,Mill,StyleCode,StyleDescr,MarketingHTMLDescr,MarketingPlainTextDescr,Colors,Category,Keywords,Feature)
VALUES \n('08402350863625','75175000','A','N','N','10/08/12',......"

Scott


farcepest avatar farcepest commented on August 19, 2024

So then it would be doing a multi-row INSERT. In 1.2.3, the regex for detecting INSERT statements was... less than perfect (still is, but it's better). It may not have been detecting your INSERT properly in 1.2.3 and thus was doing single-row inserts over the list.

I suggest either looping over your rows and using execute (which should be no slower than what you had in 1.2.3), or iterating over slices of your rows, say, 100 at a time, and see how that works. Multi-row INSERT should still have some performance benefits, even with InnoDB and transactions, but maybe you should benchmark that. You might even do better turning autocommit on temporarily when inserting chunks; multi-row INSERT is still atomic.
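A sketch of the first option (a plain execute loop), reusing the names from the traceback above (conn, stmnt, and the rows generator come from Scott's csv_to_database.py); the sliced variant would look the same except that executemany is called on each slice:

cur = conn.cursor()
try:
    for row in rows(csv, converters):
        cur.execute(stmnt, row)   # one row per statement, no giant packet
    conn.commit()                 # one commit at the end, as before
finally:
    cur.close()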


firecat53 avatar firecat53 commented on August 19, 2024

I will work on:

  1. Looping over rows
  2. Using executemany on row slices

But...does this mean that the issue with large executemany statements will not be fixed in 1.2.4+? Is it just a memory issue or something else?

Thanks,
Scott


farcepest avatar farcepest commented on August 19, 2024

The way the MySQL API works is that the query and parameters get put together into one big string, so it's entirely possible to either run out of memory or go over MySQL's packet size limits. The only thing I could do is have the row chunking in executemany and let it do that transparently. How does that sound?


firecat53 avatar firecat53 commented on August 19, 2024

Here are some very basic benchmarks (one run only for each):
CSV file: itemx3.out (337 MB, 75,176 lines)

1.2.4 - original code
Failed (OperationalError: (2006, 'MySQL server has gone away'))

1.2.4 with 100 item chunks into executemany
c.get_csv("itemx3.out", 'temp')
Saving values in database...
Wall time: 182.78 s

1.2.4 with 100 item chunks into executemany using conn.autocommit(True)
(although I really don't understand what this changes)
c.get_csv("itemx3.out", 'temp')
Saving values in database...
Wall time: 174.92 s

1.2.4 with 200 item chunks into executemany
Failed (OperationalError: (2006, 'MySQL server has gone away'))

1.2.3 with 100 item chunks into executemany
%time c.get_csv("itemx3.out", 'temp')
Saving values in database...
Wall time: 171.44 s

1.2.3 - original code
%time c.get_csv("itemx3.out", 'temp')
Saving values in database...
Wall time: 272.57 s

Here's the 'chunking code' (taken from a StackOverflow answer):

from itertools import chain, islice

def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        # next(it) raises StopIteration once the source is exhausted,
        # which ends this generator as well
        yield chain([next(it)], islice(it, chunksize - 1))

def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)

cur = conn.cursor()
# rows is a generator object yielding processed rows from the CSV file
for chunk in chunked(rows(csv, converters), 100):
    cur.executemany(stmnt, chunk)
cur.close()

So, in a nutshell...using the 'chunking' code I can now use 1.2.4 and it is operating faster than the original code was (like you said...it was probably going line-by-line). If you feel it would be of value to add the 'chunking' code transparently into the mysqldb executemany function, it seems to me that could be useful...but I am far from an expert!!

Thanks!
Scott


farcepest avatar farcepest commented on August 19, 2024

The autocommit benchmark is interesting, but if you only ran each one once, it's not conclusive. If I had to guess... Since you seem to be writing temporary tables, and I'm guessing you're using InnoDB by default, this is probably how it breaks down:

When you INSERT a row into InnoDB, your session gets an exclusive lock on that row. Never mind that it's a temporary table which no other session can see... If you are doing a multi-row INSERT, I believe it's doing a gap lock for the range, i.e. one lock instead of 100. Also multi-row INSERT is faster because there's less client-server communication.

Autocommit might be faster because there are fewer locks being held at any given time (with InnoDB, each statement is a full transaction). I assume you're doing one large commit at the end.

I think it would also be interesting to try some different storage engines for your temporary table. MySQL seems to use whatever your default storage engine is for temporary tables, which I'm guessing is InnoDB, so try MyISAM and, if your server can handle it, MEMORY.

http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-bulk-data-loading.html
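A rough sketch of how those experiments could be wired into the existing load, reusing the chunked, rows, and stmnt names from the earlier comment (the engine check simply reads the Engine column of SHOW TABLE STATUS):

cur = conn.cursor()

# See which engine the table actually uses before changing anything.
cur.execute("SHOW TABLE STATUS LIKE 'temp'")
print(cur.fetchone()[1])              # second column is Engine

# Commit once per chunk instead of one big transaction at the end.
conn.autocommit(True)
for chunk in chunked(rows(csv, converters), 100):
    cur.executemany(stmnt, chunk)
conn.autocommit(False)
cur.close()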


firecat53 avatar firecat53 commented on August 19, 2024

Thank you for the education! Lol...although I guess I wasn't clear enough in my initial conditions:

  • I'm already using MyISAM. When I first started this project several years ago, I tried InnoDB, but because of the nature of the database usage, MyISAM made more sense and was faster.
  • I'm not writing to temporary tables...just a regular table named temp :)
  • My server(s) definitely don't have the resources to try the MEMORY storage engine! 1GB system memory....

I've seen the optimizing page in the mysql docs before. Most of it doesn't apply to me, though. These large inserts are done into new tables with no keys assigned until afterwards.

This isn't a public-facing high-usage database, so my needs are pretty minimal compared to most people.

Ultimately...my immediate problem is solved by using the 'chunked' inputs to executemany, so I can now upgrade to 1.2.4. If you'd like me to test some more, or if you are planning on making changes to mysqldb based on this ticket, then we can leave the issue open for now.

Thanks so much for your help!
Scott

Edit: If you did build 'chunking' into the executemany function, would it be able to automatically detect available memory and packet size and determine the appropriate 'chunk' size?


farcepest avatar farcepest commented on August 19, 2024

Yes: SHOW VARIABLES LIKE 'max_allowed_packet' will get the packet size. If we want to maximize the size of packets, it might be feasible to keep a running total of the size of the query as it is generated, flush it out once the limit is reached, and then start again, rather than using a fixed chunk size, which would have to be empirically determined by the application somehow.
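A rough application-level sketch of that idea (a hypothetical helper, not MySQLdb code): read max_allowed_packet from the server, escape each row with the connection's literal() method roughly the way the cursor would, and flush a chunk whenever the running total gets close to the limit.

def packet_sized_chunks(conn, rows_iter, margin=64 * 1024):
    """Yield lists of rows whose escaped size stays under max_allowed_packet."""
    cur = conn.cursor()
    cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
    limit = int(cur.fetchone()[1]) - margin
    cur.close()

    chunk, size = [], 0
    for row in rows_iter:
        # conn.literal() on a tuple returns the escaped values; add a
        # little slack per row for the surrounding parentheses and commas.
        escaped = conn.literal(tuple(row))
        row_size = sum(len(v) for v in escaped) + 2 * len(escaped) + 4
        if chunk and size + row_size > limit:
            yield chunk
            chunk, size = [], 0
        chunk.append(row)
        size += row_size
    if chunk:
        yield chunk

# Hypothetical usage with the names from the earlier comments:
# for chunk in packet_sized_chunks(conn, rows(csv, converters)):
#     cur.executemany(stmnt, chunk)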


eljeffeg avatar eljeffeg commented on August 19, 2024

I seem to be having similar problems with 1.2.4 periodically on Python 2.7. Not sure if this helps:

2013-07-15T16:36:23.667447+00:00 app[web.1]: *** glibc detected *** python: double free or corruption (out): 0x00007fc614146450 ***
2013-07-15T16:36:23.667763+00:00 app[web.1]: [E 130715 16:36:23 torndb:217] Error connecting to MySQL on us-cdbr-azure-east-b.cloudapp.net
2013-07-15T16:36:23.682416+00:00 app[web.1]: *** glibc detected *** python: malloc(): smallbin double linked list corrupted: 0x00007fc614025530 ***
2013-07-15T16:36:23.693491+00:00 app[web.1]: ======= Backtrace: =========
2013-07-15T16:36:23.693491+00:00 app[web.1]: /lib/libc.so.6(+0x78bb6)[0x7fc627c21bb6]
2013-07-15T16:36:23.693491+00:00 app[web.1]: /lib/libc.so.6(cfree+0x73)[0x7fc627c28483]
2013-07-15T16:36:23.693580+00:00 app[web.1]: /usr/lib/libmysqlclient_r.so.16(vio_delete+0x19)[0x7fc6241f7939]
2013-07-15T16:36:23.693671+00:00 app[web.1]: /usr/lib/libmysqlclient_r.so.16(end_server+0x32)[0x7fc6241f3852]
2013-07-15T16:36:23.693671+00:00 app[web.1]: /usr/lib/libmysqlclient_r.so.16(cli_safe_read+0x107)[0x7fc6241f3b97]
2013-07-15T16:36:23.693671+00:00 app[web.1]: /usr/lib/libmysqlclient_r.so.16(+0x84299)[0x7fc6241f4299]
2013-07-15T16:36:23.693764+00:00 app[web.1]: /usr/lib/libmysqlclient_r.so.16(mysql_real_query+0x2c)[0x7fc6241f256c]
2013-07-15T16:36:23.693764+00:00 app[web.1]: /app/.heroku/python/lib/python2.7/site-packages/_mysql.so(+0x5f3b)[0x7fc624587f3b]
2013-07-15T16:36:23.693764+00:00 app[web.1]: python(PyEval_EvalFrameEx+0x51f6)[0x4a9566]
2013-07-15T16:36:23.693764+00:00 app[web.1]: python(PyEval_EvalFrameEx+0x6485)[0x4aa7f5]
2013-07-15T16:36:23.693764+00:00 app[web.1]: ======= Memory map: ========


farcepest avatar farcepest commented on August 19, 2024

Reopen if you have additional information


eljeffeg avatar eljeffeg commented on August 19, 2024

If I recall, I went back to 1.2.3c1 and it fixed it for me. That's what I'm running now.

