Comments (11)
Nice sleuthing. Spotted a couple of bugs as a result.
Boiled the test case down to this:
#!/usr/bin/env python
import scraperwiki

data = [{'rowx': None} for x in range(10000)]
try:
    scraperwiki.sql.save(['rowx'], data)
except Exception as e:
    print("Masked exception:", repr(e))
scraperwiki.sql.save(['rowx'], data)
from scraperwiki-python.
I'm not yet sure what to do in this case. What's happening is that the first save fails due to an exception (because the user passed bad data). The user masks this, then goes on to try another save, which fails.
If I were writing the code I wouldn't (a) pass bad data or (b) mask the exception; i.e., I would want it to blow up.
The problem is that the user is doing something which causes save() to explode, but they expect it to be non-fatal. By buffering the save and replaying it at a later date, we cause the explosion to happen elsewhere, which isn't optimal.
I think the solution is to sanity-check the input in append before buffering it.
There is also another bug where state which needs to be preserved during a long append() is incorrectly reset in flush(); I'll submit a pull request for that.
I note that:
scraperwiki.sql.save(['rowx'], {})
is fine, but:
scraperwiki.sql.save(['rowx'], [{}])
raises ValueError: You passed no sample values, or all the values you passed were null.
It feels like a violation of a generalisation principle that passing no data isn't a no-op. But I don't know what the user might expect.
It feels pretty evil that dumptruck deletes keys with None as the value. What if you want to set a column to null? Then, after having deleted them, it explodes if nothing is left.
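The behaviour being criticised amounts to something like this standalone sketch (not dumptruck's actual code):

```python
def strip_none_values(row):
    # Sketch of the criticised behaviour: dropping None-valued keys
    # before insertion means an explicit "set this column to NULL"
    # is silently ignored, and an all-None row leaves nothing behind.
    return {k: v for k, v in row.items() if v is not None}
```

So `strip_none_values({'my_id': 5, 'animal': None})` silently drops `animal`, and `strip_none_values({'rowx': None})` leaves an empty dict, which is what then triggers the ValueError quoted above.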
/cc @IanHopkinson @StevenMaude @scraperdragon
I think the solution is to sanity-check the input in append before buffering it.
Agreed, but not sure entirely what the input's requirements are. There may be some odd edge cases (dates spring to mind).
It feels like it violates a generalisation principle that passing no data isn't a NOOP
Agreed.
It feels pretty evil that dumptruck deletes keys with None as the value.
Agreed. I think this is an artifact of the way the dataproxy used to work / was intended to work, where:
scraperwiki.sql.save({'my_id':5, 'animal': 'dog'})
scraperwiki.sql.save({'my_id':5, 'pet_name': 'Rover'})
would lead to a row of `{'my_id':5, 'animal': 'dog', 'pet_name': 'Rover'}`.
However, I don't remember this ever working.
I'm happy with the behaviour you just stated; that's how it should work. However, what it actually does is take:
scraperwiki.sql.save([], {'my_id':5, 'animal': 'dog'})
scraperwiki.sql.save([], {'my_id':5, 'pet_name': 'Rover', 'animal': None})
And where I would expect animal to be nulled, it just ignores the animal field and has the effect of the code from your example; it's as if you had run:
scraperwiki.sql.save([], {'my_id':5, 'animal': 'dog'})
scraperwiki.sql.save([], {'my_id':5, 'pet_name': 'Rover'})
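In plain SQL terms, the expected semantics of the second save would be an UPDATE that explicitly nulls the column. A sqlite3 sketch (the table name `swdata` and the schema are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE swdata (my_id INTEGER PRIMARY KEY,"
            " animal TEXT, pet_name TEXT)")

# First save: creates the row with animal = 'dog'.
con.execute("INSERT INTO swdata (my_id, animal) VALUES (?, ?)", (5, "dog"))

# Second save: an explicit None should null the column, not be ignored.
con.execute("UPDATE swdata SET pet_name = ?, animal = ? WHERE my_id = ?",
            ("Rover", None, 5))

row = con.execute("SELECT my_id, animal, pet_name FROM swdata").fetchone()
# row is (5, None, 'Rover'): animal has been nulled, as expected
```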
We still have to somehow deal with the case where real_save legitimately explodes (someone just yanked the hard disk, etc.) and the user "legitimately" does a catch-all and then later tries writing again. In that case I think we have to scream a big warning that there was an error whilst flushing the buffer, and accept that that error might jump into the fray at any later point.
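One way of "screaming a big warning" might look like this sketch (the class and attribute names are hypothetical, not scraperwiki's API):

```python
import warnings


class BufferedSaver:
    """Hypothetical sketch: remember a failed deferred flush and
    re-raise it, with a loud warning, on the user's next write."""

    def __init__(self, real_save):
        self.real_save = real_save
        self.buffer = []
        self.flush_error = None

    def save(self, row):
        if self.flush_error is not None:
            # Surface the old flush failure at the next write attempt,
            # making clear it comes from an earlier buffered flush.
            error, self.flush_error = self.flush_error, None
            warnings.warn("an earlier buffered flush failed; "
                          "re-raising it now: %r" % (error,))
            raise error
        self.buffer.append(row)

    def flush(self):
        try:
            self.real_save(self.buffer)
        except Exception as exc:
            self.flush_error = exc
        else:
            self.buffer = []
```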
Ultimately this is all just hacking around the fact that dumptruck is (extremely) suboptimal for doing individual row insertions. If @sean-duffy or someone attacks the core of the problem by rewriting that logic in sqlalchemy then it should be possible to make it so that .save() is as cheap as an INSERT. I guess some time will have to be spent considering the transaction semantics of that, however.
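The cost and transaction trade-off can be illustrated with plain sqlite3 (standing in here for whatever backend the rewrite uses): batching many row insertions inside a single transaction makes each one roughly as cheap as a bare INSERT, at the price that a failure rolls back the whole batch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE swdata (rowx TEXT)")

rows = [("value-%d" % i,) for i in range(10000)]

# One transaction for the whole batch: committed on success,
# rolled back entirely if any insert raises.
with con:
    con.executemany("INSERT INTO swdata (rowx) VALUES (?)", rows)

count = con.execute("SELECT COUNT(*) FROM swdata").fetchone()[0]
# count == 10000
```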
(From my last point I conclude that we may wish to defer the buffering question indefinitely in favour of the rewrite to use sqlalchemy: getting this right could be very hard and might not give a well-behaved solution in all circumstances, whereas fundamentally fixing the backend probably will result in obviously correct behaviour.)
FWIW I agree with @scraperdragon. The thing about passing (key, None) pairs is just wrong.
The thing about exceptions is that it's an exception. All bets are off. You should not reasonably expect any future .save()s to work (though @pwaller is right, we could make it so that this case was more clearly diagnosed).
A rewrite in sqlalchemy is clearly a good idea, but so is doing a million other things. (PRs accepted).
@drj11 A sqlalchemy rewrite of dumptruck is what I'm currently doing:
https://github.com/scraperwiki/dumptruck/tree/sqlalchemy-rewrite
Almost all of this is irrelevant now that we have the SQLAlchemy rewrite.