stiivi / bubbles

[NOT MAINTAINED] Bubbles – Python ETL framework

Home Page: http://bubbles.databrewery.org

License: Other

Python 99.91% Shell 0.09%

bubbles's Issues

Is this package still supported?

Is this package still supported?
Have the method parameters changed?
The examples from the article http://okfnlabs.org/blog/2014/09/01/bubbles-python-etl.html do not work.

Import in example code fails

From bubbles / doc / operations.rst

The following example code fails:

from bubbles import default_context as c
from bubbles import get_object

source = get_object("csv", "data.csv")
duplicates = c.op.duplicates(source)

$ python operations.py
Traceback (most recent call last):
File "operations.py", line 2, in <module>
from bubbles import get_object
ImportError: cannot import name get_object

The link to the project page, http://bubbles.databrewery.org, is broken.

Examples not working

I downloaded and installed the latest bubbles source code.

My development machine's environment:

Python 3.3.4
Linux 3.10.32-2-MANJARO x86_64 GNU/Linux

1. Running bubbles / examples / hello.py produces the following error:

Traceback (most recent call last):
  File "hello.py", line 6, in <module>
    d = bubbles.data_object("csv_source", resource=URL, infer_fields=True)
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 107, in create
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/backends/text/objects.py", line 232, in __init__
TypeError: 'infer_fields' is an invalid keyword argument for this function

2. Running bubbles / examples / hello_sql.py produces the following error:

Traceback (most recent call last):
  File "hello_sql.py", line 23, in <module>
    p.run()
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/pipeline.py", line 230, in run
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 165, in run
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 24, in evaluate
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/graph.py", line 61, in evaluate
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 107, in create
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/backends/text/objects.py", line 232, in __init__
TypeError: 'infer_fields' is an invalid keyword argument for this function

3. Running bubbles / examples / aggregate_over_window.py produces the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 123, in get
KeyError: 'iterable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "aggregate_over_window.py", line 46, in <module>
    p.run()
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/pipeline.py", line 230, in run
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 165, in run
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 24, in evaluate
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/graph.py", line 61, in evaluate
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 98, in create
  File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 126, in get
bubbles.errors.InternalError: Unknown extension 'iterable' of type object

Create Quick Reference manual

Create a quick reference manual (PDF preferably) with:

list of operations
list of stores
list of object types and their representations

Darn

This seemed to be the most complete Python ETL package. Why don't Python nerds need to ETL?

Error trying to convert from PostgreSQL to Sqlite

I have data in a PostgreSQL database that I'd like to load into SQLite using a Pipeline.

source = open_store("sql", 'postgres://....')
#target = open_store("csv", "./data")
target = open_store("sql", 'sqlite:///data/data.sql')

stores = {
  'source': source,
  'target': target,
}
p = Pipeline(stores=stores)
p.source('source', 'xxx')
p.create('target', 'xxx')
p.run()

I get the following error:

sqlalchemy.exc.CompileError: (in table 'xxx', column 'yyy'): Compiler <sqlalchemy.dialects.sqlite.base.SQLiteTypeCompiler object at 0x102c10910> can't render element of type <class 'sqlalchemy.dialects.postgresql.base.DOUBLE_PRECISION'>

Support for JSON table schema

See http://dataprotocols.org/json-table-schema/ for more info

Field part reference for compound field types

Allow use of parts of compound/indexable field types such as dates and arrays in operations. Example:

p.filter_by_value(FieldPart("event_date", "year"), 2013)

Advantages:

fewer steps, no need for explicit extraction
better readability

Disadvantages:

more requirements for implementing backend operations
operations might implement this selectively, depending on argument, which might cause inconsistencies

Requirements:

Field.is_composed() - True for date, array, record
DataObject.concrete_field(field_or_part)

Affected methods:

prepare_aggregation_measures()
prepare_key()
many operations

Recommendation: implement this in OperationContext once argument annotations or operation prototype metadata are available.
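The proposal above can be sketched in plain Python. This is only an illustration under stated assumptions: FieldPart, concrete_value and this filter_by_value are hypothetical stand-ins operating on lists of dicts, not the bubbles API, and a part is assumed to be readable as an attribute of the field value (as "year" is on a date).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldPart:
    """Hypothetical reference to one part of a compound field."""
    field: str
    part: str


def concrete_value(row, field_or_part):
    """Return the plain value for a field name or a FieldPart."""
    if isinstance(field_or_part, FieldPart):
        # Assumption: the part is exposed as an attribute, e.g. date.year
        return getattr(row[field_or_part.field], field_or_part.part)
    return row[field_or_part]


def filter_by_value(rows, field_or_part, value):
    """Keep the rows whose field (or field part) equals value."""
    return [row for row in rows if concrete_value(row, field_or_part) == value]


rows = [
    {"name": "launch", "event_date": date(2013, 5, 1)},
    {"name": "review", "event_date": date(2014, 1, 15)},
]
matches = filter_by_value(rows, FieldPart("event_date", "year"), 2013)
```

Resolving the part in one helper keeps the operation itself unchanged, which is one way to limit the inconsistency risk listed under the disadvantages.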

import bubbles fails

Python 2.7 - 64bit - Windows

In [1]: import bubbles
File "C:\Anaconda\lib\site-packages\bubbles\core.py", line 222
def operation(*signature, name=None):
^
SyntaxError: invalid syntax
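The line flagged in the traceback uses keyword-only argument syntax (a named parameter after *args), which is valid only in Python 3, so a Python 2.7 interpreter rejects it at parse time. The sketch below contrasts the two spellings; the function bodies are placeholders for illustration, not the real bubbles decorator, and operation_compat is a hypothetical name.

```python
def operation(*signature, name=None):
    """Python 3 only: 'name' can be passed by keyword only."""
    return signature, name


def operation_compat(*signature, **kwargs):
    """Equivalent spelling that Python 2 can also parse."""
    name = kwargs.pop("name", None)
    if kwargs:
        # Mimic the error a keyword-only signature would raise
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    return signature, name
```

Running the package under Python 3 avoids the error without any source change.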

Proposing a PR to fix a few small typos

Issue Type

[x] Bug (Typo)

Steps to Replicate and Expected Behaviour

Examine bubbles/execution/context.py and observe sucessfully, however expect to see successfully.
Examine bubbles/execution/pipeline.py and observe successfuly, however expect to see successfully.
Examine bubbles/execution/pipeline.py and observe sucessful, however expect to see successful.
Examine bubbles/backends/sql/ops.py and observe sequentialy, however expect to see sequentially.
Examine bubbles/execution/context.py and observe regardles, however expect to see regardless.
Examine bubbles/execution/pipeline.py and observe refereced, however expect to see referenced.
Examine bubbles/expression.py and observe properely, however expect to see properly.
Examine bubbles/execution/pipeline.py and observe prerequisities, however expect to see prerequisites.
Examine bubbles/backends/sql/objects.py and observe preferrence, however expect to see preference.
Examine bubbles/objects.py and observe oustanding, however expect to see outstanding.
Examine bubbles/dev.py and observe objcects, however expect to see objects.
Examine bubbles/operation.py and observe multiople, however expect to see multiple.
Examine doc/introduction.rst and observe heterogenous, however expect to see heterogeneous.
Examine bubbles/metadata.py and observe consutrcuted, however expect to see constructed.
Examine bubbles/metadata.py and observe arithmentic, however expect to see arithmetic.
Examine bubbles/ops/rows.py and observe agains, however expect to see against.
Examine bubbles/objects.py and observe actualy, however expect to see actually.
Examine bubbles/objects.py and observe acessed, however expect to see accessed.

Notes

Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.

https://github.com/timgates42/bubbles/pull/new/bugfix_typos

Thanks.

join_details should use same column names if no keys are specified

join_details should allow the keys to be omitted. In that case, columns with the same name in both objects should be used as keys.

Variation: use only one column with a shared name; if more than one is found, raise an exception.
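A minimal sketch of the key inference described above, assuming the field names of both objects are available as plain lists. infer_join_keys and its single_only flag are hypothetical names used for illustration; they are not part of bubbles.

```python
def infer_join_keys(master_fields, detail_fields, single_only=False):
    """Return the column names shared by both objects, in master order."""
    detail_names = set(detail_fields)
    common = [name for name in master_fields if name in detail_names]
    if not common:
        raise ValueError("no common column names to join on")
    if single_only and len(common) > 1:
        # The stricter variation: refuse to guess between several candidates
        raise ValueError("ambiguous join keys: %s" % ", ".join(common))
    return common


keys = infer_join_keys(["id", "name", "region"], ["region", "id", "total"])
```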

Composed operations have no way of dealing with consumables

Operations that are composed of other operations have no mechanism for dealing with consumable objects the way the Pipeline and ExecutionEngine do. If an object has to be consumed multiple times, the operation just consumes it once and then produces a faulty result.

Suggestion: move handling of consumables into context and use context.retain(obj, retain_times=1)
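The failure mode and the suggested retain call can be illustrated with plain iterators. This is a sketch under assumptions: the Context class here is a hypothetical stand-in, and retain_times is taken to mean the number of additional uses beyond the first; the issue does not define either.

```python
import itertools


class Context:
    """Hypothetical stand-in for an operation context."""

    def retain(self, obj, retain_times=1):
        # tee() buffers items so each returned iterator is independent
        return itertools.tee(obj, retain_times + 1)


def naive_count_and_total(rows):
    """Reads a one-shot iterator twice; the second pass sees nothing."""
    count = sum(1 for _ in rows)
    total = sum(rows)
    return count, total


def count_and_total(context, rows):
    """Composed operation that asks the context to retain its input."""
    for_count, for_total = context.retain(rows, retain_times=1)
    count = sum(1 for _ in for_count)
    total = sum(for_total)
    return count, total


naive_result = naive_count_and_total(iter([10, 20, 30]))
result = count_and_total(Context(), iter([10, 20, 30]))
```

Note that tee() keeps buffered items in memory, so a real implementation might prefer a backend-specific form of retention.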

Date to/from string conversions should use SQL format

Operations such as string_to_date should use the format notation that SQL databases use (see PostgreSQL, for example).

Reason: it is more human-readable than the strptime() format with its % codes.

Note that the SQL format is a bit richer and might have different first element indexes (1 vs 0) in some cases.
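One way to offer the SQL notation while still relying on strptime() is to translate the pattern first. The sketch below covers only a handful of PostgreSQL-style tokens and is an assumption about how such a layer could look; sql_format_to_strptime and this string_to_date are illustrative names, not the bubbles implementation.

```python
import re
from datetime import datetime

# Small, deliberately incomplete subset of PostgreSQL-style template tokens
SQL_TO_STRPTIME = {
    "YYYY": "%Y",
    "MM": "%m",
    "DD": "%d",
    "HH24": "%H",
    "MI": "%M",
    "SS": "%S",
}

# Longest tokens first so "HH24" is matched as one token
_TOKEN = re.compile("|".join(sorted(SQL_TO_STRPTIME, key=len, reverse=True)))


def sql_format_to_strptime(pattern):
    """Translate an SQL-style date pattern into a strptime() format."""
    escaped = pattern.replace("%", "%%")  # keep literal percent signs literal
    return _TOKEN.sub(lambda match: SQL_TO_STRPTIME[match.group(0)], escaped)


def string_to_date(value, pattern):
    """Parse value using an SQL-style pattern."""
    return datetime.strptime(value, sql_format_to_strptime(pattern)).date()


parsed = string_to_date("2013-05-01", "YYYY-MM-DD")
```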

Simplify handling of tuple list (ordering, aggregations)

This:

p.sort([["firstname", "asc"]])

looks unintuitive, despite being correct.

Allow:

p.sort("firstname")

This would be nice to have, but should not be allowed as it is ambiguous:

p.sort(["firstname", "asc"])

Does it mean sort by firstname in ascending order, or sort by the fields firstname and asc in the default (ascending) order?
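The rules above could be enforced by a small normalization step. This is a sketch only: normalize_order is a hypothetical helper, and it takes the strictest reading by rejecting any flat list of strings, which is one possible way to rule out the ambiguous form.

```python
def normalize_order(spec):
    """Turn a sort argument into a list of (field, direction) pairs."""
    if isinstance(spec, str):
        # p.sort("firstname"): a single field in the default order
        return [(spec, "asc")]
    pairs = []
    for item in spec:
        if isinstance(item, str):
            # p.sort(["firstname", "asc"]): could be one pair or two fields
            raise ValueError("ambiguous sort specification: %r" % (spec,))
        field, direction = item
        if direction not in ("asc", "desc"):
            raise ValueError("unknown sort direction: %r" % (direction,))
        pairs.append((field, direction))
    return pairs
```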

Retry in nested operation should not replace parent

Current behavior:

Example situation: an operation PARENT is composed of other CHILD operations

Operation PARENT is called
PARENT calls CHILD
CHILD raises RetryOperation
Context catches the RetryOperation and retries CHILD
Context returns after finishing the CHILD instead of PARENT

PARENT never completes.

Expected behavior:

Have a context stack for operation calls.

Operation PARENT is called within PARENT context.
PARENT calls CHILD
New context for CHILD is created
CHILD raises RetryOperation
CHILD context catches the RetryOperation and retries CHILD
CHILD context returns to PARENT
PARENT continues and returns to PARENT context
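The expected behavior can be sketched with a context that tracks a call stack and handles the retry at the innermost call. Everything here is a hypothetical illustration: the Context class, its call method and the max_retries limit are assumptions, and only the RetryOperation name comes from the issue.

```python
class RetryOperation(Exception):
    """Raised by an operation that wants to be called again."""


class Context:
    """Hypothetical context that keeps a stack of running operations."""

    def __init__(self):
        self.stack = []
        self.log = []

    def call(self, operation, *args, max_retries=3):
        self.stack.append(operation.__name__)
        try:
            for _ in range(max_retries + 1):
                try:
                    return operation(self, *args)
                except RetryOperation:
                    # Handled in the frame of the operation that raised it
                    self.log.append("retry " + operation.__name__)
            raise RuntimeError("too many retries: " + operation.__name__)
        finally:
            self.stack.pop()


attempts = {"child": 0}


def child(context, value):
    attempts["child"] += 1
    if attempts["child"] == 1:
        raise RetryOperation()  # fail once, then succeed
    return value * 2


def parent(context, value):
    doubled = context.call(child, value)  # the retry happens inside this call
    return doubled + 1


context = Context()
result = context.call(parent, 5)
```

Because the retry loop lives in the frame that called CHILD, control returns to PARENT afterwards and PARENT runs to completion.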

Example code not working

Running bubbles / examples / hello.py fails because p.source(bubbles.data_object("csv_source", URL, infer_fields=True)) is missing a name parameter. It should be:

p.source(bubbles.data_object("csv_source", URL, infer_fields=True),"foo-name")

Furthermore, the example doesn't seem to produce any output. What is supposed to happen when p.pretty_print() executes?

Pip install bubbles fails

OS X Mountain Lion, Python 2.7 with Virtualenvwrapper: Installation of bubbles fails due to syntax errors:

pip install bubbles
Downloading/unpacking bubbles
Downloading bubbles-0.1.tar.gz (40kB): 40kB downloaded
Running setup.py egg_info for package bubbles

Installing collected packages: bubbles
Running setup.py install for bubbles
SyntaxError: ('invalid syntax', ('/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/core.py', 222, 30, 'def operation(*signature, name=None):\n'))

Successfully installed bubbles
Cleaning up...
(cubes)peder@garros:/source/crs-cubes$ pip freeze local
Cython==0.19.1
Flask==0.9
Jinja2==2.6
MarkupSafe==0.18
SQLAlchemy==0.7.9
Werkzeug==0.8.3
argparse==1.2.1
bubbles==0.1
chardet==2.1.1
csvkit==0.6.1
cubes==0.10.2post1
dbf==0.95.004
itsdangerous==0.23
json-table-schema==0.1
line-profiler==1.0b3
lxml==3.2.3
messytables==0.12.0
openpyxl==1.6.2
python-dateutil==1.5
python-magic==0.4.3
six==1.4.1
wsgiref==0.1.2
xlrd==0.9.2
(cubes)peder@garros:/source/crs-cubes$ python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import bubbles
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/__init__.py", line 4, in <module>
from .core import *
File "/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/core.py", line 222
def operation(*signature, name=None):
^

SyntaxError: invalid syntax

Unable to find operation source_object

@Stiivi I've been trying to run the SQL example, but every time I run the program it shows this message: bubbles.errors.OperationError: Unable to find operation source_object. My bubbles version is 0.1 and my SQLAlchemy version is 1.0.14.

Can you help me or show me where the error is in the code?

I want to use bubbles to migrate a legacy database (MSSQL) to a new database (MySQL) at my company.

import bubbles


URL = "https://raw.github.com/Stiivi/cubes/master/examples/hello_world/data.csv"

stores = {
    "target": bubbles.open_store("sql", "sqlite:///")
}

p = bubbles.Pipeline(stores=stores)
p.source_object("csv_source", resource=URL, encoding="utf8")
p.retype({"Amount (US$, Millions)": "integer"})

p.create("target", "data")

p.aggregate("Category", "Amount (US$, Millions)")
p.pretty_print()
p.run()

Thanks.

Consumable retention policy

Define a consumable retention policy. Currently the retention is expected to be provided by the object, which in most cases is sub-optimal, for example when it consumes all data into a list of Python objects. This implementation is also not aware of the context in which the node is being executed. Suggestions:

ExecutionEngine subclasses might insert retention/caching nodes after consumables. Advantage: simple implementation, aware of the broader processing context. Disadvantage: might not be backend aware.
Retention operations: retain(object, times). Advantage: backend aware. Disadvantage: not aware of the broader processing context.
