stiivi / bubbles
[NOT MAINTAINED] Bubbles – Python ETL framework
Home Page: http://bubbles.databrewery.org
License: Other
Is this package still supported?
Have the method parameters of this package changed?
The examples from the article http://okfnlabs.org/blog/2014/09/01/bubbles-python-etl.html are not working.
The following example code from bubbles/doc/operations.rst fails:
from bubbles import default_context as c
from bubbles import get_object
source = get_object("csv", "data.csv")
duplicates = c.op.duplicates(source)
$ python operations.py
Traceback (most recent call last):
File "operations.py", line 2, in <module>
from bubbles import get_object
ImportError: cannot import name get_object
I downloaded and installed the latest bubbles source code.
My development machine's environment:
Python 3.3.4
Linux 3.10.32-2-MANJARO x86_64 GNU/Linux
Traceback (most recent call last):
File "hello.py", line 6, in <module>
d = bubbles.data_object("csv_source", resource=URL, infer_fields=True)
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 107, in create
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/backends/text/objects.py", line 232, in __init__
TypeError: 'infer_fields' is an invalid keyword argument for this function
Traceback (most recent call last):
File "hello_sql.py", line 23, in <module>
p.run()
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/pipeline.py", line 230, in run
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 165, in run
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 24, in evaluate
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/graph.py", line 61, in evaluate
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 107, in create
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/backends/text/objects.py", line 232, in __init__
TypeError: 'infer_fields' is an invalid keyword argument for this function
Traceback (most recent call last):
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 123, in get
KeyError: 'iterable'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "aggregate_over_window.py", line 46, in <module>
p.run()
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/pipeline.py", line 230, in run
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 165, in run
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/engine.py", line 24, in evaluate
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/execution/graph.py", line 61, in evaluate
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/objects.py", line 37, in data_object
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 94, in __call__
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 98, in create
File "/usr/lib/python3.3/site-packages/bubbles-0.2-py3.3.egg/bubbles/extensions.py", line 126, in get
bubbles.errors.InternalError: Unknown extension 'iterable' of type object
Create a quick reference manual (PDF preferably) with:
This seemed to be the most complete Python ETL package. Why don't Python nerds need to ETL?
I have data in a Postgres DB that I'd like to inject into Sqlite using a Pipeline.
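from bubbles import Pipeline, open_store

# Source store: the PostgreSQL database to read from
source = open_store("sql", 'postgres://....')
#target = open_store("csv", "./data")
# Target store: the SQLite file to write to
target = open_store("sql", 'sqlite:///data/data.sql')
stores = {
    'source': source,
    'target': target,
}
p = Pipeline(stores=stores)
# Read table 'xxx' from the source and create it in the target
p.source('source', 'xxx')
p.create('target', 'xxx')
p.run()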
I get the following error:
sqlalchemy.exc.CompileError: (in table 'xxx', column 'yyy'): Compiler <sqlalchemy.dialects.sqlite.base.SQLiteTypeCompiler object at 0x102c10910> can't render element of type <class 'sqlalchemy.dialects.postgresql.base.DOUBLE_PRECISION'>
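The message says that the column type reflected from PostgreSQL is dialect-specific and that the SQLite type compiler has no rendering for it. One possible workaround is to translate source type names into types the target understands before creating the table. The sketch below shows the idea using only the standard library; the GENERIC_TYPES mapping and the create_table_sql helper are hypothetical and are not part of bubbles or SQLAlchemy.

```python
import sqlite3

# Hypothetical, incomplete mapping from source type names to SQLite types.
GENERIC_TYPES = {
    "DOUBLE_PRECISION": "REAL",
    "INTEGER": "INTEGER",
    "VARCHAR": "TEXT",
}


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement, translating each source type."""
    parts = []
    for name, source_type in columns:
        if source_type not in GENERIC_TYPES:
            raise ValueError("no SQLite equivalent known for %r" % source_type)
        parts.append('"%s" %s' % (name, GENERIC_TYPES[source_type]))
    return 'CREATE TABLE "%s" (%s)' % (table, ", ".join(parts))


# Column list as it might be reflected from the source table (assumed here).
columns = [("id", "INTEGER"), ("yyy", "DOUBLE_PRECISION")]
statement = create_table_sql("xxx", columns)

connection = sqlite3.connect(":memory:")
connection.execute(statement)
```

Failing loudly on an unmapped type keeps the problem visible instead of silently creating a column with the wrong type.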
See http://dataprotocols.org/json-table-schema/ for more info
Allow use of parts of compound/indexable field types such as dates and arrays in operations. Example:
p.filter_by_value(FieldPart("event_date", "year"), 2013)
Advantages:
Disadvantages:
Requirements:
Field.is_composed() - True for date, array, record
DataObject.concrete_field(field_or_part)
Affected methods:
prepare_aggregation_measures()
prepare_key()
Recommendation: have this in OperationContext when argument annotations or operation prototype metadata are implemented.
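To make the proposal concrete, here is a minimal sketch of how a field part could be resolved when filtering rows. It works on plain dictionaries, not on bubbles data objects; FieldPart, field_value and this filter_by_value are hypothetical stand-ins for the proposed API.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for the proposed FieldPart; not part of bubbles.
FieldPart = namedtuple("FieldPart", ["field", "part"])


def field_value(row, field_or_part):
    """Return a plain field value, or one part of a compound value."""
    if isinstance(field_or_part, FieldPart):
        value = row[field_or_part.field]
        if isinstance(value, (list, tuple)):
            # Array-like field: the part is an index
            return value[field_or_part.part]
        # Date-like field: the part is an attribute such as "year"
        return getattr(value, field_or_part.part)
    return row[field_or_part]


def filter_by_value(rows, field_or_part, value):
    """Keep rows whose field (or field part) equals the given value."""
    return [row for row in rows if field_value(row, field_or_part) == value]


rows = [
    {"event_date": date(2013, 5, 1), "name": "launch"},
    {"event_date": date(2014, 1, 9), "name": "review"},
]
matching = filter_by_value(rows, FieldPart("event_date", "year"), 2013)
```

A real implementation would need backend-specific handling, for example rendering the part as a date extraction expression in SQL.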
Python 2.7 - 64bit - Windows
In [1]: import bubbles
File "C:\Anaconda\lib\site-packages\bubbles\core.py", line 222
def operation(*signature, name=None):
^
SyntaxError: invalid syntax
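The caret points at a keyword-only argument that follows *signature. That syntax exists only in Python 3, so Python 2.7 cannot parse the module. The sketch below contrasts the Python 3 form with a spelling that Python 2 could also parse; the function bodies are placeholders and operation_compat is a hypothetical name, not the bubbles implementation.

```python
def operation(*signature, name=None):
    """Python 3 only: `name` can be passed by keyword only."""
    return signature, name


def operation_compat(*signature, **options):
    """Equivalent spelling that Python 2 can parse as well."""
    name = options.pop("name", None)
    if options:
        unexpected = ", ".join(sorted(options))
        raise TypeError("unexpected keyword arguments: %s" % unexpected)
    return signature, name
```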
[x] Bug (Typo)
sucessfully, however expect to see successfully.
successfuly, however expect to see successfully.
sucessful, however expect to see successful.
sequentialy, however expect to see sequentially.
regardles, however expect to see regardless.
refereced, however expect to see referenced.
properely, however expect to see properly.
prerequisities, however expect to see prerequisites.
preferrence, however expect to see preference.
oustanding, however expect to see outstanding.
objcects, however expect to see objects.
multiople, however expect to see multiple.
heterogenous, however expect to see heterogeneous.
consutrcuted, however expect to see constructed.
arithmentic, however expect to see arithmetic.
agains, however expect to see against.
actualy, however expect to see actually.
acessed, however expect to see accessed.
Semi-automated issue generated by
https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md
To avoid wasting CI processing resources a branch with the fix has been
prepared but a pull request has not yet been created. A pull request fixing
the issue can be prepared from the link below, feel free to create it or
request @timgates42 create the PR. Alternatively if the fix is undesired please
close the issue with a small comment about the reasoning.
https://github.com/timgates42/bubbles/pull/new/bugfix_typos
Thanks.
join_details should accept a call with no keys. Columns with the same name in both objects should then be used as keys.
Variation: only one column with the same name is used; if more than one is found, an exception is raised.
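A minimal sketch of the key inference described above, working on plain lists of field names. The function name infer_join_keys and its single flag are hypothetical; the flag switches to the stricter variation.

```python
def infer_join_keys(master_fields, detail_fields, single=False):
    """Return field names present in both lists, in master order.

    With single=True exactly one common name is required.
    """
    detail = set(detail_fields)
    common = [name for name in master_fields if name in detail]
    if not common:
        raise ValueError("no common field names to join on")
    if single and len(common) > 1:
        raise ValueError("ambiguous join key, candidates: %s" % ", ".join(common))
    return common


keys = infer_join_keys(["id", "year", "amount"], ["year", "id", "label"])
```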
Operations that are composed of other operations have no mechanism to deal with consumable objects the way the Pipeline and ExecutionEngine do. If an object is to be consumed multiple times, the operation just eats it and then provides a faulty result.
Suggestion: move handling of consumables into the context and use context.retain(obj, retain_times=1)
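A minimal sketch of what such a retain call could look like for plain Python iterators. The Context class is a hypothetical stand-in, and reading retain_times as "number of additional uses" is an assumption. itertools.tee buffers items in memory, so a real implementation would probably want a backend-aware strategy.

```python
import itertools


class Context:
    """Hypothetical stand-in for an operation context."""

    def retain(self, obj, retain_times=1):
        """Return retain_times + 1 independent iterators over obj."""
        return itertools.tee(obj, retain_times + 1)


def count_and_total(context, rows):
    """A composite operation that needs to read its input twice."""
    first, second = context.retain(rows, retain_times=1)
    return sum(1 for _ in first), sum(second)


rows = iter([1, 2, 3])  # a consumable: it can be iterated only once
result = count_and_total(Context(), rows)
```

Without the retain call, the second pass would see an exhausted iterator and report a total of 0.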
Operations such as string_to_date should use the same format notation as SQL databases (see PostgreSQL for example).
Reason: it is more human readable than the strptime() format with its % directives.
Note that the SQL format is a bit richer and might have different first element indexes (1 vs 0) in some cases.
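One way to approach this is to translate an SQL-style template into a strptime() format. The sketch below covers only a small subset of PostgreSQL-style patterns, and the helper names are hypothetical. This string_to_date is a standalone function, not the bubbles operation.

```python
import re
from datetime import datetime

# Small, incomplete subset of PostgreSQL-style template patterns.
SQL_TO_STRPTIME = {
    "YYYY": "%Y",
    "HH24": "%H",
    "MM": "%m",
    "DD": "%d",
    "MI": "%M",
    "SS": "%S",
}

# Try longer patterns first so they are matched as a whole.
_TOKEN = re.compile("|".join(sorted(SQL_TO_STRPTIME, key=len, reverse=True)))


def sql_to_strptime(sql_format):
    """Translate an SQL-style template into a strptime() format string."""
    escaped = sql_format.replace("%", "%%")  # keep literal percent signs
    return _TOKEN.sub(lambda match: SQL_TO_STRPTIME[match.group(0)], escaped)


def string_to_date(text, sql_format):
    """Parse text using an SQL-style template."""
    return datetime.strptime(text, sql_to_strptime(sql_format))


parsed = string_to_date("2013-04-05", "YYYY-MM-DD")
```

Patterns outside the table are passed through as literal text, which is one of the places where the richer SQL notation would need more work.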
This:
p.sort([["firstname", "asc"]])
looks unintuitive, despite being correct.
Allow:
p.sort("firstname")
This would be nice to have, but should not be allowed as it is ambiguous:
p.sort(["firstname", "asc"])
Does it mean to sort firstname ascending, or does it mean to sort by the fields firstname and asc in the default (ascending) order?
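A minimal sketch of an argument normalizer that follows the rules above: a single field name is accepted, a list of [field, order] pairs is accepted, and a flat list of strings is rejected as ambiguous. The function name normalize_sort is hypothetical and not part of bubbles.

```python
def normalize_sort(spec):
    """Return a list of (field, order) tuples for a sort specification."""
    if isinstance(spec, str):
        # p.sort("firstname"): one field, default ascending order
        return [(spec, "asc")]
    orders = []
    for item in spec:
        if isinstance(item, str):
            # p.sort(["firstname", "asc"]): field list or field plus order?
            raise ValueError(
                "ambiguous sort specification %r: use a single field name "
                "or a list of [field, order] pairs" % (spec,)
            )
        field, order = item
        if order not in ("asc", "desc"):
            raise ValueError("unknown sort order %r" % (order,))
        orders.append((field, order))
    return orders


simple = normalize_sort("firstname")
explicit = normalize_sort([["firstname", "asc"], ["age", "desc"]])
```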
Current behavior:
Example situation: an operation PARENT is composed of other CHILD operations
RetryOperation
RetryOperation
and retries CHILD
PARENT never gets completed.
Expected behavior:
Have a context stack for operation calls.
RetryOperation
RetryOperation
and retries CHILD
Running bubbles/examples/hello.py fails because
1) p.source(bubbles.data_object("csv_source", URL, infer_fields=True)) is missing a name parameter. It should be
p.source(bubbles.data_object("csv_source", URL, infer_fields=True), "foo-name")
Furthermore, the example doesn't seem to produce any output. What is supposed to happen when p.pretty_print() executes?
OS X Mountain Lion, Python 2.7 with Virtualenvwrapper: Installation of bubbles fails due to syntax errors:
pip install bubbles
Downloading/unpacking bubbles
Downloading bubbles-0.1.tar.gz (40kB): 40kB downloaded
Running setup.py egg_info for package bubbles
Installing collected packages: bubbles
Running setup.py install for bubbles
SyntaxError: ('invalid syntax', ('/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/core.py', 222, 30, 'def operation(*signature, name=None):\n'))
Successfully installed bubbles
Cleaning up...
(cubes)peder@garros:/source/crs-cubes$ pip freeze local/source/crs-cubes$ python
Cython==0.19.1
Flask==0.9
Jinja2==2.6
MarkupSafe==0.18
SQLAlchemy==0.7.9
Werkzeug==0.8.3
argparse==1.2.1
bubbles==0.1
chardet==2.1.1
csvkit==0.6.1
cubes==0.10.2post1
dbf==0.95.004
itsdangerous==0.23
json-table-schema==0.1
line-profiler==1.0b3
lxml==3.2.3
messytables==0.12.0
openpyxl==1.6.2
python-dateutil==1.5
python-magic==0.4.3
six==1.4.1
wsgiref==0.1.2
xlrd==0.9.2
(cubes)peder@garros:
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bubbles
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/__init__.py", line 4, in <module>
from .core import *
File "/Users/peder/Envs/cubes/lib/python2.7/site-packages/bubbles/core.py", line 222
def operation(*signature, name=None):
^
SyntaxError: invalid syntax
@Stiivi I've been trying to run the SQL example, but every time I run the program it shows this message: bubbles.errors.OperationError: Unable to find operation source_object. My bubbles version is 0.1 and my SQLAlchemy version is 1.0.14.
Can you help me or point out the error in the code?
I want to use bubbles to migrate a legacy database (MSSQL) to a new database (MySQL) at my company.
import bubbles
URL = "https://raw.github.com/Stiivi/cubes/master/examples/hello_world/data.csv"
stores = {
    "target": bubbles.open_store("sql", "sqlite:///")
}
p = bubbles.Pipeline(stores=stores)
p.source_object("csv_source", resource=URL, encoding="utf8")
p.retype({"Amount (US$, Millions)": "integer"})
p.create("target", "data")
p.aggregate("Category", "Amount (US$, Millions)")
p.pretty_print()
p.run()
Thanks.
Define a consumable retention policy. Currently the retention is expected to be provided by the object, which in most cases is sub-optimal, for example when it consumes all data into a list of Python objects. This implementation is also not aware of the context in which the node is being executed. Suggestions:
- ExecutionEngine subclasses might insert retention/caching nodes after consumables. Advantage: simple implementation, aware of the broader processing context. Disadvantage: might not be backend aware.
- retain(object, times). Advantage: backend aware. Disadvantage: not aware of the broader processing context.
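For the first suggestion, the engine needs to know which consumables are read by more than one downstream node before it can insert a retention node. The sketch below shows that analysis on a plain dictionary; the graph representation and the function name are hypothetical and do not reflect how the bubbles ExecutionEngine stores its graph.

```python
from collections import Counter


def nodes_needing_retention(graph, consumable):
    """Return consumable nodes that are read by more than one node.

    graph maps each node name to the list of its source node names.
    """
    uses = Counter(source for sources in graph.values() for source in sources)
    return sorted(node for node in consumable if uses[node] > 1)


graph = {
    "csv": [],
    "audit": ["csv"],
    "aggregate": ["csv"],
    "print": ["aggregate"],
}
retained = nodes_needing_retention(graph, consumable={"csv", "aggregate"})
```

Here only "csv" has two consumers, so it is the only place where a retention or caching node would be needed.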