Comments (4)
Please consider adding a section that compares bonobo to PETL.
from bonobo.
Yes, comparisons to other tools are planned.
On the list so far (feel free to add to it):
- airflow
- bubbles
- dataiku
- dataprep
- dask
- hadoop and ecosystem
- luigi
- pandas
- pentaho
- petl
- pygrametl
- pypes
- pytoolz
- talend
- ...
If an expert on any of those tools is available to help me make the most honest comparison possible, it'd be amazing.
ciao, bonobo might be something that i need as a pythonic replacement of xslt, thus i consulted the docs to get a grip of it. i didn't find out whether it fits, but i found some questions that would help me to figure it out. maybe that helps you when you update the docs (which i would strongly suggest as the library looks promising, but it's hard to judge if it'd be suited for a task.)
- what exact facilities are available to control the evaluation logic of a graph?
- can a graph contain another graph?
- how would one access contextual data from a transformation?
- are there parameter injections like pytest's fixtures?
- are there yet any concepts how to process trees, like xml?
- how is a plugin distinguished from a python import in a module that contains transformation callables?
on a sidenote, what the heck is marketing-automation? how would that make the world a better place?
Hi @funkyfuture
It's not easy to understand what you're looking for. You say "pythonic replacement of xslt", and bonobo can transform xml into something else (or into another xml document). That sounds like what you describe, but I'm not certain about your use case or whether bonobo would be worth considering for it.
I'll try to answer your questions here, even though this would probably suit a discussion on slack better than comments in another ticket. I'll consider your questions for a future F.A.Q. section in the doc (along with others, of course).
What exact facilities are available to control the evaluation logic of a graph?
I don't understand this question. Graphs are not "evaluated"; they are a tool to define the flow of data. Nodes in a graph are linked directionally, and when the graph is executed there are FIFO queues between the output of a node and the input of the next (those queues are only created by the executor, so executions are isolated from each other). Feel free to explain what you meant in different words if I did not answer.
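To illustrate the idea of nodes linked by FIFO queues, here is a minimal sketch using only the standard library. It is not bonobo's executor: `run_chain`, the `_END` sentinel, and the one-thread-per-node layout are all assumptions made for the example. The point it shows is that the queues are created inside the run, so two executions never share state.

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking the end of a stream


def run_chain(source, *nodes):
    """Run `source` through `nodes`, one thread per node, FIFO queues between them."""
    # Queues are created per execution, so separate runs stay isolated.
    queues = [queue.Queue() for _ in range(len(nodes) + 1)]

    def feed():
        for item in source:
            queues[0].put(item)
        queues[0].put(_END)

    def work(node, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _END:
                outbox.put(_END)
                return
            # A node may yield zero, one, or many outputs per input.
            for out in node(item):
                outbox.put(out)

    threads = [threading.Thread(target=feed)]
    for i, node in enumerate(nodes):
        threads.append(
            threading.Thread(target=work, args=(node, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    # Drain the last queue until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def double(x):
    yield x * 2


def keep_multiples_of_four(x):
    if x % 4 == 0:
        yield x


results = run_chain(range(5), double, keep_multiples_of_four)
```

Each node only sees what arrives on its input queue, which is why the graph itself is just a description of the flow and nothing is "evaluated" until an executor runs it.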
Can a graph contain another graph?
There are no tools in bonobo today to insert a graph as a subgraph. It would be great to allow it, but there are a few design questions behind this, like which nodes to use as the input and output of the subgraph, etc. Probably something that will come way after 1.0.
How would one access contextual data from a transformation? / are there parameter injections like pytest's fixtures?
You have the question and the answer here. There are parameter injections like pytest fixtures, and they are the way to access contextual data in a transformation. The API may evolve a bit though, because it feels a bit hackish as it is. I mean, it's the right concept, but the exact syntax makes me feel it's not the best experience we can have. To understand how it works today, look at https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/io/csv.py#L63 and the class hierarchy.
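As a rough picture of what fixture-style injection means, here is a standard-library sketch that fills a transformation's extra parameters by name. It does not reproduce bonobo's syntax (the linked file is the reference for that); `call_with_services`, `services`, and `add_currency` are hypothetical names chosen for the example.

```python
import inspect


def call_with_services(func, row, services):
    """Call func(row, ...) and fill the remaining parameters from `services` by name."""
    # Skip the first parameter (the row); the rest are requested by name.
    wanted = list(inspect.signature(func).parameters)[1:]
    injected = {name: services[name] for name in wanted}
    return func(row, **injected)


def add_currency(row, rates):
    # `rates` is contextual data the transformation asks for simply by naming it.
    return {**row, "eur": round(row["usd"] * rates["usd_to_eur"], 2)}


# Hypothetical registry of contextual values; only the requested ones are passed.
services = {"rates": {"usd_to_eur": 0.5}, "unused": object()}
result = call_with_services(add_currency, {"usd": 10}, services)
```

The transformation stays a plain function and never looks anything up itself, which is the same idea as a pytest test receiving a fixture by declaring a parameter with its name.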
Are there yet any concepts how to process trees, like xml?
There was the "xml mapper" in bonobo's ancestor that had a bit of logic to describe how to go from an xml "blob" to lines of data (cf https://github.com/hartym/rdc.etl/blob/dev/rdc/etl/transform/map/xml.py). It's not exactly "tree processing", but as an ETL is a line-by-line processor, you need to be able to transform your tree into something flatter, and there may be a lot of different options to do so. Think depth first, breadth first, skip items or not, preprocess depending on type, etc. It may be better to just write your flattening logic in a function, then process the result with regular tools, as it's not a tree anymore.
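A flattening function of that kind could look like the sketch below. It uses `xml.etree.ElementTree` from the standard library and a depth-first walk that emits one row per leaf element; the `flatten` helper, the path format, and the sample document are assumptions for illustration, not something taken from bonobo or rdc.etl.

```python
import xml.etree.ElementTree as ET


def flatten(element, path=""):
    """Depth-first walk yielding one (path, text) row per leaf element."""
    here = f"{path}/{element.tag}"
    children = list(element)
    if not children:
        yield here, (element.text or "").strip()
        return
    for child in children:
        yield from flatten(child, here)


doc = ET.fromstring(
    "<order><id>7</id><customer><name>Ann</name><city>Oslo</city></customer></order>"
)
rows = list(flatten(doc))
# rows == [("/order/id", "7"),
#          ("/order/customer/name", "Ann"),
#          ("/order/customer/city", "Oslo")]
```

Once the tree is a stream of rows like this, each row can be handed to ordinary line-by-line transformations.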
How is a plugin distinguished from a python import in a module that contains transformation callables?
Transformation callables are just regular callables, and there is nothing that differentiates them from other python callables. You can even use the same callables both in an imperative programming context and in a transformation graph, no problem. Plugins in bonobo are a different concept that allows one to "enhance" executions in a generic way. For example, the console plugin enhances execution with a nice ANSI output that displays statistics while the execution is running (https://github.com/python-bonobo/bonobo/blob/0.2/bonobo/ext/console/plugin.py). I'd say there's no need to think about this for standard ETL cases; it's more a way to extend the framework itself than something for userland.
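The "regular callables" point can be shown without bonobo at all. In this sketch, `normalize_name` is an ordinary function used once directly and once as a step in a stream of rows; the `pipeline` helper is a hypothetical stand-in for a graph, not bonobo's API.

```python
def normalize_name(name):
    """A plain function: no base class, decorator, or registration."""
    return " ".join(part.capitalize() for part in name.split())


# Imperative use: just call it.
single = normalize_name("ada   LOVELACE")


def pipeline(rows, *steps):
    """Hypothetical stand-in for a graph: pass each row through every step in order."""
    for row in rows:
        for step in steps:
            row = step(row)
        yield row


# The same callable reused as a step, next to a builtin method.
many = list(pipeline(["grace hopper", "alan  turing"], normalize_name, str.upper))
```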
On a sidenote, what the heck is marketing-automation? how would that make the world a better place?
It is tagged as such because I have use cases where I use bonobo for marketing automation. It's probably a derivative usage and not the main point, but I guess there is such a use case (think IFTTT or Zapier, but programmatic).
Bonobo never promised to "make the world a better place", but I'd say it's a good thing for you if you're wasting time on repetitive marketing tasks and bonobo helps you automate them. My own sidenote: I don't understand why people tend to think marketing is a bad thing.
I hope this answers your questions; if not, let's have a chat on slack so I can better understand your points.