Comments (4)
I would help with this if there is interest. The purpose of Hadoopy isn't to recreate this functionality; it is to provide a thin core Python interface for streaming. I use Whirr and Oozie for cluster and job management respectively (Hadoopy is designed to be compatible with these tools). I can see more casual users not wanting to use these more powerful but complex tools, opting for a more integrated approach.
There are a few things we need to take into account.
- Practically, I'd need to relicense my code so that it is compatible (David and Andrew are the only other contributors). This shouldn't be a problem and I'd be willing to do that (I'd most likely dual license it).
- Should it be part of dumbo, optional, or a separate fork? I think the cleanest solution is that dumbo can optionally use Hadoopy as a backend if it is available.
- Backwards compatibility is going to be an important focus. I'd want to find a diverse set of Dumbo users to work with us running legacy code. Unit tests can help here.
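A minimal sketch of the "optional backend" idea from the second point, assuming a plain import-time check. The function name `load_backend` and the `"builtin"` fallback label are hypothetical and are not part of dumbo's or Hadoopy's actual API; this only illustrates how an optional dependency could be detected without making it required.

```python
import importlib


def load_backend(preferred="hadoopy"):
    """Return (name, module) for the preferred backend if it is importable.

    Falls back to ("builtin", None) when the module is not installed, so the
    caller can keep using its existing code path. Both the function name and
    the "builtin" label are illustrative assumptions.
    """
    try:
        module = importlib.import_module(preferred)
    except ImportError:
        # Optional dependency is missing: signal the caller to use its own path.
        return "builtin", None
    return preferred, module


if __name__ == "__main__":
    name, _module = load_backend()
    print("using backend:", name)
```

Legacy jobs would keep running on the fallback path when Hadoopy is absent, which is also the case the backwards-compatibility unit tests would need to cover.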
from dumbo.
I'd definitely be interested, and I'd be happy to review code or help figure out how to hook things up. I'm pretty busy these days, so I probably won't be able to help with the actual coding, but it looks like we might already have enough manpower to get something done. So bring on the code -- I look forward to having a look at it and trying it out. :)
Okay, this sounds like something worth pursuing. (At least, I would really like it. I had to switch back to dumbo for some last minute tests in a paper recently because I needed some of the libegg/libjar/etc. features.)
One question: would you need to dual license it if dumbo just used it as a black-box backend? (I am not up to speed on how Python's "import" acts with respect to licenses.) I agree that this is the cleanest approach.
Not sure about the licensing either, but surely we could figure something out...
Related Issues (20)
- Add access to filepath in MultiMapper
- Crash if mapper or reducer does not yield anything
- dumbo cat can be slow in case of many part files
- Implement params access via global variable like os.environ
- JoinReducer/JoinCombiner to allow full outer join
- MultiMapper fails with single-parameter mappers
- MultiMapper does not support cleanup functionality
- tunnel/proxy
- cdh4, centos 6.3, cannot get simple dumbo job to run.
- " -file option is deprecated, please use generic option -files instead."
- Support for SequenceFiles in local runs
- Integration Amazon EMR
- Reading text as typedbytes affects lines with encoding other than utf8
- memlimit enabled by default
- Custom Input File Formats
- Set reducer‘s numbers failed
- The -fake option does not work as described when using Job.run()
- links in README are broken
- installation problem: could not find typedbytes
- website is down