Comments (4)
I can't understand what you mean.
from bigflow.
I mean, in addition to sum() and count(), could bigflow support mean(), variance(), and other popular statistical functions for PCollection?
from bigflow.
Actually, you can implement mean in one line:
    def mean(p):
        # Sugar for p.sum().map(lambda s, c: s / c, p.count())
        return p.sum() / p.count()
Then you can use it in apply_values, e.g.:
    p.group_by_key() \
        .apply_values(mean)
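To make the per-group semantics concrete, here is a rough sketch in plain Python that does not use Bigflow at all. A list of (key, value) pairs stands in for the PCollection, and the helpers `group_by_key` and `apply_values` below are local stand-ins written only for illustration; they are not the Bigflow API and say nothing about how Bigflow executes the pipeline.

```python
from collections import defaultdict


def mean(values):
    """Mean as sum / count, mirroring the p.sum() / p.count() idea."""
    values = list(values)
    return sum(values) / float(len(values))


def group_by_key(pairs):
    """Local stand-in: collect (key, value) pairs into {key: [values]}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def apply_values(groups, fn):
    """Local stand-in: apply fn to the values of each group."""
    return {key: fn(values) for key, values in groups.items()}


pairs = [("a", 1), ("a", 3), ("b", 10)]
per_key_mean = apply_values(group_by_key(pairs), mean)
print(per_key_mean)  # {'a': 2.0, 'b': 10.0}
```

The point is only that the function passed to apply_values sees one group at a time, so the same `mean` works per key and on a whole collection.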
Likewise, if you want to apply it to a global PCollection, you can just use apply:
    p.apply(mean)
or just call it directly:
    mean(p)
Because these functions are easy to implement, we don't provide them as built-in methods.
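For the variance() asked about earlier, the same pattern of combining a few aggregates should work. The sketch below is plain Python, not Bigflow: it computes the population variance as E[x^2] - (E[x])^2 from three aggregates (sum, sum of squares, count). The function names are hypothetical. In Bigflow the three inputs would presumably come from something like p.sum(), p.map(lambda x: x * x).sum(), and p.count(), but that mapping is an assumption and is not tested here.

```python
def variance_from_aggregates(total, total_sq, count):
    """Population variance from sum, sum of squares, and count."""
    mean = total / float(count)
    return total_sq / float(count) - mean * mean


def variance(values):
    """Compute the three aggregates over a plain list, then combine them."""
    values = list(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return variance_from_aggregates(total, total_sq, len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 4.0
```

Note that this one-pass formula can lose precision when the values are large relative to their spread; it is shown here because it maps directly onto simple aggregates.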
If you find it difficult to write these functions, you can always use transforms.make_tuple(pobject1, pobject2). E.g., you can use transforms.make_tuple to implement mean like this:
    from bigflow import transforms

    def mean(p):
        # Python 2 tuple-parameter lambda: unpacks the (sum, count) pair
        return transforms.make_tuple(p.sum(), p.count()).map(lambda (s, c): s / c)
You can also implement a function that returns both the sum and the mean, and use it in apply_values like this:
    def sum_and_mean(p):
        return transforms.make_tuple(p.sum(), p.apply(mean))

    p.group_by_key().apply_values(sum_and_mean)
from bigflow.
I think there should be a module that provides such useful functions.
from bigflow.