Comments (2)
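This is actually easy to solve with either SQL or DataFrame. The approach is to use a window function; I'll use DataFrame as the example.
First, you must have a column that holds the timestamp. Let's assume it is called dt.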
In [18]: df
   a  b                   dt
0  a  1  2017-12-20 10:20:30
1  a  1  2017-12-20 10:21:11
2  a  1  2017-12-20 11:01:22
3  b  2  2017-12-11 02:06:11
4  b  2  2017-12-10 03:11:55
In [19]: df.dtypes
Out[19]:
odps.Schema {
a string
b int64
dt datetime
}
In [28]: df2 = df[df, df.groupby('a', 'b').sort('dt', ascending=False).rank().rename('rank')]
In [30]: df2
   a  b                   dt  rank
0  a  1  2017-12-20 11:01:22     1
1  a  1  2017-12-20 10:21:11     2
2  a  1  2017-12-20 10:20:30     3
3  b  2  2017-12-11 02:06:11     1
4  b  2  2017-12-10 03:11:55     2
In [31]: df2[df2.rank == 1][df]
   a  b                   dt
0  a  1  2017-12-20 11:01:22
1  b  2  2017-12-11 02:06:11
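MaxCompute does not allow window functions inside a filter, so we first create df2, which is df with the rank column appended. We then filter on that column and finally select only the fields of df, which drops the rank column again.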
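Since the answer mentions that SQL works as well, here is a minimal sketch of the same "rank in a subquery, filter outside" pattern. It runs against an in-memory SQLite database purely as a stand-in so that it is self-contained; the table name t and the alias rnk are made up for illustration, and the statement has not been checked against MaxCompute SQL itself.

```python
import sqlite3

# Same sample rows as the DataFrame above; dt is stored as ISO text,
# which sorts correctly as a string.
rows = [
    ("a", 1, "2017-12-20 10:20:30"),
    ("a", 1, "2017-12-20 10:21:11"),
    ("a", 1, "2017-12-20 11:01:22"),
    ("b", 2, "2017-12-11 02:06:11"),
    ("b", 2, "2017-12-10 03:11:55"),
]

conn = sqlite3.connect(":memory:")  # stand-in engine, not MaxCompute
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# The window function is computed in the inner query; the outer query
# filters on its result and leaves the helper column out of the output.
query = """
SELECT a, b, dt
FROM (
    SELECT a, b, dt,
           RANK() OVER (PARTITION BY a, b ORDER BY dt DESC) AS rnk
    FROM t
) AS ranked
WHERE rnk = 1
ORDER BY a, b
"""

latest = conn.execute(query).fetchall()
print(latest)
```

If two rows in a group share the same latest dt, RANK() keeps both; ROW_NUMBER() could be swapped in when exactly one row per group is wanted.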
Update:
PyODPS window function documentation: http://pyodps.readthedocs.io/zh_CN/latest/df-window-zh.html
from aliyun-odps-python-sdk.
Wow, thanks!