aliyun / aliyun-odps-python-sdk
ODPS Python SDK and data analysis framework
Home Page: http://pyodps.readthedocs.io
License: Apache License 2.0
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-KyLmcD/pyodps/
This looks like an environment issue: the same code runs fine on another machine, but on this machine the problem occurs under both Python 3.5 and 2.7.13.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/utils.py", line 792, in check_header_validity
if not pat.match(value):
TypeError: expected string or bytes-like object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./test_aliyun_mc.py", line 55, in <module>
UploadData()
File "./test_aliyun_mc.py", line 18, in UploadData
upload_session = tunnel.create_upload_session(table.name) #, partition_spec='pt=test')
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/tabletunnel.py", line 92, in create_upload_session
compress_option=compress_option)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/uploadsession.py", line 60, in __init__
self._init()
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/uploadsession.py", line 74, in _init
resp = self._client.post(url, {}, params=params, headers=headers)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/rest.py", line 115, in post
return self.request(url, 'post', data=data, **kwargs)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/rest.py", line 94, in request
prepared_req = req.prepare()
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 257, in prepare
hooks=self.hooks,
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 303, in prepare
self.prepare_headers(headers)
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 427, in prepare_headers
check_header_validity(header)
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/utils.py", line 796, in check_header_validity
"not %s" % (value, type(value)))
requests.exceptions.InvalidHeader: Header value 0 must be of type str or bytes, not <class 'int'>
The options object has no notebook_repr_widget parameter either.
ODPSError: InstanceId: 20171221102131257g9c5vlu
ODPS-0130071:[0,0] Semantic analysis exception - INT type is not enabled in current mode
SQL results are capped at 10000 rows. Is there a good way to fetch the complete result data?
As titled.
Thanks.
The same program runs fine on Linux, but on Win10 it frequently raises an exception here:
record = table.new_record([...])
writer.write(record)
Traceback (most recent call last):
File ".\test_aliyun_mc.py", line 71, in <module>
UploadData()
File ".\test_aliyun_mc.py", line 57, in UploadData
skipped += 1
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 234, in __exit__
self.close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 276, in close
super(RecordWriter, self).close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 227, in close
super(BaseRecordWriter, self).close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 67, in close
self.flush_all()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 70, in flush_all
self.flush()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 62, in flush
self.output.write(data)
File "F:\Python27\lib\site-packages\odps\tunnel\io\stream.py", line 82, in write
raise_exc(ex_type, ex_value, tb)
File "F:\Python27\lib\site-packages\odps\lib\lib_utils.py", line 98, in raise_exc
six.exec_('raise ex_type, ex, tb', glb, locals())
File "F:\Python27\lib\site-packages\odps\lib\six.py", line 719, in exec_
exec("""exec code in globs, locs""")
File "<string>", line 1, in <module>
File "F:\Python27\lib\site-packages\odps\tunnel\io\stream.py", line 51, in async_func
self._resp = post_call(self.data_generator())
File "F:\Python27\lib\site-packages\odps\tunnel\tabletunnel.py", line 258, in upload
self._client.put(url, data=data, params=params, headers=headers)
File "F:\Python27\lib\site-packages\odps\rest.py", line 128, in put
return self.request(url, 'put', data=data, **kwargs)
File "F:\Python27\lib\site-packages\odps\rest.py", line 107, in request
proxies=self._proxy)
File "F:\Python27\lib\site-packages\requests\sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "F:\Python27\lib\site-packages\requests\adapters.py", line 488, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: [Errno 10054]
Following the tutorial, %enter fails with "This room(default) is not configured". What is the problem?
The examples on the wiki all filter on numbers; how should strings be handled?
I stored dates as strings too, e.g. '20161123'. How do I select the records for that day?
Update:
Switching to the filter method raises "keyword can't be an expression":
org.filter(org.product='vt_finance', org.day='20161122', hour='23').head(5)
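The message comes from Python itself, not from PyODPS: `org.product='vt_finance'` inside a call is keyword-argument syntax, and a keyword name cannot be a dotted expression. A comparison needs `==`, i.e. something like `org.filter(org.product == 'vt_finance', org.day == '20161122')` (assuming `org` has those columns). The syntax error can be reproduced without any ODPS connection:

```python
# Compiling the original call shows the interpreter rejecting it outright;
# no ODPS objects are needed to trigger the error.
try:
    compile("org.filter(org.product='vt_finance')", "<demo>", "eval")
    raised = False
except SyntaxError:
    # Older Pythons say "keyword can't be an expression";
    # 3.8+ suggests using "==" instead.
    raised = True
```

Plain keyword arguments such as `hour='23'` are syntactically legal, which is why only the dotted left-hand sides fail.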
Hi, suppose I get a reader from a table and a partition (one partition per day). How do I convert that reader into a DataFrame? Thanks.
reader = table.open_reader(partition='20180110')
pyinstaller -F xxx.py succeeds,
but running the resulting exe fails with:
No such file or directory: 'C:\Users\ll\AppData\Local\Temp\_MEI127122\odps\lib\cloudpickle.py'
/Users/tux/.pyenv/versions/2.7.10/bin/python2.7 /Users/tux/projects/python/other/Aliyun/AliyunODPS/test.py
Traceback (most recent call last):
File "/Users/tux/projects/python/other/Aliyun/AliyunODPS/test.py", line 22, in <module>
record = tb.new_record([1, 'daixijun', 'hashpassword', 20, True, datetime.now(), datetime.now()])
File "/Users/tux/github/aliyun-odps-python-sdk/odps/models/table.py", line 479, in new_record
return types.Record(schema=self.schema, values=values)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 358, in __init__
self._sets(values)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 380, in _sets
[self._set(i, value) for i, value in enumerate(values)]
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 369, in _set
val = validate_value(value, data_type)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 868, in validate_value
data_type.validate_value(res)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 622, in validate_value
timestamp = self.to_timestamp(val)
AttributeError: 'Datetime' object has no attribute 'to_timestamp'
Process finished with exit code 1
The MaxCompute documentation only covers Java UDFs. I can see in the pyodps SDK that custom functions can be written, but there is no documentation of the concrete interfaces to implement.
I now want to count the rows of every table, but neither the table object nor the partition object exposes such a property. Besides SQL's count(*), is there another way?
Can pyodps upload text data to ODPS?
data = data_1.join(data_2, on='id', how='left').to_pandas()
This step needs to write a temporary file, but I have no write permission on the default location. How can I change where it is written?
OSError: [Errno 13] Permission denied: '/home/xxx/.pyodps/tempobjs/default/0077c4bac21ef5d6fd61e15d41dbcfek'
from odps.tunnel import TableTunnel

table = o.get_table('my_table')
tunnel = TableTunnel(o)
upload_session = tunnel.create_upload_session(table.name, partition_spec='pt=test')
with upload_session.open_record_writer(0) as writer:
    record = table.new_record()
    record[0] = 'test1'
    record[1] = 'id1'
    writer.write(record)
    record = table.new_record(['test2', 'id2'])
    writer.write(record)
upload_session.commit([0])
Only one block_id is used here and it feels too slow. How can this operation be sped up? There are a great many records, so it takes a long time.
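One common way to speed this up is to use several block ids within the same upload session: write disjoint chunks with open_record_writer(block_id), optionally from separate threads, then commit all the ids at once. A sketch of the chunking side (the helper name and the block count of 4 are made up; the commented lines show how it might plug into the session above):

```python
def split_blocks(records, n_blocks):
    """Distribute records round-robin into n_blocks chunks, one per block id."""
    blocks = [[] for _ in range(n_blocks)]
    for i, rec in enumerate(records):
        blocks[i % n_blocks].append(rec)
    return blocks

# usage sketch against a tunnel upload session (not executed here):
# chunks = split_blocks(all_records, 4)
# for block_id, chunk in enumerate(chunks):  # each block may run in its own thread
#     with upload_session.open_record_writer(block_id) as writer:
#         for rec in chunk:
#             writer.write(rec)
# upload_session.commit(list(range(len(chunks))))
```

Because each block id has an independent writer, the writes can proceed in parallel; commit is only called once, with the full list of block ids.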
When an odps.df.DataFrame is created from a pandas.DataFrame,
it cannot be written to a table via persist.
It looks like no matching engine is found (no entrance). Is there a way to specify the project?
As titled.
python setup.py install
File "setup.py", line 118
with open('requirements.txt') as f:
^
SyntaxError: invalid syntax
Script output:
http://imgur.com/a/U96NG
Command-line output:
http://imgur.com/a/U6LLW
The script output is clearly not what is required, but the business logic has to run in a script. How can the script produce the same result as the command line? (The code is identical.)
Is there a way to build a DataFrame directly from the result of an ODPS SQL query?
My ODPS tables hold a very large amount of data, tens of millions of rows per day. Can DataFrame be used in this situation? I tried to preview with head(5) and no data came back for a long time. 😅
in python3.5 virtualenv
160 else:
161 if in_ipython_frontend():
--> 162 class InstancesProgress(widgets.DOMWidget):
163 _view_name = build_trait(Unicode, 'InstancesProgress', sync=True)
164 _view_module = build_trait(Unicode, 'pyodps/progress', sync=True)
AttributeError: 'NoneType' object has no attribute 'DOMWidget'
During initialization (sample code below), what does endpoint mean? If possible, could you give a detailed explanation? Thanks!
from odps import ODPS
o = ODPS('**your-access-id**', '**your-secret-access-key**', '**your-default-project**',
endpoint='**your-end-point**')
I have an ODPS string column that stores BINARY data (synced from MySQL).
Reading it with open_record_reader raises an error:
with download_session.open_record_reader(0, download_session.count) as reader:
    for record in reader:
Traceback:
Traceback (most recent call last):
...........
for record in reader:
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/tunnel/tabletunnel/reader.py", line 168, in __next__
record = self.read()
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/tunnel/tabletunnel/reader.py", line 142, in read
val = utils.to_text(self._reader.read_string())
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/utils.py", line 266, in to_text
return binary.decode(encoding)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 3: invalid start byte
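The column holds raw binary (synced from MySQL), so the reader's UTF-8 decode fails on byte 0x89. Until the value can be fetched as bytes, one stopgap is a lossy or pass-through decode. A minimal stdlib illustration (the sample bytes are made up):

```python
raw = b'\x89PNG\r\n'  # 0x89 is not a valid UTF-8 start byte

# strict decoding reproduces the reader's failure
try:
    raw.decode('utf-8')
    ok = True
except UnicodeDecodeError:
    ok = False

# a lossy decode keeps going, substituting the undecodable byte with U+FFFD
text = raw.decode('utf-8', errors='replace')
```

Whether pyodps can hand the column back as bytes depends on the SDK version; if truly binary data must pass through a str, decoding as latin-1 (`raw.decode('latin-1')`) round-trips every byte losslessly.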
Inserting data in one batch raises an error:
odps/models/table.py, line 476, in write
IndexError: list index out of range
I finished a computation with DataFrame, but how do I write the result into a specific partition of a table?
I want to read a large amount of data from the database into a local CSV.
The odps.df interface seems to read all the data at once, which exhausts memory.
At the moment I can only read via open_reader().
Is there a better way?
pyodps is faster than performing the same operation in the data-ide web UI,
e.g. tb.head(10) versus select * from tb limit 10;
Apart from the queueing time on the web side, is there any difference between the two in cluster scheduling or compute resources?
Can a DataFrame be used to execute SQL and have the result returned as a df?
For example: DataFrame(o.execute_sql('sql query')).
If so, is there sample code? Or is there another way to do it?
If I use the DataFrame interface exclusively, how do I fetch the complete data through it?
Also,
open_reader can achieve this with limit_enabled=False, but that only applies to a single call and has to be configured every time.
When entering a project space via the "enter room" approach, this cannot be set up front in the initial configuration: on entering the room, the source code does not read the options.tunnel.limited_instance_tunnel setting. Could an extra configuration option be added to allow this?
Using the replace function
raises "tuple index out of range".
The DataFrame is named list and the column is p; some rows are empty, some hold numbers, and some contain +86.
The code is list.p.replace('+86', '')
I tried data from another source and hit the same problem.
I also downloaded the iris data, uploaded it to the public server, used replace, and still got the same error.
Is the function itself broken? Or could the docs show an example of how to use replace?
From UDAFRunner it looks as if a UDAF splits all the data evenly into two halves, placed into buffer0 and buffer1, and the merge function combines the two buffers. In practice, however, all the data may end up in a single buffer, which corrupts my result.
I am writing a function that concatenates strings: given n strings, it outputs the n strings joined end to end.
The code is below.
@annotate('string->string')
class array2string(BaseUDAF):
    def new_buffer(self):
        return list()

    def iterate(self, buffer, unit):
        buffer.append(unit)

    def merge(self, buffer, pbuffer):
        if len(buffer) == 0:
            buffer.append(pbuffer[0])
            for i in range(len(pbuffer)):
                buffer.append(pbuffer[i])
                buffer.append('')
        else:
            for i in range(len(pbuffer)):
                if 2 * i + 1 < len(buffer):
                    buffer[2 * i + 1] = pbuffer[i]

    def terminate(self, buffer):
        return ';'.join(buffer)
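For plain concatenation, merge must be correct for any split of the data across buffers, including the case where everything lands in one buffer; appending the partial buffer onto the current one handles all of these, with no positional bookkeeping. A minimal standalone sketch (the @annotate decorator and the odps.udf BaseUDAF base are omitted so the logic can run anywhere; the class name ConcatStrings is made up):

```python
class ConcatStrings(object):
    """Concatenate all input strings, in arrival order, separated by ';'."""

    def new_buffer(self):
        return []

    def iterate(self, buffer, unit):
        # called once per input row
        if unit is not None:
            buffer.append(unit)

    def merge(self, buffer, pbuffer):
        # fold a partial buffer into the current one; extend preserves order
        # and is trivially correct when either buffer is empty
        buffer.extend(pbuffer)

    def terminate(self, buffer):
        return ';'.join(buffer)
```

In a real UDAF the class would extend odps.udf.BaseUDAF and carry @annotate('string->string'); the runtime may call merge any number of times, so it should never assume both buffers are non-empty.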
with t.open_reader(partition='pt=test') as reader:
How should the partition parameter be set so that multiple partitions of the same partition column can be read (an OR relation)?
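One partition spec per open_reader call is the usual pattern, so an OR over several partitions can be emulated by opening one reader per partition and chaining the iterators. A minimal sketch with itertools.chain (the helper and the in-memory stand-in readers are illustrative; with pyodps, open_reader would be something like lambda p: t.open_reader(partition='pt=%s' % p)):

```python
from itertools import chain

def read_partitions(open_reader, partitions):
    """Yield records from each partition's reader in turn.

    open_reader is expected to return an iterable of records for a given
    partition value, e.g. lambda p: t.open_reader(partition='pt=%s' % p).
    """
    return chain.from_iterable(open_reader(p) for p in partitions)

# in-memory stand-in for per-partition readers, for demonstration only
fake = {'20161122': [1, 2], '20161123': [3]}
records = list(read_partitions(lambda p: fake[p], ['20161122', '20161123']))
```

This streams partitions one after another, so memory use stays bounded by a single reader at a time.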
Running a script like this:
%%sql
select a.*
from tablea a left join tableb b
on a.id=b.id
where b.id is null;
It then fails with ODPS-0130161:Parse exception - line 2:35 cannot recognize input near 'left' 'join' 'tableb' in join type specifier.
Yet running the same SQL directly in odps works fine.
Simply put: how do I insert a new column into a DataFrame?
The value of the 出生日期 (birth date) field is exported as None. # Environment: Python 2.7.13
if __name__ == "__main__":
    xm_odps_connector = OdpsConnector().get_connector()
    chk_sql = "select to_date('1969-12-31 23:59:59', 'yyyy-mm-dd hh:mi:ss') as today from dual;"
    with xm_odps_connector.execute_sql(chk_sql).open_reader() as reader:
        for record in reader:
            dt = Record.get_by_name(record, 'today')
            if dt:
                print(dt.strftime('%Y-%m-%d %H:%M:%S'))
            else:
                print('9999-12-31 00:00:00')
Traceback (most recent call last):
File "odps\tunnel\io\reader_c.pyx", line 159, in odps.tunnel.io.reader_c.BaseTunnelRecordReader._read_datetime
File "C:\Anaconda3\envs\python27\lib\site-packages\odps\utils.py", line 349, in to_datetime
return _fromtimestamp(seconds).replace(microsecond=microseconds)
ValueError: timestamp out of range for platform localtime()/gmtime() function
Exception ValueError: 'timestamp out of range for platform localtime()/gmtime() function' in 'odps.tunnel.io.reader_c.BaseTunnelRecordReader._set_datetime' ignored
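The failure is consistent with '1969-12-31 23:59:59' mapping to a negative Unix timestamp (-1 second), which datetime.fromtimestamp cannot handle on platforms, notably Windows, whose localtime() rejects pre-epoch values. A workaround sketch that avoids platform localtime()/gmtime() entirely (the helper name is made up; note it yields naive UTC and ignores time zones):

```python
from datetime import datetime, timedelta

_EPOCH = datetime(1970, 1, 1)

def from_timestamp_safe(seconds):
    """Convert a (possibly negative) Unix timestamp to a naive UTC datetime
    using pure date arithmetic instead of platform localtime()/gmtime()."""
    return _EPOCH + timedelta(seconds=seconds)
```

timedelta arithmetic has no pre-1970 restriction, so values like -1 second resolve cleanly to 1969-12-31 23:59:59.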
PyCharm complains as follows when I try running unittest.
It seems to be caused by this: http://stackoverflow.com/questions/29501029/managed-to-break-my-venv-is-it-possible-to-fix
Traceback (most recent call last):
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/site.py", line 62, in <module>
import os
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/os.py", line 49, in <module>
import posixpath as path
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/posixpath.py", line 17, in <module>
import warnings
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/warnings.py", line 8, in <module>
import types
File "/home/lyman/workspace/ali/odps/pyodps/odps/types.py", line 20, in <module>
import re
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/re.py", line 282, in <module>
import copy_reg
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/copy_reg.py", line 7, in <module>
from types import ClassType as _ClassType
ImportError: cannot import name ClassType
Process finished with exit code 1
I want to do a regex replacement with str.replace, but the compiled SQL contains pyodps_udf_xxxx. Tracing the source, strings.Replace is indeed not implemented in the compiler.
strings.Contains does implement both the builtin-function and regex paths, so I tried to implement strings.Replace the same way, but after adding the elif branch a breakpoint shows the code is never executed (visit_string_op is never entered). I followed the code but still cannot see what is wrong; some guidance would be much appreciated, thanks!
1. Many methods in the odps.ml.statistics documentation lack concrete examples and do not state their return values, which makes them unfriendly to use. Please consider expanding the documentation so the calls are clearer.
2. Also, is odps.ml directly the same computation framework as PAI?
If a UDAF outputs multiple columns, how should the input parameters be written?
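For multiple input columns, a MaxCompute Python UDAF's iterate takes one positional parameter per column, in the order given in the annotation, e.g. @annotate('string,bigint->string'). A standalone sketch of the shape (the BaseUDAF base and decorator are omitted so the logic can run anywhere; the class and its behavior are illustrative):

```python
class WeightedConcat(object):
    """Concatenate strings, repeating each one `count` times.

    Would be annotated @annotate('string,bigint->string') in a real UDAF:
    one iterate parameter per input column, in annotation order.
    """

    def new_buffer(self):
        return []

    def iterate(self, buffer, text, count):
        # `text` and `count` correspond to the two input columns of a row
        if text is not None:
            buffer.extend([text] * int(count))

    def merge(self, buffer, pbuffer):
        buffer.extend(pbuffer)

    def terminate(self, buffer):
        return ';'.join(buffer)
```

Only the inputs are multi-column here; terminate still returns a single value, which is the usual UDAF contract.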