aliyun / aliyun-odps-python-sdk
ODPS Python SDK and data analysis framework
Home Page: http://pyodps.readthedocs.io
License: Apache License 2.0
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-KyLmcD/pyodps/
This looks like an environment issue: the same code runs fine on another machine, but on this machine the problem occurs under both Python 3.5 and 2.7.13.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/utils.py", line 792, in check_header_validity
if not pat.match(value):
TypeError: expected string or bytes-like object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./test_aliyun_mc.py", line 55, in <module>
UploadData()
File "./test_aliyun_mc.py", line 18, in UploadData
upload_session = tunnel.create_upload_session(table.name) #, partition_spec='pt=test')
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/tabletunnel.py", line 92, in create_upload_session
compress_option=compress_option)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/uploadsession.py", line 60, in __init__
self._init()
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/tunnel/tabletunnel/uploadsession.py", line 74, in _init
resp = self._client.post(url, {}, params=params, headers=headers)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/rest.py", line 115, in post
return self.request(url, 'post', data=data, **kwargs)
File "/usr/local/lib/python3.5/site-packages/pyodps-0.5.6-py3.5.egg/odps/rest.py", line 94, in request
prepared_req = req.prepare()
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 257, in prepare
hooks=self.hooks,
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 303, in prepare
self.prepare_headers(headers)
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/models.py", line 427, in prepare_headers
check_header_validity(header)
File "/usr/local/lib/python3.5/site-packages/requests-2.12.4-py3.5.egg/requests/utils.py", line 796, in check_header_validity
"not %s" % (value, type(value)))
requests.exceptions.InvalidHeader: Header value 0 must be of type str or bytes, not <class 'int'>
The options object has no notebook_repr_widget parameter either.
ODPSError: InstanceId: 20171221102131257g9c5vlu
ODPS-0130071:[0,0] Semantic analysis exception - INT type is not enabled in current mode
SQL results are capped at 10000 rows. Is there a good way to fetch the complete result data?
As titled.
Thanks.
The same program runs fine on Linux, but on Win10 it frequently raises an exception here:
record = table.new_record([...])
writer.write(record)
Traceback (most recent call last):
File ".\test_aliyun_mc.py", line 71, in <module>
UploadData()
File ".\test_aliyun_mc.py", line 57, in UploadData
skipped += 1
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 234, in __exit__
self.close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 276, in close
super(RecordWriter, self).close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 227, in close
super(BaseRecordWriter, self).close()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 67, in close
self.flush_all()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 70, in flush_all
self.flush()
File "F:\Python27\lib\site-packages\odps\tunnel\io\writer.py", line 62, in flush
self.output.write(data)
File "F:\Python27\lib\site-packages\odps\tunnel\io\stream.py", line 82, in write
raise_exc(ex_type, ex_value, tb)
File "F:\Python27\lib\site-packages\odps\lib\lib_utils.py", line 98, in raise_exc
six.exec_('raise ex_type, ex, tb', glb, locals())
File "F:\Python27\lib\site-packages\odps\lib\six.py", line 719, in exec_
exec("""exec code in globs, locs""")
File "<string>", line 1, in <module>
File "F:\Python27\lib\site-packages\odps\tunnel\io\stream.py", line 51, in async_func
self._resp = post_call(self.data_generator())
File "F:\Python27\lib\site-packages\odps\tunnel\tabletunnel.py", line 258, in upload
self._client.put(url, data=data, params=params, headers=headers)
File "F:\Python27\lib\site-packages\odps\rest.py", line 128, in put
return self.request(url, 'put', data=data, **kwargs)
File "F:\Python27\lib\site-packages\odps\rest.py", line 107, in request
proxies=self._proxy)
File "F:\Python27\lib\site-packages\requests\sessions.py", line 639, in send
r = adapter.send(request, **kwargs)
File "F:\Python27\lib\site-packages\requests\adapters.py", line 488, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: [Errno 10054]
Following the tutorial, %enter fails with "This room(default) is not configured". What is the problem?
The examples on the wiki all filter on numbers; how should strings be handled?
I stored dates as strings too, e.g. '20161123'. How do I select the records for that day?
Update:
Switching to the filter method raises "keyword can't be an expression":
org.filter(org.product='vt_finance', org.day='20161122', hour='23').head(5)
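The message comes from Python itself, not from PyODPS: `org.product='vt_finance'` inside a call is keyword-argument syntax, and a keyword name cannot be a dotted expression. A comparison needs `==`, i.e. something like `org.filter(org.product == 'vt_finance', org.day == '20161122')` (assuming `org` has those columns). The syntax error can be reproduced without any ODPS connection:

```python
# Compiling the original call shows the interpreter rejecting it outright;
# no ODPS objects are needed to trigger the error.
try:
    compile("org.filter(org.product='vt_finance')", "<demo>", "eval")
    raised = False
except SyntaxError:
    # Older Pythons say "keyword can't be an expression";
    # 3.8+ suggests using "==" instead.
    raised = True
```

Plain keyword arguments such as `hour='23'` are syntactically legal, which is why only the dotted left-hand sides fail.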
Hi, suppose I get a reader from a table and a partition (one partition per day). How do I convert that reader into a DataFrame? Thanks.
reader = table.open_reader(partition='20180110')
pyinstaller -F xxx.py succeeds,
but running the resulting exe fails with:
No such file or directory: 'C:\Users\ll\AppData\Local\Temp\_MEI127122\odps\lib\cloudpickle.py'
/Users/tux/.pyenv/versions/2.7.10/bin/python2.7 /Users/tux/projects/python/other/Aliyun/AliyunODPS/test.py
Traceback (most recent call last):
File "/Users/tux/projects/python/other/Aliyun/AliyunODPS/test.py", line 22, in <module>
record = tb.new_record([1, 'daixijun', 'hashpassword', 20, True, datetime.now(), datetime.now()])
File "/Users/tux/github/aliyun-odps-python-sdk/odps/models/table.py", line 479, in new_record
return types.Record(schema=self.schema, values=values)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 358, in __init__
self._sets(values)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 380, in _sets
[self._set(i, value) for i, value in enumerate(values)]
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 369, in _set
val = validate_value(value, data_type)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 868, in validate_value
data_type.validate_value(res)
File "/Users/tux/github/aliyun-odps-python-sdk/odps/types.py", line 622, in validate_value
timestamp = self.to_timestamp(val)
AttributeError: 'Datetime' object has no attribute 'to_timestamp'
Process finished with exit code 1
The MaxCompute documentation only covers Java UDFs. I can see in the pyodps SDK that custom functions can be written, but there is no documentation of the concrete interfaces to implement.
I now want to count the rows of every table, but neither the table object nor the partition object exposes such a property. Besides SQL's count(*), is there another way?
Can pyodps upload text data to ODPS?
data = data_1.join(data_2, on='id', how='left').to_pandas()
This step needs to write a temporary file, but I have no write permission on the default location. How can I change where it is written?
OSError: [Errno 13] Permission denied: '/home/xxx/.pyodps/tempobjs/default/0077c4bac21ef5d6fd61e15d41dbcfek'
from odps.tunnel import TableTunnel

table = o.get_table('my_table')
tunnel = TableTunnel(o)
upload_session = tunnel.create_upload_session(table.name, partition_spec='pt=test')
with upload_session.open_record_writer(0) as writer:
    record = table.new_record()
    record[0] = 'test1'
    record[1] = 'id1'
    writer.write(record)
    record = table.new_record(['test2', 'id2'])
    writer.write(record)
upload_session.commit([0])
Only one block_id is used here and it feels too slow. How can this operation be sped up? There are a great many records, so it takes a long time.
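One common way to speed this up is to use several block ids within the same upload session: write disjoint chunks with open_record_writer(block_id), optionally from separate threads, then commit all the ids at once. A sketch of the chunking side (the helper name and the block count of 4 are made up; the commented lines show how it might plug into the session above):

```python
def split_blocks(records, n_blocks):
    """Distribute records round-robin into n_blocks chunks, one per block id."""
    blocks = [[] for _ in range(n_blocks)]
    for i, rec in enumerate(records):
        blocks[i % n_blocks].append(rec)
    return blocks

# usage sketch against a tunnel upload session (not executed here):
# chunks = split_blocks(all_records, 4)
# for block_id, chunk in enumerate(chunks):  # each block may run in its own thread
#     with upload_session.open_record_writer(block_id) as writer:
#         for rec in chunk:
#             writer.write(rec)
# upload_session.commit(list(range(len(chunks))))
```

Because each block id has an independent writer, the writes can proceed in parallel; commit is only called once, with the full list of block ids.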
When an odps.df.DataFrame is created from a pandas.DataFrame,
it cannot be written to a table via persist.
It looks like no matching engine is found (no entrance). Is there a way to specify the project?
As titled.
python setup.py install
File "setup.py", line 118
with open('requirements.txt') as f:
^
SyntaxError: invalid syntax
Script output:
http://imgur.com/a/U96NG
Command-line output:
http://imgur.com/a/U6LLW
The script output is clearly not what is required, but the business logic has to run in a script. How can the script produce the same result as the command line? (The code is identical.)
Is there a way to build a DataFrame directly from the result of an ODPS SQL query?
My ODPS tables hold a very large amount of data, tens of millions of rows per day. Can DataFrame be used in this situation? I tried to preview with head(5) and no data came back for a long time. 😅
in python3.5 virtualenv
160 else:
161 if in_ipython_frontend():
--> 162 class InstancesProgress(widgets.DOMWidget):
163 _view_name = build_trait(Unicode, 'InstancesProgress', sync=True)
164 _view_module = build_trait(Unicode, 'pyodps/progress', sync=True)
AttributeError: 'NoneType' object has no attribute 'DOMWidget'
During initialization (sample code below), what does endpoint mean? If possible, could you give a detailed explanation? Thanks!
from odps import ODPS
o = ODPS('**your-access-id**', '**your-secret-access-key**', '**your-default-project**',
endpoint='**your-end-point**')
I have an ODPS string column that stores BINARY data (synced from MySQL).
Reading it with open_record_reader raises an error:
with download_session.open_record_reader(0, download_session.count) as reader:
    for record in reader:
Traceback:
Traceback (most recent call last):
...........
for record in reader:
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/tunnel/tabletunnel/reader.py", line 168, in __next__
record = self.read()
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/tunnel/tabletunnel/reader.py", line 142, in read
val = utils.to_text(self._reader.read_string())
File "/Users/silenceper/Library/Python/2.7/lib/python/site-packages/odps/utils.py", line 266, in to_text
return binary.decode(encoding)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 3: invalid start byte
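The column holds raw binary (synced from MySQL), so the reader's UTF-8 decode fails on byte 0x89. Until the value can be fetched as bytes, one stopgap is a lossy or pass-through decode. A minimal stdlib illustration (the sample bytes are made up):

```python
raw = b'\x89PNG\r\n'  # 0x89 is not a valid UTF-8 start byte

# strict decoding reproduces the reader's failure
try:
    raw.decode('utf-8')
    ok = True
except UnicodeDecodeError:
    ok = False

# a lossy decode keeps going, substituting the undecodable byte with U+FFFD
text = raw.decode('utf-8', errors='replace')
```

Whether pyodps can hand the column back as bytes depends on the SDK version; if truly binary data must pass through a str, decoding as latin-1 (`raw.decode('latin-1')`) round-trips every byte losslessly.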
Inserting data in one batch raises an error:
odps/models/table.py, line 476, in write
IndexError: list index out of range
I finished a computation with DataFrame, but how do I write the result into a specific partition of a table?
I want to read a large amount of data from the database into a local CSV.
The odps.df interface seems to read all the data at once, which exhausts memory.
At the moment I can only read via open_reader().
Is there a better way?
pyodps is faster than performing the same operation in the data-ide web UI,
e.g. tb.head(10) versus select * from tb limit 10;
Apart from the queueing time on the web side, is there any difference between the two in cluster scheduling or compute resources?
Can a DataFrame be used to execute SQL and have the result returned as a df?
For example: DataFrame(o.execute_sql('sql query')).
If so, is there sample code? Or is there another way to do it?
If I use the DataFrame interface exclusively, how do I fetch the complete data through it?
Also,
open_reader can achieve this with limit_enabled=False, but that only applies to a single call and has to be configured every time.
When entering a project space via the "enter room" approach, this cannot be set up front in the initial configuration: on entering the room, the source code does not read the options.tunnel.limited_instance_tunnel setting. Could an extra configuration option be added to allow this?
Using the replace function
raises "tuple index out of range".
The DataFrame is named list and the column is p; some rows are empty, some hold numbers, and some contain +86.
The code is list.p.replace('+86', '')
I tried data from another source and hit the same problem.
I also downloaded the iris data, uploaded it to the public server, used replace, and still got the same error.
Is the function itself broken? Or could the docs show an example of how to use replace?
From UDAFRunner it looks as if a UDAF splits all the data evenly into two halves, placed into buffer0 and buffer1, and the merge function combines the two buffers. In practice, however, all the data may end up in a single buffer, which corrupts my result.
I am writing a function that concatenates strings: given n strings, it outputs the n strings joined end to end.
The code is below.
@annotate('string->string')
class array2string(BaseUDAF):
    def new_buffer(self):
        return list()

    def iterate(self, buffer, unit):
        buffer.append(unit)

    def merge(self, buffer, pbuffer):
        if len(buffer) == 0:
            buffer.append(pbuffer[0])
            for i in range(len(pbuffer)):
                buffer.append(pbuffer[i])
                buffer.append('')
        else:
            for i in range(len(pbuffer)):
                if 2 * i + 1 < len(buffer):
                    buffer[2 * i + 1] = pbuffer[i]

    def terminate(self, buffer):
        return ';'.join(buffer)
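For plain concatenation, merge must be correct for any split of the data across buffers, including the case where everything lands in one buffer; appending the partial buffer onto the current one handles all of these, with no positional bookkeeping. A minimal standalone sketch (the @annotate decorator and the odps.udf BaseUDAF base are omitted so the logic can run anywhere; the class name ConcatStrings is made up):

```python
class ConcatStrings(object):
    """Concatenate all input strings, in arrival order, separated by ';'."""

    def new_buffer(self):
        return []

    def iterate(self, buffer, unit):
        # called once per input row
        if unit is not None:
            buffer.append(unit)

    def merge(self, buffer, pbuffer):
        # fold a partial buffer into the current one; extend preserves order
        # and is trivially correct when either buffer is empty
        buffer.extend(pbuffer)

    def terminate(self, buffer):
        return ';'.join(buffer)
```

In a real UDAF the class would extend odps.udf.BaseUDAF and carry @annotate('string->string'); the runtime may call merge any number of times, so it should never assume both buffers are non-empty.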
with t.open_reader(partition='pt=test') as reader:
How should the partition parameter be set so that multiple partitions of the same partition column can be read (an OR relation)?
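One partition spec per open_reader call is the usual pattern, so an OR over several partitions can be emulated by opening one reader per partition and chaining the iterators. A minimal sketch with itertools.chain (the helper and the in-memory stand-in readers are illustrative; with pyodps, open_reader would be something like lambda p: t.open_reader(partition='pt=%s' % p)):

```python
from itertools import chain

def read_partitions(open_reader, partitions):
    """Yield records from each partition's reader in turn.

    open_reader is expected to return an iterable of records for a given
    partition value, e.g. lambda p: t.open_reader(partition='pt=%s' % p).
    """
    return chain.from_iterable(open_reader(p) for p in partitions)

# in-memory stand-in for per-partition readers, for demonstration only
fake = {'20161122': [1, 2], '20161123': [3]}
records = list(read_partitions(lambda p: fake[p], ['20161122', '20161123']))
```

This streams partitions one after another, so memory use stays bounded by a single reader at a time.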
Running a script like this:
%%sql
select a.*
from tablea a left join tableb b
on a.id=b.id
where b.id is null;
It then fails with ODPS-0130161:Parse exception - line 2:35 cannot recognize input near 'left' 'join' 'tableb' in join type specifier.
Yet running the same SQL directly in odps works fine.
Simply put: how do I insert a new column into a DataFrame?
The value of the 出生日期 (birth date) field is exported as None. # Environment: Python 2.7.13
if __name__ == "__main__":
    xm_odps_connector = OdpsConnector().get_connector()
    chk_sql = "select to_date('1969-12-31 23:59:59', 'yyyy-mm-dd hh:mi:ss') as today from dual;"
    with xm_odps_connector.execute_sql(chk_sql).open_reader() as reader:
        for record in reader:
            dt = Record.get_by_name(record, 'today')
            if dt:
                print(dt.strftime('%Y-%m-%d %H:%M:%S'))
            else:
                print('9999-12-31 00:00:00')
Traceback (most recent call last):
File "odps\tunnel\io\reader_c.pyx", line 159, in odps.tunnel.io.reader_c.BaseTunnelRecordReader._read_datetime
File "C:\Anaconda3\envs\python27\lib\site-packages\odps\utils.py", line 349, in to_datetime
return _fromtimestamp(seconds).replace(microsecond=microseconds)
ValueError: timestamp out of range for platform localtime()/gmtime() function
Exception ValueError: 'timestamp out of range for platform localtime()/gmtime() function' in 'odps.tunnel.io.reader_c.BaseTunnelRecordReader._set_datetime' ignored
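The failure is consistent with '1969-12-31 23:59:59' mapping to a negative Unix timestamp (-1 second), which datetime.fromtimestamp cannot handle on platforms, notably Windows, whose localtime() rejects pre-epoch values. A workaround sketch that avoids platform localtime()/gmtime() entirely (the helper name is made up; note it yields naive UTC and ignores time zones):

```python
from datetime import datetime, timedelta

_EPOCH = datetime(1970, 1, 1)

def from_timestamp_safe(seconds):
    """Convert a (possibly negative) Unix timestamp to a naive UTC datetime
    using pure date arithmetic instead of platform localtime()/gmtime()."""
    return _EPOCH + timedelta(seconds=seconds)
```

timedelta arithmetic has no pre-1970 restriction, so values like -1 second resolve cleanly to 1969-12-31 23:59:59.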
PyCharm complains as follows when I try running unittest.
It seems to be caused by this: http://stackoverflow.com/questions/29501029/managed-to-break-my-venv-is-it-possible-to-fix
Traceback (most recent call last):
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/site.py", line 62, in <module>
import os
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/os.py", line 49, in <module>
import posixpath as path
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/posixpath.py", line 17, in <module>
import warnings
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/warnings.py", line 8, in <module>
import types
File "/home/lyman/workspace/ali/odps/pyodps/odps/types.py", line 20, in <module>
import re
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/re.py", line 282, in <module>
import copy_reg
File "/home/lyman/.pyenv/versions/2.7.2/lib/python2.7/copy_reg.py", line 7, in <module>
from types import ClassType as _ClassType
ImportError: cannot import name ClassType
Process finished with exit code 1
I want to do a regex replacement with str.replace, but the compiled SQL contains pyodps_udf_xxxx. Tracing the source, strings.Replace is indeed not implemented in the compiler.
strings.Contains does implement both the builtin-function and regex paths, so I tried to implement strings.Replace the same way, but after adding the elif branch a breakpoint shows the code is never executed (visit_string_op is never entered). I followed the code but still cannot see what is wrong; some guidance would be much appreciated, thanks!
1. Many methods in the odps.ml.statistics documentation lack concrete examples and do not state their return values, which makes them unfriendly to use. Please consider expanding the documentation so the calls are clearer.
2. Also, is odps.ml directly the same computation framework as PAI?
If a UDAF outputs multiple columns, how should the input parameters be written?
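For multiple input columns, a MaxCompute Python UDAF's iterate takes one positional parameter per column, in the order given in the annotation, e.g. @annotate('string,bigint->string'). A standalone sketch of the shape (the BaseUDAF base and decorator are omitted so the logic can run anywhere; the class and its behavior are illustrative):

```python
class WeightedConcat(object):
    """Concatenate strings, repeating each one `count` times.

    Would be annotated @annotate('string,bigint->string') in a real UDAF:
    one iterate parameter per input column, in annotation order.
    """

    def new_buffer(self):
        return []

    def iterate(self, buffer, text, count):
        # `text` and `count` correspond to the two input columns of a row
        if text is not None:
            buffer.extend([text] * int(count))

    def merge(self, buffer, pbuffer):
        buffer.extend(pbuffer)

    def terminate(self, buffer):
        return ';'.join(buffer)
```

Only the inputs are multi-column here; terminate still returns a single value, which is the usual UDAF contract.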