
justinzm / gopup

Stars: 2.5K · Watchers: 43 · Forks: 384 · Size: 706 KB

Data APIs: Baidu, Google, Toutiao, and Weibo indexes; macroeconomic data; interest-rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo transcripts; film box-office data; university lists; COVID-19 data…

Home Page: http://www.gopup.cn

Python 100.00%
data-analysis covid19-data index-data economic-data datasets gopup python data-science data

gopup's Introduction

Hi there 👋

🧑‍💻 I’m good at Python and PHP.

🌱 I’m currently learning blockchain.


gopup's People

Contributors

justinzm


gopup's Issues

What date format does gp.lpr_data(startDate, endDate) expect?

```
Traceback (most recent call last):
  File "E:/g/tz/mlltz/for_gp.py", line 30, in <module>
    get_lprdata()
  File "E:/g/tz/mlltz/for_gp.py", line 21, in get_lprdata
    lpr.to_csv('./lpr.csv', encoding='gb2312')
AttributeError: 'NoneType' object has no attribute 'to_csv'
```

The Chinese docs for lpr_data don't document the date format, so I don't know what to pass.
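Whatever the expected format turns out to be, the crash itself can be avoided: gp.lpr_data appears to return None on failure, which is what triggers the AttributeError above. A minimal guard (save_if_data is a hypothetical helper name, not part of gopup):

```python
def save_if_data(df, path, encoding="gb2312"):
    """Write the result to CSV only if the API actually returned a DataFrame.

    gopup interfaces such as gp.lpr_data seem to return None when the request
    fails or the arguments are malformed, producing
    "'NoneType' object has no attribute 'to_csv'" if used unchecked.
    """
    if df is None:
        print(f"no data returned; skipped writing {path}")
        return False
    df.to_csv(path, encoding=encoding)
    return True

# usage sketch: lpr = gp.lpr_data(startDate=..., endDate=...)
#               save_if_data(lpr, "./lpr.csv")
```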

In the Chinese docs for the Baidu search data repository, the cookie field should be assigned with single quotes

The example crawling code for the Baidu search data repository in the docs is wrong.

Because the cookie obtained from the Baidu Index site contains double quotes, the cookie field in the example code should be assigned with single quotes.

The example should be:

```python
import gopup as gp

cookie = 'paste the cookie from your browser session logged in to Baidu Index here'

index_df = gp.baidu_search_index(word="口罩", start_date='2020-01-01', end_date='2020-03-01', cookie=cookie)
print(index_df)
```

Custom time ranges for the Weibo Index

Hi, can the Weibo Index fetcher retrieve data for a custom time range? If so, what should I change? Thanks!

The Baidu Index numbers don't match the official site — what could be the cause?

```python
import gopup as gp
cookie = 'my cookie data'
index_df = gp.baidu_info_index(word="共享经济", start_date='2018-01-01', end_date='2018-02-01', cookie=cookie)
print(index_df)
```

The output:

```
            共享经济
date
2018-01-01   1570
2018-01-02   3114
2018-01-03      0
2018-01-04    672
2018-01-05    840
2018-01-06   2367
2018-01-07   2594
2018-01-08   1040
2018-01-09    847
2018-01-10   3162
2018-01-11    109
2018-01-12    584
2018-01-13   1172
2018-01-14    589
2018-01-15   1130
2018-01-16    269
2018-01-17   1067
2018-01-18   1434
2018-01-19    917
2018-01-20    929
2018-01-21    452
2018-01-22    372
2018-01-23    607
2018-01-24    415
2018-01-25  75549
2018-01-26  21709
2018-01-27  43497
2018-01-28  55024
2018-01-29  45434
2018-01-30   4504
2018-01-31   2330
2018-02-01   2169
```

But this doesn't match the data shown on the official Baidu Index site.

Baidu Index: request block

With the gp.baidu_search_index interface, r = requests.get(url=url, headers=headers) comes back with empty data and message: request block. Am I being blocked by anti-crawling measures?

TypeError: string indices must be integers

```python
import gopup as gp
cookie = 。。。  # after assigning the correct value
index_df = gp.baidu_search_index(word="罩", start_date='2020-12-01', end_date='2020-12-25', cookie=cookie)
print(index_df)
```

The error:

```
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
      2 # how to find the cookie: https://jingyan.baidu.com/article/76a7e409284a80fc3a6e1566.html
      3 cookie = 。。。
----> 4 index_df = gp.baidu_search_index(word="罩", start_date='2020-12-01', end_date='2020-12-25', cookie=cookie)
      5 print(index_df)

C:\ProgramData\Anaconda3\lib\site-packages\gopup\index\index_baidu.py in baidu_search_index(word, start_date, end_date, cookie, type)
    264     r = requests.get(url=url, params=params, headers=headers)
    265     data = r.json()["data"]
--> 266     all_data = data["userIndexes"][0][type]["data"]
    267     uniqid = data["uniqid"]
    268     ptbk = get_ptbk(uniqid, cookie)

TypeError: string indices must be integers
```
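The likely root cause: when Baidu rejects the request (stale cookie, rate limiting), r.json()["data"] is a plain string rather than a dict, so data["userIndexes"] indexes into a string and raises exactly this TypeError. A defensive sketch (extract_user_indexes is a hypothetical helper, not part of gopup):

```python
def extract_user_indexes(payload, kind="all"):
    """Pull the userIndexes series out of a Baidu Index response, failing loudly.

    When the request is blocked, payload["data"] is typically "" or an error
    string instead of a dict -- indexing a string with "userIndexes" is what
    produces "TypeError: string indices must be integers".
    """
    data = payload.get("data")
    if not isinstance(data, dict):
        raise RuntimeError(
            f"Baidu returned no usable data ({data!r}); "
            "refresh the cookie or slow down the requests"
        )
    return data["userIndexes"][0][kind]["data"]
```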

Some Weibo Index keywords return no data

Querying the Weibo Index for "model Y" and "比亚迪唐" works fine, but "小鹏P7" and "model 3" report no data.

I've already tried 小鹏 P7, 小鹏p7, 小鹏 p7, model3, Model3, model 3, and Model 3 — all fail, yet the official site returns results for them. Could the parsing be broken for these particular keywords?

Suggestion: add a 12306 station and train database

1. Download station data from 12306

Inspecting the 12306 site code reveals the URL for the nationwide station list:

https://kyfw.12306.cn/otn/resources/js/framework/station_name.js

2. Parse the station data

Parse the data from step 1 into the following format:

ID  Telecode  Name    Pinyin      Initials  Pinyin code
0   BOP       北京北  beijingbei  bjb       bjb

3. Download train data from 12306

Inspecting the site code also reveals the URL for the nationwide train list. The file holds all trains for the coming 60 days and is about 35 MB:

https://kyfw.12306.cn/otn/resources/js/query/train_list.js

4. Parse the train data

Parse the data from step 3, split by date, into the following format:

Type  Train ID      Train No.  Origin  Destination
D     24000000D10R  D1         北京    沈阳

12306 divides trains into 7 classes: C (intercity high-speed), D (EMU), G (high-speed rail), K (fast), T (express), Z (direct), and O (other). Here we only extract the C, D, and G classes.

5. Build timetable URLs from trains and stations

First merge the trains from all dates, keyed on train number plus train ID, and deduplicate to get the full train list. Then use each station's telecode to build the timetable download URL:

https://kyfw.12306.cn/otn/czxx/queryByTrainNo?train_no=<train ID>&from_station_telecode=<origin telecode>&to_station_telecode=<destination telecode>&depart_date=<departure date>

Notes:
a) Some trains only run on certain dates (e.g. weekdays, weekends, holidays).
b) The same train number may have different running times and stops on different dates.
c) The same train number with the same train ID has identical running times and stops on every date.

6. Download the timetables from 12306

Download all timetable data (JSON) using the URLs from step 5.

7. Parse the timetables

Parse the data from step 6 and output the complete "stations", "trains", and "timetables" in the following CSV formats:

ID  Telecode  Name    Pinyin      Initials  Pinyin code
0   BOP       北京北  beijingbei  bjb       bjb

Train No.  Origin  Destination  Departs  Arrives  Class  Service
C1002      延吉西  长春         5:47     8:04     EMU    2

Train No.  Stop  Station  Arrives  Departs  Dwell  Open
C1002      1     延吉西   ----     6:20     ----   TRUE
           2     长春     8:25     8:25     ----   TRUE

Reference: https://github.com/metromancn/Parse12306
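Steps 1–2 above can be sketched as follows. The assumed field order inside station_name.js (pinyin code | name | telecode | pinyin | initials | index, records separated by @) comes from inspecting the file and should be verified against a fresh download; the sample record mirrors the 北京北 row above:

```python
def parse_stations(js_text):
    """Parse the 12306 station_name.js payload into row dicts.

    Assumed payload shape (verify against a real download):
      var station_names ='@bjb|北京北|BOP|beijingbei|bjb|0@...';
    """
    payload = js_text.split("'")[1]  # strip the JS variable wrapper
    rows = []
    for record in payload.split("@"):
        if not record:
            continue  # the leading '@' yields one empty chunk
        pinyin_code, name, telecode, pinyin, initials, idx = record.split("|")[:6]
        rows.append({"ID": int(idx), "telecode": telecode, "name": name,
                     "pinyin": pinyin, "initials": initials,
                     "pinyin_code": pinyin_code})
    return rows

sample = "var station_names ='@bjb|北京北|BOP|beijingbei|bjb|0';"
print(parse_stations(sample))
```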

UnicodeDecodeError: 'gbk'... when fetching the artist commercial value report

```
In [9]: df_index = gp.realtime_artist()
Exception in thread Thread-248:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1366, in _readerthread
    buffer.append(fh.read())
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 1444: illegal multibyte sequence
```

Foreign direct investment (FDI) data: the API raises an error

```python
df_index = gp.get_fdi_data()
print(df_index)
```

```
    368 data_df['当月(亿元)'] = data_df['当月(亿元)'].map(lambda x: int(x)/100000)
--> 369 data_df['累计(亿元)'] = data_df['累计(亿元)'].map(lambda x: int(x)/100000)
ValueError: invalid literal for int() with base 10: ''
```
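The source table evidently contains empty cells, and int('') raises exactly this ValueError. A tolerant converter would sidestep it — a hedged sketch mirroring the library's int(x)/100000 conversion (safe_value is a hypothetical name, not gopup API):

```python
def safe_value(x):
    """Mirror gopup's int(x)/100000 conversion, but map blank cells to None
    instead of raising "ValueError: invalid literal for int()"."""
    s = str(x).strip()
    if not s:
        return None
    return int(s) / 100000

# usage sketch: data_df['累计(亿元)'] = data_df['累计(亿元)'].map(safe_value)
```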

Displaying the full dataset

After fetching data with gopup, only part of it is displayed. How can I display all of it?
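The data itself is complete; pandas just truncates what print() shows. Raising the display limits makes every row and column visible:

```python
import pandas as pd

# pandas abbreviates long DataFrames with "..." by default;
# setting these options to None makes print() show everything.
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)  # don't wrap wide frames
```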

The arithmetic index data returns None

The arithmetic index data returns None; the keyword searched was 股票 (stocks).
Searching directly on the official site works, so I suspect the anti-scraping measures were upgraded.
Being a crawler engineer really is hard.

Which cookie should be copied for the Baidu Index?

I'm currently copying the Value of the cookie named BDUSS under the index.html entry, but the result printed is None.

Which cookie should I copy? I have little experience with cookies — any advice would be appreciated, thanks!

XLRDError when calling marco_cmlrd

```python
import gopup as gp
df_index = gp.marco_cmlrd()
print(df_index)
```

The error:

```
XLRDError                                 Traceback (most recent call last)
<ipython-input> in <module>
----> 1 df_index = gp.marco_cmlrd()
      2 print(df_index)

~\AppData\Roaming\Python\Python39\site-packages\gopup\economic\marco_cn.py in marco_cmlrd()
     20     """
     21     url = "http://114.115.232.154:8080/handler/download.ashx"
---> 22     excel_data = pd.read_excel(url, sheet_name="Data", header=0, skiprows=1)
     23     excel_data["Period"] = pd.to_datetime(excel_data["Period"]).dt.strftime("%Y-%m")
     24     excel_data.columns = [

~\AppData\Roaming\Python\Python39\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    294         )
    295         warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296         return func(*args, **kwargs)
    297
    298     return wrapper

~\AppData\Roaming\Python\Python39\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

~\AppData\Roaming\Python\Python39\site-packages\pandas\io\excel\_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866
--> 867         self._reader = self._engines[engine](self._io)
    868
    869     def __fspath__(self):

~\AppData\Roaming\Python\Python39\site-packages\pandas\io\excel\_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23
     24     @property

~\AppData\Roaming\Python\Python39\site-packages\pandas\io\excel\_base.py in __init__(self, filepath_or_buffer)
    349             # N.B. xlrd.Book has a read attribute too
    350             filepath_or_buffer.seek(0)
--> 351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
    353             self.book = self.load_workbook(filepath_or_buffer)

~\AppData\Roaming\Python\Python39\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows, ignore_workbook_corruption)
    168     # files that xlrd can parse don't start with the expected signature.
    169     if file_format and file_format != 'xls':
--> 170         raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
    171
    172     bk = open_workbook_xls(

XLRDError: Excel xlsx file; not supported
```
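The cause: xlrd 2.x dropped .xlsx support, and this download is an xlsx file. A hedged workaround until gopup switches readers is forcing the openpyxl engine (requires `pip install openpyxl`); the sketch below wraps the same call the traceback shows:

```python
import pandas as pd

def read_cmlrd_xlsx():
    """Re-run the failing pd.read_excel call with the openpyxl engine,
    which can read the .xlsx files that xlrd >= 2.0 refuses."""
    url = "http://114.115.232.154:8080/handler/download.ashx"
    return pd.read_excel(url, sheet_name="Data", header=0, skiprows=1,
                         engine="openpyxl")
```

Alternatively, pinning `xlrd==1.2.0` restores xlsx reading on older pandas versions.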

Baidu search index error — help requested

Using the Baidu search index interface with the sample code:

```python
index_df = gp.baidu_search_index(word="口罩", start_date='2020-01-01', end_date='2020-03-01', cookie=cookie)
```

raises the following error:

```
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    index_df = gp.baidu_search_index(word="口罩", start_date='2020-01-01', end_date='2020-03-01', cookie=cookie)
  File "C:\Users\zeyu_\AppData\Local\Programs\Python\Python39\lib\site-packages\gopup\index\index_baidu.py", line 264, in baidu_search_index
    all_data = data["userIndexes"][0][type]["data"]
TypeError: string indices must be integers
```

Any pointers appreciated, thanks!

Bug when fetching Weibo topic heat data

When querying a Weibo topic's heat for a single day, the interface returns data points for future timestamps. For example, calling it at 18:00 on Sep 18 returns the topic's heat at 22:00 on Sep 18 — a time that hasn't happened yet, and the interface cannot be providing forecasts.

Can anyone help me figure out what's going wrong here?

```
Traceback (most recent call last):
  File "E:/pythonProject2/01.17.py", line 3, in <module>
    df_index = g.weibo_user(keyword="雷军")
  File "E:\python\lib\site-packages\gopup\pro\client.py", line 48, in query
    raise Exception(result['msg'])
Exception: 'Response' object has no attribute 'msg'
```
