
amemv-crawler's Introduction

amemv-crawler

Please run under Python 3.

README-EN

This is a Python script. Once configured and run, it downloads all videos of a specified Douyin user (including their favorites), or all videos under a specified topic (challenge) or piece of music.

How to discuss and get help

  • Open a new issue directly on GitHub.

Note

Hi everyone. This is a practice project, and the source code is provided only so we can learn Python together. You may freely copy, distribute, and derive from this code. You may not use it for commercial purposes or other malicious uses. Thanks @Means88 #120

Also, the goal of this project is simply to download videos successfully. Some people have raised feature requests in issues that go beyond that scope, such as renaming videos, downloading images, video width/height, publish data, play and like counts, and so on. These improvements might benefit the project, but I don't have the time to handle them one by one, so please don't open issues for such requests; submit a pull request directly instead.

Some other issues discuss ascpmas; those are also outside the scope of this project. Finally, there are server-side restrictions on crawling, such as request-rate and IP limits. If you run into these, your download volume has probably gone beyond learning purposes; I will not support that, and I apologize.

Apart from the unsupported topics above, you are welcome to open issues. Issues are also the only supported channel for feedback; to those who contact me by email, I will no longer reply, since I rarely log into my personal mailbox and replies would be slow anyway, haha. 😄 Finally, I hope we can all learn and improve together.

Environment setup

Set up your Python and Node environments, then pip install requests.

Or:

$ git clone https://github.com/loadchange/amemv-crawler.git
$ cd amemv-crawler
$ pip install -r requirements.txt

All set; jump straight to the next section, Configuration and running.

Configuration and running

There are two ways to specify the Douyin share links you want to download: edit share-url.txt, or pass command-line arguments.

Method 1: edit the share-url.txt file

Open share-url.txt in a text editor and put in the share links of the Douyin accounts you want to download, separated by commas, spaces, tabs, or newlines; multiple lines are fine. For example, the file might look like this:

https://www.douyin.com/share/user/85860189461?share_type=link&tt_from=weixin&utm_source=weixin&utm_medium=aweme_ios&utm_campaign=client_share&uid=97193379950&did=30337873848,

https://www.iesdouyin.com/share/challenge/1593608573838339?utm_campaign=clien,

https://www.iesdouyin.com/share/music/6536362398318922509?utm_campaign=client_share&app=aweme&utm_medium=ios&iid=30337873848&utm_source=copy
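For reference, a minimal sketch of how a file in this format could be parsed, splitting on commas and whitespace (the helper below is illustrative only, not necessarily what amemv-video-ripper.py actually does):

import re

def read_share_urls(path='share-url.txt'):
    # Return every share URL in the file; commas, spaces, tabs and newlines all act as separators.
    with open(path, encoding='utf-8') as f:
        content = f.read()
    return [u for u in re.split(r'[,\s]+', content) if u]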

How to get a user's share link (similar for challenges and music)

Then save the file and double-click amemv-video-ripper.py, or run python amemv-video-ripper.py in a terminal.

Method 2: use command-line arguments (for users comfortable with their OS terminal)

If you are comfortable with the Windows or Unix command line, you can specify what to download via command-line arguments at runtime:

On some platforms, remember to wrap the URLs in quotes.

python amemv-video-ripper.py --url URL1,URL2

Separate share links with commas, with no spaces.

For a user URL, the liked (favorite) list is not downloaded by default; add --favorite to include it:

python amemv-video-ripper.py --url URL --favorite

Downloading and saving videos

When the program runs, it creates a folder named after the Douyin ID in the current directory by default, and all videos are saved into that folder.

The script does not re-download videos that have already been downloaded, so there is no need to worry about duplicates. Running it multiple times can also help you recover lost or deleted videos.

Then re-run the download command.
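The skip-existing behaviour amounts to checking whether the target file is already on disk before fetching it. A rough sketch of the idea (the function name and file layout are assumptions, not the project's exact code):

import os
import requests

def download_if_missing(url, folder, file_name):
    # Download url into folder/file_name unless the file already exists.
    path = os.path.join(folder, file_name)
    if os.path.exists(path):
        return False  # downloaded on an earlier run, skip it
    os.makedirs(folder, exist_ok=True)
    resp = requests.get(url, stream=True, timeout=30)
    resp.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            f.write(chunk)
    return True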

Advanced usage

To download an entire challenge (topic), add the challenge's share URL to share-url.txt.

To download by music, add the music's share URL to share-url.txt.

The links below correspond to the three crawl modes: Douyin account, challenge topic, and music. Note that the crawler only downloads from the top search result, so write out your topic or music name as completely as possible.

https://www.douyin.com/share/user/85860189461?share_type=link&tt_from=weixin&utm_source=weixin&utm_medium=aweme_ios&utm_campaign=client_share&uid=97193379950&did=30337873848,

https://www.iesdouyin.com/share/challenge/1593608573838339?utm_campaign=clien,

https://www.iesdouyin.com/share/music/6536362398318922509?utm_campaign=client_share&app=aweme&utm_medium=ios&iid=30337873848&utm_source=copy

Short URLs look like this:

http://v.douyin.com/cDo2P/,

http://v.douyin.com/cFuAN/,

http://v.douyin.com/cMdjU/

Handling the unexpected

2018-04-14: The user list API added a _signature field, which is generated by douyin_falcon:node_modules/byted-acrawler/dist/runtime, so we first need to run fuck-byted-acrawler to obtain the signature before we can move on. After setting up your Python environment, please also install Node so fuck-byted-acrawler can run smoothly.
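In practice this means shelling out to Node from Python and reading back the printed signature. A hedged sketch of that step (based on the generateSignature calls visible in the tracebacks further down; the exact arguments are assumptions):

import subprocess

def generate_signature(value):
    # Run fuck-byted-acrawler.js under node and return the _signature it prints.
    p = subprocess.Popen(['node', 'fuck-byted-acrawler.js', str(value)],
                         stdout=subprocess.PIPE)
    out, _ = p.communicate()
    return out.decode('utf-8').strip()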

2018-06-22: Share links now appear as short URLs. Workaround: when a task points to v.douyin.com, issue a request and, on a 302 response, take Location from the response headers.
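A minimal sketch of that workaround, using one of the short links listed above:

import requests

def resolve_short_url(short_url):
    # Request the short link without following redirects and read Location on a 302.
    resp = requests.get(short_url, allow_redirects=False, timeout=10)
    if resp.status_code == 302:
        return resp.headers.get('Location', short_url)
    return short_url

# e.g. resolve_short_url('http://v.douyin.com/cDo2P/')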

2018-07-02: douyin_falcon:node_modules/byted-acrawler/dist/runtime was updated; we keep fuck-byted-acrawler.js updated in sync!

2018-07-12: The user video API https://www.douyin.com/aweme/v1/aweme/post/ added a dytk parameter, which is taken directly from the page.
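Since dytk is embedded in the share page's HTML, one way to grab it is a simple regular expression over the page source (the pattern below is an assumption about the markup, shown only as a sketch):

import re
import requests

def fetch_dytk(share_url):
    # Fetch the user's share page and pull out the dytk token, if present.
    html = requests.get(share_url, timeout=10).text
    match = re.search(r"dytk:\s*'([^']+)'", html)  # assumed inline-script pattern
    return match.group(1) if match else None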

2018-09-03: Fixed the user video list API domain, changed from douyin.com to amemv.com.

2018-09-25: Douyin shut down the original watermark-free 720p download URL; temporarily downgraded to the watermarked version.

2018-10-01: Watermark-free downloads restored.

2018-11-20: Switched the watermark-free video source for the international version (TikTok).

If you like it, consider a tip!

If you like this project, please leave a tip to support the author. Thank you very much!


amemv-crawler's Issues

Works great! 👍👍👍 Could you add an option to skip downloading a user's favorites?

As soon as I saw this repo I downloaded it, set it up, and tried it. It works great: no watermark and fast downloads. Thanks for the hard work!
One issue, though: a user may not have many videos of their own but may have liked a huge number of videos (like me, lol), so the crawler spends most of its time downloading the user's favorites, when often only the author's own videos are wanted.
So could you add an option to skip downloading favorites, with a command along the lines of: python amemv-video-ripper.py -nofavorite
Thanks again for your effort; we bow to you ^_^

spider based on douyin id

Can the ID the system assigns no longer be used for crawling? It now feels as if sharing dynamically generates a very short-lived ID that is then used to pull data from the API.

Did you deobfuscate the JS in fuck-byted-acrawler.js yourself? When I step into it I see garbled characters like “”; how did you work it out?

Did you deobfuscate the JS in fuck-byted-acrawler.js yourself? When I step into it I see garbled characters....
__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function(l, e) {
Function(function(l) {
return '�e(e,a,r){�(b[e]||(b[e]=t("x,y","�x "+e+" y"�)(r,a)}�a(e,a,r){�(k[r]||(k[r]=t("x,y","�new xy"�)(e,a)}�r(e,a,r){�n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t�)s[n="$"+t]=r[n];for(t=0,b=s�=a�;t<b;t�)s[t]=a[t];�c(e,0,s)}�c(t,b,k){�u(e){v[x�]=e}�f�{�g=�,t�ing(b�g)}�l�{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(�h,y,d,g,v=[],x=0;;)switch(g=�){case 1:u(!�)�4:�f��5:u(�(e){�a=0,r=e�;���{�c=a<r;�c&&u(e[a�]),c}}(���6:y=�,u(�(y��8:if(g=�,l��g,g=�,y===c)b+=g;else if(y!==l)�y�9:�c�10:u(s(���11:y=�,u(�+y)�12:for(y=f�,d=[],g=0;g<y�;g�)d[g]=y.charCodeAt(g)^g+y�;u(String.fromCharCode.apply(null,d��13:y=�,h=delete �[y]�14:���59:u((g=�)?(y=x,v.slice(x-=g,y�:[])�61:u(�[�])�62:g=�,k[0]=65599k[0]+k[1].charCodeAt(g)>>>0�65:h=�,y=�,�[y]=h�66:u(e(t[b�],�,���67:y=�,d=�,u((g=�).x===c?r(g.y,y,k):g.apply(d,y��68:u(e((g=t[b�])<"<"?(b--,f�):g+g,�,���70:u(!1)�71:�n�72:�+f��73:u(parseInt(f�,36��75:if(�){b��case 74:g=�<<16>>16�g�76:u(k[�])�77:y=�,u(�[y])�78:g=�,u(a(v,x-=g+1,g��79:g=�,u(k["$"+g])�81:h=�,�[f�]=h�82:u(�[f�])�83:h=�,k[�]=h�84:�!0�85:�void 0�86:u(v[x-1])�88:h=�,y=�,�h,�y�89:u(��{�e�{�r(e.y,arguments,k)}�e.y=f�,e.x=c,e}�)�90:�null�91:�h�93:h=��0:��;default:u((g<<16>>16)-16)}}�n=this,t=n.Function,s=Object.keys||�(e){�a={},r=0;for(�c in e)a[r�]=c;�a�=r,a},b={},k={};�r'.replace(/[�-�]/g, function(e) {
return l[15 & e.charCodeAt(0)]
})
}("v[x++]=�v[--x]�t.charCodeAt(b++)-32�function �return �))�++�.substr�var �.length�()�,b+=�;break;case �;break}".split("�")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&eff�kx[!cs"l".Pq%widthl"@q&heightl"vr
getContextx$"2d[!cs#l#,;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2qshadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$j�l s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$bb^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"j�l s&l&z0l!$ +["cs'(0l#i'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0syWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!�+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l &+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl'g,)gk}ejo{�cm,)|ynLijem["cl$b%@d<l&zl'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {
value: !0
})])
});

_signature accuracy problem

Running under Node and using your JS to compute _signature, I find that the returned list is empty a large fraction of the time.

_signature is unstable

After the algorithm update on July 2, the _signature obtained with the new fuck-byted-acrawler.js sometimes works and sometimes fails. This time I made sure the headers used are identical to the headers in amemv-video-ripper.py. What could be the cause? Thanks~

Problem downloading from a challenge share link

It downloads roughly a dozen videos and then errors out. It looks like the maximum recursion depth is exceeded, but I don't know the underlying cause.
Traceback (most recent call last):
File "amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File "amemv-video-ripper.py", line 128, in init
self.scheduling()
File "amemv-video-ripper.py", line 150, in scheduling
self.download_challenge_videos(challenge)
File "amemv-video-ripper.py", line 165, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "amemv-video-ripper.py", line 309, in _download_challenge_media
video_count = get_aweme_list()
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
[Previous line repeated 947 more times]
File "amemv-video-ripper.py", line 298, in get_aweme_list
res = requests.get(url, headers=self.headers)
File "D:\Anaconda3\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "D:\Anaconda3\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "D:\Anaconda3\lib\site-packages\urllib3\connection.py", line 337, in connect
cert = self.sock.getpeercert()
File "D:\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 348, in getpeercert
'subjectAltName': get_subj_alt_name(x509)
File "D:\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 196, in get_subj_alt_name
ext = cert.extensions.get_extension_for_class(
File "D:\Anaconda3\lib\site-packages\cryptography\utils.py", line 158, in inner
result = func(instance)
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\x509.py", line 137, in extensions
self._backend, self._x509
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 249, in parse
value = handler(backend, ext_data)
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 428, in _decode_subject_alt_name
_decode_general_names_extension(backend, ext)
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1008, in init
self._general_names = GeneralNames(general_names)
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 964, in init
if not all(isinstance(x, GeneralName) for x in general_names):
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 964, in
if not all(isinstance(x, GeneralName) for x in general_names):
File "D:\Anaconda3\lib\abc.py", line 182, in instancecheck
if subclass in cls._abc_cache:
File "D:\Anaconda3\lib_weakrefset.py", line 75, in contains
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison
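For context: the traceback shows get_aweme_list calling itself once per page of results, so a challenge with many pages eventually exceeds Python's default recursion limit (about 1000 frames). A hedged sketch of the same pagination written as a loop, which avoids the limit (request details omitted; this is not the project's actual code):

def collect_aweme_list(fetch_page, cursor=0):
    # fetch_page(cursor) is assumed to return (aweme_list, next_cursor, has_more).
    videos = []
    has_more = True
    while has_more:
        aweme_list, cursor, has_more = fetch_page(cursor)
        videos.extend(aweme_list)
    return videos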

Changing the cookie helped the first time; after downloading N videos, a new error now appears

requests.exceptions.SSLError: HTTPSConnectionPool(host='api.amemv.com', port=443): Max retries exceeded with url: /aweme/v1/challenge/search/?ac=WIFI&app_name=aweme&vid=2ED370A7-F09C-4C9E-90F5-872D57F3127C&as=a1c5600cb7576a7e273418&device_type=iPhone8,2&os_api=18&build_number=17805&version_code=1.7.8&ts=1524105474&app_version=1.7.8&channel=App%20Store&device_platform=iphone&mas=008c37d4eaf9b158c3d1b7e3fc0d66008dc45306aae0ff5380d6a8&screen_width=1242&search_source=challenge&iid=28175672430&idfa=00000000-0000-0000-0000-000000000000&openudid=20dae85eeac1da35a69e2a0ffeaeef41c78a2e97&device_id=46166717995&count=20&keyword=%E7%BE%8E%E9%A3%9F%E7%BE%8E%E9%A3%9F%E7%BE%8E%E9%A3%9F&cursor=0&aid=1128 (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)'),))
I tried a colleague's cookie as well; same result.

Out of 1k+ videos it errors out after 600+; repeated attempts give the same result

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 356, in connect
ssl_context=context)
File "/usr/lib/python3.6/site-packages/urllib3/util/ssl
.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 814, in init
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 356, in connect
ssl_context=context)
File "/usr/lib/python3.6/site-packages/urllib3/util/ssl
.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 814, in init
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File "amemv-video-ripper.py", line 112, in init
res = requests.get(url, headers=self.headers)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Error when running

I can't read JS; how should I fix this? Thanks.
Environment: Windows 7, Python 3.7

E:\Code\Python3\amemv-crawler-master\fuck-byted-acrawler.js:5
let e = {}
^
SyntaxError: Unexpected identifier
at exports.runInThisContext (vm.js:73:16)
at Module._compile (module.js:443:25)
at Object.Module._extensions..js (module.js:478:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Function.Module.runMain (module.js:501:10)
at startup (node.js:129:16)
at node.js:814:3
Traceback (most recent call last):
File ".\amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File ".\amemv-video-ripper.py", line 128, in init
self.scheduling()
File ".\amemv-video-ripper.py", line 147, in scheduling
self.download_videos(params)
File ".\amemv-video-ripper.py", line 159, in download_videos
video_count = self._download_user_media(number, dytk)
File ".\amemv-video-ripper.py", line 201, in _download_user_media
signature = self.generateSignature(str(user_id))
File ".\amemv-video-ripper.py", line 138, in generateSignature
return p.readlines()[0]
IndexError: list index out of range

There seems to be a fairly high chance of downloading the "video is gone" placeholder

Starting from some version (not sure which), the chance of downloading an ~121 KB empty "video is gone" placeholder has gone up. Out of a few hundred videos, ten or twenty end up like this; I have to find them one by one, delete them, and re-download, and sometimes the re-downloaded file is still a placeholder, so it takes repeated attempts.

(screenshot: 2018-07-20 at 11:39:05)
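For reference, one way to hunt these down is to flag suspiciously small .mp4 files for re-download; a sketch assuming the roughly 121 KB size mentioned above (the threshold is a guess, not an official marker):

import os

def find_placeholder_videos(folder, max_bytes=130 * 1024):
    # List .mp4 files small enough to be the ~121 KB "video is gone" placeholder.
    suspects = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if name.endswith('.mp4') and os.path.getsize(path) <= max_bytes:
            suspects.append(path)
    return suspects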

The generated signature doesn't work

I downloaded your fuck-byted-acrawler.js locally and generated a signature with the following command:

node fuck-byted-acrawler.js 58585956426

Then I plugged it into this URL:

https://www.douyin.com/aweme/v1/aweme/post/?user_id=58585956426&count=21&max_cursor=0&aid=1128&_signature=8Z4HuwAAq4AOYfhE08pWY.GeB6

The response is an empty array.

Analysis of _signature generation

"_signature, which is generated by douyin_falcon:node_modules/byted-acrawler/dist/runtime" -- how did you figure that out? Just curious.

After running the script, it downloads a few videos and then stops; after a while it throws the error below. What should I do?

Downloading v0200f9a0000bc1ahsmbn5vdr6i8f2h0.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f9a0000bc1ahsmbn5vdr6i8f2h0&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading ba5798aee12b4bff801598d4abf5cb99.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=ba5798aee12b4bff801598d4abf5cb99&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading 4b784f8aaef2407f935a40e18a6a8811.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=4b784f8aaef2407f935a40e18a6a8811&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.
Downloading f9d5c6cde2fa418f883fc65793c8280d.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=f9d5c6cde2fa418f883fc65793c8280d&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading f88b386af271417987be0b6df7a6065e.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=f88b386af271417987be0b6df7a6065e&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading v0200fbd0000bcjo8tcthbi90j14pcag.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200fbd0000bcjo8tcthbi90j14pcag&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading v0200f9a0000bck76427u0r58fotk7ug.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f9a0000bck76427u0r58fotk7ug&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.
Downloading v0200fbd0000bc6kaaelg9jt2h43h2ng.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200fbd0000bc6kaaelg9jt2h43h2ng&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 196, in get_subj_alt_name
ext = cert.extensions.get_extension_for_class(
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\utils.py", line 159, in inner
result = func(instance)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\x509.py", line 138, in extensions
self._backend, self._x509
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 238, in parse
value = handler(backend, ext_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 417, in _decode_subject_alt_name
_decode_general_names_extension(backend, ext)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1210, in init
self._general_names = GeneralNames(general_names)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1163, in init
if not all(isinstance(x, GeneralName) for x in general_names):
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1163, in
if not all(isinstance(x, GeneralName) for x in general_names):
File "C:\ProgramData\Anaconda3\lib\abc.py", line 182, in instancecheck
if subclass in cls._abc_cache:
File "C:\ProgramData\Anaconda3\lib_weakrefset.py", line 75, in contains
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 379, in
CrawlerScheduler(content)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 119, in init
self.scheduling()
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 131, in scheduling
self.download_challenge_videos(challenge)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 143, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 280, in _download_challenge_media
video_count = get_aweme_list()
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
[Previous line repeated 947 more times]
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 269, in get_aweme_list
res = requests.get(url, headers=self.headers)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connection.py", line 337, in connect
cert = self.sock.getpeercert()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 348, in getpeercert
'subjectAltName': get_subj_alt_name(x509)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 202, in get_subj_alt_name
except (x509.DuplicateExtension, x509.UnsupportedExtension,
AttributeError: module 'cryptography.x509' has no attribute 'UnsupportedExtension'
[Finished in 331.1s with exit code 1]
[shell_cmd: python -u "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py"]
[dir: E:\amemv-crawler-master\amemv-crawler-master]
[path: C:\Program Files\curl-7.60.0\I386;C:\ffmpeg\bin;C:\ProgramData\Anaconda3;C:\ProgramData\Anaconda3\Scripts;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin]

Can video metadata be crawled?

Can the script also crawl video metadata, such as publish time, video width and height, likes, comments, and play counts?

The algorithm was updated on the evening of July 2; how do we deal with it now?

/!douyin_falcon:node_modules/byted-acrawler/dist/runtime.js/
__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime",function(l,e){Function(function(l){return'�e(e,a,r){�(b[e]||(b[e]=t("x,y","�x "+e+" y"�)(r,a)}�a(e,a,r){�(k[r]||(k[r]=t("x,y","�new xy"�)(e,a)}�r(e,a,r){�n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t�)s[n="$"+t]=r[n];for(t=0,b=s�=a�;t<b;t�)s[t]=a[t];�c(e,0,s)}�c(t,b,k){�u(e){v[x�]=e}�f�{�g=�,t�ing(b�g)}�l�{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(�h,y,d,g,v=[],x=0;;)switch(g=�){case 1:u(!�)�4:�f��5:u(�(e){�a=0,r=e�;���{�c=a<r;�c&&u(e[a�]),c}}(���6:y=�,u(�(y��8:if(g=�,l��g,g=�,y===c)b+=g;else if(y!==l)�y�9:�c�10:u(s(���11:y=�,u(�+y)�12:for(y=f�,d=[],g=0;g<y�;g�)d[g]=y.charCodeAt(g)^g+y�;u(String.fromCharCode.apply(null,d��13:y=�,h=delete �[y]�14:���59:u((g=�)?(y=x,v.slice(x-=g,y�:[])�61:u(�[�])�62:g=�,k[0]=65599k[0]+k[1].charCodeAt(g)>>>0�65:h=�,y=�,�[y]=h�66:u(e(t[b�],�,���67:y=�,d=�,u((g=�).x===c?r(g.y,y,k):g.apply(d,y��68:u(e((g=t[b�])<"<"?(b--,f�):g+g,�,���70:u(!1)�71:�n�72:�+f��73:u(parseInt(f�,36��75:if(�){b��case 74:g=�<<16>>16�g�76:u(k[�])�77:y=�,u(�[y])�78:g=�,u(a(v,x-=g+1,g��79:g=�,u(k["$"+g])�81:h=�,�[f�]=h�82:u(�[f�])�83:h=�,k[�]=h�84:�!0�85:�void 0�86:u(v[x-1])�88:h=�,y=�,�h,�y�89:u(��{�e�{�r(e.y,arguments,k)}�e.y=f�,e.x=c,e}�)�90:�null�91:�h�93:h=��0:��;default:u((g<<16>>16)-16)}}�n=this,t=n.Function,s=Object.keys||�(e){�a={},r=0;for(�c in e)a[r�]=c;�a�=r,a},b={},k={};�r'.replace(/[�-�]/g,function(e){return l[15&e.charCodeAt(0)]})}("v[x++]=�v[--x]�t.charCodeAt(b++)-32�function �return �))�++�.substr�var �.length�()�,b+=�;break;case �;break}".split("�")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&eff�kx[!cs"l".Pq%widthl"@q&heightl"vrgetContextx$"2d[!cs#l#,;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2qshadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$j�l s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$bb^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"j�l s&l&z0l!$ +["cs'(0l#i'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0syWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!�+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l &+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl'g,)gk}ejo{�cm,)|ynLijem["cl$b%@d<l&zl'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ',[Object.defineProperty(e,"__esModule",{value:!0})])});

Running it from PyCharm gives this error

'node' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 411, in
CrawlerScheduler(content)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 128, in init
self.scheduling()
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 150, in scheduling
self.download_challenge_videos(challenge)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 165, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 276, in _download_challenge_media
signature = self.generateSignature(str(challenge_id))
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 138, in generateSignature
return p.readlines()[0]
IndexError: list index out of range

_signature error

Hi, I also ran into the _signature field when crawling user info, but the signature produced by fuck-byted-acrawler.js doesn't work. Any idea what might be wrong?
User URL: https://www.douyin.com/share/user/2613650662
Data URL: https://www.douyin.com/aweme/v1/aweme/post/?user_id=2613650662&count=21&max_cursor=0&aid=1128&_signature=vsISThAa5eS8bwWIdK-OVL7CEl
JS that generates _signature: https://s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/pkg/third_ee38eac.js (the /!douyin_falcon:node_modules/byted-acrawler/dist/runtime.js/ part)
Thanks!~

Crawling with the user template gets rate-limited by Douyin

When crawling with the user template (share/user), after more than 20 consecutive fetches the API starts returning an empty string. You then have to wait a while, e.g. an hour, before crawling again.
The music template, however, does not seem to be rate-limited.
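For reference, a possible client-side mitigation is to pace requests and back off when an empty response comes back; a minimal sketch (the pauses are arbitrary choices, not values the server documents):

import time

def fetch_with_backoff(fetch, max_retries=5, pause=3.0):
    # Call fetch(); on an empty response, wait with growing pauses before retrying.
    for attempt in range(max_retries):
        body = fetch()
        if body:
            return body
        time.sleep(pause * (attempt + 1))
    return None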
