
caipanwenshu's Introduction

Updated 2020-04-23

The ciphertext field carries the encrypted request; the result field is the response to be decrypted.
To prevent abuse, the code that generates HM4hUBT0dDOn80T is not published for now; it may be released later as appropriate.
The current interface is for testing only and must not be used for unlawful purposes.


caipanwenshu's Issues

When fetching the left-hand tree structure, what is the parameter named “上传日期” (upload date)?

import datetime

import scrapy

from wenshu.utils.vl5x import getvjkl5  # project-local helper, see utils/vl5x.py


class DocSpider(scrapy.Spider):
    name = 'doc'
    allowed_domains = ['gov.cn']
    start_urls = ['http://gov.cn/']

    def start_requests(self):
        url = 'http://wenshu.court.gov.cn/list/list/?sorttype=1&conditions=searchWord+%E5%90%88%E5%90%8C+++%E5%85%B3%E9%94%AE%E8%AF%8D:%E5%90%88%E5%90%8C'
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        """Query the per-category counts, grouped by upload date."""
        cookie = response.headers['Set-Cookie'].split(';')[0][6:]
        vjkl5 = getvjkl5(cookie)
        for index in range(1, 2):
            end_day = str((datetime.datetime.now() - datetime.timedelta(days=index)).date())
            start_day = str((datetime.datetime.now() - datetime.timedelta(days=index + 1)).date())
            Param = u'上传日期:{} TO {}'.format(start_day, end_day)
            data = {'Param': Param, 'vl5x': vjkl5}
            yield scrapy.FormRequest('http://wenshu.court.gov.cn/List/TreeContent', headers={'Cookie': cookie},
                                     callback=self.get_tree_list, formdata=data,
                                     meta={'cookie': cookie, 'vjkl5': vjkl5, 'Param': Param,
                                           'type_list': [u'法院地域', u'文书类型', u'法院层级', u'审判程序', u'裁判年份', u'一级案由']})

As shown above, the request for the left-hand tree structure uses a field named “上传日期” (upload date), but I can't find any parameter with that name in my packet captures of the site. Could someone explain?
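Note that in the snippet above, 上传日期 never appears as a standalone HTTP parameter: it is a field label embedded inside the single Param form value, which is why it does not show up as a parameter name in packet captures. A minimal sketch of how that value is assembled (the helper name is hypothetical):

```python
import datetime


def build_upload_date_param(days_back):
    """Build the Param value for a one-day upload-date window.

    The site expects the field label '上传日期' (upload date) inside the
    Param string itself, not as a separate request parameter.
    """
    today = datetime.date.today()
    end_day = today - datetime.timedelta(days=days_back)
    start_day = today - datetime.timedelta(days=days_back + 1)
    return u'上传日期:{} TO {}'.format(start_day, end_day)
```

The resulting string, e.g. 上传日期:2018-12-24 TO 2018-12-25, is posted as the Param form field together with vl5x.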

Tried both Python 3.7 and Python 2.7; neither works

Has the site updated its anti-crawling measures again?
The error is as follows:
2018-12-26 17:13:54 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: wenshu)
2018-12-26 17:13:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Jul 13 2018, 13:06:57) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0f 25 May 2017), cryptography 2.1, Platform Linux-3.10.0-514.26.2.el7.x86_64-x86_64-with-centos-7.2.1511-Core
2018-12-26 17:13:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'wenshu.spiders', 'ROBOTSTXT_OBEY': True, 'CONCURRENT_REQUESTS': 8, 'SPIDER_MODULES': ['wenshu.spiders'], 'BOT_NAME': 'wenshu', 'DOWNLOAD_DELAY': 3}
2018-12-26 17:13:54 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.memusage.MemoryUsage', 'scrapy.extensions.logstats.LogStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.corestats.CoreStats']
2018-12-26 17:13:54 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware', 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-26 17:13:54 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-26 17:13:54 [scrapy.middleware] INFO: Enabled item pipelines: ['wenshu.pipelines.WenshuPipeline']
2018-12-26 17:13:54 [scrapy.core.engine] INFO: Spider opened
2018-12-26 17:13:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-26 17:13:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-26 17:13:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://wenshu.court.gov.cn/robots.txt> (referer: None)
2018-12-26 17:14:02 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://wenshu.court.gov.cn/List/List?sorttype=1&conditions=searchWord+1+AJLX++%E6%A1%88%E4%BB%B6%E7%B1%BB%E5%9E%8B:%E5%88%91%E4%BA%8B%E6%A1%88%E4%BB%B6> (referer: None)
2018-12-26 17:14:08 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://wenshu.court.gov.cn/List/TreeContent> (referer: http://wenshu.court.gov.cn/List/List?sorttype=1&conditions=searchWord+1+AJLX++%E6%A1%88%E4%BB%B6%E7%B1%BB%E5%9E%8B:%E5%88%91%E4%BA%8B%E6%A1%88%E4%BB%B6)
2018-12-26 17:14:08 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://wenshu.court.gov.cn/List/TreeContent> (referer: http://wenshu.court.gov.cn/List/TreeContent)
2018-12-26 17:14:15 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://wenshu.court.gov.cn/List/TreeContent> (referer: http://wenshu.court.gov.cn/List/TreeContent)
2018-12-26 17:14:17 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://wenshu.court.gov.cn/List/TreeContent> (referer: http://wenshu.court.gov.cn/List/TreeContent)
2018-12-26 17:14:20 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://wenshu.court.gov.cn/List/ListContent> (referer: http://wenshu.court.gov.cn/List/TreeContent)
2018-12-26 17:14:20 [scrapy.core.scraper] ERROR: Spider error processing <POST http://wenshu.court.gov.cn/List/ListContent> (referer: http://wenshu.court.gov.cn/List/TreeContent)
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/www/wenshu/wenshu/spiders/doc.py", line 95, in get_doc_list
    key = getkey(format_key_str).encode('utf-8')
  File "/www/wenshu/wenshu/utils/docid_v27.py", line 105, in getkey
    c = execjs.compile(js_str)
  File "/usr/lib/python2.7/site-packages/execjs/__init__.py", line 61, in compile
    return get().compile(source, cwd)
  File "/usr/lib/python2.7/site-packages/execjs/_runtimes.py", line 21, in get
    return get_from_environment() or _find_available_runtime()
  File "/usr/lib/python2.7/site-packages/execjs/_runtimes.py", line 49, in _find_available_runtime
    raise exceptions.RuntimeUnavailableError("Could not find an available JavaScript runtime.")
RuntimeUnavailableError: Could not find an available JavaScript runtime.
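The final RuntimeUnavailableError comes from PyExecJS rather than from the site: getkey compiles JavaScript via execjs, and PyExecJS only delegates to an external engine such as Node.js. As the traceback itself shows, it checks the EXECJS_RUNTIME environment variable first, then probes the system for installed engines. Installing Node.js (or exporting EXECJS_RUNTIME to point at an installed engine) resolves this. A small preflight check, assuming Node is the intended runtime:

```python
import os
import shutil


def js_runtime_available():
    """Return True if PyExecJS is likely to find a JavaScript engine.

    Mirrors PyExecJS's lookup order: the EXECJS_RUNTIME environment
    variable first, then a probe for Node.js on PATH.
    """
    return bool(os.environ.get('EXECJS_RUNTIME')) or shutil.which('node') is not None
```

If this returns False, install Node.js (for example via the system package manager) before running the spider.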

How do I set query conditions?

For example, to restrict results to the year 2020: in Chrome DevTools I can see the request sets queryCondition, but adding it in demo.py has no effect and the request simply fails. Thanks.

The algorithm in vl5x.py may be buggy; some cookie values are computed incorrectly.

Sometimes it runs fine, sometimes it errors out. The error is as follows:

2018-12-18 17:46:21 [scrapy.core.scraper] ERROR: Spider error processing <GET http://wenshu.court.gov.cn/list/list/?sorttype=1> (referer: None)
Traceback (most recent call last):
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\spiders\wenshu.py", line 20, in parse
    vjkl5 = getvjkl5(cookie)
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 1806, in getvjkl5
    vjkl5 = arrFun[funIndex](...)
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 755, in makeKey_150
    return md5(makeKey_14(str1) + makeKey_19(str1))[1: 1 + 24]
  File "D:\Work\JingShOnline\Spider\wenshu\wenshu\utils\vl5x.py", line 175, in makeKey_14
    b = base64.b64encode(s[1:] + s[5:] + s[1:4])
  File "c:\users\qiqing\appdata\local\programs\python\python37\lib\base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
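The TypeError is a Python 2 vs 3 incompatibility rather than a site change: under Python 3, base64.b64encode requires bytes, but makeKey_14 passes it a str. A hedged fix is to encode the slices before the call and decode the result back to str (the function name below is a stand-in for the one in vl5x.py):

```python
import base64


def make_key_14_py3(s):
    # Python 3: b64encode needs bytes, so encode the concatenated
    # slices first and decode the base64 output back to str.
    payload = (s[1:] + s[5:] + s[1:4]).encode('utf-8')
    return base64.b64encode(payload).decode('ascii')
```

The same bytes-vs-str audit is needed wherever vl5x.py mixes base64, hashing, and string slicing, which would also explain why it only fails for some cookie values under Python 2 but always fails under Python 3.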
