
weibospider's Introduction

Sina Weibo Spider

Preface

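Building this Weibo crawler took up more than half of my undergraduate years. After roughly three ground-up rewrites, it has finally reached a state that looks halfway presentable. Over about half a month of slacking off, I reworked the architecture, optimized the implementation, and wrote automated deployment scripts, so the crawler is now far ahead of v1.0. (Meme warning ⚠️) The v1.0 code is still available on the repository's v1.0 branch, and it really is a past I would rather not look back on.

In short, I kept learning while repeatedly refactoring and optimizing the code. There is a long road ahead and plenty left to learn. If I can, I will maintain this project for the long term (provided I find time to slack off) and document it as thoroughly as possible. PRs and issues from experts are welcome, and a small star 🌟 would be appreciated too. Consider this my modest contribution to researchers still struggling with how to collect data from Weibo.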

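Why collect data from the M site?

The most important point first: this project is a lightweight, single-machine, high-concurrency, high-performance Weibo crawler built on the open-source crawling framework Scrapy and targeting Sina Weibo's mobile site, known as the M site.

Some background on what the M site is. With a pile of technologies developing rapidly (don't ask, I don't understand them either), more and more Chinese mobile apps (Android specifically here) embed a browser engine (usually Chromium, possibly with custom modifications) and build the app with front-end web technology so that it can iterate quickly. Put simply, the M site is the data source for the Weibo app on mobile clients. (This paragraph contains a lot of guesswork on my part; if you spot any outrageous errors, please call me out.)

Before writing a crawler, a developer normally surveys the target site's anti-crawling measures thoroughly and only then gets to work. For Weibo, three different domains are currently known to serve its content: the PC site, the M site, and a very bare-bones site that I do not know what to call. The PC site is what you see when you visit Sina Weibo from a desktop browser, the M site is described above, and I still have no idea what the bare-bones site is for, which is a bit awkward.

These three sites differ considerably in how strict their anti-crawling measures are. The PC site is the strictest, while the M site and the bare-bones site are comparatively lenient. The reason is easy to see: the PC site is the first choice of many crawler enthusiasts and therefore absorbs a great deal of crawler traffic, so Sina Weibo naturally deploys its strictest measures there to keep assorted crawlers from bringing it down. From my admittedly incomplete survey, GitHub hosts several PC-site Weibo crawlers that stopped being maintained 3-4 years ago.

Because the PC site's anti-crawling measures are so strict, developers have to spend a great deal of effort bypassing them. These measures (some of which are not limited to the PC site and apply to the other sites as well) mainly include: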

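  1. Detection of abnormal IP traffic. (In plain terms, an IP that sends too many HTTP requests gets banned.)
  2. Protection of user data. (Without logging in, you do not get to see complete user data.)
  3. Anomaly detection on account login IPs. (The same account cannot be in the United States one second and in Australia the next.)
  4. Assorted CAPTCHAs of every kind, and so on.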

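For reference, here is a rough summary of the work that many predecessors had to do to bypass the PC site's anti-crawling measures:

  1. First, buy a batch of throwaway accounts dedicated to crawling and build an account pool.
  2. Then study Sina Weibo's authentication mechanism carefully and implement automated simulated login. CAPTCHA recognition and similar obstacles may come up along the way, which may require a CAPTCHA-solving service or manual recognition. (Recognizing CAPTCHAs automatically with deep learning is a separate problem.)
  3. Imitate a normal user's login flow to obtain cookies, build a cookie pool, and use those cookies for the subsequent crawling.
  4. Buy a number of proxy IPs and bind a proxy IP to each cookie (that is, to each account).
  5. Only after all these hardships is the anti-crawling finally bypassed. During crawling you still have to balance load across the cookie-IP pairs and promptly clean up cookies once they expire.
  6. In summary, crawling the PC site is a stubborn undertaking. Even after the data is obtained, it needs complex cleaning before it yields the final user data. Collection is very inefficient and error-prone, and it is hard to guarantee complete user data in large-scale collection.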

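By contrast, the M site and the bare-bones site are much more lenient. For the bare-bones site, an expert has already developed a very complete and highly usable crawler, nghuyong:WeiboSpider, which can collect data on the order of ten million records or more and obtains fairly complete Weibo user data. Its drawback is that, even though the bare-bones site's anti-crawling is lenient, you still have to buy throwaway accounts and build a cookie pool before collecting data. Proxy IPs are also needed to raise the collection rate.

So, taking the shortcut, I chose Weibo's M site. Without buying accounts to build a cookie pool, and even without proxy IPs (though that limits collection speed considerably), I ended up with a lightweight and efficient Weibo crawler.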

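Design Principles

A brief explanation of how the M-site crawler is designed. Open Firefox (Chrome, Safari, or any other browser works too) and enter the M-site URL of any Weibo user's home page; here the account of the celebrity Ju Jingyi (鞠婧祎) serves as the example. Use the F12 developer tools to watch how the M site loads its data; the result is shown in the figure below. Focus on the two requests in the red box. The M site loads user data asynchronously via AJAX, and the two links in the red box are the endpoint that returns this user's account information. Extracted, the URL is: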

https://m.weibo.cn/api/container/getIndex?type=uid&value=3669102477&containerid=1005053669102477

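Clearly, 3669102477 is the UID of Ju Jingyi's account, and containerid is constructed as 100505 + uid, that is, the prefix 100505 followed by the UID. This analysis yields the M-site endpoint for user account profiles.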
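The construction described above can be sketched in Python. This is a minimal sketch: the function name build_user_info_url is hypothetical and not taken from the project's code. Only the URL pattern and the 100505 prefix come from the example URL above.

```python
from urllib.parse import urlencode

# Endpoint and prefix taken from the example URL above.
API_ENDPOINT = "https://m.weibo.cn/api/container/getIndex"
USER_INFO_PREFIX = "100505"


def build_user_info_url(uid: str) -> str:
    """Build the M-site account-info URL for one UID (hypothetical helper)."""
    if not uid.isdigit():
        raise ValueError(f"uid must be numeric, got {uid!r}")
    params = {
        "type": "uid",
        "value": uid,
        # containerid is the fixed prefix followed by the UID.
        "containerid": USER_INFO_PREFIX + uid,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_user_info_url("3669102477"))
```

Called with the UID from the example, the helper reproduces the URL shown above.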

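Opening that endpoint returns the user data in JSON format, as shown in the figure below, which can be stored directly in a non-relational database such as MongoDB.

By the same reasoning, careful manual analysis of the M site is enough to work out how the requests for Weibo user data are constructed. Moreover, fetching data through such endpoints requires no user authentication yet still returns fairly complete user data, which means Sina Weibo can be crawled at scale even without user cookies.

This project builds its Weibo crawler on exactly this property of the M site.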

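Why use an M-site Weibo crawler?

Ahem. At the risk of blowing my own trumpet, here is a brief account of the project's core highlights.

  1. Lightweight: the core code is about 500 lines. Because I took the easiest path, the implementation was pleasant to write, and I kept the project as extensible and usable as I could. Together with the automated deployment scripts, this should make it easy to use.
  2. Easy to use: the project needs no extra account pool. At most it needs additional proxy IPs to raise the collection speed, and it can then collect user data on the scale of millions.
  3. Fast: the crawled data is already JSON, so it needs essentially no cleaning, which also greatly raises the collection rate. The JSON returned by the M-site endpoints is also very information-dense: a single request returns about 10 posts.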

To Start

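Runtime Environment

  • Operating system: common Linux distributions should all work (development and testing were done on Ubuntu 20.04)

  • Python >= 3.6.0 (developed with Python 3.8.10)

  • MongoDB >= 4.2

  • Docker (the development environment uses Docker 20.10.7; keeping Docker up to date is sufficient)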

Initialization

First run the commands below to clone the crawler locally and install its dependencies.

git clone git@github.com:CharesFang/WeiboSpider.git
cd WeiboSpider
pip install -r requirements.txt

Next, make the initialization script ./init/init.sh executable and run it to create the MongoDB Docker container that stores the data.

sudo chmod 755 ./init/init.sh
./init/init.sh

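The init.sh script creates the configuration files, mapped directories, and other resources that the MongoDB container needs to run. On the host, the container's data is stored under "$HOME/mongo".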

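Then, following the prompts printed by init.sh, run the command below to invoke the MongoDB initialization script db_init.js. It creates the admin administrator user and the ordinary database user weibo, along with the weibo database for storing Weibo data and its collections such as tweet and user. Keep the passwords for both users somewhere safe.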

sudo docker exec -it weibo mongo 127.0.0.1:27017 /etc/resource/db_init.js

Finally, edit the __init__ method in ./WeiboSpider/database/DBconnector.py and put your own password into it so that the crawler can connect to the MongoDB database.

def __init__(self):
    self.mongo_uri = "127.0.0.1"  # Usually left unchanged, because the connection goes to the local Docker container.
    self.mongo_database = "weibo"  # The `weibo` database created by init.sh.
    self.mongo_user_name = "weibo"  # The `weibo` database user created by init.sh.
    self.mongo_pass_wd = "Your password."
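
As an illustration of how these four fields could be combined, the sketch below assembles a MongoDB connection URI. The helper build_mongo_uri is hypothetical and not part of the project. The port 27017 is taken from the docker exec command above, and treating the weibo database as the one the user authenticates against is an assumption; the project's DBconnector may connect differently. Escaping the credentials with quote_plus keeps characters such as @ or : in a password from breaking the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host: str, database: str, user: str, password: str,
                    port: int = 27017) -> str:
    """Assemble a mongodb:// URI (hypothetical helper, not project code)."""
    # Escape reserved characters such as '@', ':' and '/' in the credentials.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/{database}"


if __name__ == "__main__":
    # The password here is a made-up example value.
    print(build_mongo_uri("127.0.0.1", "weibo", "weibo", "p@ss:word"))
```

If pymongo is installed, the resulting string can be passed to pymongo.MongoClient.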

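This completes the crawler's initial setup.

Starting the Crawler

Like any other Scrapy crawler, the Weibo crawler can be invoked either from the command line or from a Python script.

Command-Line Invocation

The project currently implements three spiders. Their functions and command-line invocations are listed below.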

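  • weibo_spider
    CMD: scrapy crawl weibo_spider -a uid="xxx|xxx"
    Function: collects the account profile and all posts of the target Weibo users. The required argument -a uid="xxx|xxx" gives the UIDs of the target users; separate multiple UIDs with |.
  • user_info_spider
    CMD: scrapy crawl user_info_spider -a uid="xxx|xxx"
    Function: collects the account profile of the target Weibo users. Arguments are passed as for weibo_spider.
  • tweet_spider
    CMD: scrapy crawl tweet_spider -a uid="xxx|xxx"
    Function: collects all posts of the target Weibo users. Arguments are passed as for weibo_spider.

Quote the uid value when running these commands in a shell; otherwise the shell interprets | as a pipe. Markdown does not render certain special characters well, which left an earlier version of this listing incomplete; this has now been fixed.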
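
The |-separated uid convention can be sketched as follows. All three function names are hypothetical, and how the project's spiders actually parse the argument is an assumption; the sketch only shows one way to join UIDs into the value passed after -a uid= and to split it back.

```python
from typing import List

UID_SEPARATOR = "|"


def join_uids(uids: List[str]) -> str:
    """Join UIDs into the value passed after `-a uid=` (hypothetical helper)."""
    return UID_SEPARATOR.join(uid.strip() for uid in uids)


def split_uids(arg: str) -> List[str]:
    """Split a `uid` argument into individual UIDs, dropping empty parts."""
    return [part.strip() for part in arg.split(UID_SEPARATOR) if part.strip()]


def build_crawl_argv(spider: str, uids: List[str]) -> List[str]:
    """Build an argv list; no shell is involved, so `|` needs no quoting."""
    return ["scrapy", "crawl", spider, "-a", f"uid={join_uids(uids)}"]


if __name__ == "__main__":
    print(build_crawl_argv("weibo_spider", ["3669102477"]))
```

An argv list of this shape is what scrapy.cmdline.execute receives in the Python script invocation described in the next section.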

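Python Script Invocation

Invoking the crawler from a Python script is in essence still a command-line invocation, and it makes debugging easier. An example script follows.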

from scrapy.cmdline import execute


if __name__ == '__main__':
    spider_cmd = "scrapy crawl weibo_spider -a uid=user0|user2"
    execute(spider_cmd.split())

Docs

To be continued... This section is a placeholder for now.

Init

WeiboSpider

Base

Config

Spiders

Items

Pipelines

Middlewares

Database

Extension

What do I need to do?

Define your own spider

weibospider's People

Contributors

charesfang, dependabot[bot], fi3wey, mend-bolt-for-github[bot]


weibospider's Issues

CVE-2023-0286 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-0286 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

There is a type confusion vulnerability relating to X.400 address processing
inside an X.509 GeneralName. X.400 addresses were parsed as an ASN1_STRING but
the public structure definition for GENERAL_NAME incorrectly specified the type
of the x400Address field as ASN1_TYPE. This field is subsequently interpreted by
the OpenSSL function GENERAL_NAME_cmp as an ASN1_TYPE rather than an
ASN1_STRING.

When CRL checking is enabled (i.e. the application sets the
X509_V_FLAG_CRL_CHECK flag), this vulnerability may allow an attacker to pass
arbitrary pointers to a memcmp call, enabling them to read memory contents or
enact a denial of service. In most cases, the attack requires the attacker to
provide both the certificate chain and CRL, neither of which need to have a
valid signature. If the attacker only controls one of these inputs, the other
input must already contain an X.400 address as a CRL distribution point, which
is uncommon. As such, this vulnerability is most likely to only affect
applications which have implemented their own functionality for retrieving CRLs
over a network.

Publish Date: 2023-02-08

URL: CVE-2023-0286

CVSS 3 Score Details (7.4)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-x4qr-2fvf-3mr5

Release Date: 2023-02-08

Fix Resolution: openssl-3.0.8;cryptography - 39.0.1;openssl-src - 111.25.0+1.1.1t,300.0.12+3.0.8


Step up your Open Source Security Game with Mend here

tql

The logic is clear and the code is beautiful.

pip install fails with an error while installing dependencies

Operating system: CentOS Linux release 7.9.2009 (Core)
Python environment:

Python 3.8.0 (default, Mar  9 2021, 08:45:15) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux

Running $ pip install -r requirements.txt aborts with the following error:

Defaulting to user installation because normal site-packages is not writeable
Collecting lxml>=4.6.3
  Using cached lxml-4.6.3-cp38-cp38-manylinux2014_x86_64.whl (6.8 MB)
Collecting fake_useragent==0.1.11
  Using cached fake-useragent-0.1.11.tar.gz (13 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-wg2sl425/fake-useragent_6c3f5aa09e7a40b1a98f53238e8cb447/setup.py'"'"'; __file__='"'"'/tmp/pip-install-wg2sl425/fake-useragent_6c3f5aa09e7a40b1a98f53238e8cb447/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-x1ra41v3
         cwd: /tmp/pip-install-wg2sl425/fake-useragent_6c3f5aa09e7a40b1a98f53238e8cb447/
    Complete output (11 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.8/site-packages/setuptools/__init__.py", line 18, in <module>
        from setuptools.dist import Distribution
      File "/usr/local/lib/python3.8/site-packages/setuptools/dist.py", line 32, in <module>
        from setuptools import windows_support
      File "/usr/local/lib/python3.8/site-packages/setuptools/windows_support.py", line 2, in <module>
        import ctypes
      File "/usr/local/lib/python3.8/ctypes/__init__.py", line 7, in <module>
        from _ctypes import Union, Structure, Array
    ModuleNotFoundError: No module named '_ctypes'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/d1/79/af647635d6968e2deb57a208d309f6069d31cb138066d7e821e575112a80/fake-useragent-0.1.11.tar.gz#sha256=c104998b750eb097eefc28ae28e92d66397598d2cf41a31aa45d5559ef1adf35 (from https://pypi.org/simple/fake-useragent/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement fake_useragent==0.1.11 (from versions: 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.0.8, 0.0.9, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.1.10, 0.1.11)
ERROR: No matching distribution found for fake_useragent==0.1.11

I'm not very familiar with Python. Is something wrong with the dependencies?
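
The traceback ends in ModuleNotFoundError: No module named '_ctypes', raised while setuptools itself is being imported, which points at the Python build rather than at requirements.txt. _ctypes is a C extension of the standard library that is only built when the libffi headers are available at the time Python is compiled. That this is the cause on the reporter's machine is an assumption. A quick check, using a hypothetical helper name:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # False here suggests Python was built without libffi headers available.
    print("_ctypes available:", has_module("_ctypes"))
```

If the check prints False, the usual remedy on CentOS is to install the libffi development package (libffi-devel) and rebuild Python.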

CVE-2022-2309 (High) detected in lxml-4.6.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl - autoclosed

CVE-2022-2309 - High Severity Vulnerability

Vulnerable Library - lxml-4.6.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl

Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.

Library home page: https://files.pythonhosted.org/packages/31/7d/eaaef39669aba3af5e8912fd21eeaa629da0aed8a9a71235b9ea00e61e36/lxml-4.6.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/requirements.txt

Dependency Hierarchy:

  • lxml-4.6.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

NULL Pointer Dereference allows attackers to cause a denial of service (or application crash). This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14. libxml2 2.9.9 and earlier are not affected. It allows triggering crashes through forged input data, given a vulnerable code sequence in the application. The vulnerability is caused by the iterwalk function (also used by the canonicalize function). Such code shouldn't be in wide-spread use, given that parsing + iterwalk would usually be replaced with the more efficient iterparse function. However, an XML converter that serialises to C14N would also be vulnerable, for example, and there are legitimate use cases for this code sequence. If untrusted input is received (also remotely) and processed via iterwalk function, a crash can be triggered.

Publish Date: 2022-07-05

URL: CVE-2022-2309

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Release Date: 2022-07-05

Fix Resolution: lxml - 4.9.1



CVE-2023-4807 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-4807 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Issue summary: The POLY1305 MAC (message authentication code) implementation
contains a bug that might corrupt the internal state of applications on the
Windows 64 platform when running on newer X86_64 processors supporting the
AVX512-IFMA instructions.

Impact summary: If in an application that uses the OpenSSL library an attacker
can influence whether the POLY1305 MAC algorithm is used, the application
state might be corrupted with various application dependent consequences.

The POLY1305 MAC (message authentication code) implementation in OpenSSL does
not save the contents of non-volatile XMM registers on Windows 64 platform
when calculating the MAC of data larger than 64 bytes. Before returning to
the caller all the XMM registers are set to zero rather than restoring their
previous content. The vulnerable code is used only on newer x86_64 processors
supporting the AVX512-IFMA instructions.

The consequences of this kind of internal application state corruption can
be various - from no consequences, if the calling application does not
depend on the contents of non-volatile XMM registers at all, to the worst
consequences, where the attacker could get complete control of the application
process. However given the contents of the registers are just zeroized so
the attacker cannot put arbitrary values inside, the most likely consequence,
if any, would be an incorrect result of some application dependent
calculations or a crash leading to a denial of service.

The POLY1305 MAC algorithm is most frequently used as part of the
CHACHA20-POLY1305 AEAD (authenticated encryption with associated data)
algorithm. The most common usage of this AEAD cipher is with TLS protocol
versions 1.2 and 1.3 and a malicious client can influence whether this AEAD
cipher is used by the server. This implies that server applications using
OpenSSL can be potentially impacted. However we are currently not aware of
any concrete application that would be affected by this issue therefore we
consider this a Low severity security issue.

As a workaround the AVX512-IFMA instructions support can be disabled at
runtime by setting the environment variable OPENSSL_ia32cap:

OPENSSL_ia32cap=:~0x200000

The FIPS provider is not affected by this issue.

Publish Date: 2023-09-08

URL: CVE-2023-4807

CVSS 3 Score Details (7.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Local
    • Attack Complexity: Low
    • Privileges Required: Low
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.openssl.org/news/vulnerabilities.html

Release Date: 2023-09-08

Fix Resolution: openssl-3.0.11,openssl-3.1.3,OpenSSL_1_1_1w, cryptography - 41.0.4



CVE-2024-1892 (High) detected in Scrapy-2.6.2-py2.py3-none-any.whl

CVE-2024-1892 - High Severity Vulnerability

Vulnerable Library - Scrapy-2.6.2-py2.py3-none-any.whl

A high-level Web Crawling and Web Scraping framework

Library home page: https://files.pythonhosted.org/packages/e2/8a/e3870cd597bbd4f47d7e1c97bbb67a6293270b9c413e083058ce6d6c7eb7/Scrapy-2.6.2-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the XMLFeedSpider class of the scrapy/scrapy project, specifically in the parsing of XML content. By crafting malicious XML content that exploits inefficient regular expression complexity used in the parsing process, an attacker can cause a denial-of-service (DoS) condition. This vulnerability allows for the system to hang and consume significant resources, potentially rendering services that utilize Scrapy for XML processing unresponsive.

Publish Date: 2024-02-28

URL: CVE-2024-1892

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2024-1892

Release Date: 2024-02-28

Fix Resolution: scrapy - 2.11.1



CVE-2023-45803 (Medium) detected in urllib3-1.26.12-py2.py3-none-any.whl

CVE-2023-45803 - Medium Severity Vulnerability

Vulnerable Library - urllib3-1.26.12-py2.py3-none-any.whl

HTTP library with thread-safe connection pooling, file post, and more.

Library home page: https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • requests-2.26.0-py2.py3-none-any.whl (Root Library)
    • urllib3-1.26.12-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

urllib3 is a user-friendly HTTP client library for Python. urllib3 previously wouldn't remove the HTTP request body when an HTTP redirect response using status 301, 302, or 303 after the request had its method changed from one that could accept a request body (like POST) to GET as is required by HTTP RFCs. Although this behavior is not specified in the section for redirects, it can be inferred by piecing together information from different sections and we have observed the behavior in other major HTTP client implementations like curl and web browsers. Because the vulnerability requires a previously trusted service to become compromised in order to have an impact on confidentiality we believe the exploitability of this vulnerability is low. Additionally, many users aren't putting sensitive data in HTTP request bodies, if this is the case then this vulnerability isn't exploitable. Both of the following conditions must be true to be affected by this vulnerability: 1. Using urllib3 and submitting sensitive information in the HTTP request body (such as form data or JSON) and 2. The origin service is compromised and starts redirecting using 301, 302, or 303 to a malicious peer or the redirected-to service becomes compromised. This issue has been addressed in versions 1.26.18 and 2.0.7 and users are advised to update to resolve this issue. Users unable to update should disable redirects for services that aren't expecting to respond with redirects with redirects=False and disable automatic redirects with redirects=False and handle 301, 302, and 303 redirects manually by stripping the HTTP request body.

Publish Date: 2023-10-17

URL: CVE-2023-45803

CVSS 3 Score Details (4.2)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Adjacent
    • Attack Complexity: High
    • Privileges Required: High
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: None
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-g4mx-q9vg-27p4

Release Date: 2023-10-17

Fix Resolution (urllib3): 1.26.18

Direct dependency fix Resolution (requests): 2.27.0



The program runs, but neither the log file nor MongoDB contains any crawled data

After the program finished, this appeared: Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', readpreference='primary', appname='MongoDB Compass', ssl=False), 'WeiboSpider') MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', readpreference='primary', appname='MongoDB Compass', ssl=False)

Process finished with exit code 0

However, there was no data in either the log file or MongoDB. What should I do in this situation? I am using a standalone database in a local MongoDB Compass installation. Many thanks to all the experts!

CVE-2023-2650 (Medium) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-2650 - Medium Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Issue summary: Processing some specially crafted ASN.1 object identifiers or
data containing them may be very slow.

Impact summary: Applications that use OBJ_obj2txt() directly, or use any of
the OpenSSL subsystems OCSP, PKCS7/SMIME, CMS, CMP/CRMF or TS with no message
size limit may experience notable to very long delays when processing those
messages, which may lead to a Denial of Service.

An OBJECT IDENTIFIER is composed of a series of numbers - sub-identifiers -
most of which have no size limit. OBJ_obj2txt() may be used to translate
an ASN.1 OBJECT IDENTIFIER given in DER encoding form (using the OpenSSL
type ASN1_OBJECT) to its canonical numeric text form, which are the
sub-identifiers of the OBJECT IDENTIFIER in decimal form, separated by
periods.

When one of the sub-identifiers in the OBJECT IDENTIFIER is very large
(these are sizes that are seen as absurdly large, taking up tens or hundreds
of KiBs), the translation to a decimal number in text may take a very long
time. The time complexity is O(n^2) with 'n' being the size of the
sub-identifiers in bytes (*).

With OpenSSL 3.0, support to fetch cryptographic algorithms using names /
identifiers in string form was introduced. This includes using OBJECT
IDENTIFIERs in canonical numeric text form as identifiers for fetching
algorithms.

Such OBJECT IDENTIFIERs may be received through the ASN.1 structure
AlgorithmIdentifier, which is commonly used in multiple protocols to specify
what cryptographic algorithm should be used to sign or verify, encrypt or
decrypt, or digest passed data.

Applications that call OBJ_obj2txt() directly with untrusted data are
affected, with any version of OpenSSL. If the use is for the mere purpose
of display, the severity is considered low.

In OpenSSL 3.0 and newer, this affects the subsystems OCSP, PKCS7/SMIME,
CMS, CMP/CRMF or TS. It also impacts anything that processes X.509
certificates, including simple things like verifying its signature.

The impact on TLS is relatively low, because all versions of OpenSSL have a
100KiB limit on the peer's certificate chain. Additionally, this only
impacts clients, or servers that have explicitly enabled client
authentication.

In OpenSSL 1.1.1 and 1.0.2, this only affects displaying diverse objects,
such as X.509 certificates. This is assumed to not happen in such a way
that it would cause a Denial of Service, so these versions are considered
not affected by this issue in such a way that it would be cause for concern,
and the severity is therefore considered low.

Publish Date: 2023-05-30

URL: CVE-2023-2650

CVSS 3 Score Details (6.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.openssl.org/news/vulnerabilities.html

Release Date: 2023-05-30

Fix Resolution: OpenSSL_1_1_1u,openssl-3.0.9,openssl-3.1.1, cryptography - 41.0.0



CVE-2023-23931 (Medium) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-23931 - Medium Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. In affected versions Cipher.update_into would accept Python objects which implement the buffer protocol, but provide only immutable buffers. This would allow immutable objects (such as bytes) to be mutated, thus violating fundamental rules of Python and resulting in corrupted output. This now correctly raises an exception. This issue has been present since update_into was originally introduced in cryptography 1.8.

Publish Date: 2023-02-07

URL: CVE-2023-23931

CVSS 3 Score Details (6.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: Low
    • Availability Impact: Low

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2023-23931

Release Date: 2023-02-07

Fix Resolution: cryptography - 39.0.1



CVE-2023-49083 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-49083 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. Calling load_pem_pkcs7_certificates or load_der_pkcs7_certificates could lead to a NULL-pointer dereference and segfault. Exploitation of this vulnerability poses a serious risk of Denial of Service (DoS) for any application attempting to deserialize a PKCS7 blob/certificate. The consequences extend to potential disruptions in system availability and stability. This vulnerability has been patched in version 41.0.6.

Publish Date: 2023-11-29

URL: CVE-2023-49083

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2023-49083

Release Date: 2023-11-29

Fix Resolution: cryptography - 41.0.6
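
The fix resolutions in these reports can be compared against the pinned versions with a small script. The sketch below is an illustration only: the helper names are hypothetical, the parser handles only plain dotted numeric versions, and the table lists just the Python package versions quoted in the reports above, using the highest fix version reported for each package.

```python
from typing import Dict, Tuple

# (pinned version, fixed version) as quoted in the reports above.
REPORTED: Dict[str, Tuple[str, str]] = {
    "cryptography": ("38.0.1", "41.0.6"),
    "lxml": ("4.6.5", "4.9.1"),
    "Scrapy": ("2.6.2", "2.11.1"),
    "urllib3": ("1.26.12", "1.26.18"),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.11.1' into (2, 11, 1) so that components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed: str, fixed: str) -> bool:
    """Return True if the installed version is older than the fixed one."""
    return parse_version(installed) < parse_version(fixed)


if __name__ == "__main__":
    for name, (pinned, fixed) in REPORTED.items():
        status = "upgrade" if needs_upgrade(pinned, fixed) else "ok"
        print(f"{name}: {pinned} -> {fixed} ({status})")
```

Comparing integer tuples rather than strings matters here: as strings, "2.6.2" sorts after "2.11.1".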



TypeError raised

When I run scrapy crawl HotSearchSpider, I get builtins.TypeError: expected string or bytes-like object

(VirtualEnv) D:\Documents\HCI\Sec 4\CSC\Research Paper\WeiboSpider-master\WeiboSpider>scrapy crawl HotSearchSpider
Unhandled error in Deferred:

Traceback (most recent call last):
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\crawler.py", line 172, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\crawler.py", line 176, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\twisted\internet\defer.py", line 1656, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\twisted\internet\defer.py", line 1571, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "d:\documents\hci\sec 4\csc\research paper\weibospider-master\virtualenv\lib\site-packages\scrapy\utils\misc.py", line 140, in create_instance
    return objcls.from_crawler(crawler, *args, **kwargs)
  File "D:\Documents\HCI\Sec 4\CSC\Research Paper\WeiboSpider-master\WeiboSpider\WeiboSpider\middlewares.py", line 68, in from_crawler
    ip_num = int(re.findall(r'count=\d+', api)[0][6:])
  File "C:\Users\RJ008\.pyenv\pyenv-win\versions\3.9.0\lib\re.py", line 241, in findall
    return _compile(pattern, flags).findall(string)
builtins.TypeError: expected string or bytes-like object

How can I fix this? Thanks.
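
The last project frame in the traceback is middlewares.py, line 68, where re.findall receives api. re.findall raises this TypeError when its second argument is not a str or bytes-like object, so api was most likely None or another non-string value, for example a proxy API URL that was never filled in. That cause is inferred from the traceback and is not confirmed by the project. A minimal sketch of a guard follows; the helper name parse_ip_count and the example URL are hypothetical.

```python
import re


def parse_ip_count(api):
    """Return the integer after 'count=' in a proxy API URL, or None if absent.

    Raises ValueError with a readable message when `api` is not a string,
    instead of letting re fail with "expected string or bytes-like object".
    """
    if not isinstance(api, str):
        raise ValueError(
            "proxy API URL is not configured (got %r); "
            "set it to a string before starting the spider" % (api,)
        )
    match = re.search(r"count=(\d+)", api)
    return int(match.group(1)) if match else None


# Hypothetical URL, used only to show the call.
print(parse_ip_count("http://example.invalid/get?count=5&type=json"))
```

With a guard like this, a missing value fails early with a message that names the configuration problem rather than an error from inside re.py.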

CVE-2023-43804 (High) detected in urllib3-1.26.12-py2.py3-none-any.whl

CVE-2023-43804 - High Severity Vulnerability

Vulnerable Library - urllib3-1.26.12-py2.py3-none-any.whl

HTTP library with thread-safe connection pooling, file post, and more.

Library home page: https://files.pythonhosted.org/packages/6f/de/5be2e3eed8426f871b170663333a0f627fc2924cc386cd41be065e7ea870/urllib3-1.26.12-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • requests-2.26.0-py2.py3-none-any.whl (Root Library)
    • urllib3-1.26.12-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

urllib3 is a user-friendly HTTP client library for Python. urllib3 doesn't treat the Cookie HTTP header specially or provide any helpers for managing cookies over HTTP; that is the responsibility of the user. However, it is possible for a user to specify a Cookie header and unknowingly leak information via HTTP redirects to a different origin if that user doesn't disable redirects explicitly. This issue has been patched in urllib3 version 1.26.17 or 2.0.5.

Publish Date: 2023-10-04

URL: CVE-2023-43804

CVSS 3 Score Details (8.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: Low
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: None

For more information on CVSS3 Scores, click here.
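
The 8.1 score can be reproduced from the base metrics listed above. The sketch below assumes the CVSS v3.1 base score equations, weights and rounding rule from the public specification (the report only says "CVSS 3"). The function and dictionary names are illustrative and are not part of the Mend report.

```python
# Metric weights from the CVSS v3.1 specification (assumed version).
AV = {"Network": 0.85, "Adjacent": 0.62, "Local": 0.55, "Physical": 0.2}
AC = {"Low": 0.77, "High": 0.44}
UI = {"None": 0.85, "Required": 0.62}
CIA = {"High": 0.56, "Low": 0.22, "None": 0.0}
# Privileges Required depends on whether Scope is Changed.
PR_UNCHANGED = {"None": 0.85, "Low": 0.62, "High": 0.27}
PR_CHANGED = {"None": 0.85, "Low": 0.68, "High": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    """Compute the base score from the metric values as printed in a report."""
    changed = scope == "Changed"
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# Metrics of the CVE-2023-43804 report above.
print(base_score("Network", "Low", "Low", "None", "Unchanged",
                 "High", "High", "None"))
```

The same function gives 7.5 for the reports whose only impact is High availability with no privileges required, which matches the scores printed in those reports.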

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2023-43804

Release Date: 2023-10-04

Fix Resolution (urllib3): 1.26.17

Direct dependency fix Resolution (requests): 2.27.0



CVE-2024-26130 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2024-26130 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. Starting in version 38.0.0 and prior to version 42.0.4, if pkcs12.serialize_key_and_certificates is called with both a certificate whose public key did not match the provided private key and an encryption_algorithm with hmac_hash set (via PrivateFormat.PKCS12.encryption_builder().hmac_hash(...)), then a NULL pointer dereference would occur, crashing the Python process. This has been resolved in version 42.0.4, the first version in which a ValueError is properly raised.

Publish Date: 2024-02-21

URL: CVE-2024-26130

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-6vqw-3v5j-54x4

Release Date: 2024-02-21

Fix Resolution: cryptography - 42.0.4



CVE-2023-50782 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-50782 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

A flaw was found in the python-cryptography package. This issue may allow a remote attacker to decrypt captured messages in TLS servers that use RSA key exchanges, which may lead to exposure of confidential or sensitive data.

Publish Date: 2024-02-05

URL: CVE-2023-50782

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: None
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-3ww4-gg4f-jr7f

Release Date: 2024-02-05

Fix Resolution: cryptography - 42.0.0



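Update and maintenance notes

Uh, I have gotten lazy again lately. Features get extended and docs get written whenever I get around to it; if you need something, please push me hard.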

CVE-2023-46137 (Medium) detected in Twisted-22.4.0-py3-none-any.whl

CVE-2023-46137 - Medium Severity Vulnerability

Vulnerable Library - Twisted-22.4.0-py3-none-any.whl

An asynchronous networking framework written in Python

Library home page: https://files.pythonhosted.org/packages/db/99/38622ff95bb740bcc991f548eb46295bba62fcb6e907db1987c4d92edd09/Twisted-22.4.0-py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • Twisted-22.4.0-py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Twisted is an event-based framework for internet applications. Prior to version 23.10.0rc1, when sending multiple HTTP requests in one TCP packet, twisted.web will process the requests asynchronously without guaranteeing the response order. If one of the endpoints is controlled by an attacker, the attacker can delay the response on purpose to manipulate the response of the second request when a victim launched two requests using HTTP pipeline. Version 23.10.0rc1 contains a patch for this issue.

Publish Date: 2023-10-25

URL: CVE-2023-46137

CVSS 3 Score Details (5.3)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: Low
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-xc8x-vp79-p3wm

Release Date: 2023-10-25

Fix Resolution: twisted - 23.10.0



CVE-2022-23491 (High) detected in certifi-2022.6.15-py3-none-any.whl

CVE-2022-23491 - High Severity Vulnerability

Vulnerable Library - certifi-2022.6.15-py3-none-any.whl

Python package for providing Mozilla's CA Bundle.

Library home page: https://files.pythonhosted.org/packages/e9/06/d3d367b7af6305b16f0d28ae2aaeb86154fa91f144f036c2d5002a5a202b/certifi-2022.6.15-py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • requests-2.26.0-py2.py3-none-any.whl (Root Library)
    • certifi-2022.6.15-py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Certifi is a curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts. Certifi 2022.12.07 removes root certificates from "TrustCor" from the root store. These are in the process of being removed from Mozilla's trust store. TrustCor's root certificates are being removed pursuant to an investigation prompted by media reporting that TrustCor's ownership also operated a business that produced spyware. Conclusions of Mozilla's investigation can be found in the linked google group discussion.

Publish Date: 2022-12-07

URL: CVE-2022-23491

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2022-23491

Release Date: 2022-12-07

Fix Resolution (certifi): 2022.12.7

Direct dependency fix Resolution (requests): 2.27.0



CVE-2024-21506 (Medium) detected in pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl

CVE-2024-21506 - Medium Severity Vulnerability

Vulnerable Library - pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl

Python driver for MongoDB

Library home page: https://files.pythonhosted.org/packages/b1/29/c0c8791ba972456f8aa3f027af33206499bc9f52a948e0d9c10909339b3c/pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Versions of the package pymongo before 4.6.3 are vulnerable to Out-of-bounds Read in the bson module. Using the crafted payload the attacker could force the parser to deserialize unmanaged memory. The parser tries to interpret bytes next to buffer and throws an exception with string. If the following bytes are not printable UTF-8 the parser throws an exception with a single byte.

Publish Date: 2024-04-06

URL: CVE-2024-21506

CVSS 3 Score Details (5.2)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Local
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: Low

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2024-21506

Release Date: 2024-04-06

Fix Resolution: pymongo - 4.6.3



CVE-2023-37920 (Critical) detected in certifi-2022.6.15-py3-none-any.whl

CVE-2023-37920 - Critical Severity Vulnerability

Vulnerable Library - certifi-2022.6.15-py3-none-any.whl

Python package for providing Mozilla's CA Bundle.

Library home page: https://files.pythonhosted.org/packages/e9/06/d3d367b7af6305b16f0d28ae2aaeb86154fa91f144f036c2d5002a5a202b/certifi-2022.6.15-py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • requests-2.26.0-py2.py3-none-any.whl (Root Library)
    • certifi-2022.6.15-py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Certifi is a curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts. Certifi prior to version 2023.07.22 recognizes "e-Tugra" root certificates. e-Tugra's root certificates were subject to an investigation prompted by reporting of security issues in their systems. Certifi 2023.07.22 removes root certificates from "e-Tugra" from the root store.

Publish Date: 2023-07-25

URL: CVE-2023-37920

CVSS 3 Score Details (9.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-xqr8-7jwr-rhp7

Release Date: 2023-07-25

Fix Resolution (certifi): 2023.7.22

Direct dependency fix Resolution (requests): 2.27.0



CVE-2023-3446 (Medium) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-3446 - Medium Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Issue summary: Checking excessively long DH keys or parameters may be very slow.

Impact summary: Applications that use the functions DH_check(), DH_check_ex()
or EVP_PKEY_param_check() to check a DH key or DH parameters may experience long
delays. Where the key or parameters that are being checked have been obtained
from an untrusted source this may lead to a Denial of Service.

The function DH_check() performs various checks on DH parameters. One of those
checks confirms that the modulus ('p' parameter) is not too large. Trying to use
a very large modulus is slow and OpenSSL will not normally use a modulus which
is over 10,000 bits in length.

However the DH_check() function checks numerous aspects of the key or parameters
that have been supplied. Some of those checks use the supplied modulus value
even if it has already been found to be too large.

An application that calls DH_check() and supplies a key or parameters obtained
from an untrusted source could be vulnerable to a Denial of Service attack.

The function DH_check() is itself called by a number of other OpenSSL functions.
An application calling any of those other functions may similarly be affected.
The other functions affected by this are DH_check_ex() and
EVP_PKEY_param_check().

Also vulnerable are the OpenSSL dhparam and pkeyparam command line applications
when using the '-check' option.

The OpenSSL SSL/TLS implementation is not affected by this issue.
The OpenSSL 3.0 and 3.1 FIPS providers are not affected by this issue.

Publish Date: 2023-07-19

URL: CVE-2023-3446

CVSS 3 Score Details (5.3)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: Low

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.openssl.org/news/secadv/20230714.txt

Release Date: 2023-07-19

Fix Resolution: openssl-3.0.10,openssl-3.1.2, cryptography - 41.0.3



CVE-2023-32681 (Medium) detected in requests-2.26.0-py2.py3-none-any.whl

CVE-2023-32681 - Medium Severity Vulnerability

Vulnerable Library - requests-2.26.0-py2.py3-none-any.whl

Python HTTP for Humans.

Library home page: https://files.pythonhosted.org/packages/92/96/144f70b972a9c0eabbd4391ef93ccd49d0f2747f4f6a2a2738e99e5adc65/requests-2.26.0-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • requests-2.26.0-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Requests is an HTTP library. Since Requests 2.3.0, Requests has been leaking Proxy-Authorization headers to destination servers when redirected to an HTTPS endpoint. This is a product of how we use rebuild_proxies to reattach the Proxy-Authorization header to requests. For HTTP connections sent through the tunnel, the proxy will identify the header in the request itself and remove it prior to forwarding to the destination server. However, when sent over HTTPS, the Proxy-Authorization header must be sent in the CONNECT request as the proxy has no visibility into the tunneled request. This results in Requests forwarding proxy credentials to the destination server unintentionally, allowing a malicious actor to potentially exfiltrate sensitive information. This issue has been patched in version 2.31.0.

Publish Date: 2023-05-26

URL: CVE-2023-32681

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: None
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-j8r2-6x86-q33q

Release Date: 2023-05-26

Fix Resolution: requests - 2.31.0



WS-2022-0181 (Medium) detected in Scrapy-2.6.0-py2.py3-none-any.whl - autoclosed

WS-2022-0181 - Medium Severity Vulnerability

Vulnerable Library - Scrapy-2.6.0-py2.py3-none-any.whl

A high-level Web Crawling and Web Scraping framework

Library home page: https://files.pythonhosted.org/packages/89/47/10c1197316233761afb5522d48a8e27f65389044ddda58d91d0eaeaecd20/Scrapy-2.6.0-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.0-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Scrapy versions before v2.6.2 and v1.8.3 are vulnerable to one proxy sending credentials to another.

Publish Date: 2022-07-29

URL: WS-2022-0181

CVSS 3 Score Details (5.3)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: None
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-9x8m-2xpf-crp3

Release Date: 2022-07-29

Fix Resolution: Scrapy - 1.8.3,2.6.2



CVE-2024-3572 (High) detected in Scrapy-2.6.2-py2.py3-none-any.whl

CVE-2024-3572 - High Severity Vulnerability

Vulnerable Library - Scrapy-2.6.2-py2.py3-none-any.whl

A high-level Web Crawling and Web Scraping framework

Library home page: https://files.pythonhosted.org/packages/e2/8a/e3870cd597bbd4f47d7e1c97bbb67a6293270b9c413e083058ce6d6c7eb7/Scrapy-2.6.2-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

The scrapy/scrapy project is vulnerable to XML External Entity (XXE) attacks due to the use of lxml.etree.fromstring for parsing untrusted XML data without proper validation. This vulnerability allows attackers to perform denial of service attacks, access local files, generate network connections, or circumvent firewalls by submitting specially crafted XML data.

Publish Date: 2024-04-16

URL: CVE-2024-3572

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2024-3572

Release Date: 2024-04-16

Fix Resolution: scrapy - 2.11.1



WS-2022-0365 (Critical) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

WS-2022-0365 - Critical Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

pyca/cryptography's wheels include a statically linked copy of OpenSSL. The versions of OpenSSL included in cryptography 37.0.0-38.0.3 are vulnerable to a number of security issues. If you are building cryptography from source ("sdist"), then you are responsible for upgrading your copy of OpenSSL. Only users installing from wheels built by the cryptography project (i.e., those distributed on PyPI) need to update their cryptography versions.

Publish Date: 2022-11-02

URL: WS-2022-0365

CVSS 3 Score Details (9.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: High

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: GHSA-39hc-v87j-747x

Release Date: 2022-11-02

Fix Resolution: cryptography - 38.0.3



CVE-2023-38325 (High) detected in cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-38325 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/9b/4e/d7454551c3c7b327510e35d88db35c300484225ba47be861e28f0b520b33/cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • cryptography-38.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

The cryptography package before 41.0.2 for Python mishandles SSH certificates that have critical options.

Publish Date: 2023-07-14

URL: CVE-2023-38325

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2023-38325

Release Date: 2023-07-14

Fix Resolution: cryptography - 41.0.2



CVE-2022-39348 (Medium) detected in Twisted-22.4.0-py3-none-any.whl

CVE-2022-39348 - Medium Severity Vulnerability

Vulnerable Library - Twisted-22.4.0-py3-none-any.whl

An asynchronous networking framework written in Python

Library home page: https://files.pythonhosted.org/packages/db/99/38622ff95bb740bcc991f548eb46295bba62fcb6e907db1987c4d92edd09/Twisted-22.4.0-py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt

Dependency Hierarchy:

  • Scrapy-2.6.2-py2.py3-none-any.whl (Root Library)
    • Twisted-22.4.0-py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Twisted is an event-based framework for internet applications. Starting with version 0.9.4, when the Host header does not match a configured host, twisted.web.vhost.NameVirtualHost will return a NoResource resource which renders the Host header unescaped into the 404 response, allowing HTML and script injection. In practice this should be very difficult to exploit, as being able to modify the Host header of a normal HTTP request implies that one is already in a privileged position. This issue was fixed in version 22.10.0rc1. There are no known workarounds.

Publish Date: 2022-10-26

URL: CVE-2022-39348

CVSS 3 Score Details (5.4)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: Low
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None

For more information on CVSS3 Scores, click here.

Suggested Fix

Type: Upgrade version

Origin: https://nvd.nist.gov/vuln/detail/CVE-2022-39348

Release Date: 2022-10-26

Fix Resolution: twisted - 19.2.1,18.4.0;Twisted - 22.10.0rc1
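
The fix resolutions in the reports on this part of the page can be collected into one check against the installed environment. The sketch below takes the highest fix version listed per package in those reports. The names MIN_VERSIONS, version_tuple, is_outdated and report_outdated are illustrative. The version parsing is deliberately simple: it ignores pre-release suffixes such as rc1 and is no substitute for a real version parser.

```python
from importlib import metadata

# Highest "Fix Resolution" version listed per package in the reports above.
MIN_VERSIONS = {
    "cryptography": "42.0.4",
    "urllib3": "1.26.17",
    "requests": "2.31.0",
    "Twisted": "23.10.0",
    "certifi": "2023.7.22",
    "pymongo": "4.6.3",
    "Scrapy": "2.11.1",
}


def version_tuple(version):
    """Turn '41.0.6' into (41, 0, 6), keeping only the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def is_outdated(installed, minimum):
    """Return True when `installed` is lower than `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '2.27' compares equal to '2.27.0'
    b += (0,) * (width - len(b))
    return a < b


def report_outdated(min_versions=MIN_VERSIONS):
    """Map each installed package below its minimum to (installed, minimum)."""
    outdated = {}
    for name, minimum in min_versions.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # package not installed in this environment
        if is_outdated(installed, minimum):
            outdated[name] = (installed, minimum)
    return outdated


for name, (installed, minimum) in sorted(report_outdated().items()):
    print("%s %s is below %s" % (name, installed, minimum))
```

Raising the pins in requirements.txt to at least these versions and reinstalling should clear the listed findings, provided the spider still runs against the newer Scrapy and pymongo releases, which the reports do not verify.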


