
kaixindelele / chatpaper

17.7K 87.0 1.9K 36.16 MB

Use ChatGPT to summarize arXiv papers. Accelerate the entire research workflow: full-paper summarization, professional translation, polishing, reviewing, and review responses with ChatGPT.

Home Page: https://chatwithpaper.org

License: Other

Python 69.87% Jupyter Notebook 14.62% Shell 0.19% Dockerfile 0.63% Makefile 0.13% Batchfile 0.16% TeX 14.39%
Topics: arxiv, paper

chatpaper's Introduction

Hi there, this is kaixindelele 👋

Seeking an LLM-related position at a major tech company. Currently preparing for campus recruiting; not considering internships for now, unless the fit is exceptional.

Resume Details

Yongle Luo

Email: [email protected]
Portfolio: Github (19000+ stars)
Blog: Zhihu: 强化学徒 (19K followers)

Desired Positions

RLHF, or LLM-based embodied intelligence, or applied LLM work such as long-text summarization and dialogue, or LLM+Robot or Auto+

Hoping to join a core team at a major company, or a well-funded team at a mid-sized one.

Education

Zhengzhou University | Automation | Bachelor's | 2013-2017

University of Science and Technology of China | Pattern Recognition and Intelligent Systems | transferred to the PhD track in the second year of my master's; currently a fourth-year PhD student | 2017-present

Research Experience

Deep reinforcement learning library DRLib

  • Deep RL algorithms wrapped on top of Spinning Up: DQN, DDPG, TD3, SAC, PPO, PER, HER, etc.
  • Repository: DRLib (438 stars)

Reinforcement learning with sparse rewards correcting dense rewards

  • Sparse rewards converge globally but learn slowly, while dense rewards converge quickly but are prone to local optima. The paper combines their strengths with a dense2sparse scheme that improves both exploration efficiency and final performance.
  • "Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty" (co-first author, robotics conference AIM 2022, oral presentation)
  • An improved 2023 version, "D2SR: Transferring Dense Reward Function to Sparse by Network Resetting", effectively solves the stability problem of switching between multiple reward functions, substantially improves performance, and greatly lowers the bar for reward-function design (first author, robotics EI conference RCAR, oral; a very interesting piece of work).

Table-tennis simulation and real-robot validation: efficient single-step decision learning with deep RL

  • Built a table-tennis hitting platform on the MuJoCo physics engine that reproduces real-robot hitting behavior. The hitting task is modeled as single-step reinforcement learning; HER-style relabeling produces perfect samples for self-guided exploration and highly data-efficient learning. Real-robot validation reaches a 92% landing success rate within 200 episodes.
  • "SIRL: Self-Imitation Reinforcement Learning for Single-step Hitting Tasks" (first author, CAA Class-A conference ARM)

Self-guided continual reinforcement learning: tackling deep RL's inefficiency on complex sequential tasks with sparse rewards

  • First framework for self-guided exploration in RL. On complex tasks with sparse reward feedback, the agent extracts useful information from failures, explores actively, and accumulates advantages, ultimately learning efficiently. It achieves very high exploration efficiency across manipulation tasks with one to three objects; in real-robot experiments, training from scratch reaches a 100% success rate in only 250 episodes. This is the most academically valuable work of my PhD.
  • A follow-up focused on policy optimization, which improves sample efficiency by a further 60%+, is being written up.
  • Code is open source: RHER; the paper is on arXiv: Relay Hindsight Experience Replay (first author, accepted at Neurocomputing, a Zone-2 Top journal).

Certificates and Project Experience

  • Certificates: CET-4/CET-6 English, Level-3 Psychological Counselor
  • Projects:
    • Open-sourced ChatPaper: 16.0K stars, #5 on GitHub trending for three consecutive days, 600K monthly active users, 70K registered users.
    • Open-sourced ChatOpenReview: 1. database-backed review-response assistance via langchain; 2. model SFT with deepspeed; 3. search-engine-assisted reviewing over a global literature corpus.
    • Deep RL algorithms wrapped on top of Spinning Up: DQN, DDPG, TD3, SAC, PPO, PER, HER, etc. (DRLib, 438 stars).
    • Development of a motion-control system for a competitive table-tennis robot based on reinforcement learning (industry contract, 1.48M RMB; responsible for the simulation system and the RL algorithms).
    • Ongoing development of an LLM+Robot skill library; the library itself and an initial validation are complete.
    • ChatSensitiveWords: flexible sensitive-word detection combining a keyword lexicon with an LLM, balancing accuracy and speed.

Self-Assessment

  • Proficient in classic deep RL algorithms, with extensive experience building both robot simulations and real-world systems.
  • Good character, candid and reliable. Strong engineering skills and a solid programming foundation; I have not systematically practiced algorithm puzzles, but can develop rapidly with GPT-4.
  • Good at transferring insights from human learning to AI; strong research skills, rich teamwork experience, and a love of open source, technical sharing, and teaching.
  • Hoping to apply large models' text capabilities to AI-assisted higher education, LLM+RL fine-tuning, or other LLM applications.

chatpaper's People

Contributors

binary-husky, circlestarzero, housiyuan2001, jaseon-q, jessytsu1, kaixindelele, masteryip, mrpeterjin, nishiwen1214, red-tie, uppez, wangrongsheng, xuzhougeng


chatpaper's Issues

Bug in argparse parameter parsing

Running chat_paper.py from the command line with --sort arxiv.SortCriterion.LastUpdatedDate raises:

AttributeError: 'str' object has no attribute 'value'

Inspection shows that chat_paper.py parses the --sort parameter with:

parser.add_argument("--sort", default=arxiv.SortCriterion.Relevance, help="another is arxiv.SortCriterion.LastUpdatedDate")

This is probably because no type is passed to parser.add_argument, so Python parses arxiv.SortCriterion.LastUpdatedDate as a plain str, which is not what the code expects. Since arxiv.SortCriterion is an enum, the type parameter should be specified:

parser.add_argument("--sort", default=arxiv.SortCriterion.Relevance, type=arxiv.SortCriterion, help="another is arxiv.SortCriterion.LastUpdatedDate")

and the enum's value passed on the command line (--sort lastUpdatedDate) for it to run correctly.
Is this a bug?
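The proposed fix can be sketched in a self-contained way with a stand-in enum (the real code would use arxiv.SortCriterion; the member values here are illustrative). Passing the enum class as type makes argparse convert the raw string into an enum member by value lookup:

```python
import argparse
import enum

# Stand-in for arxiv.SortCriterion; the member values mirror the strings
# the command line would pass (illustrative only).
class SortCriterion(enum.Enum):
    Relevance = "relevance"
    LastUpdatedDate = "lastUpdatedDate"

parser = argparse.ArgumentParser()
# With type=SortCriterion, argparse calls SortCriterion("lastUpdatedDate"),
# which looks the member up by value, so args.sort.value works downstream.
parser.add_argument("--sort", default=SortCriterion.Relevance,
                    type=SortCriterion,
                    help="e.g. --sort lastUpdatedDate")

args = parser.parse_args(["--sort", "lastUpdatedDate"])
print(args.sort)  # SortCriterion.LastUpdatedDate
```

Note that the default stays an enum member either way, so omitting --sort keeps working.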

Error on Mac M1

Running on a Mac M1 fails; the tool is unusable.

Command: python chat_paper.py --pdf_path ./test.pdf
Output:

Key word: reinforcement learning
Query: all: ChatGPT robot
Sort: SortCriterion.Relevance
Traceback (most recent call last):
  File "/Users/dx/workdir/software/ChatPaper/chat_paper.py", line 412, in <module>
    main(args=args)
  File "/Users/dx/workdir/software/ChatPaper/chat_paper.py", line 382, in main
    paper_list = [Paper(path=args.pdf_path)]
  File "/Users/dx/workdir/software/ChatPaper/get_paper_from_pdf.py", line 14, in __init__
    self.title = self.get_title()
  File "/Users/dx/workdir/software/ChatPaper/get_paper_from_pdf.py", line 121, in get_title
    font_size = block["lines"][0]["spans"][0]["size"]  # font size of the first span of the first line
IndexError: list index out of range

Is there any way to fix this?
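One possible guard for get_title, sketched under the assumption that the IndexError comes from blocks with no text spans (e.g. image-only blocks in PyMuPDF's dict output); the block layout below mimics that structure:

```python
def first_span_size(block, default=0.0):
    # Image-only PDF blocks have empty "lines"/"spans" entries, which is
    # what raised the IndexError; fall back to a default font size instead.
    try:
        return block["lines"][0]["spans"][0]["size"]
    except (KeyError, IndexError):
        return default

text_block = {"lines": [{"spans": [{"size": 23.9}]}]}
image_block = {"lines": []}  # no spans at all

print(first_span_size(text_block))   # 23.9
print(first_span_size(image_block))  # 0.0
```

The title-detection loop could then simply skip blocks whose size comes back as the default.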

What to do when the paper is too long

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4256 tokens. Please reduce the length of the messages.
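Until the code supports chunking, one workaround is to clip the section text to a rough token budget before sending it. A sketch using a crude characters-per-token heuristic (tiktoken would count exactly; the ratio of ~4 characters per token is an assumption that holds roughly for English text):

```python
def clip_to_token_budget(text, max_tokens=3500, chars_per_token=4):
    # Leave headroom below the 4097-token context limit for the prompt
    # and the completion. ~4 chars/token is a rough English heuristic.
    budget = max_tokens * chars_per_token
    return text if len(text) <= budget else text[:budget]

clipped = clip_to_token_budget("x" * 20000, max_tokens=1000)
print(len(clipped))  # 4000
```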

Bundle the static assets inside the project

I set up a free public instance, but most regions in mainland China cannot load Cloudflare, and fonts.googleapis.com loads extremely slowly. Please consider bundling the static assets inside the project.

Papers without an Introduction section fail with an Introduction error.

Traceback (most recent call last):
  File "C:\Users\admin\Documents\GitHub\ChatPaper\chat_paper.py", line 471, in <module>
    main(args=args)
  File "C:\Users\admin\Documents\GitHub\ChatPaper\chat_paper.py", line 436, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 17, in __init__
    self.parse_pdf()
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 33, in parse_pdf
    self.section_text_dict.update({"paper_info": self.get_paper_info()})
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 42, in get_paper_info
    introduction_text = self.section_text_dict['Introduction']
KeyError: 'Introduction'
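A defensive rewrite of the lookup in get_paper_info could be sketched like this: use .get() and fall back to the first parsed section when the paper has no heading literally named "Introduction" (the fallback choice is an assumption, not the project's current behavior):

```python
def get_introduction(section_text_dict):
    # Not every paper titles its opening section "Introduction";
    # fall back to the first parsed section instead of raising KeyError.
    intro = section_text_dict.get("Introduction", "")
    if not intro and section_text_dict:
        intro = next(iter(section_text_dict.values()))
    return intro

print(get_introduction({"Overview": "This paper studies..."}))  # This paper studies...
```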

UI

Any interest in building a UI for this? (o゜▽゜)o☆

[Bug] OSError: cannot write mode RGBA as JPEG

Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\PIL\JpegImagePlugin.py", line 643, in _save
    rawmode = RAWMODE[im.mode]
KeyError: 'RGBA'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".\chat_paper.py", line 415, in <module>
    main(args=args)
  File ".\chat_paper.py", line 397, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File ".\chat_paper.py", line 173, in summary_with_chat
    first_image, ext = paper.get_image_path()
  File "E:\project\ChatPaper\get_paper_from_pdf.py", line 85, in get_image_path
    image.save(open(im_path, "wb"))
  File "D:\Anaconda\lib\site-packages\PIL\Image.py", line 2431, in save
    save_handler(self, fp, filename)
  File "D:\Anaconda\lib\site-packages\PIL\JpegImagePlugin.py", line 646, in _save
    raise OSError(msg) from e
OSError: cannot write mode RGBA as JPEG
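The usual fix is to drop the alpha channel before saving, since JPEG cannot store RGBA. A self-contained Pillow sketch of what get_image_path could do before image.save (the helper name is hypothetical):

```python
from io import BytesIO
from PIL import Image

def save_as_jpeg(image, fp):
    # JPEG has no alpha channel; convert RGBA/LA/P images to RGB first,
    # otherwise Pillow raises "cannot write mode RGBA as JPEG".
    if image.mode in ("RGBA", "LA", "P"):
        image = image.convert("RGB")
    image.save(fp, format="JPEG")

img = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
buf = BytesIO()
save_as_jpeg(img, buf)  # no OSError now
```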

Hello, I tried to batch-read local files, but got an error

Traceback (most recent call last):
    main(args=args)
  File "chat_paper.py", line 433, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "H:\学习\ChatPaper-main\get_paper_from_pdf.py", line 15, in __init__
    self.pdf = fitz.open(self.path)  # the PDF document
  File "D:\software\Anaconda3\lib\site-packages\fitz\fitz.py", line 3962, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
fitz.fitz.FileDataError: cannot open broken document

Summarizing papers works, but fitz cannot open the PDF; the environment seems fine.
Downloading papers and batch-summarizing them works; only local summarization fails.

[Feature Request] Integrate the Semantic Scholar API for retrieving papers

A small suggestion: the mismatch between the Query and the actual search results might be solved by introducing the Semantic Scholar API; see the documentation.

  1. Semantic Scholar's search is stronger than arXiv's built-in search (a personal impression, not rigorously tested)
  2. Its broader data sources would solve the problem of papers not being findable on arXiv
  3. The Semantic Scholar API is cheap to integrate
  4. The API also offers author-based and citation-analysis features, leaving plenty of room for further extension

Token length exceeds 4097

% python chat_paper.py --query "all: causal prompt learning" --filter_keys "causal prompt learning" --max_results 5 --language en

Key word: reinforcement learning
Query: all: causal prompt learning
Sort: SortCriterion.Relevance
all search:
0 Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt 2022-05-23 07:51:15+00:00
1 IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach 2022-10-14 20:47:37+00:00
2 Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models 2022-10-19 19:13:07+00:00
3 Causal Intervention-based Prompt Debiasing for Event Argument Extraction 2022-10-04 12:32:00+00:00
4 Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring 2020-08-04 10:17:38+00:00
filter_keys: causal prompt learning
Number of papers remaining after filtering:
filter_results: 1
filter_papers:
0 Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt 2022-05-23 07:51:15+00:00
All_paper: 1
paper_path: ./pdf_files/all causal prompt learni-2023-03-21-08/Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt.pdf
section_page_dict {'Abstract': 0, 'Introduction': 0, 'Related Work': 1, 'Methodology': 4, 'Method': 8, 'Experiments': 6, 'Conclusion': 7, 'References': 7}
0 Abstract 0
1 Introduction 0
start_page, end_page: 0 1
2 Related Work 1
start_page, end_page: 1 4
3 Methodology 4
start_page, end_page: 4 8
4 Method 8
start_page, end_page: 8 6
5 Experiments 6
start_page, end_page: 6 7
6 Conclusion 7
start_page, end_page: 7 7
7 References 7
start_page, end_page: 7 13
summary_result:

  1. Title: Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt (Chinese translation: 支持因果削减知识提示的视觉语言模型推理)

  2. Authors: Jiangmeng Li, Wenyi Mo, Wenwen Qiang, Bing Su, and Changwen Zheng

  3. Affiliation: Institute of Software Chinese Academy of Sciences, Beijing, China (for the first, third, and fifth authors); Renmin University of China, Beijing, China (for the second and fourth authors)

  4. Keywords: multi-modal, vision-language model, prompt engineering, causality, knowledge graph, ontology

  5. Urls: Paper: http://arxiv.org/abs/2205.11100v1, Github: None

  6. Summary:

  • (1): This paper focuses on improving the transferability of pre-trained vision-language models to downstream tasks in a zero-shot manner.

  • (2): Previous works explored generating fixed or learnable prompts to reduce the performance gap between tasks in the training and test phases. However, existing prompt methods do not explore the semantic information of textual labels, and manually constructing prompts with rich semantic information requires domain expertise and is time-consuming. To address this issue, the authors propose the Causality-pruning Knowledge Prompt (CapKP), which retrieves ontological knowledge graphs by treating textual labels as queries and introduces causality-pruning to refine the derived semantic information.

  • (3): The authors conduct extensive evaluations to demonstrate the effectiveness of CapKP in adapting pre-trained vision-language models to downstream image recognition. CapKP outperforms manual-prompt and learnable-prompt methods, achieving superior domain generalization compared to benchmark approaches.

  • (4): The experimental results show that CapKP achieved an improvement of 12.51% and 1.39% on average compared to manual-prompt and learnable-prompt methods, respectively, with 8 shots. The performance supports the effectiveness of CapKP in improving the transferability of pre-trained vision-language models in a zero-shot manner.
    prompt_token_used: 2279 completion_token_used: 429 total_token_used: 2708
    response_time: 16.399 s
Traceback (most recent call last):
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 469, in <module>
    main(args=args)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 448, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 208, in summary_with_chat
    chat_method_text = self.chat_method(text=text)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 324, in chat_method
    response = openai.ChatCompletion.create(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4132 tokens. Please reduce the length of the messages.

How should the code be modified to fix this? It seems the maximum context for openai.ChatCompletion.create() is 4097 tokens.
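A common workaround is map-reduce style chunking: split the section text into pieces that each fit the context window, summarize each piece, then summarize the concatenated partial summaries. A sketch with a placeholder summarize function (in the real code that call would be openai.ChatCompletion.create; the chunk sizes are assumptions):

```python
def split_into_chunks(text, chunk_chars=6000, overlap=200):
    # Split on a character budget with a small overlap so a sentence cut
    # at a boundary still appears whole in one of the neighbouring chunks.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += chunk_chars - overlap
    return chunks

def summarize_long_text(text, summarize):
    # `summarize` is a placeholder for the model call; map each chunk,
    # then reduce the partial summaries with one final call.
    parts = [summarize(c) for c in split_into_chunks(text)]
    return parts[0] if len(parts) == 1 else summarize("\n".join(parts))
```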

Some questions about non-arXiv papers

Thanks for the project. A few questions:

(1) Does it support interpreting local, non-arXiv papers?
(2) What is the maximum number of pages supported?
(3) How should non-standard section titles be handled? (A typical arXiv paper has an "Experimental" section, but if a paper titles it "Experimental results and analysis", do we need to add "Experimental results and analysis" to the title list in the code?)

Thanks~
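For question (3), rather than enumerating every variant, the heading lookup could match by prefix. A sketch (the canonical section list here is an assumption mirroring the keys the parser appears to use, not the project's actual list):

```python
CANONICAL_SECTIONS = ("Abstract", "Introduction", "Related Work",
                      "Methodology", "Method", "Experiment",
                      "Conclusion", "References")

def match_section(heading, known_sections=CANONICAL_SECTIONS):
    # Map a non-standard heading like "Experimental results and analysis"
    # to a canonical section name by prefix matching instead of equality.
    h = heading.strip().lower()
    for name in known_sections:
        if h.startswith(name.lower()):
            return name
    return None

print(match_section("Experimental Results and Analysis"))  # Experiment
```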

error executing the example

After successful installation, the example is throwing the following error:

python chat_paper.py --query "chatgpt robot" --filter_keys "chatgpt robot" --max_results 1

Key word: learning reinforcement
Query: chatgpt robot
Sort: Relevance
all search:
Traceback (most recent call last):
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 420, in <module>
    main(args=args)
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 399, in main
    filter_results = reader1.filter_arxiv(max_results=args.max_results)
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 51, in filter_arxiv
    for index, result in enumerate(search.results()):
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 591, in results
    page_url = self._format_url(search, offset, page_size)
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 627, in _format_url
    url_args = search._url_args()
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 481, in _url_args
    "sortBy": self.sort_by.value,
AttributeError: 'str' object has no attribute 'value'

[feature request] integrating flask app to provide an interface

This is already a great program for quickly catching up on and digesting ongoing research, especially for non-native English speakers. An interface, such as a Flask app, could benefit newcomers even further. Could you please consider this proposal? Thanks.

Problem for running on M1 macbook

I got the errors below when running the demo on my M1 Pro MacBook with python chat_paper.py --pdf_path "demo.pdf":

Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

Local PDF summarization: running chat_paper.py raises an error

python3 chat_paper.py --pdf_path "demo.pdf"

Key word: reinforcement learning
Query: all: ChatGPT robot
Sort: SortCriterion.Relevance
max_font_sizes [9.962599754333496, 9.962599754333496, 9.962599754333496, 9.962599754333496, 9.962599754333496, 10.958900451660156, 10.958900451660156, 23.91029930114746, 23.91029930114746, 29.88789939880371]
Traceback (most recent call last):
  File "/Users/fan/work/gitlab/ChatPaper/chat_paper.py", line 468, in <module>
    main(args=args)
  File "/Users/fan/work/gitlab/ChatPaper/chat_paper.py", line 426, in main
    paper_list.append(Paper(path=args.pdf_path))
  File "/Users/fan/work/gitlab/ChatPaper/get_paper_from_pdf.py", line 16, in __init__
    self.title = self.get_title()
  File "/Users/fan/work/gitlab/ChatPaper/get_paper_from_pdf.py", line 160, in get_title
    self.title_page = page_index
NameError: name 'page_index' is not defined

A strange bug

response_time: 9.751 s
Traceback (most recent call last):
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 468, in <module>
    main(args=args)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 436, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 244, in summary_with_chat
    self.export_to_markdown("\n".join(htmls), file_name=file_name, mode=mode)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 396, in export_to_markdown
    with open(file_name, mode, encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './export\2023-03-20-21-electronics Review Kuruva Lakshmanna 1. Introduction Applications based on smartphones, sensors and actuators are becoming more and […] References.md'

[Feature Request]

I would like to thank you for your dedication and effort in keeping this project up and running.

I would like to propose a feature that lets users download PDF files directly from Sci-Hub. As you know, Sci-Hub provides free access to millions of academic articles; adding this would greatly enhance the software's usability.

Filename-too-long error: in a local deployment, after uploading, reading, and processing correctly, the entire summary text is used as the output .md file name, which makes the system error out

All other PDF files I tested work; so far only this one file fails. The file is "Integrated_bioprocess_for_conversion_of_gaseoussubstrates_to_liquids.pdf", downloaded from "https://www.pnas.org/doi/10.1073/pnas.1516867113".

The error is:

Traceback (most recent call last):
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 467, in <module>
    main(args=args)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 435, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 243, in summary_with_chat
    self.export_to_markdown("\n".join(htmls), file_name=file_name, mode=mode)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 395, in export_to_markdown
    with open(file_name, mode, encoding="utf-8") as f:
OSError: [Errno 36] File name too long: './export/2023-03-15-07-Integrated bioprocess for conversion of gaseous Peng Hu […] Materials and Methods.md'
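A guard for export_to_markdown could sanitize and cap the title-derived file name, so a mis-extracted multi-paragraph "title" cannot exceed the filesystem's name-length limit (~255 bytes on most Linux filesystems; the 80-character cap below is an arbitrary choice, and safe_filename is a hypothetical helper):

```python
import re

def safe_filename(title, max_len=80, ext=".md"):
    # Drop characters that are illegal or awkward in file names,
    # collapse whitespace, and cap the length before adding the extension.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]+', " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:max_len].rstrip() + ext

name = safe_filename("Integrated bioprocess for conversion of gaseous " * 20)
print(len(name) <= 80 + len(".md"))  # True
```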

Error after private deployment

With the private deployment, an error occurs after uploading a PDF. The output is below; is any additional configuration required?
im_path: image.png
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1032, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 844, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/root/ChatPaper/deploy/Private/app.py", line 619, in upload_pdf
    sum_info = reader.summary_with_chat(paper_list=paper_list)
  File "/root/ChatPaper/deploy/Private/app.py", line 446, in summary_with_chat
    summary_text += "\n\n" + chat_summary_text
TypeError: can only concatenate str (not "tuple") to str
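The message suggests summary_with_chat receives a (text, tokens)-style tuple from the chat call (an assumption inferred from the error alone). A defensive sketch that accepts either shape, with a hypothetical extract_text helper:

```python
def extract_text(chat_result):
    # Some code paths return just the summary string, others a tuple such
    # as (summary, token_count); keep only the text part either way.
    return chat_result[0] if isinstance(chat_result, tuple) else chat_result

summary_text = ""
for chat_result in [("chunk one", 120), "chunk two"]:
    summary_text += "\n\n" + extract_text(chat_result)
print(summary_text.strip())
```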

Batch-summarizing a local folder: running chat_paper.py, e.g. python chat_paper.py --pdf_path "your_absolute_path", raises an error

The error is:
Traceback (most recent call last):
  File "F:\chatpaper\ChatPaper\chat_paper.py", line 468, in <module>
    main(args=args)
  File "F:\chatpaper\ChatPaper\chat_paper.py", line 433, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 17, in __init__
    self.parse_pdf()
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 33, in parse_pdf
    self.section_text_dict.update({"paper_info": self.get_paper_info()})
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 42, in get_paper_info
    introduction_text = self.section_text_dict['Introduction']
KeyError: 'Introduction'

The paper-title extraction is flawed

I tried the tool locally, testing it on several papers. An obvious problem is that the extracted title and other basic information are wrong. On inspection, when a title spans multiple lines, only the first line appears intact in the result, followed by a string of completely different but plausible-looking text (classic AI confabulation). The same issue corrupts the title, author information, and so on. My guess is that the PDF parsing does not account for multi-line titles (I have not verified this against the source code), which then makes the subsequent analysis of the paper inaccurate.

Some possible bugs

Thanks for your contribution! It is really convenient for researchers to quickly read the paper.

I noticed two bugs when using your code to search for the paper "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection". I am not sure whether they only occur on my machine, so perhaps you can verify them first.

The first problem is that the code seems unable to handle "-". I have no idea why. Searching "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection" returns no results, but searching "Static Dynamic Co Teaching for Class Incremental 3D Object Detection" returns the correct paper.

The second problem is that the author list generated by ChatGPT is not correct. The first time it told me the authors are "Wenxuan Wang, Xiangyu Chen, Shaoshuai Shi, Kaiqi Huang", and the second time "Fangyu Liu, Zhiliang Ma, Junlan Yang, Jingyi Yu, Jinyong Jeong, Chen Feng, Rongrong Ji", both wrong. The correct authors appear in arXiv's own search results, so maybe we should trust arXiv's results more.
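For the first problem, a client-side workaround is to normalize the query before sending it, since hyphenated titles appear to defeat the search (behavior inferred from this report, not from the arXiv API documentation; normalize_title_query is a hypothetical helper):

```python
def normalize_title_query(title):
    # Replace hyphens with spaces so "Static-Dynamic Co-Teaching ..."
    # matches the way "Static Dynamic Co Teaching ..." does.
    return " ".join(title.replace("-", " ").split())

print(normalize_title_query(
    "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection"))
# Static Dynamic Co Teaching for Class Incremental 3D Object Detection
```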

OpenAI API - Access Terminated

Has anyone else had their account terminated? I set max_results to 20 and used an overseas cloud server with a dynamic IP, and my account was banned. Not sure whether it was caused by making too many API calls at once.

[Bug] openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4218 tokens. Please reduce the length of the messages.

Traceback (most recent call last):
  File ".\chat_paper.py", line 412, in <module>
    main(args=args)
  File ".\chat_paper.py", line 394, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File ".\chat_paper.py", line 200, in summary_with_chat
    chat_method_text = self.chat_method(text=text)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "D:\Anaconda\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "D:\Anaconda\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File ".\chat_paper.py", line 284, in chat_method
    response = openai.ChatCompletion.create(
  File "D:\Anaconda\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "D:\Anaconda\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4218 tokens. Please reduce the length of the messages.

Error running python chat_paper.py --pdf_path "demo.pdf"

Running python chat_paper.py --pdf_path "demo.pdf" produces the following error:
Traceback (most recent call last):
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 468, in
main(args=args)
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 416, in main
reader1 = Reader(key_word=args.key_word,
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 47, in init
self.encoding = tiktoken.get_encoding("gpt2")
File "D:\anaconda\lib\site-packages\tiktoken\registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "D:\anaconda\lib\site-packages\tiktoken_ext\openai_public.py", line 11, in gpt2
mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 68, in data_gym_to_mergeable_bpe_ranks
vocab_bpe_contents = read_file_cached(vocab_bpe_file).decode()
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 41, in read_file_cached
contents = read_file(blobpath)
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 19, in read_file
return requests.get(blobpath).content
File "D:\anaconda\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "D:\anaconda\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "D:\anaconda\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "D:\anaconda\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "D:\anaconda\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /gpt-2/encodings/main/vocab.bpe (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
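This traceback shows tiktoken downloading its BPE vocabulary from Azure blob storage the first time `tiktoken.get_encoding("gpt2")` runs, and the download being cut off. Two workarounds, sketched below (the proxy address and port are examples, not part of ChatPaper): route the one-time download through a proxy that can reach `openaipublic.blob.core.windows.net`, or point tiktoken at a pre-populated local cache via the `TIKTOKEN_CACHE_DIR` environment variable that recent tiktoken versions honor.

```shell
# Either let the one-time vocab download go through a working proxy...
export HTTPS_PROXY=http://127.0.0.1:7890   # example address; use your own

# ...or reuse BPE files cached on a machine that could download them:
export TIKTOKEN_CACHE_DIR="$HOME/.cache/tiktoken"
mkdir -p "$TIKTOKEN_CACHE_DIR"
```

After the first successful run the files are cached, so the network is no longer needed for encoding.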

Max retries exceeded with url: /v1/chat/completions

After running the command

 python chat_paper.py --query "all:gravitational wave" --key_word "pulsar timing array" --filter_keys "gravitational wave pulsar" --max_results 10 

it fails saying the retry limit was exceeded. Is this a problem with my OpenAI key?

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 516, in request_raw
    result = _thread_context.session.request(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 468, in <module>
    main(args=args)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 447, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 176, in summary_with_chat
    chat_summary_text = self.chat_summary(text=text)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 378, in chat_summary
    response = openai.ChatCompletion.create(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 216, in request
    result = self.request_raw(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 528, in request_raw
    raise error.APIConnectionError(
openai.error.APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))
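An `SSLV3_ALERT_HANDSHAKE_FAILURE` on the way to `api.openai.com` almost always means the TLS connection is being blocked or reset on the network path, not that the key is invalid (a bad key would come back as an HTTP 401 after a successful handshake). A common remedy is to route the client through a proxy; a minimal sketch, where the address is an example and not a ChatPaper default:

```python
import os

# Example proxy address; replace with one that can reach api.openai.com.
proxy = "http://127.0.0.1:7890"

# requests/urllib3 (used by the openai package) honor these variables.
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
```

With openai-python 0.x you can alternatively set `openai.proxy = proxy`. Either way, set this before `chat_paper.py` makes its first request.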

[Beginner question] Works on Windows but not in the built-in WSL?

For reading local PDFs, I can get a complete run in Windows' Git Bash (or other shells), but inside WSL it only gets as far as reading the three papers below; after that GPT seems to never return a result.

------------------paper_num: 3------------------
0 ./pdf_files/ReadMore/2206.03687.pdf
1 ./pdf_files/ReadMore/4102_polyloss_a_polynomial_expansio.pdf
2 ./pdf_files/ReadMore/BenchmarkNLU4FewShotLearning.pdf

WSL stops right here and goes no further... yet the same machine runs to completion on Windows.

This is my first time trying WSL... maybe Linux inside WSL isn't configured properly?
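A likely culprit is that the proxy which works on the Windows side is unreachable from WSL: inside WSL2, `127.0.0.1` is the Linux VM itself, and the Windows host (where a proxy client typically listens) lives at the nameserver address recorded in `/etc/resolv.conf`. A sketch, assuming a proxy listening on Windows port 7890 (adjust to your setup):

```shell
# Resolve the Windows host's IP as seen from inside WSL2.
host_ip=$(grep -m1 '^nameserver' /etc/resolv.conf | awk '{print $2}')

# Point HTTP(S) traffic at the proxy running on the Windows side.
export HTTP_PROXY="http://${host_ip}:7890"
export HTTPS_PROXY="http://${host_ip}:7890"
```

If requests then reach the API from WSL, the hang was the proxy, not ChatPaper.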

Could the title fail to be read because of some problem with the PDF?

Traceback (most recent call last):
File "chat_paper.py", line 412, in <module>
main(args=args)
File "chat_paper.py", line 382, in main
paper_list = [Paper(path=args.pdf_path)]
File "/Users/apple/Documents/workspace/python/ChatPaper/get_paper_from_pdf.py", line 14, in __init__
self.title = self.get_title()
File "/Users/apple/Documents/workspace/python/ChatPaper/get_paper_from_pdf.py", line 121, in get_title
font_size = block["lines"][0]["spans"][0]["size"]  # get the font size of the first span of the first line
IndexError: list index out of range

I was trying to read an IEEE paper; if the developers have access, the title is "Encryption-based Coordinated Volt/Var Control for Distribution Networks with Multi-Microgrids". Thanks very much!
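The crash comes from `get_title` assuming every text block returned by the PDF parser has at least one line with at least one span; blocks extracted from image-only or oddly encoded pages (common in IEEE PDFs) can be empty, so the chained `[0]` lookups raise `IndexError`. A defensive sketch of that lookup (`first_span_size` is a hypothetical helper, not a function in the repo):

```python
def first_span_size(block, default=0.0):
    """Return the font size of the first text span in a PDF text block,
    or `default` when the block contains no text spans at all."""
    lines = block.get("lines") or []
    for line in lines:
        spans = line.get("spans") or []
        if spans:
            # Same lookup as get_title, but only on a non-empty span list.
            return spans[0]["size"]
    return default
```

`get_title` could call this and skip blocks that come back with the `default` size, instead of indexing into them directly.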
