
kaixindelele / chatpaper

17.7K 87.0 1.9K 36.16 MB

Use ChatGPT to summarize arXiv papers. Accelerate the entire research workflow: full-paper summarization, professional translation, polishing, reviewing, and review responses with ChatGPT.

Home Page: https://chatwithpaper.org

License: Other

Python 69.87% Jupyter Notebook 14.62% Shell 0.19% Dockerfile 0.63% Makefile 0.13% Batchfile 0.16% TeX 14.39%
Topics: arxiv, paper

chatpaper's Introduction

Hi there, this is kaixindelele 👋

Seeking an LLM-related position at a major tech company. Currently preparing for campus recruiting; not considering internships for now, unless the fit is exceptional.

Resume Details

Yongle Luo

Email: [email protected]
Portfolio: Github (19000+ stars)
Blog: Zhihu: 强化学徒 (19K followers)

Desired Positions

RLHF, or LLM-based embodied intelligence, or applied LLM work such as long-text summarization and dialogue, or LLM+Robot or Auto+

Hoping to join a core team at a major company, or a well-funded team at a mid-sized one.

Education

Zhengzhou University | Automation | Bachelor's | 2013-2017

University of Science and Technology of China | Pattern Recognition and Intelligent Systems | transferred to the PhD track in the second year of my master's; currently a fourth-year PhD student | 2017-present

Research Experience

Deep reinforcement learning library DRLib

  • Deep RL algorithms wrapped on top of Spinning Up: DQN, DDPG, TD3, SAC, PPO, PER, HER, etc.
  • Repository: DRLib (438 stars)

Reinforcement learning with sparse rewards correcting dense rewards

  • Sparse rewards converge globally but learn slowly, while dense rewards converge quickly but are prone to local optima. The paper combines their strengths with a dense2sparse scheme that improves both exploration efficiency and final performance.
  • "Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty" (co-first author, robotics conference AIM 2022, oral presentation)
  • An improved 2023 version, "D2SR: Transferring Dense Reward Function to Sparse by Network Resetting", effectively solves the stability problem of switching between multiple reward functions, substantially improves performance, and greatly lowers the bar for reward-function design (first author, robotics EI conference RCAR, oral; a very interesting piece of work).

Table-tennis simulation and real-robot validation: efficient single-step decision learning with deep RL

  • Built a table-tennis hitting platform on the MuJoCo physics engine that reproduces real-robot hitting behavior. The hitting task is modeled as single-step reinforcement learning; HER-style relabeling produces perfect samples for self-guided exploration and highly data-efficient learning. Real-robot validation reaches a 92% landing success rate within 200 episodes.
  • "SIRL: Self-Imitation Reinforcement Learning for Single-step Hitting Tasks" (first author, CAA Class-A conference ARM)

Self-guided continual reinforcement learning: tackling deep RL's inefficiency on complex sequential tasks with sparse rewards

  • First framework for self-guided exploration in RL. On complex tasks with sparse reward feedback, the agent extracts useful information from failures, explores actively, and accumulates advantages, ultimately learning efficiently. It achieves very high exploration efficiency across manipulation tasks with one to three objects; in real-robot experiments, training from scratch reaches a 100% success rate in only 250 episodes. This is the most academically valuable work of my PhD.
  • A follow-up focused on policy optimization, which improves sample efficiency by a further 60%+, is being written up.
  • Code is open source: RHER; the paper is on arXiv: Relay Hindsight Experience Replay (first author, accepted at Neurocomputing, a Zone-2 Top journal).

Certificates and Project Experience

  • Certificates: CET-4/CET-6 English, Level-3 Psychological Counselor
  • Projects:
    • Open-sourced ChatPaper: 16.0K stars, #5 on GitHub trending for three consecutive days, 600K monthly active users, 70K registered users.
    • Open-sourced ChatOpenReview: 1. database-backed review-response assistance via langchain; 2. model SFT with deepspeed; 3. search-engine-assisted reviewing over a global literature corpus.
    • Deep RL algorithms wrapped on top of Spinning Up: DQN, DDPG, TD3, SAC, PPO, PER, HER, etc. (DRLib, 438 stars).
    • Development of a motion-control system for a competitive table-tennis robot based on reinforcement learning (industry contract, 1.48M RMB; responsible for the simulation system and the RL algorithms).
    • Ongoing development of an LLM+Robot skill library; the library itself and an initial validation are complete.
    • ChatSensitiveWords: flexible sensitive-word detection combining a keyword lexicon with an LLM, balancing accuracy and speed.

Self-Assessment

  • Proficient in classic deep RL algorithms, with extensive experience building both robot simulations and real-world systems.
  • Good character, candid and reliable. Strong engineering skills and a solid programming foundation; I have not systematically practiced algorithm puzzles, but can develop rapidly with GPT-4.
  • Good at transferring insights from human learning to AI; strong research skills, rich teamwork experience, and a love of open source, technical sharing, and teaching.
  • Hoping to apply large models' text capabilities to AI-assisted higher education, LLM+RL fine-tuning, or other LLM applications.

chatpaper's People

Contributors

binary-husky, circlestarzero, housiyuan2001, jaseon-q, jessytsu1, kaixindelele, masteryip, mrpeterjin, nishiwen1214, red-tie, uppez, wangrongsheng, xuzhougeng


chatpaper's Issues

Bug in argparse parameter parsing

Running chat_paper.py from the command line with --sort arxiv.SortCriterion.LastUpdatedDate raises:

AttributeError: 'str' object has no attribute 'value'

Inspection shows that chat_paper.py parses the --sort parameter with:

parser.add_argument("--sort", default=arxiv.SortCriterion.Relevance, help="another is arxiv.SortCriterion.LastUpdatedDate")

This is probably because no type is passed to parser.add_argument, so Python parses arxiv.SortCriterion.LastUpdatedDate as a plain str, which is not what the code expects. Since arxiv.SortCriterion is an enum, the type parameter should be specified:

parser.add_argument("--sort", default=arxiv.SortCriterion.Relevance, type=arxiv.SortCriterion, help="another is arxiv.SortCriterion.LastUpdatedDate")

and the enum's value passed on the command line (--sort lastUpdatedDate) for it to run correctly.
Is this a bug?
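The proposed fix can be sketched in a self-contained way with a stand-in enum (the real code would use arxiv.SortCriterion; the member values here are illustrative). Passing the enum class as type makes argparse convert the raw string into an enum member by value lookup:

```python
import argparse
import enum

# Stand-in for arxiv.SortCriterion; the member values mirror the strings
# the command line would pass (illustrative only).
class SortCriterion(enum.Enum):
    Relevance = "relevance"
    LastUpdatedDate = "lastUpdatedDate"

parser = argparse.ArgumentParser()
# With type=SortCriterion, argparse calls SortCriterion("lastUpdatedDate"),
# which looks the member up by value, so args.sort.value works downstream.
parser.add_argument("--sort", default=SortCriterion.Relevance,
                    type=SortCriterion,
                    help="e.g. --sort lastUpdatedDate")

args = parser.parse_args(["--sort", "lastUpdatedDate"])
print(args.sort)  # SortCriterion.LastUpdatedDate
```

Note that the default stays an enum member either way, so omitting --sort keeps working.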

Error on Mac M1

Running on a Mac M1 fails; the tool is unusable.

Command: python chat_paper.py --pdf_path ./test.pdf
Output:

Key word: reinforcement learning
Query: all: ChatGPT robot
Sort: SortCriterion.Relevance
Traceback (most recent call last):
  File "/Users/dx/workdir/software/ChatPaper/chat_paper.py", line 412, in <module>
    main(args=args)
  File "/Users/dx/workdir/software/ChatPaper/chat_paper.py", line 382, in main
    paper_list = [Paper(path=args.pdf_path)]
  File "/Users/dx/workdir/software/ChatPaper/get_paper_from_pdf.py", line 14, in __init__
    self.title = self.get_title()
  File "/Users/dx/workdir/software/ChatPaper/get_paper_from_pdf.py", line 121, in get_title
    font_size = block["lines"][0]["spans"][0]["size"]  # font size of the first span of the first line
IndexError: list index out of range

Is there any way to fix this?
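One possible guard for get_title, sketched under the assumption that the IndexError comes from blocks with no text spans (e.g. image-only blocks in PyMuPDF's dict output); the block layout below mimics that structure:

```python
def first_span_size(block, default=0.0):
    # Image-only PDF blocks have empty "lines"/"spans" entries, which is
    # what raised the IndexError; fall back to a default font size instead.
    try:
        return block["lines"][0]["spans"][0]["size"]
    except (KeyError, IndexError):
        return default

text_block = {"lines": [{"spans": [{"size": 23.9}]}]}
image_block = {"lines": []}  # no spans at all

print(first_span_size(text_block))   # 23.9
print(first_span_size(image_block))  # 0.0
```

The title-detection loop could then simply skip blocks whose size comes back as the default.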

What to do when the paper is too long

openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4256 tokens. Please reduce the length of the messages.
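Until the code supports chunking, one workaround is to clip the section text to a rough token budget before sending it. A sketch using a crude characters-per-token heuristic (tiktoken would count exactly; the ratio of ~4 characters per token is an assumption that holds roughly for English text):

```python
def clip_to_token_budget(text, max_tokens=3500, chars_per_token=4):
    # Leave headroom below the 4097-token context limit for the prompt
    # and the completion. ~4 chars/token is a rough English heuristic.
    budget = max_tokens * chars_per_token
    return text if len(text) <= budget else text[:budget]

clipped = clip_to_token_budget("x" * 20000, max_tokens=1000)
print(len(clipped))  # 4000
```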

Bundle the static assets inside the project

I set up a free public instance, but most regions in mainland China cannot load Cloudflare, and fonts.googleapis.com loads extremely slowly. Please consider bundling the static assets inside the project.

Papers without an Introduction section fail with an Introduction error.

Traceback (most recent call last):
  File "C:\Users\admin\Documents\GitHub\ChatPaper\chat_paper.py", line 471, in <module>
    main(args=args)
  File "C:\Users\admin\Documents\GitHub\ChatPaper\chat_paper.py", line 436, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 17, in __init__
    self.parse_pdf()
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 33, in parse_pdf
    self.section_text_dict.update({"paper_info": self.get_paper_info()})
  File "C:\Users\admin\Documents\GitHub\ChatPaper\get_paper_from_pdf.py", line 42, in get_paper_info
    introduction_text = self.section_text_dict['Introduction']
KeyError: 'Introduction'
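A defensive rewrite of the lookup in get_paper_info could be sketched like this: use .get() and fall back to the first parsed section when the paper has no heading literally named "Introduction" (the fallback choice is an assumption, not the project's current behavior):

```python
def get_introduction(section_text_dict):
    # Not every paper titles its opening section "Introduction";
    # fall back to the first parsed section instead of raising KeyError.
    intro = section_text_dict.get("Introduction", "")
    if not intro and section_text_dict:
        intro = next(iter(section_text_dict.values()))
    return intro

print(get_introduction({"Overview": "This paper studies..."}))  # This paper studies...
```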

UI

Any interest in building a UI for this? (o゜▽゜)o☆

[Bug] OSError: cannot write mode RGBA as JPEG

Traceback (most recent call last):
  File "D:\Anaconda\lib\site-packages\PIL\JpegImagePlugin.py", line 643, in _save
    rawmode = RAWMODE[im.mode]
KeyError: 'RGBA'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".\chat_paper.py", line 415, in <module>
    main(args=args)
  File ".\chat_paper.py", line 397, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File ".\chat_paper.py", line 173, in summary_with_chat
    first_image, ext = paper.get_image_path()
  File "E:\project\ChatPaper\get_paper_from_pdf.py", line 85, in get_image_path
    image.save(open(im_path, "wb"))
  File "D:\Anaconda\lib\site-packages\PIL\Image.py", line 2431, in save
    save_handler(self, fp, filename)
  File "D:\Anaconda\lib\site-packages\PIL\JpegImagePlugin.py", line 646, in _save
    raise OSError(msg) from e
OSError: cannot write mode RGBA as JPEG
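The usual fix is to drop the alpha channel before saving, since JPEG cannot store RGBA. A self-contained Pillow sketch of what get_image_path could do before image.save (the helper name is hypothetical):

```python
from io import BytesIO
from PIL import Image

def save_as_jpeg(image, fp):
    # JPEG has no alpha channel; convert RGBA/LA/P images to RGB first,
    # otherwise Pillow raises "cannot write mode RGBA as JPEG".
    if image.mode in ("RGBA", "LA", "P"):
        image = image.convert("RGB")
    image.save(fp, format="JPEG")

img = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
buf = BytesIO()
save_as_jpeg(img, buf)  # no OSError now
```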

Hello, I tried to batch-read local files, but got an error

Traceback (most recent call last):
    main(args=args)
  File "chat_paper.py", line 433, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "H:\学习\ChatPaper-main\get_paper_from_pdf.py", line 15, in __init__
    self.pdf = fitz.open(self.path)  # the PDF document
  File "D:\software\Anaconda3\lib\site-packages\fitz\fitz.py", line 3962, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
fitz.fitz.FileDataError: cannot open broken document

Summarizing papers works, but fitz cannot open the PDF; the environment seems fine.
Downloading papers and batch-summarizing them works; only local summarization fails.

[Feature Request] Integrate the Semantic Scholar API for retrieving papers

A small suggestion: the mismatch between the Query and the actual search results might be solved by introducing the Semantic Scholar API; see the documentation.

  1. Semantic Scholar's search is stronger than arXiv's built-in search (a personal impression, not rigorously tested)
  2. Its broader data sources would solve the problem of papers not being findable on arXiv
  3. The Semantic Scholar API is cheap to integrate
  4. The API also offers author-based and citation-analysis features, leaving plenty of room for further extension

Token length exceeds 4097

% python chat_paper.py --query "all: causal prompt learning" --filter_keys "causal prompt learning" --max_results 5 --language en

Key word: reinforcement learning
Query: all: causal prompt learning
Sort: SortCriterion.Relevance
all search:
0 Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt 2022-05-23 07:51:15+00:00
1 IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach 2022-10-14 20:47:37+00:00
2 Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models 2022-10-19 19:13:07+00:00
3 Causal Intervention-based Prompt Debiasing for Event Argument Extraction 2022-10-04 12:32:00+00:00
4 Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring 2020-08-04 10:17:38+00:00
filter_keys: causal prompt learning
Number of papers remaining after filtering:
filter_results: 1
filter_papers:
0 Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt 2022-05-23 07:51:15+00:00
All_paper: 1
paper_path: ./pdf_files/all causal prompt learni-2023-03-21-08/Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt.pdf
section_page_dict {'Abstract': 0, 'Introduction': 0, 'Related Work': 1, 'Methodology': 4, 'Method': 8, 'Experiments': 6, 'Conclusion': 7, 'References': 7}
0 Abstract 0
1 Introduction 0
start_page, end_page: 0 1
2 Related Work 1
start_page, end_page: 1 4
3 Methodology 4
start_page, end_page: 4 8
4 Method 8
start_page, end_page: 8 6
5 Experiments 6
start_page, end_page: 6 7
6 Conclusion 7
start_page, end_page: 7 7
7 References 7
start_page, end_page: 7 13
summary_result:

  1. Title: Supporting Vision-Language Model Inference with Causality-pruning Knowledge Prompt (Chinese translation: 支持因果削减知识提示的视觉语言模型推理)

  2. Authors: Jiangmeng Li, Wenyi Mo, Wenwen Qiang, Bing Su, and Changwen Zheng

  3. Affiliation: Institute of Software Chinese Academy of Sciences, Beijing, China (for the first, third, and fifth authors); Renmin University of China, Beijing, China (for the second and fourth authors)

  4. Keywords: multi-modal, vision-language model, prompt engineering, causality, knowledge graph, ontology

  5. Urls: Paper: http://arxiv.org/abs/2205.11100v1, Github: None

  6. Summary:

  • (1): This paper focuses on improving the transferability of pre-trained vision-language models to downstream tasks in a zero-shot manner.

  • (2): Previous works explored generating fixed or learnable prompts to reduce the performance gap between tasks in the training and test phases. However, existing prompt methods do not explore the semantic information of textual labels, and manually constructing prompts with rich semantic information requires domain expertise and is time-consuming. To address this issue, the authors propose the Causality-pruning Knowledge Prompt (CapKP), which retrieves ontological knowledge graphs by treating textual labels as queries and introduces causality-pruning to refine the derived semantic information.

  • (3): The authors conduct extensive evaluations to demonstrate the effectiveness of CapKP in adapting pre-trained vision-language models to downstream image recognition. CapKP outperforms manual-prompt and learnable-prompt methods, achieving superior domain generalization compared to benchmark approaches.

  • (4): The experimental results show that CapKP achieved an improvement of 12.51% and 1.39% on average compared to manual-prompt and learnable-prompt methods, respectively, with 8 shots. The performance supports the effectiveness of CapKP in improving the transferability of pre-trained vision-language models in a zero-shot manner.
    prompt_token_used: 2279 completion_token_used: 429 total_token_used: 2708
    response_time: 16.399 s
Traceback (most recent call last):
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 469, in <module>
    main(args=args)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 448, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 208, in summary_with_chat
    chat_method_text = self.chat_method(text=text)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/jiangwenzhao/Documents/GitHub/ChatPaper/chat_paper.py", line 324, in chat_method
    response = openai.ChatCompletion.create(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "/Users/jiangwenzhao/opt/anaconda3/envs/chatgpt/lib/python3.9/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4132 tokens. Please reduce the length of the messages.

How should the code be modified to fix this? It seems the maximum context for openai.ChatCompletion.create() is 4097 tokens.
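A common workaround is map-reduce style chunking: split the section text into pieces that each fit the context window, summarize each piece, then summarize the concatenated partial summaries. A sketch with a placeholder summarize function (in the real code that call would be openai.ChatCompletion.create; the chunk sizes are assumptions):

```python
def split_into_chunks(text, chunk_chars=6000, overlap=200):
    # Split on a character budget with a small overlap so a sentence cut
    # at a boundary still appears whole in one of the neighbouring chunks.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        start += chunk_chars - overlap
    return chunks

def summarize_long_text(text, summarize):
    # `summarize` is a placeholder for the model call; map each chunk,
    # then reduce the partial summaries with one final call.
    parts = [summarize(c) for c in split_into_chunks(text)]
    return parts[0] if len(parts) == 1 else summarize("\n".join(parts))
```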

Some questions about non-arXiv papers

Thanks for the project. A few questions:

(1) Does it support interpreting local, non-arXiv papers?
(2) What is the maximum number of pages supported?
(3) How should non-standard section titles be handled? (A typical arXiv paper has an "Experimental" section, but if a paper titles it "Experimental results and analysis", do we need to add "Experimental results and analysis" to the title list in the code?)

Thanks~
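For question (3), rather than enumerating every variant, the heading lookup could match by prefix. A sketch (the canonical section list here is an assumption mirroring the keys the parser appears to use, not the project's actual list):

```python
CANONICAL_SECTIONS = ("Abstract", "Introduction", "Related Work",
                      "Methodology", "Method", "Experiment",
                      "Conclusion", "References")

def match_section(heading, known_sections=CANONICAL_SECTIONS):
    # Map a non-standard heading like "Experimental results and analysis"
    # to a canonical section name by prefix matching instead of equality.
    h = heading.strip().lower()
    for name in known_sections:
        if h.startswith(name.lower()):
            return name
    return None

print(match_section("Experimental Results and Analysis"))  # Experiment
```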

error executing the example

After successful installation, the example is throwing the following error:

python chat_paper.py --query "chatgpt robot" --filter_keys "chatgpt robot" --max_results 1

Key word: learning reinforcement
Query: chatgpt robot
Sort: Relevance
all search:
Traceback (most recent call last):
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 420, in <module>
    main(args=args)
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 399, in main
    filter_results = reader1.filter_arxiv(max_results=args.max_results)
  File "/Users/paco/Downloads/ChatPaper-main/chat_paper.py", line 51, in filter_arxiv
    for index, result in enumerate(search.results()):
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 591, in results
    page_url = self._format_url(search, offset, page_size)
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 627, in _format_url
    url_args = search._url_args()
  File "/Users/paco/opt/miniconda3/lib/python3.9/site-packages/arxiv/arxiv.py", line 481, in _url_args
    "sortBy": self.sort_by.value,
AttributeError: 'str' object has no attribute 'value'

[feature request] integrating flask app to provide an interface

This is already a great program for quickly catching up on and digesting ongoing research, especially for non-native English speakers. An interface, such as a Flask app, could benefit newcomers even further. Could you please consider this proposal? Thanks.

Problem for running on M1 macbook

I got the errors below when running the demo on my M1 Pro MacBook with python chat_paper.py --pdf_path "demo.pdf":

Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

Local PDF summarization: running chat_paper.py raises an error

python3 chat_paper.py --pdf_path "demo.pdf"

Key word: reinforcement learning
Query: all: ChatGPT robot
Sort: SortCriterion.Relevance
max_font_sizes [9.962599754333496, 9.962599754333496, 9.962599754333496, 9.962599754333496, 9.962599754333496, 10.958900451660156, 10.958900451660156, 23.91029930114746, 23.91029930114746, 29.88789939880371]
Traceback (most recent call last):
  File "/Users/fan/work/gitlab/ChatPaper/chat_paper.py", line 468, in <module>
    main(args=args)
  File "/Users/fan/work/gitlab/ChatPaper/chat_paper.py", line 426, in main
    paper_list.append(Paper(path=args.pdf_path))
  File "/Users/fan/work/gitlab/ChatPaper/get_paper_from_pdf.py", line 16, in __init__
    self.title = self.get_title()
  File "/Users/fan/work/gitlab/ChatPaper/get_paper_from_pdf.py", line 160, in get_title
    self.title_page = page_index
NameError: name 'page_index' is not defined

A strange bug

response_time: 9.751 s
Traceback (most recent call last):
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 468, in <module>
    main(args=args)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 436, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 244, in summary_with_chat
    self.export_to_markdown("\n".join(htmls), file_name=file_name, mode=mode)
  File "F:\chatpaper\chatpapernew\ChatPaper-main\chat_paper.py", line 396, in export_to_markdown
    with open(file_name, mode, encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './export\2023-03-20-21-electronics Review Kuruva Lakshmanna 1. Introduction Applications based on smartphones, sensors and actuators are becoming more and […] References.md'

[Feature Request]

I would like to thank you for your dedication and effort in keeping this project up and running.

I would like to propose a feature that lets users download PDF files directly from Sci-Hub. As you know, Sci-Hub provides free access to millions of academic articles; adding this would greatly enhance the software's usability.

Filename-too-long error: in a local deployment, after uploading, reading, and processing correctly, the entire summary text is used as the output .md file name, which makes the system error out

All other PDF files I tested work; so far only this one file fails. The file is "Integrated_bioprocess_for_conversion_of_gaseoussubstrates_to_liquids.pdf", downloaded from "https://www.pnas.org/doi/10.1073/pnas.1516867113".

The error is:

Traceback (most recent call last):
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 467, in <module>
    main(args=args)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 435, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 243, in summary_with_chat
    self.export_to_markdown("\n".join(htmls), file_name=file_name, mode=mode)
  File "/home/cyril/git/ChatPaper/chat_paper.py", line 395, in export_to_markdown
    with open(file_name, mode, encoding="utf-8") as f:
OSError: [Errno 36] File name too long: './export/2023-03-15-07-Integrated bioprocess for conversion of gaseous Peng Hu […] Materials and Methods.md'
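A guard for export_to_markdown could sanitize and cap the title-derived file name, so a mis-extracted multi-paragraph "title" cannot exceed the filesystem's name-length limit (~255 bytes on most Linux filesystems; the 80-character cap below is an arbitrary choice, and safe_filename is a hypothetical helper):

```python
import re

def safe_filename(title, max_len=80, ext=".md"):
    # Drop characters that are illegal or awkward in file names,
    # collapse whitespace, and cap the length before adding the extension.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]+', " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:max_len].rstrip() + ext

name = safe_filename("Integrated bioprocess for conversion of gaseous " * 20)
print(len(name) <= 80 + len(".md"))  # True
```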

Error after private deployment

With the private deployment, an error occurs after uploading a PDF. The output is below; is any additional configuration required?
im_path: image.png
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/routes.py", line 384, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1032, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 844, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/root/ChatPaper/deploy/Private/app.py", line 619, in upload_pdf
    sum_info = reader.summary_with_chat(paper_list=paper_list)
  File "/root/ChatPaper/deploy/Private/app.py", line 446, in summary_with_chat
    summary_text += "\n\n" + chat_summary_text
TypeError: can only concatenate str (not "tuple") to str
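The message suggests summary_with_chat receives a (text, tokens)-style tuple from the chat call (an assumption inferred from the error alone). A defensive sketch that accepts either shape, with a hypothetical extract_text helper:

```python
def extract_text(chat_result):
    # Some code paths return just the summary string, others a tuple such
    # as (summary, token_count); keep only the text part either way.
    return chat_result[0] if isinstance(chat_result, tuple) else chat_result

summary_text = ""
for chat_result in [("chunk one", 120), "chunk two"]:
    summary_text += "\n\n" + extract_text(chat_result)
print(summary_text.strip())
```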

Batch-summarizing a local folder: running chat_paper.py, e.g. python chat_paper.py --pdf_path "your_absolute_path", raises an error

The error is:
Traceback (most recent call last):
  File "F:\chatpaper\ChatPaper\chat_paper.py", line 468, in <module>
    main(args=args)
  File "F:\chatpaper\ChatPaper\chat_paper.py", line 433, in main
    paper_list.append(Paper(path=os.path.join(root, filename)))
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 17, in __init__
    self.parse_pdf()
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 33, in parse_pdf
    self.section_text_dict.update({"paper_info": self.get_paper_info()})
  File "F:\chatpaper\ChatPaper\get_paper_from_pdf.py", line 42, in get_paper_info
    introduction_text = self.section_text_dict['Introduction']
KeyError: 'Introduction'

The paper-title extraction is flawed

I tried the tool locally, testing it on several papers. An obvious problem is that the extracted title and other basic information are wrong. On inspection, when a title spans multiple lines, only the first line appears intact in the result, followed by a string of completely different but plausible-looking text (classic AI confabulation). The same issue corrupts the title, author information, and so on. My guess is that the PDF parsing does not account for multi-line titles (I have not verified this against the source code), which then makes the subsequent analysis of the paper inaccurate.

Some possible bugs

Thanks for your contribution! It is really convenient for researchers to quickly read the paper.

I noticed two bugs when using your code to search for the paper "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection". I am not sure whether they only occur on my machine, so perhaps you can verify them first.

The first problem is that the code seems unable to handle "-". I have no idea why. Searching "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection" returns no results, but searching "Static Dynamic Co Teaching for Class Incremental 3D Object Detection" returns the correct paper.

The second problem is that the author list generated by ChatGPT is not correct. The first time it told me the authors are "Wenxuan Wang, Xiangyu Chen, Shaoshuai Shi, Kaiqi Huang", and the second time "Fangyu Liu, Zhiliang Ma, Junlan Yang, Jingyi Yu, Jinyong Jeong, Chen Feng, Rongrong Ji", both wrong. The correct authors appear in arXiv's own search results, so maybe we should trust arXiv's results more.
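For the first problem, a client-side workaround is to normalize the query before sending it, since hyphenated titles appear to defeat the search (behavior inferred from this report, not from the arXiv API documentation; normalize_title_query is a hypothetical helper):

```python
def normalize_title_query(title):
    # Replace hyphens with spaces so "Static-Dynamic Co-Teaching ..."
    # matches the way "Static Dynamic Co Teaching ..." does.
    return " ".join(title.replace("-", " ").split())

print(normalize_title_query(
    "Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection"))
# Static Dynamic Co Teaching for Class Incremental 3D Object Detection
```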

OpenAI API - Access Terminated

Has anyone else had their account terminated? I set max_results to 20 and used an overseas cloud server with a dynamic IP, and my account was banned. Not sure whether it was caused by making too many API calls at once.

[Bug] openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4218 tokens. Please reduce the length of the messages.

Traceback (most recent call last):
  File ".\chat_paper.py", line 412, in <module>
    main(args=args)
  File ".\chat_paper.py", line 394, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File ".\chat_paper.py", line 200, in summary_with_chat
    chat_method_text = self.chat_method(text=text)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "D:\Anaconda\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "D:\Anaconda\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "D:\Anaconda\lib\site-packages\tenacity\__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File ".\chat_paper.py", line 284, in chat_method
    response = openai.ChatCompletion.create(
  File "D:\Anaconda\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "D:\Anaconda\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 619, in _interpret_response
    self._interpret_response_line(
  File "D:\Anaconda\lib\site-packages\openai\api_requestor.py", line 679, in _interpret_response_line
    raise self.handle_error_response(
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4218 tokens. Please reduce the length of the messages.

Error running python chat_paper.py --pdf_path "demo.pdf"

Running python chat_paper.py --pdf_path "demo.pdf" produces the following error:
Traceback (most recent call last):
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 468, in
main(args=args)
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 416, in main
reader1 = Reader(key_word=args.key_word,
File "D:\chatpdf\ChatPaper-main\chat_paper.py", line 47, in init
self.encoding = tiktoken.get_encoding("gpt2")
File "D:\anaconda\lib\site-packages\tiktoken\registry.py", line 63, in get_encoding
enc = Encoding(**constructor())
File "D:\anaconda\lib\site-packages\tiktoken_ext\openai_public.py", line 11, in gpt2
mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 68, in data_gym_to_mergeable_bpe_ranks
vocab_bpe_contents = read_file_cached(vocab_bpe_file).decode()
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 41, in read_file_cached
contents = read_file(blobpath)
File "D:\anaconda\lib\site-packages\tiktoken\load.py", line 19, in read_file
return requests.get(blobpath).content
File "D:\anaconda\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "D:\anaconda\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "D:\anaconda\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "D:\anaconda\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "D:\anaconda\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='openaipublic.blob.core.windows.net', port=443): Max retries exceeded with url: /gpt-2/encodings/main/vocab.bpe (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
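This traceback shows tiktoken downloading its BPE vocabulary from Azure blob storage the first time `tiktoken.get_encoding("gpt2")` runs, and the download being cut off. Two workarounds, sketched below (the proxy address and port are examples, not part of ChatPaper): route the one-time download through a proxy that can reach `openaipublic.blob.core.windows.net`, or point tiktoken at a pre-populated local cache via the `TIKTOKEN_CACHE_DIR` environment variable that recent tiktoken versions honor.

```shell
# Either let the one-time vocab download go through a working proxy...
export HTTPS_PROXY=http://127.0.0.1:7890   # example address; use your own

# ...or reuse BPE files cached on a machine that could download them:
export TIKTOKEN_CACHE_DIR="$HOME/.cache/tiktoken"
mkdir -p "$TIKTOKEN_CACHE_DIR"
```

After the first successful run the files are cached, so the network is no longer needed for encoding.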

Max retries exceeded with url: /v1/chat/completions

After running the command

 python chat_paper.py --query "all:gravitational wave" --key_word "pulsar timing array" --filter_keys "gravitational wave pulsar" --max_results 10 

it fails saying the retry limit was exceeded. Is this a problem with my OpenAI key?

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 996, in _prepare_proxy
    conn.connect()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 815, in urlopen
    return self.urlopen(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 516, in request_raw
    result = _thread_context.session.request(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 468, in <module>
    main(args=args)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 447, in main
    reader1.summary_with_chat(paper_list=paper_list)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 176, in summary_with_chat
    chat_summary_text = self.chat_summary(text=text)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/qyq/Library/Mobile Documents/com~apple~CloudDocs/Development/ChatPaper/chat_paper.py", line 378, in chat_summary
    response = openai.ChatCompletion.create(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 216, in request
    result = self.request_raw(
  File "/Users/qyq/Library/Python/3.10/lib/python/site-packages/openai/api_requestor.py", line 528, in request_raw
    raise error.APIConnectionError(
openai.error.APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:997)')))
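An `SSLV3_ALERT_HANDSHAKE_FAILURE` on the way to `api.openai.com` almost always means the TLS connection is being blocked or reset on the network path, not that the key is invalid (a bad key would come back as an HTTP 401 after a successful handshake). A common remedy is to route the client through a proxy; a minimal sketch, where the address is an example and not a ChatPaper default:

```python
import os

# Example proxy address; replace with one that can reach api.openai.com.
proxy = "http://127.0.0.1:7890"

# requests/urllib3 (used by the openai package) honor these variables.
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
```

With openai-python 0.x you can alternatively set `openai.proxy = proxy`. Either way, set this before `chat_paper.py` makes its first request.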

[Beginner question] Works on Windows but not in the built-in WSL?

For reading local PDFs, I can get a complete run in Windows' Git Bash (or other shells), but inside WSL it only gets as far as reading the three papers below; after that GPT seems to never return a result.

------------------paper_num: 3------------------
0 ./pdf_files/ReadMore/2206.03687.pdf
1 ./pdf_files/ReadMore/4102_polyloss_a_polynomial_expansio.pdf
2 ./pdf_files/ReadMore/BenchmarkNLU4FewShotLearning.pdf

WSL stops right here and goes no further... yet the same machine runs to completion on Windows.

This is my first time trying WSL... maybe Linux inside WSL isn't configured properly?
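A likely culprit is that the proxy which works on the Windows side is unreachable from WSL: inside WSL2, `127.0.0.1` is the Linux VM itself, and the Windows host (where a proxy client typically listens) lives at the nameserver address recorded in `/etc/resolv.conf`. A sketch, assuming a proxy listening on Windows port 7890 (adjust to your setup):

```shell
# Resolve the Windows host's IP as seen from inside WSL2.
host_ip=$(grep -m1 '^nameserver' /etc/resolv.conf | awk '{print $2}')

# Point HTTP(S) traffic at the proxy running on the Windows side.
export HTTP_PROXY="http://${host_ip}:7890"
export HTTPS_PROXY="http://${host_ip}:7890"
```

If requests then reach the API from WSL, the hang was the proxy, not ChatPaper.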

Could the title fail to be read because of some problem with the PDF?

Traceback (most recent call last):
File "chat_paper.py", line 412, in <module>
main(args=args)
File "chat_paper.py", line 382, in main
paper_list = [Paper(path=args.pdf_path)]
File "/Users/apple/Documents/workspace/python/ChatPaper/get_paper_from_pdf.py", line 14, in __init__
self.title = self.get_title()
File "/Users/apple/Documents/workspace/python/ChatPaper/get_paper_from_pdf.py", line 121, in get_title
font_size = block["lines"][0]["spans"][0]["size"]  # get the font size of the first span of the first line
IndexError: list index out of range

I was trying to read an IEEE paper; if the developers have access, the title is "Encryption-based Coordinated Volt/Var Control for Distribution Networks with Multi-Microgrids". Thanks very much!
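The crash comes from `get_title` assuming every text block returned by the PDF parser has at least one line with at least one span; blocks extracted from image-only or oddly encoded pages (common in IEEE PDFs) can be empty, so the chained `[0]` lookups raise `IndexError`. A defensive sketch of that lookup (`first_span_size` is a hypothetical helper, not a function in the repo):

```python
def first_span_size(block, default=0.0):
    """Return the font size of the first text span in a PDF text block,
    or `default` when the block contains no text spans at all."""
    lines = block.get("lines") or []
    for line in lines:
        spans = line.get("spans") or []
        if spans:
            # Same lookup as get_title, but only on a non-empty span list.
            return spans[0]["size"]
    return default
```

`get_title` could call this and skip blocks that come back with the `default` size, instead of indexing into them directly.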
