
docsummarizer's Introduction

GPT 3.5/4 Powered Document Summarizer

This tool takes a text document (PDF or TXT) or a YouTube transcript and generates a concise summary using GPT-4 or GPT-3.5-turbo. It can accurately summarize hundreds of pages of text. It is built with Python and Streamlit and uses the langchain library for text processing. Although the final output is generated by either GPT-3.5-turbo or GPT-4 (the LLMs that power ChatGPT), only a small portion of the overall document is used in the prompts. Before any call is made to either LLM, the document is split into small sections, and the sections that carry most of the document's meaning are selected.
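The splitting step described above can be illustrated with a minimal sketch. It is not the project's code: the function name split_into_chunks, the word-based sizing, and the default chunk_size and overlap values are all assumptions made for illustration, and the project may well split by tokens or characters via langchain instead.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them. (Hypothetical helper;
    sizes are counted in whitespace-separated words, not model tokens.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap is a design choice rather than a requirement: it trades a little redundancy for a lower chance of losing context at chunk boundaries.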

Demo it here: https://gptdoc-summarizer.streamlit.app/

Features

  • Supports PDF and TXT file formats
  • Utilizes GPT-4 or GPT-3.5-turbo for generating summaries
  • Automatic clustering of the input document to identify key sections
  • Customizable number of clusters for the summarization process
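The clustering feature can be sketched as follows. The README does not say which embedding or clustering algorithm the project uses, so everything here is an assumption: the sketch substitutes normalized bag-of-words vectors for model embeddings and a small hand-written k-means for whatever clustering library the project relies on. The function names (embed, pick_representatives) are hypothetical.

```python
import math
from collections import Counter


def embed(chunk: str, vocab: list[str]) -> list[float]:
    """Unit-length bag-of-words vector (a stand-in for a real embedding model)."""
    counts = Counter(chunk.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_representatives(chunks: list[str], num_clusters: int, iterations: int = 10) -> list[str]:
    """Cluster chunks with k-means and return the chunk nearest each centroid."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    if not chunks:
        return []
    k = min(num_clusters, len(chunks))

    vocab = sorted({word for chunk in chunks for word in chunk.lower().split()})
    vectors = [embed(chunk, vocab) for chunk in chunks]

    # Deterministic initialization: evenly spaced chunks serve as starting centroids.
    centroids = [vectors[(i * len(vectors)) // k] for i in range(k)]

    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sq_dist(vec, centroids[j])) for vec in vectors
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [vec for vec, a in zip(vectors, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    # Keep the chunk closest to each centroid, in original document order.
    chosen = {
        min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], centroids[j]))
        for j in range(k)
    }
    return [chunks[i] for i in sorted(chosen)]
```

Under these assumptions, only the returned representative chunks would be placed in the prompts, which is how a small portion of the document can stand in for the whole. Raising num_clusters keeps more of the document at the cost of longer prompts.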

Usage

  1. Launch the Streamlit app by running streamlit run main.py
  2. Upload a document (TXT or PDF) to summarize.
  3. Enter your OpenAI API key if the free usage cap has been hit.
  4. Choose whether to use GPT-4 for the summarization (recommended, requires GPT-4 API access).
  5. Click the "Summarize" button and wait for the result.

Modules

  • main.py: Streamlit app main file
  • utils.py: Contains utility functions for document loading, token counting, and summarization
  • streamlit_app_utils.py: Contains utility functions specifically for the Streamlit app

Main Functions

  • main(): Entry point for the Streamlit app
  • process_summarize_button(): Processes the "Summarize" button click and displays the generated summary
  • validate_input(): Validates user input and displays warnings for invalid inputs
  • validate_doc_size(): Validates the document size for token limits

Utility Functions

  • doc_loader(): Loads a document from a file path
  • token_counter(): Counts the number of tokens in a text string
  • doc_to_text(): Converts a langchain Document object to a text string
  • doc_to_final_summary(): Generates the final summary for a given document
  • summary_prompt_creator(): Creates a summary prompt list for the langchain summarize chain
  • pdf_to_text(): Converts a PDF file to a text string
  • check_gpt_4(): Checks if the user has access to GPT-4
  • token_limit(): Checks if a document has more tokens than a specified maximum
  • token_minimum(): Checks if a document has more tokens than a specified minimum
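The size checks listed above can be sketched roughly as below. The real signatures and return values are not given in this README, so the sketch uses its own hypothetical names (count_tokens_approx, over_token_limit, over_token_minimum, doc_size_ok) rather than the project's. It also counts whitespace-separated words instead of running a model tokenizer, which is an approximation: real token counts are usually somewhat higher.

```python
def count_tokens_approx(text: str) -> int:
    """Approximate the token count by counting whitespace-separated words.

    Assumption: a real implementation would use the model's tokenizer.
    """
    return len(text.split())


def over_token_limit(text: str, maximum: int) -> bool:
    """Return True if the text has more tokens than `maximum`."""
    return count_tokens_approx(text) > maximum


def over_token_minimum(text: str, minimum: int) -> bool:
    """Return True if the text has more tokens than `minimum`."""
    return count_tokens_approx(text) > minimum


def doc_size_ok(text: str, minimum: int, maximum: int) -> bool:
    """A document is usable when it is above the minimum and not above the maximum."""
    return over_token_minimum(text, minimum) and not over_token_limit(text, maximum)
```

A check like doc_size_ok would run before any summarization call, so that documents too short to cluster or too long to process are rejected early with a warning.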

docsummarizer's People

Contributors

e-johnstonn
