File Management System

Overview

This File Management System automates the extraction of structured data from various document formats. It summarizes and critiques PowerPoint presentations, visualizes complex data trapped in PDFs, and aims to keep outputs trustworthy and relevant to user queries. It is intended for users who need to manage and analyze large volumes of document data efficiently.

Features

  • Data Extraction: Automatically extract structured data from diverse document formats.
  • PowerPoint Processing: Generate summaries and critiques of PowerPoint presentations.
  • Data Visualization: Convert complex data from PDFs into visual representations.
  • Trustworthy Outputs: Ensure the accuracy and relevance of outputs to user queries.

Experimental Notebooks

The system includes three Jupyter notebooks that handle different file types:

  1. PowerPoint Files: pptx_data.ipynb - Handles the extraction and analysis of data from .pptx files.
  2. PDF Files: pdf_data.ipynb - Focuses on extracting and visualizing data from .pdf files.
  3. CSV Files: csv_data.ipynb - Processes and analyzes .csv files, a format favored for its reliability and ease of use.
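
As a rough illustration of the kind of processing csv_data.ipynb might perform, the sketch below summarizes the numeric columns of a CSV using only the Python standard library. The function name `summarize_csv` and the sample data are hypothetical; the notebook's actual code is not shown here and may differ (for instance, it may rely on pandas).

```python
import csv
import io
import statistics


def summarize_csv(text):
    """Return {column: {"count", "mean", "min", "max"}} for numeric cells.

    Hypothetical helper for illustration; not taken from csv_data.ipynb.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {}
    for row in reader:
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # skip empty, missing, or non-numeric cells
            columns.setdefault(name, []).append(number)
    return {
        name: {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }


# Made-up sample data: one text column, two numeric columns, one empty cell.
sample = "region,sales,units\nnorth,100.0,4\nsouth,250.0,10\neast,,6\n"
summary = summarize_csv(sample)
print(summary["sales"])
```

Columns with no numeric cells (such as `region` here) are left out of the result, and empty cells are skipped rather than counted as zero.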

Backend

The system uses a separate Django REST Framework (DRF) app to handle queries and processing for any file type. The backend exposes an API for running Retrieval-Augmented Generation (RAG) data analysis on any uploaded file, and it is designed to handle a variety of document formats. It is optional and can be integrated based on specific user needs.

The backend provides:

  1. API Integration: Users perform operations on uploaded files directly through API calls.
  2. Data Analytics: A RAG approach drives data extraction and analysis, producing insights and summaries based on the content of the files.
  3. Optional Use: The module can be enabled or disabled as required, so resources are used only when needed.
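
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the backend's implementation: the helpers `split_sentences`, `retrieve`, and `build_prompt` are hypothetical, scoring is plain word overlap rather than embeddings, and the generation step (a call to a language model) is left out.

```python
import re


def split_sentences(text):
    """Split text into sentence-sized chunks (hypothetical helper)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, k=2):
    """Return up to k chunks that share words with the query, best first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), chunk) for chunk in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query, context_chunks):
    """Assemble the text that would be passed to a generator model."""
    context = "\n---\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up document text standing in for the content of an uploaded file.
document = (
    "Quarterly revenue grew in the north region. "
    "Headcount stayed flat across all offices. "
    "Revenue in the south region declined slightly."
)
question = "How did revenue change in the south region?"
chunks = split_sentences(document)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Word overlap is used here only to keep the example dependency-free; a real retriever would more likely rank chunks by embedding similarity.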

Evaluation Report

  • CSV Files: The system performs exceptionally well with CSV files, providing reliable and efficient data processing.
  • PDF Files: While the system generally handles PDFs effectively, there are instances where the visualization code may not execute correctly, requiring the user to refine the code manually.
  • PowerPoint Files: PowerPoint handling is robust, producing detailed summaries and critiques that add value to presentation analysis. However, as with PDF files, generated code may occasionally fail to execute and need manual refinement.
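
Because generated code can fail at run time, one way to surface such failures for manual refinement is to execute the code in a guarded wrapper and capture the traceback. The sketch below assumes the generated code arrives as a Python source string; the function `run_generated_code` is hypothetical and does not come from the notebooks. Note that `exec` provides no sandboxing, so this only suits code you are willing to run locally.

```python
import traceback


def run_generated_code(source):
    """Execute a code string; return (ok, namespace, error_text).

    Hypothetical helper. exec() is not a sandbox: only run code you trust.
    """
    namespace = {}
    try:
        # compile() first so syntax errors are reported the same way
        # as run-time errors.
        exec(compile(source, "<generated>", "exec"), namespace)
    except Exception:
        # Keep the full traceback so the user can refine the code by hand.
        return False, namespace, traceback.format_exc()
    return True, namespace, ""


good = "total = sum([1, 2, 3])"
bad = "values = [1, 2, 3]\nprint(values[10])"

ok, ns, err = run_generated_code(good)
print(ok, ns["total"])

ok_bad, _, err_bad = run_generated_code(bad)
print(ok_bad)
print(err_bad.splitlines()[-1])
```

Returning the error text instead of raising lets the calling notebook cell show the failure next to the generated code, which is where the manual fix has to happen.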

Getting Started

To get started with the File Management System, clone the repository and install the required dependencies:

git clone [repository-url]
cd file_management_sys
pip install -r requirements.txt


To run any of the notebooks, use:
`jupyter notebook [notebook-name.ipynb]`

Contributors

  • omarquess