Semantic Splitting

This tutorial is also available as a video on my YouTube channel.

Overview

This project provides a Python implementation of semantic splitting, a technique for dividing documents into semantically coherent chunks before they are passed to a language model. It is especially useful in retrieval-augmented generation (RAG), where chunk boundaries determine what context can be retrieved. By analyzing the semantic relationships within the text, it automatically identifies the best points to split a document, improving the performance and relevance of language model responses.
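To make the idea concrete, here is a minimal sketch of the "divergence" signal that semantic splitting works from. Each sentence is embedded, and the cosine distance between each pair of neighbouring sentences is measured. Large distances suggest a topic shift. The sketch assumes a naive regex sentence splitter and uses a bag-of-words `Counter` as a hypothetical stand-in for a real embedding model, so it runs with the standard library only. The function names are illustrative and are not taken from the notebook.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence: str) -> Counter[str]:
    # Hypothetical stand-in for a real embedding model:
    # a bag-of-words vector of lowercase word counts.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a: Counter[str], b: Counter[str]) -> float:
    # 1 - cosine similarity of two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    if norm == 0:
        # A sentence with no words is treated as maximally different.
        return 1.0
    return 1.0 - dot / norm


def divergence_curve(sentences: list[str]) -> list[float]:
    # distances[i] compares sentence i with sentence i + 1.
    vectors = [embed(s) for s in sentences]
    return [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


text = (
    "Cats purr when they are happy. Cats also purr when they are stressed. "
    "The stock market fell sharply today. Investors sold stock in every sector."
)
distances = divergence_curve(split_sentences(text))
print([round(d, 2) for d in distances])  # [0.23, 1.0, 0.83]
```

The largest distance falls at the switch from cats to finance. The two finance sentences also score high because the bag-of-words stand-in only rewards shared words. A real embedding model would likely place them much closer together.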

Features

  • Automated semantic analysis for intelligent document splitting.
  • Easy integration with popular language models like GPT-4.
  • Complete Python notebook for end-to-end implementation.
  • Embedding calculations and divergence plots to guide segmentation.
  • Peak detection for identifying precise splitting points.

Getting Started

To start using the tutorial, clone this repository and install the required dependencies with Poetry:

git clone repo
cd repo
poetry install

Prerequisites

Ensure you have the following installed on your system:

  • Python 3.12 or later
  • Poetry for dependency management

Tutorial Structure

The tutorial is divided into executable Jupyter notebooks, each focusing on a different aspect of semantic splitting:

  1. 1_semantic_splitting.ipynb: Semantic splitting by example.

Support and Contribution

For questions, support, or to contribute to this tutorial, please open an issue or pull request on the GitHub repository. I welcome contributions that help enhance and clarify the content.

Contributors

  • jimzer