spacy-llms: augmenting nlp pipelines

Integrates spaCy components with Large Language Models (LLMs) to improve text processing, named entity recognition (NER), and summarization. Includes unit and integration tests, fixtures, and samples.

The aim is to enable NLP pipelines that combine spaCy's supervised learning or rule-based components with LLM-powered features.

installation

The installation steps below assume this configuration:

macOS/OSX, ARM/M1
conda
CPU
virtual environment
English pipeline
optimized for efficiency
for other configs, see the spaCy quickstart ⩩

Create and activate a virtual environment, install spaCy, then download the small English model and validate the installation:

terminal:

conda create -n venv
conda activate venv
conda install -c conda-forge spacy
python -m spacy download en_core_web_sm
python -m spacy validate

en_core_web_sm: a small English model trained on web text.

en_core_web_trf: a transformer-based English model; choose it when accuracy matters more than speed.

To download it:

python -m spacy download en_core_web_trf

See the spaCy download command ⩩ and the spaCy models page ⩩.

🏁 run the tests and scripts:

pytest src/test.py
python src/main.py
python src/get_top_ranked_phrases.py

features

✔︎ load_model(): loads the spaCy model and returns it, e.g. spacy.load("en_core_web_sm")

✔︎ process_text_returns_expected_tuples(nlp, text): processes text with the loaded spaCy model and returns the expected tuples, e.g. [(token, POS, dependency)]

✔︎ extract_entities_returns_expected_entity_tuples(nlp, text): identifies named entities in text and returns the expected entity tuples, e.g. [(entity, label)]

✔︎ summarize_text_returns_expected_summary(nlp, text): summarizes text by extracting its most important phrases and returns the expected summary, e.g. 'summary'

✔︎ get_top_ranked_phrases(text): extracts the top-ranked phrases from text and returns them with their ranks, e.g. [(phrase, rank)]

✔︎ @pytest.fixture

✔︎ textrank

✔︎ pytextrank

✔︎ pytest
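The first three features can be sketched as plain helpers over a spaCy pipeline. This is a minimal sketch, not the repository's code: process_text and extract_entities are hypothetical names inferred from the test names above, and the helpers use only standard spaCy attributes (token.pos_, token.dep_, doc.ents, ent.label_).

```python
import spacy
from spacy.language import Language


def load_model(name: str = "en_core_web_sm") -> Language:
    """Load an installed spaCy pipeline by package name."""
    return spacy.load(name)


def process_text(nlp: Language, text: str) -> list[tuple[str, str, str]]:
    """Return one (token text, coarse POS tag, dependency label) tuple per token."""
    doc = nlp(text)
    return [(token.text, token.pos_, token.dep_) for token in doc]


def extract_entities(nlp: Language, text: str) -> list[tuple[str, str]]:
    """Return one (entity text, entity label) tuple per named entity."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

Once en_core_web_sm is downloaded, a typical call is nlp = load_model() followed by extract_entities(nlp, butyrate_text). The entities returned depend on the model version, so tests that compare exact tuples may need updating when the model changes. A blank pipeline (spacy.blank("en")) has no tagger or parser, so POS and dependency come back as empty strings, which keeps structural tests fast.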

samples

butyrate_text

butyrate_text = """Trivia: The bacterium Faecalibacterium prausnitzii in the human gut microbiome is responsible for producing butyrate, a short-chain fatty acid.
Explanation: Faecalibacterium prausnitzii utilizes complex carbohydrates, such as dietary fiber, as its primary energy source. Through a fermentation process, it breaks down these carbohydrates into smaller molecules, including butyrate. Butyrate has beneficial effects on gut health, serving as an energy source for colon cells, promoting their growth, maintaining the gut barrier integrity, and reducing inflammation. Faecalibacterium prausnitzii's ability to produce butyrate highlights its importance in maintaining a healthy gut microbiome."""

geosynchronization_text

import pytest

@pytest.fixture
def geosynchronization_text():
    return """Trivia: The concept of geosynchronization was first postulated by Arthur C. Clarke.
Explanation: Geosynchronous orbits are orbits around Earth that have an orbital period matching Earth's rotation period.
This results in the satellite appearing stationary with respect to a point on Earth's surface. This concept is crucial in space physics and geodesy,
as it is used in various applications like communication satellites. Arthur C. Clarke, a British science fiction writer,
was the first to postulate this concept, which is why geosynchronous orbits are sometimes referred to as Clarke orbits."""
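get_top_ranked_phrases appears to rely on pytextrank (listed under features), an implementation of TextRank from Mihalcea and Tarau's "TextRank: Bringing Order into Texts". The minimal pure-Python sketch below shows the core idea on text such as the samples above: candidate words become graph nodes, words that co-occur are linked, and PageRank-style iterations score them. It is a simplification under stated assumptions: textrank_keywords and its STOPWORDS list are hypothetical, a stopword filter replaces the paper's part-of-speech filter, it ranks single words rather than phrases, and it runs a fixed number of iterations instead of testing for convergence.

```python
import re
from collections import defaultdict

# Hypothetical stopword list: a stand-in for the part-of-speech filter
# (nouns and adjectives) used in the TextRank paper. Extend it for real text.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "has", "in", "is", "it",
    "its", "of", "on", "such", "that", "the", "their", "these", "this",
    "to", "was", "which", "with",
}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank single words by TextRank: PageRank over a word co-occurrence graph.

    Returns up to top_n (word, score) pairs, highest score first.
    """
    # 1. Tokenize and keep candidate words (lowercase, not stopwords, 3+ letters).
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOPWORDS and len(w) > 2
    ]

    # 2. Undirected graph: connect words that co-occur within `window` positions
    #    of the filtered sequence (window=2 links adjacent candidates only).
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    nodes = sorted(neighbors)
    if not nodes:
        return []

    # 3. Repeat the PageRank update: each word splits its score equally among
    #    its neighbors; `damping` plays the role of the paper's d = 0.85.
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        score = {
            node: (1 - damping)
            + damping * sum(score[nb] / len(neighbors[nb]) for nb in neighbors[node])
            for node in nodes
        }

    # 4. Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

For instance, textrank_keywords(butyrate_text) returns the five highest-scoring words of that sample. pytextrank's output will differ, because it ranks phrases built from spaCy noun chunks rather than isolated words.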

roadmap

✎ optimize LLM integration
✎ extend models
✎ api development
✎ testing
✎ dockerization

contributing

To contribute, fork the repository, implement your changes, run the tests ✓, and submit a pull request. We appreciate and support collaborations 🤝

notes

💭 forgetfulness
💭 momentum
💭 extraction
💭 dependency parsing
💭 spacy evaluate
💭 ner
🤗 huggingface transformers
🦙 spacy-llm
💭 memory
💭 redis
💭 system stability

license

mit

acknowledgements

✔︎ explosion_ai 💥
✔︎ @spacy_io 🪐
✔︎ DerwenAI 🌲
✔︎ spacy-pytextrank ⩩
✔︎ {rada,tarau}@cs.unt.edu - TextRank: Bringing Order into Texts 🗄️
