spacy-llm's Issues

Import "spacy" could not be resolved from source Pylance(reportMissingModuleSource)

Debug:

Terminal

  1. Create a virtual environment:
python -m venv .env
  2. Activate the virtual environment:
source .env/bin/activate  # Unix/Linux/Mac
.env\Scripts\activate.bat  # Windows
  3. Or activate the conda environment (if not already active):
conda activate spacy-llm
  4. Install the spacy-llm package using conda:
conda install spacy-llm
  5. Validate the installation:
python -m spacy validate

In VS Code:

  • press Cmd + P

  • type > Python: Select Interpreter, then press Return

  • select interpreter at workspace level
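A quick way to confirm that the selected interpreter is the one where spacy was installed is to run a short check from the VS Code terminal. The sketch below uses only the standard library; describe_environment is a hypothetical helper name, not part of spaCy or Pylance.

```python
import importlib.util
import sys


def describe_environment(module_name="spacy"):
    """Report which interpreter is running and whether it can locate a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter path printed here is not inside .env (or the conda environment), VS Code is probably still pointing at a different Python, which would explain the Pylance warning.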

AttributeError: module 'pytextrank' has no attribute 'TextRank'

reproduce err:

run:

def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]

keep the if statement: it adds textrank only when the name is missing from nlp.pipe_names, so doc._.textrank is available and the component is not added a second time on repeated calls.

error:

AttributeError: module 'pytextrank' has no attribute 'TextRank'

fix:

step_1

check pytextrank installation

pip list | grep pytextrank
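The same check can be done from Python, which also works on systems without grep. This is a minimal sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("spacy", "pytextrank"):
        print(name, installed_version(name) or "not installed")
```

Knowing the version matters here because the fix in step_2 assumes a pytextrank release that registers the "textrank" factory by name.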

step_2

replace:

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

with:

nlp.add_pipe("textrank")

updated code:

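import pytextrank  # importing registers the "textrank" factory with spacy

def summarize_text_returns_expected_summary(nlp, text):
    # process_text is assumed to be defined elsewhere in the project
    doc = process_text(nlp, text)
    # add textrank only once, by its registered name
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    # re-run the pipeline so the textrank component processes the text
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]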

why?

a spacy pipeline is a sequence of processing steps (tokenization, POS tagging, NER).

the incorrect code instantiates pytextrank.TextRank() by hand, then tries to add its PipelineComponent to the pipeline. that is the older pytextrank 2.x style; newer pytextrank releases (3.x) do not expose a TextRank attribute, hence the AttributeError.

tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)

correct code:

nlp.add_pipe("textrank")

this asks spacy to build the component from the "textrank" factory, which import pytextrank registers, so the component is created and wired into the pipeline correctly.

once textrank is in the pipeline, its results are available through the ._ extension on documents (e.g., doc._.textrank.summary() and doc._.phrases).
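The "add it once, by name" pattern can be wrapped in a small helper. This is a sketch, not part of spaCy or pytextrank: ensure_pipe and build_pipeline are hypothetical names, and build_pipeline assumes spacy, pytextrank, and the en_core_web_sm model are installed.

```python
def ensure_pipe(nlp, name):
    """Add the named component only if the pipeline does not already have it."""
    if name not in nlp.pipe_names:
        nlp.add_pipe(name)
    return nlp


def build_pipeline(model="en_core_web_sm"):
    """Load a spaCy model and make sure textrank is in its pipeline."""
    # Imports are local so ensure_pipe can be used without spaCy installed.
    import spacy
    import pytextrank  # noqa: F401  (importing registers the "textrank" factory)

    nlp = spacy.load(model)
    ensure_pipe(nlp, "textrank")
    ensure_pipe(nlp, "textrank")  # second call is a no-op
    return nlp
```

Calling nlp.add_pipe("textrank") twice without the guard would fail, because spacy refuses to add a component whose name is already in the pipeline.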

notes on module 'pytextrank' has no attribute 'parse_doc'

a parser is often a necessary component in an NLP pipeline.

it can be added to the pipeline alongside pytextrank.

since:

the error message indicates that the parse_doc function is not found in the pytextrank module, potentially because of changes in the pytextrank library: the function may have been removed, or may simply not exist in the installed version.
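To see which of the older names the installed module actually lacks, a generic attribute check is enough. The sketch below is standard-library only; missing_attributes is a hypothetical helper name, and the pytextrank call in the comment assumes pytextrank is installed.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the names that the imported module does not define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# Example call, assuming pytextrank is installed:
# missing_attributes("pytextrank", ["parse_doc", "TextRank"])
```

Any name this returns will raise the same kind of AttributeError when accessed as pytextrank.<name>.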

do instead:

load a spacy model that includes a parser, and add pytextrank to its pipeline.

e.g., the small English model en_core_web_sm tokenizes the text before parsing it.

example:

import spacy
import pytextrank  # importing registers the "textrank" factory with spacy

def get_top_ranked_phrases(text):
    # load the small English model (tokenizer, tagger, parser, NER)
    nlp = spacy.load("en_core_web_sm")

    # add pytextrank to the pipeline by its registered name
    nlp.add_pipe("textrank")
    doc = nlp(text)

    # collect the ranked phrases exposed on the document
    top_phrases = []

    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })

    return top_phrases

sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'

top_phrases = get_top_ranked_phrases(sample_text)

# print each phrase with its rank, count, and source chunks
for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])


code notes:

✔ load spacy small english model

✔ add pytextrank to pipeline

✔ store the top-ranked phrases

✔ examine the top-ranked phrases in the document

✔ print the top-ranked phrases
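If the ranked phrases need to be saved or passed to another tool, note that each phrase's chunks are spaCy spans, which json.dumps cannot serialize directly. The sketch below converts them to strings first; phrases_to_json is a hypothetical helper name, and it only assumes each phrase has text, rank, count, and chunks attributes as in the example above.

```python
import json


def phrases_to_json(phrases, limit=10):
    """Serialize ranked phrases to a JSON string, converting chunks to text."""
    records = [
        {
            "text": phrase.text,
            "rank": round(phrase.rank, 4),
            "count": phrase.count,
            "chunks": [str(chunk) for chunk in phrase.chunks],
        }
        for phrase in list(phrases)[:limit]
    ]
    return json.dumps(records, indent=2)
```

With a document processed as in the example, print(phrases_to_json(doc._.phrases)) would emit the first ten phrases in the order pytextrank returns them.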

link to repo: https://github.com/patmejia/spacy-llm

thanks to:

-Paco Nathan
-DerwenAI
-Victoria Stuart
-spacy-pytextrank
-textrank: bringing order into text
-keywords and sentence extraction with textrank (pytextrank)
-ๆจกๅ—'pytextrank'ๆฒกๆœ‰ๅฑžๆ€ง'parse_doc'
-module-pytextrank-has-no-attribute-parse-doc
-scattertext/issues/92
