patmejia / spacy-llm
a fusion of spacy's supervised-learning and rule-based components with spacy-llm, applied to text processing, entity extraction & summaries
License: MIT License
python -m venv .env
source .env/bin/activate # Unix/Linux/Mac
.env\Scripts\activate.bat # Windows
conda activate spacy-llm
install the spacy-llm package using conda:
conda install spacy-llm
python -m spacy validate
select the interpreter: cmd + p, then type > Python: Select interpreter, then press return
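to confirm that the selected interpreter is the one from the environment, a quick check can be run from the editor's terminal. this is a minimal sketch using only the standard library; the function name environment_report is hypothetical, and the default package list assumes the import names spacy, spacy_llm, and pytextrank:

```python
import importlib.util
import sys


def environment_report(packages=("spacy", "spacy_llm", "pytextrank")):
    """Describe the running interpreter and which packages it can import."""
    return {
        "interpreter": sys.executable,
        # inside a venv, sys.prefix differs from sys.base_prefix;
        # a conda environment is not a venv, so this flag stays False there
        "in_venv": sys.prefix != sys.base_prefix,
        # find_spec returns None when a top-level package cannot be found
        "packages": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = environment_report()
    print("interpreter:", report["interpreter"])
    print("in venv:", report["in_venv"])
    for name, found in report["packages"].items():
        print(f"{name}: {'found' if found else 'missing'}")
```

if the interpreter path does not point into .env (or the conda environment), the editor is likely still using a different python.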
AttributeError: module 'pytextrank' has no attribute 'TextRank'
run:
def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]
omitting the if statement risks errors: the script won't check whether textrank is already present in the pipeline, and spacy raises a ValueError when a component is added under a name that already exists.
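the guard can also be factored into a small helper so that every call site gets the same check. a minimal sketch, assuming only that the nlp object exposes pipe_names and add_pipe as a spacy Language object does; the name ensure_pipe is hypothetical and not part of spacy or pytextrank:

```python
def ensure_pipe(nlp, name):
    """Add the component `name` to `nlp` only if it is not already present.

    Returns True when the component was added, False when it already existed.
    """
    if name in nlp.pipe_names:
        return False
    nlp.add_pipe(name)
    return True


# usage with spacy (assumes spacy, pytextrank and en_core_web_sm are installed):
#   import spacy
#   import pytextrank  # registers the "textrank" factory
#   nlp = spacy.load("en_core_web_sm")
#   ensure_pipe(nlp, "textrank")  # adds the component
#   ensure_pipe(nlp, "textrank")  # second call is a no-op instead of an error
```

returning a boolean keeps the helper easy to test and lets the caller log whether the pipeline was changed.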
AttributeError: module 'pytextrank' has no attribute 'TextRank'
step_1: check the pytextrank installation
pip list | grep pytextrank
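pip list | grep needs a unix-style shell. the same check can be sketched in python with importlib.metadata (standard library, python 3.8+); the helper name installed_version is hypothetical. knowing the exact version also helps when comparing against the pytextrank release notes:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("spacy", "pytextrank"):
        print(name, installed_version(name) or "not installed")
```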
step_2: replace:
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
with:
nlp.add_pipe("textrank")
updated code:
import pytextrank  # importing registers the "textrank" factory with spacy

def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]
spacy pipeline: sequence of processing steps (tokenization, POS tagging, NER).
the incorrect code instantiates pytextrank.TextRank() manually, then attempts to add it to the pipeline:
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
correct code:
nlp.add_pipe("textrank")
this adds the textrank component by its registered name, so spacy creates and registers it correctly.
adding textrank to the spacy pipeline registers its methods and attributes, and allows access via ._ on documents (e.g., doc._.textrank.summary()).
module 'pytextrank' has no attribute 'parse_doc'
a parser is often a necessary component in an NLP pipeline.
it can be added to the pipeline alongside PyTextRank.
since:
the error message indicates that the parse_doc function is not found in the pytextrank module, potentially due to changes in the pytextrank library: the function may have been removed, or may not exist in the installed version.
do instead:
load a spacy model that includes a parser, and add pytextrank to its pipeline.
e.g. the small english spacy model en_core_web_sm tokenizes the text before parsing it.
import spacy
import pytextrank  # importing registers the "textrank" factory with spacy


def get_top_ranked_phrases(text):
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank")
    doc = nlp(text)
    top_phrases = []
    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })
    return top_phrases


sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'
top_phrases = get_top_ranked_phrases(sample_text)
for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])
output:
code notes:
- load the small english spacy model
- add pytextrank to the pipeline
- store the top-ranked phrases
- examine the top-ranked phrases in the document
- print the top-ranked phrases
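to save the phrases rather than print them, they can be written out as json. as far as the pytextrank phrase objects go, chunks holds spacy spans, which json cannot serialize directly, so each chunk is converted to a string first. a minimal sketch, assuming phrase objects that expose text, rank, count and chunks as in the code above; phrases_to_json and its limit parameter are hypothetical names:

```python
import json


def phrases_to_json(phrases, limit=10):
    """Serialize up to `limit` ranked phrases to a JSON string."""
    records = [
        {
            "text": phrase.text,
            "rank": float(phrase.rank),
            "count": int(phrase.count),
            # spans are not JSON-serializable, so keep only their text
            "chunks": [str(chunk) for chunk in phrase.chunks],
        }
        for phrase in list(phrases)[:limit]
    ]
    return json.dumps(records, indent=2)


# usage (assumes the pipeline from get_top_ranked_phrases):
#   doc = nlp(sample_text)
#   print(phrases_to_json(doc._.phrases, limit=5))
```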
link to repo: https://github.com/patmejia/spacy-llm
-Paco Nathan
-DerwenAI
-Victoria Stuart
-spacy-pytextrank
-textrank: bringing order into text
-keywords and sentence extraction with textrank (pytextrank)
-module 'pytextrank' has no attribute 'parse_doc' (in Chinese)
-module-pytextrank-has-no-attribute-parse-doc
-scattertext/issues/92