Comments (3)
Hello, sorry for my delayed response. To make the word cloud you will need D3 word cloud, https://github.com/jasondavies/d3-cloud
As for the PHP code, here is what I have done in the past ...
use TextAnalysis\Analysis\Keywords\Rake;
use TextAnalysis\Documents\TokensDocument;
use TextAnalysis\Tokenizers\WhitespaceTokenizer;
use StopWordFactory;
use TextAnalysis\Filters;
class WordCloud
{
    const NGRAM_SIZE = 3;

    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    protected $tokenFilters = [];

    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    protected $contentFilters = [];

    /**
     * The raw keyword scores are not set up in a way that is compatible
     * with what D3 cloud expects, so scale them so that they sum to 1.
     * @param array $keywordScores
     * @return array
     */
    public function getScaledScores($keywordScores)
    {
        $total = array_sum($keywordScores);
        // nothing to scale; this also avoids dividing by zero
        if($total == 0) {
            return $keywordScores;
        }
        $scaleFactor = 1 / $total;
        array_walk($keywordScores,
            function(&$value, $key) use ($scaleFactor){
                $value = round($value * $scaleFactor, 5);
            });
        return $keywordScores;
    }

    /**
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getContentFilters()
    {
        if(empty($this->contentFilters)) {
            // replace every non-printable character with a space
            $lambdaFunc = function($word){
                return preg_replace('/[^[:print:]]/', ' ', $word);
            };
            $this->contentFilters = [
                new Filters\StripTagsFilter(),
                new Filters\LowerCaseFilter(),
                new Filters\NumbersFilter(),
                new Filters\EmailFilter(),
                new Filters\UrlFilter(),
                new Filters\PossessiveNounFilter(),
                new Filters\QuotesFilter(),
                new Filters\PunctuationFilter(),
                new Filters\CharFilter(),
                new Filters\LambdaFilter($lambdaFunc),
                new Filters\WhitespaceFilter()
            ];
        }
        return $this->contentFilters;
    }

    /**
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getTokenFilters()
    {
        if(empty($this->tokenFilters)) {
            $stopwords = StopWordFactory::get('stop-words-fox.txt');
            $this->tokenFilters = [
                new Filters\StopWordsFilter($stopwords),
            ];
        }
        return $this->tokenFilters;
    }

    /**
     * @param string $content
     * @return array
     */
    public function getKeywordScores($content)
    {
        $tokens = (new WhitespaceTokenizer())->tokenize($content);
        $tokenDoc = new TokensDocument(array_map('strval', $tokens));
        unset($tokens);
        foreach($this->getTokenFilters() as $filter) {
            $tokenDoc->applyTransformation($filter, false);
        }
        // the filtered tokens can contain null values, so bail out when
        // there are too few tokens or only empty values are left
        $filteredTokens = $tokenDoc->toArray();
        if(count($filteredTokens) < self::NGRAM_SIZE || !array_filter($filteredTokens)) {
            return [];
        }
        $rake = new Rake($tokenDoc, self::NGRAM_SIZE);
        return $rake->getKeywordScores();
    }
}
$cloud = new WordCloud();
$scores = $cloud->getKeywordScores("YOUR CONTENT GOES HERE");
// scales the scores for the D3 cloud library
$scaledScores = $cloud->getScaledScores($scores);
You must use $scaledScores with the D3 cloud library. Sorry for the incomplete example. Please post your completed solution and I will use it to update the documentation.
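For the hand-off to the browser, here is a minimal sketch of one way to turn the scaled scores into a word list. It is not part of php-text-analysis: the toCloudWords helper, the 12 to 60 font size range, and the sample scores are all made up for illustration. It assumes the page reads each entry's text and size fields, as the d3-cloud examples do.

```php
<?php
// Hypothetical helper, not part of php-text-analysis: converts scaled
// keyword scores (phrase => score) into a list of text/size entries.
function toCloudWords(array $scaledScores, int $minSize = 12, int $maxSize = 60): array
{
    if (empty($scaledScores)) {
        return [];
    }
    $low = min($scaledScores);
    $range = max($scaledScores) - $low;
    $words = [];
    foreach ($scaledScores as $phrase => $score) {
        // When every score is equal, fall back to the smallest font size.
        $ratio = $range > 0 ? ($score - $low) / $range : 0;
        $words[] = [
            'text' => (string) $phrase,
            'size' => (int) round($minSize + $ratio * ($maxSize - $minSize)),
        ];
    }
    return $words;
}

// Made-up sample scores, for illustration only.
$scaledScores = ['word cloud' => 0.5, 'text analysis' => 0.3, 'php' => 0.2];
echo json_encode(toCloudWords($scaledScores)), PHP_EOL;
```

On the JavaScript side the decoded list can be passed to the layout's words() call, with a fontSize accessor that returns d.size. The linear mapping is only one choice; a square root or log scale may look better when a few phrases dominate.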
from php-text-analysis.
No problem, thank you for this! I'll report back after I try this.
I did get a working prototype with jQCloud and the getKeyValuesByWeight method from the FreqDist class.
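A rough sketch of that approach, in case it helps someone else. The toJqCloudWords helper and the sample tokens are made up, and the weights are computed by hand as a stand-in for the FreqDist output, which I am assuming is an array of token => relative weight. jQCloud takes a list of entries with text and weight fields.

```php
<?php
// Hypothetical helper: turns token => weight pairs into a jQCloud word list.
function toJqCloudWords(array $weights, int $limit = 50): array
{
    arsort($weights); // heaviest tokens first
    $words = [];
    foreach (array_slice($weights, 0, $limit, true) as $token => $weight) {
        $words[] = ['text' => (string) $token, 'weight' => $weight];
    }
    return $words;
}

// Stand-in for the FreqDist output: relative frequency of each token.
$tokens = ['cloud', 'word', 'cloud', 'php', 'cloud', 'word'];
$counts = array_count_values($tokens);
$total = array_sum($counts);
$weights = array_map(function ($count) use ($total) {
    return $count / $total;
}, $counts);

echo json_encode(toJqCloudWords($weights)), PHP_EOL;
```

The limit keeps the cloud readable on long documents; 50 is an arbitrary default.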
Sounds good. I am closing this issue.