Comments (3)

yooper commented on May 20, 2024

Hello, sorry for my delayed response. To render the word cloud you will need d3-cloud, the D3 word cloud layout: https://github.com/jasondavies/d3-cloud

As for the PHP code, here is what I have done in the past ...

use TextAnalysis\Analysis\Keywords\Rake;
use TextAnalysis\Documents\TokensDocument;
use TextAnalysis\Tokenizers\WhitespaceTokenizer;
use StopWordFactory;
use TextAnalysis\Filters;

class WordCloud
{
    const NGRAM_SIZE = 3;
    
    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    protected $tokenFilters = [];
    
    /**
     * @var \TextAnalysis\Interfaces\ITokenTransformation[]
     */    
    protected $contentFilters = [];    

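    /**
     * The raw keyword scores are not set up in a way that is compatible
     * with what D3 cloud expects, so scale them to fractions of the total.
     * @param array $keywordScores
     * @return array
     */
    public function getScaledScores($keywordScores)
    {
        $total = array_sum($keywordScores);
        // getKeywordScores() can return [], so avoid dividing by zero
        if($total == 0) {
            return $keywordScores;
        }
        $scaleFactor = 1 / $total;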
        
        array_walk($keywordScores, 
            function(&$value, $key) use ($scaleFactor){                 
                $value = round($value * $scaleFactor, 5);
            });            
        return $keywordScores;
    }
    
    /**
     * 
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getContentFilters()
    {
        if(empty($this->contentFilters)) {
            
            $lambdaFunc = function($word){
                return  preg_replace('/[^[:print:]]/', ' ', $word);
            };
            
            $this->contentFilters = [
                new Filters\StripTagsFilter(),
                new Filters\LowerCaseFilter(),
                new Filters\NumbersFilter(),           
                new Filters\EmailFilter(),
                new Filters\UrlFilter(),
                new Filters\PossessiveNounFilter(),
                new Filters\QuotesFilter(),
                new Filters\PunctuationFilter(),
                new Filters\CharFilter(),
                new Filters\LambdaFilter($lambdaFunc),
                new Filters\WhitespaceFilter()     
            ];
        }
        return $this->contentFilters;
    }
    
    /**
     * 
     * @return \TextAnalysis\Interfaces\ITokenTransformation[]
     */
    public function getTokenFilters()
    {
        if(empty($this->tokenFilters)) {
            $stopwords = StopWordFactory::get('stop-words-fox.txt');
            $this->tokenFilters = [              
                new Filters\StopWordsFilter($stopwords),
            ];
        }        
        return $this->tokenFilters;
    }
    
    /**
     * 
     * @param string $content
     * @return array
     */
    public function getKeywordScores($content)
    {        
        $tokens = (new WhitespaceTokenizer())->tokenize($content);       
        $tokenDoc = new TokensDocument(array_map('strval', $tokens));
        unset($tokens);
                
        foreach($this->getTokenFilters() as $filter)
        {
            $tokenDoc->applyTransformation($filter, false);
        }        
        
        // applying the filters can leave null entries in the token array,
        // so bail out when too few tokens, or only nulls, remain
        $size = count($tokenDoc->toArray());
        if($size < self::NGRAM_SIZE || !array_filter($tokenDoc->toArray())) {
            return [];
        }           
        
        $rake = new Rake($tokenDoc, self::NGRAM_SIZE);
        return $rake->getKeywordScores();
    }

}

$cloud = new WordCloud();
$scores = $cloud->getKeywordScores("YOUR CONTENT GOES HERE");
// scales the scores for the D3 cloud library
$scaledScores = $cloud->getScaledScores($scores);

You must use $scaledScores with the D3 cloud library. Sorry for the incomplete example. Please post your completed solution and I will use it to update the documentation.
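
To fill in the missing step, here is a minimal sketch of how $scaledScores might be handed to the front end. It assumes the JavaScript side reads a `text` and a `size` property from each word object, as the d3-cloud example code does. The helper name `toCloudWords`, the pixel range, and the sample data are all hypothetical and not part of php-text-analysis.

```php
<?php
/**
 * Hypothetical helper (not part of php-text-analysis): converts
 * keyword => scaled score pairs into a list of text/size pairs.
 */
function toCloudWords(array $scaledScores, int $minPx = 12, int $maxPx = 60): array
{
    if (empty($scaledScores)) {
        return [];
    }
    $max = max($scaledScores);
    $words = [];
    foreach ($scaledScores as $keyword => $score) {
        // Map each score linearly onto the pixel range, relative to the top score
        $ratio = $max > 0 ? $score / $max : 0;
        $words[] = [
            'text' => (string) $keyword,
            'size' => (int) round($minPx + $ratio * ($maxPx - $minPx)),
        ];
    }
    return $words;
}

// Sample data standing in for the output of getScaledScores()
$scaledScores = ['word cloud' => 0.5, 'php' => 0.25, 'keyword' => 0.25];
echo json_encode(toCloudWords($scaledScores)), PHP_EOL;
```

The JSON can then be printed into the page or served from an endpoint. Scaling relative to the top score keeps the largest keyword at a fixed size no matter how many keywords share the total.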

from php-text-analysis.

nickescobedo commented on May 20, 2024

No problem, thank you for this! I'll report back after I try this.

I did get a working prototype with jQCloud and the getKeyValuesByWeight method from the FreqDist class.
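
For anyone taking the same route, a rough sketch of the conversion step might look like the following. It assumes the weights arrive as a token => weight array (the sample data below is made up, and the FreqDist call itself is left out) and that jQCloud is given a list of objects with `text` and `weight` properties. The helper name `toJqcloudWords` and the `$limit` parameter are hypothetical.

```php
<?php
/**
 * Hypothetical helper: turns a token => weight map into a list of
 * text/weight pairs, heaviest first, capped at $limit entries.
 */
function toJqcloudWords(array $weights, int $limit = 50): array
{
    arsort($weights); // sort by weight, descending, keeping the keys
    $words = [];
    foreach (array_slice($weights, 0, $limit, true) as $token => $weight) {
        $words[] = ['text' => (string) $token, 'weight' => $weight];
    }
    return $words;
}

// Made-up sample standing in for a token => weight array
$weights = ['cloud' => 0.2, 'word' => 0.5, 'php' => 0.3];
echo json_encode(toJqcloudWords($weights, 2)), PHP_EOL;
```

Capping the list keeps the cloud readable when the input text is long.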

yooper commented on May 20, 2024

Sounds good. I am closing this issue.
