Comments (8)

jprante commented on August 20, 2024

Do you want TF/IDF for the index (shard), or for the document?

from elasticsearch-index-termlist.

fsieduc commented on August 20, 2024

My objective is to automate the construction of the Completion Suggester's documents (https://www.elastic.co/blog/you-complete-me) with the "best" terms found in an index/type.

Ideally, the tf-idf should be calculated over a subset of the documents in the index/type, not over all of them.
For example: give me the terms and their tf-idf for the documents in index/type "yesterday/tweets" matching "foo".
The corpus is then every document (tweet) in the index "yesterday" and the type "tweets" that matches "foo".

In that case the tf does not change but the idf does.
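To make that concrete, here is a minimal sketch of the effect, assuming the textbook definition (raw term count for tf, log(N / df) for idf) and plain whitespace tokenization. Lucene's own scoring formula and analyzers differ, and the sample tweets and the `tf_idf` helper are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return (per-document tf-idf scores, idf table) for a list of texts.

    Assumption: tf is the raw count of a term in one document and
    idf is log(N / df), where N is the number of documents in `corpus`.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    scores = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return scores, idf


# Hypothetical tweets standing in for the "yesterday/tweets" documents.
tweets = ["foo bar", "foo baz baz", "qux bar", "qux quux"]
# The selection: only the documents matching "foo".
selection = [t for t in tweets if "foo" in t.split()]

scores_all, idf_all = tf_idf(tweets)
scores_sel, idf_sel = tf_idf(selection)
```

The count of "baz" in the second tweet is 2 in both runs, but its idf drops from log(4) over all tweets to log(2) over the selection, and the idf of "foo" becomes 0 because every selected document contains it.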

So, in order of preference, I would like to have:

  • the list of terms with their tf-idf for a selection of documents found in an index/type (utopian)
  • the list of terms with their tf-idf for all documents in an index/type
  • the list of terms with their tf-idf for all documents in an index

Do you think this is possible?

Thanks

jprante commented on August 20, 2024

Yes, this is possible.

The selection of documents found in an index/type is not utopian. I can walk through a search result with scan/scroll, then retrieve the terms doc-by-doc. This may take an extreme amount of time (hours), and might only be available as file output, not over the REST API.
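A rough sketch of that doc-by-doc walk, written from the client side rather than as the plugin's actual implementation. It assumes the hits arrive as an iterator of dicts with a `_source` field, which is the shape returned by a scroll (with the official Python client such an iterator could come from `elasticsearch.helpers.scan`). The `term_stats` and `write_idf` names, the whitespace tokenization, and the tab-separated output are assumptions for illustration.

```python
import csv
import io
import math
from collections import Counter


def term_stats(hits, field):
    """Walk the hits once and count, per term, how many documents contain it."""
    df = Counter()
    n_docs = 0
    for hit in hits:
        text = hit["_source"].get(field, "")
        # Assumption: whitespace tokenization instead of the index analyzer.
        df.update(set(text.lower().split()))
        n_docs += 1
    return n_docs, df


def write_idf(hits, field, out):
    """Write one 'term<TAB>doc_freq<TAB>idf' line per term to a file-like object."""
    n_docs, df = term_stats(hits, field)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for term, count in sorted(df.items()):
        writer.writerow([term, count, f"{math.log(n_docs / count):.4f}"])


# Hypothetical hits, shaped like the entries a scroll would return.
hits = [
    {"_source": {"text": "foo bar"}},
    {"_source": {"text": "foo baz"}},
]
buffer = io.StringIO()
write_idf(hits, "text", buffer)
```

Because the idf of a term is only known after the last hit has been seen, the output cannot be streamed while the scroll is still running, which fits the point about file output.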

I am not sure how this can be useful for constructing the completion suggester's FST. Synonyms, stopwords, phrases, and all the other goodies of the Lucene suggesters are not available in term list construction.

fsieduc commented on August 20, 2024

OK, that seems complex; my understanding of ES/Lucene is not sufficient.

Do you think that the option "the list of terms with their tf-idf for all documents in an index" is possible via REST?

jprante commented on August 20, 2024

Yes, I think so.

fsieduc commented on August 20, 2024

Great news. Could you add this as an option in your project (like &totalfreq=1)?

jprante commented on August 20, 2024

Of course, please stay tuned.

fsieduc commented on August 20, 2024

Youhhouuu
