Distinct values and distinct count (flexsearch, closed)

nextapps-de commented on May 6, 2024

Distinct values and distinct count

from flexsearch.

Comments (10)

ts-thomas commented on May 6, 2024

Hello, I know this feature from database queries, but sadly FlexSearch doesn't currently support it.

georgyfarniev commented on May 6, 2024

@ts-thomas What prevents us from supporting this option by adding the matching distinct values to an object and accumulating them during the search iteration? I just want to know how difficult it would be to implement and whether there is a chance you would accept a pull request for it.

ts-thomas commented on May 6, 2024

Sounds good to me. What would help me is a short example of a small set of documents and the desired result when searching. That would give me better insight.

georgyfarniev commented on May 6, 2024

@ts-thomas I will provide a proposed example when possible. Are you considering splitting the large chunk of code into modules to simplify development? It could be helpful for creating PRs.

ts-thomas commented on May 6, 2024

Yes, of course, it is already on my roadmap. I'm considering using ES6 modules because they are also compatible with Closure Compiler. Another option is to port the codebase to TypeScript. It would be nice to know how TypeScript could be compiled easily into other programming languages (I am personally targeting C/C++, Java, and Python). Java JNI is also an option for me for this purpose.

georgyfarniev commented on May 6, 2024

Here are some proposed examples:

const documents = [
  { id: 1, data: 'text 1', category: 1 },
  { id: 2, data: 'text 2', category: 1 },
  { id: 3, data: 'text 1', category: 2 }
]


// Getting distinct values
const results = index.search({
  query: 'text',
  distinct: ['data', 'category']
})


// results containing:
{
  documents: [
    { id: 1, data: 'text 1', category: 1 },
    { id: 2, data: 'text 2', category: 1 },
    { id: 3, data: 'text 1', category: 2 }
  ],
  distinct: {
    category: [1, 2],
    data: ['text 1', 'text 2']
  }
}

// Getting distinct count
const results = index.search({
  query: 'text',
  distinct_count: ['data', 'category']
})


// results containing:
{
  documents: [
    { id: 1, data: 'text 1', category: 1 },
    { id: 2, data: 'text 2', category: 1 },
    { id: 3, data: 'text 1', category: 2 }
  ],
  distinct_count: {
    category: 2,
    data: 2
  }
}

Note that the distinct values can sometimes be too large to return, so we should optionally support returning only the distinct count as well. Also, the found documents are returned in a separate field, which gives the flexibility to store additional data in the result without complicating the overall API.
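To make the proposal concrete, here is a minimal sketch of how the distinct values and counts could be accumulated in a single pass over matched documents. It is not part of FlexSearch: the helper name `collectDistinct` is hypothetical, and it assumes the matches are already available as a plain array of document objects.

```javascript
// Hypothetical helper, not part of FlexSearch: accumulates the distinct
// values of the given fields over an array of already-matched documents.
function collectDistinct(documents, fields) {
  // One Set per requested field; a Set ignores duplicate values.
  const sets = {};
  for (const field of fields) sets[field] = new Set();

  // Single pass over the matches, as suggested for the search iteration.
  for (const doc of documents) {
    for (const field of fields) {
      if (doc[field] !== undefined) sets[field].add(doc[field]);
    }
  }

  // Build both proposed result shapes from the same Sets.
  const distinct = {};
  const distinct_count = {};
  for (const field of fields) {
    distinct[field] = [...sets[field]];
    distinct_count[field] = sets[field].size;
  }
  return { distinct, distinct_count };
}

const matched = [
  { id: 1, data: 'text 1', category: 1 },
  { id: 2, data: 'text 2', category: 1 },
  { id: 3, data: 'text 1', category: 2 }
];

const summary = collectDistinct(matched, ['data', 'category']);
console.log(summary.distinct);       // { data: [ 'text 1', 'text 2' ], category: [ 1, 2 ] }
console.log(summary.distinct_count); // { data: 2, category: 2 }
```

If only the counts are needed, the Sets can be discarded after reading their sizes, so large value lists never have to be returned.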

About C++: I'm not sure it is a good idea to compile TypeScript to C++. In my experience, it worked very well to implement the algorithms themselves in a C/C++ library and then wrap it for use from scripting languages; that is probably a good idea in your case too. For example, you can use SWIG to create adapters for higher-level languages. The one disadvantage is that it would not work in web browsers, but it could significantly reduce the overhead of re-implementing the algorithm for every scripting language.

I think a good starting point here could be a robust, well-documented, and stable TypeScript implementation.

ts-thomas commented on May 6, 2024

Thanks a lot for the example. The last thing that is not clear to me is the main purpose of having distinct in the result. I think this would help me understand the requirements. I also added this feature to the milestones: https://github.com/nextapps-de/flexsearch/milestone/25

The separate field for the results is a good point, because it is also needed for pagination.

The TypeScript port of the core functionality as the ultimate base makes a lot of sense and will surely come.

georgyfarniev commented on May 6, 2024

A simple example where distinct is useful:

Let’s say that we store the category ID within each product document. When we query for products, we also need to know which categories the matching products belong to. This is useful for filtering after the search query has been executed. Sorry for my poor English.
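The use case can be sketched as follows. This is only an illustration under assumptions: the product fields and the `filterByCategory` helper are hypothetical, and `searchResults` stands in for whatever array of documents a search returned.

```javascript
// Stand-in for the documents a search returned; the product fields are hypothetical.
const searchResults = [
  { id: 1, name: 'red shirt', category: 10 },
  { id: 2, name: 'blue shirt', category: 10 },
  { id: 3, name: 'red shoes', category: 20 }
];

// Distinct category IDs present in the results, e.g. to render filter options.
const availableCategories = [...new Set(searchResults.map((p) => p.category))];

// Narrow the already-fetched results to one selected category (hypothetical helper).
function filterByCategory(results, categoryId) {
  return results.filter((p) => p.category === categoryId);
}

console.log(availableCategories);                 // [ 10, 20 ]
console.log(filterByCategory(searchResults, 20)); // [ { id: 3, name: 'red shoes', category: 20 } ]
```

Having the distinct categories returned with the search result would avoid recomputing them on the client, as done here with the Set.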

ts-thomas commented on May 6, 2024

Thanks for providing a useful example. This feature would be possible and may be coming soon. There may be some other tasks that need to be done first (like the Plugin API); distinct would be a good example of a plugin.

ts-thomas commented on May 6, 2024

This feature was added to the milestones. It should be built on top of the upcoming Plugin API. This makes it easier for everyone to build features without understanding the whole algorithm.
