nextapps-de / flexsearch

Next-Generation full text search library for Browser and Node.js

License: Apache License 2.0

JavaScript 100.00%

search search-algorithm search-engine searching-algorithms searching search-in-text full-text-search fulltext-search elasticsearch nodejs

flexsearch's Issues

TypeError: c is not a function

Thanks for this capability. I am excited to learn how this works for several use cases I have.

I ran your "best practice" example with some modifications. I cannot find the source of the error:
"c is not a function"

Here is my code:

const FlexSearch = require("flexsearch")

const bookstore = new FlexSearch();
const pizzashop = new FlexSearch();
const votingbooth = new FlexSearch();

let settings = {
action: "score",
adventure: {
encode: "extra",
tokenize: "strict",
depth: 5,
threhold: 5,
doc: {
id: "id",
field: ["intent", "text"]
} },

comedy: {
    encode: "advanced",
    tokenize: "forward",
    threshold: 5
}

}
let index = {}

const add = (id, cat, intent, text) => {
  console.log(gr(`Starting on Index ${id}`))
  console.log(`for ${cat}, ${intent}, ${text}`)
  try {
    (index[cat] || (
      index[cat] = new FlexSearch(settings[cat])
    )).add(id, intent, text);
  } catch(error) {
    console.log(error)
  }
}

const search = (cat, query) => {
return index[cat] ? index[cat].search(query) : [];
}

let x = 0
training.map((t) => {
console.log(b(`Creating index ${x}`))
x++
add(x, "bookstore", t.intent, t.text);
add(x, "pizzashop", t.intent, t.text);
add(x, "votingbooth", t.intent, t.text);
})

//add(1, "action", "Movie Title");
//add(2, "adventure", "Movie Title");
//add(3, "comedy", "Movie Title");

console.log(r(`THIS SHOULD EXECUTE LAST`))
//index.update(10025, "Road Runner");
//index.remove(10025);
var result1 = search("bookstore", "i am searching for a book"); // --> [1]
var result2 = search("pizzashop", "howdy"); // --> [1]
var result3 = search("votingboooth", "i need directions"); // --> [1]

console.log(`========== FAST SEARCH TEST ==========`)
console.log(result1)
console.log(result2)
console.log(result3)

The log shows an empty array

Sorting

Pretty neat. It performs really well.

I read in #7 "Flexsearch is a micro library whose complexity we want to keep as low as possible in the core. "

What about sorting? We are currently considering replacing our list filters with flexsearch. It would be nice to use the same index for sorting as well.
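One way to get sorted output without adding sorting to the index itself is to sort the matched IDs after the search, using documents kept in a separate store. The following is only a sketch under that assumption: `docsById`, `sortResults` and `matchedIds` are hypothetical names and are not part of flexsearch.

```javascript
// Hypothetical document store kept alongside the index (not part of flexsearch).
const docsById = new Map([
  [1, { id: 1, title: "Zebra", price: 30 }],
  [2, { id: 2, title: "Apple", price: 10 }],
  [3, { id: 3, title: "Mango", price: 20 }],
]);

// Sort an array of matched IDs by a field of the stored documents.
function sortResults(ids, field, direction = "asc") {
  const sign = direction === "desc" ? -1 : 1;
  return [...ids].sort((a, b) => {
    const x = docsById.get(a)[field];
    const y = docsById.get(b)[field];
    if (x < y) return -1 * sign;
    if (x > y) return 1 * sign;
    return 0;
  });
}

// `matchedIds` stands in for whatever the search call returned.
const matchedIds = [1, 2, 3];
const byPrice = sortResults(matchedIds, "price");            // [2, 3, 1]
const byTitleDesc = sortResults(matchedIds, "title", "desc"); // [1, 3, 2]
console.log(byPrice, byTitleDesc);
```

Sorting outside the index discards the relevance order, so this fits list-filter use cases better than ranked search.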

Logical Operator (Please Vote)

Which kind of expression do you prefer?

1. required / optional / prohibited

var results = index.search([{
    field: "title",
    query: "foobar",
    presence: "required"
},{
    field: "body",
    query: "content",
    presence: "optional"
},{
    field: "blacklist",
    query: "xxx",
    presence: "prohibited"
}]);

2. and / or / not

var results = index.search([{
    field: "title",
    query: "foobar",
    bool: "and"
},{
    field: "body",
    query: "content",
    bool: "or"
},{
    field: "blacklist",
    query: "xxx",
    bool: "not"
}]);

3. + / -

var results = index.search([{
    field: "+title",
    query: "foobar"
},{
    field: "body",
    query: "content"
},{
    field: "-blacklist",
    query: "xxx"
}]);
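Whichever syntax wins the vote, the three variants describe the same set logic. The sketch below shows one plausible reading of it on plain arrays of IDs: required lists are intersected, optional lists only decide membership when nothing is required, and prohibited IDs are removed. The function `combine` and its input shape are hypothetical and say nothing about how the library implements this.

```javascript
// Combine per-field result lists according to a boolean operator.
// Each entry is { ids: [...], bool: "and" | "or" | "not" }.
function combine(entries) {
  const required = entries.filter((e) => e.bool === "and");
  const optional = entries.filter((e) => e.bool === "or");
  const prohibited = new Set(
    entries.filter((e) => e.bool === "not").flatMap((e) => e.ids)
  );

  let candidates;
  if (required.length > 0) {
    // Intersection of all required lists.
    candidates = required
      .map((e) => new Set(e.ids))
      .reduce((acc, set) => new Set([...acc].filter((id) => set.has(id))));
  } else {
    // Nothing is required: take the union of all optional lists.
    candidates = new Set(optional.flatMap((e) => e.ids));
  }
  // Prohibited IDs are always removed.
  return [...candidates].filter((id) => !prohibited.has(id));
}

const combined = combine([
  { ids: [1, 2, 3, 4], bool: "and" }, // e.g. title matches "foobar"
  { ids: [2, 5], bool: "or" },        // e.g. body matches "content"
  { ids: [3], bool: "not" },          // e.g. blacklist matches "xxx"
]);
console.log(combined); // [1, 2, 4]
```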

Port of this library for Ruby

Hi @ts-thomas ,

I am a beginner to open source contribution / projects. I want to work on a port of this library for Ruby. If possible, can you point me towards any reference, article, or blog post related to the scoring algorithm and the other implementations used in this library? If anyone is already working on a Ruby port, please let me know. I would also love to contribute to that project.

How to configure for searching mixed language text

for example: "PostgreSQL快速入门".

The benchmarks with different presets seem unfair

The benchmarks for the query and memory tests use different flexsearch presets, but compare them against the same configuration of the other libraries.
It would be helpful to be able to compare flexsearch performance between presets, while showing a full, unbiased picture.

Error while loading language files on node.js

I'm trying to load the language files to use with the stemmer, for example, but I'm getting a TypeError: Cannot read property 'registerLanguage' of undefined error.

var FlexSearch = require('flexsearch')
require('flexsearch/lang/en')

The error seems to indicate that the FlexSearch object is not in scope, but when I pass it as a global variable I get the same error. Am I missing something here?

How to know total number of matched items with pagination?

Is it possible to get the total number of items found, so that I know how many pages the pagination should display?
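If the complete list of matches can be obtained (an assumption; the question is precisely whether the library exposes this), the page count follows directly from its length. A minimal sketch on a plain array, with the hypothetical helper `paginate`:

```javascript
// Derive pagination info from a complete list of matches.
// pageNumber is 1-based.
function paginate(allMatches, pageSize, pageNumber) {
  const total = allMatches.length;
  const pageCount = Math.ceil(total / pageSize);
  const start = (pageNumber - 1) * pageSize;
  return {
    total,
    pageCount,
    items: allMatches.slice(start, start + pageSize),
  };
}

// 23 stand-in matches, 10 per page, third page requested.
const allMatches = Array.from({ length: 23 }, (_, i) => i + 1);
const page3 = paginate(allMatches, 10, 3);
console.log(page3); // { total: 23, pageCount: 3, items: [ 21, 22, 23 ] }
```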

Error on compile.js

I tried to build this library with 'npm run build-compact' and got errors like the following:

/bin/sh: -c: line 0: unexpected EOF while looking for matching '' /bin/sh: -c: line 1: syntax error: unexpected end of file { Error: Command failed: java -jar node_modules/google-closure-compiler-java/compiler.jar --compilation_level=ADVANCED_OPTIMIZATIONS --use_types_for_optimization=true --new_type_inf=true --jscomp_warning=newCheckTypes --generate_exports=true --export_local_property_definitions=true --language_in=ECMASCRIPT6_STRICT --language_out=ECMASCRIPT6_STRICT --process_closure_primitives=true --summary_detail_level=3 --warning_level=VERBOSE --emit_use_strict=true --output_manifest=log/manifest.log --output_module_dependencies=log/module_dependencies.log --property_renaming_report=log/renaming_report.log' --js='flexsearch.js' --js='lang/**.js' --js='!lang/**.min.js' --define='RELEASE=compact' --define='DEBUG=false' --define='PROFILER=false' --define='SUPPORT_WORKER=false' --define='SUPPORT_ENCODER=true' --define='SUPPORT_CACHE=false' --define='SUPPORT_ASYNC=true' --define='SUPPORT_PRESETS=true' --define='SUPPORT_SUGGESTIONS=false' --define='SUPPORT_SERIALIZE=false' --define='SUPPORT_INFO=false' --define='SUPPORT_DOCUMENTS=true' --define='SUPPORT_WHERE=false' --define='SUPPORT_LANG_DE=false' --define='SUPPORT_LANG_EN=false' --js_output_file='dist/flexsearch.compact.js' && exit 0

I then found a simple error in 'compile.js(116:92)':

exec("java -jar node_modules/google-closure-compiler-java/compiler.jar" + parameter + "' --js='flexsearch.js' --js='lang/**.js' --js='!lang/**.min.js'" + flag_str + " --js_output_file='dist/flexsearch." + (options["RELEASE"] || "custom") + ".js' && exit 0", function(){

After removing the unnecessary single quotation mark that follows parameter + ", the build process worked fine.
I think it's just a typo... maybe. 😓

Can't destroy index if created with doc parameter

flexsearch version 0.5.1

Problem

Destroying the index instance in the browser fails with an error.

Details

Here is the test HTML:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Benchmark Presets</title>
    <style>
        body{
            font-family: sans-serif;
        }
        table td{
            padding: 1em 2em;
        }
        button{
            padding: 5px 10px;
        }
    </style>
</head>
<body>
<div id="container"></div>
<script src="../dist/flexsearch.min.js"></script>
<script>
  (function(){
    var index = new FlexSearch({
        doc: {
            id: 'id',
            field: 'title'
        }
    });
    index.add([
      { id: 1, title: 'foo' },
      { id: 2, title: 'bar' }
    ])
    console.log(index.search('foo'))
    index.destroy()
  })();
</script>
</body>
</html>

The browser console displays this error:

TypeError: a is undefined
flexsearch.min.js:33:45

Relevance: can it be based on number of times a term occurs?

I would expect that if I search for a term, and that term appears once in document A but several times in document B, that B would have a higher position in the results than A. But that does not seem to be the case.

Example:

const FlexSearch = require(`flexsearch`)

const index = new FlexSearch({
	tokenize: `strict`,
	encode: `advanced`,
	cache: false,
	doc: {
		id: `id`,
		field: {
			content: {
				threshold: 9,
				resolution: 10,
			},
		},
	},
})

index.add([{
	id: 1,
	content: `billy bob thorton`,
}, {
	id: 2,
	content: `billy who now what billy okay so what now thorton?`,
}])

console.log(
	index.search(`billy`)
)
// => [ { id: 1, content: 'billy bob thorton' },
//  { id: 2,
//    content: 'billy who now what billy okay so what now thorton?' } ]

I would expect that a search for billy would have a higher score for document id 2 than document id 1, but the search returns document id 1 as the top result.

Tested with [email protected].
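Until the index scores by term frequency, the expected ordering can be approximated by re-ranking the returned documents in application code. This is only a sketch of that workaround, not a description of the library's scoring; `countOccurrences` and `rankByFrequency` are hypothetical helpers.

```javascript
// Count case-insensitive whole-word occurrences of a term in a text.
function countOccurrences(text, term) {
  const words = text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  return words.filter((w) => w === term.toLowerCase()).length;
}

// Re-rank documents so that those with more occurrences come first.
function rankByFrequency(docs, term, field) {
  return [...docs].sort(
    (a, b) => countOccurrences(b[field], term) - countOccurrences(a[field], term)
  );
}

// Stand-in for the documents returned by the search above.
const results = [
  { id: 1, content: "billy bob thorton" },
  { id: 2, content: "billy who now what billy okay so what now thorton?" },
];
const ranked = rankByFrequency(results, "billy", "content");
console.log(ranked.map((doc) => doc.id)); // [2, 1]
```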

Property 'length' of undefined when using web-worker

I tried setting the "worker" option to false and everything worked very well. But when I enable this option by setting it to any value other than false, my console prints "Uncaught (in promise) TypeError: Cannot read property 'length' of undefined".

Here is the screenshot:

I have around 30,000 items, which is why I want to use the web worker feature.

Any ideas? I can provide more information if necessary.

Development Roadmap (Please Participate)

Please make suggestions or give some feedback.

1. Extract Core Functionality

Extracting the core functionality is a prerequisite for many upcoming features as well as for existing ones, such as:

Plugin API
Custom Tooling
Language-specific ports or migrations
Pluggable Workflows
All kinds of extensions

These existing features have to remain core functionality:

Lexical Pre-Scored Index
Contextual-based Map
Index-related Settings:
- threshold
- resolution
- depth
- rtl
Matching Tokens (Query)
Cursor-based Pagination
Logical Operators
Cross-Process Intersection
Index-based Suggestions

The basic core API should have these methods:

create
init
add
update
remove
destroy
match (search)

These missing features also need to be integrated as core functionality:

Providing abstract I/O, supporting various kinds of index storage:
- In-Memory
- Partial Persistent Storage (persistent documents, in-memory index)
- Storage-only (persistent documents, persistent index)

These functions should be extracted as optional tooling:

System-specific Features (Browser, Node.js):
- Web Worker
- Async
Language-specific Features:
- Encoder
- Tokenizer
- Matcher, Stemmer, Filter
Documents (Field-Search)
Custom Search
Find / Where / Tags
Export / Import (Serialization)
Cache
Presets

2. Plugin API

The plugin API is required to provide additional tooling and features in a modular and extendable manner. The plugin API should have these capabilities:

Extend via ad hoc methods
Extend via pipeline
Extend via events (callbacks)
Plugin Package Descriptor

3. Prerequisites

Extract language-specific logic
Provide process connectivity and refactor

4. Language Port

There are several requests for a TypeScript port. The advantage of TypeScript over plain JavaScript may be too small, since TypeScript also compiles to JavaScript and its output is less optimized than that of the Google Closure Compiler.

Technically there are two targets:

1. Browser
2. System (OS)

Browsers are already covered, as is Node.js. A TypeScript port would not cover any additional ecosystem. Only the formal codebase would differ, and in the end it is just a different pattern for the same result. That's why I prefer a browser-less, system-wide port over TypeScript. Rust is pretty close to TypeScript/JavaScript and covers target 2, so it might be a better candidate for a port.

There is no final decision at the moment, so let us discuss the pros and cons here.

Serializing as stream instead of string

I'm trying to create an index over a large dataset and I want to separate the script that's creating the index from the script that's using the index. The index creation seems to work very well, but when I use index.export(), I'm getting a RangeError: Invalid string length error. Is there a way to export the index as a file without getting this error? A possible solution would be to allow exporting via a stream that could be written to a file directly.

Thanks!

Document search not returning results

If the same word appears in different doc fields, the search returns no results.

Demo (open console):
https://stackblitz.com/edit/flexsearch

React Component?

Is there any plan to make a React component from flexsearch?

FuzzySearch and more usage examples

Hi, I've been using - https://github.com/jeancroy/FuzzySearch
...which I've found to be very quick. How does FuzzySearch compare to flexsearch?

It would be great if you could provide some more usage examples.

Multivalue attributes

What is the best way to handle documents with multi value attributes?
For example a document with a m:n relation to another entity.
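One possible approach, sketched here without any claim that it is the recommended one, is to flatten each multi-value attribute into a single string before the document is indexed. The helper `flattenMultiValue` is hypothetical, and it assumes that related entities carry a `name` property worth searching.

```javascript
// Turn array-valued attributes into one searchable string per field.
// Assumption: related entities are objects with a `name` property.
function flattenMultiValue(doc, fields, separator = " ") {
  const flat = { ...doc };
  for (const field of fields) {
    const value = doc[field];
    if (Array.isArray(value)) {
      flat[field] = value
        .map((v) => (typeof v === "object" && v !== null ? v.name : String(v)))
        .join(separator);
    }
  }
  return flat;
}

const book = {
  id: 7,
  title: "Search Basics",
  authors: [{ id: 1, name: "Ada" }, { id: 2, name: "Grace" }], // m:n relation
  tags: ["index", "query"],
};
const indexable = flattenMultiValue(book, ["authors", "tags"]);
console.log(indexable.authors, "|", indexable.tags); // Ada Grace | index query
```

The original document stays untouched, so the structured relation can still be used for display once a match is found.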

Serialize/Deserialize for SSR ?

Does the library support serializing and deserializing a flexsearch object as JSON?
I'd love to create the index in Node, then deserialize the object in the browser for client-side searching.

Exception thrown when searching for a value containing whitespace where suggest is set to true

Hi Thomas

Using the following example

const FlexSearch = require('./flexsearch')

const fs = new FlexSearch({
  encode: 'extra',
  tokenize: 'full',
  threshold: 1,
  depth: 4,
  resolution: 9,
  async: false,
  worker: 1,
  cache: true,
  suggest: true,
  doc: {
    id: 'id',
    field: [ 'intent', 'text' ]
  }
})

fs.add([
  {
    id: 0,
    intent: 'intent',
    text: 'text'
  }, {
    id: 1,
    intent: 'intent',
    text: 'howdy - how are you doing'
  }
])

console.log('INFO', fs.info())

const result = fs.search('howdy', { bool: 'or' })
console.log('RESULT', result)

const result2 = fs.search('howdy -', { bool: 'or' })
console.log('RESULT', result2)

An exception is thrown when using 'howdy -' as the search parameter. When suggest is set to false, no exception is thrown, but the search for howdy - does not find any results.

The exception thrown is

.../search/flexsearch.js:3308
                    z = suggestions.length;
                                    ^

TypeError: Cannot read property 'length' of undefined
    at intersect (.../servers/search/flexsearch.js:3308:37)
    at FlexSearch.merge_and_sort (.../servers/search/flexsearch.js:1393:22)
    at FlexSearch.search (.../servers/search/flexsearch.js:1561:43)
    at Object.<anonymous> (.../servers/search/test2.js:33:19)
    at Module._compile (internal/modules/cjs/loader.js:734:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:745:10)
    at Module.load (internal/modules/cjs/loader.js:626:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:566:12)
    at Function.Module._load (internal/modules/cjs/loader.js:558:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:797:12)

In flexsearch on line 3068

function intersect(arrays, limit, cursor, suggest, bool, has_not) {

            let result = [];
            let suggestions;
            const length_z = arrays.length;

suggestions is not being assigned, because the condition of the while loop on line 3133 is false

while(++z < length_z){

so the assignment of the suggestion variable on line 3211 is bypassed

                    let found = false;

                    i = 0;
                    suggestions = [];

                    while(i < length){

The reason the search for howdy -, when suggestions is false, is unsuccessful is probably because of the options passed in. Should I implement my own tokenizer if I would like to find queries like howdy -?

Thanks in advance

Regards
William

Paging with multiple fields/boost.

Setup

var index = FlexSearch.create({
    doc: {
        id: "url",
        field: [
            "title",
            "content"
        ]
    }
});

Working

Invoke:

index.search(
    "test",
    {
        page: true,
        limit: 5
    })

Result:

{
  "page": "0",
  "next": "5",
  "result": [
    {
      "title": "Load Testing V. 1.0.1",
      "content": "test",
      "url": "/Project_Management/validations/validation2"
    },
    {
      "title": "Pre Test Inpsection Report",
      "content": "test",
      "url": "/V_and_V/5016-09-F21"
    },
    {
      "title": "Packaging Validaiton Test Report",
      "content": "test",
      "url": "/V_and_V/5016-09-F19"
    },
    {
      "title": "EMC 60601 Test Plan",
      "content": "test",
      "url": "/V_and_V/5016-09-F23"
    },
    {
      "title": "Third Party Testing",
      "content": "test",
      "url": "/3rd_Party_Testing"
    }
  ]
}

Not working

Invoke:

index.search(
    [
        {
            field: "title",
            query: "test",
            boost: 1
        },
        {
            field: "content",
            query: "test",
            boost: 0.5
        }
    ],
    {
        page: true,
        limit: 5
    });

Result:

{
  "page": "0",
  "next": null,
  "result": [
  ]
}

Comments

I need to be able to page the results while also searching multiple fields with different boost values.

Data doesn't get indexed

I am trying to run the example you posted in issue #30 without any luck.

Here is the code:

const FlexSearch = require('flexsearch')

// provide a document descriptor for each index
// the field "id" and at least one "field" is mandatory.

const settings = {
  'bookstore': {
    preset: 'score',
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  },
  'pizzashop': {
    encode: 'extra',
    tokenize: 'strict',
    depth: 5,
    threshold: 5,
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  },
  'votingbooth': {
    encode: 'advanced',
    tokenize: 'forward',
    threshold: 5,
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  }
}

const index = {}

const add = (cat, doc) => {
  const i = index[cat] || (
    index[cat] = new FlexSearch(settings[cat])
  )
  i.add(doc)
}

const search = (cat, query) => {
  return index[cat] ? index[cat].search(query) : []
}

// provide documents which have the same structure as defined in the document descriptor above

const bookstore = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'i am searching for a book'
}]

const pizzashop = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'howdy'
}]

const votingbooth = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'i need directions'
}]

// add a full document or an array of documents to the index

add('bookstore', bookstore)
add('pizzashop', pizzashop)
add('votingbooth', votingbooth)

console.log('INFO', index['bookstore'].info())
console.log('INFO', index['pizzashop'].info())
console.log('INFO', index['votingbooth'].info())

console.log('INFO', index['bookstore'])
// search

const result1 = search('bookstore', 'i am searching for a book') // --> [1]
const result2 = search('pizzashop', 'howdy') // --> [1]
const result3 = search('votingbooth', 'i need directions') // --> [1]

console.log('========== FAST SEARCH TEST ==========')
console.log(result1)
console.log(result2)
console.log(result3)

and the output I get is:

INFO { id: 0,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 1,
  depth: 4,
  contextual: true }
INFO { id: 3,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 5,
  depth: 5,
  contextual: true }
INFO { id: 6,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 5,
  depth: 0,
  contextual: 0 }
INFO k {
  id: 0,
  o: [],
  f: 'strict',
  w: false,
  async: false,
  threshold: 1,
  b: 9,
  depth: 4,
  C: false,
  m: false,
  s: [Function: bound ],
  a:
   { id: [ 'id' ],
     field: [ [Array], [Array] ],
     index: { intent: [k], text: [k] },
     keys: [ 'intent', 'text' ] },
  h:
   [ [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {} ],
  i: [Object: null prototype] {},
  c: [Object: null prototype] {},
  g:
   [Object: null prototype] {
     '0': { id: 0, intent: 'intent', text: 'text' },
     '1':
      { id: 1, intent: 'intent', text: 'i am searching for a book' } },
  v: true,
  cache: false,
  j: false }
========== FAST SEARCH TEST ==========
[]
[]
[]

Am I missing something?

Node version: 11.9.0

Thanks in advance...

Funding?

Idea: implement funding of this project using something like issuehunt

v0.6.21 TypeError: Cannot convert object to primitive value

Hello,

I've run into this error: TypeError: Cannot convert object to primitive value

It is produced in the .add method.

Version: 0.6.21

Version 0.6.2 works as expected.

Cyrillic languages support

Hello,

I've run into the following behaviour.

This example works as expected:

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Foobar')
console.log(index.search('Foobar'));
// [ 1 ]

But this one shows no results.

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Фообар')
console.log(index.search('Фообар'));
// []

I've tested in node and in browser.
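If the cause is an encoder that keeps only Latin characters (an assumption; the issue does not confirm it), the effect can be reproduced in isolation. The sketch below contrasts an ASCII-only normalizer with a Unicode-aware one. Both `asciiOnly` and `normalize` are hypothetical helpers, and whether the library accepts a custom encoder like this should be checked against its documentation.

```javascript
// ASCII-only normalization: every non-Latin letter is dropped.
function asciiOnly(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Unicode-aware normalization: keeps letters and digits of any script.
function normalize(text) {
  return text
    .toLowerCase()
    .normalize("NFKC")
    .replace(/[^\p{L}\p{N}]+/gu, " ")
    .trim();
}

console.log(JSON.stringify(asciiOnly("Фообар"))); // ""  (nothing left to index)
console.log(normalize("Фообар!"));                // фообар
console.log(normalize("Foobar"));                 // foobar
```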

Offset support for implementing pagination

Hello. Are there any plans to support an offset in addition to limit for implementing pagination? Thanks in advance.

Pagination: forwards and backwards

Going to the next page is not a problem, but going to the previous one is. When I request the previous page, I get an array instead of an object, so the paging fields are missing too.
Could you give a simple example of paginating back and forth?
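With cursor-based paging, one way to go backwards is to remember the cursors of the pages already visited and replay them. The sketch below illustrates only that bookkeeping: `fetchPage` is a stand-in over a plain array that returns the same `{ page, next, result }` shape as a paged search, and `Pager` is a hypothetical helper, not part of the library.

```javascript
// Stand-in for a cursor-based search call; returns { page, next, result }.
const data = ["a", "b", "c", "d", "e"];
function fetchPage(cursor, limit = 2) {
  const start = Number(cursor);
  const end = start + limit;
  return {
    page: String(start),
    next: end < data.length ? String(end) : null,
    result: data.slice(start, end),
  };
}

// Remember the cursors of visited pages so "previous" can replay them.
class Pager {
  constructor() {
    this.history = []; // cursors of the pages before the current one
    this.current = fetchPage("0");
  }
  next() {
    if (this.current.next === null) return this.current; // already on last page
    this.history.push(this.current.page);
    this.current = fetchPage(this.current.next);
    return this.current;
  }
  previous() {
    if (this.history.length === 0) return this.current; // already on first page
    this.current = fetchPage(this.history.pop());
    return this.current;
  }
}

const pager = new Pager();
const second = pager.next();           // result: ["c", "d"]
const third = pager.next();            // result: ["e"], next: null
const backToSecond = pager.previous(); // result: ["c", "d"]
console.log(second.result, third.result, backToSecond.result);
```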

Error when search "john wick" on demo site

Hi,
Your work is great.

After playing around, I found an issue with your demo.

Searching for "john wi" shows results as expected.
But searching for "john wic" returns nothing.

Could you check it?

Any reason for all the weird linebreaks in flexsearch.js?

I say weird, but I should rather say… unconventional.

Like:

while(i < length){

                        tmp = arr[i++];

                        const index = "@" + tmp;

                        if(check[index]){

Are they on purpose?

If so, what is their purpose?

If not, could using tools like Prettier (or Prettier + ESLint) help?

Typescript support is missing

Are there any plans to support TypeScript?

Are there any papers on Contextual-based Scoring?

I’d love to read more on this scoring strategy.

paper Contextual-based Scoring

I am looking for the paper that is cited in the README. I cannot find it on the web.

Any help?

function type stemmer is not supported

Only object type stemmer is supported

Distinct values and distinct count

Hello, is it possible to count the distinct values of a field and/or get the distinct values for some fields? For example, when searching products in a catalog, it is useful to know the distinct category IDs of the results.
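If the search returns full documents, distinct values and their counts can be computed from the result list in application code. A minimal sketch with the hypothetical helper `countDistinct`; the `categoryId` field is made up for the example.

```javascript
// Count how often each distinct value of `field` occurs in the results.
function countDistinct(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return counts;
}

// Stand-in for the documents returned by a product search.
const products = [
  { id: 1, categoryId: 10 },
  { id: 2, categoryId: 20 },
  { id: 3, categoryId: 10 },
];
const facets = countDistinct(products, "categoryId");
const distinctIds = [...facets.keys()];
console.log(distinctIds, facets.get(10)); // [ 10, 20 ] 2
```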

Contextual Search documentation is missing

The readme includes the line

Note: This feature is actually not enabled by default. Read here how to enable.

but the "here" link doesn't go to any page, and I can't find the intended target in the repo :-o

What are "depth" and "threshold"?

I don't know enough fulltext index terminology to infer what these two settings actually mean.

I'm guessing from context that "depth" is the maximum number of words/tokens away a term can be and still be considered relevant.

I have no idea what the "threshold" number implies. :-x

I know I want that sweet contextual searching, so I'd love to figure this out so I can pick numbers appropriate to my use case.

Not working with React Native

First, thanks for the great library, but it's not working with React Native :(

Contextual scoring doesn't seem to be working

When I set a depth, I would expect that if I search for multiple terms, documents that contain those terms near each other would score higher.

Example:

const FlexSearch = require(`flexsearch`)

const index = new FlexSearch({
	tokenize: `strict`,
	encode: `advanced`,
	cache: false,
	doc: {
		id: `id`,
		field: {
			content: {
				threshold: 9,
				resolution: 10,
				depth: 2,
			},
		},
	},
})

index.add([{
	id: 1,
	content: `billy who now what billy okay so what now thorton?`,
}, {
	id: 2,
	content: `billy bob thorton`,
}])

console.log(
	index.search(`billy thorton`)
)
// => [ { id: 1,
//    content: 'billy who now what billy okay so what now thorton?' },
//  { id: 2, content: 'billy bob thorton' } ]

I would expect document id 2 to be the top result, since it contains "billy" and "thorton" within two words of each other, but the top result is actually document id 1.

Tested in [email protected].
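The expectation can be made concrete with a small proximity measure: the smallest token distance between the two query terms in each document. The sketch below re-ranks documents by that distance in application code. It illustrates the expected ordering only and makes no claim about how the library's contextual scoring works; `tokenize`, `minDistance` and `rankByProximity` are hypothetical helpers.

```javascript
// Split text into lowercase word tokens, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Smallest distance (in tokens) between any occurrence of termA and termB.
// Returns Number.MAX_SAFE_INTEGER if either term is missing.
function minDistance(text, termA, termB) {
  let lastA = -1;
  let lastB = -1;
  let best = Number.MAX_SAFE_INTEGER;
  tokenize(text).forEach((token, i) => {
    if (token === termA) lastA = i;
    if (token === termB) lastB = i;
    if (lastA >= 0 && lastB >= 0) {
      best = Math.min(best, Math.abs(lastA - lastB));
    }
  });
  return best;
}

// Documents whose terms are closer together come first.
function rankByProximity(docs, termA, termB, field) {
  return [...docs].sort(
    (a, b) => minDistance(a[field], termA, termB) - minDistance(b[field], termA, termB)
  );
}

const docs = [
  { id: 1, content: "billy who now what billy okay so what now thorton?" },
  { id: 2, content: "billy bob thorton" },
];
console.log(minDistance(docs[0].content, "billy", "thorton")); // 5
console.log(minDistance(docs[1].content, "billy", "thorton")); // 2
const ranked = rankByProximity(docs, "billy", "thorton", "content");
console.log(ranked.map((doc) => doc.id)); // [2, 1]
```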

Unexpected exception when attempting to call Index.search method

I tried to use code example from unit test, but got the following error:

Code to reproduce:

const FlexSearch = require('flexsearch')

// tslint:disable

;(async () => {
  const index = new FlexSearch({
    async: true,
    doc: {
      id: 'id',
      field: [ 'data:name' ]
    }
  })

  const data = [{
    id: 2,
    data: {
      title: 'Title 3',
      body: 'Body 3'
    }
  }, {
    id: 1,
    data: {
      title: 'Title 2',
      body: 'Body 2'
    }
  }, {
    id: 0,
    data: {
      title: 'Title 1',
      body: 'Body 1'
    }
  }]

  await index.add(data)

  console.log(index.search)

  const result = await index.search({
    field: 'data:body',
    query: 'body'
  })

  console.dir(result)
})()

Output:

[Function]
(node:10016) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'search' of undefined
    at h.search (C:\Users\User\Documents\Projects\test\node_modules\flexsearch\dist\flexsearch.node.js:24:281)
    at C:\Users\User\Documents\Projects\test\index.js:38:30
    at process._tickCallback (internal/process/next_tick.js:43:7)
    at Function.Module.runMain (internal/modules/cjs/loader.js:778:11)
    at startup (internal/bootstrap/node.js:300:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:826:3)
(node:10016) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:10016) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Environment: Node
Node version: v11.2.0
Flexsearch version: "^0.5.2"

Remove Features: Where / Find / Tags

I am thinking about removing these features:

index.find() (get document by ID will remain)
index.where()
tag fields
where clause in custom search

The main reasons for this are:

they do not scale properly, just useful up to a medium size of document length
tags cannot be serialized, instead they need to recover from the original documents which slows down the import function
a custom helper function will replace this functionality and is also faster and also less redundant

What do you think?

How does suggestion work?

I tried to activate the suggestion function but it does not change anything in the result. How does it work?

thanks.

How best to return unindexed data for each match (as well as the ID)?

For each item that matches a query, I'd like to be able to get unindexed arbitrary data — not just its ID.

For example: for matches when searching Shakespeare plays, I'd like to be able to return the text of an individual line (which is indexed) but also play name, location, speaker, etc.

What's the best way to achieve this?

I can do this in Elasticlunr (for example) like this:

const index = elasticlunr(function() {
    this.addField('text'); // doc property to be indexed
    this.setRef('id'); // doc property that is the ID of each item
    for (const doc of docs) {
      // doc includes additional arbitrary data for each item: play, speaker, location, etc.
      this.addDoc(doc); 
    }
});

Would I simply need to create an object that maps IDs with item data, or is there a better way to do this?
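The ID-to-data map mentioned in the question could look like the sketch below. `store` and `hydrate` are hypothetical names, and the array passed to `hydrate` stands in for the IDs a search would return.

```javascript
// All items, including data that is never indexed (play, speaker).
const lines = [
  { id: 1, text: "To be, or not to be", play: "Hamlet", speaker: "Hamlet" },
  { id: 2, text: "All the world's a stage", play: "As You Like It", speaker: "Jaques" },
];

// Map each ID to its full item.
const store = new Map(lines.map((line) => [line.id, line]));

// Replace matched IDs with their full items, skipping unknown IDs.
function hydrate(ids) {
  return ids.map((id) => store.get(id)).filter((doc) => doc !== undefined);
}

const hits = hydrate([2, 99]); // 99 is not in the store and is dropped
console.log(hits.map((hit) => `${hit.play}: ${hit.text}`));
// [ "As You Like It: All the world's a stage" ]
```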

Great project by the way — thanks so much for building this.

Benchmark with algolia ?

Can someone do a benchmark between this library and Algolia?
I just want to know if I should drop algolia for a better copycat?
Thank you ;)

Does this module support CJK word splitting?

For example in Chinese, 一个单词 are two words. How to make sure I can get the correct result when searching 单词?
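A common technique for text without word separators is to index overlapping character bigrams instead of words. The sketch below shows only that idea on plain strings, not how the library tokenizes; `bigrams` and `matches` are hypothetical helpers.

```javascript
// Split Han characters into overlapping bigrams; other characters are ignored.
function bigrams(text) {
  const chars = [...text].filter((ch) => /\p{Script=Han}/u.test(ch));
  if (chars.length < 2) return chars;
  const grams = [];
  for (let i = 0; i < chars.length - 1; i++) {
    grams.push(chars[i] + chars[i + 1]);
  }
  return grams;
}

// A query matches when all of its bigrams occur in the document.
function matches(documentText, query) {
  const docGrams = new Set(bigrams(documentText));
  return bigrams(query).every((gram) => docGrams.has(gram));
}

console.log(bigrams("一个单词"));          // [ '一个', '个单', '单词' ]
console.log(matches("一个单词", "单词")); // true
console.log(matches("一个单词", "词单")); // false
```

Bigrams also produce cross-word tokens such as 个单, which raises the index size and can yield false positives. A dictionary-based segmenter would avoid that at the cost of an extra dependency.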

Results are not unique when matches in more than one field

I expected matching documents to be unique within the result. What is the reason for repeating them?

Example:

const f = new FlexSearch({
	doc: {
		id: 'id',
		field: ['field1', 'field2']
	}
})

const docs = [
	{id: 1, field1: 'phrase', field2: 'phrase'}
]

f.add(docs)
console.log(f.search('phrase'))
// Result = [{id: 1, field1: "phrase", field2: "phrase"}, {id: 1, field1: "phrase", field2: "phrase"}]
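As a workaround in application code, duplicates can be filtered out after the search. A minimal sketch with the hypothetical helper `uniqueById`, which keeps the first occurrence of each ID and preserves the result order:

```javascript
// Keep only the first document for each ID.
function uniqueById(docs, idField = "id") {
  const seen = new Set();
  return docs.filter((doc) => {
    if (seen.has(doc[idField])) return false;
    seen.add(doc[idField]);
    return true;
  });
}

// Stand-in for the duplicated search result shown above.
const duplicated = [
  { id: 1, field1: "phrase", field2: "phrase" },
  { id: 1, field1: "phrase", field2: "phrase" },
];
const unique = uniqueById(duplicated);
console.log(unique.length); // 1
```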

Search results depend on the order of fields

NOTE: I've rewritten the entire issue because I've found a way to reproduce my issue on a very small dataset.

I've noticed that I'm missing search results depending on the order of fields that I provide when creating the index.

In the following example, there are two objects where notation:0 matches the search term WW 8840, and one object where prefLabel:de matches WW 8840. In the first example, only the latter object is returned as a search result even though all fields are supposed to be searched. The second example returns the correct search results just by reordering the fields (putting notation:0 to the end). Note that when specifying notation:0 as the only field to search, it will return the correct results in both cases.

Non-working example (prints 1 and 2 even though the first query should return 3 results):

const FlexSearch = require("flexsearch")

let index = new FlexSearch({
  doc: {
    id: "uri",
    field: [
      "prefLabel:de",
      "notation",
      "editorialNote:de",
    ]
  },
  profile: "score"
})

// Example dataset
let concepts = [
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208720%20-%20WW%209239"}],"created":"2012-07-05","editorialNote":{"de":"(Blutgruppen s. XD 3200)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4130604-1"},{"uri":"http://d-nb.info/gnd/4022814-9"},{"uri":"http://d-nb.info/gnd/4070945-0"},{"uri":"http://d-nb.info/gnd/4074195-3"}],"identifier":["152145:13422"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840 - WW 8879","prefLabel":{"de":"Blutkörperchen (Erythrozyt, Leukozyt), Hämoglobin"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WD%205000%20-%20WD%205970"}],"created":"2012-07-05","editorialNote":{"de":"(Antibiotika s. XI 3500)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4155845-5"},{"uri":"http://d-nb.info/gnd/4276935-8"},{"uri":"http://d-nb.info/gnd/4176522-9"},{"uri":"http://d-nb.info/gnd/4175383-5"},{"uri":"http://d-nb.info/gnd/4148701-1"}],"identifier":["148204:"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WD 5380","prefLabel":{"de":"Pyrrolfarbstoffe, Cytochrome, Chromoproteine (Hämoglobin s. WW 8840)"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WD%205380"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"}],"created":"2012-07-05","editorialNote":{},"identifier":["152145:13423"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840","prefLabel":{"de":"Allgemeines"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840"}
]

index.add(concepts)

let results
results = index.search("WW 8840")
console.log(results.length) // only matches the second concept (which mentions "WW 8840" in label)

results = index.search("WW 8840", {
  field: "notation"
})
console.log(results.length) // correctly matches two concepts
// with large dataset, also correctly matches the two concepts

Working example (prints 3 and 2 as expected, just by reordering fields):

const FlexSearch = require("flexsearch")

let index = new FlexSearch({
  doc: {
    id: "uri",
    field: [
      "prefLabel:de",
      "editorialNote:de",
      "notation",
    ]
  },
  profile: "score"
})

// Example dataset
let concepts = [
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208720%20-%20WW%209239"}],"created":"2012-07-05","editorialNote":{"de":"(Blutgruppen s. XD 3200)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4130604-1"},{"uri":"http://d-nb.info/gnd/4022814-9"},{"uri":"http://d-nb.info/gnd/4070945-0"},{"uri":"http://d-nb.info/gnd/4074195-3"}],"identifier":["152145:13422"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840 - WW 8879","prefLabel":{"de":"Blutkörperchen (Erythrozyt, Leukozyt), Hämoglobin"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WD%205000%20-%20WD%205970"}],"created":"2012-07-05","editorialNote":{"de":"(Antibiotika s. XI 3500)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4155845-5"},{"uri":"http://d-nb.info/gnd/4276935-8"},{"uri":"http://d-nb.info/gnd/4176522-9"},{"uri":"http://d-nb.info/gnd/4175383-5"},{"uri":"http://d-nb.info/gnd/4148701-1"}],"identifier":["148204:"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WD 5380","prefLabel":{"de":"Pyrrolfarbstoffe, Cytochrome, Chromoproteine (Hämoglobin s. WW 8840)"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WD%205380"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"}],"created":"2012-07-05","editorialNote":{},"identifier":["152145:13423"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840","prefLabel":{"de":"Allgemeines"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840"}
]

index.add(concepts)

let results
results = index.search("WW 8840")
console.log(results.length) // prints 3: all three concepts match

results = index.search("WW 8840", {
  field: "notation"
})
console.log(results.length) // prints 2: correctly matches two concepts
// with large dataset, also correctly matches the two concepts

Any idea why this is happening? Thanks!

Multiple documents update by query?

Hello, first of all, thanks for creating a nice new search engine. We are looking to use it instead of Elasticsearch, which is very complex, carries a lot of legacy in its DSL, and makes it difficult to get the desired results. Are there any plans to support updating multiple documents with a single query? We need this, for example, to disable some products when their category is disabled.

Also, to avoid creating another ticket: is it possible to boost search results based on a numeric value stored in the search index itself?

Thanks in advance.
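
Until something like this exists in the library, a bulk update could be approximated in application code: keep the documents in your own collection, select the ones that match a predicate, patch them, and re-index each changed document. The sketch below is only an illustration of that loop. `updateWhere` and the `updateDoc` callback are hypothetical names; `updateDoc` stands in for whatever single-document update call the index exposes.

```javascript
// Sketch only: `updateDoc` is a hypothetical callback standing in for
// the index's own single-document update call.
function updateWhere(docs, predicate, patch, updateDoc) {
  let changed = 0;
  for (const doc of docs) {
    if (!predicate(doc)) continue;
    Object.assign(doc, patch); // mutate the stored document in place
    updateDoc(doc);            // re-index the changed document
    changed++;
  }
  return changed;
}

// Example: disable every product whose category was disabled.
const products = [
  { id: 1, category: "shoes", enabled: true },
  { id: 2, category: "hats", enabled: true },
  { id: 3, category: "shoes", enabled: true }
];
const reindexed = [];
const count = updateWhere(
  products,
  (p) => p.category === "shoes",
  { enabled: false },
  (doc) => reindexed.push(doc.id) // stand-in for the real index update
);
console.log(count, reindexed);
```

The cost is one update call per matching document, so this is a workaround for moderate batch sizes rather than a replacement for a native update-by-query.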

How to create an index for a book

Hey,
First, thanks for the amazing library!

I would like to know whether I can index a book and get back the topic name, the sub-topic (title), and the paragraph number for each match, and whether a query can match across two adjacent paragraphs.
For example:

book:
[
    {
        "topic": "topic1",
        "content": [
            {
                "title": "title1",
                "parts": [
                    "word1, word2, word3, word4, word5",
                    "word6, word7, word8, word9, word10"
                ]
            }
        ]
    }
]

index.search("word2 word3") // = [{topic: "topic1", title: "title1", part: 0}]
index.search("word5 word6") // = [{topic: "topic1", title: "title1", part: 0}, {topic: "topic1", title: "title1", part: 1}]

Thanks
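
One way to get topic, title, and paragraph number back from a search is to flatten the nested book into one flat document per paragraph before indexing, so every hit already carries its location. The sketch below shows only that flattening step in plain JavaScript. `flattenBook` and the generated `id` and `text` fields are assumptions made for illustration, not part of the library.

```javascript
// Flatten a nested book into one flat document per paragraph.
// The `id` and `text` field names are illustrative assumptions.
function flattenBook(book) {
  const docs = [];
  for (const { topic, content } of book) {
    for (const { title, parts } of content) {
      parts.forEach((text, part) => {
        docs.push({ id: docs.length, topic, title, part, text });
      });
    }
  }
  return docs;
}

const book = [
  {
    topic: "topic1",
    content: [
      {
        title: "title1",
        parts: [
          "word1, word2, word3, word4, word5",
          "word6, word7, word8, word9, word10"
        ]
      }
    ]
  }
];

const docs = flattenBook(book);
console.log(docs.map(({ topic, title, part }) => ({ topic, title, part })));
```

Each flat document could then be added to a document index with `id` as the identifier and `text` as the searchable field. A query that spans two paragraphs, such as "word5 word6", would probably still need extra handling, for example searching each word separately and combining hits from adjacent `part` values.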

Question on 0.7.0/Field boosting

I see the documentation on indexing different fields in a document has been fleshed out, which is great; I was wondering how that would work.

The readme claims that field searching is a thing in 0.7.0, but the changelog only goes up to 0.6.0 and the version on npm is 0.6.2 – what's the deal there?

Besides wondering where I could find 0.7.0, I have one question: how does boosting work?

I have a document with a title and a body. I want matches in the title to count towards the score 10x more than matches in the body.

Could I achieve that by setting the boost on the title field to 10, and the boost on the body field to 1? Is that how boost works, or have I misguessed? What is the default boost for a field?
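
For comparison, the 10:1 weighting described here can be expressed in application code by searching each field separately and merging the ranked id lists with per-field weights. This is a userland sketch and makes no claim about how the library's own boost option is implemented. `mergeWeighted` and the reciprocal-rank scoring formula are assumptions chosen for illustration.

```javascript
// Userland sketch, not the library's boost implementation.
// Each input is a ranked list of ids (best match first) from one field.
function mergeWeighted(resultsByField, weights) {
  const scores = new Map();
  for (const [field, ids] of Object.entries(resultsByField)) {
    const weight = weights[field] ?? 1; // unlisted fields default to weight 1
    ids.forEach((id, rank) => {
      // Earlier ranks contribute more; the field weight scales the contribution.
      const score = weight / (rank + 1);
      scores.set(id, (scores.get(id) || 0) + score);
    });
  }
  // Sort by descending combined score and return only the ids.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const ranked = mergeWeighted(
  { title: [7, 3], body: [3, 9, 7] },
  { title: 10, body: 1 }
);
console.log(ranked); // [ 7, 3, 9 ]
```

Here document 7 wins because its top title hit outweighs document 3's top body hit, which is the behavior the 10x title weighting is meant to produce.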

Settings get overridden

We use flexsearch in a react app. Performs pretty well, thanks!
We store the flexsearch settings in a constant outside of a component. We also store documents, not key-value pairs.
The first initialization of the component works perfectly. All subsequent ones behave incorrectly: the doc property is null. I guess flexsearch accesses the settings object by reference and somehow replaces the doc property.

Is this behavior expected?
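
If the settings object really is mutated by reference, a defensive workaround is to hand every new index its own deep copy of the constant. The sketch below assumes the settings are plain data. `SEARCH_SETTINGS`, `freshSettings`, and `createIndex` are hypothetical names, and `createIndex` only simulates a constructor that nulls `doc`; it is not the library.

```javascript
// Shared settings constant defined outside any component.
const SEARCH_SETTINGS = {
  doc: { id: "id", field: ["title", "body"] }
};

// Return an independent deep copy so a consumer cannot mutate the constant.
// JSON round-tripping only works for plain data (no functions, no Dates).
function freshSettings() {
  return JSON.parse(JSON.stringify(SEARCH_SETTINGS));
}

// Hypothetical stand-in for a constructor that mutates its options object.
function createIndex(options) {
  const doc = options.doc;
  options.doc = null; // simulates the reported side effect
  return { doc };
}

const first = createIndex(freshSettings());
const second = createIndex(freshSettings());
console.log(SEARCH_SETTINGS.doc !== null, second.doc.field);
```

Settings that contain functions, such as a custom encoder, would not survive the JSON round trip and would need a different copy strategy.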
