Comments (3)
@h0lg If no field is specified, then currently the default index tokenizer is used to parse and normalize the search text; LIFTI only uses a field's configured tokenizer when that specific field is being searched.
In that respect, you're right that searching across all fields will be a problem if different tokenizers have been used for them, and that's exactly the problem that needs to be solved here.
I'd need to spend more time thinking about this than I have right now, but I'm wondering whether, when searching for text across multiple fields:
- All affected fields are collected (all fields, or a subset when a wildcarded field name is specified)
- Each unique tokenizer is used to parse the search text.
- The distinct search terms yielded by the tokenizers are combined with a field filter operator for the appropriate field IDs. (A search term in this context could be any number of tokens if a bracketed statement is encountered.)
Edge cases to consider:
- When searching across all fields, if all tokenizers are the same or all unique tokenizers produce the same search terms, then no field filters need to be applied.
I think this will require quite a bit of rework in the query parser logic, but it's certainly not impossible...
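The steps above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not LIFTI's API (LIFTI is a .NET library). `Field`, `Terms`, `FieldFilter`, `Or` and `build_query` are all hypothetical. Tokenizers are modeled as plain functions returning a tuple of tokens, and a wildcarded field name is assumed to behave like a shell-style pattern matched with `fnmatchcase`. The rest of the query grammar (operators, brackets) is ignored, so a bracketed statement is treated as one multi-token term.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

# Hypothetical model (not LIFTI's API): a tokenizer turns text into a tuple of tokens.
Tokenizer = Callable[[str], Tuple[str, ...]]


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str
    tokenizer: str  # key into the tokenizer registry passed to build_query


@dataclass(frozen=True)
class Terms:
    tokens: Tuple[str, ...]  # one search term; may hold several tokens


@dataclass(frozen=True)
class FieldFilter:
    field_ids: Tuple[int, ...]  # restrict the inner term to these fields
    inner: Terms


@dataclass(frozen=True)
class Or:
    parts: Tuple[FieldFilter, ...]


Query = Union[Terms, FieldFilter, Or]


def build_query(
    text: str,
    fields: Sequence[Field],
    tokenizers: Dict[str, Tokenizer],
    field_pattern: Optional[str] = None,
) -> Query:
    # Step 1: collect affected fields (all of them, or those matching a wildcard).
    affected = [
        f for f in fields
        if field_pattern is None or fnmatchcase(f.name, field_pattern)
    ]
    if not affected:
        raise ValueError(f"no fields match {field_pattern!r}")

    # Step 2: parse the search text once per unique tokenizer.
    terms_by_tokenizer: Dict[str, Tuple[str, ...]] = {}
    for f in affected:
        if f.tokenizer not in terms_by_tokenizer:
            terms_by_tokenizer[f.tokenizer] = tokenizers[f.tokenizer](text)

    # Step 3: group field ids by the distinct term their tokenizer produced.
    ids_by_terms: Dict[Tuple[str, ...], List[int]] = {}
    for f in affected:
        ids_by_terms.setdefault(terms_by_tokenizer[f.tokenizer], []).append(f.field_id)

    # Edge case: searching all fields and every tokenizer agreed, so no filter is needed.
    if field_pattern is None and len(ids_by_terms) == 1:
        (tokens,) = ids_by_terms
        return Terms(tokens)

    # Otherwise wrap each distinct term in a field filter and OR them together.
    filters = tuple(
        FieldFilter(tuple(ids), Terms(tokens)) for tokens, ids in ids_by_terms.items()
    )
    return filters[0] if len(filters) == 1 else Or(filters)


# Example data (hypothetical): "stemmed" is a crude stand-in for a real stemmer.
EXAMPLE_TOKENIZERS: Dict[str, Tokenizer] = {
    "plain": lambda s: tuple(s.lower().split()),
    "stemmed": lambda s: tuple(w.rstrip("s") for w in s.lower().split()),
}
EXAMPLE_FIELDS = [
    Field(0, "title", "plain"),
    Field(1, "body", "stemmed"),
    Field(2, "tags", "plain"),
]
```

With the example data, `build_query("Cats", EXAMPLE_FIELDS, EXAMPLE_TOKENIZERS)` yields two field filters: `('cats',)` for title and tags, and `('cat',)` for body. By contrast, `"dog"` tokenizes identically everywhere and comes back as a plain, unfiltered term, which covers the edge case.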
I understand that in your example it is unclear which tokenizer to apply to the search text if the index itself uses a different tokenizer than the field(s) being searched. I hadn't thought about this configuration and don't have an answer.
But how does LIFTI decide which tokenizer to use for the search text when searching across all fields with different configured tokenizers? Isn't that a similar question? Or am I missing some important difference?
I see, thanks for the clarification and sharing your thoughts.
Explaining the intricacies of tokenization during field searches, and what happens in which case, seems daunting to me. Maybe we're overcomplicating it? You could go with a rule that's easy to communicate and doesn't require explaining the underlying mechanics, even if it has limitations, e.g.:
If you search the same term/query across multiple fields (using wildcards, pipes, or whatever), you can only do so if they share the same tokenizer. Otherwise you have to write separate field queries.
Would that make things easier?
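The simpler rule could be enforced with a check like the one below. This is a minimal Python sketch, not LIFTI's API. `Field`, `MixedTokenizerError` and `tokenize_for_fields` are hypothetical. The "wildcards or pipes" from the comment are modeled as a list of shell-style field-name patterns, where a field is targeted if it matches any of them.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model (not LIFTI's API): a tokenizer turns text into a tuple of tokens.
Tokenizer = Callable[[str], Tuple[str, ...]]


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str
    tokenizer: str  # key into the tokenizer registry


class MixedTokenizerError(ValueError):
    """The targeted fields do not all share one tokenizer."""


def tokenize_for_fields(
    text: str,
    fields: Sequence[Field],
    tokenizers: Dict[str, Tokenizer],
    field_patterns: Sequence[str] = ("*",),
) -> Tuple[List[int], Tuple[str, ...]]:
    # A field is targeted if it matches any pattern ("title", "t*", ...), which
    # stands in for both wildcards and an explicit list of alternative fields.
    targeted = [
        f for f in fields
        if any(fnmatchcase(f.name, p) for p in field_patterns)
    ]
    if not targeted:
        raise ValueError(f"no fields match {list(field_patterns)}")

    # The rule: one query across several fields requires one shared tokenizer.
    names = {f.tokenizer for f in targeted}
    if len(names) > 1:
        raise MixedTokenizerError(
            f"fields {[f.name for f in targeted]} use different tokenizers "
            f"({', '.join(sorted(names))}); write separate field queries instead"
        )

    # Exactly one tokenizer remains, so the text is parsed once for all fields.
    (name,) = names
    return [f.field_id for f in targeted], tokenizers[name](text)


# Example data (hypothetical): "stemmed" is a crude stand-in for a real stemmer.
EXAMPLE_TOKENIZERS: Dict[str, Tokenizer] = {
    "plain": lambda s: tuple(s.lower().split()),
    "stemmed": lambda s: tuple(w.rstrip("s") for w in s.lower().split()),
}
EXAMPLE_FIELDS = [
    Field(0, "title", "plain"),
    Field(1, "body", "stemmed"),
    Field(2, "tags", "plain"),
]
```

With the example data, searching title and tags together is allowed. The default all-field search fails because body uses a different tokenizer. A single-field query can never trip the check, which is what keeps the rule easy to explain.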
Related Issues
- Item not getting indexed
- Synonyms and related items
- Generating search result phrases from match locations
- Wildcard
- Remove all punctuation from index and query
- Outdated NuGet dependencies
- Wild card matching in exact sequence seems broken in v4
- Indexing text from nested objects
- How to search for combined words?
- Performing search CPU spikes
- Dynamic fields (was: DictionaryTokenization)
- V5 checklist
- Score boosting
- Add support for stop words
- Provide a method to calculate the size of the index in memory
- Query syntax: Add support for spaces in field names
- Remove dependency on System.Collections.Immutable
- Suggestion: custom stemmers
- Search for words with a `=` character