Comments (4)
The problem is that for user-issued keyword searches you want the field to be tokenized and heavily normalised. That makes it hard (read: I don't know how) to ensure that a query is a full match. Take document A with the value "foo/bar.zoo?query=Hamburger+City": indexing oriented towards user queries would produce ["foo", "bar", "zoo", "query", "hamburger", "city"] or something like that. A machine-oriented search (a click on a facet or similar) for "foo/bar.zoo?query=Hamburger" will then also match document A, as Solr does not care about the extra token "city" in the document when matching.
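The false positive described above can be reproduced with a small sketch. It is only a rough simulation: the regex split and lowercasing stand in for a real Solr analyzer chain, and `tokenize` and `tokenized_match` are hypothetical helper names, not Solr APIs.

```python
import re

def tokenize(text):
    """Split on non-alphanumeric characters and lowercase each piece.

    A rough stand-in for a tokenizing, normalising text analyzer.
    """
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def tokenized_match(query, document):
    """Return True when every query token occurs in the document.

    Extra tokens in the document are ignored, which is the behaviour
    that makes a partial URL match the full one.
    """
    doc_tokens = set(tokenize(document))
    return all(t in doc_tokens for t in tokenize(query))

doc_a = "foo/bar.zoo?query=Hamburger+City"
print(tokenize(doc_a))
# The shorter URL still matches document A, because "city" is simply unused.
print(tokenized_match("foo/bar.zoo?query=Hamburger", doc_a))
```

The second call returns True even though the query is not the full URL, which is the problem for facet-style lookups.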
The easy answer is to accept a larger index and add a parallel version of the field where the indexing is verbatim. StrField works well for that and has the added bonus that docValues works for that field. Do note that this really means verbatim searches: "foo/bar.zoo?query=hamburger+city" will not match "foo/bar.zoo?query=Hamburger+City". That is normally not a problem if the field is intended for machine queries.
This method is used in my latest pull request to webarchive-discovery-code for the fields url_norm and url_search, where one field is an indexed & docValues StrField and the other is a heavily tokenized & normalised TextField.
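A minimal sketch of the parallel-field idea, assuming an in-memory list instead of Solr: each document stores the URL once verbatim and once as a token set. The field names `url_exact` and `url_tokens` and all function names are hypothetical and chosen for illustration only; they are not the fields from the pull request.

```python
import re

def analyze_text(value):
    """Rough stand-in for a tokenizing, lowercasing text analyzer."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def index_document(url):
    """Index the same value twice: verbatim and tokenized."""
    return {
        "url_exact": url,                     # verbatim, like a string field
        "url_tokens": set(analyze_text(url)), # tokenized, like a text field
    }

def exact_search(index, query):
    """Machine-oriented lookup: the whole value must be identical."""
    return [d["url_exact"] for d in index if d["url_exact"] == query]

def keyword_search(index, query):
    """User-oriented lookup: every query token must occur in the document."""
    terms = analyze_text(query)
    return [d["url_exact"] for d in index
            if all(t in d["url_tokens"] for t in terms)]

index = [index_document("foo/bar.zoo?query=Hamburger+City")]

print(keyword_search(index, "hamburger"))                       # fuzzy hit
print(exact_search(index, "foo/bar.zoo?query=Hamburger"))       # no hit
print(exact_search(index, "foo/bar.zoo?query=hamburger+city"))  # no hit, case differs
print(exact_search(index, "foo/bar.zoo?query=Hamburger+City"))  # exact hit
```

Facet clicks would go to the verbatim field and free-text input to the tokenized one, at the cost of storing the value twice.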
from warclight.
URL as exact and rest as fuzzy makes sense to me!
@tokee what's the preferred method to do an exact match? Something like what Arclight has set up here? Or something else?
Resolved with 3e034d2