Comments (4)
OK, for the NPE in sort, I did some manual debugging via good ole System.out.println. This only happens in the assertion if the cache check is greater than 1, which does seem to happen with intra-merge concurrency.
However, if I undo all the concurrency and print out the assertion check lines (which pass without concurrency), I still see sort: null. This tells me that this assertion has always had an issue with sort == null and never double-checked it.
from lucene.
I think I know the issue with the parallel merging. This only happens when we use a SortingCodecReader.
The key issue is here: 17c285d
This adjusted the caching as follows:
  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
    return getOrCreate(field, true, supplier);
  }

  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier)
      throws IOException {
    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
      assert assertCreatedOnlyOnce(field, norms);
      cachedObject = supplier.get();
      cachedField = field;
      cacheIsNorms = norms;
    }
    assert cachedObject != null;
    return (T) cachedObject;
  }

  private <T> T getOrCreateDV(String field, IOSupplier<T> supplier) throws IOException {
    return getOrCreate(field, false, supplier);
  }
This will cause a weird race condition: when merging, norms will call getOrCreateNorms and, in parallel, we could be calling getOrCreateDV. Either call will overwrite the other's cached entry, and the evicted entry then potentially gets created (and cached) a second time.
Parallel merging breaks these assumptions and could cause issues.
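To make the eviction pattern concrete, here is a minimal sketch. It is not the Lucene code: SingleSlotCache, the field name "body", and the creation counter are hypothetical stand-ins, and the parallel merge is simulated by simply alternating norms and doc-values calls on one thread so the result is deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class SingleSlotCacheDemo {

  // Hypothetical stand-in for the single-entry cache quoted above.
  static final class SingleSlotCache {
    private String cachedField;
    private boolean cacheIsNorms;
    private Object cachedObject;
    // Counts how many times each (field, kind) entry was created.
    final Map<String, Integer> creations = new HashMap<>();

    @SuppressWarnings("unchecked")
    synchronized <T> T getOrCreate(String field, boolean norms, Supplier<T> supplier) {
      if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
        // Cache miss: the previous entry is dropped and a new one is built.
        creations.merge(field + (norms ? "/norms" : "/dv"), 1, Integer::sum);
        cachedObject = supplier.get();
        cachedField = field;
        cacheIsNorms = norms;
      }
      return (T) cachedObject;
    }
  }

  // Norms are read to completion before doc values: one creation per kind.
  public static int creationsSequential(int rounds) {
    SingleSlotCache cache = new SingleSlotCache();
    for (int i = 0; i < rounds; i++) {
      cache.getOrCreate("body", true, () -> "norms");
    }
    for (int i = 0; i < rounds; i++) {
      cache.getOrCreate("body", false, () -> "doc values");
    }
    return cache.creations.get("body/norms");
  }

  // Norms and doc values alternate, as they could under parallel merging:
  // every call evicts the other kind, so norms are rebuilt on each round.
  public static int creationsInterleaved(int rounds) {
    SingleSlotCache cache = new SingleSlotCache();
    for (int i = 0; i < rounds; i++) {
      cache.getOrCreate("body", true, () -> "norms");
      cache.getOrCreate("body", false, () -> "doc values");
    }
    return cache.creations.get("body/norms");
  }

  public static void main(String[] args) {
    System.out.println("sequential norms creations:  " + creationsSequential(3));
    System.out.println("interleaved norms creations: " + creationsInterleaved(3));
  }
}
```

In this sketch the synchronized keyword keeps each call atomic, but it does nothing to stop the two kinds of access from evicting each other, which is the situation a "created only once" assertion would trip on.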
@iverase @jpountz I propose we remove intra-merging parallelism from norms, terms, and doc values and do a 9.11.1 release.
We can incrementally add those back in the future.
from lucene.
Parallel merging breaks these assumptions and could cause issues.
Well, the assumption is that it's only accessed once. But now, with parallel merging, it could be re-cached any number of times as the norms fields and the DV fields keep kicking each other out of the cache.
Seems like a bad idea regardless and I would like us to spend time making sure this is OK.
The proposal for a 9.11.1 is out of caution here.
from lucene.
Disabling concurrent merging for terms, norms and doc values until we figure out how to make it compatible with SortingCodecReader sounds good to me.
from lucene.