ipni / go-indexer-core
Core go datastructure of a cid index
License: Other
Add a CBOR value codec and benchmarks that compare it with the existing JSON and custom binary formats. Set CBOR as the default if its performance is comparable with the custom binary format, in favour of using more common data formats.
With a 179G value store on disk, it is taking about 12 minutes to initialize storethehash:
2022-03-17T18:36:40.832Z INFO indexer command/daemon.go:95 Valuestore initializing/opening {"type": "sth", "path": "/data/valuestore-sth"}
2022-03-17T18:48:15.657Z INFO indexer command/daemon.go:100 Valuestore initialized
Since only the multihash portion of a CID is used to index content, this should be made clear by replacing CID with multihash in all function signatures.
Currently during a put of a batch of multihashes, the core reads the current value of each hash sequentially as part of updating those values.
To support large batches from a single provider, we should support sharding these reads across multiple threads so that we can parallelize waiting on the open/read syscalls for these reads.
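The sharded reads described above could be sketched as a small worker pool. This is only an illustration under stated assumptions: the `Store` interface, `mapStore`, `readResult`, and `parallelGet` are hypothetical names and are not part of go-indexer-core, and string keys stand in for multihashes.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for the value store; only Get is needed here.
type Store interface {
	Get(key string) ([]string, bool, error)
}

// mapStore is an in-memory Store used only for illustration.
type mapStore map[string][]string

func (m mapStore) Get(key string) ([]string, bool, error) {
	v, ok := m[key]
	return v, ok, nil
}

// readResult holds the outcome of one lookup.
type readResult struct {
	values []string
	found  bool
	err    error
}

// parallelGet reads all keys using at most `workers` goroutines.
// Results are returned in the same order as keys.
func parallelGet(s Store, keys []string, workers int) []readResult {
	if workers < 1 {
		workers = 1
	}
	results := make([]readResult, len(keys))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes to
			// results never touch the same element twice.
			for i := range idx {
				v, ok, err := s.Get(keys[i])
				results[i] = readResult{values: v, found: ok, err: err}
			}
		}()
	}
	for i := range keys {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	s := mapStore{"mh1": {"providerA"}, "mh3": {"providerB"}}
	res := parallelGet(s, []string{"mh1", "mh2", "mh3"}, 2)
	for i, r := range res {
		fmt.Println(i, r.found, r.values)
	}
}
```

Keeping results in input order means the subsequent update step can stay sequential while only the blocking reads run concurrently.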
Creating the indexer core engine takes a valueStore interface to use for storing unencrypted index data. Options can also be specified to give a dhstore to use for storing encrypted index data. I do not think the same core should be doing both. Instead, when creating a core engine instance it should only be given a valueStore interface. That interface can be either an unencrypted or an encrypted implementation.
If we really want to support both, then the indexer should use multiple cores, but there should be no reason since we do not need to look up unencrypted results to then store them in an encrypted DB. We only needed that during our transition from one to the other.
After upgrading the x/sys dependency, a panic still occurs on put when values retrieved from the cache are checked for equality.
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f7f772e6999 pc=0x410af1]
goroutine 275 [running]:
runtime.throw({0x15d88b4?, 0xc00536a000?})
/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00903e898 sp=0xc00903e868 pc=0x4473b1
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:825 +0x305 fp=0xc00903e8e8 sp=0xc00903e898 pc=0x45d785
memeqbody()
/usr/local/go/src/internal/bytealg/equal_amd64.s:108 +0xd1 fp=0xc00903e8f0 sp=0xc00903e8e8 pc=0x410af1
bytes.Equal(...)
/usr/local/go/src/bytes/bytes.go:20
github.com/filecoin-project/go-indexer-core.Value.Match(...)
/go/pkg/mod/github.com/filecoin-project/[email protected]/value.go:58
github.com/filecoin-project/go-indexer-core.Value.Equal(...)
/go/pkg/mod/github.com/filecoin-project/[email protected]/value.go:63
github.com/filecoin-project/go-indexer-core/engine.(*Engine).Put(0xc002dc0b00, {{0xc0103faed0, 0x22}, {0xc013080d40, 0x3d, 0x3f}, {0xc002fe2db8, 0x2, 0x2}}, {0xc00a1f4000, ...})
/go/pkg/mod/github.com/filecoin-project/[email protected]/engine/engine.go:93 +0x645 fp=0xc00903eae8 sp=0xc00903e8f0 pc=0xa28945
github.com/filecoin-project/storetheindex/internal/ingest.(*Ingester).indexAdMultihashes(0xc00052c3c0, {{0x1a54eb0, 0xc0009c1060}, {0xc013080f00, 0x3b}, {0xc0009c1240, 0x1, 0x1}, {0xc009343400, 0x272, ...}, ...}, ...)
/storetheindex/internal/ingest/linksystem.go:534 +0x705 fp=0xc00903ed18 sp=0xc00903eae8 pc=0xde7765
github.com/filecoin-project/storetheindex/internal/ingest.(*Ingester).ingestEntryChunk(0xc00052c3c0, {0x1a55828?, 0xc0084d8660?}, {{0x1a54eb0, 0xc0009c1060}, {0xc013080f00, 0x3b}, {0xc0009c1240, 0x1, 0x1}, ...}, ...)
/storetheindex/internal/ingest/linksystem.go:475 +0x13e fp=0xc00903ee50 sp=0xc00903ed18 pc=0xde6d7e
Currently a bitsize option is allowed, but if the index is subsequently opened with a different bitsize set, the existing index data will not be read correctly.
The bitsize in use should be persisted somewhere alongside a storethehash datastore, so that a mismatch can be detected when the index is opened.
Per ipni/index-provider#15, we'll need to update `indexer.Value` to include a contextual ID (like a `dealID`) to identify updates over metadata for the same CID.
This will also mean slightly changing how the indexer-core behaves on `Put`. A `Put` for the same `dealID` shouldn't append a new entry but update the existing one.
// Value is the value of an index entry that is stored for each CID in the indexer.
type Value struct {
	// DealID is a contextual ID used to identify different entries.
	DealID cid.Cid
	// ProviderID is the peer ID of the provider of the CID.
	ProviderID peer.ID
	// Metadata is serialized data that provides information about retrieving
	// data, for the indexed CID, from the identified provider.
	Metadata []byte
}
// cc @gammazero
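The proposed `Put` behaviour (update in place when the contextual ID matches, append otherwise) might look roughly like this. It is a sketch only: plain strings stand in for `cid.Cid` and `peer.ID` so that it has no external dependencies, and the `index` type is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// Value mirrors the struct proposed above. Assumption: string IDs replace
// cid.Cid and peer.ID purely to keep the example self-contained.
type Value struct {
	DealID     string
	ProviderID string
	Metadata   []byte
}

// index is a hypothetical in-memory map from a multihash key to its values.
type index map[string][]Value

// Put stores v under key and reports whether the index changed.
// An entry with the same provider and DealID is updated, not duplicated.
func (ix index) Put(key string, v Value) bool {
	vals := ix[key]
	for i := range vals {
		if vals[i].ProviderID == v.ProviderID && vals[i].DealID == v.DealID {
			if bytes.Equal(vals[i].Metadata, v.Metadata) {
				return false // identical entry already stored
			}
			vals[i].Metadata = v.Metadata // update the existing entry
			return true
		}
	}
	ix[key] = append(vals, v) // new contextual ID: append
	return true
}

func main() {
	ix := index{}
	ix.Put("mh", Value{DealID: "deal1", ProviderID: "prov", Metadata: []byte("v1")})
	ix.Put("mh", Value{DealID: "deal1", ProviderID: "prov", Metadata: []byte("v2")})
	ix.Put("mh", Value{DealID: "deal2", ProviderID: "prov", Metadata: []byte("v1")})
	for _, v := range ix["mh"] {
		fmt.Println(v.DealID, string(v.Metadata))
	}
}
```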
If we need to remove all data for a provider in the persistence layer there's no implementation for it yet.
It may require an offline process that iterates through memory checking all CIDs that have metadata for that provider. Maybe we can also design a new index that keeps track of providers and CIDs to optimize this process (it may add a significant storage overhead, we'll need to benchmark it)
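The brute-force variant of that offline process, scanning every entry and dropping the provider's values, can be sketched as follows. The in-memory map and the names `Value` and `removeProvider` are assumptions for illustration; a real implementation would iterate the persistent store instead.

```go
package main

import "fmt"

// Value is a simplified, hypothetical index value for this sketch.
type Value struct {
	ProviderID string
	Metadata   []byte
}

// removeProvider scans every entry and drops values that belong to
// providerID. It returns the number of values removed.
func removeProvider(ix map[string][]Value, providerID string) int {
	removed := 0
	for key, vals := range ix {
		kept := vals[:0] // reuse the backing array while filtering
		for _, v := range vals {
			if v.ProviderID == providerID {
				removed++
				continue
			}
			kept = append(kept, v)
		}
		if len(kept) == 0 {
			delete(ix, key) // no providers left for this key
		} else {
			ix[key] = kept
		}
	}
	return removed
}

func main() {
	ix := map[string][]Value{
		"mh1": {{ProviderID: "A"}, {ProviderID: "B"}},
		"mh2": {{ProviderID: "A"}},
	}
	fmt.Println(removeProvider(ix, "A"), len(ix))
}
```

The full scan costs time proportional to the whole index, which is the cost a dedicated provider-to-CIDs index would trade for extra storage.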
Add metrics to report:
Other teams are starting to use this code, so I think we should consider populating the README with a brief description of the project (bearing in mind that our interfaces may still be subject to minor changes).
After advertisements are ingested by an indexer instance backed by pebble, not all records are found when multihashes are looked up via the finder API.
Considering the efficiency gain of using the binary value codec, make it the default choice and "do the right thing" when a stored value is encoded as JSON.
The idea is to automatically detect the encoding and migrate values on the fly to binary format. That means we will avoid running a lengthy migration and would opportunistically move values while taking advantage of binary efficiency for any newly stored values.
@ischasny just pushed a release tag: v0.7.1.
Please manually verify validity (using `gorelease`), and update `version.json` to reflect the manually released version, if necessary.
In the future, please use the automated process.
I think we should define a dedicated type for metadata in go-indexer-core that has receivers for marshalling/unmarshalling. This is a type used in both indexer node and provider and seems like it belongs to core.
The functionality is already implemented as part of indexer.Value; it just needs to be refactored into its own type.
It would also make it easier to write go-doc for metadata and spell out its dependency on the transport protocol.
@gammazero what are your thoughts? I am happy to pick this up if the suggestion makes sense.
Originally posted by @masih in ipni/index-provider#48 (comment)
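A dedicated metadata type with marshalling receivers might be shaped like the sketch below. The field names and the wire layout (a varint protocol ID followed by raw data) are assumptions made for illustration; the issue does not specify them.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Metadata is a hypothetical dedicated type. ProtocolID identifies the
// transport protocol that defines how Data is interpreted.
type Metadata struct {
	ProtocolID uint64
	Data       []byte
}

// MarshalBinary encodes the protocol ID as a varint followed by Data.
// It satisfies encoding.BinaryMarshaler.
func (m Metadata) MarshalBinary() ([]byte, error) {
	buf := make([]byte, binary.MaxVarintLen64, binary.MaxVarintLen64+len(m.Data))
	n := binary.PutUvarint(buf, m.ProtocolID)
	return append(buf[:n], m.Data...), nil
}

// UnmarshalBinary decodes the layout written by MarshalBinary.
// It satisfies encoding.BinaryUnmarshaler.
func (m *Metadata) UnmarshalBinary(data []byte) error {
	id, n := binary.Uvarint(data)
	if n <= 0 {
		return errors.New("metadata: invalid protocol ID varint")
	}
	m.ProtocolID = id
	m.Data = append([]byte(nil), data[n:]...) // copy so the input can be reused
	return nil
}

// Equal reports whether two metadata values are identical.
func (m Metadata) Equal(other Metadata) bool {
	return m.ProtocolID == other.ProtocolID && bytes.Equal(m.Data, other.Data)
}

func main() {
	// 0x300000 is an arbitrary protocol ID chosen for the example.
	in := Metadata{ProtocolID: 0x300000, Data: []byte("retrieval info")}
	raw, err := in.MarshalBinary()
	if err != nil {
		panic(err)
	}
	var out Metadata
	if err := out.UnmarshalBinary(raw); err != nil {
		panic(err)
	}
	fmt.Println(in.Equal(out))
}
```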
@adlrocha confirmed offline that both the backends support iteration. We should expose and implement an iteration API so users can use it for introspection/debugging.