Comments (7)
If we are going to implement retention policies via tombstones, we need some kind of rolling info in the tombstone. Something like:
- seriesA: 1200-1600hr (timespan), Deleted
- seriesB: anything older than 10 days, Retention

Now, if we are going to specify retention via a config file, we might need to change every block on reload. We also need to periodically compact blocks that have reached full size.
This assumes we are not allowed to view metrics immediately after they expire. If that is not the case, then only the compactor holds the retention information and the metrics stay around until compaction.
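A minimal sketch of what such a rolling tombstone entry could look like; all of these names (`Tombstone`, `Covers`, the kinds) are hypothetical and do not exist in tsdb:

```go
package main

import "fmt"

// TombstoneKind distinguishes the two entry types described above.
type TombstoneKind int

const (
	KindDeleted   TombstoneKind = iota // explicit time-span delete: mint-maxt
	KindRetention                      // rolling: anything older than MaxAge is gone
)

// Tombstone is a hypothetical "rolling" tombstone entry.
type Tombstone struct {
	Series     string
	Kind       TombstoneKind
	Mint, Maxt int64 // used by KindDeleted, in ms
	MaxAge     int64 // used by KindRetention, in ms
}

// Covers reports whether timestamp t (ms) is hidden by the tombstone
// when evaluated at wall time now (ms).
func (ts Tombstone) Covers(t, now int64) bool {
	switch ts.Kind {
	case KindDeleted:
		return ts.Mint <= t && t <= ts.Maxt
	case KindRetention:
		return t < now-ts.MaxAge
	}
	return false
}

func main() {
	ret := Tombstone{Series: "seriesB", Kind: KindRetention, MaxAge: 10 * 24 * 3600 * 1000}
	now := int64(2_000_000_000_000)
	fmt.Println(ret.Covers(now-11*24*3600*1000, now)) // true: older than 10 days
}
```

The retention kind is what makes the entry "rolling": it is evaluated against the current time on every read rather than against a fixed span.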
from tsdb.
For implementing deletions, we need to support time-range deletions too. One chunk could have multiple deleted and valid ranges.
My plan to implement this:
- Store a deleted postings list with `[{mint, maxt}]` (deleted ranges) in the index.
- Embed a new field, `deletedRanges: [{mint, maxt}]`, in the `ChunkMeta`.
- When looking up the `ChunkMeta`, the index populates the meta with this field.
- The iterator over the `ChunkMeta` simply skips those time-ranges.
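The steps above can be sketched roughly as follows; `TimeRange`, `filterSamples`, and the field names are illustrative stand-ins, not the actual tsdb API:

```go
package main

import "fmt"

// TimeRange is a half-open-agnostic [mint, maxt] interval (hypothetical name).
type TimeRange struct{ Mint, Maxt int64 }

// ChunkMeta mimics tsdb's chunk metadata with the proposed extra field.
type ChunkMeta struct {
	MinTime, MaxTime int64
	DeletedRanges    []TimeRange // populated by the index on lookup
}

// deleted reports whether timestamp t falls in any deleted range.
func (cm ChunkMeta) deleted(t int64) bool {
	for _, r := range cm.DeletedRanges {
		if r.Mint <= t && t <= r.Maxt {
			return true
		}
	}
	return false
}

// filterSamples mimics an iterator that skips the deleted time-ranges.
func filterSamples(cm ChunkMeta, ts []int64) []int64 {
	var out []int64
	for _, t := range ts {
		if !cm.deleted(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	cm := ChunkMeta{MinTime: 0, MaxTime: 100, DeletedRanges: []TimeRange{{20, 40}}}
	fmt.Println(filterSamples(cm, []int64{10, 25, 50})) // [10 50]
}
```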
One optimization could be to store the fully deleted postings and expose a `Deleted() Postings` method on `IndexReader` that the query engine can intersect with to remove the fully deleted series. But how to detect which series have been fully deleted in a persisted index is yet to be determined.
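For illustration, removing the fully deleted series amounts to a sorted-list difference; in tsdb, `Postings` is an iterator interface rather than a slice, so this is only a sketch with hypothetical names:

```go
package main

import "fmt"

// withoutDeleted drops from a sorted postings list every series ID that also
// appears in the sorted "deleted" postings list. Both inputs must be ascending.
func withoutDeleted(postings, deleted []uint64) []uint64 {
	var out []uint64
	i := 0
	for _, p := range postings {
		for i < len(deleted) && deleted[i] < p {
			i++
		}
		if i < len(deleted) && deleted[i] == p {
			continue // fully deleted series: skip
		}
		out = append(out, p)
	}
	return out
}

func main() {
	fmt.Println(withoutDeleted([]uint64{1, 3, 5, 9}, []uint64{3, 9})) // [1 5]
}
```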
I haven't looked into the compaction codebase yet, but it should be straightforward there.
Does this sound okay?
from tsdb.
So I have been thinking a little more about this. Do we want to support deletion of arbitrary time-ranges, or only deletion beyond a point in time, i.e., "delete all samples in the time-series older than `t0`"?
If it is the latter scenario, that makes this a wee bit easier to implement, but the approach would still be the one mentioned above.
from tsdb.
We were thinking of using tombstones on `headBlock`s also. But InfluxDB has a similar storage scenario, and they remove the in-memory entries directly when doing deletes.
Ref: Last paragraph under https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/DESIGN.md#data-flow
I haven't dug into the code yet, but I think they remove the data rather than the entries in the index. If we can drop chunks and data from `memSeries`, then not doing tombstones for in-memory data might be better.
But all of this needs to be benchmarked and validated.
from tsdb.
Edit: actually had this tab open for way too long and didn't see your updates from yesterday when writing this. If we can drop data immediately, sure, that's great. But likely more expensive and error-prone than just using tombstones (postings lists are not update-friendly, neither are our compressed chunks). The data will be gone once it's compacted anyway. Deletes are rare.
Let's focus on deletions for now (as would be done by the user).
I think retention policies could simply be implemented on top by doing a delete+compact cycle in the foreground. They also don't have to be strict. Just running this cycle every 1h would be fine IMO.
General (non-)requirements around deletions:
- they should be visible as soon as the deletion call returns
- expected to be very rare and in bulk (compared to append writes)
- okay to be slow
Regarding your proposal:
We generally expect deletes to happen against persisted and in-memory blocks equally. The in-memory block could be updated via the index entries you describe; the persisted blocks cannot.
This would be equivalent to a re-compaction of the index, as it's not feasible to start randomly adding and moving bytes in the existing index file.
So with this approach, a deletion request would have to synchronously write a new index file. That would be perfectly okay: deletions may be slow, and they are rare and in bulk.
If it was a bulk delete though, this would not reduce the size much because the sample data is still around. The next full compaction of the block would then care about removing the actual samples by dropping full chunks or dropping some samples from them. The latter could also care about rewriting chunks so we don't end up with two-sample chunks – that's probably just an optimisation for later though.
What this does do overall is introduce a fair bit of noise into our index format for things that should only be ephemeral until the next full compaction.
An alternative here is having tombstones separately tracked in a WAL, to which we append deletes. All tombstones are additionally in an in-memory data structure. The elements in there are then considered, as you described, when going over iterators.
At the next compaction, the tombstones are fully applied by removing the data in the newly compacted block. Doing some quick math for the in-memory size of the tombstone tracking:
1e6 series x 24 bytes (series ID, start, end) = 24 MB
That seems pretty reasonable for tracking 1 million deletes (though it would multiply by the number of affected blocks, of course). It should be stored in some form of postings list, as you described, to be considered when querying.
What I did not quite understand was your need for having the entries in the postings list AND in the ChunkMeta directly.
Some care must be taken to correctly rebuild the sorted delete-postings list if several deletes are done one after another.
Overall this would not add complexity to the core files. Memory footprint is largely negligible. It is a fair bit faster and does not require a full rewrite if we are just deleting 2KB worth of samples from a 5GB block.
We could dynamically decide whether the amount of tombstones justifies a full re-compaction or whether we just keep the few tombstones around.
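Rebuilding the sorted deleted-ranges list after successive deletes amounts to interval merging; a sketch, with `interval` and `mergeIntervals` as illustrative names:

```go
package main

import (
	"fmt"
	"sort"
)

type interval struct{ mint, maxt int64 }

// mergeIntervals rebuilds a sorted, non-overlapping deleted-ranges list from
// intervals accumulated by deletes issued one after another. Overlapping and
// adjacent intervals are coalesced.
func mergeIntervals(ivs []interval) []interval {
	if len(ivs) == 0 {
		return ivs
	}
	sort.Slice(ivs, func(i, j int) bool { return ivs[i].mint < ivs[j].mint })
	out := []interval{ivs[0]}
	for _, iv := range ivs[1:] {
		last := &out[len(out)-1]
		if iv.mint <= last.maxt+1 { // overlapping or adjacent: extend
			if iv.maxt > last.maxt {
				last.maxt = iv.maxt
			}
		} else {
			out = append(out, iv)
		}
	}
	return out
}

func main() {
	fmt.Println(mergeIntervals([]interval{{10, 20}, {15, 30}, {40, 50}}))
	// [{10 30} {40 50}]
}
```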
Generally, I think I'd lean towards making it part of the IndexReader as you said. This means persisted blocks also get an IndexWriter to perform deletions and their indexReader implementation gets additional in-memory structures.
The logging of tombstones could just be added as another entry type to the existing WAL. This means that persisted blocks also get a WAL now... things are getting more complex, but there's no real way around that :)
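Logging a tombstone as an extra WAL entry type could look like the following; the record-type value and the byte layout here are made up purely for illustration and do not match tsdb's actual WAL encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// walTombstone is a hypothetical WAL record type for tombstone entries.
const walTombstone byte = 4

// encodeTombstone serializes a tombstone record: 1 type byte followed by
// the series reference and the deleted interval, big-endian.
func encodeTombstone(ref uint64, mint, maxt int64) []byte {
	buf := make([]byte, 1+8+8+8)
	buf[0] = walTombstone
	binary.BigEndian.PutUint64(buf[1:], ref)
	binary.BigEndian.PutUint64(buf[9:], uint64(mint))
	binary.BigEndian.PutUint64(buf[17:], uint64(maxt))
	return buf
}

// decodeTombstone is the inverse of encodeTombstone.
func decodeTombstone(b []byte) (ref uint64, mint, maxt int64, err error) {
	if len(b) != 25 || b[0] != walTombstone {
		return 0, 0, 0, fmt.Errorf("not a tombstone record")
	}
	ref = binary.BigEndian.Uint64(b[1:])
	mint = int64(binary.BigEndian.Uint64(b[9:]))
	maxt = int64(binary.BigEndian.Uint64(b[17:]))
	return ref, mint, maxt, nil
}

func main() {
	rec := encodeTombstone(42, 1000, 2000)
	ref, mint, maxt, _ := decodeTombstone(rec)
	fmt.Println(ref, mint, maxt) // 42 1000 2000
}
```

On replay, records of this type would repopulate the in-memory tombstone structure alongside the normal series and sample records.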
from tsdb.
Okay, so we are going to support deleting time-ranges, and not just "delete older than `t0`".
> What I did not quite understand was your need for having the entries in the postings list AND in the ChunkMeta directly.
We won't be modifying the `ChunkMeta` directly. When looking up the `ChunkMeta`, we populate the deleted ranges, because a chunk can have partially deleted data and the iterator over it needs that information.
+1 for a separate tombstone file, it is cleaner than modifying the index. When we load the index, we load the info from this file too. Though for persisted blocks it could be just a `tombstones` file rather than entries inside a WAL, while for in-memory blocks it would just be appended to the WAL.
But that is again an implementation detail and can be decided upon later.
from tsdb.
Closed via #82
from tsdb.