Comments (7)
If we are going to implement retention policies via tombstones, we need some kind of rolling info in the tombstone. Something like:
- seriesA: 1200-1600hr (timespan), Deleted
- seriesB: anything older than 10 days, Retention

Now, if we are going to specify retention via a config file, we might need to change every block on reload. We also need to periodically compact blocks that have reached full size.
This assumes we are not allowed to view metrics immediately after they expire. If that is not the case, then only the compactor holds the retention information and the metrics stay around until compaction.
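A minimal sketch of what such a rolling tombstone entry could look like; all of these names (`Tombstone`, `Covers`, the kinds) are hypothetical and do not exist in tsdb:

```go
package main

import "fmt"

// TombstoneKind distinguishes the two entry types described above.
type TombstoneKind int

const (
	KindDeleted   TombstoneKind = iota // explicit time-span delete: mint-maxt
	KindRetention                      // rolling: anything older than MaxAge is gone
)

// Tombstone is a hypothetical "rolling" tombstone entry.
type Tombstone struct {
	Series     string
	Kind       TombstoneKind
	Mint, Maxt int64 // used by KindDeleted, in ms
	MaxAge     int64 // used by KindRetention, in ms
}

// Covers reports whether timestamp t (ms) is hidden by the tombstone
// when evaluated at wall time now (ms).
func (ts Tombstone) Covers(t, now int64) bool {
	switch ts.Kind {
	case KindDeleted:
		return ts.Mint <= t && t <= ts.Maxt
	case KindRetention:
		return t < now-ts.MaxAge
	}
	return false
}

func main() {
	ret := Tombstone{Series: "seriesB", Kind: KindRetention, MaxAge: 10 * 24 * 3600 * 1000}
	now := int64(2_000_000_000_000)
	fmt.Println(ret.Covers(now-11*24*3600*1000, now)) // true: older than 10 days
}
```

The retention kind is what makes the entry "rolling": it is evaluated against the current time on every read rather than against a fixed span.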
from tsdb.
For implementing deletions, we need to support time-range deletions too. One chunk could have multiple deleted and valid ranges.
My plan to implement this:
- Store a deleted postings list with `[{mint, maxt}]` (deleted ranges) in the index.
- Embed a new field, `deletedRanges: [{mint, maxt}]`, in the `ChunkMeta`.
- When looking up the `ChunkMeta`, the index populates the meta with this field.
- The iterator over the `ChunkMeta` simply skips those time-ranges.
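The steps above can be sketched roughly as follows; `TimeRange`, `filterSamples`, and the field names are illustrative stand-ins, not the actual tsdb API:

```go
package main

import "fmt"

// TimeRange is a half-open-agnostic [mint, maxt] interval (hypothetical name).
type TimeRange struct{ Mint, Maxt int64 }

// ChunkMeta mimics tsdb's chunk metadata with the proposed extra field.
type ChunkMeta struct {
	MinTime, MaxTime int64
	DeletedRanges    []TimeRange // populated by the index on lookup
}

// deleted reports whether timestamp t falls in any deleted range.
func (cm ChunkMeta) deleted(t int64) bool {
	for _, r := range cm.DeletedRanges {
		if r.Mint <= t && t <= r.Maxt {
			return true
		}
	}
	return false
}

// filterSamples mimics an iterator that skips the deleted time-ranges.
func filterSamples(cm ChunkMeta, ts []int64) []int64 {
	var out []int64
	for _, t := range ts {
		if !cm.deleted(t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	cm := ChunkMeta{MinTime: 0, MaxTime: 100, DeletedRanges: []TimeRange{{20, 40}}}
	fmt.Println(filterSamples(cm, []int64{10, 25, 50})) // [10 50]
}
```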
One optimization could be to store the fully deleted postings and expose a `Deleted() Postings` method on `IndexReader` that the query engine can intersect with to remove the fully deleted series. But how to detect which series have been fully deleted in a persisted index is yet to be determined.
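For illustration, removing the fully deleted series amounts to a sorted-list difference; in tsdb, `Postings` is an iterator interface rather than a slice, so this is only a sketch with hypothetical names:

```go
package main

import "fmt"

// withoutDeleted drops from a sorted postings list every series ID that also
// appears in the sorted "deleted" postings list. Both inputs must be ascending.
func withoutDeleted(postings, deleted []uint64) []uint64 {
	var out []uint64
	i := 0
	for _, p := range postings {
		for i < len(deleted) && deleted[i] < p {
			i++
		}
		if i < len(deleted) && deleted[i] == p {
			continue // fully deleted series: skip
		}
		out = append(out, p)
	}
	return out
}

func main() {
	fmt.Println(withoutDeleted([]uint64{1, 3, 5, 9}, []uint64{3, 9})) // [1 5]
}
```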
I haven't looked into the compaction codebase yet, but it should be straightforward there.
Does this sound okay?
from tsdb.
So I have been thinking a little more about this. Do we want to support deletion of arbitrary time-ranges, or only deletion beyond a point in time, i.e., "delete all samples in the time-series older than `t0`"?
If it is the latter scenario, that makes this a wee bit easier to implement, but the approach would still be the one mentioned above.
from tsdb.
We were thinking of using tombstones on `headBlock`s also. But InfluxDB has a similar storage scenario, and they remove the in-memory entries directly when doing deletes.
Ref: Last paragraph under https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/DESIGN.md#data-flow
I haven't dug into the code yet, but I think they remove the data rather than the entries in the index. If we can drop chunks and data from `memSeries`, then not doing tombstones for in-memory data might be better.
But all of this needs to be benchmarked and validated.
from tsdb.
Edit: actually had this tab open for way too long and didn't see your updates from yesterday when writing this. If we can drop data immediately, sure, that's great. But likely more expensive and error-prone than just using tombstones (postings lists are not update-friendly, neither are our compressed chunks). The data will be gone once it's compacted anyway. Deletes are rare.
Let's focus on deletions for now (as would be done by the user).
I think retention policies could simply be implemented on top by doing a delete+compact cycle in the foreground. They also don't have to be strict. Just running this cycle every 1h would be fine IMO.
General (non-)requirements around deletions:
- they should be visible as soon as the deletion call returns
- expected to be very rare and in bulk (compared to append writes)
- okay to be slow
Regarding your proposal:
We generally expect deletes to happen against persisted and in-memory blocks equally. The in-memory block could be updated via the index entries you describe; the persisted blocks cannot.
This would be equivalent to a re-compaction of the index, as it's not feasible to start randomly adding and moving bytes in the existing index file.
So with this approach, a deletion request would have to synchronously write a new index file. That would be perfectly okay: deletions may be slow, and they are rare and in bulk.
If it was a bulk delete though, this would not reduce the size much because the sample data is still around. The next full compaction of the block would then care about removing the actual samples by dropping full chunks or dropping some samples from them. The latter could also care about rewriting chunks so we don't end up with two-sample chunks – that's probably just an optimisation for later though.
What this does do overall is introduce a fair bit of noise into our index format for things that should only be ephemeral until the next full compaction.
An alternative here is having tombstones separately tracked in a WAL, to which we append deletes. All tombstones are additionally in an in-memory data structure. The elements in there are then considered, as you described, when going over iterators.
At the next compaction, the tombstones are fully applied by removing the data in the newly compacted block. Doing some quick math for the in-memory size of the tombstone tracking:
1e6 series x 24 bytes (series ID, start, end) = 24 MB
That seems pretty reasonable for tracking 1 million deletes (though it would multiply by the number of affected blocks, of course). It should be stored in some form of postings list, as you described, to be considered when querying.
What I did not quite understand was your need for having the entries in the postings list AND in the ChunkMeta directly.
Some care must be taken to correctly rebuild the sorted delete-postings list if several deletes are done one after another.
Overall this would not add complexity to the core files. Memory footprint is largely negligible. It is a fair bit faster and does not require a full rewrite if we are just deleting 2KB worth of samples from a 5GB block.
We could dynamically decide whether the amount of tombstones justifies a full re-compaction or whether we just keep the few tombstones around.
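Rebuilding the sorted deleted-ranges list after successive deletes amounts to interval merging; a sketch, with `interval` and `mergeIntervals` as illustrative names:

```go
package main

import (
	"fmt"
	"sort"
)

type interval struct{ mint, maxt int64 }

// mergeIntervals rebuilds a sorted, non-overlapping deleted-ranges list from
// intervals accumulated by deletes issued one after another. Overlapping and
// adjacent intervals are coalesced.
func mergeIntervals(ivs []interval) []interval {
	if len(ivs) == 0 {
		return ivs
	}
	sort.Slice(ivs, func(i, j int) bool { return ivs[i].mint < ivs[j].mint })
	out := []interval{ivs[0]}
	for _, iv := range ivs[1:] {
		last := &out[len(out)-1]
		if iv.mint <= last.maxt+1 { // overlapping or adjacent: extend
			if iv.maxt > last.maxt {
				last.maxt = iv.maxt
			}
		} else {
			out = append(out, iv)
		}
	}
	return out
}

func main() {
	fmt.Println(mergeIntervals([]interval{{10, 20}, {15, 30}, {40, 50}}))
	// [{10 30} {40 50}]
}
```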
Generally, I think I'd lean towards making it part of the IndexReader as you said. This means persisted blocks also get an IndexWriter to perform deletions and their indexReader implementation gets additional in-memory structures.
The logging of tombstones could just be added as another entry type to the existing WAL. This means that persisted blocks also get a WAL now... things are getting more complex, but there's no real way around that :)
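Logging a tombstone as an extra WAL entry type could look like the following; the record-type value and the byte layout here are made up purely for illustration and do not match tsdb's actual WAL encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// walTombstone is a hypothetical WAL record type for tombstone entries.
const walTombstone byte = 4

// encodeTombstone serializes a tombstone record: 1 type byte followed by
// the series reference and the deleted interval, big-endian.
func encodeTombstone(ref uint64, mint, maxt int64) []byte {
	buf := make([]byte, 1+8+8+8)
	buf[0] = walTombstone
	binary.BigEndian.PutUint64(buf[1:], ref)
	binary.BigEndian.PutUint64(buf[9:], uint64(mint))
	binary.BigEndian.PutUint64(buf[17:], uint64(maxt))
	return buf
}

// decodeTombstone is the inverse of encodeTombstone.
func decodeTombstone(b []byte) (ref uint64, mint, maxt int64, err error) {
	if len(b) != 25 || b[0] != walTombstone {
		return 0, 0, 0, fmt.Errorf("not a tombstone record")
	}
	ref = binary.BigEndian.Uint64(b[1:])
	mint = int64(binary.BigEndian.Uint64(b[9:]))
	maxt = int64(binary.BigEndian.Uint64(b[17:]))
	return ref, mint, maxt, nil
}

func main() {
	rec := encodeTombstone(42, 1000, 2000)
	ref, mint, maxt, _ := decodeTombstone(rec)
	fmt.Println(ref, mint, maxt) // 42 1000 2000
}
```

On replay, records of this type would repopulate the in-memory tombstone structure alongside the normal series and sample records.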
from tsdb.
Okay, so we are going to support deleting time-ranges, and not just "delete older than `t0`".
> What I did not quite understand was your need for having the entries in the postings list AND in the ChunkMeta directly.
We won't be modifying the `ChunkMeta` directly. When looking up the `ChunkMeta`, we populate the deleted ranges, because a chunk can have partially deleted data and the iterator over it needs that information.
+1 for a separate tombstone file, it is cleaner than modifying the index. When we load the index, we load the info from this file too. Though for persisted blocks it could be just a `tombstones` file rather than entries inside a WAL, while for in-memory blocks it would just be appended to the WAL.
But that is again an implementation detail and can be decided upon later.
from tsdb.
Closed via #82
from tsdb.