If an AID contributes with NULL , do we include that A

Should a NULL AID value contribute to the seed? <p dir=

Do we include NULL contributors in AID seed? about pg_diffix HOT 12 CLOSED

diffix commented on July 28, 2024

Do we include NULL contributors in AID seed?

from pg_diffix.

Comments (12)

cristianberneanu commented on July 28, 2024

After thinking about it a bit more, I think they should be included in the seed (but they should be ignored during aggregation).

NULL values were ignored during aggregation in the previous system, are ignored in the reference implementation and are also ignored in standard SQL aggregators.

But the seed is a property of the bucket, so any encountered AIDs have to contribute to it. If we find a way to compute the seed while we digest data, then we can drop NULL handling from the aggregators, simplifying stuff a bit more.

from pg_diffix.

sebastian commented on July 28, 2024

On the assumption that it's the value contributed by the AID that's null, and not the AID itself, then I think it should be:

In count(*) then yes. The null value is counted and hence the AID should also be used in the seed material
For aggregates where null-values are ignored the AID does not contribute to the seed for such a value (such as count(price) for a price of null)

from pg_diffix.

sebastian commented on July 28, 2024

But the seed is a property of the bucket, so any encountered AIDs have to contribute to it.

Why should any encountered AID contribute to the seed, if that AID doesn't otherwise contribute to the bucket?
I think exactly the "The seed is a property of the bucket" statement rings true, but I would read that as an argument for excluding the AID from the seed in the case where it didn't contribute to the aggregate.

from pg_diffix.

edongashi commented on July 28, 2024

Why should any encountered AID contribute to the seed, if that AID doesn't otherwise contribute to the bucket?

diffix_lcf(aid) has no knowledge of what's happening in aggregates.

Hmm, let's consider this scenario:

There are 100 rows of shape (aid, col) in a bucket where AIDs are unique in interval [1...100].
Let's suppose 99 rows have col set to NULL and we want to calculate count(col).
Should we suppress this bucket? diffix_lcf(aid) says no because it has no idea that col is NULL...

from pg_diffix.

cristianberneanu commented on July 28, 2024

As much as I would like to use this opportunity to get rid of some code, I still think it should affect the seed.
Even if it has no effect on the aggregate, it still helps the bucket pass LCF.

from pg_diffix.

sebastian commented on July 28, 2024

There are 100 rows of shape (aid, col) in a bucket where AIDs are unique in interval [1...100].
Let's suppose 99 rows have col set to NULL and we want to calculate count(col).
Should we suppress this bucket? diffix_lcf(aid) says no because it has no idea that col is NULL...

It passes the low count filter (and all AIDs contribute to the seed for the low count filter), but we don't produce an aggregate, because we have insufficient data for the aggregate.
We can still output some value such as null, <insufficient data for an aggregate count> (assuming the query was SELECT col, count(...)).

from pg_diffix.

cristianberneanu commented on July 28, 2024

The main question is do we need to have the same seed for all aggregators (including LCF) or not.
Previously, the seed was computed separately and per bucket. We don't have this option now, so maybe we don't have to keep the same design, unless it causes a vulnerability.

from pg_diffix.

yoid2000 commented on July 28, 2024

There are a number of different things being discussed here, so I'm a bit confused. The set of questions seem to be:

Should a NULL AID value contribute to the seed?
Should a NULL AID value be counted as a distinct user when counting the number of distinct users?
Do we need to have the same seed for all aggregators (including LCF) or not?

Regarding 1, what would cause an AID to be NULL?

Regarding 2, can we avoid this question by always knowing what the actual AID is?

Regarding 3, this question doesn't arise for Publish AFAIK, and I think it is premature to ask it for the other variants.

In fact, none of this really matters for Publish...

from pg_diffix.

edongashi commented on July 28, 2024

Should a NULL AID value contribute to the seed?

The question is not for NULL AID but for NULL contribution coming from a (non-null) AID.

In the example below, which AIDs will be used for the AID noise seed for count(col): 1,2,4 or 1,2,3,4?

aid	col
1	'a'
2	'b'
3	NULL
4	'c'

from pg_diffix.

yoid2000 commented on July 28, 2024

Ah, I thought you meant the AID itself was NULL, not the aggregate.

My intuition is that the AIDs with NULL contribution to the aggregator should be included in the seed as well as the LCF computation.

And I can't think of an attack that would exploit this.

And I presume it is simpler to just include the AID in all cases (no special cases to deal with NULL).

So let's go with including all AIDs regardless of contribution to the aggregator.

from pg_diffix.

cristianberneanu commented on July 28, 2024

And I presume it is simpler to just include the AID in all cases (no special cases to deal with NULL).

Actually, it is easier to exclude NULL contributions (simpler to ignore stuff sooner, rather than later).
But code is already written to include them in the extension, so maybe simpler to leave it so? Then we need to update the reference implementation to do the same.

from pg_diffix.

cristianberneanu commented on July 28, 2024

Closing this as NULL contributions are already included in the seed.

from pg_diffix.

Do we include NULL contributors in AID seed? about pg_diffix HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent