
Comments (15)

NevilleS commented on May 29, 2024

I like that suggestion, to add user.account.address and user.contact.address as subcategories, so we have those for city/street/etc.

from fideslang.

brentonmallen1 commented on May 29, 2024

I forgot there was already a user.location


SteveDMurphy commented on May 29, 2024

@mfbrown @NevilleS - @brentonmallen1 suggested adding another layer for address prior to the more detailed values (i.e. street).

So instead of user.account.street we would have something like user.account.address.street

Any thoughts here? I can include these changes as well if we want to go that route 👍🏽

The only issue I can think of off the top of my head would be if there was a separate use for something like city or state that didn't align with address somehow?


brentonmallen1 commented on May 29, 2024


NevilleS commented on May 29, 2024

Let's do it. I think it's a good improvement and all those address fields (city/street/postal code/etc) were really looking for a home


cilliankieran commented on May 29, 2024

As part of a large(ish) dataset labeling exercise I've come across the same questions in the last 24 hours.

To list some observations from that dataset labeling exercise and some comments on what's said here:

  • contact today encapsulates address labels as well as email and phone_number
  • The first issue with this is that if you have a field that is an entire address i.e. street, city, state, zip, country, you can’t label it correctly - contact isn’t semantically appropriate and it also encapsulates email and phone.
  • We should introduce user.contact.address and move phone and email up to user.contact
  • We are also missing an equivalent for building, suite or apartment number. So basically some version of unit_number or unit (not very intuitive?)
  • On your point about the edge case of a city or state that is not part of an address: I would say they should be labeled user.location. However, our current use of location in this context is actually for lat/long data, so I'm not sure this is right. Perhaps it should be location with optional subcategories for location.city, location.state and location.coordinates?
  • A final question though: what's the thinking behind having both user.account.address and user.contact.address? Do we not risk complicating the understanding for manual labeling?
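To make the first three observations concrete, here is a minimal Python sketch of how dotted category keys nest. It is not fideslang code: the helper names (`is_within`, `fields_under`), the field names, and the exact leaf keys such as `user.contact.address.city` are assumptions for illustration only.

```python
from typing import Dict, List


def is_within(category: str, parent: str) -> bool:
    """Return True if `category` equals `parent` or is nested beneath it."""
    return category == parent or category.startswith(parent + ".")


# Hypothetical field-to-category labels following the proposal above.
# A field holding an entire address gets the parent key directly.
field_labels: Dict[str, str] = {
    "mailing_address": "user.contact.address",
    "city": "user.contact.address.city",
    "email": "user.contact.email",
    "phone": "user.contact.phone_number",
}


def fields_under(labels: Dict[str, str], parent: str) -> List[str]:
    """List the field names whose category falls under `parent`."""
    return sorted(name for name, cat in labels.items() if is_within(cat, parent))


print(fields_under(field_labels, "user.contact.address"))
print(fields_under(field_labels, "user.contact"))
```

With an address parent in place, a rule written against `user.contact.address` would pick up the whole-address field and its parts without also matching email and phone.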


brentonmallen1 commented on May 29, 2024

as I understand the new changes, user.account and user.contact are branches that will remain. Currently, they both have repeating categories - i.e. user.account.contact.postal_code and user.contact.postal_code - as a remnant of removing the provided/derived and identifiable/non-identifiable paths.
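A rough way to surface that kind of repetition is to group keys by their last segment. The sketch below is illustrative only: `duplicated_leaves` is a hypothetical helper, and apart from the `postal_code` pair quoted above the example keys are assumed rather than taken from the actual taxonomy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def duplicated_leaves(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group dotted keys by final segment and keep leaves that appear more than once."""
    by_leaf: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        leaf = key.rsplit(".", 1)[-1]
        by_leaf[leaf].append(key)
    return {leaf: sorted(paths) for leaf, paths in by_leaf.items() if len(paths) > 1}


# Assumed sample of category keys; only the postal_code pair is quoted in the thread.
example_keys = [
    "user.account.contact.postal_code",
    "user.contact.postal_code",
    "user.account.contact.email",
    "user.contact.email",
    "user.location",
]

for leaf, paths in duplicated_leaves(example_keys).items():
    print(leaf, "->", paths)
```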


cilliankieran commented on May 29, 2024

Yes, I think the wrinkle is probably from moving account --> user.account, as in the prior structure account represented something not owned by a user directly (pretty distinct) and I'm wondering if nesting these is now creating a doubling of these branches unnecessarily. I'm going to go back and look at the ISO spec and get back on this later today...


brentonmallen1 commented on May 29, 2024

just for some perspective on how these choices can impact things in a practical example, here's a label mapping I'm attempting to do for a separate effort.

https://docs.google.com/spreadsheets/d/1myvSpNPXT5U78B5XZb5p9mFAu1sIJqwsMs6vW36xkE8/edit#gid=0


cilliankieran commented on May 29, 2024

That's helpful @brentonmallen1. I think it's worth noting that this decision impacts both the classifier work (so how you're thinking about it here) and the manual labeling, including the cognitive load expected of any dev trying to manually annotate something.

As a reminder for all, these are the broad distinctions (based on ISO 19944) of the difference between account and user:

Account data
A class of data specific to each customer of the service that is required to sign up for, purchase or administer the service. This data includes information such as names, addresses, payment information, etc. Account data is generally under the control of the service provider.

User data
This includes content directly created by users and all data, including all text, sound, software or image files that customers provide to the service, or are provided to the cloud service on behalf of customers, through the capabilities of the service or application. This includes directly provided or derived data.

I understand why we agreed to move account under user. I just want to sanity check whether we understand the impact on the thinking of a developer using the tool: they would have to understand the distinction between a user's contact information and a user's account-related contact information.

Fwiw, to double check myself on this I've asked Damien (aka my evil twin brother, Chief Privacy Officer at Twitter and advisor to Ethyca) to play out a thought experiment and answer for us whether the account-related distinction for any single element of personal data would matter in policy enforcement, risk evaluation or mapping. He said he'd get back to me on this overnight...


NevilleS commented on May 29, 2024

Hey all - I followed up with @cilliankieran separately as well and we agree with the general recommendation here: let's remove account entirely. This removes the duplication of ...email as a subcategory and will force us to declare "Account" data with a separate dimension like data_use, or a different kind of grammar like an "attribute", etc.

So to be laser-clear, I think the list of changes here would be:

  • Remove identifiable, nonidentifiable, derived, and provided subcategories
    • Remove the parent categories, combining the subcategories into one
  • Rename provide.system to provide.service
  • Remove account entirely, and all subcategories (e.g. account.contact)
  • Add user.contact.address as a parent category for the address-related categories like ...city, ...street, etc.

Do those four points sound right @SteveDMurphy ?
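As a rough illustration of what those changes could mean for existing labels, here is a minimal Python sketch. It is not the actual fideslang migration: the function names, the set of address leaves, and the choice to return None for any key that still mentions account (so it gets relabeled by hand) are all assumptions.

```python
from typing import Optional

# Segments removed from category keys, per the list above.
REMOVED_SEGMENTS = {"identifiable", "nonidentifiable", "derived", "provided"}
# Assumed set of address-related leaves; the thread only names city/street/etc.
ADDRESS_LEAVES = {"street", "city", "state", "postal_code", "country"}


def migrate_category(key: str) -> Optional[str]:
    """Map an old data category key to the proposed structure.

    Returns None when the key involves `account`, since that branch is
    removed and the field needs a manual relabel.
    """
    parts = key.split(".")
    if "account" in parts:
        return None
    parts = [p for p in parts if p not in REMOVED_SEGMENTS]
    # Nest address-related leaves under the new user.contact.address parent.
    if len(parts) == 3 and parts[:2] == ["user", "contact"] and parts[2] in ADDRESS_LEAVES:
        parts.insert(2, "address")
    return ".".join(parts)


def migrate_data_use(key: str) -> str:
    """Rename provide.system (and anything nested under it) to provide.service."""
    old, new = "provide.system", "provide.service"
    if key == old or key.startswith(old + "."):
        return new + key[len(old):]
    return key


# Hypothetical old-style keys, for illustration only.
print(migrate_category("user.provided.identifiable.contact.city"))
print(migrate_category("account.contact.email"))
print(migrate_data_use("provide.system"))
```

Returning None instead of guessing a replacement keeps the "Account" question open for whatever separate dimension ends up expressing it.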


SteveDMurphy commented on May 29, 2024

All good to me @NevilleS, it certainly feels like this better achieves the goal of reducing hesitancy when annotating a dataset while providing more value as described by @brentonmallen1 - thanks for driving this to a conclusion and for the thoughts and feedback from everyone!


NevilleS commented on May 29, 2024

ship it!


brentonmallen1 commented on May 29, 2024

would location be a more generic parent than address?


SteveDMurphy commented on May 29, 2024

> would location be a more generic parent than address?

I feel like your original suggestion of address is pretty comprehensive for the purposes here (street, city, etc.). I think I remember someone mentioning in a live meeting that location was generally more in line with point or coordinate data as well.

