iabtechlab / fideslang

Open-source description language for privacy to declare data types and data behaviors in your tech stack in order to simplify data privacy globally. Supports GDPR, CCPA, LGPD and ISO 19944.

Home Page: https://ethyca.github.io/fideslang

License: Creative Commons Attribution 4.0 International

Python 57.53% HTML 14.71% JavaScript 22.29% CSS 4.09% Dockerfile 0.54% Makefile 0.84%

privacy-tools dataprivacy taxonomy open-source privacy-engineering

fideslang's Issues

Move account to a subcategory of user, or remove it as a category

Is your feature request related to a specific problem?

Describe the solution you'd like

Although users have generally not complained about this, I think most don’t realize that the account category exists and why it should be used. The intent of account is to handle the common case where data used to manage a user’s account (e.g. an email address used to sign-in) may need to be processed differently than data used for non-account purposes (e.g. an email address used to contact them for support).

However, this distinction seems to not be intuitive to early users, and in fact the lack of questions about it seems to imply that it’s just not noticed! Therefore, it might be more intuitive and useful if we had two sibling subcategories, e.g. ...account.email and ...contact.email for this purpose.

Alternatively, it may be easier to remove it entirely. The distinction between account and contact, for example, might be better modeled by applying two different data_use annotations (e.g. provide vs. improve).

Now, we have data categories like:

user.provided.identifiable.contact.email
user.provided.identifiable.contact.phone_number
account.contact.email
account.contact.phone_number

After, the account category would be moved to a subcategory of user:

user.provided.identifiable.contact.email
user.provided.identifiable.contact.phone_number
user.provided.identifiable.account.email
user.provided.identifiable.account.phone_number

Alternatively, the account category could be deleted entirely and migrated to use user.provided.identifiable.contact equivalents instead.
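The first option (moving account under user) amounts to a prefix rewrite on existing keys. A minimal sketch, assuming every legacy `account.contact.*` key maps to `user.provided.identifiable.account.*` as in the lists above; `migrate_account_key` is a hypothetical helper, not part of fideslang:

```python
# Prefixes taken from the before/after lists in this issue.
OLD_PREFIX = "account.contact."
NEW_PREFIX = "user.provided.identifiable.account."


def migrate_account_key(fides_key: str) -> str:
    """Rewrite a legacy account key; leave every other key untouched."""
    if fides_key.startswith(OLD_PREFIX):
        return NEW_PREFIX + fides_key[len(OLD_PREFIX):]
    return fides_key


legacy_keys = [
    "user.provided.identifiable.contact.email",
    "account.contact.email",
    "account.contact.phone_number",
]
migrated_keys = [migrate_account_key(key) for key in legacy_keys]
```

Keys that are already under user pass through unchanged, so the rewrite can be applied to a whole dataset without filtering first.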

Data Owner Custom Field

As a data protection officer, I would like to be able to see the data owner. On the mouse over event, include the person or position information. The field must be of type string allowing the insertion of certain names such as:

John Ave (just name)
Chief Finance Officer (just role)
John Ave <[email protected]> (name + email)
Chief Finance Officer <[email protected]> (role + email)
Data Categories, Data Uses, Data Subjects, Data Qualifiers must contain owners. All levels and categories must have owners.

The filling of the field is optional and when it is empty, it should not appear.
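The four accepted shapes (name or role, each with an optional address in angle brackets) could be parsed roughly as follows. This is only a sketch of the requested string format: `Owner` and `parse_owner` are hypothetical names, the address in the usage line is a placeholder, and no validation of the email itself is attempted.

```python
import re
from typing import NamedTuple, Optional


class Owner(NamedTuple):
    label: str             # a person's name or a role
    email: Optional[str]   # None when no address was given


# "<label>" optionally followed by "<email>" in angle brackets.
_OWNER_RE = re.compile(r"^\s*(?P<label>[^<>]+?)\s*(?:<(?P<email>[^<>\s]+)>)?\s*$")


def parse_owner(value: Optional[str]) -> Optional[Owner]:
    """Return None for an empty field so callers can hide it."""
    if value is None or not value.strip():
        return None
    match = _OWNER_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized owner format: {value!r}")
    return Owner(match.group("label"), match.group("email"))


role_only = parse_owner("Chief Finance Officer")
name_and_email = parse_owner("John Ave <john@example.com>")
```

Returning `None` for an empty value mirrors the requirement that an unfilled owner should not appear at all.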

As Is:

To Be:

Because this functionality is optional, backwards compatibility will be possible. I would like to start the conversation about technical debt and how this functionality will be implemented.

See ya

On the policy page the matches title has a typo in the  

Docs Update Description

matches on the policy page has a typo, causing it to be displayed as matches enum &nbspr

Research: Describe data movement

Ethyca internal requirements doc: https://ethyca.atlassian.net/wiki/spaces/PM/pages/2351661057/Data+Map+UI+-+Data+Source+and+Lineage

Extend policy model to include third country transfers

Is your feature request related to a specific problem?

An accompanying ticket to ethyca/fides#654, we are looking to extend fideslang to include the third_country_transfers attribute.

Describe the solution you'd like

TBD, need to determine how we want this to work in ethyca/fides#654, and will add additional details here

Generate fancy markdown docs directly from the source-code files

Is your feature request related to a specific problem?

We're adding as much documentation as we possibly can to the pydantic models themselves for fideslang, and it's only going to get harder and more cumbersome to manage updating two separate places whenever we make a fideslang model change.

Describe the solution you'd like

I'd like each of the data types to have a nice markdown table generated purely from the pydantic sauce and included as part of our docs
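A rough sketch of the idea, using stdlib dataclasses as a stand-in for the pydantic models so the example has no dependencies. `ExampleResource` and `markdown_table` are hypothetical; a real implementation would read the same names, types, and descriptions from the fideslang models instead.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExampleResource:
    """Stand-in model; descriptions live next to the fields they document."""

    fides_key: str = field(metadata={"description": "Unique identifier for the resource."})
    name: str = field(default="", metadata={"description": "Human-readable name."})


def markdown_table(model: type) -> str:
    """Build a markdown table with one row per declared field."""
    rows = ["| Name | Type | Description |", "| --- | --- | --- |"]
    for f in fields(model):
        type_name = getattr(f.type, "__name__", str(f.type))
        description = f.metadata.get("description", "")
        rows.append(f"| {f.name} | {type_name} | {description} |")
    return "\n".join(rows)


table = markdown_table(ExampleResource)
```

Because the table is derived from the model definition, a field change only has to be made in one place.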

Describe alternatives you've considered, if any

Using some kind of automated docs builder, it may exist

Add a CLI flag/config value that includes dbt models as dataset files in a Taxonomy

Is your feature request related to a specific problem?

Users should be able to build a taxonomy that includes dbt models as datasets

Describe the solution you'd like

There should be a flag/config value that allows the user to specify that they would like to load datasets from dbt model files and include them as part of the taxonomy.

Describe alternatives you've considered, if any

The alternative would be to generate static fides dataset files from the dbt files, but this seems redundant and would leave the potential for the files to get out of sync. If we always parse from the source dbt files, we can never get out of sync.

Additional context

This step needs to happen at Taxonomy parse time and will probably exist as part of fideslang.

Human-readable aliasing for fides models

Is your feature request related to a specific problem?

Currently in fidesctl, the human-readable column names for a data map are defined in a constant. This seems to be misaligned with the rest of our codebase which relies heavily on pydantic models to define and maintain structure and attribution (i.e. descriptions) of data across the fides ecosystem.

Describe the solution you'd like

Add human-readable values, utilizing the alias attribute, to be used by other fides products.

Describe alternatives you've considered, if any

The use case that inspires this issue built a constant (dict) with Pandas DataFrame columns mapped to a human-readable equivalent, which is currently passed as the first set of records to a data map item. See fidesctl#779

Additional context

This could yet be challenging due to some required separation when using a DataFrame to join resources to construct data map items.

Creating at least one privacy declaration without a name causes validation errors

Bug Description

Creating at least one privacy declaration without a name causes validation errors

Steps to Reproduce

In a python repl...

>>> from fideslang import System, PrivacyDeclaration
>>> p = PrivacyDeclaration(data_categories=[], data_use="test", data_subjects=[])
>>> p2 = PrivacyDeclaration(data_categories=[], data_use="test", data_subjects=[])
>>> s = System(fides_key="s", system_type="s", privacy_declarations=[p, p2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for System
privacy_declarations
  '<' not supported between instances of 'NoneType' and 'NoneType' (type=type_error)

Doing just one privacy declaration works okay, perhaps because this sort doesn't need to compare

Expected behavior

Be able to make multiple privacy declarations without names without problems

Additional context

There are a number of ways we could fix this, considering name is being deprecated:

Remove the sort
Sort on data_use instead
Default name to ""

I'm not sure what the best way is, open to suggestions!
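The failure and two of the options above can be reproduced without fideslang. This sketch assumes the validator sorts declarations by name, which is what the error message suggests; `Declaration` is a hypothetical stand-in with only the fields needed for sorting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Declaration:
    """Stand-in for a privacy declaration; name is optional."""

    data_use: str
    name: Optional[str] = None


declarations = [Declaration(data_use="test"), Declaration(data_use="test")]

# With two unnamed declarations the sort has to compare None with None,
# which raises TypeError. A single declaration needs no comparison.
try:
    sorted(declarations, key=lambda d: d.name)
    sort_failed = False
except TypeError:
    sort_failed = True

# Option: sort on data_use, which is always a string.
by_data_use = sorted(declarations, key=lambda d: d.data_use)

# Option: treat a missing name as "" when sorting.
by_name = sorted(declarations, key=lambda d: d.name or "")
```

Either key keeps the ordering stable for unnamed declarations, so the choice mostly depends on whether name is kept around long enough to matter.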

Update fideslang documentation

Docs Update Description

Now that fideslang lives here full-time, the docs probably need to be revisited and updated to match the true state of fideslang.

Additional context

There might be some useful docs still in fidesctl for fideslang, but they might be gone by the time this ticket gets worked on, so git history will come in handy.

These changes were also recently made to Fideslang and published as version 1.1.0 - https://ethyca.atlassian.net/wiki/spaces/EN/pages/2345599010/Fideslang+1.1.0+Planning

The ticket for that work is here: https://github.com/ethyca/fideslang/issues/58

Remove old `data_files/` directory

Bug Description

The history of the data_files/ directory suggests that it is quite outdated and could lead to confusion. These should likely be removed or at least re-generated if still required

Steps to Reproduce

n/a

Expected behavior

fideslang to be imported or viewed from the documentation website

Screenshots

n/a

Environment

Version: 1.1.0
OS: Darwin
Python Version: 3.9.12
Docker Version: 20.10.16

Additional context

Found as part of #62

Complete Support for Controller/Processor Identification - Controllorship #22

Discussed in #22

Extending taxonomy to adequately describe controller/processor of data types for a given system's behaviors relative to data processing.

Two prior approaches have been investigated:

Adhering to constructs of ISO 19944
Using system I/O context to derive data flow and stewardship

Requires additional evaluation after next major release on 11/3/2021

Originally posted by JasonMWhite, October 26, 2021:
A common issue that we face is the need to know which data we hold as a controller vs as a processor. Depending on the role in which we are acting, we can have different requirements for retention and permitted uses.

This can be more than a simple controller/processor split. There are also situations where we could be a co-controller, or there could be several independent controllers.

Have you thought of how you might extend the data subject taxonomy to include a controllership dimension?

Add a new config section for dbt

Is your feature request related to a specific problem?

We want to enable dbt model parsing, and having a config section where values could be configured would make this much easier.

Describe the solution you'd like

A new section, similar to the existing cli, api, etc. that allows us to configure values for dbt

Additional context

The initial look should be something like:

[dbt]
models_dir = "models/"

Should `system_dependencies` allow self referencing?

Is your feature request related to a specific problem?

Systems can send data to themselves, so is it useful to be able to list themselves as system_dependencies? Right now it is not possible. system_dependencies is deprecated, but egress and ingress are replacing it and do allow self referencing. Making this issue as a place to discuss the pros/cons

Additional context

Rename provide.system to provide.service in data_uses

Is your feature request related to a specific problem?

This is a simple one, but I’ve found that every time we explain the provide data use, we invariably want to say something like “…the provide use is the most common reason to collect user data, and it’s because you need their data to provide your product or service.” On top of this, we use the noun system in many places in fideslang (including as a top-level category in data_category) where it has a different meaning, so I feel it'd be more intuitive to not use the "system" terminology in the data_use taxonomy at all.

Describe the solution you'd like

Now, we have data uses like:

provide.system
provide.system.operations.support

After, we’d just rename these:

provide.service
provide.service.operations.support
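For existing resources, the rename is a prefix substitution that has to respect the dotted hierarchy. A minimal sketch; `RENAMES` and `rename_data_use` are hypothetical names, not an existing fideslang utility:

```python
# Old prefix -> new prefix, as proposed in this issue.
RENAMES = {"provide.system": "provide.service"}


def rename_data_use(key: str) -> str:
    """Rename a data use and all of its descendants."""
    for old, new in RENAMES.items():
        # Match the key itself or a child of it, never a partial segment.
        if key == old or key.startswith(old + "."):
            return new + key[len(old):]
    return key


renamed = [
    rename_data_use(key)
    for key in ["provide.system", "provide.system.operations.support"]
]
```

Checking for `old + "."` rather than a bare prefix avoids touching unrelated keys that merely start with the same characters.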

Set pydantic `orm_mode` config to `True` for systems and privacy declarations

Is your feature request related to a specific problem?

As part of ethyca/fides#3036, we'll be making a proper db table to store the privacy_declarations of Systems. Since we've already got pydantic models for PrivacyDeclarations, updating the System, PrivacyDeclaration and DataUse pydantic models (defined here in fideslang) to have orm_mode set to True would minimize the code changes needed in fides to implement this update. Instead, we can rely mainly on pydantic's ORM model mapping to treat the new PrivacyDeclaration DB (ORM) records similarly to how they were handled before, as JSON blobs validated against the pydantic models.

Describe the solution you'd like

Add a

class Config:
    orm_mode = True

to the System, PrivacyDeclaration and DataUse pydantic models defined in fideslang.

- The reason we'll be updating the `DataUse` model is because we'll add a 'hard' FK reference from `PrivacyDeclaration` to `DataUse` records, rather than relying on a 'soft' string-matched reference that existed previously

Describe alternatives you've considered, if any

Instead of making this update, we could update more of our code in fides to translate DB/ORM instances of PrivacyDeclarations into JSON blobs as they'd been previously, so that they can be validated properly by the pydantic model. IMO, going that route would introduce more customized logic that would be fragile to future changes.

The proposed updates here would be a more "out of the box" solution that should be more maintainable and less error prone. Probably significantly less code too!

Another alternative: could/should we create a base pydantic class for our fideslang pydantic models that defaults to orm_mode = True? We may want to update more of our pydantic models in a similar fashion moving forward, so this could save us some updates. I'd need to do a bit more investigation to look at potential downsides/regressions from this broader change...

Additional context

We may look to make similar changes moving forward, since we may want to store more of our fideslang pydantic models in their own DB tables. How can we make this as future proof as possible while also not blowing things up? See base class suggestion above.

Consider updating type annotation of `meta` attribute to be more permissive

Is your feature request related to a specific problem?

For ethyca/fides#3315, we've discussed leveraging the meta attribute on the System model/class to pass in some arbitrary metadata. For that use case, it could be useful to specify nested JSON objects (ideally of arbitrary depth), rather than the "flat" (i.e. depth of 1) objects/dictionaries that are required currently based on the type annotation of Optional[Dict[str, str]].

@TheAndrewJackson also noted (in ethyca/fides#3315) that this was a requirement he needed for the webscanner demo.

Describe the solution you'd like

Could we allow for nested JSON objects of arbitrary depth for the meta attributes (of System and probably of Dataset too, for consistency)?

e.g.

{
  "saas_config": {
    "type": "stripe",
    "icon": "***"
  }
}

{
  "web_scan": {
    "domains": [
      {
        "url": "***",
        "cookies": [
          "***",
          "***",
          "***"
        ]
      }
    ]
  }
}

I think this may be as simple as updating the type annotations to Optional[Dict[str, Any]]?

Describe alternatives you've considered, if any

There may be some unforeseen implications of making this change -- is it possible that any dependent code (either in our, i.e. Ethyca's, code base or anyone else who uses fideslang) is depending on the stricter typing that's in place, and if so, would this change be disruptive?

Additionally, it would be good to understand the original motivation behind the stricter typing, if that was a deliberate decision.

If we judge that updating the type annotation on the meta property to be more permissive is not a good path forward, then we could create another property that allows for nested dictionaries (of arbitrary length) for arbitrary metadata. This feels pretty messy, though - especially considering we already have a fidesctl_meta attribute (that holds more strictly-typed system metadata)...

Additional context

This is motivated by the work to more explicitly tie Systems to saas connector types. See ethyca/fides#3315 for some more detail on that use case.

cc @TheAndrewJackson @galvana

Assign dbt datasets to a system based off of a default config value

Is your feature request related to a specific problem?

Generating datasets from dbt models is great, but generally not super useful unless we are attributing them to a specific system. This would create a feature that allows us to do that

Describe the solution you'd like

We could add a config value like default_system that will automatically attribute dbt models to a certain system at runtime.

Describe alternatives you've considered, if any

Additionally, we could have a special key/value pair in the dbt models yaml parser that can assign models to specific systems on a per-model basis

[Backend] Add Optional PrivacyDeclaration.cookies field

Is your feature request related to a specific problem?

When I am creating or editing a data use on a system, I need to be able to provide one or more cookie names that correspond to that data use.

Describe the solution you'd like

Add an optional PrivacyDeclaration.cookies field which should be an optional list of strings.

Assign dbt datasets to a system based off of a metadata value

Is your feature request related to a specific problem?

There might be times when we want to assign different dbt models to different systems. We should allow the user to do this via model-level metadata

Describe the solution you'd like

The dbt model parser would look for a special metadata field at the model level, called something like system, that would tell fidesctl which system to assign the dataset to
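A sketch of that lookup on an already-parsed dbt model entry. It assumes the field lives under the model's `meta` mapping and is literally named `system`, and that a `default_system` config value (as suggested in the related issue) acts as the fallback; `resolve_system` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def resolve_system(
    model: Dict[str, Any], default_system: Optional[str] = None
) -> Optional[str]:
    """Prefer the model-level metadata value, then the configured default."""
    meta = model.get("meta") or {}
    return meta.get("system", default_system)


billing_model = {"name": "orders", "meta": {"system": "billing"}}
plain_model = {"name": "events"}

assigned = resolve_system(billing_model, default_system="warehouse")
fallback = resolve_system(plain_model, default_system="warehouse")
```

Model-level metadata wins over the default, so a catch-all system can coexist with per-model overrides.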

Describe alternatives you've considered, if any

Using a generic, catch-all default system for all datasets

Broken category links

Docs Update Description

When I click on the links on both the README and policy pages, I get 404s. For example, the README link "Learn more about [Data Categories in the taxonomy reference now](https://ethyca.github.io/fideslang/data_categories/)." goes to a 404.

Add fides_meta and meta attributes to base models

Is your feature request related to a specific problem?

Reticketed from https://github.com/ethyca/fideslang/issues/94, which handled this for Datasets.

We want to more broadly unify some of the -ctl and -ops fields. Ops wasn't really concerned with updating Systems or Organizations in the past, but as we move to share models in between these products, it would be good to share the underlying attributes.

Update Organization to use fides_meta instead of fidesctl_meta
Update System to use fides_meta instead of fidesctl_meta
Add meta fields to Organization and System.

Are there other models we need to consider?

Note:
Downstream fides writes system information to fidesctl_meta. This will need to switch to fides_meta. Likewise, downstream database tables will need to rename the fidesctl_meta attribute.

Release fideslang 1.0.0

No py.typed file present resulting in no type hints error

Bug Description

This library contains types, but when using it through pip install fideslang the installing library doesn't know about the types resulting in a mypy no type hints or library stubs error. This can be fixed by adding a py.typed file. After that installing libraries will know types are present and remove this error.

Steps to Reproduce

Create a project and install the fideslang package into a virtual environment with pip install fideslang
Import something from fideslang into the project, for example from fideslang.models import FidesKey
Run mypy and see the error mypy: Skipping analyzing "fideslang.models": found module but no type hints or library stubs

Expected behavior

The package should alert the installing package to the presence of types.

Screenshots

Environment

Version: 0.9.0
OS: MacOS
Python Version: 3.10
Docker Version: N/A

Additional context

N/A

Docs for system `joint_controller` and `data_protection_impact_assessment` are inaccurate

Bug Description

joint_controller and data_protection_impact_assessment are listed in the docs as array fields, but they are just object fields.

https://ethyca.github.io/fideslang/resources/system/

For example, data_protection_impact_assessment is not an array, but rather this: https://github.com/ethyca/fideslang/blob/main/src/fideslang/models.py/#L597-L618

The documentation should be updated to not describe these fields as arrays.

Upstream DSR and connector related metadata from Fides

The fides project has extended the fideslang models to include extra metadata for processing DSRs, most notably the FidesopsMeta model. All of the changes made to fideslang models in dataset.py should be upstreamed into the fideslang project. This is to standardize what DSR metadata should look like.

Acceptance Criteria

Upstreamed all of the DSR fields from fides into fideslang
Update the Taxonomy model to not strip out extra fields that are included in JSON payloads
Update fidesops_meta to fides_meta

Add a dbt model files parser to fideslang

Is your feature request related to a specific problem?

We want to include dbt models as viable sources of dataset information.

Describe the solution you'd like

We need to write a parser for fideslang that allows us to create fideslang Dataset objects from standard dbt model yml files.

Additional context

For this first iteration, we won't be implementing this into any existing commands, but we should have tests that prove the parser works.
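A minimal sketch of what such a parser might do once the yml file has been loaded into a dictionary. The mapping is an assumption rather than a settled design: one dataset per schema file, one collection per dbt model, one field per column. `dbt_schema_to_dataset` is a hypothetical name, and plain dicts stand in for the fideslang Dataset objects.

```python
from typing import Any, Dict


def dbt_schema_to_dataset(schema: Dict[str, Any], fides_key: str) -> Dict[str, Any]:
    """Map the models/columns of a parsed dbt schema onto a dataset-shaped dict."""
    collections = []
    for model in schema.get("models", []):
        fields = [
            {"name": column["name"], "description": column.get("description")}
            for column in model.get("columns", [])
        ]
        collections.append(
            {
                "name": model["name"],
                "description": model.get("description"),
                "fields": fields,
            }
        )
    return {"fides_key": fides_key, "name": fides_key, "collections": collections}


parsed_schema = {
    "version": 2,
    "models": [
        {
            "name": "customers",
            "description": "One row per customer.",
            "columns": [{"name": "email", "description": "Contact address."}],
        }
    ],
}
dataset = dbt_schema_to_dataset(parsed_schema, fides_key="dbt_models")
```

Keeping the parser as a pure function over parsed input makes it easy to cover with unit tests before it is wired into any command.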

Update css to use appropriate brand colors for the docs site

Docs Update Description

The light mode in particular of the fideslang docs is using the old ethyca brand colors.

Additional context

Appropriate updates can be pulled from either fidesops or fidesctl.

Remove the deprecated `PrivacyDeclaration.dataset_references` and `System.system_dependencies` fields

The dataset_references field on PrivacyDeclaration resources and the system_dependencies field on System resources were deprecated in #85 (see ethyca@36638c3 and ethyca@6e03b27). These fields should be removed.

Duplicate records in .fides/db_dataset.yml

Bug Description

While writing tests for https://github.com/ewdurbin/dbml-to-fides, I grabbed .fides/db_dataset.yml as a fixture.

When testing some transformations, I discovered that this file contains duplicates for fidesuser, fidesuserpermissions, and client tables with conflicting information (namely data_categories).

Steps to Reproduce

$ egrep '  - name: fidesuser$|  - name: fidesuserpermissions$|  - name: client$' .fides/db_dataset.yml 
  - name: client
  - name: fidesuser
  - name: fidesuserpermissions
  - name: client
  - name: fidesuser
  - name: fidesuserpermissions

Expected behavior

I would have expected that something validates a Fides dataset to have non-conflicting ~~keys~~ entries for each member of a collection
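A check along these lines could flag the problem. This is a sketch over plain dictionaries rather than fideslang models, and `find_conflicting_collections` is a hypothetical helper; it reports names that appear more than once with differing definitions.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_conflicting_collections(collections: List[Dict[str, Any]]) -> List[str]:
    """Return names that occur more than once with non-identical entries."""
    by_name: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for collection in collections:
        by_name[collection["name"]].append(collection)
    return sorted(
        name
        for name, entries in by_name.items()
        if any(entry != entries[0] for entry in entries[1:])
    )


sample = [
    {"name": "client", "data_categories": ["system.operations"]},
    {"name": "fidesuser", "data_categories": ["user"]},
    {"name": "client", "data_categories": ["user"]},
]
conflicts = find_conflicting_collections(sample)
```

The category values in `sample` are illustrative only and are not taken from the actual file.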

Screenshots

n/a

Environment

n/a

Additional context

n/a

update taxonomy diagrams and usages

Docs Update Description

See engineering ticket: https://github.com/ethyca/fideslang/issues/58

Remove identifiable, nonidentifiable, derived, and provided subcategories
Remove the parent categories, combining the subcategories into one
Rename provide.system to provide.service
Move account to a subcategory of user (e.g. user.account.email, user.contact.email)

Additional context

Usages and diagrams both need updating, as well as the visual taxonomy.

Add a 'Business' category

Is your feature request related to a specific problem?

There isn't a clear way to label PII data that is tied to a business.

There are instances where data can be associated with a business rather than an individual. For example, a phone number could be for a business but it would get categorized as user.contact.phone_number.

Describe the solution you'd like

Add a business branch to the root node of the data categories tree.

Describe alternatives you've considered, if any

Currently, it seems like the system category is the fallback.

`name` field in DatasetCollection and DatasetField should be unique

Bug Description

Right now you are able to add two DatasetCollections of the same name to a Dataset. The name field should be enforced to be unique, since databases wouldn't allow, e.g., two schemas of the same name.

Steps to Reproduce

Add two DatasetCollections of the same name

Expected behavior

There should be a validator that prevents the second DatasetCollection from being added
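The core of such a validator is a uniqueness check over the names. A sketch independent of pydantic, with hypothetical function names; in fideslang this logic would presumably sit inside a model validator on the collections (and fields) list.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_names(names: Iterable[str]) -> List[str]:
    """Return every name that appears more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def validate_unique_names(names: Iterable[str]) -> List[str]:
    """Raise ValueError when any name is repeated; otherwise return the names."""
    names = list(names)
    duplicates = duplicate_names(names)
    if duplicates:
        raise ValueError(f"duplicate names: {', '.join(duplicates)}")
    return names


checked = validate_unique_names(["users", "orders"])
```

Raising `ValueError` matches how validation failures are usually surfaced, and listing every duplicate at once saves the user from fixing them one at a time.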

Additional context

More context for why this came up in ethyca/fides#718 (comment)

Tests break with Pydantic 1.10.0

Bug Description

With Pydantic 1.10.0, the tests are failing.

    @pytest.mark.unit
    def test_find_referenced_fides_keys_1():
        test_data_category = DataCategory(
            name="test_dc",
            fides_key="key_1.test_dc",
            description="test description",
            parent_key="key_1",
        )
        expected_referenced_key = {"key_1", "key_1.test_dc", "default_organization"}
        referenced_keys = relationships.find_referenced_fides_keys(test_data_category)
>       assert referenced_keys == set(expected_referenced_key)
E       AssertionError: assert {'default_organization', 'key_1.test_dc'} == {'key_1', 'default_organization', 'key_1.test_dc'}
E         Extra items in the right set:
E         'key_1'
E         Full diff:
E         - {'key_1', 'default_organization', 'key_1.test_dc'}
E         ?  ---------
E         + {'default_organization', 'key_1.test_dc'}

Steps to Reproduce

Install pydantic 1.10.0
Run the test suite

Expected behavior

Tests should pass

Environment

Version: main branch
OS: MacOS and CI
Python Version:
Docker Version:

Additional context

I have opened a PR to specify pydantic < 1.10.0 for now until we can figure out what the issue is. #79

Add a `tags` field to all top-level fides models

Is your feature request related to a specific problem?

We want to let users "tag" their resources to eventually make filtering and reporting easier

Describe the solution you'd like

We should add a tags field to the FidesBase model found here as type List[str]

Describe alternatives you've considered, if any

Adding this to the generic meta field instead

Remove the identifiable/nonidentifiable subcategories in data_categories

Is your feature request related to a specific problem?

Describe the solution you'd like

The identifiable / nonidentifiable categories tend to confuse users; especially engineers who are attempting to classify a given field. We received this feedback from Momentive, Twitter, Ikea, and Walmart

tech debt - consider reverting `Literal` workaround for pydantic bug

Is your feature request related to a specific problem?

https://github.com/ethyca/fideslang/pull/109 had a workaround for a pydantic bug. we should consider reverting this workaround in due course, if possible, just to clear out tech debt.

Describe the solution you'd like

will the underlying bug(s) have been addressed at a certain point so that we can revert the change?

Describe alternatives you've considered, if any

keep the workaround in place

Additional context

keeping track of tech debt 😄

DataUse & DataCategory Simplifications

The following proposed changes to fideslang follow months of review and feedback from users. Most of them address points of confusion that made users hesitate during implementation.

Remove identifiable, nonidentifiable, derived, and provided subcategories
- Remove the parent categories, combining the subcategories into one
Rename provide.system to provide.service
Move account to a subcategory of user
- e.g. user.account.email, user.contact.email
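The first change can be illustrated as dropping the four intermediate segments from existing keys. A minimal sketch, assuming those names only ever appear as intermediate levels and never as a meaningful leaf; `simplify_category` is a hypothetical helper, not a shipped migration.

```python
# Subcategories slated for removal in this proposal.
REMOVED_SEGMENTS = {"identifiable", "nonidentifiable", "derived", "provided"}


def simplify_category(key: str) -> str:
    """Drop the removed segments and keep the rest of the dotted path."""
    return ".".join(part for part in key.split(".") if part not in REMOVED_SEGMENTS)


simplified = simplify_category("user.provided.identifiable.contact.email")
```

Since the proposal rules out backwards compatibility, a rewrite like this would be a one-time step on existing resources rather than something applied at load time.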

Describe the solution you'd like

Implementing these changes will provide some simplicity when selecting a DataCategory and help reduce the barrier to entry.

Describe alternatives you've considered, if any

Maintaining the existing taxonomy as backwards compatible.

Additional context

Some decisions made around this change:

Do not implement backwards compatibility for this breaking change
Some suggested enhancements have been pushed to a later minor release of fideslang

Update visual taxonomy to include `user.biometric`

Docs Update Description

The visual taxonomy is missing the user.biometric category. Make sure these categories appropriately align with the docs.

Additional context

"In particular the docs refer to data categories like user.biometric and user.biometric_health but on the visualization there’s user.biometric_health and user.credentials.biometric_credentials.
Can we double check that both/either are correct."

Add `ingress` and `egress` fields to the `system` resource for modeling data flows

Is your feature request related to a specific problem?

We need a way to track directionality of data flowing between systems

Describe the solution you'd like

Adding an ingress and an egress field to the system resources

Describe alternatives you've considered, if any

More alternatives and discussion took place here (internal only)

Additional context

Current Definition:

system:
  - fides_key: demo_analytics_system
    name: Demo Analytics System
    description: A system used for analyzing customer behaviour.
    system_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.contact
          - user.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        dataset_references: # This goes away
          - demo_users_dataset

Proposed Definition:

system:
  - fides_key: demo_analytics_system
    name: Demo Analytics System
    description: A system used for analyzing customer behaviour.
    system_type: Service
    privacy_declarations:
      - name: Analyze customer behaviour for improvements.
        data_categories:
          - user.contact
          - user.device.cookie_id
        data_use: improve.system
        data_subjects:
          - customer
        data_qualifier: aggregated.anonymized.unlinked_pseudonymized.pseudonymized.identified
        ingress: # This is added
          - fides_key: marketing_system
            type: system # An Enum of [system|dataset]
            action: process # An as-of-yet undefined Enum
        egress: # This is added
          - fides_key: customer_dataset
            type: dataset
            action: persist

Note the main changes here:
1. The removal of the dataset_references field
2. The addition of the ingress and egress fields

Design decisions to consider:
1. Ingress and Egress are defined on a per-declaration basis
2. This design does not let users specify which taxonomy types are involved in the egress/ingress, instead inheriting those values from the privacy declaration they are attached to
3. The enum values for action will need to be defined (process? persist? Read/write?)
4. No delineation between internal/external ingress/egress (external systems should be defined the same as internal systems, so this seems like an ok omission)
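The shape of a single ingress or egress entry could be modeled as below. This is a sketch using stdlib dataclasses rather than the pydantic models fideslang uses; `FlowType` and `DataFlow` are hypothetical names, and `action` stays a plain string because its enum values are still undefined in the proposal.

```python
from dataclasses import dataclass
from enum import Enum


class FlowType(str, Enum):
    """The two resource kinds the proposal allows a flow to reference."""

    SYSTEM = "system"
    DATASET = "dataset"


@dataclass(frozen=True)
class DataFlow:
    fides_key: str
    type: FlowType
    action: str  # enum values not yet decided in the proposal

    def __post_init__(self) -> None:
        # Coerce raw strings such as "system"; unknown values raise ValueError.
        object.__setattr__(self, "type", FlowType(self.type))


ingress = [DataFlow(fides_key="marketing_system", type="system", action="process")]
egress = [DataFlow(fides_key="customer_dataset", type="dataset", action="persist")]
```

Direction is carried by which list an entry sits in, so the entry itself does not need a direction field.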

Update Data Uses (taxonomy)

Update Fideslang data uses as described here: https://ethyca.atlassian.net/wiki/spaces/PM/pages/2643132432/One+set+of+data+uses+to+rule+them+all+privacy+regulations#%5CuD83D%5CuDDD2-Proposed-Changes

Changes break down into five sections:

Top-level-category changes
Second-level-category changes
Restructuring advertising to support consent
Restructuring provide to support essential notices.
~~Adding keys and objects to model appropriate consents~~

Row 5 is deliberately removed from the scope of this work. We will continue to refine this and look at including it in a future update.

Acceptance Criteria

Update the data uses to reflect the new uses shown in the sheet: [PROPOSED] Data Uses - Fides Taxonomy V 1.4
Create a migration utility that will migrate existing uses to the new uses (especially focused on advertising uses)
Ideally, we’d also create a script that will allow a customer to preview the changes before they are made, so they can prepare internally to make any necessary updates post-deployment.
Since this is a larger change to the taxonomy, this should trigger an update to make this v1.4
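A migration utility with a preview step could look roughly like the sketch below. The mapping entry is a placeholder for illustration only; the real old-to-new keys would come from the proposal sheet, and the function names `migrate_use` and `preview` are assumptions rather than existing fideslang APIs.

```python
from typing import Dict, List, Tuple

# Placeholder mapping: NOT the real taxonomy changes.
USE_MIGRATIONS: Dict[str, str] = {
    "old_use.example": "new_use.example",
}


def migrate_use(data_use: str, mapping: Dict[str, str]) -> str:
    """Rewrite a data use key, matching the longest renamed prefix first."""
    for old in sorted(mapping, key=len, reverse=True):
        # Match the key itself or any of its dotted children.
        if data_use == old or data_use.startswith(old + "."):
            return mapping[old] + data_use[len(old):]
    return data_use  # keys without a mapping are left untouched


def preview(data_uses: List[str], mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (old, new) pairs that would change, without modifying anything."""
    changes = []
    for use in data_uses:
        migrated = migrate_use(use, mapping)
        if migrated != use:
            changes.append((use, migrated))
    return changes
```

Running `preview` against a customer's existing keys before deployment would produce the list of changes they need to prepare for.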

Release 1.2.0

Added the is_default column, ready to cut a patch release

Move the `fideslang` module from `fidesctl` into this repo

Is your feature request related to a specific problem?

The fideslang module is designed to be the foundation for all fides projects but is currently living in fidesctl. It should be put here to facilitate more intentional changes to the language itself and cleaner releases and version pinning of the language.

Describe the solution you'd like

This repo becomes the Python project for fideslang, which gets uploaded to PyPI and versioned just like everything else.

Describe alternatives you've considered, if any

Keeping this module in fidesctl

Additional context

This will simplify the fidesctl repo somewhat and give us a docs site to heavily describe the language.

Data Subjects should be hierarchical

To be consistent with the other taxonomies, Data Subjects should be hierarchical to be more descriptive in declaring the types of data subjects that a piece of software can hold.
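As a rough illustration, hierarchical data subjects could reuse the dotted-key convention of the other taxonomies. The helpers and the example key `customer.trial` below are hypothetical and not part of the current taxonomy.

```python
from typing import Optional


def parent_key(fides_key: str) -> Optional[str]:
    """Return the parent of a dotted key, or None for a top-level key."""
    head, sep, _ = fides_key.rpartition(".")
    return head if sep else None


def is_descendant(fides_key: str, ancestor: str) -> bool:
    """True when fides_key sits anywhere below ancestor in the hierarchy."""
    return fides_key.startswith(ancestor + ".")
```

With this convention, a declaration on `customer` could be treated as covering a hypothetical `customer.trial` subject as well.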

fideslang: incompatibility with python < `3.10` caused by pydantic bug

Bug Description

This is the corresponding ticket to ethyca/fides#3358, which fixes the problem in fides. We should also fix it here in fideslang so that it stays compatible with Python < 3.10 in its own right.

Steps to Reproduce

Try using fideslang in a Python environment older than 3.10.

Expected behavior

Projects and apps that run on Python < 3.10 should be able to use fideslang easily.

Environment

Version:
OS:
Python Version:
Docker Version:

Additional context

See ethyca/fides#3358 for more context

include `id` in `PrivacyDeclaration` model

Is your feature request related to a specific problem?

Now that PrivacyDeclarations have their own DB table/records, we should allow their id to be exposed on API responses so that it can be referenced by other operations (e.g. custom field references)

Describe the solution you'd like

Include an id field of type Optional[str] in the PrivacyDeclaration model.

Describe alternatives you've considered, if any

Instead of updating fideslang here, we could extend the PrivacyDeclaration model class in fides to include the additional field, which would then be used specifically in response payloads. That may be a bit more targeted, but it requires many more code changes and I think it would be hard to follow, so I prefer the option described here.

Additional context

I believe this will be needed to allow the fidesplus FE to add custom fields that reference particular PrivacyDeclarations. Without this id being returned in our API responses, there is no way for the FE to know the id of a particular PrivacyDeclaration.
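A minimal sketch of the requested change, using a plain dataclass as a stand-in for the real pydantic model. The field list is abbreviated and the id value is made up for illustration.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PrivacyDeclaration:
    name: str
    data_categories: List[str]
    data_use: str
    # Optional so that manifests written without an id still validate;
    # it would be populated from the DB record when building API responses.
    id: Optional[str] = None


from_manifest = PrivacyDeclaration(
    name="Analyze customer behaviour for improvements.",
    data_categories=["user.contact"],
    data_use="improve.system",
)
from_api = PrivacyDeclaration(
    name="Analyze customer behaviour for improvements.",
    data_categories=["user.contact"],
    data_use="improve.system",
    id="example-declaration-id",  # made-up value
)
```

Defaulting the field to `None` keeps existing manifests valid while letting API payloads carry the identifier.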

`DataFlow`s include invalid fields when exported as YAML

Bug Description

When a System resource that includes egress and/or ingress DataFlows is exported to YAML using write_manifests(), the type field renders incoherently. Ex:

type: !!python/object/apply:fideslang.models.FlowableResources
- system

Expected behavior

The type field should be written as a literal string value. Ex:

type: system

Additional context

Rather than asserting that the type parameter must be a member of the FlowableResources Enum, declare type as a literal str and add a validator that asserts its membership in the FlowableResources Enum.
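The suggested approach can be sketched as follows. This uses a dataclass with `__post_init__` in place of a pydantic validator to stay dependency-free, and the enum members shown are an assumption based on the `[system|dataset]` values discussed for flows.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class FlowableResources(str, Enum):
    # Members assumed for illustration.
    SYSTEM = "system"
    DATASET = "dataset"


@dataclass
class DataFlow:
    fides_key: str
    type: str  # stored as a plain string so serializers emit `type: system`

    def __post_init__(self) -> None:
        # Accept enum members too, but normalize them to their string value.
        if isinstance(self.type, Enum):
            self.type = self.type.value
        allowed = [member.value for member in FlowableResources]
        if self.type not in allowed:
            raise ValueError(f"'type' must be one of {allowed}, got {self.type!r}")


flow = DataFlow("marketing_system", "system")
```

Because the stored value is a built-in `str` rather than an Enum instance, a YAML dumper has no Python object to tag.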

Enforce at least one "tag" set on each system

Is your feature request related to a specific problem?

Once #43 is implemented, we would like to enforce that each system has at least one "tag"

Describe the solution you'd like

Add a pydantic validator to make sure that the "tag" isn't left empty for any systems
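A rough sketch of such a check, written with a dataclass rather than pydantic so it runs without dependencies. The field name `tags` and its list-of-strings shape are assumptions, since the actual field is defined by #43.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class System:
    fides_key: str
    tags: List[str] = field(default_factory=list)  # hypothetical field name

    def __post_init__(self) -> None:
        # Drop empty or whitespace-only entries before checking.
        cleaned = [tag.strip() for tag in self.tags if tag and tag.strip()]
        if not cleaned:
            raise ValueError(f"System '{self.fides_key}' must have at least one tag")
        self.tags = cleaned
```

In pydantic the same logic would live in a field validator on the model.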

Types are not being included with the package

Bug Description

When fideslang is installed with pip, its type information is not included in the installed package even though it is present in the source, so type checking has to be ignored for the package when using it.

Steps to Reproduce

Install fideslang
Run mypy

Expected behavior

Types should be included.
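One common cause of this symptom is a missing `py.typed` marker file, which PEP 561 requires for type checkers to trust a package's inline annotations. Whether that is the cause here is an assumption. The hypothetical helper below checks an installed package directory for the marker.

```python
from pathlib import Path


def ships_type_marker(package_dir: Path) -> bool:
    """Return True if the package directory contains a PEP 561 py.typed file."""
    return (Path(package_dir) / "py.typed").is_file()
```

If the check fails for the installed copy but passes in the source tree, the marker is probably missing from the package data listed in the build configuration.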

Environment

Version: 1.3.3
OS:
Python Version:
Docker Version:

Taxonomy visualizer image not showing

Docs Update Description

The taxonomy visualization image is not working and hasn't been for a few days. It's very helpful to see. https://ethyca.github.io/fideslang/
