
diffsync's Introduction

DiffSync

DiffSync is a utility library that can be used to compare and synchronize different datasets.

For example, it can be used to compare a list of devices from 2 inventory systems and, if required, synchronize them in either direction.

Primary Use Cases

DiffSync is at its most useful when you have multiple sources or sets of data to compare and/or synchronize, and especially if any of the following are true:

  • If you need to repeatedly compare or synchronize the data sets as one or both change over time.
  • If you need to account not only for the creation of new records, but also for changes to and deletion of existing records.
  • If various types of data in your data set naturally form a tree-like or parent-child relationship with other data.
  • If the different data sets have some attributes in common and other attributes that are exclusive to one or the other.

Overview of DiffSync

DiffSync acts as an intermediate translation layer between all of the data sets you are diffing and/or syncing. In practical terms, this means that to use DiffSync, you will define a set of data models as well as the “adapters” needed to translate between each base data source and the data model. In Python terms, the adapters will be subclasses of the Adapter class, and each data model class will be a subclass of the DiffSyncModel class.

[Figure: DiffSync components]

Once you have used each adapter to load each data source into a collection of data model records, you can then ask DiffSync to “diff” the two data sets, and it will produce a structured representation of the difference between them. In Python, this is accomplished by calling the diff_to() or diff_from() method on one adapter and passing the other adapter as a parameter.

[Figure: DiffSync diff creation]

You can also ask DiffSync to “sync” one data set onto the other, and it will instruct your adapter as to the steps it needs to take to make sure that its data set accurately reflects the other. In Python, this is accomplished by calling the sync_to() or sync_from() method on one adapter and passing the other adapter as a parameter.

[Figure: DiffSync sync]
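To make the diff/sync idea concrete, here is a toy sketch in plain Python. It is not DiffSync's implementation: it assumes each data set is a flat dict mapping a unique identifier to a dict of attributes, and the diff() and sync() helpers are hypothetical. It only shows the three kinds of change (create, update, delete) that a diff reports and a sync applies.

```python
# Toy illustration of diff/sync; not DiffSync's actual implementation.
# Each dataset maps a unique identifier to a dict of attributes.
from typing import Dict, List

Dataset = Dict[str, Dict[str, str]]


def diff(src: Dataset, dst: Dataset) -> Dict[str, List[str]]:
    """Return the actions needed to make dst match src."""
    return {
        "create": sorted(k for k in src if k not in dst),
        "update": sorted(k for k in src if k in dst and src[k] != dst[k]),
        "delete": sorted(k for k in dst if k not in src),
    }


def sync(src: Dataset, dst: Dataset) -> None:
    """Apply the diff in place so that dst mirrors src."""
    actions = diff(src, dst)
    for key in actions["create"] + actions["update"]:
        dst[key] = dict(src[key])
    for key in actions["delete"]:
        del dst[key]


system_a = {"rtr1": {"site": "nyc"}, "rtr2": {"site": "sfo"}}
system_b = {"rtr1": {"site": "nyc"}, "rtr2": {"site": "lax"}, "rtr3": {"site": "chi"}}

print(diff(system_b, system_a))  # what would change in A if it were synced from B
sync(system_b, system_a)
print(system_a == system_b)      # True
```

Here diff(system_b, system_a) plays roughly the role of A.diff_from(B) in the example below, and sync(system_b, system_a) that of A.sync_from(B).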

Simple Example

# DiffSyncSystemA and DiffSyncSystemB are your own Adapter subclasses; each one's load() populates it from its data source
A = DiffSyncSystemA()
B = DiffSyncSystemB()

A.load()
B.load()

# Show the difference between both systems, that is, what would change in System A if we applied the changes from System B
diff_a_b = A.diff_from(B)
print(diff_a_b.str())

# Update System A to align with the current status of System B
A.sync_from(B)

# Update System B to align with the current status of System A
A.sync_to(B)

You may wish to peruse the diffsync GitHub topic for examples of projects using this library.

Documentation

The documentation is available on Read The Docs.

Installation

Option 1: Install from PyPI.

$ pip install diffsync

Option 2: Install from a GitHub branch, such as main as shown below.

$ pip install git+https://github.com/networktocode/diffsync.git@main

Contributing

Pull requests are welcome and are automatically built and tested against multiple versions of Python through GitHub Actions.

The project follows Network to Code software development guidelines and leverages the following:

  • Black, Pylint, Bandit, flake8, pydocstyle, and mypy for Python linting, formatting, and type-hint checking.
  • pytest, coverage, and unittest for unit tests.

You can ensure your contribution adheres to these checks by running invoke tests from the CLI. The command invoke build builds a Docker container locally with all the necessary dependencies (including the Redis backend) to facilitate running these tests.

Questions

Please see the documentation for detailed instructions on how to use diffsync. For any additional questions or comments, feel free to swing by the Network to Code Slack (channel #networktocode).

diffsync's People

Contributors

chadell, dakonr, dependabot[bot], dgarros, fragmentedpacket, glennmatthews, grelleum, itdependsnetworks, jamesharr, josh-silvas, jvanderaa, kircheneer, lonestar-swish, ms-gh-admin, renovate[bot], ubaumann


diffsync's Issues

Option to refresh/verify data model after create/update/delete

Environment

  • DiffSync version: 1.0.0

Proposed Functionality

We could have an option for DiffSync to refresh data model contents from the underlying backend system or dataset after doing a create, update, or delete operation (e.g. in _sync_from_diff_element()), so as to verify that the operation was actually reflected in the backend.

With this option enabled, DiffSync could report a create() as failed if no underlying record was actually created, a create() or update() as incomplete if some attributes were not set correctly, a delete() as failed if the underlying record still exists, etc.

This option should probably be off by default for performance reasons; it would also mostly be used as a debugging tool during development.

Use Case

When developing a new adapter and associated DiffSyncModel classes, the default model implementations of create, update, delete report success and update their local status without actually interacting with the backend in any way. If these methods are left unimplemented, or only partially implemented (creating/updating the uid keys of a model without setting its optional attributes for example) then this can give a false impression of success/completeness that will only be corrected by inspecting the backend and/or running another sync attempt. Being able to automatically identify and flag incomplete synchronization actions would make gap analysis during development much easier.
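To make the proposal concrete, here is a minimal sketch in plain Python of the kind of check such an option might run after each write. verify_write() and verify_delete() are hypothetical helpers, not DiffSync APIs, and fetch stands in for whatever call re-reads a single record from the backend (returning None if it does not exist).

```python
# Hypothetical post-write verification; not part of the DiffSync API.
from typing import Callable, Dict, Optional

Record = Dict[str, object]


def verify_write(expected: Record, fetch: Callable[[], Optional[Record]]) -> str:
    """Re-read the backend after create()/update() and classify the outcome."""
    actual = fetch()
    if actual is None:
        return "failed"  # no underlying record was created
    missing = {k: v for k, v in expected.items() if actual.get(k) != v}
    return "incomplete" if missing else "ok"  # some attributes were not set


def verify_delete(fetch: Callable[[], Optional[Record]]) -> str:
    """Re-read the backend after delete(); the record should be gone."""
    return "failed" if fetch() is not None else "ok"


# A create() that only set the identifier, not the optional "role" attribute.
backend = {"dev1": {"name": "dev1"}}
print(verify_write({"name": "dev1", "role": "leaf"}, lambda: backend.get("dev1")))  # incomplete
print(verify_delete(lambda: backend.get("dev1")))  # failed: the record still exists
```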

Add option for progress-bar type reporting

Environment

  • DiffSync version: 1.2.0

Proposed Functionality

DiffSync to provide hooks for status reporting, for example:

  • number of records processed so far in diff calculation
  • number of records remaining for diff calculation
  • number of diff elements processed so far in synchronization
  • number of diff elements remaining to process for synchronization

An example API to consider is urllib's reporthook (https://docs.python.org/3/library/urllib.request.html#urllib.request.URLopener.retrieve):

If reporthook is given, it must be a function accepting three numeric parameters: A chunk number, the maximum size chunks are read in and the total size of the download (-1 if unknown). It will be called once at the start and after each chunk of data is read from the network.
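A reporthook-style interface could look roughly like the sketch below. It simplifies urllib's three-argument hook to two arguments (items done, total items); process_with_progress() and print_progress() are hypothetical names, not DiffSync APIs. As with urllib, the hook is called once at the start and then after each unit of work.

```python
# Hypothetical reporthook-style progress callback; not part of the DiffSync API.
from typing import Callable, List, Optional

ReportHook = Callable[[int, int], None]  # (items_done, items_total)


def process_with_progress(
    items: List[str],
    handle: Callable[[str], None],
    reporthook: Optional[ReportHook] = None,
) -> None:
    """Process each item, reporting progress before starting and after each item."""
    total = len(items)
    if reporthook:
        reporthook(0, total)  # called once at the start, like urllib
    for done, item in enumerate(items, start=1):
        handle(item)
        if reporthook:
            reporthook(done, total)


def print_progress(done: int, total: int) -> None:
    """Overwrite a single console line, ending with a newline when finished."""
    print(f"\r{done}/{total} diff elements processed", end="" if done < total else "\n")


process_with_progress(["site-a", "site-b", "site-c"], handle=lambda name: None, reporthook=print_progress)
```

Keeping the hook a plain callable means a caller can drive a progress bar, a log line, or nothing at all without DiffSync depending on any UI library.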

Use Case

When using DiffSync for a large set of records, both diffing and syncing may take some time to complete. Although logging can be enabled to get highly detailed information about DiffSync's progress, it would be useful to have a less-detailed status/progress information API as well, which could be used to (for example) update a progress bar.

Support multiple types of datastore

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Currently, DiffSync Adapters always use an internal in-memory datastore that holds the entire dataset.
It would be great to support other types of datastore, such as Redis, in addition to the in-memory datastore.
As an option, it would also be useful to be able to deactivate the internal datastore, or to pull the data directly from the remote system.

Use Case

When we are dealing with a large dataset, the volume of data stored in memory can become very large and present some challenges. An external datastore like Redis would reduce the volume of data stored in memory.
In some cases, DiffSync is running very closely to an existing database and duplicating the data in memory is redundant and inefficient.

Option to skip deletion of records on a case-by-case basis

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Add a model flag that can be used to control whether an unmatched model class or instance will trigger deletion (and/or creation) of records when a sync operation is run.

Use Case

The existing global IGNORE_UNMATCHED_DST flag is not sufficiently granular as it applies to all records and all models. In some cases that may be adequate, but in others there needs to be per-model or even per-record control over this behavior -- for example, an application may not wish to delete unmatched Device records (perhaps a device is temporarily offline and hence not included in the source data), but may still wish to delete unmatched Interface records (as they reflect incorrect information about existing devices).

Backwards incompatibility: DiffSyncActions is no longer an enum in 1.4.3

Environment

  • DiffSync version: 1.4.3
  • Python version: 3.7

Observed Behavior

I found that the most recent version of diffsync, 1.4.3, probably introduces a breaking change that is not recorded in the release notes.

The change removes Enum from the base classes of DiffSyncActions, which breaks backwards compatibility:

  • DiffSyncActions is not iterable now.
  • Attributes inherited from Enum are not accessible by DiffSyncActions members, e.g. name, and value.

Steps to Reproduce

To make the impact of the breaking change easier to understand, I wrote a short code snippet that reproduces it.
The following code runs fine in 1.4.2 but crashes in 1.4.3.

import diffsync.enum
from enum import Enum

print(issubclass(diffsync.enum.DiffSyncActions, Enum))
try:
    print(list(diffsync.enum.DiffSyncActions))
except Exception as ex:
    print(ex)
print(diffsync.enum.DiffSyncActions.CREATE.name)

Output in diffsync 1.4.2:

True
[<DiffSyncActions.CREATE: 'create'>, <DiffSyncActions.UPDATE: 'update'>, <DiffSyncActions.DELETE: 'delete'>, <DiffSyncActions.NO_CHANGE: None>]
CREATE

Output in diffsync 1.4.3:

False
'type' object is not iterable
Traceback (most recent call last):
  File "a.py", line 9, in <module>
    print(diffsync.enum.DiffSyncActions.CREATE.name)
AttributeError: 'str' object has no attribute 'name'

Perhaps recording these changes in the release notes would help avoid user confusion when upgrading to this version?

Kind regards,

@eXceediDeaL
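One common way to get both behaviors (Enum semantics and plain-string comparisons) is a str-mixin Enum. The sketch below is illustrative only and does not claim to be how diffsync resolved this issue; NO_CHANGE is left out because its value in 1.4.2 is None, which is not a string.

```python
# Illustrative str-mixin Enum; not necessarily diffsync's actual definition.
from enum import Enum


class DiffSyncActions(str, Enum):
    """Members are Enum members and also behave like plain strings."""

    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"


print(issubclass(DiffSyncActions, Enum))    # True, as in 1.4.2
print([a.name for a in DiffSyncActions])    # iterable again, with .name available
print(DiffSyncActions.CREATE == "create")   # still compares equal to a plain string
```

Because each member is a str subclass instance, code that compares actions against literal strings keeps working, while code relying on iteration, .name, and .value (as in the reproduction script above) works too.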

Configurable abort-on-failure vs continue-on-failure

When performing a diff or a sync, it needs to be possible to configure DSync to either continue after encountering failures or abort gracefully after encountering the first failure.

This is needed for the SOT Sync project.
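A minimal sketch of the two modes, assuming each sync step can be modeled as a named callable that raises on failure. run_actions() and continue_on_failure are hypothetical names, not DiffSync APIs.

```python
# Hypothetical abort-vs-continue handling; not part of the DiffSync API.
from typing import Callable, Iterable, List, Tuple

Action = Tuple[str, Callable[[], None]]


def run_actions(actions: Iterable[Action], continue_on_failure: bool) -> List[Tuple[str, str]]:
    """Run each action, collecting failures; stop at the first one unless told to continue."""
    failures: List[Tuple[str, str]] = []
    for name, action in actions:
        try:
            action()
        except Exception as exc:  # any backend error counts as a failure here
            failures.append((name, str(exc)))
            if not continue_on_failure:
                break  # abort gracefully after the first failure
    return failures


def fail() -> None:
    raise RuntimeError("backend error")


steps = [("create rtr1", fail), ("create rtr2", lambda: None), ("delete rtr3", fail)]
print(run_actions(steps, continue_on_failure=True))   # both failures reported
print(run_actions(steps, continue_on_failure=False))  # stops after the first failure
```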

print_detailed() should be replaced with a string builder

Environment

  • DSync version:

Proposed Functionality

Currently the print_detailed APIs on DSync, DSyncModel, Diff, and DiffElement print to stdout when called. These should be changed/refactored so that they instead construct and return an assembled string, which the caller can then print(), log.debug(), etc. as desired.

Use Case

The current functionality is useful for debugging but not for integrated use cases where logging would be more appropriate.
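The proposed refactor amounts to building the text and returning it instead of printing it. A minimal sketch, where str_detailed() is a hypothetical helper rather than an existing DiffSync method:

```python
# Hypothetical "return a string instead of printing" helper; not a DiffSync API.
import logging
from typing import Dict, List


def str_detailed(name: str, attrs: Dict[str, object], indent: int = 0) -> str:
    """Assemble a detailed, indented description and return it to the caller."""
    lines: List[str] = [" " * indent + name]
    for key, value in sorted(attrs.items()):
        lines.append(" " * (indent + 2) + f"{key}: {value}")
    return "\n".join(lines)


text = str_detailed("device: rtr1", {"role": "leaf", "site": "nyc"})
print(text)                               # interactive debugging
logging.getLogger(__name__).debug(text)   # integrated use via logging
```

The caller, not the library, decides where the output goes.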

Possibly change "_src" and "_dst" keys in Diff.dict()?

Environment

  • DiffSync version: 1.0.0

Proposed Functionality

In the dict constructed by Diff.dict(), change the _dst and _src keys to something more intuitive, aesthetically pleasing, and/or "diff-like" -- perhaps - and +, or < and >?

Use Case

The current "_src" and "_dst" keys were selected to avoid any likely conflict with the child DiffElement names, e.g.:

'DC1': {'_dst': {'parent_location_name': 'New York', 'status': 'in-transit'},
        '_src': {'parent_location_name': 'Tennessee', 'status': 'active'},
        'device': {...},
        'prefix': {...},

but they're kinda ugly as keys.

Add a new `skip` counter in diff.summary()

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Extend the list of counters returned by diff.summary() to include skip, in addition to create, update, delete, and no-change.

Use Case

Currently, models that are skipped because of global or model flags such as SKIP_UNMATCHED_SRC | SKIP_UNMATCHED_DST are not accounted for in the diff.
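A minimal sketch of a summary that includes a skip counter, assuming each diff element has already been classified as "create", "update", "delete", "skip", or None (no change). summarize() is a hypothetical helper, not the actual diff.summary() implementation.

```python
# Hypothetical summary with a "skip" counter; not the real diff.summary().
from collections import Counter
from typing import Dict, Iterable, Optional

KEYS = ("create", "update", "delete", "no-change", "skip")


def summarize(actions: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count outcomes per element; None means no change, "skip" means skipped by a flag."""
    counts = Counter("no-change" if action is None else action for action in actions)
    return {key: counts.get(key, 0) for key in KEYS}


print(summarize(["create", None, "skip", "skip", "delete"]))
```

Always emitting every key (including zeros) keeps the output shape stable for logging and dashboards.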

Add model flags to control which crud method should be executed on sync

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

It would be very useful to have a few more Model flags to control which CRUD methods (create/update/delete) would be called during a sync()

I would like to propose

  • CRUD_NO_UPDATE: Do not call update() on the DiffSyncModel during sync(), the model and the changes will still be visible in the diff.
  • CRUD_NO_DELETE: Do not call delete() on the DiffSyncModel during sync(), the model and the changes will still be visible in the diff.
  • CRUD_NO_UPDATE_DELETE = CRUD_NO_UPDATE | CRUD_NO_DELETE

I wish we could support CRUD_NO_CREATE but I don't think this is possible right now because we can't pass context to this method since the model doesn't exist yet.

Use Case

The main use case for me would be to protect some objects as READ_ONLY while still showing them in the diff, while other objects of the same type remain READ_WRITE.
As an example:
When using the network-importer, once we have done the initial import of the SOT and the data has been cleaned up, it would be useful to protect some objects from being updated/deleted in the SOT, while still having these objects show up in the diff.
Today, for this use case we use the IGNORE flag, but then the object is completely ignored and doesn't show up in the diff at all.
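The proposed flags compose naturally as bit flags. The sketch below uses Python's enum.Flag to show the idea; CrudFlags and allowed() are hypothetical names mirroring the proposal, not flags that DiffSync is known to provide. A blocked action is still reported (it stays in the diff) but is not executed.

```python
# Hypothetical per-model CRUD flags, mirroring the proposal; not a DiffSync API.
from enum import Flag


class CrudFlags(Flag):
    NONE = 0
    CRUD_NO_UPDATE = 1
    CRUD_NO_DELETE = 2
    CRUD_NO_UPDATE_DELETE = CRUD_NO_UPDATE | CRUD_NO_DELETE  # 1 | 2 == 3


def allowed(action: str, flags: CrudFlags) -> bool:
    """Return True if sync() should actually call the given CRUD method."""
    if action == "update":
        return not (flags & CrudFlags.CRUD_NO_UPDATE)
    if action == "delete":
        return not (flags & CrudFlags.CRUD_NO_DELETE)
    return True  # create is not covered by the proposal


diff_elements = [
    ("dev1", "update", CrudFlags.NONE),
    ("dev2", "update", CrudFlags.CRUD_NO_UPDATE_DELETE),
    ("dev3", "delete", CrudFlags.CRUD_NO_UPDATE),
]
for name, action, flags in diff_elements:
    verb = "apply" if allowed(action, flags) else "report only"
    print(f"{name}: {action} -> {verb}")
```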

DiffSyncModel: Create/Update ordering and Delete ordering

This came up in internal conversations around usage and some potential improvements.

One option is a flag that makes create/update process parents before children and, conversely, makes delete process children before parents.

An example would be Nautobot and the dependencies between objects within Nautobot. Say you want to delete a site, but child objects such as devices exist: you need to delete the devices before deleting the site. This caused some intermittent and hard-to-troubleshoot scenarios.

It was also pointed out that some of these deletions would then be deferred, which may not be desirable.
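The ordering described above is a pre-order traversal for create/update and a post-order traversal for delete. A minimal sketch over a parent-to-children mapping; create_order() and delete_order() are hypothetical helpers, and the tree is made-up example data.

```python
# Hypothetical traversal orders for create/update vs delete; not a DiffSync API.
from typing import Dict, List

Tree = Dict[str, List[str]]  # parent -> children


def create_order(tree: Tree, root: str) -> List[str]:
    """Pre-order: each parent comes before its children."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(create_order(tree, child))
    return order


def delete_order(tree: Tree, root: str) -> List[str]:
    """Post-order: all children come before their parent."""
    order: List[str] = []
    for child in tree.get(root, []):
        order.extend(delete_order(tree, child))
    order.append(root)
    return order


tree = {"site-nyc": ["rtr1", "rtr2"], "rtr1": ["rtr1:eth0"]}
print(create_order(tree, "site-nyc"))  # site first, then its devices and interfaces
print(delete_order(tree, "site-nyc"))  # interfaces and devices first, site last
```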

Create an Enum to track the valid type of Actions

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Today, the list of valid actions (create, update, delete) is not clearly defined, and the values are hardcoded in multiple places in the code.
It would be good to create a proper enum for these values and use it everywhere instead of hardcoded values.

Use Case

Housekeeping

Docs Update: Clarify subclassing DiffSyncModels for different backends

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

The README makes mention of extending a "base" DiffSyncModel for handling CRUD actions in a backend, but doesn't do a great job of visualizing this concept.

you need to extend your DiffSyncModel class(es) to define your own create, update and/or delete methods for each model.

I think extending the example out a little more would go a long way to showing how to build your models and adapters.

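from typing import List, Optional

from diffsync import DiffSyncModel


class Device(DiffSyncModel):
    """Example model of a network Device."""

    _modelname = "device"
    _identifiers = ("name",)
    _attributes = ()
    _children = {"interface": "interfaces"}

    name: str
    site_name: Optional[str]  # note that this attribute is NOT included in _attributes
    role: Optional[str]  # note that this attribute is NOT included in _attributes
    interfaces: List = list()


class SystemADevice(Device):
    """Device model whose CRUD methods talk to System A."""

    system_A_unique_field: Optional[str] = None

    @classmethod
    def create(cls, diffsync, ids, attrs):
        """Talk to SystemA to create device"""
        # Call System A's API here, then record the new object locally
        return super().create(diffsync=diffsync, ids=ids, attrs=attrs)


class SystemBDevice(Device):
    """Device model whose CRUD methods talk to System B."""

    system_B_unique_field: Optional[str] = None

    @classmethod
    def create(cls, diffsync, ids, attrs):
        """Talk to SystemB to create device"""
        # Call System B's API here, then record the new object locally
        return super().create(diffsync=diffsync, ids=ids, attrs=attrs)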

Use Case

This should help newbies (like myself) to get a better idea of how to architect a diffsync-based integration.

Migrate CI to Github Action

Environment

  • DiffSync version: latest

Proposed Functionality

Migrate CI to GitHub Actions and deprecate Travis.

Use Case

Align with other NTC projects

Unclear Exception Message

Environment

  • DiffSync version: 1.3.0
  • Python version 3.7

Observed Behavior

When attempting a diff between two SoTs using the SSoT plugin, I get the following exception:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/diffsync/__init__.py", line 543, in diff_from
    return differ.calculate_diffs()
  File "/usr/local/lib/python3.7/site-packages/diffsync/helpers.py", line 92, in calculate_diffs
    self.diff.add(diff_element)
  File "/usr/local/lib/python3.7/site-packages/diffsync/diff.py", line 58, in add
    raise ObjectAlreadyExists(f"Already storing a {element.type} named {element.name}")
diffsync.exceptions.ObjectAlreadyExists: Already storing a port named 0/0

This error message is unhelpful, as I can't determine from it the parent context of the object.

Expected Behavior

I would expect more information to be provided in the error message denoting the parent context of the Object that already exists so further investigation of the issue can be done. The current message is unhelpful as I have no idea which device the 0/0 port is on that it's referencing.

Steps to Reproduce

  1. Create an application that utilizes DiffSync.
  2. Have duplicate items in diff to cause Exception.
  3. Attempt diff and expect Exception to be thrown.
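A sketch of what a more informative message could look like, assuming each container of diff elements knows the path of its parents. DiffStore is hypothetical, and ObjectAlreadyExists is defined locally so the example is self-contained; neither is the real diffsync class.

```python
# Hypothetical container that includes parent context in its error message.
from typing import Dict, Iterable, Tuple


class ObjectAlreadyExists(Exception):
    """Local stand-in for diffsync.exceptions.ObjectAlreadyExists."""


class DiffStore:
    def __init__(self, parent_path: Iterable[str] = ()):
        self.parent_path = tuple(parent_path)  # e.g. ("site nyc", "device rtr1")
        self._elements: Dict[Tuple[str, str], str] = {}

    def add(self, elem_type: str, name: str) -> None:
        key = (elem_type, name)
        if key in self._elements:
            context = " > ".join(self.parent_path) or "<top level>"
            raise ObjectAlreadyExists(f"Already storing a {elem_type} named {name} (parent: {context})")
        self._elements[key] = name


ports = DiffStore(("site nyc", "device rtr1"))
ports.add("port", "0/0")
try:
    ports.add("port", "0/0")  # duplicate
except ObjectAlreadyExists as exc:
    print(exc)
```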

Update `sync_from` and `sync_to` to return the diff and the status of the sync

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Currently, sync_to() and sync_from() return nothing, so the caller cannot tell whether the sync completed.
It would be useful to return at least the status of the sync, and possibly also the diff that was generated by the function.

Use Case

Developer experience
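A minimal sketch of a richer return value, assuming records are identified by plain strings and a caller-supplied apply callable performs each change (raising on failure). SyncResult, sync_to(), and apply are all hypothetical here, not DiffSync's actual signatures.

```python
# Hypothetical sync result object; not DiffSync's actual return type.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SyncResult:
    success: bool
    diff: Dict[str, List[str]]
    errors: List[str] = field(default_factory=list)


def sync_to(src: Set[str], dst: Set[str], apply: Callable[[str, str], None]) -> SyncResult:
    """Compute the diff, apply each change, and report what happened."""
    diff = {"create": sorted(src - dst), "delete": sorted(dst - src)}
    errors: List[str] = []
    for action, keys in diff.items():
        for key in keys:
            try:
                apply(action, key)
            except Exception as exc:  # record the failure and keep going
                errors.append(f"{action} {key}: {exc}")
    return SyncResult(success=not errors, diff=diff, errors=errors)


def flaky_apply(action: str, key: str) -> None:
    if key == "rtr3":
        raise RuntimeError("API timeout")


result = sync_to({"rtr1", "rtr3"}, {"rtr1", "rtr2"}, flaky_apply)
print(result.success, result.diff, result.errors)
```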

DiffElement `has_diffs()` method not returning booleans as expected

Environment

  • DiffSync version: 1.0.0
  • Python version 3.7.2

Observed Behavior

When a .has_diffs() call is made to a diffsync.diff.DiffElement object, the object returns False when children do, indeed, have diffs. This occurs whether or not the include_children=True argument is passed into the method call.

Expected Behavior

I expected .has_diffs() to evaluate to True in the case that diffs do not exist in the parent object but do exist in one of the children, and for the method to also evaluate to True when include_children=True is passed in as an argument and diffs exist in the children

Steps to Reproduce

  1. Create a DiffSyncModel child class for provider with attributes slug, name, and site. Add the slug to the _identifiers tuple, name to the _attributes tuple, and a dictionary of {'site': 'sites'} to the _children attribute.
  2. Create a DiffSyncModel child class for site with the attribute name and slug
  3. Create a DiffSync child class to represent a netbox backend. Define a load() method to load regions and sites from netbox, ensuring sites are added as children of regions using the add_child method. Set the top_level attribute to regions.
  4. Create a DiffSync child class to represent a YAML backend. Define a load() method to load regions and sites from YAML files, ensuring sites are added as children of regions using the add_child method. Set the top_level attribute to regions.
  5. Modify the data such that a diff exists between the YAML data and the Netbox data for a site within a given region, but the data for the region itself does not differ.
  6. Define a sync_from() method on the YAML backend taking one argument, source, and set a breakpoint (e.g. a pdb.set_trace() call) inside the method.
  7. Write code that calls sync_from() on the YAML backend, passing in the instantiated netbox backend as the source argument.
  8. Execute the code. When you get a pdb shell, run the following commands:
diff = self.diff_from(source)
regions = [region for region in diff.get_children()]
regions[0].has_diffs()
regions[0].has_diffs(include_children=True)

Observe that the region whose children have diffs does not itself show any diffs, regardless of whether include_children is passed in as an argument or not.
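The expected behavior is a recursive check: an element has diffs if its own attributes differ or, when children are included, if any descendant has diffs. A minimal sketch with a hypothetical Element class (not diffsync's DiffElement); defaulting include_children to True is an assumption of this sketch.

```python
# Hypothetical recursive has_diffs(); not diffsync's DiffElement implementation.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    name: str
    src_attrs: Dict[str, object] = field(default_factory=dict)
    dst_attrs: Dict[str, object] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)

    def has_diffs(self, include_children: bool = True) -> bool:
        """True if this element differs, or (optionally) any descendant differs."""
        if self.src_attrs != self.dst_attrs:
            return True
        if include_children:
            return any(child.has_diffs(include_children=True) for child in self.children)
        return False


site = Element("site-1", {"name": "Site 1"}, {"name": "Site One"})
region = Element("region-1", {"slug": "r1"}, {"slug": "r1"}, [site])
print(region.has_diffs())                        # True: a child differs
print(region.has_diffs(include_children=False))  # False: the region itself is unchanged
```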

[Question] How to synchronise between different database engines?

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Both Keepass and Bitwarden manage "credentials", but they have different schemas and access methods.
The examples 1 & 2 of diffsync describe the use of datasets with identical "schemas" and the same access.
The README.md confuses me (I don't seem to find what I look for).

  1. Is diffsync the right module to try to synchronise these databases keepass and bitwarden?

  2. Is this the way the creator of this module envisioned synchronisation:
    A subclass of DiffSyncModel, e.g. CredModel, so that DiffSync can compare the important elements in a generic way.
    This could then be the base class for a KeepassCredModel and a BitwardenCredModel.
    These 2 latter classes each have their specific create/update/delete class methods,
    coping with the specific schema and access method.
    There would be 2 dataset classes: KeepassDataset and BitwardenDataset both inheriting from DiffSync.

Diff should be able to provide a summary (# create/update/delete)

Environment

  • DiffSync version: 1.0.0

Proposed Functionality

A Diff object should be able to provide a summary of its contents (i.e., the number of objects that would be created/updated/deleted if this diff were used for a synchronization between systems).

Use Case

Logging, usability.

Add name attribute to DSync object

Environment

  • DSync version: master

Proposed Functionality

Add a mandatory name attribute to each DSync object and pass this name to the Diff object

Use Case

Currently, during a diff we are missing a user-friendly identifier indicating what we are comparing, where some objects are missing, etc.
Right now the diff uses generic SOURCE and DEST identifiers, but it's not always clear which one is SOURCE and which one is DEST.
With a name clearly defined for each object, it will be easier to identify where a given piece of data is coming from.

DiffSyncModel create/update/delete APIs have no logging context

Environment

  • DiffSync version: 1.2.0

Proposed Functionality

Extend the DiffSyncModel create, update, and delete APIs with an additional logger parameter, or provide a public log API on the already-included diffsync instance.

Use Case

Currently a DiffSyncModel implementation must construct its own logging context from scratch and lacks access to the context of any surrounding sync operation.
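One way to give model methods the surrounding context is the standard library's logging.LoggerAdapter, which prefixes every message with bound values. The sketch below is an assumption about how such context could be carried; ContextAdapter and the "adapter"/"model" keys are hypothetical, not DiffSync APIs.

```python
# Hypothetical context-carrying logger built on the stdlib LoggerAdapter.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")


class ContextAdapter(logging.LoggerAdapter):
    """Prefix each message with the adapter and model names from `extra`."""

    def process(self, msg, kwargs):
        return f"[{self.extra['adapter']}/{self.extra['model']}] {msg}", kwargs


# The sync operation would build this once and hand it to each model's create/update/delete.
log = ContextAdapter(logging.getLogger("sync"), {"adapter": "Backend-A", "model": "device"})
log.info("created %s", "rtr1")
```

Passing one bound logger down (or exposing it on the diffsync instance) keeps model code free of logging setup.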

User Defined Diff Class is not used when creating child_diff

Environment

  • DSync version: 1.0.0
  • Python version 3.7.7

Observed Behavior

When generating a diff or a sync with a custom diff_class, the main class is instantiated with the proper class but the children of this class are still instantiated using the default Diff class

Expected Behavior

When a custom diff_class is provided, the main class and all its children should be created using the custom diff_class

Steps to Reproduce

  1. Generate a diff from 2 DSync objects with multiple nested models to ensure the top-level DiffElement has a child_diff defined
  2. Check the type of child_diff

I believe the issue is line 155 in the diff.py file
https://github.com/networktocode/dsync/blob/master/dsync/diff.py#L155
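The general fix for this class of bug is to create children from the instance's own class rather than a hard-coded base class. A minimal sketch with hypothetical Diff and MyDiff classes (not the real dsync code):

```python
# Hypothetical illustration: children inherit the caller's diff class.
from typing import Dict


class Diff:
    def __init__(self) -> None:
        self.children: Dict[str, "Diff"] = {}

    def add_child_diff(self, name: str) -> "Diff":
        child = type(self)()  # reuse the custom class instead of hard-coding Diff()
        self.children[name] = child
        return child


class MyDiff(Diff):
    """Custom diff_class supplied by the user."""


top = MyDiff()
child = top.add_child_diff("rtr1").add_child_diff("eth0")
print(type(child).__name__)  # MyDiff
```

An alternative with the same effect is to pass the configured diff_class explicitly down to wherever child diffs are constructed.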

Add write() method to DiffSync class

Environment

  • DiffSync version: 1.0.0

Currently, to add bulk write operations to a DiffSync subclass, one has to override the sync_from() method inherited from the parent class. This is in contrast to each individual DiffSyncModel's create(), update(), and delete() methods, which are more idempotent in manner and are called by the parent class's sync_from() method.

Proposed Functionality

It may be beneficial to be able to leave the downstream create(), update(), and delete() DiffSyncModel methods which are called by "sync_from" by default unimplemented and add a write() method to the DiffSync class. This would provide a framework for bulk write operations, without blowing away the logic implemented on "sync_from()"

Use Case

from diffsync import DiffSync


class BackendYAML(DiffSync):
    def write(self, source):
        """Bulk write operation to dump data to disk from another backend.

        Called automatically by super().sync_from()

        Args:
            source (DiffSync): DiffSync object from which data is being synchronized
        """
        # Validate whether or not any changes need to be made to the circuits/providers files
        self._write_providers_from(source)
        self._write_circuits_from(source)

Configurable preserve-unmatched-records vs delete-unmatched-records

Currently, if there are records in the target system that aren't in the source system, DSync will delete these unmatched records from the target system when performing a sync. In some scenarios, it may be desirable to instead preserve these records without modification. This should be a configurable option.

This is needed for the SOT Sync project.

Fix build pipeline for Read The Doc

Environment

  • DiffSync version: 1.4.1
  • Python version 3.7

Observed Behavior

The build pipeline for Read the Docs appears to be broken at the moment.
PR #95 certainly didn't help, but I think it was broken even before that.

Looking at other projects, it doesn't look like we have a clear pattern in place, but I think we should replicate what we have in netutils, with a dedicated requirements.txt file just for RTD:
https://github.com/networktocode/netutils/blob/develop/docs/requirements.txt

Expected Behavior

A new version of the documentation should be built and published to RTD whenever there is a new commit in main.

Empty DiffSync instance evaluates as false

Environment

  • DiffSync version: 1.3.0
  • Python version: 3.6

Observed Behavior

When using diffsync within Nautobot-plugin-ssot, it would be nice to allow the Destination (Target) to have no data.

An empty DiffSync() evaluates to False. This causes a failure to proceed to the Diff.

Because the base DiffSync class defines a __len__() method, Python's default bool() conversion uses it to decide whether an instance is truthy:

>>> diffsync = DiffSync()
>>> bool(diffsync)
False
>>> diffsync.add(DiffSyncModel())
>>> bool(diffsync)
True

Expected Behavior

diffsync proceeds to diff and then syncs data to the Destination (Target).
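The behavior comes from Python's truth-value rules: when a class defines __len__() but not __bool__(), an instance with length 0 is falsy. A minimal sketch with hypothetical classes (not DiffSync itself) showing two ways around it: defining __bool__(), or having the caller test "is None" instead of "not target".

```python
# Hypothetical classes illustrating __len__-based truthiness.
from typing import List


class Store:
    def __init__(self) -> None:
        self._items: List[object] = []

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item: object) -> None:
        self._items.append(item)


class AlwaysTruthyStore(Store):
    def __bool__(self) -> bool:
        return True  # an empty store is still a valid sync target


empty = Store()
print(bool(empty))                # False: falls back to __len__() == 0
print(bool(AlwaysTruthyStore()))  # True: __bool__() takes precedence
print(empty is not None)          # True: a safer check on the caller side
```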

Get or create function

Environment

  • DiffSync version: 1.1.0

Ability to either get or create an object like the Django ORM

A function that takes the identifiers of an object and either gets or creates it.
The current workaround is to wrap the call in a try/except that catches ObjectNotFound.

Use Case

Currently using try and except to catch if an object already exists.

try:
    self.add(vrf)
except ObjectAlreadyExists:
    pass

Expected use

self.get_or_create(vrf)
self.update_or_create(vrf)
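A sketch of the requested semantics over a simple keyed store, modeled on the Django ORM's convention of returning an (object, created) tuple. Registry and its methods are hypothetical and operate on plain dicts, not DiffSync models.

```python
# Hypothetical get_or_create / update_or_create over a keyed store.
from typing import Dict, Tuple

Key = Tuple[str, str]  # (model name, unique id)


class Registry:
    def __init__(self) -> None:
        self._store: Dict[Key, dict] = {}

    def get_or_create(self, model: str, uid: str, **defaults) -> Tuple[dict, bool]:
        """Return the existing object, or create it from defaults; flag whether it was created."""
        key = (model, uid)
        if key in self._store:
            return self._store[key], False
        obj = {"uid": uid, **defaults}
        self._store[key] = obj
        return obj, True

    def update_or_create(self, model: str, uid: str, **attrs) -> Tuple[dict, bool]:
        """Like get_or_create, but overwrite attributes when the object already exists."""
        obj, created = self.get_or_create(model, uid, **attrs)
        if not created:
            obj.update(attrs)
        return obj, created


registry = Registry()
vrf, created = registry.get_or_create("vrf", "blue", rd="65000:1")
print(created)  # True on the first call, no try/except needed
```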

Add Apache 2.0 License

Environment

  • DSync version: master

Proposed Functionality

Before making this repo public we need to add a license at the root of the repo and at the top of each file.

I've been using this one for the onboarding plugin

Copyright 2020 Network to Code <[email protected]>
Network to Code, LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Dsync name is not available on Pypi

Environment

  • DSync version: 1.0.0

Proposed Functionality

dsync is not available as a package name on PyPI, so we need to give the package another name. We could use:

  • diff-sync
  • diffsync
  • ntc-dsync
  • ??

My preference would be to use diff-sync for the package name and keep dsync as the Python import name.

Create a new release on Pypi to include new changes

Hi,

The last release 1.3.0 is from 30 April 2021. Since then, many new features were added (such as the get_or_instantiate() method).

It would be nice to make a new release to include these changes. This would also mean updating the CHANGELOG.md file.

Thanks.

Add support for Unsorted List

Environment

  • DSync version: master

Proposed Functionality

Currently when an attribute is defined as a list, DSync will report a diff if the lists have the same content but in a different order.
In some cases that's the expected behavior but in other cases the order doesn't matter and it's hard to predict how things will be loaded on both adapters.
It would be great to be able to explicitly define if a list should be sorted or not when we are calculating the diff.

Use Case

In some cases it's hard to predict how a list will be loaded, which can lead to false positives when we are generating the diff. A possible workaround is to ensure that the content is always ordered as we construct the list, but this adds some complexity to the adapter.
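Comparing unordered lists as multisets avoids these false positives without forcing the adapter to sort. A minimal sketch using collections.Counter; lists_equal() and its ordered parameter are hypothetical, standing in for a per-attribute setting.

```python
# Hypothetical order-insensitive list comparison.
from collections import Counter
from typing import Hashable, List


def lists_equal(a: List[Hashable], b: List[Hashable], ordered: bool) -> bool:
    """Compare lists strictly, or as multisets when order doesn't matter."""
    if ordered:
        return list(a) == list(b)
    return Counter(a) == Counter(b)  # same items with the same counts, any order


print(lists_equal(["vlan10", "vlan20"], ["vlan20", "vlan10"], ordered=False))  # True
print(lists_equal(["vlan10", "vlan20"], ["vlan20", "vlan10"], ordered=True))   # False
```

Counter is used rather than set so that duplicates still count; it assumes the list items are hashable (unhashable items such as dicts would need a different normalization, e.g. sorting by a key).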

Documentation on readthedocs

Environment

  • DiffSync version: 1.0.0

Proposed Functionality

Documentation generation and publication to readthedocs.org, including examples

Use Case

Discoverability, usability.

Create an IGNORE_CASE flag to prevent case-sensitive mismatches

Environment

  • DiffSync version: 1.4.1

Proposed Functionality

Implement either a global or model flag (or both) called IGNORE_CASE, that will tell DiffSync to ignore case-sensitive mismatches.

Example for Global Flags:

from diffsync.enum import DiffSyncFlags
flags = DiffSyncFlags.IGNORE_CASE
diff = nautobot.diff_from(local, flags=flags)

Example for Model Flags:

from diffsync import DiffSync
from diffsync.enum import DiffSyncModelFlags
from model import MyDeviceModel

class MyAdapter(DiffSync):

    device = MyDeviceModel

    def load(self, data):
        """Load all devices into the adapter and add the flag IGNORE to all firewall devices."""
        for device in data.get("devices"):
            obj = self.device(name=device["name"])
            if "firewall" in device["name"]:
                obj.model_flags = DiffSyncModelFlags.IGNORE_CASE
            self.add(obj)

Use Case

Currently, if we are trying to sync the same object from different backends that have the same name but without the same case (i.e.: "my-device" & "My-Device"), they will be marked as different, thus deleting the first device to replace it with the new one.

Below is an example showing the current limitation of not having such a flag. As you can see from the DATA_BACKEND_A and DATA_BACKEND_B variables, the values are the same except for case: the first is all uppercase, whereas the second is mostly lowercase.

from diffsync.logging import enable_console_logging
from diffsync import DiffSync
from diffsync import DiffSyncModel


class Site(DiffSyncModel):
    _modelname = "site"
    _identifiers = ("name",)

    name: str

    @classmethod
    def create(cls, diffsync, ids, attrs):
        print(f"Create {cls._modelname}")
        return super().create(ids=ids, diffsync=diffsync, attrs=attrs)

    def update(self, attrs):
        print(f"Update {self._modelname}")
        return super().update(attrs)

    def delete(self):
        print(f"Delete {self._modelname}")
        super().delete()
        return self


DATA_BACKEND_A = ["SITE-A"]
DATA_BACKEND_B = ["site-A"]


class BackendA(DiffSync):
    site = Site

    top_level = ["site"]

    def load(self):
        for site_name in DATA_BACKEND_A:
            site = self.site(name=site_name)
            self.add(site)


class BackendB(DiffSync):
    site = Site

    top_level = ["site"]

    def load(self):
        for site_name in DATA_BACKEND_B:
            site = self.site(name=site_name)
            self.add(site)


def main():
    enable_console_logging(verbosity=0)

    backend_a = BackendA(name="Backend-A")
    backend_a.load()

    backend_b = BackendB(name="Backend-B")
    backend_b.load()

    backend_a.sync_to(backend_b)


if __name__ == "__main__":
    main()

Upon executing this script, the output is:

Create site
Delete site

So the sync replaces an object that may actually be the same one.

Implementing this flag would help avoid unexpected results when users know that the data from both backends may differ only in case, and would remove the need to call functions such as .lower() or .casefold() each time a new object is created.
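Until such a flag exists, its intended effect can be sketched in plain Python by matching identifiers through `str.casefold()`, so that "SITE-A" and "site-A" map to the same key. The function `plan_sync` is a hypothetical illustration of the matching logic, not DiffSync's implementation:

```python
from typing import Iterable, List, Tuple


def plan_sync(source: Iterable[str], target: Iterable[str], ignore_case: bool = False) -> Tuple[List[str], List[str]]:
    """Return (to_create, to_delete) for syncing source names onto target names."""

    def key(name: str) -> str:
        # casefold() is a more aggressive lower() intended for caseless matching.
        return name.casefold() if ignore_case else name

    source_keys = {key(name): name for name in source}
    target_keys = {key(name): name for name in target}
    to_create = [name for k, name in source_keys.items() if k not in target_keys]
    to_delete = [name for k, name in target_keys.items() if k not in source_keys]
    return to_create, to_delete


print(plan_sync(["SITE-A"], ["site-A"]))                    # (['SITE-A'], ['site-A'])
print(plan_sync(["SITE-A"], ["site-A"], ignore_case=True))  # ([], [])
```

A real flag would also have to decide which spelling to keep once two names are matched.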

Enable GitHub Discussions for Project

(This didn't seem to fit either issue template, so I'm not using one, sorry!)

I think enabling the Discussions feature on GitHub would benefit the project overall. I, for one, have ideas and questions about diffsync that don't make sense as "issues", and the NTC Slack is too ephemeral to hash them out meaningfully. Plus, those Slack discussions are lost to future diffsync users who go through the same discoveries.

Thanks for doing what you do!

Breaking API change in DiffSync 1.4.0

Environment

  • DiffSync version: 1.4.0
  • Python version: any

Observed Behavior

#90 introduced a breaking API change: DiffElement.action changed from a string value to an enum value. This impacts projects such as network-importer (networktocode/network-importer#256) and any other project that relies on the value of DiffElement.action, for example to implement custom Diff ordering based on the action.
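The breakage is easy to reproduce in plain Python. Suppose the action values became members of a plain `enum.Enum`; the class names below are stand-ins, not DiffSync's own. Any caller that compares `DiffElement.action` against a string then stops matching. Mixing `str` into the enum is one common way to keep such comparisons working. It is shown here only as an option, not as the fix DiffSync adopted.

```python
import enum


class PlainAction(enum.Enum):
    """Stand-in for an action enum that does not subclass str."""

    CREATE = "create"
    DELETE = "delete"


class StrAction(str, enum.Enum):
    """Stand-in enum whose members are also str instances."""

    CREATE = "create"
    DELETE = "delete"


def is_create(action) -> bool:
    """Code written against the old, string-based API."""
    return action == "create"


print(is_create("create"))            # True  (old behavior)
print(is_create(PlainAction.CREATE))  # False (a plain Enum member never equals a str)
print(is_create(StrAction.CREATE))    # True  (the str mixin preserves equality)
```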

In the short term it may be simplest just to revert the entirety of #90 and cut a new DiffSync release.

Expected Behavior

The API should remain stable across minor and patch releases.

Steps to Reproduce

  1. See for example networktocode/network-importer#256

Detailed logging via configurable API

Any given DSync model should be able to generate log messages through a generic API, without knowing or caring whether those logs go to stdout, become a set of NetBox database records, etc. The target of this API needs to be configurable.

This is needed for the SOT Sync project.
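Here is a minimal sketch of one possible design, using only the standard library `logging` module. The model logs through a named logger and never knows where records go; the application picks the targets by attaching handlers. `ListHandler` (a stand-in for a handler that would write NetBox records) and the `Device` model are hypothetical.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects formatted records in memory; a stand-in for e.g. a database writer."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(self.format(record))


class Device:
    """Hypothetical model that logs only through the generic logging API."""

    logger = logging.getLogger("dsync.models")

    def __init__(self, name: str) -> None:
        self.name = name

    def create(self) -> None:
        self.logger.info("Created device %s", self.name)


# The application, not the model, chooses the target(s).
collector = ListHandler()
logger = logging.getLogger("dsync.models")
logger.setLevel(logging.INFO)
logger.addHandler(collector)
logger.addHandler(logging.StreamHandler())  # also echo to stderr

Device("rtr-01").create()
print(collector.records)  # ['Created device rtr-01']
```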

Rename master branch to main

Environment

  • DiffSync version: latest

Proposed Functionality

Rename master branch to main

Use Case

Align with other NTC repositories

Support current structlog major version.

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Currently the pyproject.toml file restricts structlog to major version 20:
structlog = "^20.1.0"
The current major version is 21.

Use Case

My project depends on another package that requires structlog 21.0.0 or later. When diffsync is added as a requirement, pip installs a much older version of that other package, one whose minimum structlog version was still 20, so features I need are not available.
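For context, Poetry's caret constraint allows updates that do not change the left-most non-zero version component. So `^20.1.0` means `>=20.1.0,<21.0.0`, which excludes every 21.x release. The hypothetical helper below only handles full MAJOR.MINOR.PATCH versions and is not Poetry's resolver. It shows why structlog 21.0.0 is rejected:

```python
from typing import Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Parse a MAJOR.MINOR.PATCH string; anything else raises ValueError."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return the inclusive lower and exclusive upper bound of a caret constraint."""
    lower = parse(constraint.lstrip("^"))
    # Bump the left-most non-zero component (the last one if all are zero).
    idx = next((i for i, part in enumerate(lower) if part != 0), len(lower) - 1)
    upper = lower[:idx] + (lower[idx] + 1,) + (0,) * (len(lower) - idx - 1)
    return lower, upper


def satisfies(constraint: str, version: str) -> bool:
    lower, upper = caret_bounds(constraint)
    return lower <= parse(version) < upper


print(caret_bounds("^20.1.0"))         # ((20, 1, 0), (21, 0, 0))
print(satisfies("^20.1.0", "20.2.0"))  # True
print(satisfies("^20.1.0", "21.0.0"))  # False
```

Widening the structlog constraint in pyproject.toml so that it also admits 21.x would let both packages resolve together.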

Support for Diff subclasses in diff generation

A DSync subclass or instance thereof should be able to specify a preferred Diff subclass and have its diff_from/diff_to APIs automatically return an instance of this subclass instead of the base Diff class.

This is needed for the SOT Sync project to allow the creation of Diff subclass instances that are serializable to the NetBox ORM.
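One way to express this is sketched below with minimal stand-in classes rather than DiffSync's real ones. The adapter carries a class attribute naming its preferred diff class, and `diff_from`/`diff_to` instantiate that class unless the caller overrides it. The `diff_class` attribute and the `Adapter`, `NetBoxAdapter` and `NetBoxDiff` names are hypothetical.

```python
from typing import Dict, Optional, Type


class Diff:
    """Minimal stand-in for the base Diff class."""

    def __init__(self) -> None:
        self.changes: Dict[str, str] = {}


class NetBoxDiff(Diff):
    """Hypothetical subclass that can serialize itself for the NetBox ORM."""

    def serialize(self) -> Dict[str, str]:
        return dict(self.changes)


class Adapter:
    diff_class: Type[Diff] = Diff  # subclasses override this to change the default

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data

    def diff_from(self, source: "Adapter", diff_class: Optional[Type[Diff]] = None) -> Diff:
        diff = (diff_class or self.diff_class)()
        # Simplistic comparison: record source values that differ from ours.
        for key, value in source.data.items():
            if self.data.get(key) != value:
                diff.changes[key] = value
        return diff

    def diff_to(self, target: "Adapter", diff_class: Optional[Type[Diff]] = None) -> Diff:
        # The adapter whose method is called decides the diff class.
        return target.diff_from(self, diff_class=diff_class or self.diff_class)


class NetBoxAdapter(Adapter):
    diff_class = NetBoxDiff


local = Adapter({"rtr-01": "up"})
netbox = NetBoxAdapter({"rtr-01": "down"})
diff = netbox.diff_from(local)
print(type(diff).__name__, diff.serialize())  # NetBoxDiff {'rtr-01': 'up'}
```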

Pydantic and structlog are not installed by default

Environment

  • DiffSync version: 1.4.0
  • Python version: all

Observed Behavior

When installing DiffSync from pip, some mandatory dependencies, such as pydantic and structlog, are not installed automatically.

Expected Behavior

All mandatory dependencies should be installed by default

ModuleNotFoundError: No module named 'packaging'

Environment

  • DiffSync version: 1.4.1
  • Python version: 3.9.10

Observed Behaviour

......
  File "/Users/x/pwsync/.venv/lib/python3.9/site-packages/pwsync/sync.py", line 10, in <module>
    from diffsync.logging import enable_console_logging
  File "/Users/x/pwsync/.venv/lib/python3.9/site-packages/diffsync/logging.py", line 22, in <module>
    from packaging import version
ModuleNotFoundError: No module named 'packaging'

Expected Behaviour

My script should not need to depend on packaging directly; it should be installed as a transitive dependency of the diffsync package.

Steps to Reproduce

The diffsync package does not seem to declare a dependency on the packaging module.
The error is triggered by this import:

from diffsync.logging import enable_console_logging
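The standard library is enough to check which of these modules are missing from an environment and what an installed distribution actually declares. The helper names below are hypothetical, while `importlib.util.find_spec` and `importlib.metadata.requires` (Python 3.8+) are real APIs:

```python
import importlib.metadata
import importlib.util
import re
from typing import List, Optional


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def declared_dependencies(distribution: str) -> Optional[List[str]]:
    """Return requirement names declared by an installed distribution, or None if it is not installed."""
    try:
        requirements = importlib.metadata.requires(distribution) or []
    except importlib.metadata.PackageNotFoundError:
        return None
    names = []
    for req in requirements:
        # Keep only the project name from strings such as "packaging (>=21.3)".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match:
            names.append(match.group(0).lower())
    return names


print(missing_modules(["packaging", "pydantic", "structlog"]))
print(declared_dependencies("diffsync"))  # None if diffsync is not installed
```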

Extend `sync_from` and `sync_to` to accept an existing diff

Environment

  • DiffSync version: 1.3.0

Proposed Functionality

Currently, calling diff_to/diff_from followed by sync_to/sync_from calculates the diff twice, because sync_to/sync_from always compute a new diff internally.
The proposal is to extend sync_from and sync_to to accept an existing diff.

    self.log_info(message="Loading current data from Data Source...")
    diffsync1 = DataSourceDiffSync(job=self, sync=self.sync)
    diffsync1.load()

    self.log_info(message="Loading current data from Nautobot...")
    diffsync2 = NautobotDiffSync(job=self, sync=self.sync)
    diffsync2.load()

    diffsync_flags = DiffSyncFlags.CONTINUE_ON_FAILURE

    self.log_info(message="Calculating diffs...")
    diff = diffsync1.diff_to(diffsync2, flags=diffsync_flags)

    if not self.kwargs["dry_run"]:
        self.log_info(message="Syncing from Data Source to Nautobot...")
        diffsync1.sync_to(diffsync2, flags=diffsync_flags, diff=diff)  # proposed: reuse the existing diff
        self.log_info(message="Sync complete")

Use Case

Performance: there is no need to calculate the diff twice for the same datasets.
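The proposed behavior can be sketched with a minimal stand-in adapter (not DiffSync's code). `sync_to` accepts an optional, hypothetical `diff` argument and only computes a new diff when none is supplied. A counter shows that the diff is calculated only once.

```python
from typing import Dict, Optional


class Adapter:
    """Stand-in adapter holding name -> value records."""

    diff_calls = 0  # counts how often a diff is computed, for illustration

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def diff_to(self, target: "Adapter") -> Dict[str, str]:
        Adapter.diff_calls += 1
        return {k: v for k, v in self.data.items() if target.data.get(k) != v}

    def sync_to(self, target: "Adapter", diff: Optional[Dict[str, str]] = None) -> None:
        if diff is None:  # only compute a diff when the caller did not pass one
            diff = self.diff_to(target)
        target.data.update(diff)


source = Adapter({"rtr-01": "up", "rtr-02": "up"})
target = Adapter({"rtr-01": "down"})

diff = source.diff_to(target)  # e.g. reviewed or logged during a dry run
source.sync_to(target, diff=diff)
print(Adapter.diff_calls, target.data)  # 1 {'rtr-01': 'up', 'rtr-02': 'up'}
```

One caveat of this design: if either dataset changes between the diff and the sync, the supplied diff is stale.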

Flag objects to be ignored during a diff

Environment

  • DSync version: master

Proposed Functionality

Add a flag per DSyncModel object to indicate that a specific object should be ignored during the diff/sync.

Use Case

We currently have a situation in network-importer where the NetBox adapter loads some cables from NetBox that, for various reasons, should be ignored because they can't be touched.
Currently there is no way to handle this: if the objects exist in the network adapter but are removed from the NetBox adapter, they will be flagged as MISSING and the sync will try to create them.
Adding a per-object flag indicating that the object should be ignored altogether would solve this, and I can imagine other use cases where explicitly ignoring an object would be useful.
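Here is a sketch of a per-object ignore flag, using stand-in classes. Each record carries an `enum.Flag` value, and the diff skips any object flagged IGNORE on either side, so such an object is neither created nor deleted. `ModelFlags`, `Cable` and `diff_names` are hypothetical names.

```python
import enum
from dataclasses import dataclass
from typing import List, Tuple


class ModelFlags(enum.Flag):
    NONE = 0
    IGNORE = enum.auto()


@dataclass
class Cable:
    name: str
    flags: ModelFlags = ModelFlags.NONE


def diff_names(source: List[Cable], target: List[Cable]) -> Tuple[List[str], List[str]]:
    """Return (to_create, to_delete), skipping objects flagged IGNORE on either side."""
    src = {c.name for c in source}
    dst = {c.name for c in target}
    ignored = {c.name for c in (*source, *target) if ModelFlags.IGNORE in c.flags}
    to_create = sorted(n for n in src - dst if n not in ignored)
    to_delete = sorted(n for n in dst - src if n not in ignored)
    return to_create, to_delete


network = [Cable("c1"), Cable("c2")]
netbox = [Cable("c1"), Cable("c3", ModelFlags.IGNORE)]  # c3 must not be touched

print(diff_names(network, netbox))                        # (['c2'], [])
print(diff_names(network, [Cable("c1"), Cable("c3")]))  # (['c2'], ['c3'])
```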

Control the order in which objects are created/updated/deleted during a sync

Environment

  • DSync version: master

Proposed Functionality

Allow users to control the order in which objects are created/updated/deleted during a sync.
This logic could differ per object type (device, interface, etc.).

Use Case

In some cases the order in which objects are created on a remote system matters, because one object can depend on another.
For example, given a list of interfaces containing a LAG interface and two LAG members, we need to ensure that the LAG interface gets created first but deleted last.
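One way to express such ordering is sketched below with hypothetical names. Each object type gets a priority; creates and updates run in ascending priority (parents first) and deletes in descending priority (children first). The `PRIORITY` map assumes LAG interfaces must exist before their members.

```python
from typing import Dict, List, Tuple

# Lower number = must exist earlier. Hypothetical per-type priorities.
PRIORITY: Dict[str, int] = {"device": 0, "lag": 1, "interface": 2}

Change = Tuple[str, str, str]  # (action, object type, name)


def order_changes(changes: List[Change]) -> List[Change]:
    """Apply creates/updates parents-first, then deletes children-first."""
    upserts = [c for c in changes if c[0] in ("create", "update")]
    deletes = [c for c in changes if c[0] == "delete"]
    # list.sort is stable, also with reverse=True, so equal priorities keep their order.
    upserts.sort(key=lambda c: PRIORITY[c[1]])
    deletes.sort(key=lambda c: PRIORITY[c[1]], reverse=True)
    return upserts + deletes


changes = [
    ("create", "interface", "eth1"),
    ("delete", "lag", "po2"),
    ("create", "lag", "po1"),
    ("delete", "interface", "eth3"),
    ("create", "interface", "eth2"),
    ("delete", "interface", "eth4"),
]
print([name for _, _, name in order_changes(changes)])
# ['po1', 'eth1', 'eth2', 'eth3', 'eth4', 'po2']
```

Running all creates and updates before any deletes is itself an assumption; some systems need deletes first, for example to free a name before it is reused.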
