
cognite-replicator's Issues

Dataset Awareness

We have discovered that when using Cognite Replicator, our source project has data set IDs set, but in the target project all data set IDs are null. Please add support in the replicator for replicating data sets and setting data set IDs on all entities.

The workaround we will be operating on in the meantime is to run a script afterwards that copies the data sets and sets the data set IDs, along the lines of the sketch below.
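A minimal sketch of such a post-replication script, assuming the Cognite Python SDK is used, data sets are matched on external ID, and only assets are re-pointed (other resource types would follow the same pattern); function and variable names here are illustrative, not part of the replicator:

from cognite.client import CogniteClient
from cognite.client.data_classes import AssetUpdate, DataSet

def copy_data_set_ids(source: CogniteClient, target: CogniteClient) -> None:
    # Recreate missing data sets in the target project, matched on external_id.
    source_data_sets = source.data_sets.list(limit=None)
    for src_ds in source_data_sets:
        if src_ds.external_id and target.data_sets.retrieve(external_id=src_ds.external_id) is None:
            target.data_sets.create(DataSet(external_id=src_ds.external_id, name=src_ds.name))

    # Re-point replicated assets at the corresponding target data set.
    src_id_to_xid = {ds.id: ds.external_id for ds in source_data_sets}
    for src_asset in source.assets.list(limit=None):
        src_ds_xid = src_id_to_xid.get(src_asset.data_set_id)
        if not (src_ds_xid and src_asset.external_id):
            continue
        tgt_ds = target.data_sets.retrieve(external_id=src_ds_xid)
        if tgt_ds is not None:
            target.assets.update(AssetUpdate(external_id=src_asset.external_id).data_set_id.set(tgt_ds.id))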

datapoints.replicate fails to replicate datapoints from certain time series

I have tried to use the datapoints.replicate function as shown below:

import datetime

from cognite.replicator import datapoints

# source_client and target_client are authenticated CogniteClient instances
datapoints.replicate(
    source_client,
    target_client,
    external_ids=["NO1_day_ahead_price_2022-07-13T08:58:28"],
    start=datetime.datetime(2000, 1, 1, 1).timestamp() * 1000,
    end=datetime.datetime(2040, 1, 1, 1).timestamp() * 1000,
)

Before running the function I ensured that a time series with the given external ID existed in both the source and target projects. However, after running the replication, none of the datapoints from the time series in the source project had been replicated to the target project. Moreover, datapoints.replicate() did not return any error and simply finished as normal. What could be causing this issue?

Note! I have only experienced the missing datapoint replication with a subset of time series. The only link I can find between these time series is that they all have "Is Step" set to "True", and that they have datapoints with timestamps in the future.
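A quick way to narrow this down is to query the source series directly over the same window and inspect the is_step flag; the sketch below uses cognite-sdk v7-style calls and assumes source_client is the same client passed to the replicator:

import datetime

xid = "NO1_day_ahead_price_2022-07-13T08:58:28"
start = int(datetime.datetime(2000, 1, 1, 1).timestamp() * 1000)
end = int(datetime.datetime(2040, 1, 1, 1).timestamp() * 1000)

ts = source_client.time_series.retrieve(external_id=xid)
print("is_step:", ts.is_step)

# If this returns datapoints but the replicator copies nothing, the problem is in the
# replicator's handling of the series rather than in the source data itself.
dps = source_client.time_series.data.retrieve(external_id=xid, start=start, end=end, limit=100)
print("datapoints visible in source:", len(dps))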

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • Update dependency cognite-sdk to v7.54.11
  • Update codecov/codecov-action action to v4
  • Update dependency protobuf to v5
  • Update dependency pytest to v8
  • Update dependency pytest-cov to v5
  • Update dependency pytz to v2024
  • Update dependency sphinx to v7
  • Update dependency sphinx-rtd-theme to v2
  • Update dependency tox to v4
  • Update dependency twine to v5
  • Update docker/build-push-action action to v6
  • Update docker/login-action action to v3
  • Update docker/setup-buildx-action action to v3
  • Create all rate-limited PRs at once

Edited/Blocked

These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile
github-actions
.github/workflows/cd.yml
  • actions/checkout v3
  • docker/setup-buildx-action v2
  • docker/login-action v2
  • docker/build-push-action v3
.github/workflows/ci.yml
  • actions/checkout v3
  • actions/setup-python v4
  • cognitedata/lint-action v1.6.0
  • actions/checkout v3
  • actions/setup-python v4
  • codecov/codecov-action v3
.github/workflows/python-publish.yml
  • actions/checkout v3
  • actions/setup-python v4
pep621
pyproject.toml
  • poetry >=0.12
poetry
pyproject.toml
  • cognite-sdk ^7.13.8
  • google-cloud-logging ^1.12
  • python ^3.11
  • pyyaml ^6.0.1
  • protobuf ^4.0.0
  • black ^22.8
  • isort ^4.3
  • pre-commit ^1.18
  • pytest ^6.2.5
  • pytest-cov ^2.7.1
  • pytest-mock ^1.11.2
  • sphinx ^2.4.4
  • sphinx-rtd-theme ^0.4.3
  • toml ^0.10.0
  • tox ^3.14
  • tox-pyenv ^1.1
  • twine ^3.1.1
  • pytz *

  • Check this box to trigger a request for Renovate to run again on this repository

Continuous deployment pipeline

  • Publish package to PyPI
  • Publish docker image to docker hub
  • Publish code coverage
  • Publish documentation on readthedocs

Add Support For Mapping of Annotations

Annotations are stored in Cognite as events of type cognite_annotation. We have seen ID values in the metadata fields CDF_ANNOTATION_resource_id and CDF_ANNOTATION_file_id. We depend on these IDs being correct to show the list of files applicable to an asset or other entities. When these events are replicated to the target project, the IDs are not updated to the target IDs.

Please add support for contextualization annotations in the replicator. Thanks!
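A rough sketch of the remapping being asked for, run against the target project after replication; it assumes the referenced assets were replicated with the same external IDs (CDF_ANNOTATION_file_id would be handled the same way via the files API), and the function name is illustrative:

from cognite.client import CogniteClient
from cognite.client.data_classes import EventUpdate

def remap_annotation_asset_ids(source: CogniteClient, target: CogniteClient) -> None:
    for event in target.events.list(type="cognite_annotation", limit=None):
        src_asset_id = (event.metadata or {}).get("CDF_ANNOTATION_resource_id")
        if not src_asset_id:
            continue
        src_asset = source.assets.retrieve(id=int(src_asset_id))
        if src_asset is None or src_asset.external_id is None:
            continue
        tgt_asset = target.assets.retrieve(external_id=src_asset.external_id)
        if tgt_asset is None:
            continue
        # Overwrite the stale source ID with the corresponding target ID.
        target.events.update(
            EventUpdate(id=event.id).metadata.add({"CDF_ANNOTATION_resource_id": str(tgt_asset.id)})
        )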

pip install fails

pip install cognite-replicator
Collecting cognite-replicator
  Downloading cognite_replicator-1.2.6-py3-none-any.whl (45 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 45.9/45.9 kB 1.2 MB/s eta 0:00:00
Collecting cognite-sdk<6.0.0,>=5.4.4 (from cognite-replicator)
  Downloading cognite_sdk-5.12.0-py3-none-any.whl (291 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 291.7/291.7 kB 4.3 MB/s eta 0:00:00
Collecting google-cloud-logging<2.0,>=1.12 (from cognite-replicator)
  Downloading google_cloud_logging-1.15.3-py2.py3-none-any.whl (141 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 141.6/141.6 kB 5.8 MB/s eta 0:00:00
Requirement already satisfied: protobuf<5.0.0,>=4.0.0 in /Users/[email protected]/Library/Caches/pypoetry/virtualenvs/file-extractor-function-bVgch4Fw-py3.11/lib/python3.11/site-packages (from cognite-replicator) (4.24.3)
Collecting pyyaml<6.0.0,>=5.1.0 (from cognite-replicator)
  Downloading PyYAML-5.4.1.tar.gz (175 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 175.1/175.1 kB 4.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [68 lines of output]
      /private/var/folders/f_/n9ywg_t948gg6y4_j7dm3n400000gn/T/pip-build-env-eijdcqyg/overlay/lib/python3.11/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`

Datapoints replication produces Connection Error

Dear SA,

Please see the screenshot for the issue. Datapoints get replicated for only a small handful of time series (roughly 30 out of 300, by my estimate). I am trying to replicate datapoints from publicdata to my personal tenant in greenfield. Replication of assets and time series was successful, with proper linking between time series and assets.

What I have tried: playing with batch_size. It produces the same warning/error.

This could be a wider problem with our API endpoints.

[Screenshot: Screen Shot 2020-07-30 at 10 46 51 AM]
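As a stop-gap, the run can be split per time series and retried with a smaller batch_size (external_ids and batch_size appear elsewhere in these issues; everything else below, such as the list of external IDs and the client variables, is illustrative):

from cognite.replicator import datapoints

failed = []
for xid in external_ids_to_copy:  # hypothetical list of the ~300 replicated time series
    try:
        datapoints.replicate(source_client, target_client, external_ids=[xid], batch_size=1000)
    except Exception as exc:  # e.g. the connection errors shown in the screenshot
        failed.append((xid, exc))

print(f"{len(failed)} time series still failing")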

Not running with CDF function

We are trying to run this code as a CDF Function, but it neither runs nor gives any error.
Any idea how to make it run through a CDF Function?
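A minimal sketch of how the replicator could be wrapped in a CDF Function handler, assuming the package is listed in the Function's requirements, credentials for the source project are supplied via secrets, and the module-level replicate functions are used; all key names and parameters below are illustrative:

from cognite.client import ClientConfig, CogniteClient
from cognite.client.credentials import OAuthClientCredentials
from cognite.replicator import assets, time_series

def handle(client, data, secrets):
    # `client` is the target-project client injected by CDF Functions.
    source_client = CogniteClient(
        ClientConfig(
            client_name="cognite-replicator-function",
            project=data["source_project"],
            base_url=data["source_base_url"],
            credentials=OAuthClientCredentials(
                token_url=secrets["token-url"],
                client_id=secrets["client-id"],
                client_secret=secrets["client-secret"],
                scopes=[data["source_base_url"] + "/.default"],
            ),
        )
    )
    assets.replicate(source_client, client)
    time_series.replicate(source_client, client)
    # Returning a value makes it easy to see in the Function call history that the
    # run actually finished, since log output is otherwise easy to miss.
    return {"status": "done"}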

Add support for relationships

Relationships are currently not in the scope of the replicator. Some use cases have relationships, and it would be great to have the ability to replicate them as well.
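Until this lands, one workaround is to copy relationships directly with the SDK; the sketch below assumes the resources they reference were already replicated with the same external IDs (the function name is illustrative):

from cognite.client import CogniteClient
from cognite.client.data_classes import Relationship

def copy_relationships(source: CogniteClient, target: CogniteClient) -> None:
    existing = {r.external_id for r in target.relationships.list(limit=None)}
    to_create = [
        Relationship(
            external_id=rel.external_id,
            source_external_id=rel.source_external_id,
            source_type=rel.source_type,
            target_external_id=rel.target_external_id,
            target_type=rel.target_type,
            confidence=rel.confidence,
        )
        for rel in source.relationships.list(limit=None)
        if rel.external_id not in existing
    ]
    if to_create:
        target.relationships.create(to_create)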

Should test against supported python versions

Currently, tests are only run against Python 3.7. The readme says you support 3.6, so you should run your tests against that version as well. In the SDK we solve this using tox, so you can take a look there.

Replication fails for timeseries that already exists

This is maybe not a bug, but we need to see what we can do about it.

Replication fails for time series that already existed in the tenant.

For example, time series A already exists in the tenant, but does not have the metadata fields
_replicatedTime
_replicatedSource
_replicatedInternalId

If we now set up replication for time series A, the replication fails because of a duplicate:

Duplicated: [{'legacyName': 'xxxxxxxxx-LDB_P'}, {'legacyName': 'xxxxxxxX.Value'}, {'legacyName': 'OilSample_xxxxB_K'}
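One possible workaround, sketched below, is to pre-tag the pre-existing target time series with the replicator's metadata fields so they look like replicated copies instead of colliding duplicates; matching on external ID and the exact field values are assumptions, not verified behaviour of the replicator:

import time

from cognite.client import CogniteClient
from cognite.client.data_classes import TimeSeriesUpdate

def tag_preexisting(source: CogniteClient, target: CogniteClient, external_ids: list[str]) -> None:
    for xid in external_ids:
        src_ts = source.time_series.retrieve(external_id=xid)
        tgt_ts = target.time_series.retrieve(external_id=xid)
        if src_ts is None or tgt_ts is None:
            continue
        target.time_series.update(
            TimeSeriesUpdate(id=tgt_ts.id).metadata.add(
                {
                    "_replicatedTime": str(int(time.time() * 1000)),
                    "_replicatedSource": source.config.project,
                    "_replicatedInternalId": str(src_ts.id),
                }
            )
        )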

Unnecessary batching of create time series/events?

I see that time series/events are split into batches of 10,000 and posted in parallel. The SDK already does this, so it shouldn't be necessary to do it here as well.

Also, the batch_size passed to replicate() is not respected. A dynamic batch size is calculated based on the number of threads allocated.
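For reference, a single SDK call is all the batching that should be needed, since the client chunks large payloads into request-sized pieces internally (variable names here are illustrative):

created = target_client.events.create(all_new_events)  # the SDK splits this list into API-limit-sized requests itself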

Prepare repo for open source

Tasks needed to complete before we are ready to open source the repo.

  • Add CHANGELOG.md
  • Add CONTRIBUTING.md
  • Add Code of Conduct
  • Prepare to publish to PyPI in pyproject.toml
  • Setup githooks for black and isort
  • Setup CI pipeline for running unittests (or run them on githooks?)
  • Create CLI commands for running the replication
  • Fix logging

Replicating Events without Assets causes Exception

Cause: replication.py:get_asset_ids may return an empty list
Possible solutions:

  • Filter out events without asset ids
  • Create the event with assetIds = None (see the sketch after this list)
  • Replicate required assets
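A small sketch of the "assetIds = None" option, assuming the replicator keeps a source-to-target asset ID mapping; names here are illustrative rather than the actual replicator internals:

from cognite.client.data_classes import Event

def copy_event(src_event: Event, src_dst_asset_ids: dict) -> Event:
    mapped_ids = [src_dst_asset_ids[i] for i in (src_event.asset_ids or []) if i in src_dst_asset_ids]
    return Event(
        external_id=src_event.external_id,
        type=src_event.type,
        subtype=src_event.subtype,
        start_time=src_event.start_time,
        end_time=src_event.end_time,
        description=src_event.description,
        metadata=src_event.metadata,
        asset_ids=mapped_ids or None,  # an empty list becomes None so the API does not reject the event
    )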

[Screenshot: Screenshot 2019-09-18 at 14 53 21]

Sample API interaction:
With empty assetIds:
[Screenshot: Screenshot 2019-09-18 at 14 56 24]

With no assetIds key:
[Screenshot: Screenshot 2019-09-18 at 14 58 19]

Replication fails if timeseries is not found

The replication fails if one of the time series listed in the YAML file is not found in the source tenant.

To reproduce the issue:

  • Add a time series that does not exist to the yml config file
  • Run replication

Output:

2020-01-30 14:01:35,281 cognite-sdk DEBUG - HTTP Error 400 POST https://api.cognitedata.com/api/v1/projects/akerbp/timeseries/byids: timeseries ids not found: (id: null | externalId: VALI_23-PT-92532:X.Value)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/cognite/client/utils/_concurrency.py", line 127, in execute_tasks_concurrently
    res = f.result()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 111, in _post
    "POST", url_path, json=json, headers=headers, params=params, timeout=self._config.timeout
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 139, in _do_request
    self._raise_API_error(res, payload=json_payload)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 650, in _raise_API_error
    raise CogniteAPIError(msg, code, x_request_id, missing=missing, duplicated=duplicated, extra=extra)
cognite.client.exceptions.CogniteAPIError: timeseries ids not found: (id: null | externalId: VALI_23-PT-92532:X.Value) | code: 400 | X-Request-ID: ba8b983a-ab3c-92c8-927b-b507785bd232
Missing: [{'externalId': 'VALI_23-PT-92532:X.Value'}]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/__main__.py", line 177, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/__main__.py", line 146, in main
    exclude_pattern=config.get("timeseries_exclude_pattern"),
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/time_series.py", line 161, in replicate
    ts_src = client_src.time_series.retrieve_multiple(external_ids=target_external_ids)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api/time_series.py", line 142, in retrieve_multiple
    ids=ids, external_ids=external_ids, ignore_unknown_ids=ignore_unknown_ids, wrap_ids=True
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 259, in _retrieve_multiple
    utils._concurrency.collect_exc_info_and_raise(tasks_summary.exceptions)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/utils/_concurrency.py", line 102, in collect_exc_info_and_raise
    ) from missing_exc
cognite.client.exceptions.CogniteNotFoundError: Not found: [{'externalId': 'VALI_23-PT-92532:X.Value'}]
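The likely fix, sketched here, is to pass ignore_unknown_ids=True in the retrieve_multiple call shown in the traceback, so missing external IDs are skipped (and can be logged) instead of raising CogniteNotFoundError:

ts_src = client_src.time_series.retrieve_multiple(
    external_ids=target_external_ids,
    ignore_unknown_ids=True,
)
found = {ts.external_id for ts in ts_src}
missing = [xid for xid in target_external_ids if xid not in found]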

TypeError: copy_events() takes 6 positional arguments but 7 were given

Attempting to copy events fails with the following error:

TypeError: copy_events() takes 6 positional arguments but 7 were given
  File "/Users/viet/.pyenv/versions/3.8.3/lib/python3.8/threading.py", line 870, in run
    self.run()
    self._target(*self._args, **self._kwargs)

Environment:
[tool.poetry.dependencies]
python = "^3.8"
cognite-replicator = "^0.8.1"

Bug when no datapoint start date

Datapoints replication fails when there is no start date and no datapoints already exist in the time series.
The code currently takes the timestamp of the latest datapoint of the time series, if there is one; if not, it takes the datapoint start parameter.
If both are missing, the replication fails.
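A sketch of the missing fallback, assuming the latest datapoint is looked up in the time series being written to and that epoch 0 is an acceptable default, using a cognite-sdk v7-style call; names are illustrative rather than the replicator's actual internals:

def resolve_start(client, external_id, configured_start=None, default_start=0):
    latest = client.time_series.data.retrieve_latest(external_id=external_id)
    if latest is not None and len(latest) > 0:
        return latest[0].timestamp + 1  # resume just after the last replicated datapoint
    if configured_start is not None:
        return configured_start
    return default_start  # neither a datapoint nor a configured start exists, so fall back to epoch 0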
