cognitedata / cognite-replicator
A package of scripts for replicating data between CDF tenants
Home Page: https://cognite-cognite-replicator.readthedocs-hosted.com/
License: Apache License 2.0
We have discovered that when using Cognite Replicator, our source project has dataset IDs set, but in the target project all dataset IDs are null. Please add support to the replicator for replicating datasets and setting dataset IDs on all entities.
Our workaround in the meantime is to run a script afterward that copies the datasets and sets the dataset IDs.
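The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: plain dictionaries with `id` and `external_id` keys stand in for dataset objects, datasets are matched between projects by external ID, and both helper names are hypothetical rather than part of the replicator.

```python
def build_data_set_id_map(source_data_sets, target_data_sets):
    """Map source dataset IDs to target dataset IDs via shared external IDs."""
    target_by_external_id = {ds["external_id"]: ds["id"] for ds in target_data_sets}
    return {
        ds["id"]: target_by_external_id[ds["external_id"]]
        for ds in source_data_sets
        if ds["external_id"] in target_by_external_id
    }


def remap_data_set_id(source_data_set_id, id_map):
    """Return the target dataset ID, or None when there is no counterpart."""
    if source_data_set_id is None:
        return None
    return id_map.get(source_data_set_id)
```

Entities whose source dataset has no counterpart in the target keep a null dataset ID, so they are easy to find and fix afterward.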
I have tried to use the datapoints.replicate function as shown below:
datapoints.replicate(
    source_client,
    target_client,
    external_ids=["NO1_day_ahead_price_2022-07-13T08:58:28"],
    start=datetime.datetime(2000,1,1,1).timestamp()*1000,
    end=datetime.datetime(2040,1,1,1).timestamp()*1000,
)
Before running the function I ensured that a time-series with the given external id existed in both the source and target project. However, after running the replication, none of the datapoints from the time-series in the source project had been replicated to the target project. Moreover, datapoints.replicate() did not return any error and simply finished as normal. What can be causing this issue?
Note! I have only experienced the missing datapoints replication with a subset of time-series. The only link I can find between these time series is that they all have "Is Step" set to "True", and that they have datapoints with timestamps into the future.
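As a side note on the `start` and `end` arguments in the call above: `timestamp()*1000` yields a float, and a naive `datetime` is interpreted in local time. The sketch below shows one way to build explicit UTC millisecond integers instead. It is not a diagnosis of the missing datapoints; nothing in the report establishes that the argument type plays a role. The helper name `to_epoch_ms` is hypothetical.

```python
import datetime


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp() * 1000)


# Same wall-clock values as in the report, but pinned to UTC (an assumption).
start = to_epoch_ms(datetime.datetime(2000, 1, 1, 1, tzinfo=datetime.timezone.utc))
end = to_epoch_ms(datetime.datetime(2040, 1, 1, 1, tzinfo=datetime.timezone.utc))
```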
@jorge-sanchez-2020
@gaetan-h
@petreeb
Hey, FYI, the replication.thread function is shared by TimeSeries, File, and Events. If you change the signature parameters here, you NEED to change them in timeseries.py, files.py, and events.py as well.
They were only changed in timeseries.py. File and Event replication is currently broken.
5380a21
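A small guard like the one below could catch this kind of mismatch in a unit test before release. It is a sketch only: the two copy functions are hypothetical stand-ins with made-up parameters, not the replicator's real signatures. It relies on the standard library's `inspect.signature(...).bind`, which raises `TypeError` when the arguments do not fit.

```python
import inspect


def accepts_arguments(func, *args, **kwargs):
    """Return True if func could be called with these arguments, without calling it."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


# Hypothetical stand-ins: one copy function was updated, the other was not.
def copy_updated(objects, id_map, new_parameter):
    return objects


def copy_stale(objects, id_map):
    return objects
```

A test that runs `accepts_arguments` over every copy function with the argument list the shared thread helper passes would fail as soon as one module lags behind.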
Cognite replicator is not compatible with cognite-sdk >= 6.0. The main change of note is that datapoints now live under client.time_series.data instead of client.datapoints.
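One way to tolerate both layouts is to look up the datapoints API at runtime. The helper below is a minimal sketch, assuming only the two attribute paths named in the issue; `datapoints_api` is a hypothetical name, and it does not cover other changes between SDK versions.

```python
def datapoints_api(client):
    """Return the datapoints API object from whichever location the client exposes."""
    time_series_api = getattr(client, "time_series", None)
    data_api = getattr(time_series_api, "data", None)
    if data_api is not None:
        # Prefer the newer location (client.time_series.data) when it exists.
        return data_api
    # Otherwise fall back to the location older code relied on.
    return client.datapoints
```

Call sites would then use `datapoints_api(client)` instead of reaching for `client.datapoints` directly.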
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These updates are currently rate-limited. Click on a checkbox below to force their creation now.
These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
Dockerfile
.github/workflows/cd.yml
    actions/checkout v3
    docker/setup-buildx-action v2
    docker/login-action v2
    docker/build-push-action v3
.github/workflows/ci.yml
    actions/checkout v3
    actions/setup-python v4
    cognitedata/lint-action v1.6.0
    actions/checkout v3
    actions/setup-python v4
    codecov/codecov-action v3
.github/workflows/python-publish.yml
    actions/checkout v3
    actions/setup-python v4
pyproject.toml
    poetry >=0.12
pyproject.toml
    cognite-sdk ^7.13.8
    google-cloud-logging ^1.12
    python ^3.11
    pyyaml ^6.0.1
    protobuf ^4.0.0
    black ^22.8
    isort ^4.3
    pre-commit ^1.18
    pytest ^6.2.5
    pytest-cov ^2.7.1
    pytest-mock ^1.11.2
    sphinx ^2.4.4
    sphinx-rtd-theme ^0.4.3
    toml ^0.10.0
    tox ^3.14
    tox-pyenv ^1.1
    twine ^3.1.1
    pytz *
This should not cause an error; check whether the replicator can be kept from failing in this case.
Try running the replicator with 2 identical time series to copy and see whether the issue can be avoided.
It should probably be tested locally first, then pushed to development and then to production if all is OK.
Annotations are stored in Cognite as events of type cognite_annotation. We have seen that the metadata fields CDF_ANNOTATION_resource_id and CDF_ANNOTATION_file_id sometimes contain ID values. We depend on these IDs being correct to show the list of files applicable to an asset or other entities. When these events are replicated to the target project, the IDs are not updated to the corresponding target IDs.
Please add support for contextualization annotations in the replicator. Thanks!
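The requested behaviour could look roughly like the sketch below. It assumes that the metadata values are numeric strings and that a single dictionary mapping source IDs to target IDs is available; both are assumptions, and `remap_annotation_metadata` is a hypothetical helper, not an existing replicator function.

```python
ANNOTATION_ID_FIELDS = ("CDF_ANNOTATION_resource_id", "CDF_ANNOTATION_file_id")


def remap_annotation_metadata(metadata, source_to_target_id):
    """Return a copy of the metadata with known source IDs replaced by target IDs."""
    remapped = dict(metadata)
    for field in ANNOTATION_ID_FIELDS:
        value = remapped.get(field)
        if value is None:
            continue
        try:
            source_id = int(value)
        except ValueError:
            continue  # Not a numeric ID; leave the value untouched.
        target_id = source_to_target_id.get(source_id)
        if target_id is not None:
            remapped[field] = str(target_id)
    return remapped
```

IDs with no known counterpart are left unchanged, so unmapped annotations can still be detected after replication.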
pip install cognite-replicator
Collecting cognite-replicator
Downloading cognite_replicator-1.2.6-py3-none-any.whl (45 kB)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 45.9/45.9 kB 1.2 MB/s eta 0:00:00
Collecting cognite-sdk<6.0.0,>=5.4.4 (from cognite-replicator)
Downloading cognite_sdk-5.12.0-py3-none-any.whl (291 kB)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 291.7/291.7 kB 4.3 MB/s eta 0:00:00
Collecting google-cloud-logging<2.0,>=1.12 (from cognite-replicator)
Downloading google_cloud_logging-1.15.3-py2.py3-none-any.whl (141 kB)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 141.6/141.6 kB 5.8 MB/s eta 0:00:00
Requirement already satisfied: protobuf<5.0.0,>=4.0.0 in /Users/[email protected]/Library/Caches/pypoetry/virtualenvs/file-extractor-function-bVgch4Fw-py3.11/lib/python3.11/site-packages (from cognite-replicator) (4.24.3)
Collecting pyyaml<6.0.0,>=5.1.0 (from cognite-replicator)
Downloading PyYAML-5.4.1.tar.gz (175 kB)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 175.1/175.1 kB 4.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [68 lines of output]
    /private/var/folders/f_/n9ywg_t948gg6y4_j7dm3n400000gn/T/pip-build-env-eijdcqyg/overlay/lib/python3.11/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
The publish step should compare the current version with the versions already on PyPI and only push the package if that version does not already exist.
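Such a check could be sketched as below. It assumes the project-level PyPI JSON endpoint (`https://pypi.org/pypi/<name>/json`) and its `releases` mapping, and it compares version strings literally, without normalising them. Both function names are hypothetical.

```python
import json
import urllib.request


def published_versions(package_name):
    """Fetch the set of released version strings from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    with urllib.request.urlopen(url) as response:
        return set(json.load(response)["releases"])


def should_publish(current_version, released_versions):
    """Publish only when this exact version string has not been released yet."""
    return current_version not in released_versions
```

In a CI job, `should_publish(version, published_versions("cognite-replicator"))` would gate the upload step.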
Dear SA,
Please see the screenshot for the issue. Datapoints are replicated for only a small handful of time series (roughly 30 out of 300 by my estimate). I am trying to replicate datapoints from publicdata to my personal tenant in greenfield. Replication of assets and time series was successful, with proper linking between time series and assets.
What I have tried: playing with batch_size. It produces the same warning/error.
This could be a wider problem with our API endpoints.
We are trying to run this code in a CDF Function, but it does not run and does not give any error.
Any idea how to make it run in a CDF Function?
Relationships are currently not in the scope of the replicator. Some use cases have relationships, and it would be great to have the ability to replicate them as well.
Currently, tests are only run against Python 3.7. The readme says you support 3.6, so you should run your tests against that version as well. In the SDK we solve this using tox, so you can take a look there.
Would be nice to get the documentation hosted on readthedocs along with the documentation for our other open-source python packages. https://cognite-docs.readthedocs-hosted.com/en/latest/
This is maybe not a bug, but we need to see what we can do about it.
Replication fails for time series that already existed in the tenant.
For example, time series A already exists in the tenant but does not have the metadata fields
_replicatedTime
_replicatedSource
_replicatedInternalId
If we now set up replication for time series A, the replication fails because of duplicates:
Duplicated: [{'legacyName': 'xxxxxxxxx-LDB_P'}, {'legacyName': 'xxxxxxxX.Value'}, {'legacyName': 'OilSample_xxxxB_K'}
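A pre-flight check could separate time series that the replicator created from ones that existed beforehand, using the three metadata keys listed above. This is a sketch with plain dictionaries standing in for time series objects; both helper names are hypothetical.

```python
REPLICATION_KEYS = ("_replicatedTime", "_replicatedSource", "_replicatedInternalId")


def has_replication_metadata(metadata):
    """True when all replication marker keys are present in the metadata."""
    metadata = metadata or {}
    return all(key in metadata for key in REPLICATION_KEYS)


def split_by_replication_state(target_time_series):
    """Split target time series into (replicated, pre-existing) lists."""
    replicated, preexisting = [], []
    for ts in target_time_series:
        if has_replication_metadata(ts.get("metadata")):
            replicated.append(ts)
        else:
            preexisting.append(ts)
    return replicated, preexisting
```

The pre-existing list is the set that would trigger the duplicate error, so it could be reported or handled separately before creation is attempted.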
I see that time series/events are split into batches of 10,000 and posted in parallel. The SDK already does this, so it shouldn't be necessary to do it here as well.
Also, the batch_size passed to replicate() is not respected. A dynamic batch size is calculated based on the number of threads allocated.
Events and time series are missing a few parameters (batch_size and num_threads).
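If batching is kept in the replicator, honouring the caller's batch_size could be as simple as the sketch below. The function name is hypothetical, and it does not reflect how the replicator currently computes its batches.

```python
def split_into_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

The number of threads would then only decide how many batches run concurrently, not how large each batch is.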
Tasks that need to be completed before we are ready to open-source the repo.
The replication fails if one of the time series listed in the YAML file is not found in the source tenant.
Replicate the issue:
Output:
2020-01-30 14:01:35,281 cognite-sdk DEBUG - HTTP Error 400 POST https://api.cognitedata.com/api/v1/projects/akerbp/timeseries/byids: timeseries ids not found: (id: null | externalId: VALI_23-PT-92532:X.Value)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/cognite/client/utils/_concurrency.py", line 127, in execute_tasks_concurrently
    res = f.result()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 111, in _post
    "POST", url_path, json=json, headers=headers, params=params, timeout=self._config.timeout
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 139, in _do_request
    self._raise_API_error(res, payload=json_payload)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 650, in _raise_API_error
    raise CogniteAPIError(msg, code, x_request_id, missing=missing, duplicated=duplicated, extra=extra)
cognite.client.exceptions.CogniteAPIError: timeseries ids not found: (id: null | externalId: VALI_23-PT-92532:X.Value) | code: 400 | X-Request-ID: ba8b983a-ab3c-92c8-927b-b507785bd232
Missing: [{'externalId': 'VALI_23-PT-92532:X.Value'}]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/__main__.py", line 177, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/__main__.py", line 146, in main
    exclude_pattern=config.get("timeseries_exclude_pattern"),
  File "/usr/local/lib/python3.7/site-packages/cognite/replicator/time_series.py", line 161, in replicate
    ts_src = client_src.time_series.retrieve_multiple(external_ids=target_external_ids)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api/time_series.py", line 142, in retrieve_multiple
    ids=ids, external_ids=external_ids, ignore_unknown_ids=ignore_unknown_ids, wrap_ids=True
  File "/usr/local/lib/python3.7/site-packages/cognite/client/_api_client.py", line 259, in _retrieve_multiple
    utils._concurrency.collect_exc_info_and_raise(tasks_summary.exceptions)
  File "/usr/local/lib/python3.7/site-packages/cognite/client/utils/_concurrency.py", line 102, in collect_exc_info_and_raise
    ) from missing_exc
cognite.client.exceptions.CogniteNotFoundError: Not found: [{'externalId': 'VALI_23-PT-92532:X.Value'}]
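The traceback shows that `retrieve_multiple` takes an `ignore_unknown_ids` argument, which the call in time_series.py does not set. A possible way to skip missing time series and report them, instead of failing the whole run, is sketched below. The helper name is hypothetical, and the sketch assumes the returned objects expose an `external_id` attribute.

```python
def retrieve_existing_time_series(client, external_ids):
    """Retrieve the time series that exist and list the external IDs that do not."""
    found = client.time_series.retrieve_multiple(
        external_ids=external_ids, ignore_unknown_ids=True
    )
    found_ids = {ts.external_id for ts in found}
    missing = [xid for xid in external_ids if xid not in found_ids]
    return list(found), missing
```

The `missing` list could be logged as a warning so that a typo in the YAML file is still visible without stopping replication of the remaining time series.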
The build sometimes fails even when the relevant code is unchanged. It can be fixed ad hoc by wiping the build environment and rebuilding. https://readthedocs.com/projects/cognite-cognite-replicator/builds/
Attempting to copy events fails with the following error:
TypeError: copy_events() takes 6 positional arguments but 7 were given self._target(*self._args, **self._kwargs) self._target(*self._args, **self._kwargs) self.run() File "/Users/viet/.pyenv/versions/3.8.3/lib/python3.8/threading.py", line 870, in run
Environment:
[tool.poetry.dependencies]
python = "^3.8"
cognite-replicator = "^0.8.1"
When creating new assets, if the request times out, the retry fails due to duplicate external IDs, because some of the assets were already created before the timeout.
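One way to make the retry safe is to drop assets that already made it into the target before trying again. The sketch below assumes plain dictionaries with an `external_id` key and a separately obtained collection of external IDs that exist in the target; the helper name is hypothetical.

```python
def assets_still_to_create(assets, existing_external_ids):
    """Keep only the assets whose external IDs are not yet present in the target."""
    existing = set(existing_external_ids)
    return [asset for asset in assets if asset["external_id"] not in existing]
```

After a timeout, the retry would first look up which external IDs from the batch now exist in the target and then resend only the remainder.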
Datapoints replication fails when there is no start date and the time series has no datapoints yet.
The code currently takes the time of the latest datapoint in the time series if there is one. If not, it takes the datapoint start parameter.
If both are missing => bug
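The selection logic described above, with an explicit fallback for the case where both values are missing, could be sketched like this. The function name and the choice of 0 (the Unix epoch) as the default are assumptions, not the replicator's behaviour.

```python
def resolve_start(latest_timestamp, start, default_start=0):
    """Pick the replication start time in milliseconds since the epoch."""
    if latest_timestamp is not None:
        return latest_timestamp  # Continue from the latest datapoint in the target.
    if start is not None:
        return start  # No datapoints yet: use the caller's start parameter.
    return default_start  # Neither is available: fall back instead of failing.
```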
It would be useful to have the ability for the replicator not to overwrite metadata when time series are created.