ocha-dap / hdx-python-api
Python API for interacting with the HDX Data Portal
Home Page: http://data.humdata.org
License: MIT License
Hello!
Thank you for developing the package, it aids quite well in uploading data to HDX.
I upload around 200 datasets to HDX on a weekly basis. However, every week I encounter the same error, and I don't know exactly where it comes from unless there are rate limits.
The error is this:
hdx.data.hdxobject.HDXError: Failed when trying to read: id=e1df3a45-5052-4ef0-bc68-12d887286d35! (POST)
(the id number varies)
This comes at different moments in the script, just before a country is processed by get_resources(). It has been an issue for some weeks (even months), and I have started to realize it happens after a specific number of countries are uploaded, giving me a stronger impression that rate limits are the culprit. Furthermore, waiting for a minute and then re-running the script usually solves the problem.
Yet, I couldn't find any documentation about rate limits. Hence my question, are there rate limits?
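Since waiting a minute and re-running already fixes it manually, one stopgap is to automate that same workaround with a retry wrapper. This is only an illustrative sketch, not an HDX recommendation; the helper name, attempt count, and delays are my assumptions:

```python
import time

def call_with_retries(fn, attempts=3, delay_seconds=60):
    """Retry a zero-argument callable with linear backoff.

    Hypothetical helper: wrap e.g. `lambda: dataset.get_resources()` so a
    transient HDXError is retried after a pause instead of killing the run.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay_seconds * attempt)
```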
Hi HDX Team!
I have a few general questions please about fields returned by this amazing API ...
There are lots of fields in the dataset and resource records returned by the API. I've looked through the documentation but I cannot seem to find definitions of the fields on these records. For example, if I search for 'total_res_downloads' I can't seem to find anything. There are lots of super useful fields we could perhaps use, but I wondered if these fields might have definitions somewhere I could review?
For download and page view metrics, I see fields on the 'dataset' record, but not on the 'resource' (file) records. It would be great to know which resources are being downloaded as an indication of their popularity. Sorry if I've asked this before, but is it possible to identify via the API any metrics on how often a resource was downloaded in, say, the last 14 days?
I would like to identify recently changed data on HDX. I see last_modified on the resource and dataset records, will this field change on dataset if one of its resources is updated?
Thanks so much.
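Short of official field definitions, one way to at least see what comes back is to enumerate the keys on a returned record, since HDX objects are dict-like. The record below is a stand-in dict with made-up values so the sketch runs offline; a real dataset record has many more fields:

```python
# Stand-in for a dataset record returned by the API (illustrative
# field names and values only).
record = {
    "name": "example-dataset",
    "total_res_downloads": 1234,
    "last_modified": "2024-01-01T00:00:00",
}

# Enumerate field names and value types, as one would on a real Dataset.
for key in sorted(record):
    print(f"{key}: {type(record[key]).__name__}")
```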
Having worked with this library a bit more, it made more sense to remove the fetching and validation of locations from the main Configuration object. This could be replaced by a separate import, e.g.:
from hdx.locations import get_valid_locations
locations = get_valid_locations(configuration)  # makes a network call to the appropriate HDX endpoint
if some_code not in locations:
    raise ValueError(f"{some_code} is not a valid location")
This way, control over how network requests are made, or how the list of locations is cached, is the responsibility of the library user. If locations is a plain Python list, it can also be replaced by a test object without patching or deep mocks.
demo-data.humdata.org seems like it was changed recently, I'm getting this error:
HDXError: Failed when trying to read: id={uid}! (POST) - ['https://demo-data.humdata.org/api/action/related_list', 400, u'"Bad request - Action name not known: related_list"']
Hi There,
Thanks so much for this amazing library!
Today I upgraded to the latest version ...
%pip install hdx-python-api==6.0.8
But when I try and import ...
from hdx.api.configuration import Configuration
I get ...
ImportError: cannot import name 'deprecated' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)
I am on Python 3.10. Below I provide the full error and list of the Python packages.
I've tried downgrading typing_extensions, pydantic, but no luck.
I suspect this is likely something with pip and/or the Databricks Python env I'm using, but any tips appreciated.
Thanks so much!
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
File <command-2615425375951413>:2
1 from hdx.utilities.easy_logging import setup_logging
----> 2 from hdx.api.configuration import Configuration
3 from hdx.data.dataset import Dataset
5 from pyspark.sql.functions import col, udf
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/api/configuration.py:13
10 import requests
12 from . import __version__
---> 13 from hdx.utilities.dictandlist import merge_two_dictionaries
14 from hdx.utilities.email import Email
15 from hdx.utilities.loader import load_json, load_yaml
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/utilities/dictandlist.py:7
4 from collections import UserDict
5 from typing import Any, Callable, Dict, List, Optional, Union
----> 7 from .frictionless_wrapper import get_frictionless_tableresource
8 from .typehint import ListDict, ListTuple, ListTupleDict
11 def merge_two_dictionaries(
12 a: Dict, b: Dict, merge_lists: bool = False
13 ) -> Dict:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/utilities/frictionless_wrapper.py:5
2 from typing import Any, Optional, Tuple
4 import requests
----> 5 from frictionless import (
6 Control,
7 Detector,
8 Dialect,
9 FrictionlessException,
10 system,
11 )
12 from frictionless.errors import ResourceError
13 from frictionless.formats import CsvControl, ExcelControl, JsonControl
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/__init__.py:1
----> 1 from .actions import convert, describe, extract, index, list, transform, validate
2 from .analyzer import Analyzer
3 from .catalog import Catalog, Dataset
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/actions/__init__.py:1
----> 1 from .convert import convert
2 from .describe import describe
3 from .extract import extract
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/actions/convert.py:7
5 from ..exception import FrictionlessException
6 from ..platform import platform
----> 7 from ..resource import Resource
9 if TYPE_CHECKING:
10 from ..dialect import Dialect
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/resource/__init__.py:1
----> 1 from .resource import Resource
2 from .types import *
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/resource/resource.py:10
7 import attrs
8 from typing_extensions import Self
---> 10 from .. import errors, fields, helpers, settings
11 from ..detector import Detector
12 from ..dialect import Control, Dialect
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/fields/__init__.py:1
----> 1 from .any import AnyField
2 from .array import ArrayField
3 from .boolean import BooleanField
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/fields/any.py:5
1 from __future__ import annotations
3 import attrs
----> 5 from ..schema import Field
8 @attrs.define(kw_only=True, repr=False)
9 class AnyField(Field):
10 type = "any"
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/schema/__init__.py:1
----> 1 from .field import Field
2 from .schema import Schema
3 from .types import *
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/schema/field.py:13
11 from ..exception import FrictionlessException
12 from ..metadata import Metadata
---> 13 from ..system import system
15 if TYPE_CHECKING:
16 from ..types import IDescriptor
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/system/__init__.py:1
----> 1 from .adapter import Adapter
2 from .loader import Loader
3 from .mapper import Mapper
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/system/adapter.py:5
1 from __future__ import annotations
3 from typing import TYPE_CHECKING, Any
----> 5 from .. import models
7 if TYPE_CHECKING:
8 from ..catalog import Catalog
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/models.py:3
1 from typing import Any, Dict, Optional
----> 3 from pydantic import BaseModel
6 class PublishResult(BaseModel):
7 url: Optional[str] = None
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/__init__.py:13
3 import pydantic_core
4 from pydantic_core.core_schema import (
5 FieldSerializationInfo,
6 FieldValidationInfo,
(...)
10 ValidatorFunctionWrapHandler,
11 )
---> 13 from . import dataclasses
14 from ._internal._annotated_handlers import (
15 GetCoreSchemaHandler as GetCoreSchemaHandler,
16 )
17 from ._internal._annotated_handlers import (
18 GetJsonSchemaHandler as GetJsonSchemaHandler,
19 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/dataclasses.py:11
7 from typing import TYPE_CHECKING, Any, Callable, Generic, NoReturn, TypeVar, overload
9 from typing_extensions import Literal, dataclass_transform
---> 11 from ._internal import _config, _decorators, _typing_extra
12 from ._internal import _dataclasses as _pydantic_dataclasses
13 from ._migration import getattr_migration
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/_internal/_config.py:9
6 from pydantic_core import core_schema
7 from typing_extensions import Literal, Self
----> 9 from ..config import ConfigDict, ExtraValues, JsonEncoder, JsonSchemaExtraCallable
10 from ..errors import PydanticUserError
11 from ..warnings import PydanticDeprecatedSince20
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/config.py:9
6 from typing_extensions import Literal, TypeAlias, TypedDict
8 from ._migration import getattr_migration
----> 9 from .deprecated.config import BaseConfig
10 from .deprecated.config import Extra as _Extra
12 if TYPE_CHECKING:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/deprecated/config.py:6
3 import warnings
4 from typing import TYPE_CHECKING, Any
----> 6 from typing_extensions import Literal, deprecated
8 from .._internal import _config
9 from ..warnings import PydanticDeprecatedSince20
ImportError: cannot import name 'deprecated' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)
Here are the libraries I have installed ...
pyphonetics 0.5.3
pyright 1.1.294
pyrsistent 0.18.0
pytest 7.4.0
python-apt 2.4.0+ubuntu1
python-dateutil 2.8.2
python-io-wrapper 0.3.1
python-lsp-jsonrpc 1.0.0
python-lsp-server 1.7.1
python-slugify 8.0.1
pytoolconfig 1.2.2
pytz 2022.1
PyYAML 6.0.1
pyzmq 23.2.0
quantulum3 0.9.0
ratelimit 2.2.1
requests 2.28.1
requests-file 1.5.1
rfc3986 2.0.0
rich 13.5.2
rope 1.7.0
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
s3transfer 0.6.0
scikit-learn 1.1.1
scipy 1.9.1
seaborn 0.11.2
SecretStorage 3.3.1
Send2Trash 1.8.0
setuptools 63.4.1
shellingham 1.5.0.post1
simpleeval 0.9.13
six 1.16.0
soupsieve 2.3.1
sphinxcontrib-napoleon 0.7
ssh-import-id 5.11
stack-data 0.6.2
statsmodels 0.13.2
stringcase 1.2.0
structlog 23.1.0
tableschema-to-template 0.0.13
tabulate 0.9.0
tenacity 8.1.0
terminado 0.13.1
testpath 0.6.0
text-unidecode 1.3
threadpoolctl 2.2.0
tokenize-rt 4.2.1
tomli 2.0.1
tornado 6.1
traitlets 5.1.1
typer 0.9.0
typing_extensions 4.7.1
ujson 5.4.0
unattended-upgrades 0.1
Unidecode 1.3.6
urllib3 1.26.11
validators 0.21.2
virtualenv 20.16.3
wadllib 1.3.6
wcwidth 0.2.5
webencodings 0.5.1
whatthepatch 1.0.2
wheel 0.37.1
widgetsnbextension 3.6.1
xlrd 2.0.1
xlrd3 1.1.0
XlsxWriter 3.1.2
xlwt 1.3.0
yapf 0.31.0
zipp 1.0.0
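A quick way to narrow this down (just a diagnostic sketch) is to check which copy of typing_extensions the interpreter would actually import: the traceback points at the /databricks/python system copy rather than the notebook environment's site-packages, and typing_extensions.deprecated only exists from version 4.5.0 onward.

```python
import importlib.util

# Locate the typing_extensions module the current interpreter would load;
# if this prints a path under /databricks/python rather than the env's
# site-packages, an older system copy is shadowing the newer install.
spec = importlib.util.find_spec("typing_extensions")
if spec is not None:
    print("typing_extensions loads from:", spec.origin)
else:
    print("typing_extensions is not installed")
```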
demo = RemoteCKAN('http://10.11.35.55:5050/', apikey='31ccccccc', user_agent='admin')
configuration = Configuration.create(hdx_site='prod',hdx_key="31cccccc",user_agent='admin',remoteckan=demo)
organization = Organization(configuration=configuration)
organization['name'] = 'cc'
organization['title'] = 'cc'
organization['description'] = 'cc'
organization['id'] = 'cc'
organization['test'] = 'jfccjfjf'
organization.create_in_hdx()
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/hdx/data/organization.py", line 117, in create_in_hdx
self._create_in_hdx('organization', 'id', 'name')
File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 320, in _create_in_hdx
self.check_required_fields()
File "/usr/local/lib/python2.7/dist-packages/hdx/data/organization.py", line 99, in check_required_fields
self._check_required_fields('organization', ignore_fields)
File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 204, in _check_required_fields
for field in self.configuration[object_type]['required_fields']:
TypeError: string indices must be integers, not str
Below is the code snippet that produces the behavior (it targets a dataset on feature). Note that the format mapping dictionary has entries that will change "badformat" to "goodformat" and vice versa, so you don't have to manually edit the test dataset each time.
# isolated snippet that tests ability to identify a change and make it in ckan.
# currently it executes, but the change is not recorded in ckan
format_dict = pd.read_csv('https://docs.google.com/spreadsheets/d/1-fvtlOBQF9xZ-X8yttfRaYTFhlWrwR6AQatkPEKClck/export?format=csv')
testdataset = 'cjtest200915'
test = Dataset.read_from_hdx(testdataset)
testres = test.get_resources()
print("number of resources = "+str(len(testres)))
old_type = testres[0].get_file_type()
print ("old resource format = "+old_type)
old_type_lower_case = testres[0].get_file_type().lower()
print ("lowercased = "+old_type_lower_case)
new_type = format_dict.loc[format_dict['existing_format'] == old_type_lower_case,'approved_format'].iloc[0]
print ("new resource format (from mapping doc) = "+new_type)
#update with hdx python
testres[0].set_file_type(new_type)
print("resource format after setting = "+testres[0].get_file_type())
#this approach is the one that fails to update the resource format
test.update_in_hdx(update_resources=True,hxl_update=False,ignore_check=True,operation='patch')
#this approach succeeds
#testres[0].update_in_hdx(operation='patch',ignore_check=True)
test = Dataset.read_from_hdx(testdataset)
testres = test.get_resources()
old_type = testres[0].get_file_type()
print ("resource format after updating = "+test.get_resources()[0].get_file_type())
Hi!
I have some code that had been running fine in various environments but now returns an error (please see below); it was working and then suddenly stopped.
Is there something I'm doing wrong please?
Thanks!
from hdx.utilities.easy_logging import setup_logging
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset
def setup_hdx_connection(agent_name):
    try:
        Configuration.create(hdx_site="prod", user_agent=agent_name, hdx_read_only=True)
    except Exception:
        # Configuration.create raises if a configuration already exists
        print("Configuration already created, continuing ...")

setup_hdx_connection("AgentName")
datasets = Dataset.search_in_hdx()
---------------------------------------------------------------------------
CKANAPIError Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/hdxobject.py:115, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
114 try:
--> 115 result = self.configuration.call_remoteckan(action, data)
116 return True, result
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/api/configuration.py:374, in Configuration.call_remoteckan(self, *args, **kwargs)
373 kwargs["apikey"] = apikey
--> 374 return self.remoteckan().call_action(*args, **kwargs)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/ckanapi/remoteckan.py:97, in RemoteCKAN.call_action(self, action, data_dict, context, apikey, files, requests_kwargs)
96 status, response = self._request_fn(url, data, headers, files, requests_kwargs)
---> 97 return reverse_apicontroller_action(url, status, response)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/ckanapi/common.py:134, in reverse_apicontroller_action(url, status, response)
133 # don't recognize the error
--> 134 raise CKANAPIError(repr([url, status, response]))
CKANAPIError: ['https://data.humdata.org/api/action/package_search', 403, '<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n</body>\r\n</html>\r\n']
The above exception was the direct cause of the following exception:
HDXError Traceback (most recent call last)
File <command-2615425375951422>:2
1 print(f"Searching for ALL datasets in HDX to get datasets")
----> 2 datasets = Dataset.search_in_hdx()
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/dataset.py:1098, in Dataset.search_in_hdx(cls, query, configuration, page_size, **kwargs)
1096 rows = min(rows_left, page_size)
1097 kwargs["rows"] = rows
-> 1098 _, result = dataset._read_from_hdx(
1099 "dataset",
1100 query,
1101 "q",
1102 Dataset.actions()["search"],
1103 **kwargs,
1104 )
1105 datasets = list()
1106 if result:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/hdxobject.py:120, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
118 return False, f"{fieldname}={value}: not found!"
119 except Exception as e:
--> 120 raise HDXError(
121 f"Failed when trying to read: {fieldname}={value}! (POST)"
122 ) from e
HDXError: Failed when trying to read: q=*:*! (POST)
Note that when running on my desktop I get the same error (full traceback below), yet in a browser I can access https://data.humdata.org/api/action/package_search just fine ...
---------------------------------------------------------------------------
CKANAPIError Traceback (most recent call last)
File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/hdx/data/hdxobject.py:115, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
114 try:
--> 115 result = self.configuration.call_remoteckan(action, data)
116 return True, result
File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/hdx/api/configuration.py:374, in Configuration.call_remoteckan(self, *args, **kwargs)
373 kwargs["apikey"] = apikey
--> 374 return self.remoteckan().call_action(*args, **kwargs)
File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/ckanapi/remoteckan.py:97, in RemoteCKAN.call_action(self, action, data_dict, context, apikey, files, requests_kwargs)
96 status, response = self._request_fn(url, data, headers, files, requests_kwargs)
---> 97 return reverse_apicontroller_action(url, status, response)
File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/ckanapi/common.py:134, in reverse_apicontroller_action(url, status, response)
133 # don't recognize the error
--> 134 raise CKANAPIError(repr([url, status, response]))
CKANAPIError: ['https://data.humdata.org/api/action/package_search', 403, '<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n</body>\r\n</html>\r\n']
I'd suggest adding python 2.7 support to enable use of this code by the wider humanitarian community.
It would be great to have 1-click publishing of datasets to HDX from GeoNode systems (http://geonode.org/), which contain many datasets potentially useful on HDX. For instance, a popup that appears asking if you would like to publish to HDX.
GeoNode is currently written in Python 2.7 (https://github.com/geonode/geonode). GeoNode (as well as other systems) can only be upgraded to Python 3.0 once all their dependencies are upgraded, which could still be years away.
A standard python API that shields the complexity of CKAN's REST API is great! However, to seamlessly integrate HDX publishing into workflows, the code will need to run directly within business systems rather than as a separate script.
dataset.add_country_locations and dataset.add_country_location both allow countries to be added to a dataset on HDX, but they don't clear out the existing set of countries (even if dataset['groups'] = [] is set first).
Allow environment variable HDX_KEY_STAGE.
Per Alex G, we can use the abbreviations from https://data.humdata.org/api/action/group_list?all_fields=true to specify countries. However, since this library does ISO3 validation using Geonames, countries/regions such as nepal_earthquake will be dropped.
I'm integrating HDX into an application to publish datasets regularly, but I'd like to avoid pushing datasets to the prod site when other developers work on their local machines.
Is it possible? I tried with the following script:
from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
from hdx.data.resource import Resource
import datetime
setup_logging()
Configuration.create(hdx_site='stage', user_agent='A_Quick_Example', hdx_key='hdx key from my prod site')
dataset = Dataset({
'name': 'demo dataset',
'private': True,
'title': 'demo dataset title',
'notes': 'demo dataset notes',
'license_id': 'ODC-ODbL',
'methodology': 'Registry',
'data_update_frequency': 'Every day',
'dataset_date': datetime.datetime.now().strftime('%Y-%m-%d'),
'dataset_source': 'Angostura',
})
dataset.set_maintainer('196196be-6037-4488-8b71-d786adf4c081') # A user that updated the ucdp-data-for-australia dataset from https://stage.data-humdata-org.ahconu.org/
dataset.set_organization('hdx') # after digging for some organization from https://stage.data-humdata-org.ahconu.org/
dataset.add_country_location('VEN')
dataset.add_tag('americas')
resource = Resource({
'name': 'test',
'description': 'description',
'format': 'CSV'
})
resource.set_file_to_upload('sample.csv')
dataset.add_update_resource(resource)
dataset.create_in_hdx()
but that gives me the following error:
Traceback (most recent call last):
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 316, in _write_to_hdx
return self.configuration.call_remoteckan(self.actions()[action], data, files=files)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/hdx_configuration.py", line 307, in call_remoteckan
return self.remoteckan().call_action(*args, **kwargs)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/remoteckan.py", line 87, in call_action
return reverse_apicontroller_action(url, status, response)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/common.py", line 128, in reverse_apicontroller_action
raise NotAuthorized(err)
ckanapi.errors.NotAuthorized: {'message': 'Access denied: <function package_create at 0x7f5b257bbb90> requires an authenticated user', '__type': 'Authorization Error'}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/dataset.py", line 565, in create_in_hdx
self._save_to_hdx('create', 'name', force_active=True)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 343, in _save_to_hdx
result = self._write_to_hdx(action, self.data, id_field_name, file_to_upload)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 322, in _write_to_hdx
raisefrom(HDXError, 'Failed when trying to %s%s! (POST)' % (action, idstr), e)
File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/utilities/__init__.py", line 28, in raisefrom
six.raise_from(exc_type(message), exc)
File "<string>", line 3, in raise_from
hdx.data.hdxobject.HDXError: Failed when trying to create demo dataset! (POST)
This was working until now; then I installed it on the same OS version and the same Python 2.7, and it suddenly stopped working.
Collecting libhxl>=4.24.1
  Downloading libhxl-4.24.1.tar.gz (91 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9T48_N/libhxl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9T48_N/libhxl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-Avj4GK
         cwd: /tmp/pip-install-9T48_N/libhxl/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-9T48_N/libhxl/setup.py", line 7, in <module>
        raise RuntimeError("libhxl requires Python 3 or higher")
    RuntimeError: libhxl requires Python 3 or higher
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.3.3; however, version 20.3.4 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.
Our old application is built on Python 2. Please help me fix this. Thanks.
File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 289, in _save_to_hdx
result = self._write_to_hdx(action, self.data, id_field_name, file_to_upload)
File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 271, in _write_to_hdx
six.raise_from(HDXError('Failed when trying to %s %s! (POST)' % (action, self.data[id_field_name])), e)
File "/usr/local/lib/python2.7/dist-packages/six.py", line 718, in raise_from
raise value
HDXError: Failed when trying to update 1d55221a-5fda-459c-9295-54185f392e81! (POST)
When invalid values are POSTed to the CKAN API, it fails this way: the actual CKAN exception has useful information about why the request failed, but it is squelched when caught and re-raised as HDXError.
I also realized we should add any new dependencies, namely six, to the requirements in setup.py.
Configurations are stored as global state on the Configuration module, for example:
Configuration.create(hdx_site='prod')
I would prefer if Configurations were individually instantiated as a "connection" object. This would make application code easier to unit test and more decoupled. A good example is the AWS Boto API e.g.
from boto.s3.connection import S3Connection
conn = S3Connection('', '')
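As a sketch of what that could look like (hypothetical class and parameter names, not the library's actual API):

```python
class HDXConnection:
    """Hypothetical per-instance connection object, analogous to boto's
    S3Connection: all state lives on the instance, so tests can construct
    throwaway connections without touching module-level globals."""

    def __init__(self, hdx_site, hdx_key=None, user_agent=None):
        self.hdx_site = hdx_site
        self.hdx_key = hdx_key
        self.user_agent = user_agent

# Two independent connections, no shared global state:
prod = HDXConnection(hdx_site="prod")
test = HDXConnection(hdx_site="test", hdx_key="fake-key-for-tests")
```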
Hi here !
I am facing some issues while instantiating the configuration. I was using hdx-python-api 2.1.5 before:
HDX_URL_PREFIX = Configuration.create(
hdx_site=os.getenv('HDX_SITE', 'demo'),
hdx_key=os.getenv('HDX_API_KEY'),
user_agent="my_user_agent_name"
)
Now I am trying to upgrade to the latest version of this library, and while doing so I am getting an error like this:
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
at line 176 of hdx > utilities > session.py, in get_session:
retries = Retry(
total=5,
backoff_factor=0.4,
status_forcelist=status_forcelist,
allowed_methods=allowed_methods,
raise_on_redirect=True,
raise_on_status=True,
)
When I disable this code block and avoid passing allowed_methods, like this,
retries = Retry(
total=5,
backoff_factor=0.4,
status_forcelist=status_forcelist,
# allowed_methods=allowed_methods,
raise_on_redirect=True,
raise_on_status=True,
)
it works fine for me, but I'm looking for a better suggestion if there is one. Feel free to suggest a different way or point me to docs where I can read about this.
Full Traceback :
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 648, in create
configuration=configuration, remoteckan=remoteckan, **kwargs
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 607, in _create
cls._configuration.setup_session_remoteckan(remoteckan, **kwargs)
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 472, in setup_session_remoteckan
full_agent=self.get_user_agent(), **kwargs
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 443, in create_session_user_agent
**kwargs,
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/utilities/session.py", line 176, in get_session
raise_on_status=True,
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
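For context, urllib3 renamed Retry's method_whitelist parameter to allowed_methods in 1.26.0, so this error usually means an older urllib3 is installed; upgrading urllib3 should be a cleaner fix than editing the library. If editing is unavoidable, a version-agnostic sketch (illustrative only, not the library's own code) could pick the keyword dynamically:

```python
import inspect

from urllib3.util.retry import Retry

# Choose whichever retry-methods keyword this urllib3 understands:
# `allowed_methods` (>= 1.26.0) or the older `method_whitelist`.
params = inspect.signature(Retry.__init__).parameters
key = "allowed_methods" if "allowed_methods" in params else "method_whitelist"

retries = Retry(
    total=5,
    backoff_factor=0.4,
    raise_on_redirect=True,
    raise_on_status=True,
    **{key: frozenset(["GET", "POST"])},
)
```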
I have obtained hdx api key and I'm trying to implement programmatically pushing ckan dataset to HDX.
from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
conf = Configuration(
hdx_site='prod',
user_agent='admin',
hdx_key=hdx_api_key
)
dataset_class_object = Dataset(initial_data=data, configuration=conf)
resources = dataset_class_object.get_resources()
dataset_class_object.check_required_fields(['dataset_source', 'maintainer', 'dataset_date', 'data_update_frequency', 'groups', 'methodology'])
dataset_class_object.create_in_hdx()
ERROR [ckan.views.api] Field dataset_source is missing in dataset!
Traceback (most recent call last):
File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/views/api.py", line 288, in action
result = function(context, request_data)
File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/logic/init.py", line 464, in wrapped
result = _action(context, data_dict, **kw)
File "/home/ljupka/ckan_custom/lib/default/src/ckanext-custom/ckanext/custom/logic/action/get.py", line 1837, in push_dataset_to_hdx
dataset_class_object.create_in_hdx()
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 512, in create_in_hdx
self.check_required_fields(allow_no_resources=allow_no_resources)
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 352, in check_required_fields
self._check_required_fields('dataset', ignore_fields)
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/hdxobject.py", line 208, in _check_required_fields
raise HDXError('Field %s is missing in %s!' % (field, object_type))
HDXError: Field dataset_source is missing in dataset!
The dataset "ACLED Conflict Data for Africa 1997-2016" used in the usage example in the README.md no longer exists, and consequently the example code fails.
I was able to run the example at https://github.com/OCHA-DAP/hdx-python-api#a-quick-example by changing to a different dataset 'acled-conflict-data-for-africa-1997-lastyear'
currently they are sent as upper case
mark_data_updated is broken due to an error in dataset_update_filestore_resource, in which timezone information was incorrectly added to the ISO-formatted string by the following erroneous change to replace the deprecated utcnow() function:
datetime.utcnow().isoformat(timespec="microseconds")  # '2024-04-25T22:42:58.501073'
became:
datetime.now(timezone.utc).isoformat(timespec="microseconds")  # '2024-04-25T22:43:10.746691+00:00'
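The difference is easy to reproduce: utcnow() returns a naive datetime whose isoformat() carries no offset suffix, while now(timezone.utc) is timezone-aware and appends "+00:00". One possible repair, if the naive format must be preserved, is to strip the tzinfo again (my suggestion, not necessarily the fix the maintainers chose):

```python
from datetime import datetime, timezone

# A fixed timestamp so the output is deterministic.
naive = datetime(2024, 4, 25, 22, 42, 58, 501073)
aware = naive.replace(tzinfo=timezone.utc)

print(naive.isoformat(timespec="microseconds"))  # 2024-04-25T22:42:58.501073
print(aware.isoformat(timespec="microseconds"))  # 2024-04-25T22:42:58.501073+00:00

# Stripping tzinfo restores the old naive format:
print(aware.replace(tzinfo=None).isoformat(timespec="microseconds"))
```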
Hi HDX Team!
I am carrying out analysis on some datasets from Kenya, part of which requires I download tabular data. This works really well most of the time, but in calling ...
dataset = Dataset.read_from_hdx(row["id"])
resources = dataset.get_resources()
for resource in resources:
    url, path = resource.download(dir)
We sometimes get ...
Traceback (most recent call last):
File "", line 744, in download_data
url, path = resource.download(dir)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/data/resource.py", line 515, in download
path = downloader.download_file(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/utilities/downloader.py", line 428, in download_file
return self.stream_path(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/utilities/downloader.py", line 347, in stream_path
raise DownloadError(errormsg) from e
hdx.utilities.base_downloader.DownloadError: Download of https://data.humdata.org/dataset/db3e1a76-76d8-4206-9e5e-382336c51472/resource/562aad8a-2d68-4560-8c8b-237459958183/download/dtm_kenya_b2_baseleine_multi_sectoral_assessment_nov_2022.xlsx failed in retrieval of stream!
However, if I use the URL in a browser, it seems to work fine and I get a file.
I am using hdx-python-api==6.0.8 and Python 3.9.5.
Thanks a lot!
It would be better if this external network call was an explicit method on the Configuration. Configuration objects should be able to be used in environments without network access. It is also easy to make the mistake of calling Configuration.create many times without realizing that each call results in a network request, resulting in hundreds or more superfluous requests to the CKAN API.
This is somewhat related to #7 - if it was explicit that Configuration.create is only called once, and the resulting object was passed around, repeated calls to get_validlocations would be less of an issue
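Until then, a caller-side guard can at least keep repeated calls cheap. A minimal sketch (hypothetical helper, assuming any expensive zero-side-effect factory callable):

```python
_cached = {}

def create_once(create_fn, **kwargs):
    """Call `create_fn` at most once per process and cache its result,
    so repeated call sites don't each trigger a network request."""
    if "config" not in _cached:
        _cached["config"] = create_fn(**kwargs)
    return _cached["config"]

# Usage sketch: create_once(Configuration.create, hdx_site="prod")
```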