
hdx-python-api's Issues

Are there rate limits in the API?

Hello!

Thank you for developing the package; it helps a great deal with uploading data to HDX.

Every week I upload around 200 datasets to HDX, and every week I encounter the same error. I can't tell exactly where it comes from unless there are rate limits.

The error is this:
hdx.data.hdxobject.HDXError: Failed when trying to read: id=e1df3a45-5052-4ef0-bc68-12d887286d35! (POST) (the id number varies)

This occurs at different points in the script, just before a country is processed by get_resources(). It has been an issue for some weeks (even months), and I have noticed it happens after a specific number of countries are uploaded, which strengthens my impression that rate limits are the culprit. Usually, waiting a minute and then re-running the script solves the problem.

Yet, I couldn't find any documentation about rate limits. Hence my question, are there rate limits?
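Until rate limits are documented, one client-side workaround is to retry reads with a pause, matching the observation that waiting a minute fixes things. A minimal sketch; `read_with_retry` is a hypothetical helper, and the `HDXError` usage shown in the comment assumes the library's exception class:

```python
import time

def read_with_retry(read_fn, attempts=3, wait=60, retry_on=(Exception,)):
    """Call read_fn, retrying with a fixed pause on failure (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            return read_fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(wait)

# assumed usage in the weekly script, catching the library's HDXError:
# resources = read_with_retry(dataset.get_resources, retry_on=(HDXError,), wait=60)
```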

API field definitions

Hi HDX Team!

I have a few general questions please about fields returned by this amazing API ...

  1. There are lots of fields in the dataset and resource records returned by the API. I've looked through the documentation but cannot find definitions of the fields on these records. For example, if I search for 'total_res_downloads', nothing comes up. There are lots of potentially useful fields, and I wondered if they might have definitions somewhere I could review?

  2. For download and page view metrics, I see fields on the 'dataset' record, but not on the 'resource' (file) records. It would be great to know which resources are being downloaded as an indication of their popularity. Sorry if I've asked this before, but is it possible to identify via the API any metrics on how often a resource was downloaded in, say, the last 14 days?

  3. I would like to identify recently changed data on HDX. I see last_modified on the resource and dataset records; will this field change on a dataset if one of its resources is updated?

Thanks so much.

move client side location validation to a separate module

Having worked with this library a bit more, I think it makes more sense to remove the fetching and validation of locations from the main Configuration object. It could be replaced by a separate import, e.g.

from hdx.locations import get_valid_locations

locations = get_valid_locations(configuration)  # makes a network call to the appropriate HDX endpoint
if some_code not in locations:
    raise ValueError('%s is not a valid location' % some_code)

This way, control over how network requests are made, and over how the list of locations is cached, is the responsibility of the library user. If the locations are a plain Python list, they can also be replaced by a test object without patching or deep mocks.
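The decoupling described above could be sketched by injecting the fetch step; none of these names are real hdx APIs, this is just the shape the proposal suggests:

```python
def get_valid_locations(fetch):
    """Return the list of valid location codes produced by `fetch` (illustrative)."""
    return list(fetch())

# in production, `fetch` would make the network call to the HDX endpoint;
# in a unit test, a plain list stands in without patching or deep mocks:
locations = get_valid_locations(lambda: ["afg", "syr", "yem"])
assert "afg" in locations
```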

hdx-python-api 1.4.2 not compatible with CKAN 2.6?

demo-data.humdata.org seems to have changed recently; I'm getting this error:

HDXError: Failed when trying to read: id={uid}! (POST) - ['https://demo-data.humdata.org/api/action/related_list', 400, u'"Bad request - Action name not known: related_list"']

Getting error ImportError: cannot import name 'deprecated' from 'typing_extensions' on importing hdx

Hi There,

Thanks so much for this amazing library!

Today I upgraded to the latest version ...

%pip install hdx-python-api==6.0.8

But when I try to import ...

from hdx.api.configuration import Configuration

I get ...

ImportError: cannot import name 'deprecated' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)

I am on Python 3.10. Below I provide the full error and list of the Python packages.

I've tried downgrading typing_extensions, pydantic, but no luck.

I suspect this is likely something with pip and/or the Databricks Python env I'm using, but any tips appreciated.

Thanks so much!

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-2615425375951413>:2
      1 from hdx.utilities.easy_logging import setup_logging
----> 2 from hdx.api.configuration import Configuration
      3 from hdx.data.dataset import Dataset
      5 from pyspark.sql.functions import col, udf

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/api/configuration.py:13
     10 import requests
     12 from . import __version__
---> 13 from hdx.utilities.dictandlist import merge_two_dictionaries
     14 from hdx.utilities.email import Email
     15 from hdx.utilities.loader import load_json, load_yaml

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/utilities/dictandlist.py:7
      4 from collections import UserDict
      5 from typing import Any, Callable, Dict, List, Optional, Union
----> 7 from .frictionless_wrapper import get_frictionless_tableresource
      8 from .typehint import ListDict, ListTuple, ListTupleDict
     11 def merge_two_dictionaries(
     12     a: Dict, b: Dict, merge_lists: bool = False
     13 ) -> Dict:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/hdx/utilities/frictionless_wrapper.py:5
      2 from typing import Any, Optional, Tuple
      4 import requests
----> 5 from frictionless import (
      6     Control,
      7     Detector,
      8     Dialect,
      9     FrictionlessException,
     10     system,
     11 )
     12 from frictionless.errors import ResourceError
     13 from frictionless.formats import CsvControl, ExcelControl, JsonControl

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/__init__.py:1
----> 1 from .actions import convert, describe, extract, index, list, transform, validate
      2 from .analyzer import Analyzer
      3 from .catalog import Catalog, Dataset

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/actions/__init__.py:1
----> 1 from .convert import convert
      2 from .describe import describe
      3 from .extract import extract

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/actions/convert.py:7
      5 from ..exception import FrictionlessException
      6 from ..platform import platform
----> 7 from ..resource import Resource
      9 if TYPE_CHECKING:
     10     from ..dialect import Dialect

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/resource/__init__.py:1
----> 1 from .resource import Resource
      2 from .types import *

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/resource/resource.py:10
      7 import attrs
      8 from typing_extensions import Self
---> 10 from .. import errors, fields, helpers, settings
     11 from ..detector import Detector
     12 from ..dialect import Control, Dialect

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/fields/__init__.py:1
----> 1 from .any import AnyField
      2 from .array import ArrayField
      3 from .boolean import BooleanField

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/fields/any.py:5
      1 from __future__ import annotations
      3 import attrs
----> 5 from ..schema import Field
      8 @attrs.define(kw_only=True, repr=False)
      9 class AnyField(Field):
     10     type = "any"

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/schema/__init__.py:1
----> 1 from .field import Field
      2 from .schema import Schema
      3 from .types import *

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/schema/field.py:13
     11 from ..exception import FrictionlessException
     12 from ..metadata import Metadata
---> 13 from ..system import system
     15 if TYPE_CHECKING:
     16     from ..types import IDescriptor

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/system/__init__.py:1
----> 1 from .adapter import Adapter
      2 from .loader import Loader
      3 from .mapper import Mapper

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/system/adapter.py:5
      1 from __future__ import annotations
      3 from typing import TYPE_CHECKING, Any
----> 5 from .. import models
      7 if TYPE_CHECKING:
      8     from ..catalog import Catalog

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/frictionless/models.py:3
      1 from typing import Any, Dict, Optional
----> 3 from pydantic import BaseModel
      6 class PublishResult(BaseModel):
      7     url: Optional[str] = None

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/__init__.py:13
      3 import pydantic_core
      4 from pydantic_core.core_schema import (
      5     FieldSerializationInfo,
      6     FieldValidationInfo,
   (...)
     10     ValidatorFunctionWrapHandler,
     11 )
---> 13 from . import dataclasses
     14 from ._internal._annotated_handlers import (
     15     GetCoreSchemaHandler as GetCoreSchemaHandler,
     16 )
     17 from ._internal._annotated_handlers import (
     18     GetJsonSchemaHandler as GetJsonSchemaHandler,
     19 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/dataclasses.py:11
      7 from typing import TYPE_CHECKING, Any, Callable, Generic, NoReturn, TypeVar, overload
      9 from typing_extensions import Literal, dataclass_transform
---> 11 from ._internal import _config, _decorators, _typing_extra
     12 from ._internal import _dataclasses as _pydantic_dataclasses
     13 from ._migration import getattr_migration

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/_internal/_config.py:9
      6 from pydantic_core import core_schema
      7 from typing_extensions import Literal, Self
----> 9 from ..config import ConfigDict, ExtraValues, JsonEncoder, JsonSchemaExtraCallable
     10 from ..errors import PydanticUserError
     11 from ..warnings import PydanticDeprecatedSince20

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/config.py:9
      6 from typing_extensions import Literal, TypeAlias, TypedDict
      8 from ._migration import getattr_migration
----> 9 from .deprecated.config import BaseConfig
     10 from .deprecated.config import Extra as _Extra
     12 if TYPE_CHECKING:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-224f76c5-41c2-44bc-9c0e-c5491c4d906f/lib/python3.10/site-packages/pydantic/deprecated/config.py:6
      3 import warnings
      4 from typing import TYPE_CHECKING, Any
----> 6 from typing_extensions import Literal, deprecated
      8 from .._internal import _config
      9 from ..warnings import PydanticDeprecatedSince20

ImportError: cannot import name 'deprecated' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)

Here are the libraries I have installed ...

pyphonetics 0.5.3
pyright 1.1.294
pyrsistent 0.18.0
pytest 7.4.0
python-apt 2.4.0+ubuntu1
python-dateutil 2.8.2
python-io-wrapper 0.3.1
python-lsp-jsonrpc 1.0.0
python-lsp-server 1.7.1
python-slugify 8.0.1
pytoolconfig 1.2.2
pytz 2022.1
PyYAML 6.0.1
pyzmq 23.2.0
quantulum3 0.9.0
ratelimit 2.2.1
requests 2.28.1
requests-file 1.5.1
rfc3986 2.0.0
rich 13.5.2
rope 1.7.0
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
s3transfer 0.6.0
scikit-learn 1.1.1
scipy 1.9.1
seaborn 0.11.2
SecretStorage 3.3.1
Send2Trash 1.8.0
setuptools 63.4.1
shellingham 1.5.0.post1
simpleeval 0.9.13
six 1.16.0
soupsieve 2.3.1
sphinxcontrib-napoleon 0.7
ssh-import-id 5.11
stack-data 0.6.2
statsmodels 0.13.2
stringcase 1.2.0
structlog 23.1.0
tableschema-to-template 0.0.13
tabulate 0.9.0
tenacity 8.1.0
terminado 0.13.1
testpath 0.6.0
text-unidecode 1.3
threadpoolctl 2.2.0
tokenize-rt 4.2.1
tomli 2.0.1
tornado 6.1
traitlets 5.1.1
typer 0.9.0
typing_extensions 4.7.1
ujson 5.4.0
unattended-upgrades 0.1
Unidecode 1.3.6
urllib3 1.26.11
validators 0.21.2
virtualenv 20.16.3
wadllib 1.3.6
wcwidth 0.2.5
webencodings 0.5.1
whatthepatch 1.0.2
wheel 0.37.1
widgetsnbextension 3.6.1
xlrd 2.0.1
xlrd3 1.1.0
XlsxWriter 3.1.2
xlwt 1.3.0
yapf 0.31.0
zipp 1.0.0
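For what it's worth, typing_extensions.deprecated first appeared in typing_extensions 4.5.0, and the traceback shows the import resolving to /databricks/python/.../typing_extensions.py rather than the notebook environment's site-packages, so a stale system copy may be shadowing the 4.7.1 listed above. A small stdlib-only probe to check which copy is actually imported (a diagnostic sketch, not a fix):

```python
import importlib.util

# locate the module the interpreter would actually import
spec = importlib.util.find_spec("typing_extensions")
if spec is None:
    print("typing_extensions is not installed")
else:
    print("typing_extensions resolves to:", spec.origin)
    import typing_extensions
    print("has 'deprecated':", hasattr(typing_extensions, "deprecated"))
```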

string indices must be integers, not str

demo = RemoteCKAN('http://10.11.35.55:5050/', apikey='31ccccccc', user_agent='admin')
configuration = Configuration.create(hdx_site='prod',hdx_key="31cccccc",user_agent='admin',remoteckan=demo)
organization = Organization(configuration=configuration)
organization['name'] = 'cc'
organization['title'] = 'cc'
organization['description'] = 'cc'
organization['id'] = 'cc'
organization['test'] = 'jfccjfjf'
organization.create_in_hdx()

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python2.7/dist-packages/hdx/data/organization.py", line 117, in create_in_hdx
    self._create_in_hdx('organization', 'id', 'name')
  File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 320, in _create_in_hdx
    self.check_required_fields()
  File "/usr/local/lib/python2.7/dist-packages/hdx/data/organization.py", line 99, in check_required_fields
    self._check_required_fields('organization', ignore_fields)
  File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 204, in _check_required_fields
    for field in self.configuration[object_type]['required_fields']:
TypeError: string indices must be integers, not str

update to resource format fails without error

Below is the code snippet that produces the behavior (it targets a dataset on feature). Note that the format mapping dictionary has entries that will change "badformat" to "goodformat" and vice versa, so you don't have to edit the test dataset manually each time.

# isolated snippet that tests the ability to identify a change and make it in ckan.
# currently it executes, but the change is not recorded in ckan

import pandas as pd

from hdx.data.dataset import Dataset

format_dict = pd.read_csv('https://docs.google.com/spreadsheets/d/1-fvtlOBQF9xZ-X8yttfRaYTFhlWrwR6AQatkPEKClck/export?format=csv')
testdataset = 'cjtest200915'

test = Dataset.read_from_hdx(testdataset)

testres = test.get_resources()
print("number of resources = "+str(len(testres)))
old_type = testres[0].get_file_type()
print ("old resource format = "+old_type)

old_type_lower_case = testres[0].get_file_type().lower()
print ("lowercased = "+old_type_lower_case)

new_type = format_dict.loc[format_dict['existing_format'] == old_type_lower_case,'approved_format'].iloc[0]
print ("new resource format (from mapping doc) = "+new_type)

#update with hdx python

testres[0].set_file_type(new_type)
print("resource format after setting = "+testres[0].get_file_type())

#this approach is the one that fails to update the resource format
test.update_in_hdx(update_resources=True,hxl_update=False,ignore_check=True,operation='patch')

#this approach succeeds
#testres[0].update_in_hdx(operation='patch',ignore_check=True)


test = Dataset.read_from_hdx(testdataset)

testres = test.get_resources()
old_type = testres[0].get_file_type()
print ("resource format after updating = "+test.get_resources()[0].get_file_type())

Getting "Failed when trying to read: q=*:*! (POST)" when retrieving datasets

Hi!

I have some code that had been running fine in various environments but now returns an error (please see below).

Is there something I'm doing wrong please?

Thanks!

Environment:

  • Python 3.9.5 (see here for full environment)
  • hdx-python-api 6.0.8

Code:

from hdx.utilities.easy_logging import setup_logging
from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset

def setup_hdx_connection(agent_name):
    try:
        Configuration.create(hdx_site="prod", user_agent=agent_name, hdx_read_only=True)
    except Exception:
        print("Configuration already created, continuing ...")

setup_hdx_connection("AgentName")

datasets = Dataset.search_in_hdx()

Error:

---------------------------------------------------------------------------
CKANAPIError                              Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/hdxobject.py:115, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
    114 try:
--> 115     result = self.configuration.call_remoteckan(action, data)
    116     return True, result

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/api/configuration.py:374, in Configuration.call_remoteckan(self, *args, **kwargs)
    373 kwargs["apikey"] = apikey
--> 374 return self.remoteckan().call_action(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/ckanapi/remoteckan.py:97, in RemoteCKAN.call_action(self, action, data_dict, context, apikey, files, requests_kwargs)
     96     status, response = self._request_fn(url, data, headers, files, requests_kwargs)
---> 97 return reverse_apicontroller_action(url, status, response)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/ckanapi/common.py:134, in reverse_apicontroller_action(url, status, response)
    133 # don't recognize the error
--> 134 raise CKANAPIError(repr([url, status, response]))

CKANAPIError: ['https://data.humdata.org/api/action/package_search', 403, '<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n</body>\r\n</html>\r\n']

The above exception was the direct cause of the following exception:

HDXError                                  Traceback (most recent call last)
File <command-2615425375951422>:2
      1 print(f"Searching for ALL datasets in HDX to get datasets")
----> 2 datasets = Dataset.search_in_hdx()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/dataset.py:1098, in Dataset.search_in_hdx(cls, query, configuration, page_size, **kwargs)
   1096 rows = min(rows_left, page_size)
   1097 kwargs["rows"] = rows
-> 1098 _, result = dataset._read_from_hdx(
   1099     "dataset",
   1100     query,
   1101     "q",
   1102     Dataset.actions()["search"],
   1103     **kwargs,
   1104 )
   1105 datasets = list()
   1106 if result:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/hdx/data/hdxobject.py:120, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
    118     return False, f"{fieldname}={value}: not found!"
    119 except Exception as e:
--> 120     raise HDXError(
    121         f"Failed when trying to read: {fieldname}={value}! (POST)"
    122     ) from e

HDXError: Failed when trying to read: q=*:*! (POST)

Note that when running on desktop I get the above error, with the following error above it; yet in a browser I can access https://data.humdata.org/api/action/package_search just fine ...

---------------------------------------------------------------------------
CKANAPIError                              Traceback (most recent call last)
File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/hdx/data/hdxobject.py:115, in HDXObject._read_from_hdx(self, object_type, value, fieldname, action, **kwargs)
    114 try:
--> 115     result = self.configuration.call_remoteckan(action, data)
    116     return True, result

File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/hdx/api/configuration.py:374, in Configuration.call_remoteckan(self, *args, **kwargs)
    373 kwargs["apikey"] = apikey
--> 374 return self.remoteckan().call_action(*args, **kwargs)

File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/ckanapi/remoteckan.py:97, in RemoteCKAN.call_action(self, action, data_dict, context, apikey, files, requests_kwargs)
     96     status, response = self._request_fn(url, data, headers, files, requests_kwargs)
---> 97 return reverse_apicontroller_action(url, status, response)

File ~/opt/miniconda3/envs/ddenv/lib/python3.8/site-packages/ckanapi/common.py:134, in reverse_apicontroller_action(url, status, response)
    133 # don't recognize the error
--> 134 raise CKANAPIError(repr([url, status, response]))

CKANAPIError: ['https://data.humdata.org/api/action/package_search', 403, '<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body>\r\n<center><h1>403 Forbidden</h1></center>\r\n</body>\r\n</html>\r\n']

Python 2.7 Support

I'd suggest adding Python 2.7 support to enable use of this code by the wider humanitarian community.

It would be great to have one-click publishing of datasets to HDX from GeoNode systems (http://geonode.org/), which contain many datasets potentially useful on HDX. For instance, a popup could appear asking if you would like to publish to HDX.

GeoNode is currently written in Python 2.7 (https://github.com/geonode/geonode). GeoNode (as well as other systems) can only be upgraded to Python 3 once all of its dependencies are upgraded, which could still be years away.

A standard python API that shields the complexity of CKAN's REST API is great! However, to seamlessly integrate HDX publishing into workflows, the code will need to run directly within business systems rather than as a separate script.

Unable to remove country locations

dataset.add_country_locations and dataset.add_country_location both allow countries to be added to a dataset on HDX, but they don't clear out the existing set of countries (even if dataset['groups'] = [] is set first).
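The behavior the report wants could be sketched as a replace operation on the dataset's groups. This is illustrative Python assuming CKAN's list-of-dicts shape for groups; it is not a real hdx method:

```python
def replace_country_locations(dataset_dict, iso3_codes):
    """Replace, rather than append to, a dataset's country groups (illustrative)."""
    dataset_dict["groups"] = [{"name": code.lower()} for code in iso3_codes]

data = {"groups": [{"name": "afg"}, {"name": "syr"}]}
replace_country_locations(data, ["YEM"])
print(data["groups"])  # only the newly supplied country remains
```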

Is it possible to push datasets using `hdx_site = 'test'`?

I'm integrating HDX into an application to publish datasets regularly, but I'd like to avoid pushing datasets to the prod site when other developers work on their local machines.

Is it possible? I tried with the following script:

from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
from hdx.data.resource import Resource

import datetime

setup_logging()

Configuration.create(hdx_site='stage', user_agent='A_Quick_Example', hdx_key='hdx key from my prod site')

dataset = Dataset({
    'name': 'demo dataset',
    'private': True,
    'title': 'demo dataset title',
    'notes': 'demo dataset notes',
    'license_id': 'ODC-ODbL',
    'methodology': 'Registry',
    'data_update_frequency': 'Every day',
    'dataset_date': datetime.datetime.now().strftime('%Y-%m-%d'),
    'dataset_source': 'Angostura',
})

dataset.set_maintainer('196196be-6037-4488-8b71-d786adf4c081') # A user that updated the ucdp-data-for-australia dataset from https://stage.data-humdata-org.ahconu.org/
dataset.set_organization('hdx') # after digging for some organization from https://stage.data-humdata-org.ahconu.org/
dataset.add_country_location('VEN')
dataset.add_tag('americas')

resource = Resource({
    'name': 'test',
    'description': 'description',
    'format': 'CSV'
})
resource.set_file_to_upload('sample.csv')
dataset.add_update_resource(resource)

dataset.create_in_hdx()

but that gives me the following error:

Traceback (most recent call last):
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 316, in _write_to_hdx
    return self.configuration.call_remoteckan(self.actions()[action], data, files=files)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/hdx_configuration.py", line 307, in call_remoteckan
    return self.remoteckan().call_action(*args, **kwargs)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/remoteckan.py", line 87, in call_action
    return reverse_apicontroller_action(url, status, response)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/common.py", line 128, in reverse_apicontroller_action
    raise NotAuthorized(err)
ckanapi.errors.NotAuthorized: {'message': 'Access denied: <function package_create at 0x7f5b257bbb90> requires an authenticated user', '__type': 'Authorization Error'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/dataset.py", line 565, in create_in_hdx
    self._save_to_hdx('create', 'name', force_active=True)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 343, in _save_to_hdx
    result = self._write_to_hdx(action, self.data, id_field_name, file_to_upload)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 322, in _write_to_hdx
    raisefrom(HDXError, 'Failed when trying to %s%s! (POST)' % (action, idstr), e)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/utilities/__init__.py", line 28, in raisefrom
    six.raise_from(exc_type(message), exc)
  File "<string>", line 3, in raise_from
hdx.data.hdxobject.HDXError: Failed when trying to create demo dataset! (POST)
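One thing worth checking: API keys are issued per site, so a key copied from the prod site will generally not authenticate against stage; a key generated on the stage site itself would be needed. A sketch of keeping one key per site in the environment (the variable names `HDX_SITE` and `HDX_KEY_*` are assumptions, not library conventions):

```python
import os

def hdx_key_for(site):
    """Look up a per-site API key from the environment (hypothetical naming scheme)."""
    return os.getenv(f"HDX_KEY_{site.upper()}")

# e.g. HDX_KEY_STAGE for hdx_site='stage', HDX_KEY_PROD for hdx_site='prod'
site = os.getenv("HDX_SITE", "stage")
hdx_key = hdx_key_for(site)
```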

Python 3 required when installing hdx-python-api==4.3.0

This was working until now. I installed it on the same OS version and the same Python 2.7, and it suddenly stopped working.

Collecting libhxl>=4.24.1
  Downloading libhxl-4.24.1.tar.gz (91 kB)
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-9T48_N/libhxl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-9T48_N/libhxl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-Avj4GK
         cwd: /tmp/pip-install-9T48_N/libhxl/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-9T48_N/libhxl/setup.py", line 7, in <module>
        raise RuntimeError("libhxl requires Python 3 or higher")
    RuntimeError: libhxl requires Python 3 or higher
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.3.3; however, version 20.3.4 is available.
You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.

Our old application is built on Python 2. Please help me fix this. Thanks

Error handling shouldn't hide exceptions when interacting with CKAN

  File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 289, in _save_to_hdx
    result = self._write_to_hdx(action, self.data, id_field_name, file_to_upload)
  File "/usr/local/lib/python2.7/dist-packages/hdx/data/hdxobject.py", line 271, in _write_to_hdx
    six.raise_from(HDXError('Failed when trying to %s %s! (POST)' % (action, self.data[id_field_name])), e)
  File "/usr/local/lib/python2.7/dist-packages/six.py", line 718, in raise_from
    raise value
HDXError: Failed when trying to update 1d55221a-5fda-459c-9295-54185f392e81! (POST)

When invalid values are POSTed to the CKAN API, it fails this way: the actual CKAN exception has useful information about why the request failed, but it is squelched when caught and re-raised as HDXError.

Configuration stored as connection objects instead of on module singleton

Configurations are stored as global state on the Configuration class, for example:

Configuration.create(hdx_site='prod')

I would prefer it if Configurations were individually instantiated as "connection" objects. This would make application code easier to unit test and more decoupled. A good example is the AWS Boto API, e.g.

from boto.s3.connection import S3Connection
conn = S3Connection('', '')
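A sketch of what a connection-style API could look like for hdx; HDXConnection is purely illustrative and not a real class in the library:

```python
class HDXConnection:
    """Holds configuration as instance state instead of module-level globals (sketch)."""

    def __init__(self, hdx_site, hdx_key=None):
        self.hdx_site = hdx_site
        self.hdx_key = hdx_key

    def base_url(self):
        # hypothetical mapping; real site URLs would come from configuration
        return f"https://{self.hdx_site}.example.org"

# two independent connections can coexist, which also simplifies unit testing:
prod = HDXConnection("prod")
demo = HDXConnection("demo")
```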

Unable to instantiate Configuration

Hi there!
I am facing some issues while instantiating the configuration. I was using hdx-python-api 2.1.5 before:

HDX_URL_PREFIX = Configuration.create(
    hdx_site=os.getenv('HDX_SITE', 'demo'),
    hdx_key=os.getenv('HDX_API_KEY'),
    user_agent="my_user_agent_name"
)

Now I am trying to upgrade to the latest version of this library, and while doing so I am getting an error like this:

TypeError: __init__() got an unexpected keyword argument 'allowed_methods'

from line 176 of hdx > utilities > session.py > get_session:

    retries = Retry(
        total=5,
        backoff_factor=0.4,
        status_forcelist=status_forcelist,
        allowed_methods=allowed_methods,
        raise_on_redirect=True,
        raise_on_status=True,
    )

When I disable this code block and avoid passing allowed_methods, like this,

    retries = Retry(
        total=5,
        backoff_factor=0.4,
        status_forcelist=status_forcelist,
        # allowed_methods=allowed_methods,
        raise_on_redirect=True,
        raise_on_status=True,
    )

it works fine for me, but I am looking for a better suggestion if there is one. Feel free to suggest a different way or point me to docs where I can read about this.

Full Traceback :

 File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 648, in create
  configuration=configuration, remoteckan=remoteckan, **kwargs
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 607, in _create
  cls._configuration.setup_session_remoteckan(remoteckan, **kwargs)
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 472, in setup_session_remoteckan
  full_agent=self.get_user_agent(), **kwargs
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/api/configuration.py", line 443, in create_session_user_agent
  **kwargs,
File "/Users/kshitij/opt/anaconda3/envs/export/lib/python3.6/site-packages/hdx/utilities/session.py", line 176, in get_session
  raise_on_status=True,
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
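For what it's worth, `allowed_methods` was only added to urllib3's `Retry` in urllib3 1.26 (replacing the deprecated `method_whitelist`), so this error suggests an older urllib3 in the Python 3.6 environment. A version-agnostic sketch (not the library's actual code) that picks whichever keyword the installed urllib3 supports:

```python
import inspect

from urllib3.util.retry import Retry

# Sketch: choose the retry-method keyword supported by the installed
# urllib3 ("allowed_methods" since 1.26, "method_whitelist" before that).
retry_kwargs = dict(
    total=5,
    backoff_factor=0.4,
    raise_on_redirect=True,
    raise_on_status=True,
)
methods = ["HEAD", "GET", "POST", "PUT", "DELETE", "OPTIONS"]  # example list
if "allowed_methods" in inspect.signature(Retry.__init__).parameters:
    retry_kwargs["allowed_methods"] = methods
else:
    retry_kwargs["method_whitelist"] = methods
retries = Retry(**retry_kwargs)
```

The simpler fix, if the environment allows it, is upgrading urllib3 itself (`pip install "urllib3>=1.26"`).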

Push data to HDX

I have obtained an HDX API key and I'm trying to push a CKAN dataset to HDX programmatically.

from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset

conf = Configuration(
        hdx_site='prod',
        user_agent='admin',
        hdx_key=hdx_api_key
        )
dataset_class_object = Dataset(initial_data=data, configuration=conf)
resources = dataset_class_object.get_resources()
dataset_class_object.check_required_fields(['dataset_source', 'maintainer', 'dataset_date', 'data_update_frequency', 'groups', 'methodology'])
dataset_class_object.create_in_hdx()

ERROR [ckan.views.api] Field dataset_source is missing in dataset!
Traceback (most recent call last):
File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/views/api.py", line 288, in action
result = function(context, request_data)
File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/logic/__init__.py", line 464, in wrapped
result = _action(context, data_dict, **kw)
File "/home/ljupka/ckan_custom/lib/default/src/ckanext-custom/ckanext/custom/logic/action/get.py", line 1837, in push_dataset_to_hdx
dataset_class_object.create_in_hdx()
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 512, in create_in_hdx
self.check_required_fields(allow_no_resources=allow_no_resources)
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 352, in check_required_fields
self._check_required_fields('dataset', ignore_fields)
File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/hdxobject.py", line 208, in _check_required_fields
raise HDXError('Field %s is missing in %s!' % (field, object_type))
HDXError: Field dataset_source is missing in dataset!
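As the traceback says, HDX refuses to create a dataset until certain metadata fields are present. A minimal standalone sketch of that pre-flight check (the field names are taken from the error and the check_required_fields call above; this is illustrative, not hdx library code):

```python
# Required metadata keys, per the HDXError above (illustrative list).
REQUIRED_FIELDS = [
    "dataset_source",
    "maintainer",
    "dataset_date",
    "data_update_frequency",
    "groups",
    "methodology",
]

def missing_fields(data, required=REQUIRED_FIELDS):
    """Return the required fields absent from a dataset metadata dict."""
    return [field for field in required if field not in data]

data = {"name": "my-dataset", "title": "My Dataset"}
print(missing_fields(data))  # every required field is still missing
```

Populating each of these keys on the dict passed as initial_data before calling create_in_hdx() should clear the error.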

Database link in readme.md does not exist

The database file "ACLED Conflict Data for Africa 1997-2016" used in the usage example in readme.md no longer exists, so the example code fails.

Fix for broken mark_data_updated

mark_data_updated is broken due to an error in dataset_update_filestore_resource, in which timezone information was incorrectly added to the ISO-formatted string by the following change, made to replace the deprecated utcnow() function:

datetime.utcnow().isoformat(timespec="microseconds")
'2024-04-25T22:42:58.501073'

became:

datetime.now(timezone.utc).isoformat(timespec="microseconds")
'2024-04-25T22:43:10.746691+00:00'
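One way (a sketch, not necessarily the fix that was merged) to keep the non-deprecated timezone-aware call while producing the old naive-UTC string shape is to drop the tzinfo before formatting:

```python
from datetime import datetime, timezone

# Timezone-aware "now" in UTC, with tzinfo dropped so isoformat() emits
# the same shape as the old datetime.utcnow() call (no "+00:00" suffix).
stamp = (
    datetime.now(timezone.utc)
    .replace(tzinfo=None)
    .isoformat(timespec="microseconds")
)
print(stamp)  # e.g. '2024-04-25T22:42:58.501073'
```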

Intermittent stream_path error on downloading some files

Hi HDX Team!

I am carrying out analysis on some datasets from Kenya, part of which requires I download tabular data. This works really well most of the time, but in calling ...

dataset = Dataset.read_from_hdx(row["id"])
resources = dataset.get_resources()
for resource in resources:
    url, path = resource.download(dir)

We sometimes get ...

Traceback (most recent call last):
File "", line 744, in download_data
url, path = resource.download(dir)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/data/resource.py", line 515, in download
path = downloader.download_file(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/utilities/downloader.py", line 428, in download_file
return self.stream_path(
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-dbff6b46-9075-4f38-a1a6-07585d60da84/lib/python3.9/site-packages/hdx/utilities/downloader.py", line 347, in stream_path
raise DownloadError(errormsg) from e
hdx.utilities.base_downloader.DownloadError: Download of https://data.humdata.org/dataset/db3e1a76-76d8-4206-9e5e-382336c51472/resource/562aad8a-2d68-4560-8c8b-237459958183/download/dtm_kenya_b2_baseleine_multi_sectoral_assessment_nov_2022.xlsx failed in retrieval of stream!

However, if I use the URL in a browser ...

https://data.humdata.org/dataset/db3e1a76-76d8-4206-9e5e-382336c51472/resource/562aad8a-2d68-4560-8c8b-237459958183/download/dtm_kenya_b2_baseleine_multi_sectoral_assessment_nov_2022.xlsx

It seems to work fine and I get a file.

I am using hdx-python-api==6.0.8 and Python 3.9.5.

Thanks a lot!
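Since the same URL works in a browser, the failures look transient (a server or network hiccup during streaming). A hypothetical workaround is to wrap resource.download() in simple retries with backoff (download_with_retries is my own helper, not part of the library):

```python
import time

def download_with_retries(resource, folder, attempts=3, delay=2.0):
    """Retry resource.download() on transient failures with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return resource.download(folder)
        except Exception:  # hdx raises DownloadError; catching broadly here
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
```

Used in the loop above as: url, path = download_with_retries(resource, dir)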

hdx.get_validlocations() network call should not be part of Configuration.create

It would be better if this external network call was an explicit method on the Configuration. Configuration objects should be able to be used in environments without network access. It is also easy to make the mistake of calling Configuration.create many times without realizing that each call results in a network request, resulting in hundreds or more superfluous requests to the CKAN API.

This is somewhat related to #7 - if it was explicit that Configuration.create is only called once, and the resulting object was passed around, repeated calls to get_validlocations would be less of an issue
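Until that changes, one mitigation on the caller's side is to memoize the configuration so the network call happens only once, however many code paths ask for it. A sketch of the pattern (expensive_create stands in for Configuration.create(); it is not a real API):

```python
_config = None

def expensive_create():
    """Stand-in for Configuration.create(), which hits the network."""
    return object()

def get_config():
    """Create the configuration on first use, then reuse the same object."""
    global _config
    if _config is None:
        _config = expensive_create()
    return _config
```

Callers then use get_config() everywhere instead of calling Configuration.create() directly.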
