dr-leo / pandasdmx



Python interface to SDMX

License: Apache License 2.0

Python 100.00%


pandasdmx's Issues

Error with concat=True

get_data(..., concat=False) works, but I get an error when I set get_data(..., concat=True). I'm not sure exactly what is going on here, but the error is below.

Python 2.7.8 (default, Dec 14 2014, 15:11:16)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin

In[2]: from pandasdmx import client
In[3]: client("Eurostat").get_data("namq_10_gdp",".CLV10_MEUR.SWDA.B1GQ.DE",concat=True)

Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
client("Eurostat").get_data("namq_10_gdp",".CLV10_MEUR.SWDA.B1GQ.DE",concat=True)
File "/usr/local/lib/python2.7/site-packages/pandasdmx.py", line 507, in get_data
[code_sets[k] for k in sorted_keys])
File "/usr/local/lib/python2.7/site-packages/pandas/core/index.py", line 3576, in from_product
labels = cartesian_product([c.codes for c in categoricals])
File "/usr/local/lib/python2.7/site-packages/pandas/tools/util.py", line 33, in cartesian_product
a[0] = 1
IndexError: index 0 is out of bounds for axis 0 with size 0
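The traceback suggests that one of the code sets passed to MultiIndex.from_product was empty, so the Cartesian product had no rows and the pandas cartesian_product helper of that era failed with an IndexError. A minimal stdlib sketch, assuming the codes of each dimension are held in a dict keyed by dimension ID (the names code_sets and sorted_keys mirror the traceback; product_of_codes is hypothetical), that fails early with a readable message instead:

```python
from itertools import product


def product_of_codes(code_sets, sorted_keys):
    """Return all code combinations across dimensions, in key order.

    Hypothetical helper: raises a ValueError naming the offending
    dimensions instead of letting an empty (or missing) code set reach
    the Cartesian product.
    """
    empty = [k for k in sorted_keys if not code_sets.get(k)]
    if empty:
        raise ValueError("no codes for dimension(s): " + ", ".join(empty))
    return list(product(*(code_sets[k] for k in sorted_keys)))
```

For example, product_of_codes({"FREQ": ["Q"], "GEO": ["DE", "FR"]}, ["FREQ", "GEO"]) yields [("Q", "DE"), ("Q", "FR")].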

Documentation on readthedocs is broken

The code examples in the HTML documentation on readthedocs contain various tracebacks that don't look like they are supposed to be there, e.g. in http://pandasdmx.readthedocs.io/en/master/usage.html#extracting-the-dataflows-in-a-particular-category

In [9]: list(cat_response.categoryscheme.MOBILE_NAVI['07'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-be40f9ac125d> in <module>()
----> 1 list(cat_response.categoryscheme.MOBILE_NAVI['07'])

~/checkouts/readthedocs.org/user_builds/pandasdmx/envs/master/lib/python3.5/site-packages/pandaSDMX-0.7.0-py3.5.egg/pandasdmx/model.py in __iter__(self)
    420         m = self._reader.message
    421         # We assume that only dataflow definitions are categorised.
--> 422         resource = m.dataflow
    423         idx = (self._reader.read_as_str('cat_scheme_id', self), self.id)
    424         return (resource[c.artefact.id] for c in m._categorisation[idx])

~/checkouts/readthedocs.org/user_builds/pandasdmx/envs/master/lib/python3.5/site-packages/pandaSDMX-0.7.0-py3.5.egg/pandasdmx/model.py in __getattr__(self, name)
    716             return getattr(self, old2new[name])
    717         else:
--> 718             raise AttributeError
    719 
    720 

AttributeError:

ENH: Add support for requests-cache to cache received files at backend level

This feature is already available in

https://github.com/pydata/pandas-datareader

Consider making pandas-datareader a dependency of pandaSDMX to leverage its work.
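requests-cache's own API is not shown in this issue. As a stdlib-only illustration of the idea, and explicitly not requests-cache itself, a response cache keyed by URL with an expiry time might look like this. CachedFetcher and its fetch argument are hypothetical; fetch stands in for whatever function actually downloads a URL.

```python
import time


class CachedFetcher:
    """Cache raw responses by URL for ``expire_after`` seconds.

    Illustrative only: ``fetch`` is any callable mapping a URL to bytes,
    and ``clock`` is injectable so that expiry can be tested.
    """

    def __init__(self, fetch, expire_after=3600.0, clock=time.monotonic):
        self._fetch = fetch
        self._expire_after = expire_after
        self._clock = clock
        self._store = {}  # url -> (time fetched, content)

    def get(self, url):
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._expire_after:
            return hit[1]  # still fresh: no network round trip
        content = self._fetch(url)
        self._store[url] = (now, content)
        return content
```

A persistent backend (e.g. SQLite) would replace the in-memory dict; that is the part a library like requests-cache provides.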

ENH: add SDMX-JSON reader

The SDMX community is working on a JSON alternative to XML-based SDMX-ML.

https://github.com/sdmx-twg

The data-related parts are at an advanced stage. The OECD is using this draft format in beta mode. The ECB uses it as well, although we do not depend on it, as the ECB also offers XML. The OECD offers SDMX-2.1 data as JSON only. Thus, a reader for JSON would give pandaSDMX immediate access to OECD data. More agencies are likely to adopt JSON as a simpler replacement for XML.

An SDMX-JSON reader for pandaSDMX is thus a high priority. It could be implemented as follows:

Start from a copy of data2pandas.py.
Use jsonpath-rw (https://pypi.python.org/pypi/jsonpath-rw/) instead of LXML. The XPath expressions would have to be replaced by their jsonpath equivalents, if possible. I haven't checked this.

Perhaps tweak the reader methods to interact properly with jsonpath.
api.py: extend the metadata on agencies by adding a 'reader' item so that each agency can be assigned a reader class. The get_reader method needs to be adjusted; currently it instantiates the only reader, as there is no choice anyway.
Test with sample data provided on the GitHub page of the SDMX Technical Working Group.
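To make the target format concrete, here is a minimal sketch that walks an SDMX-JSON data message with the stdlib json module instead of jsonpath-rw. It assumes the draft layout in which dataSets[0].series maps colon-separated index strings (e.g. "0:1") to series, each index pointing into structure.dimensions.series, and in which each observation key indexes structure.dimensions.observation, with the observation value first in its list. Function names are hypothetical.

```python
import json


def iter_observations(message):
    """Yield (series_key, period, value) from a parsed SDMX-JSON message."""
    dims = message["structure"]["dimensions"]
    series_dims = dims["series"]
    obs_dim = dims["observation"][0]  # assumed to be the time dimension
    for skey, series in message["dataSets"][0]["series"].items():
        idx = [int(i) for i in skey.split(":")]
        # Map each positional index to the code ID of that dimension.
        key = {d["id"]: d["values"][i]["id"] for d, i in zip(series_dims, idx)}
        for okey, obs in series.get("observations", {}).items():
            period = obs_dim["values"][int(okey)]["id"]
            yield key, period, obs[0]  # value first; attributes follow


def read_sdmx_json(text):
    """Parse a JSON string and return all observations as a list."""
    return list(iter_observations(json.loads(text)))
```

A real reader would also need to handle dataset-level dimensions, attributes and missing entries, which this sketch ignores.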

AttributeError: __exit__

Hi,

My project was working fine until this afternoon.
Since then, the following code has been raising "AttributeError: __exit__":

from pandasdmx import Request
esta = Request('ESTAT')
une_resp = esta.get('data','demo_gind')

I attached a screenshot of the error.

Do you have any idea what is causing this?

Thank you

issue with OECD

Hi Dr. Leo,

Thank you so much for your work; it is really useful!

I'm using pandasdmx to get data from the OECD.
Unlike with other agencies such as Eurostat or the ECB, I'm struggling with the OECD.
I get the following errors and have not been able to solve them:

Listing the dataflows

When I want to get the list of dataflows:
flows=oecd.dataflow()
it does not work because the URL doesn't exist.
Here is the error message:

400 Client Error: Semantic Error for url: http://stats.oecd.org/SDMX-JSON/dataflow/OECD

Getting the data

Then, when I want to get the data (the dataset does exist):
resp = Request('OECD').data('TSE_2010', key='ALL', params={'startPeriod': '2007'})
File "", line unknown
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

Another example:
resp = Request('OECD').data('IDD', key='ALL', params={'startPeriod': '2007'})
File "", line unknown
XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

Could you help me solve this problem?
I saw in another issue that your code works for the OECD, and I think I have the latest version of your library, so I don't know what I'm doing wrong.

Thank you very much,

Christelle
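Both XMLSyntaxError messages say the very first character was not "<", which is consistent with the server answering in something other than XML; the OECD URLs in the error messages point at an SDMX-JSON endpoint. A small stdlib sketch (sniff_format is hypothetical) that inspects the start of a response body before choosing a parser:

```python
def sniff_format(body):
    """Guess whether a response body (bytes) is JSON or XML.

    Returns "json", "xml" or "unknown". A UTF-8 byte-order mark and
    leading whitespace are skipped before looking at the first byte.
    """
    if body.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        body = body[3:]
    head = body.lstrip()[:1]
    if head in (b"{", b"["):
        return "json"
    if head == b"<":
        return "xml"
    return "unknown"  # e.g. a plain-text error page
```

Logging the result of such a check next to the request URL makes it obvious when an XML reader is being fed JSON or an error page.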

SDMX 2.0 issue maybe?

Hi Dr Leo,
Great work in concept, execution and documentation; I am really excited about what I can do with this tool.
I have encountered a problem that might be related to SDMX 2.0. A workaround or any support would be greatly appreciated.
Here's the code:
r=Request()
response=r.get(url='http://stat.abs.gov.au/restsdmx/sdmx.ashx/GetData/HF/2.0.99.10.132_3.M/ABS')
and the error:
ValueError: Unsupported root tag: {http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}MessageGroup
Happy to help add providers as I seek out more data sources.
Thanks in advance
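The error names the root tag of an SDMX 2.0 message (MessageGroup in the v2_0 message namespace), which suggests the reader only recognises SDMX 2.1 roots. A stdlib sketch of detecting the SDMX-ML version from the root element before choosing a reader; the 2.0 URI is copied from the error above, the 2.1 URI is the standard SDMX 2.1 message namespace, and sdmx_version is hypothetical.

```python
import xml.etree.ElementTree as ET

# Keys are lower-cased: the 2.0 namespace is written in mixed case.
SDMX_MESSAGE_NS = {
    "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message": "2.1",
    "http://www.sdmx.org/resources/sdmxml/schemas/v2_0/message": "2.0",
}


def sdmx_version(xml_bytes):
    """Return (version or None, local root tag) for an SDMX-ML message."""
    root = ET.fromstring(xml_bytes)
    if root.tag.startswith("{"):
        ns, _, local = root.tag[1:].partition("}")
    else:
        ns, local = "", root.tag  # no namespace at all
    return SDMX_MESSAGE_NS.get(ns.lower()), local
```

For large messages, xml.etree.ElementTree.iterparse could stop after the first start event instead of parsing the whole document.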

Issue with IMF

Hi Dr. Leo,

Thank you for developing the very useful pandasdmx tool!

Unfortunately, I'm having some issues downloading data from the IMF database. Choosing the right codes from the codelists has worked so far, but passing them to the data request returns an empty result:

import pandasdmx
from pandasdmx import Request
imf = Request('IMF_SDMXCENTRAL')
resp = imf.data('UEM', key = {'REF_AREA':'AT'})
data = resp.write()

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-8ccba3e6ac87> in <module>()
----> 1 data = resp.write()

C:\ProgramData\Anaconda3\lib\site-packages\pandasdmx\api.py in write(self, source, **kwargs)
    633         if not source:
    634             source = self.msg
--> 635         return self._writer.write(source=source, **kwargs)
    636 
    637     def write_source(self, filename):

C:\ProgramData\Anaconda3\lib\site-packages\pandasdmx\writer\data2pandas.py in write(self, source, asframe, dtype, attributes, reverse_obs, fromfreq, parse_time)
    114                     pd_series, pd_attributes = zip(*series_list)
    115                 elif dtype:
--> 116                     key_fields = series_list[0].name._fields
    117                     pd_series = series_list
    118                 elif attributes:

IndexError: list index out of range

When I try to get an overview of the dataset, this happens:
data = resp.data
data.columns.names

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-0786dd9c8bac> in <module>()
----> 1 data.columns.names

AttributeError: 'DataSet' object has no attribute 'columns'

data.series
series_l = list(data.series)

It seems as if the dataset I'm downloading is empty. I have tested the same code with data from ESTAT and had no problems using it.
Do you have any idea what's wrong?

With best regards,
Marco
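In the write traceback, series_list[0] is read before anything checks that the list is non-empty, so an empty response surfaces as a bare IndexError. A hypothetical guard (not the library's code) that turns this into an explicit message:

```python
def first_series(series_list):
    """Return the first series, or explain why there is none.

    Hypothetical guard for the ``series_list[0]`` lookup in the
    traceback: an empty list means the response contained no series.
    """
    if not series_list:
        raise ValueError(
            "the response contains no series; check the dataflow ID and "
            "the key, or inspect the raw SDMX message")
    return series_list[0]
```

The writer could then read the key fields from first_series(series_list) and fail with this message instead of an IndexError.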

Need Conda package for pandaSDMX

Conda is the de facto tool for packaging and distributing the dependencies of Jupyter Notebook components. pip is not installed by default in the datascience-notebook images: https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook

It is still possible to work around this by installing pip first and then pandaSDMX:

FROM jupyter/datascience-notebook

RUN conda install -y pip
RUN pip install pandasdmx

We should certainly have an up-to-date package in conda, as data science notebooks are probably a major platform for users of pandaSDMX (myself included).

I found this documentation on distributing an existing pip-ready project with conda:
http://conda.pydata.org/docs/build_tutorials/pkgs.html

@dr-leo, what's your view on this?

jsonpath-rw path sha256 mismatch error

I installed pandaSDMX through the Anaconda prompt and it gave me the following error.

(base) C:\Users\andras>conda install -c alcibiade pandasdmx
Solving environment: done

## Package Plan ##

  environment location: C:\Users\andras\Anaconda3

  added / updated specs:
    - pandasdmx


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    jsonpath-rw-1.4.0          |           py36_0          25 KB  alcibiade
    pandasdmx-0.7.0            |           py36_0          89 KB  alcibiade
    ------------------------------------------------------------
                                           Total:         114 KB

The following NEW packages will be INSTALLED:

    jsonpath-rw: 1.4.0-py36_0 alcibiade
    pandasdmx:   0.7.0-py36_0 alcibiade

Proceed ([y]/n)? y


Downloading and Extracting Packages
jsonpath-rw 1.4.0: ############################################################################################ | 100%
pandasdmx 0.7.0: ####################################################################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: -
SafetyError: The package for jsonpath-rw located at C:\Users\andras\Anaconda3\pkgs\jsonpath-rw-1.4.0-py36_0
appears to be corrupted. The path 'Scripts/jsonpath.bat'
has a sha256 mismatch.
  reported sha256: 34df90d6d81192abec268c483cb423d4a8068fc45be7fe97b9f7ca251c455fe6
  actual sha256: 70c065bcbbd56aea542f169e56b6d43ddaad81863acdb2ff9bcf2be16c8c43b8

ClobberError: This transaction has incompatible packages due to a shared path.
  packages: alcibiade::jsonpath-rw-1.4.0-py36_0, alcibiade::jsonpath-rw-1.4.0-py36_0
  path: 'scripts/jsonpath-script.py'




done
Executing transaction: done

Nonetheless, I was able to import the library and execute basic commands.

In [14]: from pandasdmx import Request

ecb = Request('ECB')

ecb
Out[15]: <pandasdmx.api.Request at 0x2d0b949df28>

ecb.categoryscheme()
Out[16]: <pandasdmx.api.Response at 0x2d0b94a7ac8>

ecb_response = ecb.categoryscheme()

ecb_response
Out[18]: <pandasdmx.api.Response at 0x2d0b6c0a748>

ecb_response.url
Out[19]: 'http://sdw-wsrest.ecb.int/service/categoryscheme/latest?references=parentsandsiblings'
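To check which of the two digests in the SafetyError matches the file actually on disk (e.g. Scripts/jsonpath.bat inside the package directory named in the message), the SHA-256 can be recomputed with the stdlib. A minimal sketch; the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with an expected hex value, ignoring case."""
    return sha256_of(path) == expected_hex.lower()
```

If the file matches the "actual" digest rather than the "reported" one, the package metadata, not the download, is the likely culprit.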

Error: tempfile.SpooledTemporaryFile constructor does not have encoding keyword in 2.7

>>> metadata = pandasdmx.Request('ESTAT').datastructure('DSD_nrg_pc_204').write()

/home/kotzer/anaconda2/lib/python2.7/site-packages/pandasdmx/api.pyc in get(self, resource_type, resource_id, agency, key, params, headers, fromfile, tofile, url, get_footer_url, memcache, writer)
    295             'Requesting resource from URL/file %s', (base_url or fromfile))
    296         source, url, resp_headers, status_code = self.client.get(
--> 297             base_url, params=params, headers=headers, fromfile=fromfile)
    298         logger.info(
    299             'Loaded file into memory from URL/file: %s', (url or fromfile))

/home/kotzer/anaconda2/lib/python2.7/site-packages/pandasdmx/remote.pyc in get(self, url, fromfile, params, headers)
     96         else:
     97             source, final_url, resp_headers, status_code = self.request(
---> 98                 url, params=params, headers=headers)
     99         return source, final_url, resp_headers, status_code
    100 

/home/kotzer/anaconda2/lib/python2.7/site-packages/pandasdmx/remote.pyc in request(self, url, params, headers)
    124                 else:
    125                     enc, fmode = None, 'w+b'
--> 126                 source = STF(max_size=self.max_size, mode=fmode, encoding=enc)
    127                 for c in response.iter_content(chunk_size=1000000,
    128                                                decode_unicode=bool(enc)):

TypeError: __init__() got an unexpected keyword argument 'encoding'
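As the title says, Python 2.7's SpooledTemporaryFile accepts no encoding argument (Python 3's does). One version-independent approach is to always spool in binary mode and encode text chunks manually. A sketch under that assumption; spool_chunks is hypothetical and only mirrors the shape of the loop in remote.py.

```python
from tempfile import SpooledTemporaryFile


def spool_chunks(chunks, max_size=2 ** 24, encoding="utf-8"):
    """Spool an iterable of text/bytes chunks into a binary temp file.

    No ``encoding`` keyword is passed to SpooledTemporaryFile, so this
    works on Python 2.7 and 3. Non-bytes chunks (unicode on Python 2,
    str on Python 3) are encoded before writing.
    """
    source = SpooledTemporaryFile(max_size=max_size, mode="w+b")
    for chunk in chunks:
        if not isinstance(chunk, bytes):
            chunk = chunk.encode(encoding)
        source.write(chunk)
    source.seek(0)  # rewind so the caller can read from the start
    return source
```

The reader then always receives bytes, which XML parsers such as lxml handle directly.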

Working with custom / unlisted agency

Hello!

I wasn't quite sure where to post this and got a bit confused looking through all the docs and reading through the source. Basically I'm trying to use this site as a source: http://andmebaas.stat.ee/. And more specifically this table: http://andmebaas.stat.ee/Index.aspx?DataSetCode=VK10_3&lang=en.

Here is what I've tried, but I keep getting stuck:

from pandasdmx import *
r = Request()
ds = r.get(url="http://andmebaas.stat.ee/restsdmx/sdmx.ashx/GetDataStructure/VK10_3", resource_type='datastructure')
ds.dimensions

To which I get:

AttributeError: 'StructureMessage' object has no attribute 'dimensions'

Any help would be super appreciated, thanks!

Pandas Writer failure on INSEE responses

I'm calling INSEE data series like this:

import pandasdmx as pd
insee = pd.Request('INSEE')
resp_catABC = insee.data(resource_id='SERIES_BDM/001572432+001572433+001572434')
resp_catABC.write()

The outcome is:

jovyan@aba7a04d4412:~/work$ python3 test.py 
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    resp_catABC.write()
  File "/opt/conda/lib/python3.5/site-packages/pandasdmx/api.py", line 491, in write
    return self._writer.write(source=source, **kwargs)
  File "/opt/conda/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py", line 109, in write
    reverse_obs, fromfreq, parse_time))
  File "/opt/conda/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py", line 107, in <genexpr>
    series_list = list(s for s in self.iter_pd_series(
  File "/opt/conda/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py", line 184, in iter_pd_series
    elif freq_key in series.key._fields:
UnboundLocalError: local variable 'freq_key' referenced before assignment

I'm using Python 3.5.2 with pandasdmx 0.5.2 (in a jupyter/datascience-notebook docker container)

Am I doing something wrong here, or is it a bug?

I checked, and the FREQ attribute is set to 'M' in the response XML, so the code in data2pandas should work.

Thanks for the support!
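An UnboundLocalError like this usually means a variable is assigned in only some branches of an if/elif chain, and a later branch reads it. The actual data2pandas code is not shown here; the hypothetical sketch below only illustrates the pattern of binding the lookup result before the branches, so that a missing FREQ field produces (None, None) rather than a crash. The candidate field names are an assumption.

```python
def find_freq_field(attrib_fields, key_fields, candidates=("FREQ", "FREQUENCY")):
    """Return (source, field) telling where a series stores its frequency.

    source is "attributes", "key" or None. Both results are bound up
    front, so no code path can reference them unassigned.
    """
    freq_key, source = None, None
    for name in candidates:
        if name in attrib_fields:
            freq_key, source = name, "attributes"
            break
        elif name in key_fields:
            freq_key, source = name, "key"
            break
    return source, freq_key
```

The caller can then decide explicitly what to do when no frequency is found, e.g. fall back to string periods.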

Frequency issue with Insee data

Hi,

I am trying to download data from INSEE, but I am having trouble downloading IPCH data.
When trying to retrieve the data, I get the following error:

/~/python3.5/site-packages/pandas/core/indexes/period.py in _assert_can_do_setop(self, other)
    955         if self.freq != other.freq:
    956             msg = _DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr)
--> 957             raise IncompatibleFrequency(msg)
    958 
    959     def _wrap_union_result(self, other, result):

IncompatibleFrequency: Input has different freq=M from PeriodIndex(freq=A-DEC)

Here is a MWE to reproduce the error:

from pandasdmx import Request
insee = Request('INSEE')
ipch = 'IPCH-2015'
data = insee.data(resource_id = ipch,key={'FREQ':'M','COICOP2016':'032'})

Thank you very much!

cc @Asigwalt

pandaSDMX queries nonexistent OECD URLs

Whenever I try to query the OECD, the request fails with a 404 (or 400) client error:

>oecd=Request('OECD')
>oecd.categoryscheme()
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-9-daef2c78d60b> in <module>()
----> 1 oecd.categoryscheme()
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/api.pyc in get(self, resource_type, resource_id, agency, version, key, params, headers, fromfile, tofile, url, get_footer_url, memcache, writer)
    319             'Requesting resource from URL/file %s', (base_url or fromfile))
    320         source, url, resp_headers, status_code = self.client.get(
--> 321             base_url, params=params, headers=headers, fromfile=fromfile)
    322         if source is None:
    323             raise SDMXException('Server error:', status_code, url)
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/remote.pyc in get(self, url, fromfile, params, headers)
     96         else:
     97             source, final_url, resp_headers, status_code = self.request(
---> 98                 url, params=params, headers=headers)
     99         return source, final_url, resp_headers, status_code
    100 
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/remote.pyc in request(self, url, params, headers)
    140             code = int(response.status_code)
    141             if 400 <= code <= 499:
--> 142                 raise response.raise_for_status()
    143             return source, response.url, response.headers, code
/usr/local/lib/python2.7/dist-packages/requests/models.pyc in raise_for_status(self)
    907 
    908         if http_error_msg:
--> 909             raise HTTPError(http_error_msg, response=self)
    910 
    911     def close(self):
HTTPError: 404 Client Error: Not Found for url: http://stats.oecd.org/SDMX-JSON/categoryscheme/latest?references=parentsandsiblings
>flows=oecd.dataflow()
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-13-e70914a15fa9> in <module>()
----> 1 flows=oecd.dataflow()
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/api.pyc in get(self, resource_type, resource_id, agency, version, key, params, headers, fromfile, tofile, url, get_footer_url, memcache, writer)
    319             'Requesting resource from URL/file %s', (base_url or fromfile))
    320         source, url, resp_headers, status_code = self.client.get(
--> 321             base_url, params=params, headers=headers, fromfile=fromfile)
    322         if source is None:
    323             raise SDMXException('Server error:', status_code, url)
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/remote.pyc in get(self, url, fromfile, params, headers)
     96         else:
     97             source, final_url, resp_headers, status_code = self.request(
---> 98                 url, params=params, headers=headers)
     99         return source, final_url, resp_headers, status_code
    100 
/home/shep/.local/lib/python2.7/site-packages/pandasdmx/remote.pyc in request(self, url, params, headers)
    140             code = int(response.status_code)
    141             if 400 <= code <= 499:
--> 142                 raise response.raise_for_status()
    143             return source, response.url, response.headers, code
/usr/local/lib/python2.7/dist-packages/requests/models.pyc in raise_for_status(self)
    907 
    908         if http_error_msg:
--> 909             raise HTTPError(http_error_msg, response=self)
    910 
    911     def close(self):
HTTPError: 400 Client Error: Semantic Error for url: http://stats.oecd.org/SDMX-JSON/dataflow/OECD/latest?references=parentsandsiblings

I'm using the latest version on the pip repo.

ENH: async requests with aiohttp as replacement for requests - py35+ only

The python-zeep project should provide a good illustration of how this could be achieved.
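aiohttp's API is not shown here, so this sketch keeps the HTTP part abstract. It only illustrates the concurrency pattern an async client would need: several downloads in flight at once, with a cap on simultaneous requests. fetch stands in for a hypothetical aiohttp-based coroutine; fetch_all is also hypothetical. Note that asyncio.run, used to drive it, requires Python 3.7+.

```python
import asyncio


async def fetch_all(urls, fetch, limit=4):
    """Run ``fetch(url)`` coroutines concurrently, at most ``limit`` at once.

    Results are returned in the same order as ``urls``.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:  # wait for a free slot
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

Usage: asyncio.run(fetch_all(urls, my_fetch)), where my_fetch(url) is an async function returning the response body.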

Add ISTAT as data provider

Hi,

I just read the docs, and there is an example of selecting data from a particular agency. I wonder whether there is a feature that lets us point to a particular SDMX URL and read the file directly. For example, the Banca d'Italia website:
https://a2a.bancaditalia.it/infostat/dataservices/export/EN/SDMXv21/DATA/CUBE/BANKITALIA/DIFF/FSI

This is from: http://www.istat.it/nsdp/

I saved the file to my local disk. I read the docs online, but the section on this is quite limited. All I managed to do is the following; I'm not sure what else I need to do to convert the data into a pandas.DataFrame. Any ideas? It would be great if the docs had a full example of reading from a file.

import pandasdmx as sdmx
req = sdmx.Request()
res = req.get(fromfile='z:/tmp/file.xml')

Thank you.

ENH: odo support

Hello,

Maybe you should consider supporting odo:
http://odo.readthedocs.org/

Kind regards

Add World Bank as new agency

The World Bank "World Integrated Trade Solution" (WITS) is available via SDMX. The user guide is available here: http://wits.worldbank.org/data/public/WITSAPI_UserGuide.pdf. The following agency entry seems to work:
'WBG_WITS': {
    'name': 'WBG_WITS',
    'url': 'http://wits.worldbank.org/API/V1/SDMX/V21/rest',
    'resources': {
        'data': {
            'headers': {},
        },
    },
},

v0.3.0: AttributeError in StructureMessages

Building the docs with 0.3.0 fails. The test suite succeeds, though.

There is an IndexError in the "Basic usage" section. This is already fixed but not yet committed.

However, the AttributeError is hard to reproduce. It seems the dataflows and codelists attributes are not set properly when building the docs with Sphinx and the IPython directive. I have no clue what is happening. It could be a problem related to the Jupyter transition. This needs to be fixed ASAP.

Could not run on Raspberry Pi

I installed pandaSDMX on a Raspberry Pi 3 successfully, but I could not run my script because of a dependency problem: apparently, NumPy cannot be found.

IndexError: list index out of range

Hi,

When executing this simple example with the subscription header parameters as specified in the UNESCO docs, I get an error.

from pandasdmx import Request
request = Request('UNESCO', headers={'Ocp-Apim-Subscription-Key': 'XXX'})
symbol= 'UNESCO,CE,1.0/EDU.._T.._T.._T........'
response = request.data(symbol)
df = response.write()

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-21-5b250bcfe1a0> in <module>()
----> 1 df = response.write()

/usr/local/lib/python3.6/site-packages/pandasdmx/api.py in write(self, source, **kwargs)
    633         if not source:
    634             source = self.msg
--> 635         return self._writer.write(source=source, **kwargs)
    636 
    637     def write_source(self, filename):

/usr/local/lib/python3.6/site-packages/pandasdmx/writer/data2pandas.py in write(self, source, asframe, dtype, attributes, reverse_obs, fromfreq, parse_time)
    114                     pd_series, pd_attributes = zip(*series_list)
    115                 elif dtype:
--> 116                     key_fields = series_list[0].name._fields
    117                     pd_series = series_list
    118                 elif attributes:

IndexError: list index out of range

It seems as if no data could be extracted from the response, since series_list is empty. The error still occurs when I pass the subscription key as a URL parameter: UNESCO,CE,1.0/EDU.._T.._T.._T........?subscription-key=XXX. However, when I request the URL in the browser, the data seems to be fine: http://api.uis.unesco.org/sdmx/data/UNESCO,CE,1.0/EDU.._T.._T.._T........?subscription-key=XXX (replace XXX with your key).

Thanks in advance,

Rudi
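To compare what pandaSDMX receives with what the browser shows, it can help to build the exact URL the browser uses and fetch it outside the library. A stdlib sketch; the base URL and parameter name are taken from the report above, and unesco_data_url is hypothetical.

```python
from urllib.parse import urlencode


def unesco_data_url(flow_and_key, subscription_key,
                    base="http://api.uis.unesco.org/sdmx/data/"):
    """Build a UIS data URL with the subscription key as a query parameter.

    ``flow_and_key`` is the "flowRef/key" part, e.g. "UNESCO,CE,1.0/EDU...".
    urlencode takes care of escaping the key value.
    """
    query = urlencode({"subscription-key": subscription_key})
    return base + flow_and_key + "?" + query
```

Fetching this URL with and without an Accept header, and checking whether the body is XML, can show whether the server returns a different format to the library than to the browser.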

ENH: api.Request.get: validate key against the DSD, allow key arg to be a dict

This would make the API more usable. One could write something like:

data = estat.get(resource_type='data', resource_id='une_rt_a', key=dict(GEO='DE+FR'))

The get method should then download the DSD and construct the key string such as '.DE+FR...'.
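The key-building step could look roughly like this. A minimal sketch, assuming the dimension IDs have already been read from the DSD in their declared order (fetching and parsing the DSD is not shown); make_key is hypothetical. Dimensions missing from the dict are left empty, i.e. wildcarded, and unknown dimension IDs are rejected, which is the validation part of the proposal.

```python
def make_key(dimension_ids, key=None):
    """Build an SDMX REST key string such as '.DE+FR.' from a dict.

    dimension_ids: dimension IDs in DSD order.
    key: {dimension_id: code or list of codes}; omitted dimensions are
    left empty, which the REST API treats as a wildcard.
    """
    key = key or {}
    unknown = set(key) - set(dimension_ids)
    if unknown:
        raise ValueError("not in DSD: " + ", ".join(sorted(unknown)))
    parts = []
    for dim in dimension_ids:
        value = key.get(dim, "")
        if not isinstance(value, str):
            value = "+".join(value)  # several codes are OR-combined with '+'
        parts.append(value)
    return ".".join(parts)
```

For example, with dimensions ["FREQ", "GEO", "UNIT"], make_key(dims, {"GEO": "DE+FR"}) gives ".DE+FR.".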

ENH: add reader for SDMX 2.0

Start from sdmxml.py, copy it to sdmxml20.py and work through the XPath expressions according to the SDMX 2.0 standard. Check whether this requires a separate model for SDMX 2.0. If so, model.py would have to be replaced by a model package. This would require some bigger changes to other parts of the API, but it should be feasible. However, a reader for JSON is a clear priority.

Error raised by the Example from documentation

Running the example from the library documentation raises the following error:

Traceback (most recent call last):
File "/home/martin/NetBeansProjects/NewPythonProject/newpythonproject/newpythonproject.py", line 7, in
une_df = une_resp.write(s for s in une_resp.msg.data.series if s.key.AGE == 'TOTAL')
File "/usr/local/lib/python3.4/dist-packages/pandasdmx/api.py", line 392, in write
return self._writer.write(source=source, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/pandasdmx/writer/data2pandas.py", line 127, in write
d_frame = PD.concat(list(pd_series), axis=1, copy=False)
TypeError: concat() got an unexpected keyword argument 'copy'

After fixing the file /usr/local/lib/python3.4/dist-packages/pandasdmx/writer/data2pandas.py it seems to work.
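The reporter's exact change is not given. One version-tolerant approach is to pass a keyword such as copy only when the installed function accepts it, for example call_with_supported_kwargs(pd.concat, list(pd_series), axis=1, copy=False), assuming inspect.signature can introspect the installed pandas.concat. The helper below is hypothetical and uses only the stdlib.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not accept.

    Useful when an optional keyword (e.g. ``copy``) exists only in some
    versions of a library. Functions taking **kwargs receive everything.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **accepted)
```

Silently dropping arguments can hide typos, so this is best limited to the few keywords known to differ between versions.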

Add support for other SDMX data sources.

It doesn't seem possible to use an SDMX agency other than the defaults, ESTAT, ECB and ILO.

Can support for any SDMX website be built?

ENH: add missing attributes and artefacts to the model and sdmxml reader

The model is incomplete, e.g. DataflowDefinition lacks some attributes, DataSet as well. It would be very useful to fill these gaps.

Improve on sdmxjson reader to parse data sets with missing entities

For a description see
https://groups.google.com/forum/#!topic/sdmx-python/O3w5J6Ex7fw
and
https://groups.google.com/forum/#!topic/sdmx-python/AMC8HWdRl_w

The latter might require changes to the data2pd writer as well.

Error with some EUROSTAT flows if params['references'] is "all"

If I run the following:

from pandasdmx import Request

key_args = {"UNIT"    : "CP_MEUR",
            "NA_ITEM" : "P51G"   ,
            "ASSET10" : "N1G"    ,
            "NACE_R2" : "A"      }

req = Request('ESTAT')
resp = req.get(resource_type="data", resource_id="nama_10_a64_p5", key=key_args)

I get the following:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-2d64bff3dfb5> in <module>()
      5 
      6 req = Request('ESTAT')
----> 7 resp = req.get(resource_type="data", resource_id="nama_10_a64_p5", key=key_args)

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/api.pyc in get(self, resource_type, resource_id, agency, key, params, fromfile, tofile, url, get_footer_url, memcache)
    219         # Now get the SDMX message either via http or as local file
    220         source, url, headers, status_code = self.client.get(
--> 221             base_url, params=params, fromfile=fromfile)
    222         # write msg to file and unzip it as required, then parse it
    223         with source:

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/remote.py in get(self, url, fromfile, params)
     80         else:
     81             source, final_url, headers, status_code = self.request(
---> 82                 url, params=params)
     83         return source, final_url, headers, status_code
     84 

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/remote.py in request(self, url, params)
    104             code = int(response.status_code)
    105             if 400 <= code <= 499:
--> 106                 raise response.raise_for_status()
    107             return source, response.url, response.headers, code

/usr/lib/python2.7/dist-packages/requests/models.pyc in raise_for_status(self)
    849 
    850         if http_error_msg:
--> 851             raise HTTPError(http_error_msg, response=self)
    852 
    853     def close(self):

HTTPError: 400 Client Error: Bad Request

This doesn't happen with every Request.get() call (e.g. I have never had this problem with ECB data so far).

The problem is "solved" if, in remote.py, I add the following lines at the beginning of REST.request():

        if 'references' in params:
            del params['references']

To be honest, I have no idea what this means, but I didn't notice any side effects.
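The workaround above deletes the key from the dict that was passed in, which mutates the caller's params and fails if params is None. A non-mutating variant of the same workaround (hypothetical helper; whether Eurostat should receive a references parameter at all is a separate question):

```python
def without_references(params):
    """Return a copy of the query parameters without 'references'.

    The caller's dict is left untouched, and None is treated as empty.
    """
    return {k: v for k, v in (params or {}).items() if k != "references"}
```

Inside REST.request() this would read params = without_references(params).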

ObsKeyTuple error

Hi,

https://github.com/dr-leo/pandaSDMX/blob/master/pandasdmx/reader/sdmxml.py#L194

Replace ObsKeyTuple with self._ObsTuple?

Thanks.

AttributeError: 'Reader' object has no attribute 'footer_text'

The following code:

resp = Request('ESTAT').get("data", 'naio_10_pyp16', key={'FREQ': 'D', 'INDUSE': 'TOTAL'})

results in:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-477ab9e6fe65> in <module>()
----> 1 resp = Request('ESTAT').get("data", 'naio_10_pyp16', key={'FREQ': 'D', 'INDUSE': 'TOTAL'})

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/api.py in get(self, resource_type, resource_id, agency, key, params, fromfile, tofile, url, get_footer_url, memcache, writer)
    242             # Retrieve the first URL in the footer, if any
    243             url_l = [
--> 244                 i for i in msg.footer.text if remote.requests.utils.urlparse(i).scheme]
    245             if url_l:
    246                 # found an URL. Wait and try to request it

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/model.py in text(self)
     61     @property
     62     def text(self):
---> 63         return self._reader.footer_text(self)
     64 
     65     @property

AttributeError: 'Reader' object has no attribute 'footer_text'

For the record: before your commits of the last couple of weeks, the call still failed, but with a different error: an IndexError: list index out of range in writer/data2pandas.pyc, in the function write(self, source, asframe, dtype, attributes, reverse_obs, fromfreq, parse_time), at the line level_names = list(index_source[0].name._fields)
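The failing line in api.py keeps the footer texts that have a URL scheme. A stdlib sketch of the same filter (footer_urls is hypothetical) that also tolerates a missing footer. It is deliberately stricter than the original comprehension: urlparse treats text like "Note: ..." as having the scheme "note", so only http and https are accepted here.

```python
from urllib.parse import urlparse


def footer_urls(footer_text):
    """Return the footer entries that look like http(s) URLs.

    ``footer_text`` is an iterable of strings, or None if the message
    has no footer.
    """
    return [t for t in (footer_text or [])
            if urlparse(t).scheme in ("http", "https")]
```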

Add Bank of Italy as new agency

Multiple .dist-info directories error

I am trying to install with pip install pandasdmx in a conda environment with Python 3.5, and I get this error message:

AssertionError: Multiple .dist-info directories: /Users/BadWizard/anaconda3/envs/py35/lib/python3.5/site-packages/pandaSDMX-0.5.dist-info, /Users/BadWizard/anaconda3/envs/py35/lib/python3.5/site-packages/pandaSDMX-0.5.1.dist-info

Problem with ``req.categoryscheme()``

req = Request('ESTAT')
req.categoryscheme()

raises AttributeError: __exit__ because http://ec.europa.eu/eurostat/SDMX/diss-web/rest/categoryscheme gives a "501 - Not Implemented" error.

Curiously, if the provider is ECB the problem doesn't arise... although it does in the automatically generated documentation!
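The api.get code in the ESTAT traceback further up enters "with source:", and later versions (see the OECD traceback) raise SDMXException('Server error:', status_code, url) when source is None. That matches the symptom: a 501 leaves no response body, and "with None:" fails with AttributeError (__exit__ on Python 2, __enter__ on older Python 3) or, on Python 3.11+, a TypeError. The same check as a standalone hypothetical helper:

```python
def checked_source(source, status_code, url):
    """Fail with a descriptive error before a missing response is used.

    ``source`` is the file-like response body, or None when the server
    returned no usable content (e.g. HTTP 501 Not Implemented).
    """
    if source is None:
        raise RuntimeError(
            "server returned HTTP {0} for {1}".format(status_code, url))
    return source
```

Usage: "with checked_source(source, status_code, url) as src:" in place of "with source:".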

ENH: Add in-browser query builder with selection from agencies, dataflows and codelists - based on Jupyter widgets

Many data providers offer web-based SDMX query builders to ease data access. The user may select data and metadata by browsing through dataflow lists and codelists. See for example
http://ec.europa.eu/eurostat/web/sdmx-web-services/query-builder

It would be great to have similar functionalities in the Jupyter notebook. The obvious choice is
http://jupyter.org/widgets.html

AttributeError at import

In branch v0.3.2dev, a simple

from pandasdmx import Request

yields the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-86e12c071071> in <module>()
----> 1 from pandasdmx import Request

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/__init__.py in <module>()
     15 
     16 
---> 17 from pandasdmx.api import Request
     18 
     19 

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/api.py in <module>()
     20 from pandasdmx import remote
     21 from pandasdmx.utils import str_type
---> 22 from pandasdmx.reader.sdmxml import SDMXMLReader
     23 from importlib import import_module
     24 from zipfile import ZipFile, is_zipfile

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/reader/sdmxml.py in <module>()
     15 
     16 from pandasdmx.utils import DictLike, namedtuple_factory
---> 17 from pandasdmx import model
     18 from pandasdmx.reader import BaseReader
     19 from lxml import etree

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/model.py in <module>()
    654 
    655 
--> 656 class StructureMessage(Message):
    657     _content_types = Message._content_types.copy()
    658     _content_types.extend([

/home/pietro/nobackup/repo/pandaSDMX/pandasdmx/model.py in StructureMessage()
    655 
    656 class StructureMessage(Message):
--> 657     _content_types = Message._content_types.copy()
    658     _content_types.extend([
    659         ('codelists', 'read_identifiables', Codelist, None),

AttributeError: 'list' object has no attribute 'copy'
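The failing line uses list.copy(), which only exists from Python 3.3 onwards, so the import breaks on Python 2. A slice copy behaves the same on 2.7 and 3.x. The sketch below is a pared-down stand-in for the classes in the traceback: the base entry in Message._content_types and the Codelist placeholder are hypothetical, since their real definitions are not shown.

```python
class Codelist(object):
    """Hypothetical placeholder for pandasdmx.model.Codelist."""


class Message(object):
    # Hypothetical base entry; the real Message._content_types is not shown above
    _content_types = [('header', 'read_instance', object, None)]


class StructureMessage(Message):
    # [:] copies the list on Python 2.7 and 3.x alike, so extending it
    # below leaves the parent class's list untouched
    _content_types = Message._content_types[:]
    _content_types.extend([
        ('codelists', 'read_identifiables', Codelist, None),
    ])
```

list(Message._content_types) would work equally well; either way the point is to avoid a method that Python 2 lists lack.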

ENH: return datasets into a xarray

Returning multi-indexed datasets as an xarray object would be very useful.

Error when importing module

Hi - I am getting the following error on import:

File "/usr/local/lib/python2.7/dist-packages/pandaSDMX-0.6-py2.7.egg/pandasdmx/utils/anynamedtuple.py", line 156
SyntaxError: unqualified exec is not allowed in function 'namedtuple' it contains a nested function with free variables

I tried installing with pip and also with the setup.py from v0.6 in the git repo, and got the same error.

Is there a different version/patch I should use?

BUG,DOC: long_description fails to render with pypi

https://pypi.org/project/pandasdmx/

https://github.com/pypa/readme_renderer/blob/master/README.rst#check-description-locally

You can check the long_description syntax with:

twine check dist/*

Add support for proxy

Pass the proxies parameter of requests.get through the REST() class and, consequently, through Request().
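A minimal sketch of the idea, assuming a wrapper that stores a proxies mapping and forwards it on every call. requests.get does accept a proxies keyword (a dict such as {'http': ..., 'https': ...}); the class name ProxyAwareREST and the injectable http_get argument are hypothetical and do not reflect pandaSDMX's actual REST class.

```python
from typing import Any, Callable, Dict, Optional


class ProxyAwareREST(object):
    """Hypothetical REST wrapper that forwards a proxies mapping to requests.get."""

    def __init__(self, proxies: Optional[Dict[str, str]] = None,
                 http_get: Optional[Callable[..., Any]] = None) -> None:
        self.proxies = dict(proxies or {})
        if http_get is None:
            import requests  # imported lazily so the sketch can be exercised without it
            http_get = requests.get
        self._http_get = http_get

    def get(self, url: str, params: Optional[Dict[str, str]] = None,
            headers: Optional[Dict[str, str]] = None) -> Any:
        # proxies=None lets requests fall back to its defaults (e.g. environment variables)
        response = self._http_get(url, params=params, headers=headers,
                                  proxies=self.proxies or None)
        response.raise_for_status()
        return response
```

Request() would then only need to accept a proxies argument and hand it to this class, e.g. ProxyAwareREST(proxies={'http': 'http://proxy.example:3128'}) with a placeholder proxy address.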

AssertionError: Multiple .dist-info directories: pandaSDMX-0.8 and 0.7.0

While trying to install this package, I ran into the issue in the title. Full traceback:

Exception:
Traceback (most recent call last):
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/wheel.py", line 345, in move_wheel_files
    clobber(source, lib_dir, True)
  File "~/.virtualenvs/default/lib/python3.5/site-packages/pip/wheel.py", line 305, in clobber
    ', '.join(info_dir))
AssertionError: Multiple .dist-info directories: ~/.virtualenvs/default/lib/python3.5/site-packages/pandaSDMX-0.8.dist-info, ~/.virtualenvs/default/lib/python3.5/site-packages/pandaSDMX-0.7.0.dist-info

To solve it, I had to pin the version in pip (version 0.8 didn't work):

pip install pandaSDMX==0.7.0

This worked, as far as I can tell.

Missing initialization of `series_index` when using OECD agency

Hi @dr-leo,

when using pandasdmx to download OECD data, I get the following error.

To reproduce:

from pandasdmx import Request
symbol = "SNA_TABLE1/AUS+AUT+BEL+CAN+CZE+DNK+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ITA+JPN+KOR+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+ESP+SWE+CHE+TUR+GBR+USA+ARG+NMEC+BRA+BGR+CHL+CHN+COL+CRI+HRV+CYP+EST+IND+IDN+ISR+LVA+LTU+MKD+MLT+PER+ROU+RUS+SAU+SVN+ZAF+DEW+FRME+OTF+EU15+EU28+OECD+OECDE+EA19.B1_GA.C"
request = Request(agency='OECD', log_level=10)
response = request.data(symbol)
df = response.write()

The error:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-45-2617ddf5597f> in <module>()
      3 request = Request(agency='OECD', log_level=10)
      4 response = request.data(symbol)
----> 5 df = response.write()

/opt/virtualenv/develop/lib/python3.5/site-packages/pandasdmx/api.py in write(self, source, **kwargs)
    633         if not source:
    634             source = self.msg
--> 635         return self._writer.write(source=source, **kwargs)
    636 
    637     def write_source(self, filename):

/opt/virtualenv/develop/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py in write(self, source, asframe, dtype, attributes, reverse_obs, fromfreq, parse_time)
    107                 series_list = list(s for s in self.iter_pd_series(
    108                     iter_series, dim_at_obs, dtype, attributes,
--> 109                     reverse_obs, fromfreq, parse_time))
    110                 if dtype and attributes:
    111                     # series_list is actually a list of pairs of series

/opt/virtualenv/develop/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py in <genexpr>(.0)
    105         else:
    106             if asframe:
--> 107                 series_list = list(s for s in self.iter_pd_series(
    108                     iter_series, dim_at_obs, dtype, attributes,
    109                     reverse_obs, fromfreq, parse_time))

/opt/virtualenv/develop/lib/python3.5/site-packages/pandasdmx/writer/data2pandas.py in iter_pd_series(self, iter_series, dim_at_obs, dtype, attributes, reverse_obs, fromfreq, parse_time)
    226             if dtype:
    227                 value_series = PD.Series(
--> 228                     obs_values, index=series_index, name=series.key)
    229 
    230             if attributes:

UnboundLocalError: local variable 'series_index' referenced before assignment

I'm using the master branch. The URL is produced correctly and the data seems to be present in the response; however, parsing in the writer fails.

If I understand it correctly, lines 163-167 should detect the frequency key. However, the OECD response data has no frequency, and there is no branch for that case: the if in line 169 has no else. Therefore, the variable series_index is still unset in line 228. In my understanding, the simplest solution would be to add an else branch after line 214 to close the if in line 169. Since no frequency is available, one could just create a simple Index:

else:
    series_index = PD.Index(obs_dim, name=dim_at_obs)

I tried it locally and it seems to work.

Regards Rudi
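The proposed fallback can be illustrated in isolation. The helper below is hypothetical and greatly simplifies the writer: obs_dim and dim_at_obs mirror the names in the snippet above, while the freq argument stands in for whatever frequency the writer manages to detect. With a frequency it builds a PeriodIndex; without one it falls back to a plain Index, so series_index is always bound.

```python
from typing import Optional, Sequence

import pandas as pd


def build_series_index(obs_dim: Sequence[str], dim_at_obs: str,
                       freq: Optional[str] = None) -> pd.Index:
    """Hypothetical helper: always return an index, with or without a frequency."""
    if freq is not None:
        # Frequency known (e.g. 'M' or 'A'): parse the labels as periods
        return pd.PeriodIndex(obs_dim, freq=freq, name=dim_at_obs)
    # No FREQ in the series key, as with the OECD response: keep the raw labels
    return pd.Index(obs_dim, name=dim_at_obs)
```

Keeping the raw labels in the no-frequency case avoids guessing a frequency the data does not declare; converting them later remains possible.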

ENH: semester and bi-monthly series

@dr-leo INSEE has some series on a semester basis (every 6 months) and on a bi-monthly basis (every 2 months).
pandas.Period() has no support for such frequencies. Do you see any workaround?
Frequencies '6M' and '2M' don't seem to be an option, because pandas.Period().ordinal then increases by 6 (or 2) instead of 1 between adjacent periods.
(I'm working with @srault95)
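One possible workaround, sketched under assumptions: keep a monthly Period at the first month of each semester or two-month block, and derive a separate counter that advances by exactly 1 between adjacent periods. The 'YYYY-S1'/'YYYY-S2' label format and both helper names are hypothetical; INSEE's actual labels may differ.

```python
import pandas as pd


def semester_start(label: str) -> pd.Period:
    """Map an assumed 'YYYY-S1'/'YYYY-S2' label to the monthly Period of the
    semester's first month (January or July)."""
    year, half = label.split("-S")
    month = 1 if half == "1" else 7
    return pd.Period(f"{int(year)}-{month:02d}", freq="M")


def step_ordinal(period: pd.Period, months_per_step: int) -> int:
    """Counter that increases by 1 between adjacent semesters (6) or bi-months (2).

    Monthly ordinals count months from 1970-01 (ordinal 0), so periods starting in
    January/July, or in odd months for bi-monthly data, divide evenly.
    """
    return period.ordinal // months_per_step
```

The monthly Period keeps calendar arithmetic and plotting working, while the derived counter provides the 1-step spacing that '6M' and '2M' do not.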

ENH: add support for structure-specific datasets

As of v0.2.1, pandaSDMX only supports generic datasets. SDMX 2.1 also offers a much more compact, structure-specific data format that is better suited to large datasets. Both the model and the sdmxml reader need to be extended to support this format. This will involve small changes to remote.py and api.py to set the HTTP header telling the server that we want structure-specific rather than generic data. Also, the DSD would have to be downloaded on the fly.
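The header change is essentially content negotiation via Accept. A minimal sketch, assuming the media types defined by the SDMX 2.1 RESTful web services specification; the helper name is hypothetical, and individual providers may support only one of the formats.

```python
from typing import Dict

# SDMX-ML 2.1 data media types for HTTP content negotiation
SDMX_21_DATA_MEDIA_TYPES = {
    "generic": "application/vnd.sdmx.genericdata+xml;version=2.1",
    "structurespecific": "application/vnd.sdmx.structurespecificdata+xml;version=2.1",
}


def data_request_headers(fmt: str = "structurespecific") -> Dict[str, str]:
    """Hypothetical helper: build the Accept header for the requested data format."""
    try:
        return {"Accept": SDMX_21_DATA_MEDIA_TYPES[fmt]}
    except KeyError:
        raise ValueError("unknown SDMX data format: %r" % fmt) from None
```

The resulting dict could be passed as the headers argument of the HTTP call in remote.py; parsing the structure-specific response would still require the DSD.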

Custom header in get() function

Hi,

https://github.com/dr-leo/pandaSDMX/blob/master/pandasdmx/api.py#L244

This code is never called, because it only runs when headers is None in the get() function.

Thanks,

Stéphane.
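One way to avoid the dead branch is to always start from the default headers and let caller-supplied headers override them, rather than applying the defaults only when headers is None. This is a hypothetical sketch: the DEFAULT_HEADERS content is a placeholder, not pandaSDMX's actual defaults at api.py#L244.

```python
from typing import Dict, Optional

# Placeholder defaults for illustration only
DEFAULT_HEADERS = {"Accept": "application/xml"}


def merge_headers(headers: Optional[Dict[str, str]] = None,
                  defaults: Dict[str, str] = DEFAULT_HEADERS) -> Dict[str, str]:
    """Return the defaults overridden by any caller-supplied headers."""
    merged = dict(defaults)       # copy, so the module-level defaults stay untouched
    merged.update(headers or {})  # custom headers win on conflicting keys
    return merged
```

With this shape, a custom header is always applied, and headers=None simply yields the defaults.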

Problem with INSEE: freq isn't recognized

Hi,

First, thanks for your work!

Second, I'm having problems querying INSEE. I'm following the guide:

insee = Request('INSEE')
data_response = insee.data(resource_id = 'DEFM-CAT', 
                           key={'CORRECTION': 'BRUT',
                                'CAT-DE': 'A+B+C+D+E',})
data = data_response.data

It seems that the FREQ isn't recognized:

> set(s.key.FREQ for s in data.series)
AttributeError: 'SeriesKey' object has no attribute 'FREQ'

> data_response.write()
UnboundLocalError: local variable 'freq_key' referenced before assignment

But if you look at http://www.bdm.insee.fr/series/sdmx/data/DEFM-CAT, there is the FREQ keyword, e.g.:

<Series CAT-DE="A" GEOGRAPHIE="FM" CORRECTION="BRUT" FREQ="M" IDBANK="001572432" TITLE="Demandeurs d'emploi inscrits en fin de mois à Pôle emploi - Catégorie A - France métropolitaine - Série brute" LAST_UPDATE="2016-11-24" UNIT_MEASURE="IND" UNIT_MULT="3" REF_AREA="FM" DECIMALS="1" TIME_PER_COLLECT="FIN">

Am I missing something obvious?

Thanks!
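One way to narrow this down is to confirm, independently of pandaSDMX, that FREQ really is an attribute of each Series element in the raw response. The standard-library sketch below lists the attribute names of every Series element, ignoring XML namespaces; the function name is hypothetical, and fetching the document is left out. If FREQ appears there but not in s.key, that would point at the parsing step rather than the server.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Union


def series_attribute_names(xml_doc: Union[str, bytes]) -> List[Set[str]]:
    """Return the set of attribute names of each <Series> element."""
    root = ET.fromstring(xml_doc)
    names = []
    for elem in root.iter():
        # ElementTree spells namespaced tags as '{uri}Series'; compare the local name only
        if elem.tag.rsplit("}", 1)[-1] == "Series":
            names.append(set(elem.attrib))
    return names
```

Passing the raw bytes of the HTTP response (rather than decoded text) lets ElementTree honour the document's own encoding declaration.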

unqualified exec is not allowed

from pandasdmx import Request

gives me the error

File "/home/hristo/mlenv/local/lib/python2.7/site-packages/pandasdmx/utils/anynamedtuple.py", line 156
    exec(class_definition, namespace)
SyntaxError: unqualified exec is not allowed in function 'namedtuple' it contains a nested function with free variables

Here is the "pip freeze" output:

alembic==0.9.3
algopy==0.5.5
Babel==2.3.4
backports-abc==0.5
backports.csv==1.0.5
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
backports.weakref==1.0rc1
beautifulsoup4==4.4.1
bleach==1.5.0
blinker==1.3
certifi==2017.4.17
chardet==3.0.4
click==6.6
configparser==3.5.0
cycler==0.10.0
decorator==4.0.11
entrypoints==0.2.3
enum34==1.1.6
extras==0.0.3
fixtures==2.0.0
Flask==0.11.1
Flask-Babel==0.11.1
Flask-Gravatar==0.4.2
Flask-HTMLmin==1.2
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Migrate==2.0.3
Flask-Principal==0.4.0
Flask-Script==2.0.5
Flask-Security==1.7.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
funcsigs==1.0.2
functools32==3.2.3.post2
html5lib==1.0b3
htmlmin==0.1.10
idna==2.5
importlib==1.0.4
inflection==0.3.1
ipykernel==4.6.1
ipython==5.4.1
ipython-genutils==0.2.0
ipywidgets==6.0.0
itsdangerous==0.24
Jinja2==2.7.3
jsonpath-rw==1.4.0
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.1.0
jupyter-console==5.1.0
jupyter-core==4.3.0
Keras==2.0.6
linecache2==1.0.0
lxml==3.8.0
Mako==1.0.6
Markdown==2.6.8
MarkupSafe==0.23
matplotlib==2.0.2
mistune==0.7.4
mock==2.0.0
more-itertools==3.2.0
nbconvert==5.2.1
nbformat==4.3.0
notebook==5.0.0
numdifftools==0.9.20
numpy==1.13.1
pandas==0.20.3
pandas-datareader==0.5.0
pandaSDMX==0.7.0
pandocfilters==1.4.1
passlib==1.6.2
pathlib2==2.3.0
patsy==0.4.1
pbr==1.9.1
pexpect==4.2.1
pgadmin4==1.5
pickleshare==0.7.4
ply==3.10
prompt-toolkit==1.0.14
protobuf==3.3.0
psycopg2==2.7.1
ptyprocess==0.5.2
pycrypto==2.6.1
pyflux==0.4.15
pyforecast==1.0
Pygments==2.2.0
pyparsing==2.2.0
pyrsistent==0.11.13
python-dateutil==2.6.1
python-editor==1.0.3
python-mimeparse==1.5.1
pytz==2017.2
PyYAML==3.12
pyzmq==16.0.2
qtconsole==4.3.0
Quandl==3.2.0
requests==2.17.3
requests-file==1.4.2
requests-ftp==0.3.1
rpy2==2.8.6
scandir==1.5
scikit-learn==0.18.2
scipy==0.19.1
seaborn==0.8
simplegeneric==0.8.1
simplejson==3.6.5
singledispatch==3.4.0.3
six==1.10.0
sklearn==0.0
speaklater==1.3
SQLAlchemy==1.0.14
sqlparse==0.1.19
statsmodels==0.8.0
subprocess32==3.2.7
tensorflow==1.2.1
terminado==0.6
testpath==0.3.1
testtools==2.3.0
Theano==0.9.0
tornado==4.5.1
tqdm==4.14.0
traceback2==1.4.0
traitlets==4.3.2
unittest2==1.1.0
urllib3==1.21.1
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.9.6
widgetsnbextension==2.0.0
WTForms==2.0.2
xgboost==0.6a2

Response too large due to client request

With some queries, the SDMX server responds with the following message:

Message code="413"
Response too large due to client request
Query size exceeds maximum limit (1000000 entries).

-- Code to replicate error --
estat = client('Eurostat', 'eurodb.db')
db = estat.get_dataflows()

gdp_table = db.execute('SELECT * FROM ESTAT_dataflows WHERE title LIKE "%GDP%"')
gdp_list = gdp_table.fetchall()

bop = gdp_list[1]
df, md = estat.get_data(bop, '', concat=True)  # Request is too large and this call fails silently.

In theory, the filters parameter in the get_data function should help reduce the size of the request.
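Besides the filters parameter, another way to keep each request under the limit is to split one dimension's codes across several requests and combine the results afterwards (e.g. with pandas.concat). The sketch below only builds the keys, using the '+' OR-syntax of SDMX REST keys; the template and codes in the test are hypothetical examples.

```python
from typing import Iterator, Sequence


def chunked_keys(template: str, codes: Sequence[str], batch_size: int) -> Iterator[str]:
    """Yield SDMX keys in which one dimension is OR-ed over at most batch_size codes.

    template must contain exactly one '{}' placeholder for the split dimension.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(codes), batch_size):
        yield template.format("+".join(codes[start:start + batch_size]))
```

Choosing batch_size so that each request stays well below 1,000,000 entries depends on the dataflow, so it would need some trial and error.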

DSD response with UNSD agency

Hello dr-leo,
I am trying to harvest national accounts data from the UNSD agency.
I tried the following code:

from pandasdmx import model, Request
unsd = Request('UNSD')
cat_response = unsd.categoryscheme()
print(cat_response.write().categoryscheme)
dsd_id = cat_response.dataflow.NA_MAIN.structure.id
dsd_response = unsd.datastructure(resource_id=dsd_id)

and I get the following error:

Python\Anaconda3\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://data.un.org/WS/rest/datastructure/UNSD/NA_MAIN/latest/?references=all

Is there anything I am doing wrong?

Best,

Charlescad

SDMX 1.0 support?

Is it possible to add support for the 1.0 version of the standard used by the Federal Reserve Board? I assume this may be more involved than adding support for 2.0.
