cpi's People

Contributors

dependabot[bot], palewire

cpi's Issues

Autoupdate BLS data

It could work something like this:

  1. Store somewhere in the library the datestamp of the last time the data was downloaded
  2. Also store the last time you checked for it
  3. Each time you import the library, or perhaps when you run inflate, check the datetime of the latest value in the CPI data
  4. Calculate the time difference between that latest value and the download times
  5. If those differences are greater than a threshold (one month?), rerun the download routine.

Could that work?
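
Something like this rough sketch of steps 1-5 (the file location, names, and threshold are made up for illustration; nothing here is part of the cpi package):

    import datetime
    import json
    from pathlib import Path

    # Hypothetical bookkeeping file; not part of the cpi package itself.
    STAMP_PATH = Path.home() / ".python-cpi" / "download_stamp.json"
    THRESHOLD = datetime.timedelta(days=31)  # the "one month?" threshold from step 5

    def should_redownload(latest_value_date: datetime.date) -> bool:
        """Compare the date of the latest CPI value against the stored
        download date and flag a refresh when the gap exceeds the threshold."""
        if not STAMP_PATH.exists():
            return True  # never downloaded before
        stamp = json.loads(STAMP_PATH.read_text())
        downloaded = datetime.date.fromisoformat(stamp["downloaded"])
        return (downloaded - latest_value_date) > THRESHOLD

    def record_download() -> None:
        """Store today as both the last-download and last-check date."""
        today = datetime.date.today().isoformat()
        STAMP_PATH.parent.mkdir(parents=True, exist_ok=True)
        STAMP_PATH.write_text(json.dumps({"downloaded": today, "checked": today}))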

inflate() year_or_month type error not catching years as string format

I'm using a dataset that stores dates as strings ("YYYY-MM-DD"). I sliced out the first four characters (the year) and passed them to inflate(), which complained without providing useful error information.

Converting the year_or_month parameter to an integer before performing the check would solve this issue. Alternatively, one could check whether the value is a string and raise a more useful TypeError() message.

    if type(year_or_month) != type(to):
        raise TypeError("Years can only be converted to other years. Months only to other months.")

https://github.com/datadesk/cpi/blob/36d049b45a3f318df97dbced56fbc12dd55415e2/cpi/__init__.py#L138
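
A sketch of what the suggested guard could look like (the helper name is made up and this is not the library's current code):

    def _normalize_year_or_month(year_or_month):
        """Coerce string input before the type check and raise a clearer error.
        A sketch of the suggestion above, not the library's actual implementation."""
        if isinstance(year_or_month, str):
            if year_or_month.isdigit() and len(year_or_month) == 4:
                return int(year_or_month)  # e.g. "1999" sliced from "1999-12-31"
            raise TypeError(
                "year_or_month is a string. Pass an integer year (e.g. 1999) "
                "or a datetime.date for a specific month."
            )
        return year_or_month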

OperationalError: no such table: cu.data.3.AsizeNorthEast

I ran this in Python:

import cpi
cpi.update()

I get the error below. It was working until two days ago.


OperationalError Traceback (most recent call last)
/var/folders/8b/fjq89b5n05ldt2ytmw4pn5j80000gn/T/ipykernel_94105/2037719277.py in
----> 1 from economics import Inflation

~/opt/anaconda3/lib/python3.9/site-packages/economics/__init__.py in
----> 1 from cpi import CPI
2 from inflation import Inflation

~/opt/anaconda3/lib/python3.9/site-packages/cpi/__init__.py in
23 periods = parsers.ParsePeriod().parse()
24 periodicities = parsers.ParsePeriodicity().parse()
---> 25 series = parsers.ParseSeries(
26 periods=periods, periodicities=periodicities, areas=areas, items=items
27 ).parse()

~/opt/anaconda3/lib/python3.9/site-packages/cpi/parsers.py in parse(self)
165 def parse(self):
166 self.series_list = self.parse_series()
--> 167 self.parse_indexes()
168 return self.series_list
169

~/opt/anaconda3/lib/python3.9/site-packages/cpi/parsers.py in parse_indexes(self)
195 for file in self.FILE_LIST:
196 # ... and for each file ...
--> 197 for row in self.get_file(file):
198 # Get the series
199 series = self.series_list.get_by_id(row["series_id"])

~/opt/anaconda3/lib/python3.9/site-packages/cpi/parsers.py in get_file(self, file)
35
36 # Query this file
---> 37 query = cursor.execute(f'SELECT * FROM "{file}"')
38 columns = [d[0] for d in query.description]
39 result_list = [dict(zip(columns, r)) for r in query.fetchall()]

OperationalError: no such table: cu.data.3.AsizeNorthEast

April Labor Statistics Not Present

Hello, I've been using the package for monthly food inflation tracking -

cpi_df = cpi.series.get_by_id('CUUR0000SAF1').to_dataframe()
cpi_df_x = cpi_df.filter(col('period_type') == 'monthly')

It seems that the latest information present is March; however, the bureau released April's numbers on May 11th.

Can you help me understand the discrepancy?

Yes, I am also running the update function.
cpi.update()
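
For reference, one way to see the newest month actually on disk after an update is below. It is a sketch that assumes the dataframe from to_dataframe() carries a date-like column named "date"; adjust to whatever columns it actually returns.

    import cpi

    cpi.update()
    df = cpi.series.get_by_id("CUUR0000SAF1").to_dataframe()
    monthly = df[df["period_type"] == "monthly"]  # plain pandas boolean mask
    # Sort by the date column and inspect the tail to see the latest month present.
    print(monthly.sort_values("date").tail())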

Error when importing CPI in ipython3 Jupyter notebook via Anaconda: "no such table: cu area"

Hi palewire,

I am trying to use the cpi library in an ipython3 Jupyter notebook running via Anaconda Navigator on a Windows 10 OS. I can successfully run pip install cpi, but when I import cpi I get the same "no such table: cu area" error message that is mentioned here: #62

For context, I can install and import other libraries through the Jupyter notebook. Also, I was able to both install and import the cpi library in January 2023. Is it possible that the solution you implemented for issue #62 is not translating into my coding environment?

Thank you!

Store data globally, or somewhere configurable

If I understand what's happening, data lives and is updated here: https://github.com/datadesk/cpi/blob/master/cpi/data.csv

Two things make me queasy about this (with the caveat that I haven't used this in a project yet):

  • it's changing the codebase in flight, which is scary
  • if I have multiple installs, they could get out of sync, or I just end up with lots of copies of the same data

One way you could avoid that is having a global, or configurable, data cache. It might be $HOME/.python-cpi/data.csv by default, with the option to configure if you needed an isolated copy somewhere. The library could pre-populate the cache, or fall back on what's included in the codebase, or warn if the data is stale.
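
A sketch of the path-resolution part of that idea (the environment variable name and default location are made up for illustration, not an existing cpi feature):

    import os
    from pathlib import Path

    def resolve_data_path() -> Path:
        """Pick the CPI data location: an env var override if set,
        otherwise a per-user cache directory. Hypothetical names throughout."""
        override = os.environ.get("PYTHON_CPI_DATA_DIR")
        if override:
            return Path(override) / "data.csv"
        default = Path.home() / ".python-cpi"
        default.mkdir(parents=True, exist_ok=True)
        return default / "data.csv"

The library could pre-populate whatever path this returns, fall back on the copy shipped in the codebase, or warn when the cached file is stale.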

Error when importing CPI: sqlite3.OperationalError: no such table: cu.area

Version 1.0.5 works, but the newest version, 1.0.9, does not. It gives me the "no such table" error seen below. It looks like it may be missing a DB file, or there is some issue with the DB file itself:

Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cpi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\cpi\__init__.py", line 21, in <module>
    areas = parsers.ParseArea().parse()
  File "C:\ProgramData\Anaconda3\lib\site-packages\cpi\parsers.py", line 60, in parse
    for row in self.get_file("cu.area"):
  File "C:\ProgramData\Anaconda3\lib\site-packages\cpi\parsers.py", line 37, in get_file
    query = cursor.execute(f'SELECT * FROM "{file}"')
sqlite3.OperationalError: no such table: cu.area
>>>

Error in cpi.update()

Following is the traceback for the issue I faced during cpi.update():

Version of cpi used: 1.0.17


AssertionError Traceback (most recent call last)
Cell In [40], line 1
----> 1 cpi.update()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\cpi\__init__.py:163, in update()
157 def update():
158 """
159 Updates the Consumer Price Index dataset at the core of this library.
160
161 Requires an Internet connection.
162 """
--> 163 Downloader().update()

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\cpi\download.py:75, in Downloader.update(self)
73 # Download the TSVs
74 logger.debug(f"Downloading {len(self.FILE_LIST)} files from the BLS")
---> 75 [self.get_tsv(file) for file in self.FILE_LIST]
77 # Insert the TSVs
78 logger.debug("Loading data into SQLite database")

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\cpi\download.py:75, in <listcomp>(.0)
73 # Download the TSVs
74 logger.debug(f"Downloading {len(self.FILE_LIST)} files from the BLS")
---> 75 [self.get_tsv(file) for file in self.FILE_LIST]
77 # Insert the TSVs
78 logger.debug("Loading data into SQLite database")

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\cpi\download.py:109, in Downloader.get_tsv(self, file)
107 tsv_path = self.get_data_dir() / f"{file}.tsv"
108 response = requests.get(url)
--> 109 assert response.ok
110 with open(tsv_path, "w") as fp:
111 fp.write(response.text)

AssertionError:
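
The bare assert hides the HTTP status code, so it is hard to tell whether the BLS server rejected the request or was temporarily down. A quick manual check is sketched below; the URL assumes the public BLS time-series directory the downloader reads from.

    import requests

    # Fetch one of the BLS flat files directly to see the real status code.
    url = "https://download.bls.gov/pub/time.series/cu/cu.area"
    response = requests.get(url, headers={"User-Agent": "cpi-troubleshooting"})
    print(response.status_code, response.reason)

A 403 may mean the server is rejecting requests without a browser-like User-Agent header; a 5xx usually means the site is down and retrying later should work.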

Very Slow to Load/Import

Python 3.6

Every time I import the library, it is extremely slow (~50 seconds).
I don't have time to review the codebase for a solution right now, so I am going to look for other libraries, but this is a huge problem when you just want to rattle off some code quickly in the shell.

I tried Python 3.7 from the Anaconda distribution and a plain 3.6, but both were exceedingly slow. I also tried calling cpi.update(), closing and re-opening the shell, and importing again, but it was still extremely slow.

sqlite3.OperationalError: near ")": syntax error

When I try to run cpi.update(), I get an error: 'sqlite3.OperationalError: near ")": syntax error'. I'm unsure how to update the underlying cpi tables. Even after I uninstalled and reinstalled, I only have data up until 2023-03-31.

Project seems dead, alternatives?

This project hasn't been updated in a long time, the data is incomplete, and support for other currencies would be awesome. Does anybody know of an alternative?

latest year in time series

Hi,
I ran cpi.update() before applying the CPI adjustment to a quarterly time series, and it appeared to work. But oddly there is a gap between my nominal numbers and those adjusted for CPI. How can I see what month and/or year is being used by default to adjust figures? If by default I'm using the 2018 CPI index, shouldn't my 2018 nominal and adjusted numbers be the same?
Thanks,
Jason
This is the code I used:
cpi.update()
qcew['cpi_total_qtrly_wages'] = qcew.apply(lambda x: cpi.inflate(x.total_qtrly_wages, x.int_year) , axis=1)
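
If inflate() is defaulting to a year more recent than 2018 (by default it adjusts to the most recent year of data it has), that would explain the gap. One way to remove the ambiguity is to pin the target year with the to keyword that inflate() already accepts, as in this variation on the code above:

    cpi.update()
    # Pin the target year so nominal and adjusted 2018 figures line up.
    qcew['cpi_total_qtrly_wages'] = qcew.apply(
        lambda x: cpi.inflate(x.total_qtrly_wages, x.int_year, to=2018), axis=1
    )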

Unable to download data

Currently the BLS website is unavailable, like this:
[screenshot from 2020-08-08 17:47 showing the BLS outage page]

The library just parses the error HTML page, and there is no validation that the correct thing was parsed from the website, so importing the library does not work at all. Please add validation of the parsed page and a workaround for these kinds of outages (or at least a meaningful error message on import).
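
A sketch of the kind of validation being asked for, assuming the downloader fetches tab-separated flat files over HTTP (the function name is illustrative, not the library's internals):

    import requests

    def fetch_tsv(url: str) -> str:
        """Download one flat file and refuse to accept an HTML error page."""
        response = requests.get(url)
        if not response.ok:
            raise RuntimeError(f"BLS returned HTTP {response.status_code} for {url}; try again later.")
        body = response.text
        if body.lstrip().lower().startswith(("<!doctype", "<html")):
            raise RuntimeError(f"BLS returned an HTML page instead of data for {url}; the site may be down.")
        return body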

adding support for datetime numpy arrays

While pd.apply() works for small datasets like the example from the docs

df['ADJUSTED'] = df.apply(lambda x: cpi.inflate(x.MEDIAN_HOUSEHOLD_INCOME, x.YEAR), axis=1)

it quickly falls apart when inflating long series, because it adjusts each value one at a time instead of taking advantage of numpy and pandas vectorization.

CPI already can handle numpy arrays and has both pandas and numpy as dependencies.
[screenshot: timing a vectorized calculation on a numpy array of incomes; 100,000,000 rows in less than 2 seconds, pretty cool.]

The problem:

CPI takes year_or_month as either an int or a date object and retrieves the corresponding source_index from cpi.db. As far as I understand, this would need to be done for every item in the array, so it would still be very time-consuming for very large datasets.

The solution:

I still don't have any solid solutions.

One way to approach this could be:

  1. receive a numpy array of dates for year_or_month

    • clean it so they all have 01 as the day of the month
  2. grab the unique values in this array of dates

    • even if you have 100,000,000 rows, you definitely don't have 100,000,000 different year-month combinations.
      • BLS' data goes back to 1913 (2017-1913=104 years, 104 * 12 = 1248 months + 10 months of 2018 as of now = 1258 unique values at most)
    • create a numpy array of those values matching their date (or a dict() to later use .map() on the dates array.)
  3. map the source_index values to the array of dates

    • look up the CPI value for each of those unique dates and map it back to the original numpy array of dates
  4. cpi.inflate() already just multiplies (value * target_index) / float(source_index)

    • numpy will take care of the rest

Even though one would most likely be inflating values to one specific year or month, this method could be applied to both the year_or_month and the to arguments, inflating a series of values from a series of dates to a different series of dates. A rough sketch follows.
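
Here is the unique-value mapping described above, built on pandas and the existing cpi.inflate() call. The helper name is mine, not part of the library, and it assumes whole years rather than monthly dates:

    import pandas as pd
    import cpi

    def vectorized_inflate(values, years, to=2018):
        """Inflate many values with one cpi.inflate() call per unique year,
        then map the adjustment factors back onto the full series."""
        values = pd.Series(values, dtype="float64")
        years = pd.Series(years)
        # Step 2: only a handful of unique years, however long the series is.
        factors = {year: cpi.inflate(1.0, int(year), to=to) for year in years.unique()}
        # Steps 3-4: map a factor onto every row, then let pandas/numpy multiply.
        return values * years.map(factors).to_numpy()

Used as df['ADJUSTED'] = vectorized_inflate(df.MEDIAN_HOUSEHOLD_INCOME, df.YEAR), this hits the index lookup once per unique year instead of once per row; the same trick would extend to monthly dates by mapping on the cleaned first-of-month values.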

The use:

The particular use I came up with was normalizing different types of incomes from public use microdata. For example, say I go to ipums and grab ACS data from 2000-2016 for incomes (earned wages, household income, farm income, social security, etc.).
There are only 16 distinct years, but if I use pd.apply() it goes row by row and would simply never end:
[screenshot: pd.apply() timing on the 2000-16 ACS income extract]


I don't have experience with sqlite, so I couldn't put together a proof of concept, but I hope this explanation is helpful.

Autoupdate

Every time I start executing the script, I have to wait for the update process. There should be an auto-update mechanism, or a method to check whether the files are up to date.

Operational error no such table: cu.area

When I try importing the cpi library after installation, it gives me this error:
OperationalError Traceback (most recent call last)
/var/folders/cx/c0rnck312cxbw_4wbq7ylfw80000gn/T/ipykernel_38424/1241436912.py in
----> 1 import cpi

~/opt/anaconda3/lib/python3.9/site-packages/cpi/__init__.py in
19 # Parse data for use
20 logger.info("Parsing data files from the BLS")
---> 21 areas = parsers.ParseArea().parse()
22 items = parsers.ParseItem().parse()
23 periods = parsers.ParsePeriod().parse()

~/opt/anaconda3/lib/python3.9/site-packages/cpi/parsers.py in parse(self)
58 logger.debug("Parsing area file")
59 object_list = MappingList()
---> 60 for row in self.get_file("cu.area"):
61 obj = Area(row["area_code"], row["area_name"])
62 object_list.append(obj)

~/opt/anaconda3/lib/python3.9/site-packages/cpi/parsers.py in get_file(self, file)
35
36 # Query this file
---> 37 query = cursor.execute(f'SELECT * FROM "{file}"')
38 columns = [d[0] for d in query.description]
39 result_list = [dict(zip(columns, r)) for r in query.fetchall()]

OperationalError: no such table: cu.area

Could someone please help me fix this error?
