While `pd.apply()` works for small datasets, like the example from the docs:

```python
df['ADJUSTED'] = df.apply(lambda x: cpi.inflate(x.MEDIAN_HOUSEHOLD_INCOME, x.YEAR), axis=1)
```

it quickly falls apart when inflating long series, because it inflates each value one at a time instead of taking advantage of `numpy` and `pandas` vectorization.
`cpi` can already handle `numpy` arrays and has both `pandas` and `numpy` as dependencies (100,000,000 rows in less than 2 seconds, pretty cool).
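That speed claim is easy to sanity-check: the inflation arithmetic itself is a single vectorized expression. A minimal sketch with made-up index values (not real CPI data), using a smaller array for a quick run:

```python
import numpy as np

# Made-up index values -- not real CPI data; n shrunk for a quick run.
n = 1_000_000
values = np.random.uniform(10_000, 100_000, n)   # incomes to adjust
source_index = np.random.uniform(150, 250, n)    # index at each value's date
target_index = 245.12                            # index of the target period

# The same math cpi.inflate() does per value, applied to the whole array at once.
adjusted = values * target_index / source_index
```

The multiplication and division broadcast over the whole array in compiled code, which is why the row count barely matters; the expensive part is getting `source_index` filled in, which is what the rest of this issue is about.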
The problem:
`cpi` takes `year_or_month` as either an `int` or a `date` object and retrieves the corresponding `source_index` from `cpi.db`. As far as I understand, this lookup would need to be done for every item in the array, so it would still be very time-consuming for very large datasets.
The solution:
I still don't have a solid solution, but one way to approach this could be:
- receive a `numpy` array of dates for `year_or_month`
  - clean it so they all have 01 as the day of the month
- grab the unique values in this array of dates
  - even if you have 100,000,000 rows, you definitely don't have 100,000,000 different year-month combinations
  - BLS' data goes back to 1913 (1913 through 2017 is 105 years, 105 * 12 = 1260 months, plus 10 months of 2018 as of now, so about 1270 unique values at most)
- create a `numpy` array of those values matching their date (or a `dict()` to later use `.map()` on the dates array)
- map the `source_index` values to the array of dates
  - look up the CPI value for each of those unique dates and map it back to the original `numpy` array of dates
- `cpi.inflate()` already just multiplies `(value * target_index) / float(source_index)`; `numpy` will take care of the rest
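The steps above could be sketched roughly like this. `get_index` here is a hypothetical stand-in for whatever single-date lookup `cpi` does against its database; everything else is plain `numpy`:

```python
import numpy as np

def vectorized_inflate(values, dates, target_index, get_index):
    # `get_index` is a hypothetical callable: numpy datetime64[M] -> CPI index.
    # Clean the dates so they all fall on the first of the month.
    months = np.asarray(dates, dtype='datetime64[M]')
    # Grab the unique year-month combinations (a few hundred at most)
    # and remember where each original date belongs.
    unique_months, inverse = np.unique(months, return_inverse=True)
    # Look up the CPI index once per unique month...
    unique_index = np.array([get_index(m) for m in unique_months], dtype=float)
    # ...and broadcast it back onto the full-length array of dates.
    source_index = unique_index[inverse]
    # cpi.inflate() already just multiplies; numpy takes care of the rest.
    return np.asarray(values, dtype=float) * target_index / source_index
```

With this shape, the database is hit once per unique month instead of once per row, e.g. `vectorized_inflate(incomes, income_dates, 251.1, get_index)` for 100,000,000 rows would do at most ~1,270 lookups.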
Even though most likely one would be inflating values to one specific year or month, this method could be applied to both `year_or_month` and `to`, inflating a series of values from a series of dates to a different series of dates.
The use:
The particular use I came up with was normalizing different types of income from public-use microdata. For example, say I go to IPUMS and grab ACS data from 2000-2016 for incomes (earned wages, household income, farm income, social security, etc.). There are only 17 distinct years, but if I use `pd.apply()` it goes row by row and would simply never end.
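With so few distinct years, the same idea works directly on a DataFrame: compute one inflation factor per unique year and broadcast it with `.map()`. A hedged sketch with made-up column names and a stubbed-out factor (`adjust_factor` stands in for what `cpi.inflate(1, year)` would return, the multiplier bringing one dollar of that year up to the target year):

```python
import numpy as np
import pandas as pd

# Made-up microdata: a million rows, but only 17 distinct years.
df = pd.DataFrame({
    'YEAR': np.random.choice(range(2000, 2017), size=1_000_000),
    'HHINCOME': np.random.uniform(5_000, 150_000, size=1_000_000),
})

# One lookup per unique year -- 17 calls instead of 1,000,000.
# Stub: pretend each year inflates by a flat 2% per year to 2016.
adjust_factor = {year: 1.0 + 0.02 * (2016 - year) for year in df['YEAR'].unique()}

# Broadcast the per-year factors back onto every row at once.
df['ADJUSTED'] = df['HHINCOME'] * df['YEAR'].map(adjust_factor)
```

This finishes in a fraction of a second on the whole frame, versus the per-row `apply` that never ends.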
I don't have experience with sqlite, so I couldn't put together a proof of concept, but I hope this explanation is helpful.