cid-harvard / py-ecomplexity
Python package to compute economic complexity and associated variables
License: MIT License
Output of ecomplexity
looks like:
2015
(235,)
(235, 1247)
Please remove the lines that print these array shapes.
Great program!
I have a question about the program. Looking at the code where you calculate the ECI and PCI, you flip the sign of the eigenvector (depending on the resulting vector) so that the ECI is always positively correlated with diversity (and likewise for the corresponding PCI). You do this because the sign returned by the solver is arbitrary: the opposite-sign vector is an equally valid eigenvector for the same eigenvalue, and Python simply returns whichever unit-length vector it finds first. Without the flip, you could end up with Japan as the least complex country according to the ECI, so the correction rules out those cases.
I wrote a similar program in Python for municipalities in Mexico and did the same. Doing this, you can replicate the book's results, correct?
Right now I am working with crime data, and in that case it is not clear to me that the ECI should always be positively correlated with diversity.
Do you know of a mathematical assumption or equation that guarantees the ECI is always positively correlated with diversity?
Thank you for your advice and time,
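As a sketch of the sign convention being discussed (the toy data and variable names here are my own, not the package's), one can flip the eigenvector whenever it correlates negatively with diversity:

```python
import numpy as np

# Toy binary country-product matrix (20 countries, 50 products)
rng = np.random.default_rng(0)
mcp = (rng.random((20, 50)) > 0.5).astype(float)
diversity = mcp.sum(axis=1)   # products per country
ubiquity = mcp.sum(axis=0)    # countries per product

# Country-country transition matrix from the method of reflections
mcc = (mcp / diversity[:, None]) @ (mcp / ubiquity).T

# Eigenvector for the second-largest eigenvalue gives the raw ECI
vals, vecs = np.linalg.eig(mcc)
order = np.argsort(vals.real)[::-1]
eci = vecs[:, order[1]].real

# The eigenvector's sign is arbitrary: enforce positive correlation
# with diversity, as the package does
if np.corrcoef(eci, diversity)[0, 1] < 0:
    eci = -eci
```

This only fixes the overall sign; it does not, by itself, answer whether a positive ECI-diversity correlation is theoretically guaranteed for other kinds of data.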
Dear team,
I was wondering about the PCI calculation. You have the equations:
cdata.pci_t = (cdata.pci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
cdata.eci_t = (cdata.eci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
Shouldn't the first one be cdata.pci_t = (cdata.pci_t - cdata.pci_t.mean()) / cdata.pci_t.std(),
according to https://growthlab.cid.harvard.edu/files/growthlab/files/atlas_2013_part1.pdf, page 24?
Diversity and ubiquity are often 0 when calculating with RPOP. These cases need better handling, since ECI and PCI currently show up as NaN for all countries (the eigenvectors are returned as NaNs).
Thank you for contributing this brilliant Python package. I am trying to compute the ECI (eci_ecomplexity_cal) using the country_sitcproduct2digit_year.csv data. That data already contains ECI values (eci_hidalgo_rep), and I found that the two do not line up exactly.
Further, I find that the given ECI values (eci_hidalgo_rep) correlate better with GDP per capita than the ECI computed using this Python package (eci_ecomplexity_cal).
The following output variables are not created by the package. Could you please update the code to output these as well?
Hello. Thanks for your great work.
I am confused about how the RCA is calculated:
num = data_np / np.nansum(data_np, axis=1)[:, np.newaxis]
loc_total = np.nansum(data_np, axis=0)[np.newaxis, :]
world_total = np.nansum(loc_total, axis=1)[:, np.newaxis]
den = loc_total / world_total
self.rca_t = num / den
Shouldn't it be something like (value of product p in location c / total of location c) / (world total of product p / world total)?
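For concreteness, here is a minimal sketch of the standard Balassa RCA on a toy matrix (the data and variable names are mine, not the package's):

```python
import numpy as np

# Toy export matrix: rows = locations, columns = products
data = np.array([[10.0, 0.0, 5.0],
                 [2.0, 8.0, 1.0]])

loc_total = data.sum(axis=1, keepdims=True)    # total exports of each location
prod_total = data.sum(axis=0, keepdims=True)   # world exports of each product
world_total = data.sum()                       # total world exports

# RCA[c, p] = (x[c, p] / sum_p x[c, p]) / (sum_c x[c, p] / sum_cp x[c, p])
rca = (data / loc_total) / (prod_total / world_total)
```

Under this definition, rca[0, 0] = (10/15) / (12/26) = 13/9 ≈ 1.44, i.e. the first location is revealed-comparative-advantaged in the first product.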
Although the ECI values and ranks line up exactly with the STATA package, the PCI values do not.
Since we use a lot of loops and NumPy matrices, numba might be worth looking into to improve performance.
Hi,
glad to have discovered this! I have done something very similar, with extensions from recent articles, here:
https://github.com/pachamaltese/economiccomplexity
I managed to produce economic density values greater than 1.
How I managed to do it:
I'm currently carrying out an economic complexity analysis for one of the regions of the Russian Federation.
Running ecomplexity produced density values greater than 1 for some goods (I use HS07 4-digit codes), which seems impossible: the formula itself does not allow values greater than 1.
Data itself is here: https://www.dropbox.com/s/9h6kdny04w0kkt2/world_data_kgd.csv?dl=0
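As a sanity check, here is a minimal sketch of the density formula under its usual definition (toy data and names are my own): each entry is a proximity-weighted average of a binary matrix, so it should always lie in [0, 1].

```python
import numpy as np

# Toy binary location-product matrix and symmetric proximity matrix
mcp = np.array([[1, 0, 1],
                [0, 1, 0]], dtype=float)
phi = np.array([[1.0, 0.2, 0.6],
                [0.2, 1.0, 0.3],
                [0.6, 0.3, 1.0]])

# density[c, p] = sum_p' mcp[c, p'] * phi[p, p'] / sum_p' phi[p, p']
density = (mcp @ phi) / phi.sum(axis=0)

# Since mcp is binary and phi is non-negative, each entry is a weighted
# average of 0/1 values and cannot exceed 1
```

Values above 1 would therefore point to something unusual in the input (e.g. duplicated rows, or non-binary values where a binary matrix is expected) rather than to the formula itself.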
Good day, and huge thanks for your work! We are a group of data scientists trying to calculate ECI, PCI, and proximity between products for a specific country, but we are getting strange results, especially for the proximity. We therefore have several questions:
Is there any tuning needed for calculation of the proximity at subnational level?
Could you share the data that you used for your analysis?
Could you elaborate more on the calculation of the subnational data?
Is there anyone we can contact for a Q&A session?
Thank you in advance, your help will be very much appreciated!
I am currently running the __init__ file and I get the error "NameError: name 'ecomplexity' is not defined". How would I solve this?
I tried to install using pip install ecomplexity and encountered the error below. I'd appreciate it if you could help. Thank you.
error: subprocess-exited-with-error
python setup.py egg_info did not run successfully.
exit code: 1
[10 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 13, in
long_description=readme(),
^^^^^^^^
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 6, in readme
return f.read()
^^^^^^^^
UnicodeDecodeError: 'cp932' codec can't decode byte 0x93 in position 3468: illegal multibyte sequence
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Encountered error while generating package metadata.
See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Incorrect usage of RPOP does not raise good error messages. For example, missing population values lead to lots of NAs but no informative error.
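One possible shape for such a check, purely as an illustration (the function name and the 'pop' column name are hypothetical, not the package's actual API):

```python
import pandas as pd

def check_population(df: pd.DataFrame) -> None:
    # Fail fast with a clear message instead of letting missing
    # population values propagate as NAs through the RPOP calculation
    if "pop" not in df.columns or df["pop"].isna().any():
        raise ValueError(
            "RPOP requires a complete 'pop' column; found missing population values."
        )

df = pd.DataFrame({"country": ["A", "B"], "pop": [1_000_000, None]})
try:
    check_population(df)
except ValueError as err:
    message = str(err)
```

Validating inputs up front like this turns a silent wall of NAs into an actionable error at the point of misuse.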
Hi,
I found a possible mistake in line #209 of ecomplexity.py. According to the book The Atlas of Economic Complexity, both ECI and PCI should be z-score normalized. While the code normalizes ECI correctly, resulting in mean=0 and std=1, the PCI does not.
In line #209, PCI is normalized using the mean and standard deviation of ECI. I thought it might be a mistake? Though it does not harm the rankings.
Thank you!
Thank you for your effort in making this brilliant algorithm available in Python!
However, I have a few questions about the ECI and PCI algorithms.
I got confusing results with the sample code.
I am not sure whether I am using your code correctly, or whether it needs more sophisticated data preprocessing techniques. Can you provide a sample that generates more reasonable results, e.g., with countries like USA, JPN, and CHN ranking high, as shown on https://atlas.cid.harvard.edu/rankings?
Thank you!
Thank you for sharing. When installing the package, I found that lines 22 and 23 of setup.py were missing commas, and there was a GBK encoding problem in the Readme.txt. After deleting some of the abnormal characters, the installation worked as expected.
During the final standardization, I also found that my reproduced results were inconsistent with the published ones. This was because the degrees of freedom used for the standard deviation in the ECI standardization were different; I think it is better to use n-1, not n.
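To illustrate the n vs n-1 point above with concrete numbers (toy data, my own names):

```python
import numpy as np

# np.std defaults to ddof=0 (divide by n), while pandas' Series.std
# defaults to ddof=1 (divide by n-1); mixing the two shifts z-scores
x = np.array([1.0, 2.0, 3.0, 4.0])

pop_std = np.std(x)             # divides by n:   sqrt(5/4)
sample_std = np.std(x, ddof=1)  # divides by n-1: sqrt(5/3)
```

Whichever convention is chosen, using it consistently across ECI and PCI is what makes results reproducible against the published values.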
Good day, and thank you a lot for this project.
I came across a problem where the mean and std of the PCI are not 0 and 1. Looking at the source code, I realized that in the function ecomplexity (file ecomplexity.py), line 209,
# Normalize variables as per STATA package
cdata.pci_t = (cdata.pci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
cdata.cog_t = cdata.cog_t / cdata.eci_t.std()
cdata.eci_t = (cdata.eci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
the PCI data is normalized by the ECI mean and std. Is this a mistake or is it done purposefully? Right now my PCI ranges over [-10, 6], which seems very unusual. Thanks!
In line 69 of proximity.py, it would be better to specify index names explicitly.
py-ecomplexity/ecomplexity/proximity.py
Line 69 in 05076fb
After upgrading to pandas 1.0+, pd.MultiIndex.from_product tries to infer names, which seems to cause an error when running reset_index later.
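As a sketch of the suggested fix (the index names and toy data here are mine, not the package's): passing names= explicitly to pd.MultiIndex.from_product avoids relying on name inference, so a later reset_index produces predictable columns.

```python
import pandas as pd

# Explicit names instead of letting from_product infer them
idx = pd.MultiIndex.from_product(
    [["USA", "JPN"], ["0101", "0202"]],
    names=["location", "product"],
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# With explicit names, reset_index yields well-defined column labels
flat = df.reset_index()
```

This keeps behavior stable across pandas versions rather than depending on the inference introduced in 1.0+.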
Apologies in advance for the not-so-reproducible example; I couldn't find a way around the name/email requirements of the Dataverse. I am using reticulate in R to run ecomplexity.
Data published on the Harvard Economic Complexity Dataverse has pre-calculated complexity indicators. The country_hsproduct4digit_year data from https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/T4CHWJ/4RG21Y&version=3.0 has columns: location_id, product_id, year, export_value, import_value, export_rca, product_status, cog, distance, normalized_distance, normalized_cog, normalized_pci, export_rpop, is_new, hs_eci, hs_coi, pci, location_code, hs_product_code.
Using only the location_code, hs_product_code, export_value and year columns from that data as input to ecomplexity yields different values for all of the calculated indicators. As an example, the atlas data gives an hs_eci for ABW in 1995 of -0.468138129, while the eci calculated from the same atlas data comes out as -0.1471911.
Is the data published on the Harvard Dataverse created using a different method?