cid-harvard / py-ecomplexity
Python package to compute economic complexity and associated variables
License: MIT License
Output of ecomplexity
looks like:
2015
(235,)
(235, 1247)
Please remove the lines that print these array shapes.
Great program!
I have a question about the program. Looking at the code where you calculate the ECI and PCI, you flip the sign of the eigenvector (depending on the resulting vector) so that the ECI is always positively correlated with diversity (and likewise for the corresponding PCI). You do this because the sign returned by the solver is arbitrary: the opposite-sign vector is an equally valid eigenvector for the same eigenvalue, and Python simply returns whichever unit-length vector it finds first. Without the flip, you could end up with Japan as the least complex country according to the ECI, so the correction rules out those cases.
I wrote a similar program in Python for municipalities in Mexico and did the same. Doing this, you can replicate the book's results, correct?
Right now I am working with crime data, and in that case it is not clear to me that the ECI should always be positively correlated with diversity.
Do you know of a mathematical assumption or equation that guarantees the ECI is always positively correlated with diversity?
Thank you for your advice and time,
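As a sketch of the sign convention being discussed (the toy data and variable names here are my own, not the package's), one can flip the eigenvector whenever it correlates negatively with diversity:

```python
import numpy as np

# Toy binary country-product matrix (20 countries, 50 products)
rng = np.random.default_rng(0)
mcp = (rng.random((20, 50)) > 0.5).astype(float)
diversity = mcp.sum(axis=1)   # products per country
ubiquity = mcp.sum(axis=0)    # countries per product

# Country-country transition matrix from the method of reflections
mcc = (mcp / diversity[:, None]) @ (mcp / ubiquity).T

# Eigenvector for the second-largest eigenvalue gives the raw ECI
vals, vecs = np.linalg.eig(mcc)
order = np.argsort(vals.real)[::-1]
eci = vecs[:, order[1]].real

# The eigenvector's sign is arbitrary: enforce positive correlation
# with diversity, as the package does
if np.corrcoef(eci, diversity)[0, 1] < 0:
    eci = -eci
```

This only fixes the overall sign; it does not, by itself, answer whether a positive ECI-diversity correlation is theoretically guaranteed for other kinds of data.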
Dear team,
I was wondering about the PCI calculation. You have the equations:
cdata.pci_t = (cdata.pci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
cdata.eci_t = (cdata.eci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
Shouldn't the first one be cdata.pci_t = (cdata.pci_t - cdata.pci_t.mean()) / cdata.pci_t.std(),
according to https://growthlab.cid.harvard.edu/files/growthlab/files/atlas_2013_part1.pdf, page 24?
Diversity and ubiquity are often 0 when calculating with RPOP. These cases need better handling, since ECI and PCI currently show up as NaN for all countries (the eigenvectors are returned as NaNs).
Thank you for contributing this brilliant Python package. I am trying to compute the ECI (eci_ecomplexity_cal) using the country_sitcproduct2digit_year.csv data. That data already contains ECI values (eci_hidalgo_rep), and I found that the two do not line up exactly.
Further, I find that the given ECI values (eci_hidalgo_rep) correlate better with GDP per capita than the ECI computed using this Python package (eci_ecomplexity_cal).
The following output variables are not created by the package. Could you please update the code to output these as well?
Hello. Thanks for your great work.
I am confused about how the RCA is calculated:
num = data_np / np.nansum(data_np, axis=1)[:, np.newaxis]
loc_total = np.nansum(data_np, axis=0)[np.newaxis, :]
world_total = np.nansum(loc_total, axis=1)[:, np.newaxis]
den = loc_total / world_total
self.rca_t = num / den
Shouldn't it be something like (value of product p in location c / total of location c) / (world total of product p / world total)?
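For concreteness, here is a minimal sketch of the standard Balassa RCA on a toy matrix (the data and variable names are mine, not the package's):

```python
import numpy as np

# Toy export matrix: rows = locations, columns = products
data = np.array([[10.0, 0.0, 5.0],
                 [2.0, 8.0, 1.0]])

loc_total = data.sum(axis=1, keepdims=True)    # total exports of each location
prod_total = data.sum(axis=0, keepdims=True)   # world exports of each product
world_total = data.sum()                       # total world exports

# RCA[c, p] = (x[c, p] / sum_p x[c, p]) / (sum_c x[c, p] / sum_cp x[c, p])
rca = (data / loc_total) / (prod_total / world_total)
```

Under this definition, rca[0, 0] = (10/15) / (12/26) = 13/9 ≈ 1.44, i.e. the first location is revealed-comparative-advantaged in the first product.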
Although the ECI values and ranks line up exactly with the STATA package, the PCI values do not.
Since we use a lot of loops and NumPy matrices, numba might be worth looking into to improve performance.
Hi,
glad to have discovered this! I have done something very similar, with extensions from recent articles, here:
https://github.com/pachamaltese/economiccomplexity
I managed to produce economic density values greater than 1.
How I managed to do it:
I'm currently carrying out an economic complexity analysis for one of the regions of the Russian Federation.
Running ecomplexity produced density values greater than 1 for some goods (I use HS07 4-digit codes), which seems impossible: the formula itself does not allow values greater than 1.
Data itself is here: https://www.dropbox.com/s/9h6kdny04w0kkt2/world_data_kgd.csv?dl=0
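As a sanity check, here is a minimal sketch of the density formula under its usual definition (toy data and names are my own): each entry is a proximity-weighted average of a binary matrix, so it should always lie in [0, 1].

```python
import numpy as np

# Toy binary location-product matrix and symmetric proximity matrix
mcp = np.array([[1, 0, 1],
                [0, 1, 0]], dtype=float)
phi = np.array([[1.0, 0.2, 0.6],
                [0.2, 1.0, 0.3],
                [0.6, 0.3, 1.0]])

# density[c, p] = sum_p' mcp[c, p'] * phi[p, p'] / sum_p' phi[p, p']
density = (mcp @ phi) / phi.sum(axis=0)

# Since mcp is binary and phi is non-negative, each entry is a weighted
# average of 0/1 values and cannot exceed 1
```

Values above 1 would therefore point to something unusual in the input (e.g. duplicated rows, or non-binary values where a binary matrix is expected) rather than to the formula itself.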
Good day, and huge thanks for your work! We are a group of data scientists trying to calculate ECI, PCI, and proximity between products for a specific country, but we are getting strange results, especially for the proximity. We therefore have several questions:
Is there any tuning needed for calculation of the proximity at subnational level?
Could you share the data that you used for your analysis?
Could you elaborate more on the calculation of the subnational data?
Is there anyone we can contact for a Q&A session?
Thank you in advance, your help will be very much appreciated!
I am currently running the __init__ file and I get the error "NameError: name 'ecomplexity' is not defined". How would I solve this?
I tried to install using pip install ecomplexity and encountered the error below. I'd appreciate it if you could help. Thank you.
error: subprocess-exited-with-error
python setup.py egg_info did not run successfully.
exit code: 1
[10 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 13, in
long_description=readme(),
^^^^^^^^
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 6, in readme
return f.read()
^^^^^^^^
UnicodeDecodeError: 'cp932' codec can't decode byte 0x93 in position 3468: illegal multibyte sequence
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Encountered error while generating package metadata.
See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Incorrect usage of RPOP does not raise good error messages. For example, missing population values lead to lots of NAs but no informative error.
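One possible shape for such a check, purely as an illustration (the function name and the 'pop' column name are hypothetical, not the package's actual API):

```python
import pandas as pd

def check_population(df: pd.DataFrame) -> None:
    # Fail fast with a clear message instead of letting missing
    # population values propagate as NAs through the RPOP calculation
    if "pop" not in df.columns or df["pop"].isna().any():
        raise ValueError(
            "RPOP requires a complete 'pop' column; found missing population values."
        )

df = pd.DataFrame({"country": ["A", "B"], "pop": [1_000_000, None]})
try:
    check_population(df)
except ValueError as err:
    message = str(err)
```

Validating inputs up front like this turns a silent wall of NAs into an actionable error at the point of misuse.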
Hi,
I found a possible mistake in line #209 of ecomplexity.py. According to the book The Atlas of Economic Complexity, both ECI and PCI should be z-score normalized. While the code normalizes ECI correctly, resulting in mean=0 and std=1, the PCI does not.
In line #209, PCI is normalized using the mean and standard deviation of ECI. I thought it might be a mistake? Though it does not harm the rankings.
Thank you!
Thank you for your effort in making this brilliant algorithm available in Python!
However, I have a few questions about the ECI and PCI algorithms.
I got confusing results with the sample code.
I am not sure whether I am using your code correctly, or whether it needs more sophisticated data preprocessing techniques. Can you provide a sample that generates more reasonable results, e.g., with countries like USA, JPN, and CHN ranking high, as shown on https://atlas.cid.harvard.edu/rankings?
Thank you!
Thank you for sharing. When installing the package, I found that lines 22 and 23 of setup.py were missing commas, and there was a GBK encoding problem in the Readme.txt. After deleting some of the abnormal characters, the installation worked as expected.
During the final standardization, I also found that my reproduced results were inconsistent with the published ones. This was because the degrees of freedom used for the standard deviation in the ECI standardization were different; I think it is better to use n-1, not n.
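To illustrate the n vs n-1 point above with concrete numbers (toy data, my own names):

```python
import numpy as np

# np.std defaults to ddof=0 (divide by n), while pandas' Series.std
# defaults to ddof=1 (divide by n-1); mixing the two shifts z-scores
x = np.array([1.0, 2.0, 3.0, 4.0])

pop_std = np.std(x)             # divides by n:   sqrt(5/4)
sample_std = np.std(x, ddof=1)  # divides by n-1: sqrt(5/3)
```

Whichever convention is chosen, using it consistently across ECI and PCI is what makes results reproducible against the published values.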
Good day, and thank you a lot for this project.
I came across a problem where the mean and std of the PCI are not 0 and 1. Looking at the source code, I realized that in the function ecomplexity (file ecomplexity.py), line 209,
# Normalize variables as per STATA package
cdata.pci_t = (cdata.pci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
cdata.cog_t = cdata.cog_t / cdata.eci_t.std()
cdata.eci_t = (cdata.eci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
the PCI data is normalized by the ECI mean and std. Is this a mistake or is it done purposefully? Right now my PCI ranges over [-10, 6], which seems very unusual. Thanks!
In line 69 of proximity.py, it would be better to specify index names explicitly.
py-ecomplexity/ecomplexity/proximity.py
Line 69 in 05076fb
After upgrading to pandas 1.0+, pd.MultiIndex.from_product tries to infer names, which seems to cause an error when running reset_index later.
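As a sketch of the suggested fix (the index names and toy data here are mine, not the package's): passing names= explicitly to pd.MultiIndex.from_product avoids relying on name inference, so a later reset_index produces predictable columns.

```python
import pandas as pd

# Explicit names instead of letting from_product infer them
idx = pd.MultiIndex.from_product(
    [["USA", "JPN"], ["0101", "0202"]],
    names=["location", "product"],
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# With explicit names, reset_index yields well-defined column labels
flat = df.reset_index()
```

This keeps behavior stable across pandas versions rather than depending on the inference introduced in 1.0+.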
Apologies in advance for the not-so-reproducible example; I couldn't find a way around the name/email requirements of the Dataverse. I am using reticulate in R to run ecomplexity.
Data published on the Harvard Economic Complexity Dataverse has pre-calculated complexity indicators. The country_hsproduct4digit_year data from https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/T4CHWJ/4RG21Y&version=3.0 has columns: location_id, product_id, year, export_value, import_value, export_rca, product_status, cog, distance, normalized_distance, normalized_cog, normalized_pci, export_rpop, is_new, hs_eci, hs_coi, pci, location_code, hs_product_code.
Using only the location_code, hs_product_code, export_value and year columns from that data as input to ecomplexity yields different values for all of the calculated indicators. As an example, the atlas data gives an hs_eci for ABW in 1995 of -0.468138129, while the eci calculated from the same atlas data comes out as -0.1471911.
Is the data published on the Harvard Dataverse created using a different method?