py-ecomplexity's People

Contributors

bleonard33, complexly, shreyasgm


py-ecomplexity's Issues

ECI correlation with diversity

Great program!

I have a question about the program. I look at the code where you calculate the ECI and PCI. You change the sign (well depending on the resulting vector) so the ECI is always positive correlated with the diversity (and the corresponding PCI). You did this because the sign in the program could be different (of course the opposite sign vector is also the same eigenvector associated with the same eigenvalue and I think python give you the 1st unit length vector found) so doing this you ensure this feature. If you don’t do this, you could end with Japan being the less complex country according to the ECI. So you rule out all those cases.

I made a similar program in python for municipalities in Mexico. I did the same. Doing this, you can replicate the book results, correct?

Right now I am working in crime data. For me it is not clear that the ECI in this case should be always positive correlated with the diversity.

Do you know if exist a math assumption or equation so we can ensure that the ECI should be always positive correlated with diversity?

Thank you for your advice and time,
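The sign convention described in the question can be sketched as follows. This is a minimal illustration on a toy binary matrix, not the package's actual code, assuming the standard eigenvector definition of ECI from the method of reflections:

```python
import numpy as np

# Toy binary country-product matrix (rows: countries, cols: products)
M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

diversity = M.sum(axis=1)   # products per country
ubiquity = M.sum(axis=0)    # countries per product

# Country-country transition matrix; ECI is its eigenvector for the
# second-largest eigenvalue (the largest eigenvalue is trivially 1)
Mcc = (M / diversity[:, None]) @ (M / ubiquity).T
vals, vecs = np.linalg.eig(Mcc)
order = np.argsort(vals.real)[::-1]
eci = vecs[:, order[1]].real

# The eigensolver's sign is arbitrary: flip so that ECI correlates
# positively with diversity, which is the convention the question describes
if np.corrcoef(eci, diversity)[0, 1] < 0:
    eci = -eci
```

Either sign of the eigenvector is mathematically valid; the flip is purely a convention that anchors the scale so that more diversified economies score higher.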

RPOP - Handling zeros in diversity / ubiquity

Diversity / ubiquity are often 0 when calculating with RPOP. These cases need better handling, since ECI / PCI currently show up as NaN for all countries (the eigenvectors are returned as NaNs).
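One possible guard, sketched here as a hypothetical helper (not part of the package): drop all-zero rows and columns before the eigendecomposition, since a zero diversity or ubiquity turns the normalized matrix into NaN/inf and poisons every eigenvector:

```python
import numpy as np

def drop_degenerate(M):
    """Drop countries with zero diversity and products with zero
    ubiquity, returning the reduced matrix plus the kept-index masks
    so results can be mapped back (NaN for the dropped entries)."""
    row_mask = M.sum(axis=1) > 0
    col_mask = M.sum(axis=0) > 0
    return M[np.ix_(row_mask, col_mask)], row_mask, col_mask
```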

Why are the eci computed with ecomplexity different from their given values?

Thank you for contributing this brilliant Python package. I am trying to compute the ECI (eci_ecomplexity_cal) using the country_sitcproduct2digit_year.csv data, which also contains published ECI values (eci_hidalgo_rep). I found that the two do not line up exactly.

Furthermore, the published ECI values (eci_hidalgo_rep) correlate better with GDP per capita than the ECI computed with this Python package (eci_ecomplexity_cal).

Need coi, cog variables

The following output variables are not created by the package. Could you please update the code to output these as well?

  • density
  • coi
  • cog
  • proximity

RCA calculation

Hello. Thanks for your great work.

I am confused by how you calculate the RCA:

    num = data_np / np.nansum(data_np, axis=1)[:, np.newaxis]

    loc_total = np.nansum(data_np, axis=0)[np.newaxis, :]
    world_total = np.nansum(loc_total, axis=1)[:, np.newaxis]
    den = loc_total / world_total
    self.rca_t = num / den

Should it not be something like (product p value / location c total) / (product p total / world total)?
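For what it's worth, the quoted snippet appears to compute exactly that Balassa ratio: `num` is each product's share of the country's total, and `den` is the product's share of the world total. A small sketch with hypothetical numbers, checking that the two formulations agree:

```python
import numpy as np

# Toy export matrix X (rows: countries c, cols: products p), made-up values
X = np.array([[10.0, 0.0, 5.0],
              [2.0, 8.0, 0.0]])

# Standard Balassa RCA: (X_cp / X_c) / (X_p / X_world)
country_total = X.sum(axis=1, keepdims=True)   # X_c
product_total = X.sum(axis=0, keepdims=True)   # X_p
world_total = X.sum()                          # X_world
rca = (X / country_total) / (product_total / world_total)

# The num/den snippet from the package computes the same quantity
num = X / np.nansum(X, axis=1)[:, np.newaxis]
loc_total = np.nansum(X, axis=0)[np.newaxis, :]
world = np.nansum(loc_total, axis=1)[:, np.newaxis]
den = loc_total / world
assert np.allclose(rca, num / den)
```

So the two expressions are algebraically identical; the snippet just builds the ratio from shares rather than from the four totals directly.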

Economic density over 1

I managed to produce economic density values greater than 1.
How I managed to do it:
I am currently doing an economic complexity analysis for one of the regions of the Russian Federation.

  • I obtained export and import volumes for the particular region.
  • Subtracted the region's export values from Russian exports, and the region's imports from Russian imports.
  • Introduced the region as a separate country.
  • Merged the rest of the world into a single "world" label.

Running ecomplexity produced density values greater than 1 for some goods (I use HS07 4-digit codes), which seems impossible: the formula itself should not allow values greater than 1.

Data itself is here: https://www.dropbox.com/s/9h6kdny04w0kkt2/world_data_kgd.csv?dl=0
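As a reference point, here is a minimal sketch of the usual density definition (an assumption about what the package computes, based on the standard economic-complexity literature, not its actual code). With a binary RCA incidence matrix M, density is a proximity-weighted average of 0/1 entries and therefore cannot exceed 1; values above 1 usually point to a non-binary M or a normalization mismatch in the input:

```python
import numpy as np

# Toy RCA >= 1 incidence matrix (countries x products), made-up values
M = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)

# Proximity phi_pp' = min(P(p | p'), P(p' | p)) = cooccurrences / max ubiquity
ubiquity = M.sum(axis=0)
phi = (M.T @ M) / np.maximum.outer(ubiquity, ubiquity)

# Density: the share of product p's proximity-weighted neighborhood that
# country c already occupies; bounded above by 1 when M is binary
density = (M @ phi) / phi.sum(axis=0)
```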

Help with the subnational data

Good day, and huge thanks for your work! We are a group of data scientists trying to calculate ECI, PCI and proximity between products for a specific country, but we are getting strange results, especially for the proximity. We therefore have several questions:

  • Is any tuning needed for the calculation of proximity at the subnational level?
  • Could you share the data that you used for your analysis?
  • Could you elaborate on the calculation for subnational data?
  • Is there anyone we can contact for a Q&A session?

Thank you in advance, your help will be very much appreciated!

Fail to install

I tried to install using pip install ecomplexity and encountered the error below. I would appreciate any help. Thank you.

error: subprocess-exited-with-error

python setup.py egg_info did not run successfully.
exit code: 1

[10 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 13, in
long_description=readme(),
^^^^^^^^
File "C:\Users\user\AppData\Local\Temp\pip-install-nsk2tobg\ecomplexity_855f1ab254084b4ca68c464addf11bdc\setup.py", line 6, in readme
return f.read()
^^^^^^^^
UnicodeDecodeError: 'cp932' codec can't decode byte 0x93 in position 3468: illegal multibyte sequence
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
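The UnicodeDecodeError above suggests that setup.py's readme() opens the README without an explicit encoding, so Python falls back to the locale codec (cp932 on a Japanese Windows machine). A sketch of the usual package-side fix, assuming a readme() helper shaped like the one in the traceback:

```python
# setup.py (sketch): read the long description with an explicit encoding,
# so installation does not depend on the user's locale codec (e.g. cp932).
def readme():
    with open("README.md", encoding="utf-8") as f:
        return f.read()
```

Until the package is fixed, setting the environment variable PYTHONUTF8=1 (Python's UTF-8 mode) before running pip can work around the problem on the user's side.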

Display helpful error messages re RPOP

Incorrect usage of RPOP does not raise helpful error messages. For example, missing population values lead to lots of NAs but no informative error message.

  • Test RPOP comprehensively with edge cases
  • Display specific error messages

Possible mistake in the code of ecomplexity.py

Hi,

I found a possible mistake at line 209 of ecomplexity.py. According to the book The Atlas of Economic Complexity, both ECI and PCI should be z-score normalized. While the code normalizes ECI correctly, resulting in mean 0 and std 1, it does not do the same for PCI.

At line 209, PCI is normalized using the mean and standard deviation of the ECI. Might this be a mistake? Though it does not affect the rankings.

Thank you!

I got confusing ranking results from the sample code

Thank you for your effort to make this brilliant algorithm in Python!

However, I have a few questions about the ECI and PCI algorithms.

I got confusing results with the sample code.

  • First, if I run the exact code without any data filtering, small economies like MSR, WLF and BVT rank high, while developed countries/regions like USA, HKG and SWE rank much lower, which seems incorrect.
  • Second, I tried using only the top 100 countries by total export value over 1995-2016, but the results were still strange: Panama, Vietnam, etc. ranked high, and the ECI for the USA was still very low.

I am not sure whether I am using your code correctly or whether it needs more sophisticated data preprocessing. Could you provide a sample that generates more reasonable results, e.g. with countries like USA, JPN and CHN ranking high, as shown at https://atlas.cid.harvard.edu/rankings?

Thank you!

Install problem and some question

Thank you for sharing. When installing the package, I found that lines 22 and 23 of setup.py were missing commas, and that Readme.txt has a GBK encoding problem; after I deleted some of the abnormal characters, the installation succeeded.
During the final standardization I also found that my reproduced results were inconsistent with the published ones, because the degrees of freedom of the standard deviation used in ECI standardization differ. I think it is better to use n-1, not n.
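The degrees-of-freedom point comes down to the `ddof` argument: NumPy's `std` defaults to the population formula (divide by n), while pandas and Stata default to the sample formula (divide by n-1). A quick illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

pop_std = np.std(x)             # ddof=0: sqrt(sum((x - mean)^2) / n)
sample_std = np.std(x, ddof=1)  # ddof=1: sqrt(sum((x - mean)^2) / (n - 1))
```

Which convention the normalization uses changes every standardized value by a constant factor, so results only match a published reference if the same `ddof` is chosen.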

PCI normalization using wrong mean and standard deviation.

Good day, and thank you a lot for this project.

I came across a problem: the mean and std of the PCI are not 0 and 1. I looked at the source code and realized that in the function ecomplexity (file ecomplexity.py), at line 209,

# Normalize variables as per STATA package
cdata.pci_t = (cdata.pci_t - cdata.eci_t.mean()) / cdata.eci_t.std()
cdata.cog_t = cdata.cog_t / cdata.eci_t.std()
cdata.eci_t = (cdata.eci_t - cdata.eci_t.mean()) / cdata.eci_t.std()

the PCI is normalized by the ECI's mean and std. Is this a mistake, or is it done purposefully? Right now my PCI ranges over [-10, 6], which seems very unusual. Thanks!
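For comparison, a minimal sketch of the two normalizations on hypothetical numbers (not the package's code). Both are increasing affine maps, so rankings agree either way, but only the self z-score gives the PCI mean 0 and std 1:

```python
import numpy as np

# Hypothetical raw scores; in practice pci has far more entries than eci
eci = np.array([-1.0, 0.0, 2.0])
pci = np.array([-3.0, -1.0, 0.0, 1.0, 4.0])

# As in the quoted line 209: normalize PCI by the ECI's moments
pci_stata = (pci - eci.mean()) / eci.std()

# Self z-score, if PCI itself should have mean 0 and std 1
pci_z = (pci - pci.mean()) / pci.std()
```

Normalizing PCI by the ECI's moments keeps the two indices on a shared scale (reportedly matching the STATA package), which is why the PCI range can look unusual while the rankings are unaffected.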

ecomplexity output does not conform with atlas dataverse results using R reticulate

Apologies in advance for the not-so-reproducible example; I couldn't find a way around the name/email requirements of the dataverse. I am using reticulate in R to run ecomplexity.

Data published on the Harvard Economic Complexity Dataverse has pre-calculated complexity indicators. The country_hsproduct4digit_year data from https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/T4CHWJ/4RG21Y&version=3.0 has columns: location_id, product_id, year, export_value, import_value, export_rca, product_status, cog, distance, normalized_distance, normalized_cog, normalized_pci, export_rpop, is_new, hs_eci, hs_coi, pci, location_code, hs_product_code.

Using only the location_code, hs_product_code, export_value and year columns from that data as input to ecomplexity yields different values for all of the calculated indicators. For example, the atlas data gives the hs_eci for ABW in 1995 as -0.468138129, while recalculating complexity indicators from the same atlas data yields an eci for ABW in 1995 of -0.1471911.

Is the data published on the Harvard Dataverse created using a different method?
