AINET-GNN-Trade-2021

Artificial Intelligence Network Explanation of Trade (AINET)

Data Preparation

Due to license restrictions, the raw data cannot be shared directly. Instead, we provide scripts to download and process the data from the UN Comtrade website. The node data comprises 53 countries across 70 monthly time periods from October 2014 to July 2020. The countries and time period were selected based on cross-sectional data availability for soybeans. Belarus was excluded from the dataset because it lacked trading partner information; model performance improved after its removal.

Files

  • data_prep/uncomtrade_data_pull.py
  • data_prep/uncomtrade_data_process.py

Data Statistics

Node data summary of 5,568 observations:

| Statistic | Trade Value | Net Weight | TUV (lead 1 month) | Edge Connections |
|---|---|---|---|---|
| Min | 4.6e1 | 5.0e1 | 0.00 | 1 |
| Mean | 6.9e7 | 1.6e8 | 0.62 | 8.2 |
| Median | 4.2e6 | 9.4e6 | 0.50 | 5 |
| Max | 4.4e9 | 1.2e10 | 20.51 | 36 |
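
The "TUV (lead 1 month)" column is the prediction target: each node's trade unit value shifted one month ahead. The sketch below shows one way such a target could be built. It assumes TUV is trade value divided by net weight, which this README does not state explicitly. The record layout and the function name `add_tuv_lead` are hypothetical and are not taken from `uncomtrade_data_process.py`.

```python
from collections import defaultdict

def add_tuv_lead(records):
    """Attach TUV and next-month TUV to (country, period) records.

    ASSUMPTION: TUV = trade_value / net_weight. Periods are integer month
    indices; a row gets a lead only if the next month exists for that country.
    """
    by_country = defaultdict(dict)
    for r in records:
        by_country[r["country"]][r["period"]] = r["trade_value"] / r["net_weight"]

    out = []
    for r in records:
        series = by_country[r["country"]]
        row = dict(r)
        row["tuv"] = series[r["period"]]
        # None when the following month is missing for this country.
        row["tuv_lead1"] = series.get(r["period"] + 1)
        out.append(row)
    return out

# Toy records (made up for illustration, not UN Comtrade values).
rows = add_tuv_lead([
    {"country": "A", "period": 0, "trade_value": 50.0, "net_weight": 100.0},
    {"country": "A", "period": 1, "trade_value": 90.0, "net_weight": 150.0},
    {"country": "B", "period": 0, "trade_value": 30.0, "net_weight": 40.0},
])
```

Rows whose lead is `None` have no target and would typically be dropped before modeling.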

Edge data summary of 40,162 observations:

| Statistic | Trade Value | Net Weight |
|---|---|---|
| Min | 1.0e0 | 5.0e1 |
| Mean | 8.4e6 | 1.9e7 |
| Median | 3.2e4 | 3.7e4 |
| Max | 4.0e9 | 1.2e10 |

Alternative Modeling

ARIMA and OLS models are built from the UN Comtrade data. An individual ARIMA(0,1,1) model is run for each country, forecasting soybean trade unit value (TUV) 1, 6, and 24 months ahead. A preliminary parametric OLS model is built using all node and edge features. RMSE is used to measure performance.

ARIMA(0,1,1) was selected to standardize the model across countries. We began by creating customized models for each country, optimizing predictive performance for each time series. Although this improved performance for soybean TUV, it made the results difficult to interpret. ARIMA(0,1,1) was the most common specification across the country series and was therefore selected for all countries, giving a single model for the entire dataset. The main difficulty in preparing ARIMA for a fair comparison with the GNN was how to incorporate the same variables of interest. Averaged across countries, the ARIMA(0,1,1) models ultimately had better predictive performance than the OLS and GNN models. Performance for both ARIMA and OLS was measured by RMSE on the train and test datasets.
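
To make the ARIMA(0,1,1) specification concrete, here is a minimal pure-Python sketch of its forecast recursion. It is not the repository's R code in `alt_ols_arima.R`. The function names, the no-drift assumption, and the grid-search fit are illustrative assumptions; a statistical package would normally estimate the coefficient by maximum likelihood.

```python
def arima011_forecast(y, theta, horizon):
    """Forecast an ARIMA(0,1,1) model without drift.

    Model: y[t] - y[t-1] = e[t] + theta * e[t-1].
    The one-step forecast is y[t] + theta * e[t]; later steps repeat it,
    because future shocks have zero expectation.
    """
    pred = y[0]  # initial forecast of y[1], treating e[0] as 0
    for obs in y[1:]:
        err = obs - pred           # one-step-ahead error e[t]
        pred = obs + theta * err   # forecast of the next observation
    return [pred] * horizon

def fit_theta(y, grid=None):
    """Pick theta by minimizing the sum of squared one-step errors."""
    grid = grid or [i / 100 for i in range(-95, 96, 5)]

    def sse(theta):
        pred, total = y[0], 0.0
        for obs in y[1:]:
            err = obs - pred
            total += err * err
            pred = obs + theta * err
        return total

    return min(grid, key=sse)

# Toy series (made up for illustration, not actual TUV data).
series = [0.50, 0.52, 0.49, 0.55, 0.53]
theta = fit_theta(series)
forecasts = arima011_forecast(series, theta, 6)
```

Because the multi-step forecast of this model is flat, errors at longer horizons depend entirely on how far the series drifts from the last fitted level.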

The OLS model is a fixed effects regression controlling for time, country, and trading partner information. It also includes interaction terms between trade value and trade weight to account for their relationship with the dependent variable, TUV. The final parametric equation was selected after several robustness checks. The core equation modeled the relationship between TUV and trade. To improve the model, trade partner information was added to account for top trading partners and their influence on TUV. This final OLS model, which includes time, country, and trading partner information, had the best performance and also better predictive performance than the GNN framework on both the train and test datasets.
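
As a simplified illustration of what a fixed effects regression does, the sketch below estimates a single slope with one-way (country) fixed effects using the within transformation. It is far smaller than the model described above, which also controls for time and trading partners and includes interaction terms. The function name and the toy data are hypothetical.

```python
from collections import defaultdict

def within_ols_slope(groups, x, y):
    """Slope of y on x with one-way fixed effects (within estimator).

    Demeaning x and y inside each group removes the group intercepts, so the
    slope is estimated only from variation within groups.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # sum of x, sum of y, count
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = sums[g]
        dx, dy = xi - sx / n, yi - sy / n
        num += dx * dy
        den += dx * dx
    return num / den

# Two countries with different intercepts (2 and 5) but the same slope (0.5).
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.5, 3.0, 3.5, 5.5, 6.0, 6.5]
slope = within_ols_slope(groups, x, y)
```

The estimator recovers the common slope even though the two countries sit at different levels, which is the purpose of the country effects.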

An important note: the OLS and ARIMA test environments are not the same. OLS takes data at time t and predicts t+1, updating its model parameters as it uses the test dataset. ARIMA, by contrast, does not change its parameters; it uses data at time t and forecasts n periods ahead, to t+n. This distinction helps contextualize the OLS and ARIMA RMSE results relative to the GNN.
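
The difference between the two test environments can be sketched as two evaluation loops. The code below is an illustration under assumptions: the function names are hypothetical, and both placeholder forecasters simply repeat the last observed value, so neither is the OLS or ARIMA model itself.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rolling_one_step(series, n_train, predict_next):
    """OLS-style protocol: at each test point, use all data up to t and
    predict t+1 only."""
    preds = [predict_next(series[:t]) for t in range(n_train, len(series))]
    return rmse(series[n_train:], preds)

def fixed_origin(series, n_train, horizon, predict_path):
    """ARIMA-style protocol: use the training window once, then forecast
    t+1 ... t+n without seeing any test data."""
    preds = predict_path(series[:n_train], horizon)
    return rmse(series[n_train:n_train + horizon], preds)

def last_value(history):
    """Placeholder one-step forecaster (hypothetical)."""
    return history[-1]

def flat_path(history, horizon):
    """Placeholder multi-step forecaster (hypothetical)."""
    return [history[-1]] * horizon

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rolling = rolling_one_step(series, 3, last_value)
fixed = fixed_origin(series, 3, 3, flat_path)
```

On this trending toy series the same naive forecaster scores an RMSE of 1.0 under the rolling protocol and about 2.16 under the fixed-origin protocol, so RMSE values from the two protocols should not be read as a like-for-like comparison.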

The RMSE performance for ARIMA and OLS

| Train | OLS | ARIMA | Test | OLS | ARIMA |
|---|---|---|---|---|---|
| 6 | 0.186964 | 0.19826 | 6 | 0.225325 | 0.149483 |
| 12 | 0.181466 | 0.202305 | 12 | 0.240856 | 0.250507 |
| 24 | 0.160303 | 0.23752 | 24 | 0.259472 | 0.443633 |

Files

  • alt_modeling/alt_ols_arima.R

Graphing

The paper's graphics are produced using ggplot2 and ggraph. The training data from gnn_modeling produces the model epoch graph, the prediction data from gnn_modeling produces the prediction graph, and the processed data from the data_prep scripts produces the network graphs.

Files

  • graphing/model_epoch.Rmd
  • graphing/prediction.Rmd
  • graphing/networks.Rmd

GNN Modeling

Both the stateless graph convolutional long short-term memory model (S-GC-LSTM) and the graph convolutional long short-term memory model (GC-LSTM) are provided, with training and prediction, in Jupyter notebooks. The S-GC-LSTM is the selected model, and the metrics based on it are provided in the CSV files for prediction and training performance. Following standard practice, the learning rate, weight decay, filter size (K), dropout rate, and activation function are tuned using grid search to identify the best-performing model.
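
The grid search over these five hyperparameters can be sketched as follows. The candidate values, the function names, and the `dummy_loss` stand-in are all hypothetical: the README does not list the values that were actually searched, and a real run would train the model for each configuration and return a validation metric.

```python
from itertools import product

# Hypothetical search space; the values actually searched are not given here.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3],
    "weight_decay": [0.0, 1e-4],
    "K": [1, 2, 3],
    "dropout": [0.0, 0.5],
    "activation": ["relu", "tanh"],
}

def grid_search(space, evaluate):
    """Evaluate every combination and return (best_config, best_score).

    `evaluate` maps a config dict to a validation loss (lower is better).
    """
    names = list(space)
    best_config, best_score = None, float("inf")
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

def dummy_loss(config):
    """Stand-in for training the model and returning a validation RMSE."""
    return abs(config["learning_rate"] - 1e-3) + config["dropout"] + abs(config["K"] - 2)

best, score = grid_search(SEARCH_SPACE, dummy_loss)
```

With this space the loop evaluates 48 configurations; the cost grows multiplicatively with each added value, which is the main practical limit of exhaustive grid search.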

Files

  • gnn_modeling/temporal-gc-lstm.ipynb
  • gnn_modeling/temporal-s-gc-lstm.ipynb
  • gnn_modeling/model_prediction.csv
  • gnn_modeling/model_train_performance.csv

Contributors

  • florahaberkorn
  • andersonmonken
  • andersonmonken-frb
