Code for 5th place in ASAP-SAS competition
License GPLv3: http://www.gnu.org/copyleft/gpl.html Authors: Jonathan Peters, Paweł Jankiewicz
- Installation.
- Python 2.6.6+ Libraries needed: numpy, nltk*, pyaspell
- R 2.14.1+ Libraries needed (as in config.r file)
- RStudio 0.95.262+
-
for nltk there is a necessity to download some external data: nltk.corpus.stopwords nltk.corpus.brown nltk.corpus.wordnet
It is done very quickly using a command nltk.download() and selecting those 3 libraries
-
Place the private leaderboard data set as private_leaderboard.tsv file in data/raw/ folder The file that resides there is only a dummy file that helped us create a process.
-
Change the header to setwd(YOUR_CURRENT_PATH) in files:
- 1c_prepare_features.r
- 2_prepare_data.r
- 4_make_models.r
- 5_blend.r
-
Run the files:
- 1a_prepare_essays.py (python)
- 1b_prepare_ngrams.py (python)
- 1c_prepare_features.r (Rstudio)
- 1d_prepare_chunks.py (python)
- 1d_prepare_essay_specific_features.py (python)
- 1g_prepare_features.py (python)
- 2_prepare_data.r (Rstudio)
- 3_make_models.r (Rstudio)
- 4_blend.r (Rstudio)
It will take 1-2 weeks to calculate all the models depending on how many cores you have.
- The submission should be placed in submissions/private.csv