Lib4MOOCData

Library for processing MOOC data dumps. Currently limited to Coursera data.

Published Findings

Papers published using this code on our MOOC corpus are available in this repository for download here: https://github.com/WING-NUS/lib4moocdata/tree/master/coursera/docs

If you use this code for your own research, we request that you let us know by email or GitHub issues, and that you cite us.

  • Chandrasekaran, M. K., Epp, C. D., Kan, M.-Y., Litman, D. 2017. “Using Discourse Signals for Robust Instructor Intervention Prediction”. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, USA. pp. 3415-3421. AAAI. https://ojs.aaai.org/index.php/AAAI/article/view/11015

  • Chandrasekaran, M. K., Kan, M.-Y., Ragupathi, K., Tan, B. C. Y. 2015. “Learning instructor intervention from MOOC forums: Early Results and Issues”. In Proceedings of the 8th International Conference on Educational Data Mining, Madrid, Spain. pp. 218-225. International Educational Data Mining Society. https://www.educationaldatamining.org/EDM2015/proceedings/full218-225.pdf

Coursera data export

To use this library, you need to procure data dumps of the MOOCs you own from Coursera. Coursera exports the data from each MOOC after its completion, for use by the university hosting it on the platform. These data dumps are .sql exports from MySQL databases. A typical data export consists of the following .sql files:
  1. <Full_Coursename>(<coursecode>)_SQL_anonymized_forum.sql
  2. <Full_Coursename>(<coursecode>)_SQL_hash_mapping.sql
  3. <Full_Coursename>(<coursecode>)_SQL_anonymized_general.sql
  4. <Full_Coursename>(<coursecode>)_SQL_unanonymizable.sql

A .txt file with clickstream data is also provided. This library does not yet process it:
  5. <coursecode>_clickstream_export.gz

For replicating our published results (in our papers), it is sufficient to import files (1), (2) and (3).

How to run this code?

Step-by-step instructions for running the experiments that replicate our EDM 2015 and AAAI 2017 papers are accessible here.

Prerequisites

To use the library to process and analyse your data, you will first need to install the MySQL database and ingest the .sql files into it.
Command to ingest a .sql file using the MySQL command line interface (CLI): mysql> source <path to .sql file>/<name of the .sql file>
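
For scripted imports, the same ingestion can also be done non-interactively by redirecting each file into the mysql client. The following is a minimal sketch, not part of the library: the function name print_import_commands and the database name are placeholders, connection options such as -u and -p are left out, and it assumes the files follow the naming pattern listed under "Coursera data export". It only prints the commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print the mysql commands that would import files (1)-(3) of one
# Coursera export into one database. Nothing is executed here.
# The function name and database naming are illustrative assumptions.

# Usage: print_import_commands <database> <Full_Coursename> <coursecode> [dir]
print_import_commands() {
    db=$1        # target database; assumed to be a plain identifier
    course=$2    # the <Full_Coursename> part of the file names
    code=$3      # the <coursecode> part of the file names
    dir=${4:-.}  # directory holding the .sql files (default: current dir)

    # Create the target database if it does not exist yet.
    printf "mysql -e 'CREATE DATABASE IF NOT EXISTS %s'\n" "$db"

    # Files (1), (2) and (3) are the ones needed to replicate the papers.
    for suffix in anonymized_forum hash_mapping anonymized_general; do
        # The file names contain parentheses, so they are single-quoted.
        printf "mysql %s < '%s/%s(%s)_SQL_%s.sql'\n" \
            "$db" "$dir" "$course" "$code" "$suffix"
    done
}
```

Calling print_import_commands with a database name, the course name, and the coursecode prints one CREATE DATABASE command followed by three import commands. Once they look right, the output can be piped to sh.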

Note that Coursera supplies a separate SQL export for every course, so the DDL statements in the files from different courses are redundant. More importantly, none of the tables has a field for the coursecode. You therefore have to either (i) create a separate MySQL database for each course dump (one per course iteration), or (ii) add a 'coursecode' field to every table and issue UPDATE statements to populate it after running each *.sql import.
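
Option (ii) can be scripted by generating the statements for each table. This is a minimal sketch under stated assumptions, not the library's own tooling: the function name print_coursecode_sql and the VARCHAR(64) column type are placeholders, and the table names must come from your own import (for example from SHOW TABLES). The statements are only printed, so they can be checked before being piped to mysql.

```shell
#!/bin/sh
# Sketch: print the SQL that adds and fills a 'coursecode' column.
# The function name and the column type are illustrative assumptions.

# Usage: print_coursecode_sql <coursecode> <table> [<table> ...]
print_coursecode_sql() {
    code=$1   # assumed to contain no quote characters
    shift
    for table in "$@"; do
        # Add the new column to the table.
        printf 'ALTER TABLE `%s` ADD COLUMN coursecode VARCHAR(64);\n' "$table"
        # Tag every row that was just imported with this course's code.
        printf "UPDATE \`%s\` SET coursecode = '%s';\n" "$table" "$code"
    done
}
```

The function prints one ALTER TABLE and one UPDATE statement per table name passed to it. Because the UPDATE has no WHERE clause, it should be run right after importing a single course, before rows from another course are added to the same table.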

Installation

The scripts require Perl 5 and some dependent Perl packages.

For Windows users
Install Strawberry Perl from http://strawberryperl.com/ or ActivePerl from http://www.activestate.com/activeperl

For Linux, Mac users
Linux and Mac users should already have Perl installed as part of the OS. You can check this with the command perl -v in your terminal.

Dependent Perl Modules (Packages) to install

CPAN has tools to easily install Perl modules. Please see this step-by-step tutorial: http://www.cpan.org/modules/INSTALL.html
The packages to install are:
  • DBI
  • FindBin
  • Getopt::Long
  • Encode
  • HTML::Entities
  • Lingua::EN::Sentence
  • Lingua::EN::Tokenizer::Offsets
  • Lingua::StopWords
  • Lingua::EN::StopWordList
  • Lingua::Stem::Snowball
  • Lingua::EN::Ngram
  • Lingua::EN::Bigram (fails on Linux CentOS 6)
  • Lingua::EN::Tagger
  • Lingua::EN::PluralToSingular
  • Config::Simple
  • File::Remove
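
Before installing, it can help to see which of these modules the local Perl cannot load yet. The following is a small sketch rather than an official installer: the variable MODULES and the function missing_modules are names made up for this example, and the check relies only on perl's -M switch, which exits with an error when a module cannot be loaded.

```shell
#!/bin/sh
# Sketch: list the required modules that the local perl cannot load.
# MODULES and missing_modules are illustrative names, not part of the library.

# Module names copied from the list above, one per line.
MODULES="DBI
FindBin
Getopt::Long
Encode
HTML::Entities
Lingua::EN::Sentence
Lingua::EN::Tokenizer::Offsets
Lingua::StopWords
Lingua::EN::StopWordList
Lingua::Stem::Snowball
Lingua::EN::Ngram
Lingua::EN::Bigram
Lingua::EN::Tagger
Lingua::EN::PluralToSingular
Config::Simple
File::Remove"

# Print the name of every listed module that fails to load.
missing_modules() {
    for m in $MODULES; do
        perl -M"$m" -e 1 2>/dev/null || printf '%s\n' "$m"
    done
}

# Possible usage (commented out so that sourcing this file installs nothing):
#   missing=$(missing_modules)
#   [ -n "$missing" ] && cpan $missing
```

The guard on an empty list matters because running cpan without arguments opens its interactive shell instead of installing anything.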

Contributors

cmkumar87, knmnyn

lib4moocdata's Issues

schema difference between courses

The user table has anon_user_id instead of session_user_id in some of the courses. populate_sqlite_db.pl should handle this case.

Config file for instructor roles

New instructor roles that are not hardcoded in the codebase break the code.
It should be possible to add new instructor roles to a config file. This avoids changes to the codebase.
