monvoscrap

A scraping tool for Russian government higher education (HE) data. Very much a work in progress, but usable.

The data comes from the Monitoring of Higher Education Organizations, run by the Russian Ministry of Science and Higher Education. The 2013 and 2014 editions of the Monitoring use a different layout for their data pages, so support for those years is still in progress.

This repo includes an SQLite database, db.sqlite, with Monitoring data from 2015 and 2022 (each Monitoring edition contains the previous year's data). Feel free to use it for your research, or download the data yourself with download_2015_plus.py.

The script fills the SQLite database db.sqlite with the following tables:

  • indicators contains all indicators found in the Monitoring (primary key iid);
  • federal_districts contains federal districts (primary key fdid);
  • universities contains universities' data: primary key uid, name, address, ministry (if it's a government university), owner, fdid;
  • ugn contains УГН (Укрупненные группы направлений подготовки, "enlarged groups of training fields"), a list of major specialization groupings used by the Ministry to keep track of students' majors;
  • uni_ugn contains yearly data on each university's ugn composition (ugnid, uid, year and people);
  • data contains Monitoring data (uid, iid, year and value).
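
As a rough illustration of how these tables might be queried, here is a minimal Python sketch that pulls the yearly values of one indicator for one university from the data table. The column names uid, iid, year and value come from the list above. The column types, the indicators.name column, and the helper names indicator_series and build_demo_db are assumptions made for this example, so the real db.sqlite may differ.

```python
import sqlite3

# Assumed schema: column names follow the table list above; the column
# types and the indicators.name column are guesses, not taken from db.sqlite.
SCHEMA = """
CREATE TABLE indicators (iid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE universities (
    uid INTEGER PRIMARY KEY, name TEXT, address TEXT,
    ministry TEXT, owner TEXT, fdid INTEGER
);
CREATE TABLE data (uid INTEGER, iid INTEGER, year INTEGER, value REAL);
"""


def indicator_series(conn, uid, iid):
    """Return [(year, value), ...] for one university and one indicator,
    ordered by year."""
    cursor = conn.execute(
        "SELECT year, value FROM data WHERE uid = ? AND iid = ? ORDER BY year",
        (uid, iid),
    )
    return cursor.fetchall()


def build_demo_db():
    """Build a tiny in-memory database with made-up rows (hypothetical data,
    used only to exercise the query)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO indicators VALUES (1, 'Demo indicator')")
    conn.execute(
        "INSERT INTO universities VALUES (10, 'Demo University', NULL, NULL, NULL, NULL)"
    )
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?, ?)",
        [(10, 1, 2016, 2.5), (10, 1, 2015, 1.5), (10, 2, 2015, 9.0)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for year, value in indicator_series(demo, uid=10, iid=1):
        print(year, value)
```

To try this against the real data, replace build_demo_db() with sqlite3.connect("db.sqlite") and look up actual uid and iid values in the universities and indicators tables first.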

Contributors

ilya101010