canigraduate.uchicago.edu's Introduction

o shit waddup

canigraduate.uchicago.edu's People

Contributors

alvinturtle, kelly-shen, kevmo314, maths22

canigraduate.uchicago.edu's Issues

Publish Hungarian algorithm implementation on npm

A custom implementation of the Hungarian algorithm is used because the current one published on npm (munkres-js) is too slow.

We should publish the faster one. Before that, the code could be cleaned up and documented a little better, and there are probably a few small performance improvements still to be had.
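For context on what such a package computes: the Hungarian algorithm solves the assignment problem, choosing one column per row of a cost matrix so that the total cost is minimal. The sketch below is a compact Python rendering of the well-known potentials formulation. It is not the project's JavaScript implementation, and the name `solve_assignment` is made up for illustration; something like it could serve as a reference when testing a published package.

```python
from typing import List, Sequence, Tuple


def solve_assignment(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return (assignment, total), where assignment[row] is the chosen column.

    Requires a rectangular matrix with at least as many columns as rows.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n > m or any(len(row) != m for row in cost):
        raise ValueError("cost must be rectangular with rows <= columns")
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-indexed)
    v = [0.0] * (m + 1)    # column potentials (1-indexed)
    match = [0] * (m + 1)  # match[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)    # way[j] = previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i       # column 0 is a virtual column holding the new row
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if used[j]:
                    continue
                cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                if cur < minv[j]:
                    minv[j], way[j] = cur, j0
                if minv[j] < delta:
                    delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    total = sum(cost[r][c] for r, c in enumerate(assignment))
    return assignment, total
```

For example, `solve_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` assigns row 0 to column 1, row 1 to column 0, and row 2 to column 2.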

Add course name selector

Course names still need a lot of cleanup. If you run the scraper, it prints a message whenever course names conflict: https://github.com/kevmo314/canigraduate.uchicago.edu/blob/master/backend/uchicago/scraper.py#L60
This is because we store a unified course name whereas UChicago stores name by term.

We would like a system to identify the canonical course name. The previous version did this with a hardcoded map: https://github.com/kevmo314/canigraduate.uchicago.edu-old/blob/master/scripts/ClassResolver.py#L5

A hardcoded map would be fine here too, but something more clever may be worth exploring.

As a side note, this ends up being a real problem: a couple of UChicago classes share an identifier but have different names, which leads me to believe that they're different classes. canigraduate treats classes as the same whenever the ID is the same. That's okay, because it's a fairly extreme edge case and fixing it would cause more problems than it would solve.
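One way the canonical name could be chosen, sketched in Python: hardcoded overrides win, otherwise the most frequently used name wins, with ties going to the most recently used one. This heuristic is only an assumption about what "more clever" might mean, and `canonical_name`, `overrides`, and the `(term_index, name)` input shape are all hypothetical rather than taken from the scraper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def canonical_name(
    course_id: str,
    observations: Iterable[Tuple[int, str]],
    overrides: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Pick one name for a course from per-term (term_index, name) pairs.

    A larger term_index is assumed to mean a more recent term.
    """
    if overrides and course_id in overrides:
        return overrides[course_id]  # hardcoded map, as in the old version
    counts: Counter = Counter()
    latest: Dict[str, int] = {}
    for term_index, name in observations:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        if not cleaned:
            continue
        counts[cleaned] += 1
        latest[cleaned] = max(latest.get(cleaned, term_index), term_index)
    if not counts:
        return None
    # Most frequent name wins; ties go to the most recently used name.
    return max(counts, key=lambda n: (counts[n], latest[n]))
```

Names that differ only in whitespace are counted together, which already removes some of the conflicts without any manual mapping.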

Improve interval tree implementation

The implementation of an interval tree is mostly stolen from this implementation.

It's not the best, however. The use of string constants is somewhat egregious, and a lot of things could be simplified by switching to ES6. It's currently 2.1 kB after uglification; I have a hunch it can be brought down to under 1 kB.

Migrate search to web worker

Searching currently takes around 50-100 ms per query, which is long enough to noticeably block the UI thread. This can be mitigated by moving the query evaluation to a web worker.

Fix chip input behavior

Right now the chip input is sort of terrible. It should be its own component and better implemented. The previous canigraduate had a rather nice directive for this, but since the new version uses angular/material2, the typeahead component isn't as readily available, so this will need some clever event handling.

Sidebar responsive behavior not consistent

On a sufficiently wide screen, the sidebar is permanent. Shrinking the page turns the sidebar into a drawer, but widening the page again doesn't restore the permanent sidebar.

Add higher frequency enrollment poller for watched courses

The scraper is currently intended to be run a couple of times a week. This means enrollment numbers for most classes can lag by up to a week, which is too long for watches. There should be a cron job that pulls only the classes that have watches attached and updates their enrollments, so that it can run every few minutes.

Note that courses from terms that have already occurred will not have enrollment numbers that update, so they do not need to be polled more frequently even if they have watches attached. Additionally, there may be clever optimizations that reduce the number of queries by grouping courses, e.g. those in the same department. Fewer queries means the courses can be polled more frequently, which results in lower latency on updates.
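The filtering and grouping described above could look roughly like the following Python sketch. It assumes course IDs of the form `"DEPT NUMBER"` (such as `"MATH 15100"`) and that one query can cover a whole department in a term; `group_watched_courses` and its inputs are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def group_watched_courses(
    watches: Iterable[Tuple[str, str]],
    current_terms: Set[str],
) -> Dict[Tuple[str, str], List[str]]:
    """Group watched (term, course_id) pairs into one bucket per query.

    Watches on terms outside current_terms are dropped, since their
    enrollment numbers no longer change.
    """
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for term, course_id in watches:
        if term not in current_terms:
            continue
        department = course_id.split()[0]  # assumes "DEPT NUMBER" IDs
        groups[(term, department)].add(course_id)
    return {key: sorted(ids) for key, ids in groups.items()}
```

Each key of the result corresponds to one query, so the number of queries is bounded by the number of watched departments per term rather than by the number of watches.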

The previous enrollment watchers also implemented stochastic watching: at each timestep, only a sample of courses was polled. The courses had weights corresponding to the number of students watching them, so more watchers meant lower latency. The net result was that the number of queries stayed constant relative to the number of watches, while the perceived latency on enrollment changes did not increase as much. This would be nice to have in the new version too, but is optional if the grouping optimization is efficient enough.
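A minimal sketch of such a weighted sample in Python follows. The issue does not say how the old sampler worked, so this swaps in the Efraimidis-Spirakis method for weighted sampling without replacement as an assumption; `sample_courses` and `budget` are hypothetical names.

```python
import random
from typing import Dict, List, Optional


def sample_courses(
    watch_counts: Dict[str, int],
    budget: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `budget` distinct courses, favoring heavily watched ones."""
    if rng is None:
        rng = random.Random()
    keyed = []
    for course_id, watchers in watch_counts.items():
        if watchers <= 0:
            continue  # nobody is watching, so never poll it
        # Efraimidis-Spirakis key: a larger weight pushes the key toward 1.
        keyed.append((rng.random() ** (1.0 / watchers), course_id))
    keyed.sort(reverse=True)
    return [course_id for _, course_id in keyed[: max(budget, 0)]]
```

Calling this once per timestep with a fixed `budget` keeps the query count constant, while a course with many watchers is selected in most timesteps.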

Add evaluations endpoint

A Firebase Cloud Function that calls evaluations.uchicago.edu with the provided course ID and scrapes the page, so we have a nice JSON API instead of the garbage HTML UChicago provides.

Migrate scraping one more time to Apache Beam

Being on Apache Spark is nice, but the execution model doesn't scale particularly elastically. For the scraper in particular, the initial data size is tiny compared to the data that needs to be processed, so Spark will underestimate the parallelism.

This can be mitigated by migrating to a framework like Apache Beam, which provides more elastic scaling on Cloud Dataflow. Unfortunately, Apache Beam doesn't support Python 3, so either the code should be migrated to Java, the code should be made to work with both Python 2 and Python 3, or Beam's Python 3 support should be fixed.

On the bright side, most of the abstraction work is now done after the Spark migration, so if the solution is to make the code work with Python 2, it should be relatively trivial.

Investigate @ngrx/store

@ngrx/store provides some interesting state management, which is something neither the previous canigraduate nor the current version does very well. I'm not a huge fan of the design pattern and syntax, but it may be worth exploring to see if we can take advantage of it to get state management working well.

Investigate React for frontend

The frontend is written in Vue right now, mostly because it was easier to write an MVP in. With the new React Fiber improvements especially, React seems like a sounder long-term choice. After release, it would be worth investigating a switch.

Design and implement educator page

What should educators see when they log in? What functionalities would be useful for them?

I think this could wait until the overall site and student page is polished, but definitely any suggestion is welcome.

Investigate offline caching for Firebase

Firebase offline support is a long way off, so we'd like a monkey-patch for read operations. Something like localforage would be neat. Note that this is primarily motivated by reducing bandwidth and load times (the initial data load currently takes ~6s), not by full offline support, though having the latter would be nice too. This will be a little tricky: it requires storing a version sentinel or some other clever solution, because otherwise Firebase will download all the data every time the observable is created anyway, negating the bandwidth benefit.
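The version-sentinel idea can be illustrated independently of Firebase. The Python sketch below only shows the control flow; a real implementation would live in the frontend and persist to something like localforage. `VersionedCache`, `fetch_version`, and `fetch_data` are hypothetical names, and the sketch assumes the version can be read far more cheaply than the data itself.

```python
from typing import Any, Callable, Dict, Tuple


class VersionedCache:
    """Re-download data only when a small version sentinel changes."""

    def __init__(
        self,
        fetch_version: Callable[[str], Any],
        fetch_data: Callable[[str], Any],
    ) -> None:
        self._fetch_version = fetch_version  # cheap: reads only the sentinel
        self._fetch_data = fetch_data        # expensive: reads the full data
        self._store: Dict[str, Tuple[Any, Any]] = {}

    def get(self, path: str) -> Any:
        version = self._fetch_version(path)
        cached = self._store.get(path)
        if cached is not None and cached[0] == version:
            return cached[1]  # sentinel unchanged, so skip the download
        data = self._fetch_data(path)
        self._store[path] = (version, data)
        return data
```

The design hinges on whoever writes the data also bumping the sentinel, for example at the end of each scraper run.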

Add schedule renderer

Choosing the course search tab should replace the transcript card with a schedule card, showing your current schedule (and a dropdown that lets you choose previous terms). The course lists can be pulled from TranscriptService and schedule data is available through DatabaseService.

Change GPA mapping

Ideally we should abstract out the GPA map, but this could be challenging because not every school uses the same GPA system (e.g. MIT calculates GPA on a 5.0 scale without plus and minus modifiers).
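A minimal sketch of what abstracting the map could mean: the grade scale becomes per-institution data, and the GPA calculation takes it as a parameter. The two scales below are illustrative values that would need to be checked against each school's registrar, and `gpa` and the `(grade, units)` record shape are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Illustrative scales only; real values belong in per-institution config.
FOUR_POINT_WITH_MODIFIERS: Dict[str, float] = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}
FIVE_POINT_NO_MODIFIERS: Dict[str, float] = {
    "A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "F": 0.0,
}


def gpa(
    records: Iterable[Tuple[str, float]],
    scale: Dict[str, float],
) -> Optional[float]:
    """Unit-weighted GPA over (grade, units) records for a given scale.

    Grades missing from the scale (pass/fail, withdrawals) are skipped.
    """
    points = 0.0
    units_total = 0.0
    for grade, units in records:
        if grade not in scale:
            continue
        points += scale[grade] * units
        units_total += units
    return points / units_total if units_total else None
```

With this shape, supporting another school means adding one dictionary instead of touching the calculation.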

Add watch support via Firebase Cloud Functions

A watch should use Firebase Cloud Functions to watch the enrollments of relevant classes and send a cute email when the class enrollment changes.

Quick add support in the email would also be nice, via either a deep link or another cloud function that proxies the request. I don't have access to add/drop anymore, so this would be something only a current UChicago student can implement.

Also note that this should be institution-independent by configuring the institution's database (which will all follow the same schema).

Replace vuex with RxJS

The whole vuex state management thing could be replaced by a bunch of RxJS BehaviorSubjects. This would save a lot of code as vuex wouldn't need to be bundled, and would reduce the number of variables that need to be watched, as well as let us switch over to immutable state which comes with some nice benefits.

This would probably make for a good third-party module. There are no good devtools for it, and a Chrome extension to replay state would be pretty useful too.

API explorer/browser

One nice feature would be to surface the data in a standardized API. To facilitate that, we'd need an API browser and documentation. There are a couple of pre-built solutions, but having something custom-made would be fine too. In any case, the backend API should be documented and surfaced in a developer-friendly way.

Fix webpack source maps

The source maps are not working. Not sure if it's webpack's fault. Maybe we should just stop using webpack altogether, as it's kind of a giant mess anyway...

Backend testing

It would be nice if the backend scrapers had tests. Would potentially require a lot of mocking though.

Move scraping pipeline to Apache Spark

Much of the scraping time is spent parsing HTML and running queries against it. This can all be parallelized if we move to Apache Spark or another distributed computing platform instead of the local multiprocessing.Pool. The timeschedules and coursesearch objects are written sufficiently generically, so this should just be a straightforward wrapper around the necessary functionality.

This will also be useful when extending to other schools, which may offer many more classes than UChicago and are thus unlikely to be scrapable sequentially the way UChicago is.
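One way to keep the scraping logic independent of the execution engine is to pass the "map" step in as a parameter. The Python sketch below is an assumption about how such a wrapper might look, not the project's code: `scrape_all`, `scrape_one`, and `scrape_with_threads` are hypothetical names, and a thread pool stands in for a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, Any]  # (course_id, scraped data)


def scrape_all(
    tasks: Iterable[Any],
    scrape_one: Callable[[Any], List[Record]],
    mapper: Callable[..., Iterable[List[Record]]] = map,
) -> Dict[str, List[Any]]:
    """Run scrape_one over every task via `mapper` and merge by course ID.

    `mapper` only needs map-like semantics, so the built-in map, a pool's
    map method, or a thin adapter around a cluster job can be swapped in.
    """
    merged: Dict[str, List[Any]] = defaultdict(list)
    for records in mapper(scrape_one, tasks):
        for course_id, data in records:
            merged[course_id].append(data)
    return dict(merged)


def scrape_with_threads(
    tasks: Iterable[Any],
    scrape_one: Callable[[Any], List[Record]],
    workers: int = 4,
) -> Dict[str, List[Any]]:
    """Same pipeline, with the map step run on a local thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return scrape_all(tasks, scrape_one, executor.map)
```

Because `scrape_one` is a plain function from a task to a list of records, the same function could in principle be handed to a distributed engine's map or flat-map step without changes to the parsing code.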
