kevmo314 / canigraduate.uchicago.edu

4.0 6.0 3.0 36.76 MB

Automated graduation dependency resolution.

Home Page: http://canigraduate.uchicago.edu/

License: MIT License

Python 1.39% HTML 0.34% JavaScript 1.01% Vue 20.28% TypeScript 24.59% Java 52.40%

canigraduate.uchicago.edu's Issues

Improve interval tree implementation

The implementation of an interval tree is mostly stolen from this implementation.

It's not the best, however. The use of string constants is somewhat egregious, and a lot of things can be simplified with a switch to ES6. It's currently 2.1 kB after uglification; I have a hunch it can be brought down to under 1 kB.

Replace vuex with RxJS

The whole vuex state management thing could be replaced by a bunch of RxJS BehaviorSubjects. This would save a lot of code, as vuex wouldn't need to be bundled, and would reduce the number of variables that need to be watched. It would also let us switch over to immutable state, which comes with some nice benefits.

This would probably make for a good third-party module. There are no good devtools, and having a Chrome extension to replay state would be pretty useful too.

Add watch support via Firebase Cloud Functions

A watch should use Firebase Cloud Functions to watch the enrollments of relevant classes and send a cute email when the class enrollment changes.

Quick add support in the email would also be nice, via either a deep link or another cloud function that proxies the request. I don't have access to add/drop anymore, so this would be something only a current UChicago student can implement.

Also note that this should be institution-independent: it should only require configuring the institution's database (all of which will follow the same schema).

Investigate React for frontend

The frontend is written in Vue right now, mostly because it was easier to write an MVP in. Especially with the new React Fiber improvements, React seems like a more sound long-term choice. After release, it would be worth investigating switching over.

Design and implement educator page

What should educators see when they log in? What functionalities would be useful for them?

I think this could wait until the overall site and student page are polished, but any suggestions are definitely welcome.

Fix chip input behavior

Right now the chip input is sort of terrible. It should be its own component and better implemented. The previous canigraduate actually had a rather nice directive; however, since the new version uses angular/material2, the typeahead component isn't as readily available, so this will need some clever event handling.

Investigate offline caching for Firebase

Firebase offline support is a long way off, so we'd like a monkey-patch for read operations. Something like using localforage would be neat. Note that this is primarily motivated by reducing bandwidth and load times (the initial data load currently takes ~6s), not by full offline support, although having the latter would be nice too. This will be a little tricky, as it requires storing a version sentinel or some other clever solution. Otherwise, Firebase will download all the data every single time the observable is created anyway, negating the bandwidth benefit.
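The version-sentinel idea could look roughly like the sketch below. It is written in Python purely to illustrate the logic (the frontend itself is JavaScript), and `fetch_version`, `fetch_data`, and the in-memory `store` dict are hypothetical stand-ins for a small Firebase read, the full Firebase read, and a localforage-style persistent store.

```python
from typing import Any, Callable, Dict, Optional


class VersionedCache:
    """Serve cached data unless the remote version sentinel has changed."""

    def __init__(
        self,
        fetch_version: Callable[[], str],  # hypothetical: cheap read of the sentinel
        fetch_data: Callable[[], Any],     # hypothetical: expensive full download
        store: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._fetch_version = fetch_version
        self._fetch_data = fetch_data
        # A plain dict stands in for a persistent client-side store.
        self._store = store if store is not None else {}

    def get(self) -> Any:
        remote_version = self._fetch_version()
        # Cache hit: the sentinel matches, so skip the full download.
        if self._store.get("version") == remote_version and "data" in self._store:
            return self._store["data"]
        # Cache miss or stale data: download everything and record the version.
        data = self._fetch_data()
        self._store["data"] = data
        self._store["version"] = remote_version
        return data
```

The trade-off is one small extra read per load in exchange for skipping the full download whenever the data has not changed; whoever writes the data would also need to bump the sentinel.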

Change GPA mapping

Ideally we should abstract out the GPA map, but this could be challenging because not every school uses the same GPA system (e.g. MIT calculates GPA on a 5.0 scale without plus and minus modifiers).
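One way to abstract the map is to treat each institution's scale as plain data and keep the calculation generic. The sketch below is only an illustration: the grade-point values are assumptions that should be checked against each registrar's published scale, `compute_gpa` is a hypothetical helper, and it uses an unweighted average for simplicity (a real implementation may need to weight by credits).

```python
from typing import Dict, Iterable, Optional

# Illustrative grade-point tables (assumed values, not taken from the codebase).
UCHICAGO_SCALE: Dict[str, float] = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}
MIT_SCALE: Dict[str, float] = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "F": 0.0}


def compute_gpa(grades: Iterable[str], scale: Dict[str, float]) -> Optional[float]:
    """Average the grade points of all grades that appear in the scale.

    Grades missing from the scale (e.g. pass/fail marks) are skipped.
    Returns None when no grade counts toward the GPA.
    """
    points = [scale[grade] for grade in grades if grade in scale]
    if not points:
        return None
    return sum(points) / len(points)
```

With this shape, supporting another school means adding one more dictionary rather than touching the calculation.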

Investigate if scrapy is worth using over bs4 for uchicago backend

Fix webpack source maps

The source maps are not working. Not sure if it's webpack's fault. Maybe we should just stop using webpack altogether as it's kind of a giant mess anyway...

API explorer/browser

One nice feature would be to surface the data in a standardized API. To facilitate that, we'd need an API browser and documentation. There are a couple of pre-built solutions, but having something custom-made would be fine too. In any case, the backend API should be documented and surfaced in a developer-friendly way.

Review and update requirements data

The old canigraduate data is not necessarily correct anymore. In any case, the schema is slightly different now.

Let me know explicitly if you're interested in working on this one, as it requires db write access.

Add course name selector

Course names still need to be cleaned up a lot. If you run the scraper, it spits out a message whenever course names conflict: https://github.com/kevmo314/canigraduate.uchicago.edu/blob/master/backend/uchicago/scraper.py#L60
This is because we store a unified course name whereas UChicago stores name by term.

We would like a system to identify the canonical course name. The previous version did this with a hardcoded map: https://github.com/kevmo314/canigraduate.uchicago.edu-old/blob/master/scripts/ClassResolver.py#L5

This would also be fine; however, something more clever may also be of exploratory interest.

As a side note, this actually ends up being a problem, as a couple of UChicago classes have the same identifier but different names, which leads me to believe that they're different classes. For canigraduate, we treat them as the same class if the ID is the same, but that's okay because it's a relatively extreme edge case and fixing it would cause more problems than it would solve.
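As one possible "more clever" starting point, the canonical name could be chosen by majority vote across terms, with the most recent term breaking ties and a hardcoded override map still taking precedence. This is a minimal sketch under assumptions: `canonical_names`, the `(course_id, term_ordinal, name)` observation tuple, and the integer term ordering are all hypothetical and not part of the existing scraper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# (course_id, term_ordinal, name); a larger term_ordinal means a more recent term.
Observation = Tuple[str, int, str]


def canonical_names(
    observations: Iterable[Observation],
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Pick one name per course ID from the per-term names."""
    counts: Dict[str, Counter] = {}
    latest: Dict[Tuple[str, str], int] = {}
    for course_id, term, name in observations:
        counts.setdefault(course_id, Counter())[name] += 1
        key = (course_id, name)
        latest[key] = max(latest.get(key, term), term)

    result: Dict[str, str] = {}
    for course_id, counter in counts.items():
        # Most frequent name wins; ties go to the name seen in the latest term.
        result[course_id] = max(
            counter, key=lambda n: (counter[n], latest[(course_id, n)])
        )
    # Manually curated names always win over the heuristic.
    result.update(overrides or {})
    return result
```

Because courses are keyed by ID only, this keeps the existing behavior of treating classes with the same ID as the same class.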

Migrate transcript API to Firebase Cloud Functions

Add schedule renderer

Choosing the course search tab should replace the transcript card with a schedule card, showing your current schedule (and a dropdown that lets you choose previous terms). The course lists can be pulled from TranscriptService and schedule data is available through DatabaseService.

Migrate search to web worker

Searching takes somewhere around 50-100 ms per query right now, which is long enough to block the UI thread noticeably. This can be mitigated by moving the actual query evaluation to a web worker.

Sidebar responsive behavior not consistent

If you're on a wide enough screen, the sidebar is permanent. Shrinking the page causes the sidebar to become a drawer; however, widening the page again doesn't bring it back.

Backend testing

It would be nice if the backend scrapers had tests. This would potentially require a lot of mocking, though.

Add higher frequency enrollment poller for watched courses

The scraper right now is intended to be run a couple of times a week or so. This means enrollment numbers for most classes can lag by up to a week, which is too long for watches. There should be a cron job that pulls the classes that have watches attached to them and updates their enrollments, and it should be cheap enough to run every few minutes.

Note that courses from terms that have already occurred will not have any enrollment numbers that update, so those do not need to be polled more frequently even though they have attached watches. Additionally, note that there may be clever optimizations to reduce the number of queries by grouping courses, e.g. those in the same department. Fewer queries means the courses can be polled more frequently, which results in lower latency on updates.
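The two notes above (skip past terms, group by department) could be sketched as follows. The names are hypothetical, and the sketch assumes course IDs look like `MATH 15100`, with the department being everything before the first space, and that one query can cover a whole department in a term.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (term, course_id), e.g. ("Autumn 2017", "MATH 15100")
Watch = Tuple[str, str]


def group_watches(
    watches: Iterable[Watch], active_terms: Set[str]
) -> Dict[Tuple[str, str], List[str]]:
    """Group watched courses into one bucket per (term, department)."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for term, course_id in watches:
        if term not in active_terms:
            continue  # enrollment for past terms never changes
        department = course_id.split(" ", 1)[0]
        groups[(term, department)].add(course_id)
    # Sets deduplicate courses watched by several students.
    return {key: sorted(ids) for key, ids in groups.items()}
```

Each key of the result then corresponds to a single query, so the number of queries is bounded by the number of watched departments rather than the number of watches.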

The previous enrollment watchers also implemented stochastic watching, namely that a sample of courses was taken to be polled at each timestep. The courses themselves had weights corresponding to the number of students watching the course, so more students watching meant lower latencies. The net result was that the number of queries stayed constant relative to the number of watches, but the perceived latency on enrollment changes did not increase as much. This would be nice to have in the new version too, but is optional if the grouping optimization is efficient enough.
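A weighted sample like this can be drawn in a few lines. The sketch below is not the previous watcher's code; it swaps in the Efraimidis-Spirakis key trick for weighted sampling without replacement (each course gets the key `u ** (1 / weight)` for a uniform random `u`, and the `k` largest keys are kept). `sample_courses` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List, Optional


def sample_courses(
    watcher_counts: Dict[str, int],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to k distinct courses, favoring those with more watchers."""
    rng = rng or random.Random()
    keyed = [
        # Larger weights push the key toward 1, so the course is picked more often.
        (rng.random() ** (1.0 / count), course)
        for course, count in watcher_counts.items()
        if count > 0  # courses nobody watches are never polled
    ]
    keyed.sort(reverse=True)
    return [course for _, course in keyed[: max(k, 0)]]
```

Calling this once per timestep with a fixed `k` keeps the query budget constant, while heavily watched courses are polled more often on average.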

Migrate scraping one more time to Apache Beam

Being on Apache Spark is nice; however, the execution model doesn't scale particularly elastically. For the scraper in particular, the initial data size is tiny compared to the data that needs to be processed, so Spark will underestimate the parallelism.

This can be mitigated by migrating to a framework like Apache Beam, which will provide more elastic scaling on Cloud Dataflow. Unfortunately, Apache Beam doesn't support Python 3, so either the code should be migrated to Java, the code should be fixed to work with both Python 2 and Python 3, or Beam's Python 3 support should be fixed.

On the bright side, most of the abstraction work is now done after the Spark migration, so if the solution is to make the code work with Python 2, it should be relatively trivial.

Investigate @ngrx/store

@ngrx/store provides some interesting state management, which is one thing the previous canigraduate did not do well and the current version also does not do very well. I'm not a huge fan of the design pattern and syntax; however, it may be worth exploring to see if we can take advantage of it to get state management working well.

Move scraping pipeline to Apache Spark

Much of the scraping time is spent parsing HTML and running queries against it. This can all be parallelized if we move to Apache Spark or another distributed computing platform instead of the local multiprocessing.Pool. The timeschedules and coursesearch objects are written sufficiently generically, so this should just be a straightforward wrapper around the necessary functionality.

This will also be useful when extending to other schools, which may offer many more classes than UChicago and so are unlikely to be scrapeable sequentially the way UChicago is.

Migrate to lettable operators

RxJS 5.5 was released, which includes lettable operators. Currently, about a third of the bundle size is RxJS. If we can migrate to the new chaining syntax, then webpack can prune off the unused operators automatically instead of relying on manual pruning, which should save a substantial amount off the final shipped bundle size.

See: https://github.com/ReactiveX/rxjs/blob/master/doc/lettable-operators.md
