kevmo314 / canigraduate.uchicago.edu
Automated graduation dependency resolution.
Home Page: http://canigraduate.uchicago.edu/
License: MIT License
The implementation of an interval tree is mostly stolen from this implementation.
It's not the best, however. The use of string constants is somewhat egregious, and a lot of things can be simplified with a switch to ES6. It's currently 2.1 kB after uglification; I have a hunch it can be brought down to <1 kB.
The whole vuex state management thing could be replaced by a bunch of RxJS BehaviorSubjects. This would save a lot of code, as vuex wouldn't need to be bundled, and would reduce the number of variables that need to be watched. It would also let us switch over to immutable state, which comes with some nice benefits.
This would probably make for a good third-party module: there are no good devtools, and having a Chrome extension to replay state would be pretty useful too.
A watch should use Firebase Cloud Functions to watch the enrollments of relevant classes and send a cute email when the class enrollment changes.
Quick add support in the email would also be nice, via either a deep link or another cloud function that proxies the request. I don't have access to add/drop anymore, so this would be something only a current UChicago student can implement.
Also note that this should be institution-independent: it should only require configuring the institution's database (all of which will follow the same schema).
The frontend is written in Vue right now, mostly because it was easier to write an MVP in. Especially with the new React Fiber improvements, React seems like a sounder long-term choice. After release, it would be worth investigating switching over.
What should educators see when they log in? What functionalities would be useful for them?
I think this could wait until the overall site and student page are polished, but suggestions are definitely welcome.
Right now the chip input is sort of terrible. It should be its own component and better implemented. The previous canigraduate actually had a rather nice directive; however, since the new version uses angular/material2, the typeahead component isn't as readily available, so this will need some clever event handling.
Firebase offline support is a long way off, so we'd like a monkey-patch for read operations. Something like using localforage would be neat. Note that this is primarily motivated by reducing bandwidth and load times (the initial data load currently takes ~6s), not by full offline support, although having the latter would be nice too. As a result, this will be a little tricky: it would require storing a version sentinel or some other clever solution, otherwise Firebase will download all the data every single time the observable is created anyway, negating the bandwidth benefit.
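The version sentinel idea can be sketched independently of the frontend stack. The following is a minimal sketch in Python, not the project's code: `load_with_sentinel`, `fetch_version`, and `fetch_all` are hypothetical names, and the `cache` dict stands in for whatever local store (for example localforage) would be used.

```python
from typing import Any, Callable, Dict


def load_with_sentinel(
    cache: Dict[str, Any],
    fetch_version: Callable[[], str],
    fetch_all: Callable[[], Any],
) -> Any:
    """Return the dataset, downloading it only when the remote version changed.

    All three parameters are hypothetical stand-ins:
      cache         -- local persistent store, modelled here as a dict
      fetch_version -- cheap read of a single small version value
      fetch_all     -- expensive read of the full dataset
    """
    remote_version = fetch_version()
    if "data" in cache and cache.get("version") == remote_version:
        return cache["data"]  # cache hit: skip the expensive download
    data = fetch_all()
    cache["version"] = remote_version
    cache["data"] = data
    return data
```

The bandwidth saving depends on the version value being readable without subscribing to the full data, which is an assumption about how the database would be laid out.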
Ideally we should abstract out the GPA map, but this could be challenging because not every school uses the same GPA system (e.g. MIT GPA is calculated on a 5.0 scale without plus and minus modifiers.)
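One way to abstract the GPA map is to make the scale a piece of per-institution configuration. The sketch below assumes that shape; the names `FOUR_POINT_SCALE`, `FIVE_POINT_SCALE`, and `gpa` are hypothetical, the point values are illustrative placeholders rather than any school's official table, and credit weighting is deliberately left out.

```python
from typing import Dict, Iterable, Optional

# Illustrative placeholder scales, not official values for any institution.
FOUR_POINT_SCALE: Dict[str, float] = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}
FIVE_POINT_SCALE: Dict[str, float] = {
    "A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "F": 0.0,
}


def gpa(
    grades: Iterable[str],
    scale: Dict[str, float],
    ignore_modifiers: bool = False,
) -> Optional[float]:
    """Unweighted mean of grade points under the given scale.

    Grades missing from the scale (e.g. pass/fail marks) are skipped.
    With ignore_modifiers=True, a trailing + or - is stripped first,
    which models a scale without plus and minus modifiers.
    """
    points = []
    for grade in grades:
        key = grade.rstrip("+-") if ignore_modifiers else grade
        if key in scale:
            points.append(scale[key])
    return sum(points) / len(points) if points else None
```

With this shape, supporting another school would mean supplying a different scale and modifier setting rather than changing the calculation.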
The source maps are not working. Not sure if it's webpack's fault. Maybe we should just stop using webpack altogether, as it's kind of a giant mess anyway...
One nice feature would be to surface the data in a standardized API. To facilitate that, we'd need an API browser and documentation. There are a couple of pre-built solutions, but having something custom-made would be fine too. In any case, the backend API should be documented and surfaced in a developer-friendly way.
The old canigraduate data is not necessarily correct anymore. In any case, the schema is slightly different now.
Let me know explicitly if you're interested in working on this one, as it requires db write access.
Course names still need to be cleaned up a lot. If you run the scraper, it prints a message whenever course names conflict: https://github.com/kevmo314/canigraduate.uchicago.edu/blob/master/backend/uchicago/scraper.py#L60
This is because we store a unified course name whereas UChicago stores name by term.
We would like a system to identify the canonical course name. The previous version did this with a hardcoded map: https://github.com/kevmo314/canigraduate.uchicago.edu-old/blob/master/scripts/ClassResolver.py#L5
This would also be fine; however, something more clever may be worth exploring.
As a side note, this actually ends up being a problem, as a couple of UChicago classes have the same identifier but different names, which leads me to believe that they're different classes. For canigraduate, we treat them as the same class if the ID is the same. That's okay because it's a relatively extreme edge case, and fixing it would cause more problems than it would solve.
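As one possible heuristic for picking a canonical name, the sketch below prefers a hardcoded override when one exists, and otherwise takes the name used in the most terms, breaking ties by the most recent term. This is an assumption about what "more clever" could mean, not the scraper's actual logic; `canonical_name`, the integer term index, and the `(term, name)` input shape are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def canonical_name(
    course_id: str,
    names_by_term: List[Tuple[int, str]],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Pick one name for a course that is named per term.

    names_by_term is a non-empty list of (term_index, name) pairs, where a
    larger term_index means a later term (a hypothetical encoding).
    """
    if overrides and course_id in overrides:
        return overrides[course_id]  # the hardcoded map always wins
    if not names_by_term:
        raise ValueError("no names recorded for " + course_id)
    counts = Counter(name for _, name in names_by_term)
    latest: Dict[str, int] = {}
    for term, name in names_by_term:
        latest[name] = max(term, latest.get(name, term))
    # Most frequently used name first; most recently used name on ties.
    return max(counts, key=lambda name: (counts[name], latest[name]))
```

Keeping the override map as the first step means manual corrections still take priority over whatever the heuristic chooses.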
Choosing the course search tab should replace the transcript card with a schedule card, showing your current schedule (and a dropdown that lets you choose previous terms). The course lists can be pulled from TranscriptService and schedule data is available through DatabaseService.
Searching takes somewhere around 50-100 ms/query right now, which is long enough to block the UI thread noticeably. This can be mitigated by moving the actual query evaluation to a web worker.
If you're on a wide enough screen, the sidebar is permanent. Shrinking the page causes the sidebar to become a drawer; however, widening the page again doesn't bring it back.
It would be nice if the backend scrapers had tests. Would potentially require a lot of mocking though.
The scraper right now is intended to be run a couple of times a week or so. This means enrollment numbers for most classes can lag by up to a week, which is too long for watches. There should be a cron job that pulls only the classes that have watches attached to them and updates their enrollments, so that it can run every few minutes.
Note that courses from terms that have already occurred will not have any enrollment numbers that update, so those do not need to be polled more frequently even though they have attached watches. Additionally, note that there may be clever optimizations to reduce the number of queries by grouping courses, e.g. those in the same department. Fewer queries means the courses can be polled more frequently, which results in lower latency on updates.
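The filtering and grouping described above might look roughly like the following. It is a sketch under assumptions: the `Watch` record, the integer `term_index`, the `"DEPT 12345"` course ID format, and the idea that one query can cover a whole department in a term are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Watch(NamedTuple):
    course_id: str   # hypothetical format: "DEPT 12345"
    term_index: int  # hypothetical encoding: larger means a later term


def plan_queries(
    watches: Iterable[Watch], current_term: int
) -> Dict[Tuple[int, str], List[str]]:
    """Group watched courses so each (term, department) needs one query.

    Watches on terms before current_term are dropped, since their
    enrollment numbers no longer change.
    """
    groups: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for watch in watches:
        if watch.term_index < current_term:
            continue  # past term: nothing left to poll
        department = watch.course_id.split()[0]
        groups[(watch.term_index, department)].add(watch.course_id)
    return {key: sorted(courses) for key, courses in groups.items()}
```

The number of keys in the result is the number of queries per polling round, so it gives a direct measure of how much the grouping saves.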
The previous enrollment watchers also implemented stochastic watching, namely that a sample of courses was taken to be polled at each timestep. The courses themselves had weights corresponding to the number of students watching the course, so more students watching meant lower latencies. The net result was that the number of queries stayed constant relative to the number of watches, but the perceived latency on enrollment changes did not increase as much. This would be nice to have in the new version too, but is optional if the grouping optimization is efficient enough.
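A minimal sketch of weighted stochastic sampling follows. It is not the previous watcher's code: `sample_courses` and `budget` are hypothetical names, and drawing without replacement within a single timestep is an assumption.

```python
import random
from typing import Dict, List, Optional


def sample_courses(
    watch_counts: Dict[str, int],
    budget: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `budget` distinct courses to poll this timestep.

    Each draw is weighted by the number of students watching the course,
    so heavily watched courses are polled more often on average.
    """
    rng = rng or random.Random()
    remaining = {c: n for c, n in watch_counts.items() if n > 0}
    chosen: List[str] = []
    while remaining and len(chosen) < budget:
        courses = list(remaining)
        weights = [remaining[c] for c in courses]
        pick = rng.choices(courses, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # no duplicates within one timestep
    return chosen
```

Because `budget` is fixed, the query volume per timestep stays constant however many watches exist, which matches the behavior described above.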
Being on Apache Spark is nice; however, the execution model doesn't scale particularly elastically. Especially for the use case of the scraper, the initial data size is tiny compared to the data that needs to be processed, so Spark will underestimate the parallelism.
This can be mitigated by migrating to a framework like Apache Beam, which will provide more elastic scaling on Cloud Dataflow. Unfortunately, Apache Beam doesn't support Python 3, so either the code should be migrated to Java, the code should be fixed to work with both Python 2 and Python 3, or Beam support for Python 3 should be fixed.
On the bright side, most of the abstraction work is now done after the Spark migration, so if the solution is to make the code work with Python 2, it should be relatively trivial.
@ngrx/store provides some interesting state management, which is one thing the previous canigraduate did not do well and the current version also does not do very well. I'm not a huge fan of the design pattern and syntax; however, it may be worth exploring to see if we can take advantage of it to get state management working well.
Much of the scraping time is spent parsing HTML and running queries against it. This can all be parallelized if we move to Apache Spark or another distributed computing platform instead of the local multiprocessing.Pool. The timeschedules and coursesearch objects are written sufficiently generically, so this should just be a straightforward wrapper around the necessary functionality.
This will also be useful for extending to other schools, which may offer many more classes than UChicago and are thus unlikely to be able to run sequentially like UChicago's scraper.
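One way to keep such a wrapper thin is to make the scraper take its map function as a parameter, so the execution backend can be swapped without touching the parsing code. The sketch below assumes that design; `parse_page` is a hypothetical stand-in for the real parsing, and `scrape` is not the project's actual entry point.

```python
from typing import Callable, Iterable, List

# Any callable shaped like the built-in map(fn, iterable).
MapFn = Callable[..., Iterable[int]]


def parse_page(html: str) -> int:
    """Hypothetical stand-in for the real parsing: counts table rows."""
    return html.count("<tr")


def scrape(pages: Iterable[str], map_fn: MapFn = map) -> List[int]:
    """Parse every page using whichever map implementation is supplied.

    The default runs sequentially. Passing the map method of a
    multiprocessing.Pool or a concurrent.futures executor parallelizes
    locally, and a small adapter could do the same for a distributed
    platform.
    """
    return list(map_fn(parse_page, pages))
```

For example, `scrape(pages, pool.map)` inside a `with multiprocessing.Pool() as pool:` block would keep the current local behavior, while a distributed backend would only need an adapter with the same two-argument shape.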
RxJS 5.5 was released, which includes lettable operators. Currently, about a third of the bundle size is RxJS. If we can migrate to the new chaining syntax, then webpack can prune off the unused operators automatically instead of relying on manual pruning, which should save a substantial amount off the final shipped bundle size.
See: https://github.com/ReactiveX/rxjs/blob/master/doc/lettable-operators.md
A Firebase Cloud Function that will call evaluations.uchicago.edu with the provided course ID and scrape the page, so we have a nice JSON API instead of the garbage HTML UChicago provides.
A custom implementation of the Hungarian algorithm is used because the current one published on npm (munkres-js) is too slow.
We should publish the faster one. The code can probably be cleaned up a bit and documented a little better before that and there are probably a few additional small performance improvements to be had.