ekzhang / classes.wtf

Stars: 290 · Watchers: 8 · Forks: 17 · Size: 197 KB

A course catalog with extremely fast full-text search

Home Page: https://classes.wtf

License: MIT License

Go 63.99% Dockerfile 2.16% HTML 1.80% Svelte 24.21% CSS 0.12% TypeScript 6.80% JavaScript 0.92%

course go harvard redis university full-text-search catalog svelte

classes.wtf's People

raphaelrk nduatik martinristovski richardsonjf ramayer redstrike therealthor firdavsdev burningion maddyonline tygeri suufi ictserv alikiki garrettmooney soi-20 edwardzhou538

classes.wtf's Issues

Queries restricted to specific terms return more courses from my.harvard than expected

To get all courses offered in an academic year, we query both the fall and spring terms. However, we currently encode the term specification using the incorrect key, Term, instead of STRM.

Our mistake is using the "localized" name rather than the "unlocalized" one. For example, each course has a Course Level that indicates for which group of students a course is intended, and this localized name is displayed in my.harvard's search input for readability. However, the key actually sent in the HTTP request is the unlocalized name, CRSE_ATTR_VALUE_HU_LEVL_ATTR.

my.harvard's frontend accepts either version of the key and transparently translates localized names to unlocalized ones. We do not perform this translation, and the REST API only supports unlocalized keys; unrecognized search components are silently ignored. For example, the query anime (Course Level:"PRIMUGRD") (STRM:"2238" | Term:"2242") is treated the same as just anime (STRM:"2238").
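One way to avoid sending localized keys is to translate them before building the query. The following is a minimal Go sketch, not the project's actual code: the names unlocalizedKeys, unlocalize, and anyOf are hypothetical, and the table contains only the two key pairs named in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// unlocalizedKeys maps the display names shown in my.harvard's search box to
// the keys the REST API matches on. Only the two pairs named in this issue
// are listed; any further entries would be guesses.
var unlocalizedKeys = map[string]string{
	"Term":         "STRM",
	"Course Level": "CRSE_ATTR_VALUE_HU_LEVL_ATTR",
}

// unlocalize returns the API key for a localized name, or the input unchanged
// when it is not a known localized name (it may already be unlocalized).
func unlocalize(key string) string {
	if k, ok := unlocalizedKeys[key]; ok {
		return k
	}
	return key
}

// anyOf builds a parenthesized OR group such as (STRM:"2238" | STRM:"2242").
func anyOf(key string, values ...string) string {
	parts := make([]string, len(values))
	for i, v := range values {
		parts[i] = fmt.Sprintf("%s:%q", unlocalize(key), v)
	}
	return "(" + strings.Join(parts, " | ") + ")"
}

func main() {
	fmt.Println("anime", anyOf("Course Level", "PRIMUGRD"), anyOf("Term", "2238", "2242"))
}
```

Because unknown keys pass through unchanged, callers that already use STRM are unaffected.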

my.harvard queries with multiple ORs do not work as expected

Queries of the form ("A":"1" | "B":"2") ("C":"3" | "D":"4") do not yield the same results as the union of the two queries ("A":"1") ("C":"3" | "D":"4") and ("B":"2") ("C":"3" | "D":"4"). I'm hoping this issue can lay out systematically how they differ.
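To compare the combined query against its union, the split queries can be generated mechanically. This is a small Go sketch under the assumption that each term is already rendered as a "key":"value" string; group and splitFirst are hypothetical helper names, not part of the project.

```go
package main

import (
	"fmt"
	"strings"
)

// group renders one parenthesized OR group, e.g. ("C":"3" | "D":"4").
func group(terms []string) string {
	return "(" + strings.Join(terms, " | ") + ")"
}

// splitFirst takes a query given as a list of OR groups and returns one query
// per alternative in the first group, leaving the remaining groups untouched.
// The union of the returned queries is what the combined query should match.
func splitFirst(groups [][]string) []string {
	if len(groups) == 0 {
		return nil
	}
	rest := make([]string, 0, len(groups)-1)
	for _, g := range groups[1:] {
		rest = append(rest, group(g))
	}
	queries := make([]string, 0, len(groups[0]))
	for _, term := range groups[0] {
		parts := append([]string{group([]string{term})}, rest...)
		queries = append(queries, strings.Join(parts, " "))
	}
	return queries
}

func main() {
	for _, q := range splitFirst([][]string{
		{`"A":"1"`, `"B":"2"`},
		{`"C":"3"`, `"D":"4"`},
	}) {
		fmt.Println(q)
	}
}
```

Running each split query, merging the results, and diffing them against the combined query's results would show which courses differ.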

Local my.harvard course list has duplicates

We send concurrent requests to my.harvard and aggregate the results, removing duplicates. (The field that uniquely identifies a course is aptly named Key.)

However, I don't believe we should be seeing duplicates; when I manually search for allegedly duplicate courses, only one result shows up. Also, each time the course download script is run, a different number of duplicates are removed.
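Recording which keys repeat, rather than only dropping them, may help narrow down where the duplicates come from. The following Go sketch illustrates the idea; the Course type and its Title field are hypothetical stand-ins, and only the Key field comes from this issue.

```go
package main

import "fmt"

// Course is a hypothetical stand-in for a downloaded course record.
// Only Key is taken from the issue; Title is illustrative.
type Course struct {
	Key   string
	Title string
}

// dedupe keeps the first course seen for each Key and reports, per Key, how
// many extra copies were dropped.
func dedupe(courses []Course) (unique []Course, extras map[string]int) {
	seen := make(map[string]bool, len(courses))
	extras = make(map[string]int)
	for _, c := range courses {
		if seen[c.Key] {
			extras[c.Key]++
			continue
		}
		seen[c.Key] = true
		unique = append(unique, c)
	}
	return unique, extras
}

func main() {
	unique, extras := dedupe([]Course{
		{Key: "a", Title: "First"},
		{Key: "b", Title: "Second"},
		{Key: "a", Title: "First"},
	})
	fmt.Println(len(unique), "unique;", len(extras), "keys had duplicates")
}
```

Comparing the extras map across runs would show whether the same keys repeat each time or whether the set changes along with the count.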

Errors out when I include dashes in specific text search

Hi, I love the website and you obviously did an incredible job of engineering it. Just one quibble that I thought you might want to look into -- it seems to error out when my query includes a dash (screenshot attached below as an example).

AY 22-23 course meeting times in noon hour displayed incorrectly

For example, FRSEMR 60r, which meets from 12:00-2:45pm, is displayed as 24:00-14:45.

Courses which meet off the hour also have this issue. For example, PORTUG 220.

I am unsure if courses which end in the noon hour also have the end time affected.

Some course entries on my.harvard are themselves incorrect. For example, ECON 2909 meets from 10:30-11:45am but is listed as 10:30am-11:45pm. Its display as 10:30-23:45 is therefore a faithful rendering of the source data, not an instance of this bug.

This bug only applies to courses in AY 2022-2023.
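The reported output is consistent with adding 12 to every PM hour, including 12 itself, although the issue does not state the actual cause. A minimal Go sketch of a conversion that handles the noon and midnight hours follows; to24Hour and formatClock are hypothetical names.

```go
package main

import "fmt"

// to24Hour converts an hour on a 12-hour clock (1-12) to a 24-hour value.
// Taking the hour modulo 12 first maps both 12 AM and 12 PM to 0, so adding
// 12 for PM yields 12 for noon rather than 24.
func to24Hour(hour int, pm bool) int {
	hour %= 12
	if pm {
		hour += 12
	}
	return hour
}

// formatClock renders a 12-hour time as a zero-padded 24-hour HH:MM string.
func formatClock(hour, minute int, pm bool) string {
	return fmt.Sprintf("%02d:%02d", to24Hour(hour, pm), minute)
}

func main() {
	// FRSEMR 60r from the report: 12:00pm to 2:45pm.
	fmt.Println(formatClock(12, 0, true) + "-" + formatClock(2, 45, true))
}
```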

Cannot page beyond 10,000th course on my.harvard

my.harvard appears to use Oracle WebLogic Server, which in turn uses Elasticsearch to service course search queries. Elasticsearch's pagination API allows paging to the 10,000th course at most.¹

When we reach this limit, my.harvard returns an error message (albeit with a 200 status code) instead of a JSON object.
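Since the status code cannot be trusted here, a downloader would have to inspect the body itself. The Go sketch below assumes a successful page is a JSON object, as described above; decodePage, withinWindow, and errNotJSON are hypothetical names, and the response's field names are deliberately left undecoded because the issue does not list them.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// maxResultWindow is the 10,000-result paging limit described in this issue.
const maxResultWindow = 10000

var errNotJSON = errors.New("response body is not a JSON object")

// decodePage treats a response as successful only if the status is 200 and
// the body parses as a JSON object. A plain-text error message sent with a
// 200 status therefore surfaces as an error instead of an empty page.
func decodePage(status int, body []byte) (map[string]json.RawMessage, error) {
	if status != 200 {
		return nil, fmt.Errorf("unexpected status %d", status)
	}
	var page map[string]json.RawMessage
	if err := json.Unmarshal(body, &page); err != nil {
		return nil, fmt.Errorf("%w: %v", errNotJSON, err)
	}
	if page == nil { // the body was the JSON literal null
		return nil, errNotJSON
	}
	return page, nil
}

// withinWindow reports whether a page of the given size starting at a
// zero-based offset stays inside the first 10,000 results.
func withinWindow(offset, size int) bool {
	return offset+size <= maxResultWindow
}

func main() {
	_, err := decodePage(200, []byte("An error occurred"))
	fmt.Println("error detected:", err != nil)
	fmt.Println("page at offset 9990, size 10 allowed:", withinWindow(9990, 10))
}
```

Checking withinWindow before each request lets the caller narrow the query (for example by term or school) rather than discovering the limit from a failed response.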

¹ https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html

Spams the browser history

The title is quite self-explanatory: the website spams the browser history, adding a new entry every time the user types a character.

Missing courses without a course level

We incorrectly assume that every course has a Course Level, when in fact some don't. This is confirmed by exhausting all course levels—(Course Level:"UGRDGRAD" | Course Level:"GRADCOURSE" | Course Level:"INTRO" | Course Level:"NOLEVEL" | Course Level:"PRIMGRAD" | Course Level:"PRIMUGRD")—and then doing a wildcard search within FAS. The difference is about 50 courses.

The fix specific to our use case of excluding graduate-level courses is to query all courses that do not match (Course Level:"GRADCOURSE"). This can be done efficiently for my.harvard by setting the Exclude300 flag and in GraphQL by using a filter.
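The difference between the two approaches can be illustrated on already-downloaded data. This is a Go sketch only: the Course type and its Level field are hypothetical, an empty Level stands for a course with no Course Level, and it does not show the Exclude300 flag or the GraphQL filter themselves.

```go
package main

import "fmt"

// Course is a hypothetical record; Level is empty when my.harvard reports no
// Course Level for the course.
type Course struct {
	Key   string
	Level string
}

// excludeGraduate keeps every course whose level is not GRADCOURSE. Unlike an
// allow-list of known levels, this also keeps courses that have no level.
func excludeGraduate(courses []Course) []Course {
	var kept []Course
	for _, c := range courses {
		if c.Level != "GRADCOURSE" {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	kept := excludeGraduate([]Course{
		{Key: "a", Level: "PRIMUGRD"},
		{Key: "b", Level: "GRADCOURSE"},
		{Key: "c", Level: ""},
	})
	fmt.Println(len(kept), "courses kept")
}
```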
