api's People

Contributors

bearsyankees, bgdncz, davidtjeong, ericyoondotcom, erikboesen, evgerritz, goldinguy, helenhall, jeffreyjgong, neilshah12, redorhcs, rencewang, salmogy22, transdoan

api's Issues

Come up with a new way to tell if people are on leave

Currently we just check if the graduation year of each student has increased since the saved copy we have from last year. Once this semester ends, we'll no longer have a reliable way to tell whether people are still on leave or if they only took one semester off. We'll either need to find another way to get leave data, or change the labeling to signify that this student HAS taken a leave but may not necessarily still be on it.
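As a rough illustration of the current check, here is a minimal sketch. It assumes each snapshot is a plain mapping from a student identifier to a graduation year; the function name and data shapes are hypothetical and not taken from the scraper.

```python
from typing import Dict, Set


def find_students_with_leave(
    previous_years: Dict[str, int],
    current_years: Dict[str, int],
) -> Set[str]:
    """Return IDs of students whose graduation year increased between snapshots.

    Both arguments are hypothetical mappings of student ID -> graduation year.
    Students missing from the previous snapshot are ignored, since there is
    nothing to compare against.
    """
    flagged = set()
    for student_id, current_year in current_years.items():
        previous_year = previous_years.get(student_id)
        if previous_year is not None and current_year > previous_year:
            flagged.add(student_id)
    return flagged
```

Note that this only shows that a leave was taken at some point between the two snapshots, which is exactly the limitation described above.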

Scrape more faculty information from department websites

Lots of academic departments at Yale have People pages that list (in an apparently somewhat consistent format) all the people (grad students, faculty, staff, etc.) in the department.

These websites have lots of extra information, such as:

  • Suffixes (M.D., Ph.D., etc.)
  • Links to personal or lab webpages
  • Full professorship titles (for example "Sterling Prof of Sociology, Director, Urban Ethnography Project; Prof African American Studies")
  • Pictures

Examples:
https://ling.yale.edu/people
https://cpsc.yale.edu/people
https://afamstudies.yale.edu/people
https://math.yale.edu/people
https://mcdb.yale.edu/people
https://medicine.yale.edu/anesthesiology/people/

Many more... full list here: https://www.yale.edu/academics/departments-programs
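One small piece of this, splitting suffixes off a scraped display name, could look roughly like the sketch below. The suffix list is a hypothetical, non-exhaustive assumption, and the actual markup of the department pages is not modeled here at all.

```python
from typing import List, Tuple

# Hypothetical, non-exhaustive set of suffixes; extend as real pages require.
KNOWN_SUFFIXES = {"M.D.", "Ph.D.", "Ph.D", "J.D.", "M.P.H.", "Jr.", "Sr."}


def split_name_and_suffixes(display_name: str) -> Tuple[str, List[str]]:
    """Split 'Jane Doe, M.D., Ph.D.' into ('Jane Doe', ['M.D.', 'Ph.D.']).

    Comma-separated parts that are not recognized suffixes stay in the name.
    """
    parts = [part.strip() for part in display_name.split(",") if part.strip()]
    if not parts:
        return "", []
    name_parts = [parts[0]]
    suffixes = []
    for part in parts[1:]:
        if part in KNOWN_SUFFIXES:
            suffixes.append(part)
        else:
            name_parts.append(part)
    return ", ".join(name_parts), suffixes
```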

Refactor scraper into multiple files

It's huge, and once we implement #44, it's only gonna get huger. We should split different components into multiple files. Some ideas for divisions:

  • face_book
  • directory
  • department_websites
  • util (for example clean_* functions)

Add admin and banned columns to users table

Currently, when checking that a user is permitted to do certain privileged operations (e.g. running the scraper), we just check if the user's CAS NetID is equal to my NetID (ekb33). We should add a boolean admin column to the users table that would allow users to be set as administrators, and then check if the current user is an admin when attempting to perform privileged operations, rather than checking against my hardcoded NetID. If you really want to be fancy, you could try to figure out how to add a decorator for this (like @admin_required, similar to how flask-cas and flask-login implement @login_required).

For banned, it would be good to be able to ban individual users who we don't want using the site. Just in case.
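A framework-agnostic sketch of what such a decorator might look like is below. The `User` dataclass and the `get_current_user` callable are stand-ins invented for illustration; the real app would read the user from its own session and database, and in Flask the wrapper would more likely call `abort(403)` than raise an exception.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Optional


@dataclass
class User:
    """Hypothetical stand-in for a row of the users table."""
    netid: str
    admin: bool = False
    banned: bool = False


def admin_required(get_current_user: Callable[[], Optional[User]]):
    """Build a decorator that only lets non-banned admins call the view.

    get_current_user is an assumed callable returning the logged-in User,
    or None if nobody is logged in.
    """
    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            user = get_current_user()
            if user is None or user.banned or not user.admin:
                raise PermissionError("admin privileges required")
            return view(*args, **kwargs)
        return wrapper
    return decorator
```

The same shape would work for a banned check on ordinary endpoints: test `user.banned` alone and drop the admin condition.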

Re-show residence filters once room numbers return to Face Book

The Face Book has removed all room numbers this semester. This may be because of us. It also may be because of the irregularity of COVID. For this reason, I hid the filters that use room numbers (building code, entryway, floor, etc.) in app/templates/index.html. If/when room numbers are put back, we should show these filters again.

Write more complete API documentation

I think it would be really cool to have a Swagger-style docs system, where you can test the API in-browser and see what the responses look like. At minimum, we should document the filters endpoint and add a list of the fields Person has.

If no people were found on face book page and we abort scraping, delete the saved page file

This failure occurs when authentication fails. Currently, if we change the passed token so that it's valid and then immediately rerun the scraper, it'll use the existing page.html file from when the request failed, and the problem won't be fixed until we restart the Heroku dyno (which resets the ephemeral filesystem).

Add code to the failing case to delete the page.html file.
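A minimal sketch of that cleanup, assuming the scraper has the parsed list of people and the path to the cached page in hand at the point of failure (both function names are hypothetical):

```python
from pathlib import Path
from typing import List


def discard_cached_page(page_path: Path) -> bool:
    """Delete the cached page file; return True if a file was removed."""
    try:
        page_path.unlink()
        return True
    except FileNotFoundError:
        return False


def require_people(people: List, page_path: Path) -> List:
    """Abort scraping if no people were parsed, removing the stale cache first.

    Deleting the file means the next run fetches a fresh page instead of
    reusing the one saved from the failed (unauthenticated) request.
    """
    if not people:
        discard_cached_page(page_path)
        raise RuntimeError("No people found on page; authentication may have failed.")
    return people
```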

Elasticsearch can return different results each search, causing different ordering for each page of results

Pretty much the title. If you do a broad search that returns many results ("Hopper", for instance), you may notice that some people are duplicated across pages, or possibly omitted. This is only an issue for very large searches (which few people do, apparently preferring to use filters), but it's a very obvious problem once you notice it. One solution to this might be to use ES's scroll API.
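A different, lighter-weight option than scrolling is to make the ordering deterministic by adding a unique tiebreaker to the sort, so that documents with equal scores always come back in the same order. The sketch below only builds the request body; the `id` field name and the `multi_match` query clause are placeholders, not the app's actual mapping or query. Scores can also differ between replicas, which a tiebreaker alone will not fix.

```python
from typing import Any, Dict


def build_search_body(
    query: str,
    page: int,
    page_size: int = 20,
    tiebreaker_field: str = "id",  # assumed unique, sortable field
) -> Dict[str, Any]:
    """Build a paginated search body with a stable secondary sort."""
    return {
        # Placeholder query clause; the real app's query may differ.
        "query": {"multi_match": {"query": query}},
        # Sort by relevance first, then by a unique field to break ties.
        "sort": [{"_score": "desc"}, {tiebreaker_field: "asc"}],
        "from": page * page_size,
        "size": page_size,
    }
```

The tiebreaker field needs to be one that Elasticsearch can sort on, such as a keyword or numeric field.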

Persist query in URL parameters

One kind of nice (although very bugged) thing that the Yale Face Book does is that when you run a search, the search information is stored in the URL. That way, if you want to send someone the results of your search, you can just copy and paste the URL, which would be something like:

yalies.io/?query=Some+name&filters=...
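The round trip between search state and a query string could be sketched as follows. The format of the filters value is not specified above, so serializing it as JSON is purely an assumption made for this example, as are the function names.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import parse_qs, urlencode


def encode_search(query: str, filters: Dict[str, Any]) -> str:
    """Turn a search into a URL query string (without the leading '?')."""
    params = {"query": query}
    if filters:
        # Assumed encoding: filters serialized as compact JSON.
        params["filters"] = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return urlencode(params)


def decode_search(query_string: str) -> Tuple[str, Dict[str, Any]]:
    """Recover (query, filters) from a query string built by encode_search."""
    parsed = parse_qs(query_string)
    query = parsed.get("query", [""])[0]
    filters = json.loads(parsed["filters"][0]) if "filters" in parsed else {}
    return query, filters
```

On the site itself this logic would live in the front end, but the idea is the same: the URL becomes the single source of truth for what search is being shown.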

Use more secure filenames for S3 images

Right now, someone could theoretically iterate through every number from 1 to 100,000 and get all the user images. Rather than using Yale's naming scheme for the files, we should generate a securely random name for the image file based on some properties of the user that aren't likely to change. For example, we can append UPI, image ID, NetID, etc. together, hash the result, and use that as the filename.
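One way to do the hashing step is sketched below. It uses a keyed hash (HMAC) rather than a plain hash, because properties like NetID are guessable and an unkeyed hash of them could still be enumerated by anyone who learns the scheme. The secret key, the function name, and the choice of fields are assumptions for illustration.

```python
import hashlib
import hmac


def image_filename(
    secret_key: bytes,
    upi: str,
    image_id: str,
    netid: str,
    extension: str = "jpg",
) -> str:
    """Derive a stable, hard-to-guess filename from user properties.

    secret_key is an assumed server-side secret (for example, loaded from
    an environment variable) that never leaves the backend.
    """
    message = "|".join([upi, image_id, netid]).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return f"{digest}.{extension}"
```

Because the name is derived deterministically, the scraper can recompute it on each run without storing a separate mapping, as long as the key and the chosen properties stay the same.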
