harvardopendata / harvardopendata.github.io

Harvard's first open data catalog

Home Page: http://hodp.org

License: MIT License

CSS 1.39% JavaScript 94.45% Python 1.67% HTML 2.49%

open-data harvard harvard-data civic-tech

harvardopendata.github.io's Issues

add bios for our team members

Put on the About page. Also include their profile picture, name, year, what they're interested in, and what they've done on HODP.

import datasets from /harvard-data

Add stuff from here to our data csv

https://github.com/Harvard-Open-Data-Project/harvard-data

Free Open Data Tools

note the tools listed here https://project-open-data.cio.gov

Data sets for launch

This thread is for discussion of which data sets should be included in launch

Add featured datasets and projects

One example:

https://github.com/johnshen7/harvardnow

Tiles for each category

I've started it in the landingpage branch, but we need to add a bootstrap thumbnail for each one.

Consider creating brand new frontend

Our bootstrap frontend doesn't look very sexy. Consider using a new one from StartBootstrap, WrapBootstrap, HTML5Up, etc.

If you want to take this on, comment here so we can figure out a new theme before you start implementing it!

License for Harvard's Open Data website

This thread is to discuss license options for Harvard's Open Data website (not the underlying datasets).

For comparison, here is Data.gov's license: https://github.com/GSA/data.gov/blob/master/LICENSE.md

Here is a straw man for discussion:

Public Domain

We waive copyright and related rights in the work worldwide through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest. See CONTRIBUTING for more information.

GNU General Public License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Visit http://www.gnu.org/licenses/ to learn more about the GNU General Public License.

Other Information

In no way are the patent or trademark rights of any person affected by CC0, nor are the rights that other persons may have in the work or in how the work is used, such as publicity or privacy rights.

Unless expressly stated otherwise, the person who associated a work with this deed makes no warranties about the work, and disclaims liability for all uses of the work, to the fullest extent permitted by applicable law. When using or citing the work, you should not imply endorsement by the author or the affirmer.

Add support for multiple URLs

Consider the dataset "Universal Harvard Events Calendar". It has multiple URLs so it doesn't work! Yet some datasets will have multiple URLs. We should enable support for that! Either have url1, url2, url3, etc. fields, or allow for arbitrarily many URLs separated by, say, a pipe, in the CSV file.
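A minimal sketch of the pipe-separated option, assuming the catalog CSV has `title` and `url` columns (the column names, helper names, and sample rows here are hypothetical, not taken from the actual data file):

```python
import csv
import io


def parse_urls(cell, sep="|"):
    """Split one CSV cell into a list of URLs, dropping blanks and stray spaces."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def load_catalog(csv_text):
    """Read catalog rows and replace the raw 'url' string with a 'urls' list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["urls"] = parse_urls(row.pop("url", "") or "")
        rows.append(row)
    return rows


# Made-up sample rows; example.org stands in for the real links.
SAMPLE = (
    "title,url\n"
    "Universal Harvard Events Calendar,https://example.org/a | https://example.org/b\n"
    "Single-link dataset,https://example.org/c\n"
)

catalog = load_catalog(SAMPLE)
for entry in catalog:
    print(entry["title"], entry["urls"])
```

Datasets with a single URL keep working unchanged, since a cell without a pipe just becomes a one-item list. The separator only needs to be a character that cannot appear unescaped in a URL.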

Technology / Architecture Options

This thread is for a discussion about Dataverse, CKAN, Socrata, and other possible solutions.

Resources

Demo Dataverse

Appeal to coordinate with Harvard's Worldmap

@capooti is a collaborator/core committer on the geonode project, now working on Harvard's Worldmap, which utilizes pyCSW, the same Web Catalog Service CKAN uses to catalog/manage/harvest/serve metadata...

See slide deck:
https://github.com/DistributedOpenUnifiedGovernmentNetwork

Add button to clear search categories & terms

Currently, once you set a category/term, there's no way to undo it besides visiting the page again. Add a way to clear it.

fix calendar page not working

Shrink cover photo on small screens

It gets really huge on small screens because it has a fixed height.

add new projects to site

Store css offline

Icons not loading

Hey @jdhe1120 — in trying to store CSS locally, I think we've excluded the icon font, because these icons aren't loading:

Release data under CC-BY 4.0

https://choosealicense.com/licenses/cc-by-4.0/

note that the code should still be under the MIT license, but data should be CC-BY.

Add more information about our members

Add information about our new members.
Make dedicated cards (panels) for each member. Include a profile photo and information like year, house, interests, and what they've done for HODP.

Improve SEO

I want to be sure that we're always #1 on the Google rankings for queries like "harvard open data". We're already doing a pretty good job, but we need to keep improving.

Ideas:

Try adding mention of "catalog" to our webpage's title or meta description.
Mention "open data" and "data portal"/"data catalog" more on our homepage.

Post-launch data sets

I would love to see syllabi as a target data set. Because Harvard does not have an open syllabus project, these would have to be gathered via opt ins ... or possibly by downloading the syllabi the Open Syllabus Project at Columbia U has been gathering off the Web. The OSP is also likely to be producing a useful schema; the people running it are pragmatic, not Schema Infinite Perfectionists.

Why would I love this so much? 1. Syllabi are an insanely useful resource for faculty creating new courses. 2. Encouraging open syllabi would result in a cross-university dataset that would be a gold mine for researchers seeking to understand the patterns of education in this country and beyond. 3. It seems to me to be totally in line with Harvard's commitment to openness. 4. Harvard syllabi could be imported into the H2O project that treats them like playlists to be learned from and mashed up.

Dataset metadata

We need to find metadata for our datasets before we publish them, including a description of the dataset, where to download it, who published it, etc.

Everyone should find one dataset on this shared Google Doc, find the relevant metadata by poking around the internet, and fill in the rest of the dataset's row.

The metadata schema can be found on this thread.

If you're interested, you can find more potential datasets here. Or, if you have any more ideas about potential datasets, feel free to add information about them!

Post here if you have any questions or comments!

Put catalog in YAML

YAML is more flexible and powerful than CSV, and also easier for humans to read and write. It's a little harder to parse, but there's a library for that.

However, YAML is more space-intensive and not as well suited for huge collections of data as CSV. Harder to learn, too.

MVP ideas

From the meeting yesterday we had the idea to create a Dataverse for the Open Data Project, store some data on it, and write a wrapper webapp (using whatever stack we want) that simply calls the Dataverse APIs behind the scenes, allowing users to call APIs (which in turn call the Dataverse APIs) or download files directly from the Dataverse.

I think the benefit of this is that Dataverse contains lots of useful functionality, and with a wrapper we can add some useful features on top of that.

What's everyone think? If this sounds interesting, I can throw together a quick proof-of-concept.
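As a rough sketch of the wrapper idea: this assumes the Dataverse Search API (`/api/search` with `q`, `type`, and `per_page` query parameters) and uses the demo installation at demo.dataverse.org as a stand-in host. The function name is hypothetical, and the sketch only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# Assumption: the demo Dataverse installation stands in for whichever
# Dataverse the project would actually create.
DATAVERSE_BASE = "https://demo.dataverse.org"


def build_search_url(query, base=DATAVERSE_BASE, per_page=10):
    """Build a Dataverse Search API URL that looks for datasets matching `query`."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{base}/api/search?{urlencode(params)}"


print(build_search_url("harvard events"))
```

A wrapper webapp would fetch this URL server-side and reshape the JSON response before returning it, which is where extra features on top of Dataverse could be added.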

Consider using Yarn to manage dependencies

Instead of downloading Font Awesome, D3, jQuery, etc. individually, we can use a tool like Yarn to automatically download them and keep them updated for us.

https://yarnpkg.com/en/

Add background image to search area

pforzheimer-house.jpg may be useful

Building the MVP

Here we can discuss the technical details of building the minimum viable product.

For background reading, see the following threads:

Improve featured datasets display

Make it clear that our cards for featured datasets are linking to a Medium article (add a button, maybe?)
Add links to the datasets
Add descriptions below the title of the dataset, to explain what the data is and what we analyzed with it.

License for datasets

This thread can be for a discussion of the license options for the underlying datasets listed on Harvard's Open Data website.

Would suggest 3 popular options, with the default and recommended choice being CC0.

make sidebar filter from the current results rather than wipe them

update members list

2/24/16 Meeting Issue

Harvard Open Data Project - 2/24/16

Interesting emails/conversations to read
1. David Eaves
  1. He says it’ll be difficult to sell it to Harvard but data.harvard.edu already exists and is sponsored by HACC!
2. Alan Wolf
  1. HACC are our allies — it’s good that we have allies in the administration
  2. We should meet with them — it’ll really inform our next steps
  3. It’s a sign that there is interest in open data among administration, and they appreciate/validate what we’re doing since they use our datasets
  4. They’ve already established data.harvard.edu so we’ll need to work with them if we want to do anything big
  5. We can add value by making data.harvard.edu much, much better
3. Nick Sinai
http://data.harvard.edu/
1. “Right now the site has very low traffic, there has been no formal announcement and to the best of my knowledge there are no links to it.”
2. How does this impact us?
Let’s talk strategy
1. It’s an uphill battle.
2. What are we trying to do? What’s our vision for this project? What would we rather do with our time — what’s everyone’s goal for their involvement with this project?
  1. Building something or causing institutional change?
3. Theory of change?
4. How will Harvard buy our vision?
5. How does data.harvard.edu play into this?
6. Who should we talk to?
  1. HACC
Todos: see Trello
Next steps

Brainstorming

What does Harvard gain from this?
- Making/keeping up with trends - depends on their action now (huge opportunity)
- Image - more open, less secretive, greater community
  - Open data is intrinsically good (hard to argue)
- Does Harvard have internal data management software?
- Better decision-making - “data driven”
- Data.harvard.edu already exists, so might not need to convince them
  - What are HACC’s reasons?
- They probably don’t care about centralization
- Promoting student innovation & publicity
- Foster collaboration between departments
What do we want?
- More opportunities for student innovation & better daily life
- Improve the standing of CS at Harvard
- Promote student products (give them a home)
- Have Harvard upload public data to our site too (or maybe only there)
- Integrate with Harvard systems
- Get control over data.harvard.edu
- Have Harvard publicize apps that students create with this data
- Make it easier for Harvard students to access data
  - e.g. publicize HUDS menus as .csv’s (don’t force us to scrape their website like we do now)

Categorize datasets

Advanced search in sidebar

Like in data.gov, let people search by text, filter by category and datatype, etc.
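One way the combined search could work, sketched under the assumption that each dataset is a record with `title`, `description`, `category`, and `datatype` fields (the field names, the `search` function, and the sample records are all hypothetical):

```python
def search(datasets, text="", category=None, datatype=None):
    """Return the datasets matching every filter that was supplied.

    A filter left empty (or None) is ignored, so clearing one filter
    simply widens the results again.
    """
    text = text.lower().strip()
    results = []
    for d in datasets:
        haystack = f"{d.get('title', '')} {d.get('description', '')}".lower()
        if text and text not in haystack:
            continue
        if category and d.get("category") != category:
            continue
        if datatype and d.get("datatype") != datatype:
            continue
        results.append(d)
    return results


# Made-up sample records for illustration only.
DATASETS = [
    {"title": "HUDS Menus", "description": "Dining hall menus",
     "category": "Student Life", "datatype": "csv"},
    {"title": "Course Catalog", "description": "Courses offered",
     "category": "Academics", "datatype": "json"},
    {"title": "Events Calendar", "description": "Campus events",
     "category": "Student Life", "datatype": "json"},
]

student_life = search(DATASETS, category="Student Life")
narrowed = search(student_life, datatype="json")
print([d["title"] for d in narrowed])
```

Because the function takes any list of records, passing the current results back in narrows them further instead of starting over from the full catalog.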

Add brief "about" section to our homepage

Mission Statement

Here is our current mission statement. Any recommendations for it?

The goal of the Harvard Open Data Project (HODP) is to leverage open data to foster community, efficiency, and student innovation. Making data public and easily accessible allows us all to unlock its potential. Data-driven progress unites people, organizations, and departments as we all try to make daily life better. Aggregating, maintaining, and publicizing open data has been and will continue to be a global trend, and we want Harvard to be at its forefront. Our goal is to give that progress a home with centralization of available data, integration with existing systems, and showcases of data-inspired products.

Public Feedback and Engagement Mechanisms

To solicit general UX feedback or feedback on what datasets should be included in Droid at launch or after (#8, #10), it'd be great to partner with Boston's new library project on open data: http://www.cityofboston.gov/doit/knight.asp

For reference: Pittsburgh has similarly partnered with the University of Pittsburgh on a regional open data catalog: http://ucsur.pitt.edu/programs/urban-regional-analysis/regional-data-center/

Allow search by category

Metadata standard (Schema)

This thread is for a discussion about metadata standards. Would suggest starting with a global standard like DCAT and reducing it to a lightweight set of required fields (e.g. title, description, keywords, point of contact name, point of contact email, URL, license).

Here is the current Data.gov schema: https://project-open-data.cio.gov/v1.1/schema/
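To make the "lightweight few" concrete, here is a minimal sketch of a required-field check. The key names below are one hypothetical spelling of the fields suggested above, not the DCAT or Data.gov property names, and the sample records are made up:

```python
# Hypothetical keys for the suggested required fields.
REQUIRED_FIELDS = (
    "title",
    "description",
    "keywords",
    "contact_name",
    "contact_email",
    "url",
    "license",
)


def missing_fields(record):
    """List the required fields that are absent or blank in a dataset record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


complete = {
    "title": "Example dataset",
    "description": "A made-up record for illustration",
    "keywords": ["example"],
    "contact_name": "Jane Doe",
    "contact_email": "jane@example.org",
    "url": "https://example.org/data.csv",
    "license": "CC0",
}
incomplete = {"title": "Example dataset", "url": "  ", "keywords": []}

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A check like this could run over every row before publishing, so that incomplete entries are flagged rather than silently listed.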
