cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal

Home Page: http://opendata.cern.ch/

License: GNU General Public License v2.0

Python 43.55% CSS 2.76% JavaScript 13.26% Shell 3.01% HTML 29.99% Dockerfile 1.43% SCSS 6.00%
open-data open-science research-data research-data-repository research-data-management flask json-schema python invenio inveniosoftware

opendata.cern.ch's Introduction

CERN Open Data portal


About

This is the source code behind the CERN Open Data portal. You can access the portal at http://opendata.cern.ch/. The source code is built on the Invenio digital repository framework.

Developing

If you'd like to install a demo site locally for development, please see the developing guide for more information.

Contributing

Bug reports, feature requests and code contributions are encouraged and welcome! Please see the contributing guide for more information.

Support

You can ask questions at our Forum or get in touch via our Chatroom.

Authors

The alphabetical list of all contributors is available in the AUTHORS file.

License

GNU General Public License


opendata.cern.ch's Issues

Feedback on current design [Aug 15th]

  • no Facebook/Twitter/YouTube icons and no language settings: get rid of the uppermost dark line
  • a new logo will be proposed by Laura Rueda
  • collection tabs without radio buttons - do we need them anyway? [we will ask the experiments]
  • the detailed record view should get a "frame" leaving more space to the left. Fonts should be adjusted (e.g. the title is not prominent enough)
  • detailed record: don't show unused tabs; usage stats are of interest (but most likely closed access)

thoughts:

  • better navigation for researchers vs. citizen scientists
  • policies will be uploaded as individual records, but also put onto a "learn about" page
  • we need some design elements and experiment specific materials to show alongside data records
  • navigation elements to move from "big data" elements to "small data" elements

Meeting August 28th

  • metadata ingest will happen manually for the first 14 high-level datasets; for the mid-term future we will enable automated ingestion from a controlled list of sources
  • CMS is preparing a "guided tour"/how-to document which will accompany every dataset and analysis. This document will be the same for all primary data sets (this may change later), but it will be different for derived data sets (e.g. the instructions for the "pattuples" derived from Ana's analysis will point to the code and the instructions for how to run it). However, the structure will be the same:
  • selection
  • validation
  • how to reuse
  • limitations

These texts are being prepared by CMS, with the support of Patricia. They should be linked (initially) on the right-hand side of the individual records with a dedicated box. Patricia will investigate whether parts of this information can be referenced in the metadata to enable a tailored, dataset-specific display. This additional documentation will sit, however, on an additional page and should be exportable as a PDF. It should be a record by itself and get a DOI, incl. a citation recommendation (Action on Patricia to prepare that).

  • all of the datasets get a disclaimer, to be provided by Kati, concerning quality assurance. Location on the record page to be decided, possibly at the bottom of the page
  • there will be a set of restricted files with trigger/selection details, not visible to external users (Kati, please correct the details here!)
  • there must be an export functionality for the 14 high-level file names enabling easy integration into the config files - this needs to include the ROOT file name
  • a virtual image will be stored on the platform: it will become a standalone record with a DOI
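The file-name export mentioned above could, for instance, emit a block ready to paste into a CMSSW-style configuration. A minimal sketch, assuming the real export format is still to be decided; the function name and output shape are illustrative only:

```python
def to_vstring_block(file_names):
    """Render ROOT file names as a CMSSW-style vstring block.

    Sketch only: the actual export format for the 14 high-level
    datasets has not been fixed yet.
    """
    body = ",\n    ".join('"%s"' % name for name in file_names)
    return "fileNames = cms.untracked.vstring(\n    %s\n)" % body


# Example with placeholder file names:
print(to_vstring_block(["FILE1.root", "FILE2.root"]))
```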

Ana's analysis

  • is derived from two high-level (primary) datasets [the same is the case for Tom's examples]
  • is available on github:
    a) exercise itself https://github.com/ayrodrig/OutreachExercise2010
    b) the pattuples production https://github.com/ayrodrig/pattuples2010
  • Ana's code should become a record by itself, too - also with a DOI [following Zenodo's Github integration]
  • also these records will have their own "how to" in the box on the right [see 1-4 above]
  • there should be enough metadata to create such a record: the author lists are the same

Overall tasks and next steps

  • set up Laura's design
  • set up HTML-editing pages for additional info on GitHub
  • prepare a separate menu for additional information so that we can prepare some nice additional documentation there
  • prepare the additional boxes on the right of a detailed record page
  • check export functionalities (see comment on titles above)
  • meeting beginning of next week for documentation sprint (with Achintya and Patricia)
  • meeting beginning of next week with Pamfilos for design sprint

UX/UI testing tasks

  • navigation on the portal
  • navigation from primary and reduced data
  • one task: can you reproduce the analysis? [is the user able to find all the related information, data, code, "how-to" for the particular analysis?]

Metadata related tasks

  • compile metadata for software
  • compile metadata for the virtual image
  • populate the records for the 14 primary datasets
  • integrate Ana's analysis

pages: "For Research"

Prepare content pages for the "research" carousel links, e.g. how to download the VM, how to access datasets via XRootD.

testsuite: addition of "small data" samples

In addition to big data sets, smaller (JSON) files that will be used for event display and histogramming should be added to the test suite so that we could further develop the portal UI and the data visualisation layer.

Dataset metadata

To assign a DOI, the following metadata are required:

  • Title (ideally a human readable one)
  • Creators (will be the collaboration and authors from author XML)
  • Date (year is enough, either the date the data was published on CMS pages or the date the data moved to the portal; we could also have both, depending on preference)
  • Publisher (should that be the collaboration or more abstract the "CERN Open Data Portal"?)

Marked in bold are the items where CMS has to decide which data to use / what to put there.

It's always good to have more metadata, especially

  • Description (human readable information about the dataset)
  • Technical details (how many files, file sizes etc.)

As a MARC record, it will look like this:
0247_$$a10.1234/whatevernaming $$2DOI
245__$$aHUMAN READABLE TITLE
256__$$aNr. of Files, Filesize in total
260__$$bPUBLISHER$$cDATE
[269__$$cPREPRINTDATE; might be CMS page publication date in case 260$$c will be used for the portal]
520__$$aHuman Readable Description
540__$$aCC-0 license
700__$$aAUTHOR [filled from author XML file]
710__$$gCMS collaboration
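The record above could be assembled programmatically before export. A minimal sketch using the same placeholder values; the tuple layout and the `render_marc` helper are assumptions for illustration, not an Invenio API:

```python
# Fields as (tag-with-indicators, [(subfield code, value), ...]) tuples,
# using the same placeholder values as the textual record above.
FIELDS = [
    ("0247_", [("a", "10.1234/whatevernaming"), ("2", "DOI")]),
    ("245__", [("a", "HUMAN READABLE TITLE")]),
    ("260__", [("b", "PUBLISHER"), ("c", "DATE")]),
    ("520__", [("a", "Human Readable Description")]),
    ("540__", [("a", "CC-0 license")]),
    ("710__", [("g", "CMS collaboration")]),
]


def render_marc(fields):
    """Render fields in the textual form used above, e.g. 245__$$aTITLE."""
    lines = []
    for tag, subfields in fields:
        lines.append(tag + "".join("$$%s%s" % (code, value)
                                   for code, value in subfields))
    return "\n".join(lines)
```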

Usability test tasks

Visualise an event from the Muon primary data set and turn the view to the x-y plane.
Do you observe the curvature of the tracks?

Compare, in the event display, some events from the Mu primary data set to those in the Minimum Bias primary data set. What differences do you observe?

With the online histogrammer, plot a *** histogram from the di-muon reduced data set and make selections on **?

You liked the histogramming application and would like to use it for other purposes.
Can you find the source code, and do you understand what you would need to do to get started using it for your own application?

You liked the histogramming application and would like to use it to plot other physics objects
(e.g. events including two jets).
Can you find the source code for producing the reduced data for histogramming, and do you
understand what you would need to change in the source code to read other primary data sets
and select other physics objects?

Open a primary data set file with ROOT and find the collections for the different physics objects (muons, electrons, photons, jets).

Run the analysis example on a small number of events.

pages: "For Education"

Prepare content pages for the "education" carousel links, e.g. the event display, the histogram application, and "learn more" about HEP.

Activate the file download

Activate the file download for derived and primary data sets.
We could already foresee a warning text for the primary data sets saying that
the data sets are of TB size and the download takes time accordingly, and point
the users to the VM image.

Area for information material with easy editing

Follow-up from #41: we need an area where we can easily deposit/edit the information material (i.e. the instructions, the VM test report, the validation statement). A possibility for easy HTML editing within GitHub was mentioned in the meeting of Aug 28.

Data file listings under Research and Education

Can we have the primary data sets appear under Research and all others (derived data sets) under Education? The names of the files for the event display should reflect their content; at the moment they have the same name as the primary data set.

introduce search option

Amend "SEE ALL" to list all records, by simply pointing to /search.
Amend UI prototype to have search box visible.

Feedback mechanism for production use

We need a feedback page (unless something else is preferred) for the testing. Can it
be a button on every page which automatically picks up the page URL, and maybe proposes categories such as

  • problems with display or layout
  • problems with instructions
  • does not provide the intended functionality
  • problems with navigation
  • purpose of the page unclear
    ...
  • free text field
  • eventually a possibility for attaching a screenshot
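A feedback submission along these lines could be sketched as follows. This is a minimal sketch only: the field names and category keys are assumptions, not a settled schema:

```python
# Hypothetical category keys mirroring the list above.
CATEGORIES = {"display", "instructions", "functionality",
              "navigation", "purpose", "other"}


def make_feedback(page_url, category, text, screenshot=None):
    """Assemble one feedback submission as a dict.

    Sketch only: field names and the category list are assumptions;
    the page URL would be picked up automatically by the button.
    """
    if category not in CATEGORIES:
        category = "other"
    entry = {"url": page_url, "category": category, "text": text}
    if screenshot is not None:
        entry["screenshot"] = screenshot
    return entry
```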

search: collection facets

Introduce facets by collections after #10 and #26 are completed. For example, a search in the CMS collection would distinguish "CMS Primary Dataset" records from "CMS Reduced Dataset" records.

collection: style `/collection` pages

Now that we have some collections, see /collection/CMS or /collection/ALICE, it would be good to style the collection pages (fonts, margins, logos, remove add-to-favourites, etc.).

pages: "Latest News"

Prepare content pages for "news" carousel links, e.g. CMS open data release statement draft.

Create GitHub Organization for all ODP-related repos

As discussed at meeting of 2014-09-02T15:00+02 (@katilp, @pherterich, @pamfilos, @RaoOfPhysics present):

Request to have a GitHub Organization, which will act as a single point-of-entry for all repositories related to the CERN Open Data Portal, including:

Note that there already exists an Organization called cms-outreach where the iSpy codebase is stored.

Create GitHub Organization. Either @tiborsimko or @TimSmithCH to be owners?

dataset descriptions

The AOD datasets have names (e.g. /Mu/April21_ReReco) which need to be translated into something more meaningful to the public.

Create Tools collection on the portal

This would contain the code to use on data and subcategories could be

dataset information

If the datasets are divided into skims then information on the contents should be provided (e.g. if a muon skim then something like "this dataset was created by selecting events that contained at least one muon that passed this trigger condition which was...").

List of pages needed for additional material (continued from #41)

VM (the element in the portal is the VM image - or a link to it)
Instructions
Validation report
Known problems

Analysis example final step (the element in the portal is the code in
https://github.com/ayrodrig/OutreachExercise2010)
Input data (the files to be uploaded from Spain)
Instructions
Validation
Limitations

Analysis example - intermediate file production (the element in the portal is the code in
https://github.com/ayrodrig/pattuples2010)
Input data
Instructions
Validation
Limitations

Intermediate analysis files (the files to be uploaded from Spain, see above)
The usual metadata fields
Instructions (maybe the same as the ones above for the code)
Validation
Limitations

For the primary data sets, we could get started with the current template,
but how do you want it organized?
For example, as we think now, the text for "How the data were selected"
would be different for each sample.
The text for "Validation" is the same for all primary data sets now, but in the future
it may become different.
For "How to reuse", we would like to have a page called "Getting started",
which is a single set of instructions for all primary data sets (to start with).
Note also the remark from Tim in #41 that all data records should have the copyright statement and licence for reuse clearly marked, and the remark from Sünje that the official label for CC0, which is the one being used here (so far), is available at http://creativecommons.org/about/downloads

Can these be templated onto all elements in the Limitation/Disclaimer section?

UI: site not working well for MSIE

The site currently does not work well for MSIE users. I'm testing with MSIE 9.0 via CERN WTS:

$ alias wts="rdesktop -d cern.ch -g 1024x768 -a 16 -k en-us -T TS cernts.cern.ch"

We could:

  1. improve the layout so that it would work with MSIE; this may be worth it if the event display and the reduced data set JS visualisation work well with MSIE. @tpmccauley, have you checked?

  2. detect the usage of MSIE and say something like:

      It seems you are using MS Internet Explorer 9 for which this site
      has not been optimised. Please consider using Firefox, Chromium, 
      or Safari instead.
    

    in a gentle way.

Due to the shortage of time, let's start with option 2, and eventually implement option 1 when time permits.
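Option 2 could start from a rough User-Agent check before rendering the warning banner. A minimal sketch; the version cut-off and how the header is obtained (e.g. from the Flask request) are assumptions:

```python
def is_unsupported_msie(user_agent):
    """Return True for MSIE versions 6-9, which the site is not
    optimised for, so that a gentle warning banner can be shown.

    Sketch only: the cut-off at MSIE 9 and the substring matching
    are assumptions, not a tested browser-detection scheme.
    """
    ua = user_agent or ""
    return any("MSIE %s" % version in ua
               for version in ("6.0", "7.0", "8.0", "9.0"))
```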

testimonials: update user quotes

For demo purposes, we have added some initial young user testimonial quotes taken from past IPPOG masterclasses. It would be good to update them.

search: pressing `Enter` invokes Add-to-search rather than Search

On the search page (/search), typing cms and pressing Enter does not invoke the Search action; the "Add-to-search" action is executed instead. The default should be the former, not the latter.

Note that this was probably already fixed in latest "next", so it may be sufficient to upgrade Invenio.

CMS: setup collection to hold VM images

  • Set up new collection to hold VM images. The collection name can be: "CMS VM Images".
  • Add example record representing a VM image. Either take copy or at least link to places like http://cernvm.cern.ch/releases/CMS%20OpenData%20Latest.ova.

files: ROOT vs BIN file types

ROOT demo files (see #25) were uploaded on the SLC6 box as BIN-type ones. The "automagic" recognition of file content vs file extension needs to be checked and amended.
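One way to check content rather than extension is to look at the magic bytes: ROOT files start with the ASCII signature "root". A minimal sketch of such a check (the function names are illustrative, not the portal's actual upload hook):

```python
def looks_like_root_content(first_bytes):
    """Return True if the leading bytes carry the ROOT file
    signature: ROOT files start with the ASCII bytes b"root"."""
    return first_bytes[:4] == b"root"


def looks_like_root_file(path):
    """Sketch of a content-based check for deciding ROOT vs BIN
    at upload time, instead of trusting the file extension."""
    with open(path, "rb") as fh:
        return looks_like_root_content(fh.read(4))
```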

home page menu: introduce links to experiments

In the home page menu panel next to LATEST, the name of experiments should be links pointing to experiments' records, e.g. CMS should point to /search?cc=CMS. (Others don't have any demo data yet.)

Also, Alice should be spelled ALICE.

Also, ATLAS is missing.

installation: add `invenio-previewer-ispy`

  • add -e git+https://github.com/inveniosoftware/invenio-previewer-ispy.git to requirements.txt
  • add invenio-previewer-ispy to install_requires in setup.py
  • add 'invenio_previewer_ispy' to invenio_opendata.config:PACKAGES

prepare basic record formatting

After #1 is completed, the basic record format templates (both in brief and detailed outputs) should be adapted in order to match chosen metadata and the site style.

fixtures: collection setup

The collection setup should be amended to distinguish (i) big datasets from (ii) small samples, and the corresponding fixtures should be committed.
