jkan's Introduction

opendata.scot (JKAN)

A lightweight, backend-free open data portal, powered by Jekyll and based off JKAN by timwis.

Contributing

Project management and issue reporting are handled in our the_od_bods repository.

Getting started

Project-specific instructions to come. In the meantime you may find the original JKAN documentation helpful.

Docker support

You can start this website with Docker by running docker compose up from the command line. Once the server is running, you can access it at http://localhost:4000 or http://127.0.0.1:4000.

jkan's Issues

Add ability to filter datasets by live status

Stemming from OpenDataScotland/the_od_bods#26
Add the ability to filter for datasets with working links (as opposed to dead links or archived items).

Question: do we need this? Should we list datasets with a dead URL at all? If they were dropped during processing, this feature would become redundant.

We already have a function that checks URL status (see analytics.ipynb), but because processing the whole listing takes so long, we restrict it to datasets published by local authorities. So the function exists and could be improved, but there is no live-status attribute on datasets yet.
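
As a rough illustration of how the URL status check could be made fast enough to cover the whole listing, here is a minimal sketch that checks links concurrently. It is not the function in analytics.ipynb; the names classify_status, fetch_status and check_many, the "live"/"dead" labels and the worker count are all assumptions made for this example.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def classify_status(code):
    """Map an HTTP status code (or None for no response) to a coarse label."""
    if code is None:
        return "dead"
    return "live" if 200 <= code < 400 else "dead"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    Uses a HEAD request to avoid downloading the resource. Some servers
    reject HEAD, so a real check may need to fall back to GET.
    """
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout or malformed URL.
        return None


def check_many(urls, fetch=fetch_status, workers=16):
    """Check many URLs in parallel and return {url: "live" | "dead"}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(fetch, urls))
    return {url: classify_status(code) for url, code in zip(urls, codes)}
```

Because the work is dominated by waiting on the network, a thread pool is usually enough here. The resulting label could then be stored as the live-status attribute that the filter would need.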

Related to task #32

Investigate Jekyll build improvements

Is your feature request related to a problem? Please describe.
Jekyll builds currently take about a minute on a good PC. We should see if we can cut this down. Output stats from a build:

Filename Occurrences in build Bytes Time (s)
 analytics_la_categories.html 1     83.74K 20.443
 _layouts/dataset.html 1697  51051.82K 14.421
 _includes/dataset-form.html 1698  39005.19K 11.584
 _includes/form/organization.html 1698  18805.58K 4.669
 _includes/dataset-form-resource.html 6563   8338.45K 3.031
 _layouts/default.html 1910  71050.41K 2.969
 _includes/head.html 1910   7584.38K 1.563
 _includes/header.html 1910   7575.97K 1.219
 data.json 1   2825.11K 1.05
 sitemap.xml 1    291.94K 1.022
 _includes/form/text.html 27044   6079.75K 0.936
 _includes/form/dropdown.html 6758   3398.73K 0.849
 _includes/breadcrumbs.html 1909    477.27K 0.562
 _layouts/organization.html 194   1399.37K 0.562
 _includes/form/category.html 1698   1719.94K 0.528
 _includes/form/license.html 1698   3323.18K 0.437
 _includes/organization-form.html 195    888.75K 0.435
 datasets.json 1   2714.95K 0.305
 organizations.html 1    163.61K 0.235
 _includes/addons/twittercard.html 1910   1088.24K 0.198
 _includes/display/category.html 1697    424.59K 0.178
 _includes/addons/opengraph.html 1910    882.33K 0.177
 _includes/form/org_type.html 195    219.29K 0.068
 _includes/form/textarea.html 1698   1256.97K 0.062
 _includes/display/link.html 1697    495.88K 0.048
 admin.html 1     41.77K 0.038
 analytics_platform_health.html 1    286.88K 0.037
 analytics_la_coverage.html 1      4.04K 0.036
 data/local_authorities.json 1     16.42K 0.036
 organizations.json 1     89.52K 0.021
 _includes/admin-form-category.html 18     21.03K 0.016
 _includes/footer.html 1910   2100.25K 0.013
 add-dataset.html 1     18.76K 0.006
 add-organization.html 1      4.82K 0.002
 _includes/admin-form-license.html 14     10.05K 0.002
 index.html 1      6.29K 0.002
 _includes/form/checkbox.html 18      3.01K 0.001
 analytics_portal_types.html 1      2.20K 0
 about_opendata.html 1      4.14K 0
 analytics_file_types.html 1      2.53K 0
 analytics_licensing.html 1      2.37K 0
 about_siteanalytics.html 1      0.89K 0
 resources.html 1     12.76K 0
 jekyll-redirect-from-0.16.0/lib/jekyll-redirect-from/redirect.html 3      1.70K 0
 datasets.html 1      0.96K 0
 soocon23.html 1      1.72K 0
 suggest_dataset.html 1      1.74K 0
 about_project.html 1      4.41K 0
 analytics.html 1      4.85K 0

Done in 65.089 seconds.
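
To decide where to start, it may help to rank the rows above by total time and by time per render. The sketch below parses rows in the format shown above; the function names are made up for this example, and it assumes filenames contain no spaces and sizes always end in K, as they do in this table.

```python
import re

# One profile row: filename, render count, size such as "83.74K", seconds.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<count>\d+)\s+(?P<size>[\d.]+)K\s+(?P<time>[\d.]+)\s*$"
)

# A few rows copied from the build output above.
SAMPLE = """
 analytics_la_categories.html 1     83.74K 20.443
 _layouts/dataset.html 1697  51051.82K 14.421
 _includes/form/text.html 27044   6079.75K 0.936
 analytics_portal_types.html 1      2.20K 0
"""


def parse_profile(text):
    """Turn profile rows into dicts, skipping headers and other lines."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "name": match["name"],
                "count": int(match["count"]),
                "kilobytes": float(match["size"]),
                "seconds": float(match["time"]),
            })
    return rows


def slowest(rows, limit=3):
    """Return the rows with the largest total render time."""
    return sorted(rows, key=lambda row: row["seconds"], reverse=True)[:limit]


def seconds_per_render(row):
    """Average time spent on each render of this file."""
    return row["seconds"] / row["count"]


if __name__ == "__main__":
    for row in slowest(parse_profile(SAMPLE)):
        print(f"{row['name']}: {row['seconds']}s total, "
              f"{seconds_per_render(row):.4f}s per render")
```

Read this way, the table suggests two different kinds of cost: analytics_la_categories.html is rendered once but takes about 20 seconds, while the dataset layout and form includes are cheap per render but run roughly 1,700 times each.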

Describe the solution you'd like
Investigate if build can be optimised with any of the techniques in this article: https://forestry.io/blog/how-i-reduced-my-jekyll-build-time-by-61/

Describe alternatives you've considered
N/A

Additional context
N/A

Use nicer visuals in Analytics

Potentially replace static chart images with D3.js?

Pages/charts to do:

  • # of datasets by org
  • # of data files by org
  • Scorecard
  • Data portal sources (e.g. CKAN, ArcGIS, manual etc.)
  • File types/file types by org/average file types per dataset
  • File size distribution/average data files per dataset per org
  • Top 10 most popular tags/Top 10 most shared tags (after data cleaning)
  • Licence distribution (after data cleaning)

Change repo name to opendata.scot

Is your feature request related to a problem? Please describe.
When navigating our repos, it's not always clear that the JKAN repo is the site repo for opendata.scot

Describe the solution you'd like
For clarity we should change this to opendata.scot

Describe alternatives you've considered
N/A

Additional context
Some things to consider when we make the change:

  • Make sure we keep the attribution to the original JKAN project: we wouldn't be here without them after all!
  • Consider impacts on the pipeline scripts in the_od_bods and opendata.scot_pipeline repositories as they may use hardcoded references to this repo name. Consider changing the repo name reference to be stored in a configuration file or as an environment variable.
  • Update all documentation to remove references to our JKAN repo and repoint to the opendata.scot repo
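
The suggestion about moving the repo name out of the code could look something like the sketch below. The environment variable name SITE_REPO and the default value are assumptions for illustration only; the pipeline scripts may well be structured differently.

```python
import os

# Assumed default: the current repo name under the organisation's account.
DEFAULT_SITE_REPO = "OpenDataScotland/jkan"


def site_repo(environ=None):
    """Return the "owner/name" of the site repo, preferring the environment.

    SITE_REPO is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SITE_REPO", "").strip()
    return value or DEFAULT_SITE_REPO


def site_repo_url(environ=None):
    """Build the GitHub URL for the configured site repo."""
    return f"https://github.com/{site_repo(environ)}"
```

With something like this in place, renaming the repo would only require changing one variable in the pipeline's environment rather than editing every script.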

Allow for No organisation

We have a great deal of data in Wikidata (tens of thousands of items, at a guess) which is now in the public domain under CC0.

Allow a dataset to be added either under no organisation (a placeholder org of 'None') or under a Wikidata organisation to which the datasets can be attributed (with SPARQL endpoints for the queries that retrieve the data in question).

New Google Form: Submit portal/site/org for scraping

We should probably close OpenDataScotland/the_od_bods#100 and create an issue template as part of OpenDataScotland/the_od_bods#103 before we move this forward

In the interests of making it easier for people/orgs to contribute suggestions for datasets we could scrape, let's create a Google Form instead of forcing them to sign up for GitHub and open a new issue, which they may not be familiar with doing.

    graph TD
    A("👨‍💻 User identifies website to be scraped") --> B("👨‍💻 User submits Google Form");
    B --> C("🤖 GitHub Action picks up new submission")
    C --> D("🤖 GitHub Action opens new issue using template")
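
The last step of the flow above, opening an issue from a submission, could be sketched as follows. It builds a request for GitHub's REST endpoint for creating issues. The submission field names (site_name, site_url, notes) and the title format are hypothetical, since the form does not exist yet, and how the Action receives the submission is left out.

```python
import json
from urllib.request import Request


def build_issue_request(submission, repo, token):
    """Build (but do not send) a request that opens an issue for a submission.

    submission: dict with hypothetical keys site_name, site_url and notes.
    repo: "owner/name" of the repository that should receive the issue.
    token: a GitHub token with permission to create issues in that repo.
    """
    title = f"Scrape request: {submission['site_name']}"
    body = "\n".join([
        f"Site: {submission['site_url']}",
        "",
        submission.get("notes") or "No additional notes provided.",
    ])
    payload = {"title": title, "body": body}
    return Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the payload easy to inspect before anything is posted to the repository.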

Consider revising title metadata on dataset page

Is your feature request related to a problem? Please describe.
Our dataset pages don't include the name of the organisation in the title which could lead to confusion in search results on search engines like Google (see screenshots below) if the user was just reading the page title.

Describe the solution you'd like
Consider adding organisation to the title metadata tags so that they can be included in search results

Describe alternatives you've considered
None

Additional context
image
image
image

Portal site-specific styling being passed through to JKAN site

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Our dataset pipelines take raw HTML from the descriptions of some datasets, which means the descriptions can be littered with tags that interfere with styling when rendered on opendata.scot (e.g. <h1>, or any tag with a style property). This can produce unexpected results, such as large text from header tags.

Expected behavior
Some of these styles or tags could be simplified (e.g. we could convert all header tags to just be bold and underlined)

Screenshots
image
Example from https://opendata.scot/datasets/dundee+city+council-housing+available+now/

Hardware and software used
N/A

Additional context
Whilst unlikely to happen, I have concerns that this could leave us vulnerable to XSS (cross-site scripting) attacks if we ended up loading JavaScript <script> tags in the description of datasets we pull from other websites. See this relevant article where someone registered an XSS attack payload as a company name on Companies House which had the knock on effect of XSSing websites that consumed data from the Companies House API: https://www.theregister.com/2020/10/30/companies_house_xss_silliness/
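
One way to address both the styling problem and the XSS concern would be to pass descriptions through an allowlist before they are written out. The sketch below uses only the Python standard library and is an illustration, not a vetted sanitiser: the allowlist, the choice to turn headings into bold paragraphs (without the underline suggested above), and all names are assumptions. A maintained sanitising library would be a safer choice for production use.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "a", "b", "strong", "em", "i"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class DescriptionSanitiser(HTMLParser):
    """Rebuild a description using only allowlisted tags and no attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            # Headings become bold paragraphs so they cannot dominate the page.
            self.parts.append("<p><strong>")
        elif tag == "a":
            # Keep only the href, and only for ordinary link schemes.
            href = dict(attrs).get("href") or ""
            if href.lower().startswith(SAFE_SCHEMES):
                self.parts.append(f'<a href="{escape(href, quote=True)}">')
            else:
                self.parts.append("<a>")
        elif tag in ALLOWED_TAGS:
            # Attributes such as style and onclick are dropped entirely.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.parts.append("</strong></p>")
        elif tag in ALLOWED_TAGS and tag != "br":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data, quote=False))


def sanitise_description(raw_html):
    """Return raw_html reduced to the allowlisted tags, with text escaped."""
    parser = DescriptionSanitiser()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.parts)
```

Tags outside the allowlist are dropped but their text is kept, whereas script and style blocks are removed together with their contents.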

Refresh "Suggest a dataset" page on google form submission

When the Google Form is submitted, the table of submitted requests on the right doesn't update because the page doesn't refresh, so it feels as though the submission hasn't been successful.

Ideally, the page or table should refresh on form submission.

Add new org records for leisure and culture trusts

Is your feature request related to a problem? Please describe.
Some of the datasets being surfaced via local authority open data portals belong to their leisure and culture providers that are often separate organizational entities (e.g. Sport Aberdeen, ANGUSalive, Leisure and Culture Dundee). We need to record these org records in JKAN so the data links up correctly on opendata.scot

Describe the solution you'd like
Create org records in JKAN for each leisure and culture organization

Describe alternatives you've considered
N/A

Additional context
Some research will need to be done to get a list of leisure and culture providers for each local authority area to add them all.
We may need to consider adding a new organization type to JKAN as well as it may not fall under any of the existing ones.

Update references to old projects board

Describe the bug
Moving from the old projects board on the the_od_bods repo to the new 2023 board means references and links to the old board need updating

To Reproduce
N/A

Expected behavior
Locations
[x] opendata.scot footer
[x] https://opendata.scot/about/
[x] contributing docs
[x] old wiki (retired)
[x] org repo public README
[x] org repo private readme

Screenshots
N/A

Hardware and software used
N/A

Additional context
N/A

Handling FOI exempt organizations

Is your feature request related to a problem? Please describe.
We have a health report for displaying which orgs don't have a What Do They Know page attached to their record. The problem is, some orgs may never have a page as they are exempt (e.g. due to being a public body). See OpenDataScotland/the_od_bods#71 for discussion

Describe the solution you'd like
A way of marking orgs that are FOI exempt so this shows up in the health report. Should we consider also displaying this on the org's page? E.g. "this org is FOI exempt"

Describe alternatives you've considered
Continuing on as normal - would show as missing data on health report

Additional context
N/A

Request for Place Name Gazetteer data fails

Describe the bug

When I go to the webpage for the Place Name Gazetteer dataset, I find a link to the dataset under the 'Resources' heading, marked as media type 'WFS'. I presume this is Web Feature Service, a protocol for carrying out operations on geographic data over the Web. When I click on this link, I get an XML file back describing an error: 'Could not determine geoserver request from http request org.geoserver.monitor.MonitorServletRequest@2c5b19d'.

Here's the full error:

<ows:ExceptionReport version="2.0.0" xsi:schemaLocation="http://www.opengis.net/ows/1.1 https://geo.spatialhub.scot/geoserver/schemas/ows/1.1.0/owsAll.xsd">
    <ows:Exception exceptionCode="MissingParameterValue" locator="request">
        <ows:ExceptionText>
            Could not determine geoserver request from http request org.geoserver.monitor.MonitorServletRequest@2c5b19d
        </ows:ExceptionText>
    </ows:Exception>
</ows:ExceptionReport>
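
The exception code MissingParameterValue with locator "request" suggests that the link points at the bare WFS endpoint without a request query parameter, so the server cannot tell which operation is wanted. A possible fix on the listing side would be to append the standard WFS key-value parameters to such links. The sketch below shows the idea; the function name and the example.org endpoint are placeholders, not the real resource URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_wfs_request(url, request="GetCapabilities", version="2.0.0"):
    """Add service, version and request parameters to a WFS URL if missing.

    WFS parameter names are case-insensitive, so existing parameters are
    compared in lower case and never overwritten.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    existing = {key.lower() for key in params}
    for key, value in (("service", "WFS"), ("version", version), ("request", request)):
        if key not in existing:
            params[key] = value
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Placeholder endpoint; substitute the link shown under 'Resources'.
    print(with_wfs_request("https://example.org/geoserver/wfs"))
```

A GetCapabilities request returns a description of the service and its feature types rather than the data itself, so it may be more useful as a landing link than as a download link.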

To Reproduce
Steps to reproduce the behavior:

  1. Go to the webpage for the Place Name Gazetteer dataset
  2. Click on the link to the Place Name Gazetteer data

Expected behavior

Data is returned

Actual behaviour

Error is returned as detailed above

Hardware and software used

  • Device: Laptop
  • OS: Windows 10
  • Browser: Firefox 123.0.1

Set up Selenium tests for opendata.scot

Is your feature request related to a problem? Please describe.
Adding some browser simulation tests that can be run to make sure that the website is functioning as normal. To be run either after a new site publish (usually on a weekly basis) or for testing when adding new features.

Describe the solution you'd like
Browser tests using Selenium to check various aspects of the website:

  • Datasets page loads and has datasets
  • Orgs page loads and has datasets
  • Open a dataset page
  • Open an org page

Selenium tests can be written using Python
https://vknight.org/unpeudemath/code/2014/12/23/using_python_and_selenium_for_a_jekyll_site.html
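
A minimal sketch of what the four checks above might look like with Selenium's Python bindings is given below. It assumes Firefox with geckodriver is available, that listing pages live at /datasets/ and /organizations/, and that individual items are linked from paths beneath those pages. All of these are assumptions to verify against the real site.

```python
from urllib.parse import urljoin

# Assumed listing paths; adjust to match the real site structure.
LISTING_PAGES = {"datasets": "/datasets/", "organizations": "/organizations/"}


def page_url(base_url, path):
    """Join a base URL and a site-relative path."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def item_links(hrefs, listing_url):
    """Keep links that point below the listing page, not at the page itself."""
    prefix = listing_url.rstrip("/") + "/"
    return [h for h in hrefs if h and h.startswith(prefix) and h != prefix]


def run_smoke_tests(base_url="http://localhost:4000", timeout=10):
    """Load each listing page, wait for item links, then open the first item.

    Returns a list of failure messages; an empty list means all checks passed.
    """
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    failures = []
    try:
        for name, path in LISTING_PAGES.items():
            listing = page_url(base_url, path)
            driver.get(listing)
            wait = WebDriverWait(
                driver, timeout, ignored_exceptions=(StaleElementReferenceException,)
            )
            try:
                # Poll until at least one item link exists, in case the list
                # is filled in by JavaScript after the page loads.
                links = wait.until(lambda d: item_links(
                    [a.get_attribute("href") for a in d.find_elements(By.TAG_NAME, "a")],
                    listing,
                ))
            except TimeoutException:
                failures.append(f"{name}: no item links found on {listing}")
                continue
            driver.get(links[0])
            if not driver.title:
                failures.append(f"{name}: {links[0]} loaded without a page title")
    finally:
        driver.quit()
    return failures
```

The title check is deliberately weak, since an error page would also have a title. A real suite would likely assert on page-specific content and be wrapped in a test runner so it can run after each weekly publish.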

Describe alternatives you've considered
N/A

Additional context
N/A

Create organisation scorecard

Is your feature request related to a problem? Please describe.
For milestone 23Q1. Create a dynamic scorecard, very similar to the original one done in 2021 (bottom of the notebook) which will allow users to see which organisations are doing best in providing open data.

Describe the solution you'd like

  • A page in the frontend, under Analytics
  • Sources the latest data in JKAN
  • Preferably: user selection to include/exclude organisations to compare

Describe alternatives you've considered
It could be a static table that is generated every week on refresh, but that seems rather archaic.

Additional context
Build of this is 2 parts:

  1. The scoring mechanisms
  2. The display in frontend
