ckanext-data-qld

A custom CKAN extension for Data.Qld

Local environment setup

  • Make sure that you have the latest versions of all required software installed.
  • Make sure that all local web development services are shut down (Apache/Nginx, MySQL, MAMP etc.).
  • Check out the project repository (in one of the supported Docker directories).
  • pygmy up
  • ahoy build
  • You may need to use sudo on Linux.

Building on Ubuntu (optional: behind proxy)

  • composer from compose
    • sudo pip install docker-compose
  • sudo apt-get install composer
  • Ensure /etc/gemrc has the following:

     http_proxy: http://localhost:3128
     https_proxy: http://localhost:3128

  • If a Squid proxy is in use on your machine, ensure that its configuration contains:

     acl localnet src 172.17.0.0/16 # allows your public ip for loopback

  • Set the Docker client proxy in ~/.docker/config.json (see https://docs.docker.com/network/proxy/):

     {
       "proxies": {
         "default": {
           "httpProxy": "http://hostexternalip:3128",
           "httpsProxy": "http://hostexternalip:3128",
           "noProxy": ""
         }
       }
     }

    • Set the Docker daemon proxy via systemd (see https://docs.docker.com/config/daemon/systemd/):

       sudo mkdir -p /etc/systemd/system/docker.service.d

       sudo vi /etc/systemd/system/docker.service.d/http-proxy.conf
           [Service]
           Environment="HTTP_PROXY=http://localhost:3128/"

       sudo vi /etc/systemd/system/docker.service.d/https-proxy.conf
           [Service]
           Environment="HTTPS_PROXY=http://localhost:3128/"

       sudo systemctl daemon-reload
       sudo systemctl restart docker

Use admin/password to login to CKAN.

Available ahoy commands

Run each command as ahoy <command>.

 build        Build or rebuild project.
 clean        Remove containers and all build files.
 cli          Start a shell inside CLI container or run a command.
 doctor       Find problems with current project setup.
 down         Stop Docker containers and remove container, images, volumes and networks.
 flush-redis  Flush Redis cache.
 info         Print information about this project.
 install-site Install a site.
 lint         Lint code.
 logs         Show Docker logs.
 pull         Pull latest docker images.
 reset        Reset environment: remove containers, all build, manually created and Drupal-Dev files.
 restart      Restart all stopped and running Docker containers.
 start        Start existing Docker containers.
 stop         Stop running Docker containers.
 test-bdd     Run BDD tests.
 test-unit    Run unit tests.
 up           Build and start Docker containers.

Coding standards

Python code linting uses flake8, with its configuration captured in the .flake8 file.

Set ALLOW_LINT_FAIL=1 in .env to allow lint failures.

Nose tests

ahoy test-unit

Set ALLOW_UNIT_FAIL=1 in .env to allow unit test failures.

Behavioral tests

ahoy test-bdd

Set ALLOW_BDD_FAIL=1 in .env to allow BDD test failures.

How it works

We use the Behave BDD framework with additional step definitions provided by the Behaving library.

Custom steps are defined in test/features/steps/steps.py.

Test scenarios are located in test/features/*.feature files.

The test environment configuration is located in test/features/environment.py and is set up to connect to a remote Chrome instance running in a separate Docker container.

During a test run, Behaving passes connection information to Splinter, which instantiates a WebDriver object and establishes a connection with the Chrome instance. All further communication with Chrome is handled through this driver, but in a developer-friendly way.

For a list of supported step-definitions, see https://github.com/ggozad/behaving#behavingweb-supported-matcherssteps.

Automated builds (Continuous Integration)

In software engineering, continuous integration (CI) is the practice of merging all developer working copies to a shared mainline several times a day. Before feature changes can be merged into the shared mainline, a complete build must run and pass all tests on the CI server.

This project uses GitHub Actions as its CI server: it imports production backups into a fully built codebase and runs code linting and tests. When the tests pass, a deployment process is triggered for nominated branches (usually master and develop).

Installation

  1. Clone this repository

  2. Activate the Python virtualenv and install the extension:

     . /usr/lib/ckan/default/bin/activate
     cd /usr/lib/ckan/default/src/ckanext-data-qld
     python setup.py develop
    
  3. Add the extension to the relevant CKAN .ini file plugins definition:

     ckan.plugins = ... data_qld
    

data_qld_google_analytics

A custom CKAN extension for Data.Qld for sending API requests to Google Analytics

Setup

  1. Add the extension to the relevant CKAN .ini file plugins definition:

     ckan.plugins = ... data_qld_google_analytics
    
  2. Add the config settings to relevant CKAN .ini file

     # ckanext-data_qld_googleanalytics
     ckan.data_qld_googleanalytics.id = UA-1010101-1 # Relevant Google analytics ID
     ckan.data_qld_googleanalytics.collection_url = http://www.google-analytics.com/collect
    
     # change when reporting starts from, default is 2022-11-01
     ckanext.data_qld.reporting.de_identified_no_schema.count_from = 2045-01-01
    
  3. The file capture_api_actions.json is a dictionary of API actions to capture and send to Google Analytics:

     a. The dictionary key is the name of the api_action from https://docs.ckan.org/en/2.8/api/index.html#action-api-reference
     b. The dictionary value is the event_label sent to Google Analytics, with {0} replaced by the query parameter value, e.g. package_id, resource_id, query values, SQL query

  4. Restart web server(s), e.g.

     sudo service apache reload
     sudo service nginx reload
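
The capture_api_actions.json mapping described in step 3 can be illustrated with a minimal sketch. The dictionary entries, the helper name build_ga_payload, the event category and the client ID are all hypothetical and are not taken from the extension's code; v, tid, cid, t, ec, ea and el are the standard Universal Analytics Measurement Protocol parameter names. The sketch only builds the request body and does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical excerpt of capture_api_actions.json:
# API action name -> event label template, where {0} is the query parameter value.
CAPTURE_API_ACTIONS = {
    "package_show": "Package show: {0}",
    "datastore_search_sql": "SQL query: {0}",
}


def build_ga_payload(api_action, param_value, tracking_id, client_id):
    """Return a URL-encoded event payload, or None if the action is not captured."""
    label_template = CAPTURE_API_ACTIONS.get(api_action)
    if label_template is None:
        return None
    fields = {
        "v": "1",                  # Measurement Protocol version
        "tid": tracking_id,        # e.g. the ckan.data_qld_googleanalytics.id value
        "cid": client_id,          # assumed anonymous client identifier
        "t": "event",              # hit type
        "ec": "CKAN API Request",  # assumed event category
        "ea": api_action,          # event action
        "el": label_template.format(param_value),  # event label with {0} filled in
    }
    return urlencode(fields)


payload = build_ga_payload("package_show", "my-dataset", "UA-1010101-1", "555")
print(payload)
```

An action that is missing from the dictionary yields None, which reflects the idea that only the listed API actions are captured.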
    

Migrating Legacy Extra Fields

Note: The following assumes that a dump of production data has been imported into the CKAN database and any necessary database schema updates have been performed (ref.: https://docs.ckan.org/en/2.8/maintaining/database-management.html#upgrading).

Previously, the "Security classification" and "Used in data-driven application" fields had been added as free extras to datasets.

These fields are now part of the dataset schema via the scheming extension (ref.: https://github.com/qld-gov-au/ckanext-data-qld/blob/develop/ckanext/data_qld/ckan_dataset.json)

The legacy field values need to be migrated to their schema counterparts.

The ckanext-data-qld extension contains a paster command for doing this (ref.: https://github.com/qld-gov-au/ckanext-data-qld/blob/develop/ckanext/data_qld/commands.py)

Note: This paster command was designed to be run once, for the initial migration of legacy extra fields, and is not designed to be idempotent. Running it multiple times should not cause any issues, but running it again once the site is live is not recommended.

To run the command:

  1. Enable the python virtual environment:

     . /usr/lib/ckan/default/bin/activate
    
  2. Run the following command:

     paster --plugin=ckanext-data-qld migrate_extras -c /path/to/ini_file.ini
    
  3. Rebuild the Solr index:

     paster --plugin=ckan search-index rebuild -c /PATH/TO/YOUR_INI_FILE.ini
    

This will iterate through each of the datasets in CKAN and copy the "Security classification" and "Used in data-driven application" extra field values to the dataset schema fields security_classification and data_driven_application respectively.
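
The copy step described above can be sketched on plain dictionaries. This is an illustration only, not the code of the migrate_extras command: the function name and the dataset layout (an "extras" list of key/value pairs) are assumptions, and since the text does not say whether the legacy extras are removed afterwards, the sketch leaves them in place.

```python
# Legacy extra key -> schema field name (both taken from the description above).
LEGACY_TO_SCHEMA = {
    "Security classification": "security_classification",
    "Used in data-driven application": "data_driven_application",
}


def copy_legacy_extras(dataset):
    """Return a copy of a dataset dict with legacy extra values copied to schema fields."""
    migrated = dict(dataset)
    for extra in dataset.get("extras", []):
        field = LEGACY_TO_SCHEMA.get(extra["key"])
        if field is not None:
            # Copy the legacy value to its schema counterpart.
            migrated[field] = extra["value"]
    return migrated


example = {
    "name": "example-dataset",
    "extras": [
        {"key": "Security classification", "value": "PUBLIC"},
        {"key": "Some other extra", "value": "unchanged"},
    ],
}
result = copy_legacy_extras(example)
print(result["security_classification"])
```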

Demoting Publishers to Editor role

Note: The following assumes that a dump of production data has been imported into the CKAN database and any necessary database schema updates have been performed (ref.: https://docs.ckan.org/en/2.8/maintaining/database-management.html#upgrading).

There is a paster command to set the role for any users with names starting with publisher- from admin to editor where necessary.

Note: This paster command was designed to be run once. Running it multiple times should not cause any issues, as it will not re-assign users who already have the editor role, but it is not designed to be idempotent.

To run the command:

  1. Enable the python virtual environment:

     . /usr/lib/ckan/default/bin/activate
    
  2. Run the following command:

     paster --plugin=ckanext-data-qld demote_publishers -c /etc/ckan/default/development.ini
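
The selection rule described above (user names starting with publisher-, role changed from admin to editor) can be sketched on plain records. The function name and the membership record layout are hypothetical; the real command works against the CKAN database.

```python
def demote_publisher_admins(memberships):
    """Return new membership records with publisher-* admins changed to editor."""
    updated = []
    for member in memberships:
        record = dict(member)
        if record["user"].startswith("publisher-") and record["role"] == "admin":
            record["role"] = "editor"
        updated.append(record)
    return updated


members = [
    {"user": "publisher-health", "role": "admin"},
    {"user": "publisher-roads", "role": "editor"},
    {"user": "sysadmin", "role": "admin"},
]
demoted = demote_publisher_admins(members)
print([m["role"] for m in demoted])
```

Users who are already editors, and admins whose names do not start with publisher-, are left unchanged.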
    

Data.Qld Engagement Reporting Plugin

Configuration

  1. Add the extension to the relevant CKAN .ini file plugins definition:

     ckan.plugins = ... data_qld_reporting
    
  2. Add the following optional config settings to relevant CKAN .ini file, if desired:

    # ckanext-data_qld_reporting
    ckan.reporting.datarequest_open_max_days = 60 # Defaults to 60
    ckan.reporting.comment_no_reply_max_days = 10 # Defaults to 10
    ckan.reporting.engagement_json_config = PATH_TO_FILE # Defaults to os.path.dirname(os.path.realpath(__file__)) + '/../engagement_report_csv.json'
    ckan.reporting.admin_json_config = PATH_TO_FILE # Defaults to os.path.dirname(os.path.realpath(__file__)) + '/../admin_report_csv.json'
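
    As a rough illustration of how these optional settings and their defaults could be resolved, the sketch below reads from a plain dictionary standing in for the CKAN config. The helper names are hypothetical and this is not the plugin's actual code; the default values and the default path expression are the ones stated in the comments above.

```python
import os

# Defaults as stated in the config comments above.
DEFAULT_MAX_DAYS = {
    "ckan.reporting.datarequest_open_max_days": 60,
    "ckan.reporting.comment_no_reply_max_days": 10,
}


def get_max_days(config, key):
    """Return the configured number of days as an int, falling back to the default."""
    return int(config.get(key, DEFAULT_MAX_DAYS[key]))


def get_json_config_path(config, key, default_filename, module_file):
    """Return the configured JSON path, or the default located relative to module_file."""
    default_path = (
        os.path.dirname(os.path.realpath(module_file)) + "/../" + default_filename
    )
    return config.get(key, default_path)


config = {"ckan.reporting.comment_no_reply_max_days": "5"}
print(get_max_days(config, "ckan.reporting.datarequest_open_max_days"))
print(get_max_days(config, "ckan.reporting.comment_no_reply_max_days"))
```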
    

ckanext-data-qld's People

Contributors

alexskrypnyk, andersmx, asifaminb, awset, dependabot-preview[bot], duttonw, ganapavz, iaroslav13, icesun, markcalvert, mitch1011, mutantsan, rosswebsterwork, salsa-nathan, smotornyuk, thrawnca, tino097, tonymcneil

ckanext-data-qld's Issues

ungettext removed in CKAN 2.10.x; g and h are also not exposed to the template

This needs to be fixed in
https://github.com/qld-gov-au/ckanext-data-qld/tree/develop/ckanext/data_qld/templates/activity_streams
where ungettext must be changed to ngettext.

also:
jinja2.exceptions.UndefinedError: 'g' is undefined
jinja2.exceptions.UndefinedError: 'h' is undefined

File "/ckan_venv/src/ckanext-data-qld/ckanext/data_qld/templates/activity_streams/activity_stream_email_notifications.text", line 3, in top-level template code
{{ ungettext("You have {num} new activity on your {site_title} dashboard", "You have {num} new activities on your {site_title} dashboard", num).format(site_title=g.site_title, num=num) }}:
File "/ckan_venv/lib64/python3.7/site-packages/jinja2/utils.py", line 83, in from_obj
if hasattr(obj, "jinja_pass_arg"):
jinja2.exceptions.UndefinedError: 'ungettext' is undefined

The CKAN 2.10 template is:

{% set num = activities|length %}{{ ngettext("You have {num} new activity on your {site_title} dashboard", "You have {num} new activities on your {site_title} dashboard", num).format(site_title=g.site_title if g else site_title, num=num) }} {{ _('To view your dashboard, click on this link:') }}
{% url_for 'dashboard.index', _external=True %}
{{ _('You can turn off these email notifications in your {site_title} preferences. To change your preferences, click on this link:').format(site_title=g.site_title if g else site_title) }}
{% url_for 'user.edit', _external=True %}

Ours is:

{%- set num = activities|length -%}
{{ ungettext("You have {num} new activity on your {site_title} dashboard", "You have {num} new activities on your {site_title} dashboard", num).format(site_title=g.site_title, num=num) }}:
{% for activity in activities -%}
    {%- set data = activity['data'] if activity['data'] else None -%}
    {%- set activity_type = activity['activity_type'] if activity['activity_type'] else None -%}
    {%- set id = activity['object_id'] -%}
    {%- if data -%}
        {%- if data['package'] -%}
            {%- set name = data['package']['title'] -%}
            {%- set action = 'dataset.read' -%}
        {%- elif data['group'] -%}
            {%- set name = data['group']['title'] -%}
            {%- set action = 'organization.read' if activity_type == 'changed organization' else 'group.read' -%}
        {%- endif -%}
    {%- endif -%}
    {% if action and id %}{{name}} ({{ h.activity_type_nice(activity_type)|capitalize }}) {{ h.url_for(action, id=id, _external=True) }}{% if activity_type %}{% endif %}{% endif %}
{% endfor %}
{{ _('To view your dashboard, click on this link:') }}
{{ g.site_url + '/dashboard' }}
{{ _('You can turn off these email notifications in your {site_title} preferences. To change your preferences, click on this link:').format(site_title=g.site_title) }}
{{ g.site_url + '/user/edit' }}
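
For reference, the singular/plural selection that ngettext performs can be reproduced with Python's standard gettext module. This is only an illustration of the call shape (singular, plural, count); it is not the Jinja helper that CKAN exposes to templates, and the function name notification_heading is hypothetical.

```python
import gettext


def notification_heading(num, site_title):
    """Pick the singular or plural message for num, then fill in the placeholders."""
    template = gettext.ngettext(
        "You have {num} new activity on your {site_title} dashboard",
        "You have {num} new activities on your {site_title} dashboard",
        num,
    )
    return template.format(site_title=site_title, num=num)


print(notification_heading(1, "Data.Qld"))
print(notification_heading(3, "Data.Qld"))
```

With no translation catalog installed, the standard library returns the first message when the count is 1 and the second message otherwise.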

Worker thread failure from: data_qld/validation.py, in process_schema_fields

process_schema_fields in data_qld/validation.py does not check whether it is running on a web thread or on a background worker thread. This matters because tk.request.files does not exist on a worker thread.

File "/srv/app/src/ckanext-data-qld/ckanext/data_qld/validation.py", line 78, in read_schema_from_request
form_data = tk.request.files

2023-03-06 06:19:27,009 DEBUG [ckanext.validation.jobs] Validating resource: f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5
2023-03-06 06:19:27,010 DEBUG [ckanext.validation.validation_status_helper] updateValidationJobStatus: f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5 status: running
2023-03-06 06:19:27,015 DEBUG [ckanext.validation.validation_status_helper] getValidationJob: f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5
2023-03-06 06:19:27,058 DEBUG [ckanext.harvest.model] Harvest tables already exist
2023-03-06 06:19:27,073 DEBUG [ckanext.harvest.model] Harvest tables already exist
2023-03-06 06:19:27,077 WARNI [ckanext.csrf_filter.anti_csrf] Site ParseResult(scheme='http', netloc='ckan:5000', path='', params='', query='', fragment='') is not secure! CSRF tokens may be exposed!
2023-03-06 06:19:27,077 INFO  [ckanext.csrf_filter.anti_csrf] Obtained secret key from beaker.session.secret
2023-03-06 06:19:27,078 DEBUG [ckanext.resource_type_validation.resource_type_validation] Allowed file extensions: ['accdb', 'asc', 'cdf', 'csv', 'doc', 'docx', 'ecw', 'esri', 'fgdb', 'gdb', 'geojson', 'geotiff', 'gpkg', 'gpx', 'html', 'jp2', 'jpeg', 'jpg', 'json', 'kml', 'kmz', 'mdb', 'mtl', 'n3', 'nc', 'obj', 'parquet', 'pdf', 'png', 'ppt', 'pqt', 'pptx', 'rdf', 'rtf', 'shp', 'sparql', 'tab', 'tif', 'tiff', 'topojson', 'tsv', 'ttf', 'txt', 'wfs', 'wmts', 'xls', 'xlsx', 'xml', 'zip']
2023-03-06 06:19:27,106 WARNI [ckanext.csrf_filter.anti_csrf] Site ParseResult(scheme='http', netloc='ckan:5000', path='', params='', query='', fragment='') is not secure! CSRF tokens may be exposed!
2023-03-06 06:19:27,106 INFO  [ckanext.csrf_filter.anti_csrf] Obtained secret key from beaker.session.secret
2023-03-06 06:19:27,107 DEBUG [ckanext.resource_type_validation.resource_type_validation] Allowed file extensions: ['accdb', 'asc', 'cdf', 'csv', 'doc', 'docx', 'ecw', 'esri', 'fgdb', 'gdb', 'geojson', 'geotiff', 'gpkg', 'gpx', 'html', 'jp2', 'jpeg', 'jpg', 'json', 'kml', 'kmz', 'mdb', 'mtl', 'n3', 'nc', 'obj', 'parquet', 'pdf', 'png', 'ppt', 'pqt', 'pptx', 'rdf', 'rtf', 'shp', 'sparql', 'tab', 'tif', 'tiff', 'topojson', 'tsv', 'ttf', 'txt', 'wfs', 'wmts', 'xls', 'xlsx', 'xml', 'zip']
2023-03-06 06:19:27,322 DEBUG [ckanext.validation.jobs] Validating source: /var/lib/ckan/resources/f6a/c0f/d9-84dd-4d98-b146-cc53ac3d23b5
2023-03-06 06:19:27,323 DEBUG [ckanext.validation.validation_status_helper] updateValidationJobStatus: f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5 status: success
2023-03-06 06:19:27,361 INFO  [ckanext.qa.tasks] Openness scoring package random_package (1 resources)
2023-03-06 06:19:27,367 INFO  [ckanext.qa.sniff_format] Sniffing file format of: /var/lib/ckan/resources/f6a/c0f/d9-84dd-4d98-b146-cc53ac3d23b5
2023-03-06 06:19:27,368 INFO  [ckanext.qa.sniff_format] Magic detects file as: application/csv
2023-03-06 06:19:27,377 INFO  [ckanext.qa.sniff_format] Is CSV because 2.0 cells per row (22 cells, 11 rows)
2023-03-06 06:19:27,381 DEBUG [ckanext.resource_type_validation.resource_type_validation] f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5 is not an uploaded resource, skipping validation
2023-03-06 06:19:27,381 INFO  [ckanext.qa.sniff_format] Mimetype translates to filetype: CSV
2023-03-06 06:19:27,385 INFO  [ckanext.qa.tasks] Score: 3 Reason: Content of file appeared to be format "CSV" which receives openness score: 3.
2023-03-06 06:19:27,386 INFO  [ckanext.qa.tasks] Openness scoring: 
{'openness_score': 3, 'openness_score_reason': 'Content of file appeared to be format "CSV" which receives openness score: 3.', 'format': 'CSV', 'archival_timestamp': '2023-03-06T06:19:25.904667'}
<Resource id=f6ac0fd9-84dd-4d98-b146-cc53ac3d23b5 package_id=0d6e776c-fcbe-408d-8744-fbe24fb92932 url=test.csv format=csv description=Former world name consumer laugh street debate let. hash= position=0 name=invisible-resource resource_type=None mimetype=application/csv mimetype_inner=None size=7884 created=2023-03-06 06:19:25.275977 last_modified=2023-03-06 06:19:25.483859 metadata_modified=2023-03-06 06:19:25.486172 cache_url=None cache_last_updated=None url_type=upload extras={'_xloader': False, 'align_default_schema': False, 'datastore_active': False, 'datastore_contains_all_records_of_source_file': False, 'governance_acknowledgement': 'NO', 'privacy_assessment_result': 'Nor career particularly reason nearly project small dinner.', 'resource_visible': 'FALSE', 'schema': '{"fields": [{"format": "default", "name": "Game Number", "type": "integer"}, {"format": "default", "name": "Game Length", "type": "integer"}], "missingValues": ["Resource schema"]}'} state=active>
'test.csv'


2023-03-06 06:19:27,402 INFO  [ckanext.qa.tasks] QA results updated ok
2023-03-06 06:19:27,402 INFO  [ckanext.qa.tasks] CKAN updated with openness score
2023-03-06 06:19:27,429 ERROR [ckan.lib.jobs] Job 849078b5-490c-4d61-9848-a8f8790db8ff on worker rq:worker:c773b50aa48b48558fe5c6de33db17e1 raised an exception: Working outside of request context.

This typically means that you attempted to use functionality that needed
an active HTTP request.  Consult the documentation on testing for
information about how to avoid this problem.
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/rq/worker.py", line 812, in perform_job
    rv = job.perform()
  File "/usr/lib/python3.8/site-packages/rq/job.py", line 588, in perform
    self._result = self._execute()
  File "/usr/lib/python3.8/site-packages/rq/job.py", line 594, in _execute
    return self.func(*self.args, **self.kwargs)
  File "/srv/app/src/ckanext-validation/ckanext/validation/jobs.py", line 104, in run_validation_job
    t.get_action('resource_patch')(
  File "/srv/app/src/ckan/ckan/logic/__init__.py", line 504, in wrapped
    result = _action(context, data_dict, **kw)
  File "/srv/app/src/ckan/ckan/logic/action/patch.py", line 81, in resource_patch
    return _update.resource_update(context, patched)
  File "/srv/app/src/ckan/ckan/logic/action/update.py", line 107, in resource_update
    updated_pkg_dict = _get_action('package_update')(context, pkg_dict)
  File "/srv/app/src/ckan/ckan/logic/__init__.py", line 504, in wrapped
    result = _action(context, data_dict, **kw)
  File "/srv/app/src/ckan/ckan/logic/action/update.py", line 301, in package_update
    data, errors = lib_plugins.plugin_validate(
  File "/srv/app/src/ckan/ckan/lib/plugins.py", line 302, in plugin_validate
    result = plugin.validate(context, data_dict, schema, action)
  File "/srv/app/src/ckanext-scheming/ckanext/scheming/plugins.py", line 308, in validate
    return navl_validate(data_dict, schema, context)
  File "/srv/app/src/ckan/ckan/lib/navl/dictization_functions.py", line 285, in validate
    converted_data, errors = _validate(flattened, schema, validators_context)
  File "/srv/app/src/ckan/ckan/lib/navl/dictization_functions.py", line 335, in _validate
    convert(converter, key, converted_data, errors, context)
  File "/srv/app/src/ckan/ckan/lib/navl/dictization_functions.py", line 237, in convert
    converter(key, converted_data, errors, context)
  File "/srv/app/src/ckanext-data-qld/ckanext/data_qld/validation.py", line 49, in process_schema_fields
    schema_from_upload_request = read_schema_from_request()
  File "/srv/app/src/ckanext-data-qld/ckanext/data_qld/validation.py", line 78, in read_schema_from_request
    form_data = tk.request.files
  File "/usr/lib/python3.8/site-packages/werkzeug/local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/usr/lib/python3.8/site-packages/werkzeug/local.py", line 347, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/usr/lib/python3.8/site-packages/werkzeug/local.py", line 306, in _get_current_object
    return self.__local()
  File "/usr/lib/python3.8/site-packages/flask/globals.py", line 38, in _lookup_req_object
    raise RuntimeError(_request_ctx_err_msg)
RuntimeError: Working outside of request context.
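
One possible guard, sketched under assumptions: outside a request, accessing an attribute of the request proxy raises RuntimeError (as the traceback above shows), so the read can be wrapped and treated as "no upload". The class FakeRequestProxy and the function read_files_if_in_request are hypothetical stand-ins so that the sketch runs without CKAN or Flask; in a Flask-based CKAN, checking flask.has_request_context() before touching the request is another option.

```python
from types import SimpleNamespace


class FakeRequestProxy:
    """Stand-in for the request proxy on a worker: any attribute access raises."""

    def __getattr__(self, name):
        raise RuntimeError("Working outside of request context.")


def read_files_if_in_request(request):
    """Return request.files, or None when there is no active HTTP request."""
    try:
        return request.files
    except RuntimeError:
        # Background worker: no HTTP request exists, so there is no upload to read.
        return None


# Web thread: a request with uploaded files is available.
web_request = SimpleNamespace(files={"schema_upload": "schema.json"})
print(read_files_if_in_request(web_request))

# Worker thread: no request context, so the guard returns None instead of raising.
print(read_files_if_in_request(FakeRequestProxy()))
```

A caller such as process_schema_fields could then skip the upload handling when None is returned; whether that is the right behaviour for the validator is a design decision for the maintainers.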
