The NRCan DataHub project is an enterprise platform for storing, working with, and collaborating on data initiatives. It gives NRCan users a central location to store any kind of data, with a unified portal for discovering and managing data, performing collaborative analysis, manipulating data with advanced analytics tools, and conducting data science experiments.
The DataHub aims to provide the following features:
- Trusted Enterprise platform for storing, working with, and collaborating on data initiatives.
- Common and integrated environment for working with data, including an intuitive interface and features designed for usability, collaboration, and mobility.
- Reduced barriers to entry for the latest business intelligence and analytics tools, frameworks, and technologies.
- Secure end-to-end data management.
- Data Projects that empower users to tell data stories, work with massive datasets, conduct analyses, and experiment with new technologies.
- Connections between data scientists and analytics users across sectors, and collaboration with other platforms.
This project is currently internal to NRCan. For more details or collaboration opportunities, contact the DataHub mailbox: [email protected]
The DataHub makes it easy for multiple teams, labs or users to get access to ETL, Data Science or Analytical tools.
The web interface lets users browse data projects, request access, and work with the following tools:
- PowerBI Workspaces: Each data project can be associated with its own workspace.
- Storage Explorer & Databricks: For data science projects, a separate storage account and a Databricks workspace are created. The portal includes a user-friendly drag-and-drop interface for browsing the account and uploading and downloading files.
- Form Builder: DataHub includes a data model manager that lets users design data models, which can be converted into Entity Framework models and connected to Blazor forms.
- Data Entry: Once deployed into a web application, the forms designed with the Form Builder can be accessed from each data project, turning complex legacy spreadsheets into user-friendly web applications.
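To illustrate the Form Builder idea of turning a designed data model into an Entity Framework model, here is a minimal sketch in Python that renders a C# entity class from a list of field definitions. It is not DataHub's actual generator: the `Field` type, the `CSHARP_TYPES` mapping, and the `render_entity` function are all hypothetical names introduced for this example, and the real data model format may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from form-builder field kinds to C# types.
CSHARP_TYPES = {
    "text": "string",
    "integer": "int",
    "decimal": "decimal",
    "date": "DateTime",
    "boolean": "bool",
}


@dataclass
class Field:
    """One field in a designed data model (illustrative structure only)."""
    name: str
    kind: str
    required: bool = False


def render_entity(class_name: str, fields: list[Field]) -> str:
    """Render the source text of a C# entity class for the given fields."""
    lines = [
        f"public class {class_name}",
        "{",
        "    public int Id { get; set; }",  # conventional primary key
    ]
    for field in fields:
        csharp_type = CSHARP_TYPES[field.kind]
        # Optional value types become nullable (for example, decimal?).
        if not field.required and csharp_type != "string":
            csharp_type += "?"
        if field.required:
            lines.append("    [Required]")
        lines.append(f"    public {csharp_type} {field.name} {{ get; set; }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    model = [Field("SiteName", "text", required=True), Field("Depth", "decimal")]
    print(render_entity("Sample", model))
```

Generating plain class text keeps the sketch independent of any particular Entity Framework version; a real generator would also need to emit namespaces, `using` directives, and a `DbContext`.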
The diagram below shows the key components of the platform.
This project includes multiple repositories:
- DataHub Web Portal: This repository contains the code for the portal and the Azure Functions used to automate PowerBI & Databricks tasks.
- DataHub Terraform: The Terraform infrastructure for the project is stored in this repository, and elements of the Terraform script are dynamically generated from the Data Project database. Please contact us for details on the Terraform infrastructure.
- DataHub Databricks: Databricks is used in this project for ETL, data science, and other data transformations. Examples from this repository can be used as templates for setting up new tasks.
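The DataHub Terraform repository generates parts of its configuration from the Data Project database. One way such generation could work is sketched below: project records are turned into a JSON variables document, which Terraform can read from files ending in `.tfvars.json`. The record fields (`acronym`, `storage`, `databricks`), the `data_projects` variable name, and the `build_tfvars` function are assumptions made for illustration, not the project's actual schema or tooling.

```python
import json


def build_tfvars(projects):
    """Return Terraform variable values as a JSON string.

    `projects` is a list of dicts standing in for rows read from the
    Data Project database (hypothetical shape).
    """
    data_projects = {}
    for project in projects:
        key = project["acronym"].lower()
        data_projects[key] = {
            "storage_enabled": bool(project.get("storage", False)),
            "databricks_enabled": bool(project.get("databricks", False)),
        }
    # Sorted keys keep the generated file stable between runs.
    return json.dumps({"data_projects": data_projects}, indent=2, sort_keys=True)


if __name__ == "__main__":
    rows = [
        {"acronym": "GEO", "storage": True, "databricks": True},
        {"acronym": "FOR", "storage": True},
    ]
    print(build_tfvars(rows))
```

Writing the output to a file such as `projects.auto.tfvars.json` would let Terraform load it automatically, provided the configuration declares a matching `data_projects` variable.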