heat_pump_adoption_modelling's People

Contributors

athrado, ch-williamson

heat_pump_adoption_modelling's Issues

Documentation: Representativeness Study of EPC Dataset

Representativeness

EPC data covers only 50% of all GB properties.

Check whether EPC data is representative of the GB housing stock, for example by comparing, for various features, whether the EPC distribution matches data from other sources.

  • Are IMD deciles equally represented?
  • Is there really more social housing in Scotland?
  • EPC property density vs. actual property density
  • and more...
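
One way to frame such a comparison is to compute, per category, the share of EPC records minus the share reported by a reference source. The sketch below is only illustrative: the function names, the tenure categories and the reference shares are all made up for the example and do not come from any real dataset.

```python
from collections import Counter


def share_by_category(values):
    """Return the proportion of records falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def representation_gap(sample_values, reference_shares):
    """Sample share minus reference share per category.

    Positive values mean the category is over-represented in the sample.
    """
    sample_shares = share_by_category(sample_values)
    return {
        category: sample_shares.get(category, 0.0) - reference_share
        for category, reference_share in reference_shares.items()
    }


# Hypothetical toy data: tenure of ten EPC records and invented reference shares.
epc_tenure = ["owner"] * 5 + ["social"] * 3 + ["private"] * 2
reference = {"owner": 0.6, "social": 0.2, "private": 0.2}
gaps = representation_gap(epc_tenure, reference)
```

The same function could be applied to IMD deciles or regional property counts, provided a trustworthy reference distribution is available.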

Supervised Model: Feature Encoding

Find a way to encode categorical features

  • Reduce number of categories for some features, e.g. window glazing type (single, double, triple glazing)
  • Ordinal encoding for features which have a natural order, e.g. efficiencies or energy rating
  • One-hot encoding for other categorical features
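
The three encoding steps listed above could look roughly like this in plain Python. This is a minimal sketch, not the project's implementation: the helper names and the raw glazing strings are assumptions, and only the A to G energy rating scale is taken as given.

```python
# EPC energy ratings ordered from worst (G) to best (A).
ENERGY_RATING_ORDER = ["G", "F", "E", "D", "C", "B", "A"]


def reduce_glazing(raw_description):
    """Collapse a free-text glazing description into single/double/triple."""
    text = raw_description.lower()
    for level in ("triple", "double", "single"):
        if level in text:
            return level
    return "unknown"


def ordinal_encode(value, order):
    """Map a value to its position in an ordered list, or -1 if unseen."""
    return order.index(value) if value in order else -1


def one_hot_encode(value, categories):
    """Return one 0/1 indicator per category."""
    return {f"is_{category}": int(value == category) for category in categories}


# Hypothetical raw value, for illustration only.
glazing = reduce_glazing("Double glazing installed before 2002")
rating = ordinal_encode("C", ENERGY_RATING_ORDER)
tenure = one_hot_encode("social", ["owner", "social", "private"])
```

Returning -1 for unseen values keeps missing or unexpected categories distinguishable from the valid range; whether that is the right choice depends on the downstream model.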

Nearest Neighbours Search

Premises

  • Assumption: postcode areas that have not yet adopted heat pumps, but might do so, can be identified by the characteristics they share with areas that already have.
  • Geography: household/postcode level
  • Outcome: Similarity of non-heat pump households/postcodes to those with heat pumps

Model

  • Pipeline:
    • Dimensionality reduction of household data
    • Split into households with and without heat pumps
    • For each household with a heat pump, find the K nearest neighbours from households without heat pumps
  • Geography: postcode level
  • Outcomes: Similarity of non-heat pump postcodes to households with heat pumps
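
The neighbour search step of the pipeline can be sketched as follows. The example assumes households are already represented as numeric feature vectors (so the dimensionality reduction step is skipped), uses plain Euclidean distance, and relies on a hypothetical function name and toy coordinates.

```python
import heapq
import math


def k_nearest_non_adopters(adopters, non_adopters, k):
    """For each household with a heat pump, return the ids of the k closest
    households without one, ordered from nearest to farthest.

    Both arguments map a household id to a tuple of numeric features.
    """
    neighbours = {}
    for adopter_id, adopter_vector in adopters.items():
        nearest = heapq.nsmallest(
            k,
            non_adopters.items(),
            key=lambda item: math.dist(adopter_vector, item[1]),
        )
        neighbours[adopter_id] = [household_id for household_id, _ in nearest]
    return neighbours


# Hypothetical two-dimensional feature vectors.
with_heat_pump = {"hp1": (0.0, 0.0)}
without_heat_pump = {"a": (1.0, 0.0), "b": (5.0, 5.0), "c": (0.0, 2.0)}
result = k_nearest_non_adopters(with_heat_pump, without_heat_pump, k=2)
```

A brute-force search like this scales poorly; for the full EPC dataset a tree-based or approximate nearest neighbour index would probably be needed.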

Outputs

  • Triangulation with other models to validate and evaluate our approaches by comparing the nearest neighbours detected with the postcodes/households predicted as adopters.
  • Household-level predictions: where the other approaches yield only postcode-level analyses, this method allows investigation of specific households.
  • Ability to test and explore assumptions about the groups who install heat pumps through combination with further qualitative research
  • Geographic analysis of which groups are more or less prevalent across regions of the UK

Issues

Inspect MCS sample data and get full dataset

Inspect MCS sample data and organise data exchange of full dataset.

  • NDA
  • Inspect sample data
  • Is this data really what we need?
  • Any other fields that we need?
  • Organise exchange of entire dataset

EPC Data for Northern Ireland

Problem: There is an EPC registry for NI but there doesn't seem to be a download.

  • Check whether there is really no way to download or get this data.
  • Could we scrape or reverse engineer the register lookup to get the records for NI?

Merge datasets and create features for predictive model

We will be working with datasets at different geographic resolutions. We will need a strategy and process for:

  • Harmonising and merging the datasets at a single resolution
  • Constructing features from datasets that are not at the chosen resolution (e.g. taking the mean/median of continuous variables)
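
As an illustration of the second point, household-level values can be rolled up to postcode level by taking the mean and median. The field names (`postcode`, `floor_area`) and the records below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_by_postcode(records, field):
    """Summarise a continuous household-level field at postcode level."""
    grouped = defaultdict(list)
    for record in records:
        value = record.get(field)
        if value is not None:  # skip missing values rather than treating them as 0
            grouped[record["postcode"]].append(value)
    return {
        postcode: {"mean": mean(values), "median": median(values), "n": len(values)}
        for postcode, values in grouped.items()
    }


# Hypothetical household records.
households = [
    {"postcode": "AB1 2CD", "floor_area": 50},
    {"postcode": "AB1 2CD", "floor_area": 70},
    {"postcode": "AB1 2CD", "floor_area": 90},
    {"postcode": "EF3 4GH", "floor_area": None},
]
summary = aggregate_by_postcode(households, "floor_area")
```

Keeping the count `n` alongside each summary makes it possible to down-weight or exclude postcodes with very few records later on.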

EPC Cleaning: EPC Duplicate Identification

Find a better way to identify duplicates in the EPC dataset, instead of simply using the first address line and postcode. Possibly use the MCS/EPC matching algorithm.
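
A slightly more robust variant of the first-address-line-plus-postcode approach is to normalise both fields before comparing and to keep the most recent record per property. This is only a sketch under assumed field names (`address1`, `postcode`, `inspection_date` as ISO date strings); it is not the MCS/EPC matching algorithm mentioned above.

```python
import re


def address_key(address_line_1, postcode):
    """Build a normalised key so that trivial formatting differences collapse."""
    address = re.sub(r"[^a-z0-9 ]", "", address_line_1.lower())
    address = re.sub(r"\s+", " ", address).strip()
    return address, postcode.replace(" ", "").upper()


def latest_per_property(records):
    """Keep only the most recent record for each normalised address key."""
    latest = {}
    for record in records:
        key = address_key(record["address1"], record["postcode"])
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as strings.
        if key not in latest or record["inspection_date"] > latest[key]["inspection_date"]:
            latest[key] = record
    return list(latest.values())


# Hypothetical records describing the same property twice.
epc_records = [
    {"address1": "Flat 1, High St.", "postcode": "ab1 2cd", "inspection_date": "2015-03-01"},
    {"address1": "flat 1 high st", "postcode": "AB1 2CD", "inspection_date": "2019-06-30"},
]
deduplicated = latest_per_property(epc_records)
```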

Geospatial Model

Premises

  • Outcome: number/rate of heat pump adoptions in a prescribed area over a fixed period of time (say, a year)
  • Geography: postcode level data.
  • Time window:
  • Explanatory variables: household characteristics (building type, total floor area, ...), energy-related (gas availability, energy rating, ...), socio-demographic (postcode IMD, …)
  • Study design: official register data.

Model

  • Model and link function: a Poisson regression model with log link allows counts/rates to be modelled (given a proper offset)
  • Structured spatial component: intrinsic Conditional Auto Regressive (iCAR) at the postcode-level
  • Variable transformation: non-linear relationships with the outcome can be accommodated by entering covariates via splines where the interpretation of the attached coefficients is not of primary interest.
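
Written out, a model of this kind might take the following form. The notation is assumed here rather than taken from the issue: y_i is the number of adoptions in postcode i, E_i the offset (for example the number of households), f_j the spline terms for covariates x_ij, φ_i the spatial effect, n_i the number of neighbours of postcode i, and k ~ i denotes that k is a neighbour of i.

```latex
\begin{align}
y_i &\sim \mathrm{Poisson}(\mu_i), \\
\log \mu_i &= \log E_i + \beta_0 + \sum_{j} f_j(x_{ij}) + \phi_i, \\
\phi_i \mid \phi_{-i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{k \sim i} \phi_k,\; \frac{\sigma^2}{n_i} \right).
\end{align}
```

The last line is the usual conditional specification of an iCAR prior: each postcode's spatial effect is centred on the average of its neighbours, with variance shrinking as the number of neighbours grows.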

Outcomes

  • Outcome predictions at the desired geographical level will be available and will possess uncertainty measures around them.
  • Analysis of the model’s residuals will make it possible to identify areas that, after adjustment for the explanatory variables, still behave in ways that are extreme relative to the overall average estimated by the model.
  • Exceedance probabilities, i.e. an estimate of the probability of exceeding a given threshold of likelihood to adopt a heat pump solution.
  • Visualisation: all of the above can be represented on maps, which can help improve readability of the findings and aid dissemination to a wider, non-technical audience.
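
If the model is fitted in a way that yields posterior draws of the adoption rate for each area, an exceedance probability can be estimated as the fraction of draws above the threshold. The sketch below assumes such draws are available; the function name and the numbers are illustrative only.

```python
def exceedance_probability(posterior_draws, threshold):
    """Estimate P(rate > threshold) as the share of posterior draws above it."""
    draws = list(posterior_draws)
    if not draws:
        raise ValueError("at least one posterior draw is required")
    return sum(draw > threshold for draw in draws) / len(draws)


# Hypothetical posterior draws of the adoption rate for one postcode.
draws_for_postcode = [0.1, 0.3, 0.5, 0.7]
probability = exceedance_probability(draws_for_postcode, threshold=0.4)
```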

Issues

Supervised Model

Premises

  • Target: expected % increase in heat pump adoptions in a prescribed area at time (year) t
  • Geography: postcode level data
  • Features: household characteristics (building type, total floor area, ...), energy-related (gas availability, energy rating, ...), socio-demographic (postcode IMD, …). Some features will be taken as a snapshot at t-1, while others will be taken for time period t-n to t-1 (where n is a parameter to be tuned)

Model

  • Model: linear regression as a baseline. Investigate more specific models such as XGBoost regressor.
  • Pipeline: Split data into target year (2019?) and feature years. Use the variables to build additional features and use these to predict the % of heat pumps in a postcode (or the % increase) for the target year. Feature engineering might include snapshots of variables and growth rates.
  • Prediction: Use model to predict for 2020*

In subsequent iterations we might explore VAR models or variations on LSTM neural networks.

*we know that 2020 was a highly disrupted year, but we can still make and inspect predictions
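
The snapshot and growth-rate features mentioned in the pipeline could be derived along these lines. This is a sketch under assumptions: the input is a hypothetical mapping from year to the % of properties with a heat pump in one postcode, and growth is defined here as the relative change between years t-n and t-1.

```python
def build_features(yearly_share, target_year, n):
    """Build one training row for a postcode.

    yearly_share maps year -> % of properties with a heat pump.
    Features come from years t-n to t-1; the target is the value at year t.
    """
    first = yearly_share[target_year - n]
    last = yearly_share[target_year - 1]
    # Relative growth is undefined when the starting share is zero.
    growth = (last - first) / first if first else None
    return {
        "snapshot_t_minus_1": last,
        "growth_t_minus_n_to_t_minus_1": growth,
        "target": yearly_share[target_year],
    }


# Hypothetical shares for a single postcode.
shares = {2016: 1.0, 2017: 1.5, 2018: 2.0, 2019: 2.5}
row = build_features(shares, target_year=2019, n=3)
```

Rows built this way for every postcode would form the design matrix for the baseline linear regression or a gradient-boosted model.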

Outcomes

  • Predictions will be available at the postcode level and for the year specified.
  • An analysis of the model’s residuals will be used to identify postcodes that adopted significantly more or fewer heat pumps than expected. The residuals will also be used to explore how the error is distributed across splits of the features.
  • Comparing specific model predictions with past growth can help to further understand the patterns of heat pump adoption.
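
One simple way to flag postcodes with unexpectedly high or low adoption is to standardise the residuals and apply a cut-off. The sketch below assumes actual and predicted values are available per postcode; the function name, the threshold and the toy numbers are illustrative, and a proper significance test may be preferable in practice.

```python
from statistics import mean, pstdev


def flag_unexpected_postcodes(actual, predicted, threshold=2.0):
    """Flag postcodes whose standardised residual exceeds the threshold."""
    residuals = {postcode: actual[postcode] - predicted[postcode] for postcode in actual}
    values = list(residuals.values())
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return {}  # all residuals identical, nothing stands out
    flagged = {}
    for postcode, residual in residuals.items():
        z_score = (residual - centre) / spread
        if abs(z_score) > threshold:
            flagged[postcode] = "more than expected" if z_score > 0 else "fewer than expected"
    return flagged


# Hypothetical adoption counts per postcode.
actual_counts = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 11}
predicted_counts = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}
flags = flag_unexpected_postcodes(actual_counts, predicted_counts, threshold=1.5)
```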

Issues

Data Audit

Description

The data audit is a way of assessing the suitability, feasibility and issues such as bias associated with using a particular dataset.

Data audit tool

Issues

Documentation: EPC cleaning and added features

Since we will (also) use the original EPC dataset including data on Wales, England and Scotland, we need to clean up some of the EPC variables. We also add some features for easier processing.

Short documentation should describe which features are cleaned (and how), which features are added, and which original features and/or additional data the added features are based on.

Match MCS data to EPC data via address

Both datasets have 3 address lines, but it is not clear whether they use exactly the same format.

We should be able to use the same code as was used to deduplicate entries in the EPC data.
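
If the address formats turn out to differ, a fuzzy comparison within each postcode may help. The following is a minimal sketch using the standard library's `difflib.SequenceMatcher`; the record layout (`id`, `address`, `postcode`), the similarity threshold and the rule that house numbers must match exactly are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def normalise(address):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return re.sub(r"\s+", " ", text).strip()


def best_epc_match(mcs_record, epc_records, min_ratio=0.8):
    """Return the id of the most similar EPC address in the same postcode."""
    target = normalise(mcs_record["address"])
    target_numbers = re.findall(r"\d+", target)
    best_id, best_ratio = None, min_ratio
    for epc in epc_records:
        if epc["postcode"] != mcs_record["postcode"]:
            continue
        candidate = normalise(epc["address"])
        # Neighbouring houses differ by one digit, so numbers must match exactly.
        if re.findall(r"\d+", candidate) != target_numbers:
            continue
        ratio = SequenceMatcher(None, target, candidate).ratio()
        if ratio >= best_ratio:
            best_id, best_ratio = epc["id"], ratio
    return best_id


# Hypothetical records.
mcs = {"address": "12 High Street", "postcode": "AB1 2CD"}
epc = [
    {"id": "e1", "address": "14 High Street", "postcode": "AB1 2CD"},
    {"id": "e2", "address": "12, High St.", "postcode": "AB1 2CD"},
    {"id": "e3", "address": "12 High Street", "postcode": "ZZ9 9ZZ"},
]
match = best_epc_match(mcs, epc)
```

Without the house-number check, "14 High Street" would score higher than "12, High St." on string similarity alone, which is why the numeric tokens are compared separately.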

Inspect MCS data

  • Sanity checks: check all fields for e.g. extreme values, missingness, temporal and geographic distribution of installations, distribution of type of heat pump etc. (check with other mission team members)
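
Two of these checks, missingness and extreme values, can be sketched in a few lines. The field names and plausible ranges below are hypothetical, since the actual MCS schema is not described here.

```python
def missingness(records, fields):
    """Share of records in which each field is missing (None or empty string)."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in records if record.get(field) in (None, "")) / total
        for field in fields
    }


def out_of_range(records, field, low, high):
    """Records whose value for the field lies outside the plausible range."""
    return [
        record
        for record in records
        if record.get(field) is not None and not (low <= record[field] <= high)
    ]


# Hypothetical MCS-style records.
installations = [
    {"capacity_kw": 8.0, "postcode": "AB1 2CD"},
    {"capacity_kw": None, "postcode": "AB1 2CD"},
    {"capacity_kw": 950.0, "postcode": ""},
    {"capacity_kw": 12.0, "postcode": "EF3 4GH"},
]
missing = missingness(installations, ["capacity_kw", "postcode"])
suspicious = out_of_range(installations, "capacity_kw", low=1.0, high=100.0)
```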
  • Identify whether we have all the fields we anticipate we will need

Getter for Deprivation Scores

  • Collect deprivation data for England, Wales and Scotland
  • Upload the data to S3 bucket
  • Write getter function for loading the data
  • Merge function (Deprivation data and EPC data)

EPC Cleaning: Handle Welsh and other non-English Entries

Review comment by Chris:
Some of the heating descriptions appear to be in Welsh (including, unbelievably, some of the records from England). I speak Welsh so can provide a translation though https://pypi.org/project/translate/ can probably do it faster :)

Watch out for encoding errors - there are some descriptions which say St+¦r which should be Stôr. There are also some bilingual descriptions such as 'Boiler and radiators, |Bwyler a rheiddiaduron, |bottled gas|nwy potel' which may confuse a translator.

Are any of the Scottish records in Scots/Gaelic... ?

Originally posted by @ch-williamson in #28 (comment)
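
For the bilingual descriptions, one option that avoids translation altogether is to keep only the English segments. The sketch below assumes that bilingual entries always alternate English and Welsh segments separated by `|`, as in the single example quoted above; that pattern and the one-entry encoding fix table are assumptions that would need checking against the data.

```python
# Only the one garbled form mentioned in the review comment is known.
KNOWN_ENCODING_FIXES = {"St+¦r": "Stôr"}


def repair_encoding(text):
    """Replace known mis-encoded fragments with their intended spelling."""
    for broken, fixed in KNOWN_ENCODING_FIXES.items():
        text = text.replace(broken, fixed)
    return text


def english_segments(description):
    """Keep the English half of an 'English|Welsh|English|Welsh' description."""
    parts = description.split("|")
    if len(parts) < 2 or len(parts) % 2 != 0:
        return description  # not in the assumed alternating pattern
    return "".join(parts[0::2]).strip()


bilingual = "Boiler and radiators, |Bwyler a rheiddiaduron, |bottled gas|nwy potel"
english_only = english_segments(bilingual)
```

Entries that are entirely in Welsh are returned unchanged and would still need translating.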

Organise, upload and create getters for all EPC data

Organise in reasonable structure and upload EPC data versions:

  • EST cleansed EPC
  • EST cleansed EPC - with duplicates
  • Raw Scotland data
  • Raw England/Wales data
  • Preprocessed data: original, preprocessed, preprocessed+deduplicated

EPC Cleaning: Add more heating system categories

There are a few heating systems that slip through here such as 'Electric ceiling heating' as well as some fuels (coal, wood logs, smokeless fuel, anthracite, wood chips, wood pellets) - may not need to worry too much as we only care about heat pumps but raising just in case!

Lower priority

Originally posted by @ch-williamson in #28 (comment)
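
A keyword lookup is one simple way to catch the systems and fuels listed above. The category names and groupings in this sketch are assumptions chosen for illustration, not the categories used in the project's cleaning code.

```python
# Ordered so that heat pumps are detected before any other keyword.
HEATING_KEYWORDS = [
    ("heat pump", "heat pump"),
    ("electric", "electric"),
    ("anthracite", "solid fuel"),
    ("smokeless", "solid fuel"),
    ("coal", "solid fuel"),
    ("wood", "biomass"),
]


def categorise_heating(description):
    """Return the first matching category for a heating description."""
    text = description.lower()
    for keyword, category in HEATING_KEYWORDS:
        if keyword in text:
            return category
    return "unknown"


categories = [
    categorise_heating(description)
    for description in ("Electric ceiling heating", "wood pellets", "smokeless fuel")
]
```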
