Lidar.nn

Discovering ancient sites from UK Lidar data

This repo contains a suite of tools used for training an image segmentation model on UK Lidar data with the aim of detecting archaeological features in the landscape.

The data

Features

The UK National Lidar Programme provides 1m-resolution digital terrain model (DTM) elevation data across a large fraction of the UK, made available by Defra. Because Lidar can penetrate surface vegetation and tree cover, it can reveal topographical features that are not visible in satellite imagery. It has previously been used in regions of dense forest canopy to identify sites of historic or archaeological significance, and its use for this purpose is growing.

The raw data from Defra consists of around 400GB across more than 5000 tiles, each tile covering a 5km x 5km square in the British National Grid system.

The data for each tile is stored in a directory containing a .tif with the raw elevation data (50-100MB per tile), and a subdirectory index/ containing geospatial metadata.

lidarnn_raw/LIDAR-DTM-1m-2022-NY11se/
|-- index/
|-- lidar_used_in_merging_process/
|-- SP52sw-DTM-1m.tif
|-- SP52sw-DTM-1m.tif.xml
`-- SP52sw-DTM-1m.tfw
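
The tile names in the listing above (for example SP52sw) look like Ordnance Survey grid references: two grid letters, one easting digit, one northing digit, and a quadrant suffix selecting a 5km square. Assuming that convention holds for every tile, the sketch below recovers the south-west corner of a tile in British National Grid metres. The function name `tile_origin` is hypothetical and is not part of this repo.

```python
import re

TILE_RE = re.compile(r"^([A-HJ-Z]{2})(\d)(\d)(sw|se|nw|ne)$", re.IGNORECASE)

# (east, north) offset in metres of each 5 km quadrant within its 10 km square.
QUADRANT_OFFSETS = {"sw": (0, 0), "se": (5000, 0), "nw": (0, 5000), "ne": (5000, 5000)}


def tile_origin(tile_name):
    """Return (easting, northing) in metres of the tile's south-west corner.

    Only the name format is validated; there is no check that the
    square actually falls on Great Britain.
    """
    match = TILE_RE.match(tile_name)
    if match is None:
        raise ValueError(f"not a 5 km tile name: {tile_name!r}")
    letters, e_digit, n_digit, quadrant = match.groups()

    # Grid letters run A-Z with I omitted, laid out in 5 x 5 blocks.
    l1, l2 = (ord(ch) - ord("A") for ch in letters.upper())
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Index of the 100 km square, counted from the grid's false origin.
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = 19 - (l1 // 5) * 5 - (l2 // 5)

    dx, dy = QUADRANT_OFFSETS[quadrant.lower()]
    easting = e100km * 100_000 + int(e_digit) * 10_000 + dx
    northing = n100km * 100_000 + int(n_digit) * 10_000 + dy
    return easting, northing


print(tile_origin("SP52sw"))  # (450000, 220000)
```

A mapping like this is useful for deciding which label polygons overlap a tile without opening the .tif.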

Rather than training a model on raw high-precision elevation data, data/lidar_helper.py applies a hillshade filter as part of the image preprocessing step. This serves two purposes: it amplifies the local relief we expect to be important, and it compresses the data.
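
To make the idea concrete, here is a minimal, dependency-free sketch of a hillshade: each pixel's brightness is the dot product of the terrain's surface normal with the direction of a light source. It is not the code in data/lidar_helper.py. The function name, the light position (azimuth 315 degrees, altitude 45 degrees) and the 8-bit output range are assumptions chosen for illustration.

```python
import math


def hillshade(dtm, cell_size=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Return an 8-bit hillshade (rows of ints in 0..255) for a DTM given as a list of rows.

    Row 0 is assumed to be the northern edge, as in a north-up GeoTIFF.
    """
    n_rows, n_cols = len(dtm), len(dtm[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least a 2 x 2 grid")

    az, alt = math.radians(azimuth_deg), math.radians(altitude_deg)
    # Unit vector pointing towards the light: x = east, y = north, z = up.
    light = (math.sin(az) * math.cos(alt), math.cos(az) * math.cos(alt), math.sin(alt))

    shaded = []
    for r in range(n_rows):
        r_north, r_south = max(r - 1, 0), min(r + 1, n_rows - 1)
        row_out = []
        for c in range(n_cols):
            c_west, c_east = max(c - 1, 0), min(c + 1, n_cols - 1)
            # Central differences in the interior, one-sided at the edges.
            dz_dx = (dtm[r][c_east] - dtm[r][c_west]) / ((c_east - c_west) * cell_size)
            dz_dy = (dtm[r_north][c] - dtm[r_south][c]) / ((r_south - r_north) * cell_size)
            # The surface normal of z = f(x, y) is (-dz/dx, -dz/dy, 1), normalised.
            norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            intensity = (-dz_dx * light[0] - dz_dy * light[1] + light[2]) / norm
            # Slopes facing away from the light are clipped to black.
            row_out.append(round(255 * max(intensity, 0.0)))
        shaded.append(row_out)
    return shaded
```

In this sketch the compression comes from the output type: one byte per pixel instead of a floating-point elevation. Absolute height is discarded and only local slope survives, which is the signal a low earthwork produces.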

Labels

The Historic England Scheduled Monuments dataset covers close to 20,000 scheduled monuments across England. Examples include Roman-era sites, barrows and tumuli, castles, earthworks, and the remains of ancient villages. Each element of this dataset is a detailed polygon tracing the perimeter of the feature. This makes it an excellent candidate for a training label, as it precisely masks the geographical features we expect to be present in the Lidar data.
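
Turning a monument polygon into a per-pixel label means marking every pixel whose centre falls inside the polygon. In practice a library routine such as rasterio.features.rasterize would likely do this. The dependency-free sketch below only illustrates the idea, using an even-odd ray casting test. The function names and the grid parameters (`west`, `north`, `cell_size`) are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise_polygon(polygon, west, north, cell_size, n_rows, n_cols):
    """Return a binary mask (list of rows of 0/1) for a north-up grid.

    (west, north) is the top-left corner of the grid in map units.
    A pixel is 1 when its centre lies inside the polygon.
    """
    mask = []
    for r in range(n_rows):
        y = north - (r + 0.5) * cell_size  # row 0 is the northern edge
        mask.append([
            int(point_in_polygon(west + (c + 0.5) * cell_size, y, polygon))
            for c in range(n_cols)
        ])
    return mask


# A 2 x 2 unit square in the middle of a 4 x 4 grid.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterise_polygon(square, west=0, north=4, cell_size=1, n_rows=4, n_cols=4):
    print(row)
```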

The model

Our task is to build and train a neural network model that takes Lidar images as input and outputs a binary mask. The baseline model is based on the U-Net architecture, first developed for biomedical image segmentation.

The model can be found in model/unet.py.

The code

Data pipeline

The scripts under data/ perform the bulk of data acquisition and preprocessing.

  • data/lidar_downloader.py: SFTP interface (using paramiko) for connecting to the Defra SFTP server, listing contents and downloading data in smaller chunks. For this to work, see How to run.

  • data/lidar_helper.py: Helper functions for processing Lidar data into model-ready features. Uses rasterio and geopandas.

  • data/lidar_plan.py: Manages the processing pipeline asynchronously using task queues shared between multiple processes. This speeds up the overall work and makes it possible to run the pipeline on smaller chunks for testing before scaling up.

  • data/synthetic_data.py: Utility for creating synthetic features using real masks with noise. Not a great model for real Lidar data, but handy for verifying model convergence and sanity.

  • util/data_loading.py: Implementation of the PyTorch Dataset class for accessing the features and labels. This file contains both LidarDataset and LidarDatasetSynthetic, which means we can swap between the two.
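
One plausible reading of "real masks with noise" is a feature image that is bright where the mask is set and dark elsewhere, with Gaussian noise added to every pixel. The sketch below follows that reading. It is an assumption, not the code in data/synthetic_data.py, and the function name and parameters are hypothetical.

```python
import random


def synthetic_feature(mask, signal=1.0, noise_sigma=0.1, seed=None):
    """Build a noisy feature image from a binary mask (list of rows of 0/1).

    Pixels inside the mask are centred on `signal`, pixels outside on 0,
    and every pixel gets independent Gaussian noise.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [
        [signal * m + rng.gauss(0.0, noise_sigma) for m in row]
        for row in mask
    ]


mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
feature = synthetic_feature(mask, seed=0)
print(len(feature), len(feature[0]))  # 3 3
```

Because the label is recoverable from such a feature almost by thresholding, a segmentation model that fails to converge on it points to a bug in the training loop rather than to hard data.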

Model

  • model/unet.py: Implementation of U-Net using PyTorch.

Training

[Under construction]

  • train.py: [TODO] Helper functions for training
  • train.ipynb: [TODO] Notebook for running training + visualisation

Results

[Under construction]

How to run

If you want to run this code yourself, you will need to download the following data:

  1. UK National Lidar Programme. Refer to the link for download options. If downloading via SFTP, create a file named sftpconfig.json in the project root with the following contents:
{
  "SFTP_USER": "",
  "SFTP_HOST": "",
  "SFTP_PASSWORD": "",
  "SFTP_REMOTE_DIRECTORY": ""
}
  2. Historic England Scheduled Monuments. Extract to {SHAPE_PATH}/monuments/ such that the file {SHAPE_PATH}/monuments/Scheduled_monuments.shp exists.
  3. UK boundaries. This is used to mask out the sea on tiles straddling the coastline. Extract to {SHAPE_PATH}/gb/.
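
The sftpconfig.json file from step 1 ships with empty values, so it is easy to start a long download with a setting left blank. How data/lidar_downloader.py reads the file is not shown here. The sketch below is one way to load it and fail early, using only the four keys listed above. The function name `load_sftp_config` is hypothetical.

```python
import json

REQUIRED_KEYS = ("SFTP_USER", "SFTP_HOST", "SFTP_PASSWORD", "SFTP_REMOTE_DIRECTORY")


def load_sftp_config(path):
    """Load the SFTP settings and raise ValueError if any is missing or empty."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path}: missing or empty settings: {', '.join(missing)}")
    return config
```

Since the file holds a password, it is worth keeping it out of version control.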

Run full data pipeline

from data.lidar_plan import DataPipeline
from data.lidar_downloader import list_files

# RAW_PATH, OUT_PATH and SHAPE_PATH are placeholders: set them to directories on your machine.

# Run this the first time. It lists the contents of all DTM files on the remote
# server and saves the listing to ./ls.txt, which is subsequently used to manage the task queue.
list_files('sftpconfig.json', 'ls.txt')

pipeline = DataPipeline(
    data_raw_path=RAW_PATH,    # location the .zip files will be downloaded to
    data_out_path=OUT_PATH,    # location the preprocessed features and masks will be placed
    shape_path=SHAPE_PATH,     # location the monuments and UK boundaries datasets were extracted to
    remote_ls_file='ls.txt',
)

# Run the entire pipeline on the first 500 items. By default it will spin up 1 process for downloading,
# 1 for unzipping, and 2 for preprocessing.
pipeline.run(N=500)

See the if __name__=='__main__' section of data/lidar_plan.py for example usage.
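
The worker layout mentioned in the pipeline comments (1 worker downloading, 1 unzipping, 2 preprocessing) is a chain of stages joined by queues. The sketch below shows that pattern in miniature and is not the DataPipeline implementation. To stay short and portable it uses threads and queue.Queue, where the repo uses separate processes. The names `run_stages` and `_worker`, and the stand-in stage functions, are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def _worker(func, q_in, q_out):
    """Pull items from q_in, apply func, and push results to q_out until told to stop."""
    while True:
        item = q_in.get()
        if item is _STOP:
            break
        q_out.put(func(item))


def run_stages(items, stages):
    """Run items through a chain of stages; stages is a list of (func, n_workers).

    All stages run concurrently. The order of the results is not guaranteed
    when a stage has more than one worker.
    """
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = []
    for i, (func, n_workers) in enumerate(stages):
        pool = [
            threading.Thread(target=_worker, args=(func, queues[i], queues[i + 1]))
            for _ in range(n_workers)
        ]
        for thread in pool:
            thread.start()
        pools.append(pool)

    for item in items:
        queues[0].put(item)

    # Shut down stage by stage: a stage's workers are stopped only after
    # everything upstream of them has finished producing.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(_STOP)
        for thread in pool:
            thread.join()

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


# Stand-ins for download, unzip and preprocess.
stages = [
    (lambda name: name + ".zip", 1),
    (lambda archive: archive[:-4], 1),
    (lambda tile: tile.upper(), 2),
]
print(sorted(run_stages(["ny11se", "sp52sw"], stages)))  # ['NY11SE', 'SP52SW']
```

Giving the slowest stage more workers, as the defaults above do for preprocessing, keeps the earlier stages from sitting idle behind it.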
