This repo contains a suite of tools used for training an image segmentation model on UK Lidar data with the aim of detecting archaeological features in the landscape.
The UK National Lidar Programme provides 1m-resolution digital terrain model (DTM) elevation data across a large fraction of the UK, made available by Defra. Because Lidar can penetrate surface vegetation and tree cover, it can reveal topographical features that are not visible in satellite imagery. Lidar has previously been used in regions of dense forest canopy to identify sites of historic or archaeological significance, and its use has been growing.
The raw data from Defra consists of around 400GB across over 5000 tiles, each tile covering a 5km x 5km square in the British National Grid system.
The data for each tile is stored in a directory containing a `.tif` with the raw elevation data (50-100MB per tile) and a subdirectory `index/` containing geospatial metadata:
```
lidarnn_raw/LIDAR-DTM-1m-2022-NY11se/
|-- index/
|-- lidar_used_in_merging_process/
|-- SP52sw-DTM-1m.tif
|-- SP52sw-DTM-1m.tif.xml
`-- SP52sw-DTM-1m.tfw
```
Rather than training a model on raw high-precision elevation data, `data/lidar_helper.py` applies a hillshade filter as part of the image preprocessing step. This serves a dual purpose: it amplifies the local features we expect to be important, and it compresses the data.
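For reference, a hillshade can be computed directly from the elevation gradients. The sketch below is a standard textbook formulation, not necessarily the exact filter used in `data/lidar_helper.py`:

```python
import numpy as np

def hillshade(dtm, azimuth_deg=315.0, altitude_deg=45.0, cellsize=1.0):
    """Compute a simple hillshade (0-255 uint8) from a 2-D DTM array.

    dtm: elevations in metres; cellsize: grid spacing in metres (1 m here).
    """
    az = np.radians(360.0 - azimuth_deg + 90.0)   # compass -> math convention
    alt = np.radians(altitude_deg)
    dy, dx = np.gradient(dtm, cellsize)           # surface gradients
    slope = np.arctan(np.hypot(dx, dy))
    aspect = np.arctan2(dy, -dx)
    shaded = (np.sin(alt) * np.cos(slope)
              + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return (255 * np.clip(shaded, 0, 1)).astype(np.uint8)

# A flat surface gets uniform illumination proportional to sin(altitude)
flat = np.zeros((64, 64))
print(hillshade(flat)[0, 0])  # 180 for a 45-degree sun: 255 * sin(45°) ≈ 180
```

Note the output is a single byte per pixel, versus the float32/float64 elevations in the raw `.tif`, which is where the compression comes from.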
The Historic England Scheduled Monuments dataset covers close to 20,000 scheduled monuments across the UK. Examples include Roman-era sites, barrows and tumuli, castles, earthworks, and the remains of ancient villages. Each element of this dataset is a detailed polygon tracing the perimeter of the feature. This makes it an excellent candidate for a training label, as it precisely masks the geographical features we expect to be present in the Lidar data.
Our task is to build and train a neural network that takes Lidar images as input and outputs a binary mask. The baseline model is based on the U-Net architecture, first developed for biomedical image segmentation. The model can be found in `model/unet.py`.
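In interface terms (a hypothetical sketch; the real network is in `model/unet.py`), the model maps a batch of single-channel hillshade tiles to per-pixel logits of the same spatial size, and a sigmoid plus threshold yields the binary mask:

```python
import torch
from torch import nn

# Stand-in for the U-Net: any module that preserves spatial size fits here.
model = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

tiles = torch.randn(4, 1, 256, 256)    # batch of hillshade tiles
logits = model(tiles)                  # per-pixel scores, same H x W
mask = torch.sigmoid(logits) > 0.5     # binary segmentation mask
print(logits.shape, mask.dtype)        # torch.Size([4, 1, 256, 256]) torch.bool
```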
The scripts under `data/` perform the bulk of data acquisition and preprocessing.
- `data/lidar_downloader.py` - SFTP interface (using `paramiko`) for connecting to the Defra SFTP server, listing its contents, and downloading data in smaller chunks. For this to work, see How To Run.
- `data/lidar_helper.py` - Helper functions for processing Lidar data into model-ready features. Uses `rasterio` and `geopandas`.
- `data/lidar_plan.py` - Manages the processing pipeline asynchronously using task queues shared between multiple processes. This speeds up the overall work, and allows the pipeline to be run on smaller chunks for testing before scaling up.
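The queue pattern behind `data/lidar_plan.py` can be sketched with the standard library. Stage logic and names here are illustrative, not the actual code:

```python
import multiprocessing as mp

def stage(in_q, out_q):
    """One pipeline stage: consume tasks until a None sentinel arrives."""
    while True:
        task = in_q.get()
        if task is None:          # sentinel: no more work upstream
            out_q.put(None)
            return
        out_q.put(task * 2)       # stand-in for download/unzip/preprocess work

if __name__ == "__main__":
    in_q, out_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=stage, args=(in_q, out_q))
    worker.start()
    for task in range(3):
        in_q.put(task)
    in_q.put(None)                # signal end of work
    results = []
    while (item := out_q.get()) is not None:
        results.append(item)
    worker.join()
    print(results)  # [0, 2, 4]
```

Chaining several such stages (download, unzip, preprocess) gives each one its own process, so slow network I/O overlaps with CPU-bound preprocessing.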
- `data/synthetic_data.py` - Utility for creating synthetic features using real masks with noise. Not a great model for real Lidar data, but handy for verifying model convergence and sanity.
- `util/data_loading.py` - Implementation of a PyTorch `Dataset` class for accessing the features and labels. This file contains implementations of both `LidarDataset` and `LidarDatasetSynthetic`, which means we can swap between real and synthetic data without changing the rest of the training code.
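The swap works because both dataset classes expose the same `(feature, mask)` interface to the `DataLoader`. A toy stand-in (hypothetical, not the real classes) illustrates the contract:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyLidarDataset(Dataset):
    """Stand-in with the same contract as LidarDataset / LidarDatasetSynthetic."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        feature = torch.zeros(1, 32, 32)   # hillshade tile
        mask = torch.zeros(1, 32, 32)      # binary label
        return feature, mask

# The training loop only sees (feature, mask) batches, so either dataset
# class can be passed here without further changes.
loader = DataLoader(ToyLidarDataset(), batch_size=4)
features, masks = next(iter(loader))
print(features.shape)  # torch.Size([4, 1, 32, 32])
```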
- `model/unet.py` - Implementation of U-Net using PyTorch. [Under construction]
- `train.py` - [TODO] Helper functions for training.
- `train.ipynb` - [TODO] Notebook for running training + visualisation. [Under construction]
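Until `train.py` is filled in, a minimal training step for this setup might look roughly like the following sketch. The model and data are placeholders, not the real components:

```python
import torch
from torch import nn

# Placeholder model and batch; a real run would use model/unet.py and
# batches from util/data_loading.py.
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()     # binary per-pixel mask -> BCE on logits

features = torch.randn(4, 1, 64, 64)   # hillshade tiles
masks = torch.zeros(4, 1, 64, 64)      # binary labels

optimizer.zero_grad()
loss = criterion(model(features), masks)
loss.backward()
optimizer.step()
print(float(loss))                     # scalar loss for this step
```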
If you want to run this code yourself, you will need to download the following data:
- UK National Lidar Programme. Refer to the link for download options. If downloading via SFTP, create the following file named `sftpconfig.json` in the project root:
```json
{
    "SFTP_USER": "",
    "SFTP_HOST": "",
    "SFTP_PASSWORD": "",
    "SFTP_REMOTE_DIRECTORY": ""
}
```
- Historic England Scheduled Monuments. Extract to `{SHAPE_PATH}/monuments/` such that the file `{SHAPE_PATH}/monuments/Scheduled_monuments.shp` exists.
- UK boundaries. This is used to mask out the sea on tiles straddling the coastline. Extract to `{SHAPE_PATH}/gb/`.
```python
from data.lidar_plan import DataPipeline
from data.lidar_downloader import list_files

# Run this the first time - it lists the contents of all DTM files on the remote
# server and saves it to ./ls.txt. This file is subsequently used to manage the task queue.
list_files('sftpconfig.json', 'ls.txt')

pipeline = DataPipeline(
    data_raw_path=RAW_PATH,    # location .zip files will be downloaded to
    data_out_path=OUT_PATH,    # location preprocessed features and masks will be placed
    shape_path=SHAPE_PATH,     # location the monuments and UK boundaries datasets were extracted to
    remote_ls_file='ls.txt',
)

# Run the entire pipeline on the first 500 items. By default it will spin up 1 process
# for downloading, 1 for unzipping, and 2 for preprocessing.
pipeline.run(N=500)
```
See the `if __name__ == '__main__'` section of `data/lidar_plan.py` for example usage.