amazonlinux-gdal's Introduction

amazonlinux-gdal

Create an AWS Lambda-like Docker image with Python 3 and GDAL.

Inspired by developmentseed/geolambda and mojodna/lambda-layer-rasterio.

The aim of this repo is to build a Docker image to use when creating an AWS Lambda package (with Python 3.6 or 3.7, depending on the GDAL version).

GDAL Versions

  • 3.0.1 (5 July 2019) - python 3.7

    • Docker: remotepixel/amazonlinux-gdal:3.0.1
    • Github Branch: gdal3.0.1
  • 3.0.0 (9 May 2019) - python 3.7

    • Docker: remotepixel/amazonlinux-gdal:3.0.0
    • Github Branch: gdal3.0.0
  • 2.4.2 (5 July 2019) - python 3.6

    • Docker: remotepixel/amazonlinux-gdal:2.4.2
    • Github Branch: gdal2.4.2
  • 2.4.1 (22 March 2019) - python 3.6

    • Docker: remotepixel/amazonlinux-gdal:2.4.1
    • Github Branch: gdal2.4.1
  • 2.4.0 (14 Dec 2018) - python 3.6

    • Docker: remotepixel/amazonlinux-gdal:2.4.0
    • Github Branch: gdal2.4.0
  • 2.3.2 (21 Sep 2018) - python 3.6

    • Docker: remotepixel/amazonlinux-gdal:2.3.2
    • Github Branch: gdal2.3.2
Image with minimal support: -light (no HTTP/2, no JPEGTURBO)
  • 2.4.0 (14 Dec 2018) - python 3.6
    • Docker: remotepixel/amazonlinux-gdal:2.4.0-light
    • Github Branch: gdal2.4.0-light
  • 2.3.2 (21 Sep 2018) - python 3.6
    • Docker: remotepixel/amazonlinux-gdal:2.3.2-light
    • Github Branch: gdal2.3.2-light
Deprecated
  • 2.5.0dev (HEAD)
    • Docker: remotepixel/amazonlinux-gdal:2.5.0dev
    • Github Branch: gdal2.5.0dev
    • Docker: remotepixel/amazonlinux-gdal:gdal2.5.0dev-light
    • Github Branch: gdal2.5.0dev-light

Available Drivers (shipped with GDAL)

  • Proj4
  • GEOS
  • GeoTIFF
  • ZSTD
  • WEBP
  • JPEG2000
  • ngHTTP2 # Not in -light versions
  • curl # Not in -light versions
  • PNG # Not in -light versions
  • JPEGTURBO # Not in -light versions

Note: Drivers like curl and PNG are enabled by default. If you use a -light version, GDAL will use the default libraries available on the amazonlinux instance.

Available drivers

  $ gdalinfo --formats
  Supported Formats:
    VRT -raster- (rw+v): Virtual Raster
    DERIVED -raster- (ro): Derived datasets using VRT pixel functions
    GTiff -raster- (rw+vs): GeoTIFF
    NITF -raster- (rw+vs): National Imagery Transmission Format
    RPFTOC -raster- (rovs): Raster Product Format TOC format
    ECRGTOC -raster- (rovs): ECRG TOC format
    HFA -raster- (rw+v): Erdas Imagine Images (.img)
    SAR_CEOS -raster- (rov): CEOS SAR Image
    CEOS -raster- (rov): CEOS Image
    JAXAPALSAR -raster- (rov): JAXA PALSAR Product Reader (Level 1.1/1.5)
    GFF -raster- (rov): Ground-based SAR Applications Testbed File Format (.gff)
    ELAS -raster- (rw+v): ELAS
    AIG -raster- (rov): Arc/Info Binary Grid
    AAIGrid -raster- (rwv): Arc/Info ASCII Grid
    GRASSASCIIGrid -raster- (rov): GRASS ASCII Grid
    SDTS -raster- (rov): SDTS Raster
    DTED -raster- (rwv): DTED Elevation Raster
    PNG -raster- (rwv): Portable Network Graphics
    JPEG -raster- (rwv): JPEG JFIF
    MEM -raster- (rw+): In Memory Raster
    JDEM -raster- (rov): Japanese DEM (.mem)
    ESAT -raster- (rov): Envisat Image Format
    XPM -raster- (rwv): X11 PixMap Format
    BMP -raster- (rw+v): MS Windows Device Independent Bitmap
    DIMAP -raster- (rov): SPOT DIMAP
    AirSAR -raster- (rov): AirSAR Polarimetric Image
    RS2 -raster- (rovs): RadarSat 2 XML Product
    SAFE -raster- (rov): Sentinel-1 SAR SAFE Product
    ILWIS -raster- (rw+v): ILWIS Raster Map
    SGI -raster- (rw+v): SGI Image File Format 1.0
    SRTMHGT -raster- (rwv): SRTMHGT File Format
    Leveller -raster- (rw+v): Leveller heightfield
    Terragen -raster- (rw+v): Terragen heightfield
    ISIS3 -raster- (rw+v): USGS Astrogeology ISIS cube (Version 3)
    ISIS2 -raster- (rw+v): USGS Astrogeology ISIS cube (Version 2)
    PDS -raster- (rov): NASA Planetary Data System
    PDS4 -raster- (rw+vs): NASA Planetary Data System 4
    VICAR -raster- (rov): MIPL VICAR file
    TIL -raster- (rov): EarthWatch .TIL
    ERS -raster- (rw+v): ERMapper .ers Labelled
    JP2OpenJPEG -raster,vector- (rwv): JPEG-2000 driver based on OpenJPEG library
    L1B -raster- (rovs): NOAA Polar Orbiter Level 1b Data Set
    FIT -raster- (rwv): FIT Image
    RMF -raster- (rw+v): Raster Matrix Format
    WCS -raster- (rovs): OGC Web Coverage Service
    WMS -raster- (rwvs): OGC Web Map Service
    MSGN -raster- (rov): EUMETSAT Archive native (.nat)
    RST -raster- (rw+v): Idrisi Raster A.1
    INGR -raster- (rw+v): Intergraph Raster
    GSAG -raster- (rwv): Golden Software ASCII Grid (.grd)
    GSBG -raster- (rw+v): Golden Software Binary Grid (.grd)
    GS7BG -raster- (rw+v): Golden Software 7 Binary Grid (.grd)
    COSAR -raster- (rov): COSAR Annotated Binary Matrix (TerraSAR-X)
    TSX -raster- (rov): TerraSAR-X Product
    COASP -raster- (ro): DRDC COASP SAR Processor Raster
    R -raster- (rwv): R Object Data Store
    MAP -raster- (rov): OziExplorer .MAP
    KMLSUPEROVERLAY -raster- (rwv): Kml Super Overlay
    WEBP -raster- (rwv): WEBP
    PDF -raster,vector- (w+): Geospatial PDF
    PLMOSAIC -raster- (ro): Planet Labs Mosaics API
    CALS -raster- (rwv): CALS (Type 1)
    WMTS -raster- (rwv): OGC Web Map Tile Service
    SENTINEL2 -raster- (rovs): Sentinel 2
    PNM -raster- (rw+v): Portable Pixmap Format (netpbm)
    DOQ1 -raster- (rov): USGS DOQ (Old Style)
    DOQ2 -raster- (rov): USGS DOQ (New Style)
    PAux -raster- (rw+v): PCI .aux Labelled
    MFF -raster- (rw+v): Vexcel MFF Raster
    MFF2 -raster- (rw+): Vexcel MFF2 (HKV) Raster
    FujiBAS -raster- (rov): Fuji BAS Scanner Image
    GSC -raster- (rov): GSC Geogrid
    FAST -raster- (rov): EOSAT FAST Format
    BT -raster- (rw+v): VTP .bt (Binary Terrain) 1.3 Format
    LAN -raster- (rw+v): Erdas .LAN/.GIS
    CPG -raster- (rov): Convair PolGASP
    IDA -raster- (rw+v): Image Data and Analysis
    NDF -raster- (rov): NLAPS Data Format
    EIR -raster- (rov): Erdas Imagine Raw
    DIPEx -raster- (rov): DIPEx
    LCP -raster- (rwv): FARSITE v.4 Landscape File (.lcp)
    GTX -raster- (rw+v): NOAA Vertical Datum .GTX
    LOSLAS -raster- (rov): NADCON .los/.las Datum Grid Shift
    NTv1 -raster- (rov): NTv1 Datum Grid Shift
    NTv2 -raster- (rw+vs): NTv2 Datum Grid Shift
    CTable2 -raster- (rw+v): CTable2 Datum Grid Shift
    ACE2 -raster- (rov): ACE2
    SNODAS -raster- (rov): Snow Data Assimilation System
    KRO -raster- (rw+v): KOLOR Raw
    ROI_PAC -raster- (rw+v): ROI_PAC raster
    RRASTER -raster- (rw+v): R Raster
    BYN -raster- (rw+v): Natural Resources Canada's Geoid
    ARG -raster- (rwv): Azavea Raster Grid format
    RIK -raster- (rov): Swedish Grid RIK (.rik)
    USGSDEM -raster- (rwv): USGS Optional ASCII DEM (and CDED)
    GXF -raster- (rov): GeoSoft Grid Exchange Format
    NWT_GRD -raster- (rw+v): Northwood Numeric Grid Format .grd/.tab
    NWT_GRC -raster- (rov): Northwood Classified Grid Format .grc/.tab
    ADRG -raster- (rw+vs): ARC Digitized Raster Graphics
    SRP -raster- (rovs): Standard Raster Product (ASRP/USRP)
    BLX -raster- (rwv): Magellan topo (.blx)
    SAGA -raster- (rw+v): SAGA GIS Binary Grid (.sdat, .sg-grd-z)
    XYZ -raster- (rwv): ASCII Gridded XYZ
    HF2 -raster- (rwv): HF2/HFZ heightfield raster
    OZI -raster- (rov): OziExplorer Image File
    CTG -raster- (rov): USGS LULC Composite Theme Grid
    E00GRID -raster- (rov): Arc/Info Export E00 GRID
    ZMap -raster- (rwv): ZMap Plus Grid
    NGSGEOID -raster- (rov): NOAA NGS Geoid Height Grids
    IRIS -raster- (rov): IRIS data (.PPI, .CAPPi etc)
    PRF -raster- (rov): Racurs PHOTOMOD PRF
    RDA -raster- (ro): DigitalGlobe Raster Data Access driver
    EEDAI -raster- (ros): Earth Engine Data API Image
    SIGDEM -raster- (rwv): Scaled Integer Gridded DEM .sigdem
    IGNFHeightASCIIGrid -raster- (rov): IGN France height correction ASCII Grid
    CAD -raster,vector- (rovs): AutoCAD Driver
    PLSCENES -raster,vector- (ro): Planet Labs Scenes API
    NGW -raster,vector- (rw+s): NextGIS Web
    GenBin -raster- (rov): Generic Binary (.hdr Labelled)
    ENVI -raster- (rw+v): ENVI .hdr Labelled
    EHdr -raster- (rw+v): ESRI .hdr Labelled
    ISCE -raster- (rw+v): ISCE raster
    HTTP -raster,vector- (ro): HTTP Fetching Wrapper

Use it from Docker Hub

FROM remotepixel/amazonlinux-gdal:{TAG}

Docker environment variables

A couple environment variables are set when creating the images:

  • PREFIX: Path where GDAL has been installed, should be /var/task
  • GDAL_DATA: $PREFIX/share/gdal
  • PROJ_LIB: $PREFIX/share/proj
  • GDAL_CONFIG: $PREFIX/bin/gdal-config
  • GEOS_CONFIG: $PREFIX/bin/geos-config
  • GDAL_VERSION: version of GDAL
  • PATH has been updated to add $PREFIX/bin in order to access gdal binaries
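
As a rough illustration of how the path variables above relate to PREFIX, here is a minimal Python sketch. The helper names `gdal_env` and `report` are hypothetical (they are not part of the images); the sketch only rebuilds the documented values and compares them with an environment mapping.

```python
import os


def gdal_env(prefix="/var/task"):
    """Return the path variables listed above, derived from PREFIX."""
    return {
        "PREFIX": prefix,
        "GDAL_DATA": f"{prefix}/share/gdal",
        "PROJ_LIB": f"{prefix}/share/proj",
        "GDAL_CONFIG": f"{prefix}/bin/gdal-config",
        "GEOS_CONFIG": f"{prefix}/bin/geos-config",
    }


def report(environ=None):
    """Map each variable name to True when it matches the documented value."""
    environ = os.environ if environ is None else environ
    expected = gdal_env(environ.get("PREFIX", "/var/task"))
    return {name: environ.get(name) == value for name, value in expected.items()}
```

Calling `report()` inside a container started from one of the images should, under these assumptions, return True for every name.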

Create a Lambda package

docker run --name lambda -itd remotepixel/amazonlinux-gdal:2.4.1 /bin/bash
docker exec -it lambda bash -c 'pip3 install rasterio[s3] --no-binary numpy,rasterio -t /tmp/python -U'
docker exec -it lambda bash -c 'cd /tmp/python; zip -r9q /tmp/package.zip *'
docker exec -it lambda bash -c 'cd /var/task; zip -r9q --symlinks /tmp/package.zip lib/*.so*'
docker exec -it lambda bash -c 'cd /var/task; zip -r9q --symlinks /tmp/package.zip lib64/*.so*' # This step is not needed for `-light` image
docker exec -it lambda bash -c 'cd /var/task; zip -r9q /tmp/package.zip share'
docker cp lambda:/tmp/package.zip package.zip
docker stop lambda
docker rm lambda

You can find a more complex example in https://github.com/RemotePixel/remotepixel-tiler/blob/master/Dockerfile

Create a Lambda layer

TODO

Package architecture and AWS Lambda config

⚠️ AWS Lambda will need GDAL_DATA to be set to /var/task/share/gdal to be able to work ⚠️

package.zip
  |
  |___ lib/      # Shared libraries (GDAL, PROJ, GEOS...)
  |___ lib64/    # Shared libraries (64bits only)
  |___ share/    # GDAL/PROJ data directories   
  |___ rasterio/
  ....
  |___ other python module
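
Before uploading, it can help to confirm that the archive really has this layout. The following is a minimal sketch using only the standard library; the function names and the choice of required directories are assumptions made for illustration (lib64 is left out of the default because the -light images do not need it).

```python
import zipfile

# Directories expected at the root of package.zip (assumed minimum set).
REQUIRED_DIRS = ("lib", "share")


def top_level_entries(zip_path):
    """Return the set of first path components found in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return {name.split("/", 1)[0] for name in archive.namelist()}


def missing_dirs(zip_path, required=REQUIRED_DIRS):
    """List the required top-level directories that the archive lacks."""
    present = top_level_entries(zip_path)
    return [directory for directory in required if directory not in present]
```

For example, `missing_dirs("package.zip")` returns an empty list when both `lib/` and `share/` are present.
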
Using HTTP/2 in AWS Lambda

By default, the libcurl shipped in AWS Lambda doesn't support HTTP/2, which is why the Docker images were built with a custom libcurl (compiled with nghttp2). To enable HTTP/2 features in GDAL, set the following environment variables:

  • GDAL_HTTP_MERGE_CONSECUTIVE_RANGES: YES
  • GDAL_HTTP_MULTIPLEX: YES
  • GDAL_HTTP_VERSION: 2

more info in #7

Shared libraries

By default the package will be unarchived in the /var/task/ directory on AWS Lambda. LD_LIBRARY_PATH is set to look in

/lib64:/usr/lib64:$LAMBDA_RUNTIME_DIR:$LAMBDA_RUNTIME_DIR/lib:$LAMBDA_TASK_ROOT:$LAMBDA_TASK_ROOT/lib:/opt/lib

which means it will be able to find any shared libs in /var/task/lib but not in /var/task/lib64. To overcome this, the non -light versions of GDAL have been compiled with /var/task/lib and /var/task/lib64 set as priority shared library paths (-rpath).

more info in #7 (comment)
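
The lookup problem described above can be checked with a small Python sketch that expands the LD_LIBRARY_PATH value quoted earlier. The value `/var/runtime` for `LAMBDA_RUNTIME_DIR` is an assumption made for illustration, and the helper names are hypothetical.

```python
import posixpath
from string import Template

LAMBDA_LD_LIBRARY_PATH = (
    "/lib64:/usr/lib64:$LAMBDA_RUNTIME_DIR:$LAMBDA_RUNTIME_DIR/lib:"
    "$LAMBDA_TASK_ROOT:$LAMBDA_TASK_ROOT/lib:/opt/lib"
)

# LAMBDA_TASK_ROOT follows the text above; LAMBDA_RUNTIME_DIR is assumed.
DEFAULT_VARIABLES = {
    "LAMBDA_TASK_ROOT": "/var/task",
    "LAMBDA_RUNTIME_DIR": "/var/runtime",
}


def search_dirs(ld_library_path=LAMBDA_LD_LIBRARY_PATH, variables=DEFAULT_VARIABLES):
    """Expand $VARIABLES and return the directories in search order."""
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return [
        posixpath.normpath(Template(entry).safe_substitute(variables))
        for entry in entries
    ]


def is_searched(directory, **kwargs):
    """Tell whether the dynamic loader would look in the given directory."""
    return posixpath.normpath(directory) in search_dirs(**kwargs)
```

With these assumptions, `is_searched("/var/task/lib")` is true while `is_searched("/var/task/lib64")` is false, which is the gap that the -rpath setting works around.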

Optimal AWS Lambda config

  • GDAL_DATA: /var/task/share/gdal
  • GDAL_CACHEMAX: 512
  • VSI_CACHE: TRUE
  • VSI_CACHE_SIZE: 536870912
  • CPL_TMPDIR: "/tmp"
  • GDAL_HTTP_MERGE_CONSECUTIVE_RANGES: YES
  • GDAL_HTTP_MULTIPLEX: YES
  • GDAL_HTTP_VERSION: 2
  • GDAL_DISABLE_READDIR_ON_OPEN: "EMPTY_DIR"
  • CPL_VSIL_CURL_ALLOWED_EXTENSIONS: ".TIF,.tif,.jp2,.vrt"
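
These values are normally entered in the Lambda function configuration. As a sketch of an alternative, assuming GDAL picks its configuration options up from the process environment, a handler module could apply them as defaults before importing rasterio. The names `OPTIMAL_CONFIG` and `apply_config` are hypothetical.

```python
import os

# Values copied from the list above; all are passed to GDAL as strings.
OPTIMAL_CONFIG = {
    "GDAL_DATA": "/var/task/share/gdal",
    "GDAL_CACHEMAX": "512",
    "VSI_CACHE": "TRUE",
    "VSI_CACHE_SIZE": "536870912",  # 512 MiB expressed in bytes
    "CPL_TMPDIR": "/tmp",
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",
    "GDAL_HTTP_MULTIPLEX": "YES",
    "GDAL_HTTP_VERSION": "2",
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": ".TIF,.tif,.jp2,.vrt",
}


def apply_config(config=OPTIMAL_CONFIG, environ=None, override=False):
    """Copy config into environ, keeping existing values unless override is set."""
    environ = os.environ if environ is None else environ
    for name, value in config.items():
        if override or name not in environ:
            environ[name] = value
    return environ
```

Calling `apply_config()` at the top of the handler module, before `import rasterio`, leaves any value already set in the function configuration untouched.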

amazonlinux-gdal's Issues

archiving the repo

With the work started in #28 I've decided to move the new images over to the RemotePixel/amazonlinux repo.

2.3 Can't get Rasterio working on Lambda using amazonlinux-gdal:2.3.0

On Lambda, I run into:

Traceback (most recent call last):
  File "/var/task/complete_surveys.py", line 84, in lambda_handler
    import rasterio
  File "/var/task/vendored/rasterio/__init__.py", line 23, in <module>
    from rasterio._base import gdal_version
ImportError: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /var/task/lib/libgdal.so)

It works if I switch to the amazonlinux-gdal:2.2.2 image.

I think the error means that the problem is something like: gdal was built against a different version of gcc than what is on the Lambda environment.

I wonder if the issue stems from where you start your image:

FROM amazonlinux:latest

Which means amazonlinux:2.0.20180622.1 (amazonlinux tags), which is pretty new.

But, Amazon seems to say that Lambda runs on an older-sounding AMI amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2 (link).

AWS Lambda image is now shipped with HTTP2 support

curl 7.53.1 (x86_64-redhat-linux-gnu) libcurl/7.53.1 NSS/3.36 zlib/1.2.8 libidn2/0.16 libpsl/0.6.2 (+libicu/50.1.2) libssh2/1.4.2 nghttp2/1.21.1
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy PSL

We could remove the custom curl install, but I'll need to check on the GDAL side whether version 7.53.1 is too old.

GDAL_DATA and PROJ_LIB not set correctly?

Warnings on AWS Lambda

When running on Lambda, I see 2 warnings:

[WARNING]	2018-05-25T15:44:39.443Z	88a...	GDAL data files not located, GDAL_DATA not set
[WARNING]	2018-05-25T15:44:39.443Z	88a...	PROJ data files not located, PROJ_LIB not set

When I look at your Docker image, I find this:

bash-4.2# $GDAL_DATA
bash: /tmp/app/local/lib/gdal: No such file or directory
bash-4.2# $PROJ_LIB
bash-4.2# 

$GDAL_DATA wasn't pointing to a folder that exists.

Solution on the Docker image

I can find a file that I think $GDAL_DATA needs to point to (or at least its parent directory):

bash-4.2# find / -name gcs.csv
/tmp/app/local/share/gdal/gcs.csv

So maybe

  • $GDAL_DATA should point to: /tmp/app/local/share/gdal/
  • $PROJ_LIB should point to: /tmp/app/local/share/proj/

Solution for Lambda

Additionally you could add a line to your setup instructions:

Set these environment variables on AWS Lambda:
$GDAL_DATA to /var/task/share/gdal/
$PROJ_LIB to /var/task/share/proj/
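
One way to catch this kind of misconfiguration early is to check the two variables when the function starts. This is a minimal sketch, not something shipped with the image; the function name `missing_data_dirs` is hypothetical.

```python
import os


def missing_data_dirs(environ=None, names=("GDAL_DATA", "PROJ_LIB")):
    """Return {variable: problem} for variables unset or not pointing to a directory."""
    environ = os.environ if environ is None else environ
    problems = {}
    for name in names:
        value = environ.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = f"{value} is not a directory"
    return problems
```

An empty result means both variables point to existing directories; anything else can be logged before the first GDAL call.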

Automatic lambda layer and docker image creation

With #24 and #21 I think it's clear that we should have a way to create/update the docker image and also to have the lambda layer deployed.

As mentioned in #26 we should only focus on the latest GDAL versions (3.0.1, 2.4.2 and master). For the Python version it's a little bit more difficult, but I think supporting 3.6 and 3.7 should be enough?

so at the end the process should create

  • gdal2.4.2-py3.7 (layer + docker image)
  • gdal2.4.2-py3.6 (layer + docker image)
  • gdal3.0.1-py3.7 (layer + docker image)
  • gdal3.0.1-py3.6 (layer + docker image)
  • gdalmaster-py3.7 (layer + docker image)
  • gdalmaster-py3.6 (layer + docker image)
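
The list above is the product of the GDAL versions and the Python versions. A build script could generate it rather than hard-code it; the sketch below is only an illustration and its names are hypothetical.

```python
from itertools import product

GDAL_VERSIONS = ("2.4.2", "3.0.1", "master")
PYTHON_VERSIONS = ("3.7", "3.6")


def build_targets(gdal_versions=GDAL_VERSIONS, python_versions=PYTHON_VERSIONS):
    """Return one target name per (GDAL version, Python version) pair."""
    return [
        f"gdal{gdal}-py{python}"
        for gdal, python in product(gdal_versions, python_versions)
    ]
```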

To achieve this I think we should also refactor the whole repo architecture

gdal2/
    /python36
    /python37
gdal3/
    /python36
    /python37
gdalmaster
    /python36
    /python37

every commit to master should trigger build/deploy

Allow setting PREFIX at build time

First of all, thanks so much for putting this together. GDAL is a pain to compile.

I'm using your images with https://github.com/UnitedIncome/serverless-python-requirements to get GDAL working on AWS Lambda. Unfortunately, there's a conflict because both this project's Dockerfile and serverless-python-requirements attempt to use /var/task in the container's file system, the former to install GDAL and the latter to mount a volume. The path is hardcoded in both cases.

Ideally, the mount path should be overridable in serverless-python-requirements. Until then, it wouldn't hurt to make the PREFIX, currently set by a hardcoded ENV in the Dockerfile, overridable at build time by using an ARG. I've created a PR that accomplishes precisely this: #18. The default is still set to /var/task, so it shouldn't break anything.

Correct paths resulting package.zip

I had some trouble creating a layer using the script at this repo.

According to AWS documentation, contents of package.zip will be extracted to /opt. So the correct environment variables, GDAL_DATA, and my inclusion, PROJ_LIB, should point to /opt/gdal and /opt/proj. I have included the PROJ_LIB variable to avoid the proj.db not found error.

Besides, for the python packages, instead of zipping them to the root address of the zip file, we should zip them to a python directory at root level.

In the end, it should look like this:

package.zip
├── bin     # executables
├── lib     # libraries
├── lib64   # libraries 64-bit
├── python  # python packages
└── share   # shared libraries

And last, installing rasterio with --no-binary raises an error about the rasterio._shim module not existing. Installing with pip install rasterio fixes it. Also, I do not think it is necessary to install rasterio[s3] because boto3 is already included in Python Lambdas by default.

After doing all the changes I have mentioned above, I've successfully created a layer that I am currently using.
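
The re-rooting described in this issue can be sketched as a small path mapping: native directories stay at the root of the archive and everything else moves under `python/`. The set of native directory names is taken from the tree above; the function name `layer_arcname` is hypothetical.

```python
from pathlib import PurePosixPath

# Directories that keep their place at the root of the layer archive.
NATIVE_DIRS = {"bin", "lib", "lib64", "share"}


def layer_arcname(package_path):
    """Map a path from the function package layout to the layer layout."""
    path = PurePosixPath(package_path)
    if path.parts and path.parts[0] in NATIVE_DIRS:
        return str(path)
    return str(PurePosixPath("python") / path)
```

For instance, `layer_arcname("rasterio/__init__.py")` gives `python/rasterio/__init__.py`, while `lib/libgdal.so` is left unchanged.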

add png support

Not sure how/why but I've disabled PNG support in GDAL.
libpng is shipped by default in AWS Lambda images so we could add it back without increasing the size of the package.

I'm going to add custom libpng in the regular gdal:{TAG} and just enable --with-png for the -light version

Issue with curl version

When using the 2.3.2 and 2.4.0 images I get this error when using Rasterio to open a TIF on S3:
CPLE_AppDefined in GDAL was built against curl 7.59.0, but is running against 7.51.0. Runtime failure is likely !

I do not hit that error when using your old 2.2.2 image.

Does this have to do with the curl version you specify here?

CURL_VERSION=7.59.0 \

I am confused about the branch and tag names vs gdal version

Hi! Thanks for your work! I depend on it.

My problem is that I can't say which GDAL version is used in each Docker Image tag. I think that the tag name indicates the GDAL version but that is not exactly backed up by your README.

Details

The README mentions two GDAL versions (https://github.com/RemotePixel/amazonlinux-gdal#gdal-versions): master and 2.4.0

Later (https://github.com/RemotePixel/amazonlinux-gdal/tree/gdalmaster#version) under Version the README states that GDAL 2.4.0 is used regardless of the branch.

At Docker Hub (https://hub.docker.com/r/remotepixel/amazonlinux-gdal/tags), there are 3 tags available:
2.3.2
2.4.0
master

Suggestion

  • In your README I would state the relationship between branch names and GDAL versions more explicitly, something like this:

GDAL Versions

The branch name tells you which GDAL version is used:

  • master: (GDAL commit 9598f77e1 - 18 Jan 2019)
  • gdal2.4.0: GDAL 2.4.0
  • If it is true that the Docker Image tag indicates the GDAL version, then I recommend that you state that clearly on Docker Hub. Because you use your GitHub Readme on Docker Hub you could amend your Readme with something like:

Docker Images on Docker Hub

The tag indicates which version of GDAL is packaged with the image, just like the branch names in this repository. There may be more tags hosted on Docker Hub than there are branches in this repository, which just indicates that the image is an older one that I no longer maintain a branch for.
