
unpackqa's Introduction


A Python package for QA/QC bit unpacking and labeling in Earth science data products

  • Works with single QA values, 1D arrays (e.g. time series), or 2D arrays (e.g. full scenes).
  • Get a QA mask in a single line of code: unpackqa.unpack_to_array(qa,'LANDSAT_8_C2_L2_QAPixel','Cloud')
  • The same methods are used for all sensors, with specific product flags specified via arguments.
  • Common data products are included. Specifying bit flag information manually is also supported.
  • No file reading or writing; everything is handled as pre-loaded arrays.
  • Requires Python 3.6+, with NumPy and PyYAML as the only dependencies.

Installation

Install via pip:

pip install unpackqa

Documentation

https://sdtaylor.github.io/unpackqa

Quickstart

import numpy as np
from unpackqa import unpack_to_array

# Specify the Landsat 8 Collection 2 Level 2 QA Pixel
# see all identifiers in unpackqa.list_products()
l8_identifer = 'LANDSAT_8_C2_L2_QAPixel'

qa_array = np.array([[21284,0],[21284,0]])

unpack_to_array(qa_array, product = l8_identifer)
array([[[0, 0, 1, 0, 0, 1, 0, 0, 3, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],

       [[0, 0, 1, 0, 0, 1, 0, 0, 3, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]], dtype=uint8)

# The original shape is retained, with a new axis for the 
# 12 Landsat 8 QA flags. 
qa_array.shape
(2,2)
unpack_to_array(qa_array, product = l8_identifer).shape
(2,2,12)

# Masks for individual flags can also be obtained from a dictionary object
from unpackqa import unpack_to_dict

# See flags for each product with unpackqa.list_qa_flags()
flags = ['Cloud','Cloud Shadow']

flag_masks = unpack_to_dict(qa_array, product = l8_identifer, flags=flags)
flag_masks['Cloud'].shape
(2,2)
flag_masks['Cloud Shadow'].shape
(2,2)
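As a cross-check of the values above, the same flags can be read from a packed QA value with plain NumPy bitwise operations. This is a minimal sketch, not part of unpackqa: `extract_bits` is a hypothetical helper, and the bit positions used (bit 2 = Cirrus, bit 3 = Cloud, bits 8-9 = Cloud Confidence) are assumed from the USGS Landsat 8 Collection 2 Level 2 QA_PIXEL layout.

```python
import numpy as np

def extract_bits(qa, start_bit, n_bits=1):
    """Return the integer stored in n_bits bits of qa, starting at start_bit.

    Hypothetical helper for illustration; not an unpackqa function.
    """
    qa = np.asarray(qa)
    # Shift the wanted bits down to position 0, then mask off everything else.
    return (qa >> start_bit) & ((1 << n_bits) - 1)

qa_value = 21284

# Assumed Landsat 8 C2 L2 QA_PIXEL bit positions.
cirrus = extract_bits(qa_value, 2)
cloud = extract_bits(qa_value, 3)
cloud_confidence = extract_bits(qa_value, 8, n_bits=2)

# The same helper works on whole arrays and keeps their shape.
qa_array = np.array([[21284, 0], [21284, 0]])
confidence_mask = extract_bits(qa_array, 8, n_bits=2)

print(np.binary_repr(qa_value, width=16))
print(int(cirrus), int(cloud), int(cloud_confidence))
```

Here Cirrus is 1, Cloud is 0, and Cloud Confidence is 3, which agrees with positions 2, 3, and 8 of the unpacked array shown in the Quickstart.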

unpackqa's Issues

custom product qa

For when a product isn't configured here but the user knows what they need.

Essentially this just needs to recreate the required product fields, for example:

product_info = dict(flag_info = {'flag1':[0],
                                 'flag2':[1],
                                 'flag3':[2,3]},
                     max_value = 255,   # largest value that fits in num_bits
                     num_bits  = 8)

unpack_to_array(qa, 'manual', custom_product_info = product_info)

Or just make the product arg accept a dict, and when that happens assume it's a custom one:

unpack_to_array(qa, product = product_info)

Either way there needs to be some product info validation similar to what's in test_products.py.
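A rough sketch of what such validation could look like follows. `validate_product_info` is a hypothetical name, and the specific checks are assumptions based on the fields discussed in this issue (`flag_info`, `max_value`, `num_bits`); they are not taken from test_products.py.

```python
def validate_product_info(product_info):
    """Raise ValueError if a custom product spec is inconsistent.

    Hypothetical helper; the rules below are assumptions, not unpackqa's own.
    """
    for key in ('flag_info', 'max_value', 'num_bits'):
        if key not in product_info:
            raise ValueError('missing required field: {}'.format(key))

    num_bits = product_info['num_bits']
    if num_bits not in (8, 16, 32):
        raise ValueError('num_bits must be 8, 16, or 32')

    # The largest allowed QA value has to fit in the stated number of bits.
    if not 0 <= product_info['max_value'] <= 2**num_bits - 1:
        raise ValueError('max_value does not fit in {} bits'.format(num_bits))

    used_bits = set()
    for flag, bits in product_info['flag_info'].items():
        if len(bits) == 0:
            raise ValueError('flag {} has no bits'.format(flag))
        for bit in bits:
            if not isinstance(bit, int) or not 0 <= bit < num_bits:
                raise ValueError('flag {} has invalid bit {}'.format(flag, bit))
            if bit in used_bits:
                raise ValueError('bit {} is assigned to more than one flag'.format(bit))
            used_bits.add(bit)
    return True


good_spec = dict(flag_info={'flag1': [0], 'flag2': [1], 'flag3': [2, 3]},
                 max_value=255, num_bits=8)
validate_product_info(good_spec)

# A max_value of 65535 needs 16 bits, so this spec is rejected.
bad_spec = dict(good_spec, max_value=65535)
try:
    validate_product_info(bad_spec)
    bad_spec_rejected = False
except ValueError:
    bad_spec_rejected = True
```

Raising an exception rather than returning False keeps the failure reason visible to the user, which matters when the spec is typed by hand.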

user facing bit packing routine pseudocode

outlines for functions to reverse unpack_to_array and unpack_to_dict()

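import numpy as np

def pack_from_dict(flag_dict, product, num_bits):
    """
    Where keys are flag names, values are integer arrays all the same shape.
    product is a custom product spec with a 'flag_info' entry mapping each
    flag name to its list of bit positions. num_bits should be 8, 16, or 32.
    """
    dtype = {8: np.uint8, 16: np.uint16, 32: np.uint32}[num_bits]
    shape = np.asarray(next(iter(flag_dict.values()))).shape

    # Start from all zeros so every bit is only ever switched on below.
    qa_array = np.zeros(shape, dtype=dtype)

    for flag, bits in product['flag_info'].items():
        flag_values = np.asarray(flag_dict[flag]).astype(dtype)
        # A 1 bit flag holds 0/1. A flag with > 1 bit holds e.g. 0,1,2,3
        # for two bits. Shifting the value up to the flag's lowest bit
        # writes all of its bits at once.
        # Assumes each flag's bits are contiguous, lowest bit first.
        qa_array |= flag_values << dtype(min(bits))

    return qa_array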

potential max range adjustment

Noticed in the Landsat 4-7 guide that the valid range differs from the potential max range. I should potentially adjust the max_value settings to match, or add a new setting.


about section on "confirmed products"

Need a quick explainer that many data products are described by several sources. The original bit descriptors are obtained from the official reference document. Confirmed products are those where I checked and found the product description and/or bit descriptors to match the official document.

Bit unpacking speed tests

Bit unpacking can be pretty CPU and/or memory intensive depending on how you do it. Here's a comparison of various methods.

The first 5 methods were outlined in this blog: http://karthur.org/2021/fast-bitflag-unpacking-python.html. One is from a Stack Overflow post, and the final one is the method I settled on for unpackqa.

It's important to run tests on larger arrays, like those seen when processing whole scenes (a Landsat 8 scene can be approx. 7000x7000 pixels). In my own testing with different methods I found that reshaping adds most of the processing time and memory usage when unpacking larger arrays, so the unpackqa method avoids it.

import numpy as np
from unpackqa.tools.unpackbits import unpackbits
from functools import partial

import timeit
import pandas as pd
import seaborn as sns

# Option 1: NumPy's Binary Representation
dec2bin_numpy = np.vectorize(partial(np.binary_repr, width = 8))

# Option 2: Python's Built-in Binary Conversion
dec2bin_py = np.vectorize(lambda v: bin(v).lstrip('0b').rjust(8, '0'))

# Option 3: Bitwise Operators
def dec2bin_bitwise(x):
    'For a 2D NumPy array as input'
    shp = x.shape
    return np.fliplr((x.ravel()[:,None] & (1 << np.arange(8))) > 0)\
        .astype(np.uint8).reshape((*shp, 8))

def dec2bin_bitwise2(x):
    'For a 2D NumPy array as input'
    shp = x.shape
    return ((x.ravel()[:,None] & (1 << np.arange(7, -1, -1))) > 0)\
        .astype(np.uint8).reshape((*shp, 8))
        
# Option 4: NumPy's unpackbits()
def dec2bin_unpack(x, axis = None):
    'For an arbitrary NumPy array as input'
    axis = x.ndim if axis is None else axis
    return np.unpackbits(x[...,None], axis = axis)[...,-8:]

# the quickest method from https://stackoverflow.com/a/51509307/6615512
# can unpack an arbitrary bit length.
def unpackbits_so(x, num_bits=8):
    xshape = list(x.shape)
    x = x.reshape([-1, 1])
    mask = 2**np.arange(num_bits, dtype=x.dtype).reshape([1, num_bits])
    return (x & mask).astype(bool).astype(int).reshape(xshape + [num_bits])


# unpackqa package unpack method, hard coded to 8 bits
# found here: https://github.com/sdtaylor/unpackqa/blob/main/unpackqa/tools/unpackbits.py
dec2bin_unpackqa = lambda x: unpackbits(qa_array=x, num_bits=8)


method_list = {
#    'dec2bin_numpy' : dec2bin_numpy,    # these 2 are always the slowest
#    'dec2bin_py' : dec2bin_py,
    'dec2bin_bitwise' : dec2bin_bitwise,
    'dec2bin_bitwise2' : dec2bin_bitwise2,
    'dec2bin_unpack' : dec2bin_unpack,
    'dec2bin_SO_method': unpackbits_so,
    'unpackqa_method' : dec2bin_unpackqa
    }

#------------------------------------
# testing a small array
arr = np.arange(400).reshape((20,20)).astype(np.uint8)
for method_name, method in method_list.items():
    result = timeit.timeit(lambda: method(arr), number=1000) 
    print('{} - {}s'.format(method_name,round(result,3)))



#------------------------
# testing on larger arrays
# array sizes to test. These will end up being 
# (10,10), (100,100), etc.
array_sizes = [10,100,200,500,1000,2000,4000]

results = []
for s in array_sizes:
    arr = np.repeat([100], repeats=s*s).reshape((s,s)).astype(np.uint8)
    for method_name, method in method_list.items():
        result = timeit.timeit(lambda: method(arr), number=5)
        
        results.append(dict(
            method = method_name,
            array_size = s,
            time = result
            ))


results = pd.DataFrame(results)

sns.set_style(style="whitegrid")
sns.lineplot(x='array_size', y='time',hue='method', data=results, palette="tab10", linewidth=2.5)

Results

Here are results using a 20x20 array.

dec2bin_bitwise - 0.038s
dec2bin_bitwise2 - 0.026s
dec2bin_unpack - 0.009s
dec2bin_SO_method - 0.018s
unpackqa_method - 0.034s

Here are results with larger array sizes up to 4000x4000
[image: line plot of time against array_size, one line per method]
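To illustrate the idea of unpacking without a reshape step, here is a minimal sketch. It is an assumption about one way to do it, not the code in unpackqa/tools/unpackbits.py, and `unpack_bits_no_reshape` is a hypothetical name. The output is allocated once with a trailing bit axis, and each bit is written into its own slice, so the input is never flattened or reshaped.

```python
import numpy as np

def unpack_bits_no_reshape(qa_array, num_bits=8):
    """Unpack integers into a trailing axis of 0/1 values, lowest bit first.

    Hypothetical sketch; not the unpackqa implementation.
    """
    qa_array = np.asarray(qa_array)
    # Allocate the full output once: original shape plus one axis for bits.
    out = np.empty(qa_array.shape + (num_bits,), dtype=np.uint8)
    for bit in range(num_bits):
        # Shift the wanted bit to position 0 and mask it out.
        out[..., bit] = (qa_array >> bit) & 1
    return out

arr = np.array([[5, 0], [255, 128]], dtype=np.uint8)
bits = unpack_bits_no_reshape(arr)

# Cross-check against NumPy's own unpackbits in little-endian bit order.
reference = np.unpackbits(arr[..., None], axis=-1, bitorder='little')
print(bits.shape, np.array_equal(bits, reference))
```

Note that this sketch returns the lowest bit first, like the Stack Overflow method above, while the dec2bin_bitwise functions return the highest bit first, so outputs need reversing along the last axis before they can be compared directly.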

new test qa test

Another table of tests is available for LANDSAT_47_C2_L2_SRCloudQA in the Landsat 4-7 C2L2 product guide, table 5-4.
