wswup / cloud-free-scene-counts
Generate monthly cloud free Landsat scene counts
License: Apache License 2.0
To help support switching to the full Landsat product ID instead of the scene ID (see #5), I think it would be useful to first add a command line argument to the make_quicklook_lists.py script so that the user can control which type of ID is used in the output files. This could be done with separate "--product_id"/"--scene_id" flags, or with something like "--id product"/"--id scene". We could initially default the script to returning the scene ID if the flag is not set, and then switch the default to the product ID later on.
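A minimal sketch of what that flag could look like, assuming the "--id product"/"--id scene" form (the flag name and choices here are the proposal, not existing code):

```python
import argparse

# Hypothetical sketch of the proposed flag for make_quicklook_lists.py:
# "--id" controls which identifier is written to the output files.
# Defaulting to "scene" preserves the current behavior.
parser = argparse.ArgumentParser(description='Generate quicklook scene lists')
parser.add_argument(
    '--id', choices=['scene', 'product'], default='scene',
    help='Write scene IDs (default) or full product IDs to the output files')

args = parser.parse_args(['--id', 'product'])
```

Switching the default later would then be a one-word change to `default=`.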
After filtering the metadata CSV files for year 2014 and path/rows p041r027 and p041r028, then running make_quicklook_lists.py, the program errors. I removed the LANDSAT_TM_C.csv file (which was not filtered; "No data for target years(s), skipping file"), and it ran.
Ran into a memory error in metadata_csv_filter.py on this line:
input_df = pd.read_csv(csv_path, parse_dates=[date_col])
I was running python 2.7 32-bit.
Tried with python 3.5 64-bit, and it ran fine.
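That fits a 32-bit address-space limit (roughly 2 GB per process). One lower-memory alternative to a single read_csv() call is a chunked read; the CSV text and column names below are illustrative, not the real metadata file:

```python
import io

import pandas as pd

# Sketch: reading in chunks keeps peak memory bounded, which matters on
# 32-bit Python where a process only gets ~2 GB of address space.
csv_text = 'acquisitionDate,CLOUD_COVER_LAND\n2015-09-22,12.0\n2015-10-08,3.5\n'
chunks = [
    chunk for chunk in pd.read_csv(
        io.StringIO(csv_text), parse_dates=['acquisitionDate'], chunksize=1)
]
input_df = pd.concat(chunks, ignore_index=True)
```

Filtering each chunk before concatenating would reduce memory further, since only the matching rows are ever held at once.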
Please create a clear scene list file for each year and project area.
Looks like the main() call is passing a csv argument that is absent from the definition of main().
Looks like Pandas wasn't finding 'LANDSAT_SCENE_ID' in the csv.
I got this to run by commenting out scene_id_col:
dtype_cols = {
acq_date_col: object,
browse_col: object,
browse_url_col: object,
col_number_col: object,
col_category_col: object,
cloud_col: float,
data_type_col: object,
product_id_col: object,
# scene_id_col: object,
sensor_col: object,
time_col: object,
wrs2_path_col: int,
wrs2_row_col: int,
# elevation_col: float,
# azimuth_col: float,
}
See traceback:
C:\cloud-free-scene-counts>python metadata_csv_filter.py -y 2015 -pr p041r027 -d
Filter/reducing Landsat Metadata CSV files
['acquisitionDate', 'browseAvailable', 'browseURL', 'CLOUD_COVER_LAND', 'COLLECTION_CATEGORY', 'COLLECTION_NUMBER', 'DATA_TYPE_L1', 'LANDSAT_PRODUCT_ID', 'LANDSAT_SCENE_ID', 'sensor', 'sceneStartTime', 'path', 'row']
Paths: 41
Rows: 27
WRS2 Tiles: p041r027
LANDSAT_8_C1.csv
Filtering by chunk
Traceback (most recent call last):
File "metadata_csv_filter.py", line 495, in <module>
years=args.years, months=args.months, conus_flag=args.conus)
File "metadata_csv_filter.py", line 301, in main
usecols=list(dtype_cols.keys()), dtype=dtype_cols)):
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1749, in __init__
_validate_usecols_names(usecols, self.orig_names)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1134, in _validate_usecols_names
"columns expected but not found: {missing}".format(missing=missing)
ValueError: Usecols do not match columns, columns expected but not found: ['LANDSAT_SCENE_ID']
The script doesn't look for CSV metadata files with the "_filter" suffix. It would be better if the script looked for the filtered lists by default (perhaps even skipping unfiltered CSV files entirely).
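A small sketch of that selection logic, written as a pure function over filenames so the preference is easy to see (the function name is hypothetical, not from the repo):

```python
# Hypothetical sketch: prefer metadata CSVs with the "_filter" suffix and
# skip the unfiltered originals whenever any filtered copy exists.
def select_metadata_csvs(csv_names):
    filtered = sorted(n for n in csv_names if n.endswith('_filter.csv'))
    if filtered:
        return filtered
    return sorted(n for n in csv_names if n.endswith('.csv'))

selected = select_metadata_csvs(
    ['LANDSAT_8_C1.csv', 'LANDSAT_8_C1_filter.csv', 'LANDSAT_ETM_C1.csv'])
```

In the real script the input list would come from globbing the CSV folder.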
When I run make_quicklook_lists.py and output clear_scene_counts.txt, it consistently misses counting the first clear scene of each year, but counts the remaining scenes for the path/row and year.
I'm not sure if it has to do with how the counts = defaultdict(dict) is structured (line 142), or the try and except routine (lines 148-151).
Any tips on getting this to work correctly?
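I haven't seen the exact code, but one common cause of exactly this symptom in a try/except counting loop is initializing the count to 0 in the except branch, which silently drops the scene that triggered the KeyError. A hypothetical reconstruction (the key and month values are illustrative):

```python
from collections import defaultdict

scenes = ['2015-01', '2015-01', '2015-02']

# Buggy pattern: the scene that raises KeyError is never counted.
counts = defaultdict(dict)
for month in scenes:
    try:
        counts['p041r027'][month] += 1
    except KeyError:
        counts['p041r027'][month] = 0   # bug: drops the triggering scene
buggy = dict(counts['p041r027'])

# Fixed pattern: the except branch fires on a real scene, so start at 1.
counts = defaultdict(dict)
for month in scenes:
    try:
        counts['p041r027'][month] += 1
    except KeyError:
        counts['p041r027'][month] = 1   # count the first scene too
fixed = dict(counts['p041r027'])
```

If lines 148-151 follow the buggy pattern, changing the initial value from 0 to 1 would explain and fix the missing first scene.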
This argument could be used to manually control which Landsat types are included in scene counts.
For example, there isn't any reason to even unzip the Landsat 8 CSV if the user sets the years parameter to 2004.
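A sketch of what that argument could look like; the flag name and sensor codes below are illustrative, not existing options:

```python
import argparse

# Hypothetical "--landsat" argument restricting which Landsat types are
# included in the scene counts; by default all three are kept.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--landsat', nargs='+', default=['LT05', 'LE07', 'LC08'],
    choices=['LT04', 'LT05', 'LE07', 'LC08'],
    help='Landsat types to include in the scene counts')

args = parser.parse_args(['--landsat', 'LT05', 'LE07'])
```

The download step could then skip unzipping any CSV whose type is not in `args.landsat`.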
I tried running the first batch of code (python metadata_csv_download.py) and ran into some issues. The errors are listed below. I downloaded Anaconda per the instructions and am using the Jupyter QtConsole to run the code.
SystemExit Traceback (most recent call last)
in <module>()
136
137 if __name__ == '__main__':
--> 138 args = arg_parse()
139
140 logging.basicConfig(level=args.loglevel, format='%(message)s')
in arg_parse()
126 '-d', '--debug', default=logging.INFO, const=logging.DEBUG,
127 help='Debug level logging', action='store_const', dest='loglevel')
--> 128 args = parser.parse_args()
129
130 if args.csv and os.path.isfile(os.path.abspath(args.csv)):
C:\ProgramData\Anaconda3_\lib\argparse.py in parse_args(self, args, namespace)
1731 if argv:
1732 msg = _('unrecognized arguments: %s')
-> 1733 self.error(msg % ' '.join(argv))
1734 return args
1735
C:\ProgramData\Anaconda3_\lib\argparse.py in error(self, message)
2387 self.print_usage(_sys.stderr)
2388 args = {'prog': self.prog, 'message': message}
-> 2389 self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
C:\ProgramData\Anaconda3_\lib\argparse.py in exit(self, status, message)
2374 if message:
2375 self._print_message(message, _sys.stderr)
-> 2376 _sys.exit(status)
2377
2378 def error(self, message):
SystemExit: 2
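The "unrecognized arguments" path in that traceback is typical when argparse runs inside Jupyter: the kernel adds its own command-line arguments (such as a `-f .../kernel.json` connection file), which the script's parser rejects. Running the script from a regular command prompt avoids this; alternatively, parse_known_args() tolerates the extras. A minimal sketch mirroring the script's debug flag:

```python
import argparse
import logging

# Sketch: parse_known_args() returns (namespace, leftover_args) instead of
# exiting on arguments the parser does not recognize, which is what happens
# when a Jupyter kernel injects its own argv entries.
parser = argparse.ArgumentParser()
parser.add_argument(
    '-d', '--debug', default=logging.INFO, const=logging.DEBUG,
    action='store_const', dest='loglevel')

# Simulated argv: the user's "-d" plus a kernel-injected connection file.
args, unknown = parser.parse_known_args(['-d', '-f', 'kernel-1234.json'])
```

The leftover arguments end up in `unknown` rather than triggering SystemExit.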
I emailed the USGS help desk and will update the issue if/when I hear back.
For reference here are examples of broken and working image URLs in case we need to change this in the code.
Broken: https://earthexplorer.usgs.gov/browse/landsat_8/2015/043/030/LC08_L1TP_043030_20150922_20170225_01_T1.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_8_c1/2015/043/030/LC08_L1TP_043030_20150922_20170225_01_T1.jpg
Broken: https://earthexplorer.usgs.gov/browse/etm/43/30/2015/LE07_L1TP_043030_20150322_20160906_01_T1_REFL.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_etm_c1/2015/043/030/LE07_L1TP_043030_20150322_20160906_01_T1.jpg
Broken: https://earthexplorer.usgs.gov/browse/tm/43/30/2000/LT05_L1TP_043030_20000624_20160918_01_T1_REFL.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_tm_c1/2000/043/030/LT05_L1TP_043030_20000624_20160918_01_T1.jpg
For consistency, we should consider using the full Landsat Collection 1 Product ID (https://landsat.usgs.gov/sites/default/files/images/Scene_ProductID_compare-.jpg). One benefit of this is that it would be much easier to identify and download the Landsat images from the Google storage bucket since the naming would be identical.
One potential problem with doing this is that we could have multiple versions of images on the same date (with different processing dates), which may not be obvious or desirable. Another problem is that the images will not sort by date if the files are named with the product ID since the Landsat type is first.
It would be possible to maintain a lookup file to translate the thumbnail file names to the product IDs.
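A lookup like that mostly amounts to splitting the product ID into its fields; sorting thumbnails chronologically then becomes a sort on the acquisition-date field rather than on the raw filename. A sketch (the function and key names are illustrative):

```python
# Sketch: split a Collection 1 product ID into its fields, which is the
# information a thumbnail-name-to-product-ID lookup file would carry.
def parse_product_id(product_id):
    sensor, level, wrs, acq, proc, coll, category = product_id.split('_')
    return {
        'sensor': sensor, 'level': level,
        'path': wrs[:3], 'row': wrs[3:],
        'acq_date': acq, 'proc_date': proc,
        'collection': coll, 'category': category,
    }

fields = parse_product_id('LC08_L1TP_043030_20150922_20170225_01_T1')
```

The separate `proc_date` field also makes the multiple-versions problem explicit: two IDs can share an `acq_date` and differ only in processing date.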
It seems like some of the input fields have changed in the Landsat CSV files. For example, metadata_quicklook_download.py looks for the "ACQUISITION_DATE" column in the CSV, but that column is now "acquisitionDate". Additionally, I wasn't able to find the equivalent of the "WRS2_TILE" column in the new CSV files, which should have values like "p038r031".
Chris was having a problem and reported it to the USGS. I started looking into this, and I am wondering if it is a result of the download station field having extra commas that are not being parsed correctly.
The raw CSV files will have path and row columns called "path" and "row", whereas the filtered file should have "WRS_PATH" and "WRS_ROW".
I'm thinking something like the following:
make_quicklook_lists.py can probably stay the same for now. The filtering step breaks because the input and output field names are different; this could probably be fixed by not using "usecols" when calling read_csv() and doing the field renaming in a for loop and try/except block.
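A sketch of that rename approach: read without usecols, then map whatever raw columns are present onto the output names, skipping any that are missing. The mapping and CSV text below are illustrative, not the script's actual full list:

```python
import io

import pandas as pd

# Hypothetical mapping from raw USGS column names to the filtered-file names.
RENAME_MAP = {'acquisitionDate': 'ACQUISITION_DATE',
              'path': 'WRS_PATH', 'row': 'WRS_ROW'}

csv_text = 'acquisitionDate,path,row\n2015-09-22,41,27\n'
df = pd.read_csv(io.StringIO(csv_text))
for old, new in RENAME_MAP.items():
    try:
        df[new] = df.pop(old)
    except KeyError:
        pass  # this CSV vintage does not have the raw name; skip it
```

Because missing columns are skipped rather than demanded up front, the same loop handles both old and new CSV vintages without the "Usecols do not match columns" error.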