wswup / cloud-free-scene-counts
Generate monthly cloud free Landsat scene counts
License: Apache License 2.0
To help support switching to the full Landsat product ID instead of the scene ID (see #5), I think it would be useful to first add a command line argument to the make_quicklook_lists.py script so that the user can control which type of ID is used in the output files. This could be done with separate "--product_id"/"--scene_id" flags, or with something like "--id product"/"--id scene". We could initially default the script to returning the scene ID if the flag is not set, and then switch the default to the product ID later on.
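A minimal sketch of what that flag could look like, assuming the "--id product"/"--id scene" form (the flag name and choices here are the proposal, not existing code):

```python
import argparse

# Hypothetical sketch of the proposed flag for make_quicklook_lists.py:
# "--id" controls which identifier is written to the output files.
# Defaulting to "scene" preserves the current behavior.
parser = argparse.ArgumentParser(description='Generate quicklook scene lists')
parser.add_argument(
    '--id', choices=['scene', 'product'], default='scene',
    help='Write scene IDs (default) or full product IDs to the output files')

args = parser.parse_args(['--id', 'product'])
```

Switching the default later would then be a one-word change to `default=`.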
After filtering the metadata CSV files for year 2014 and path/rows p041r027 and p041r028, then running make_quicklook_lists.py, the program errors. I removed the LANDSAT_TM_C.csv file (which was not filtered; "No data for target years(s), skipping file"), and it ran.
Ran into a memory error in metadata_csv_filter.py on this line:
input_df = pd.read_csv(csv_path, parse_dates=[date_col])
I was running python 2.7 32-bit.
Tried with python 3.5 64-bit, and it ran fine.
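That fits a 32-bit address-space limit (roughly 2 GB per process). One lower-memory alternative to a single read_csv() call is a chunked read; the CSV text and column names below are illustrative, not the real metadata file:

```python
import io

import pandas as pd

# Sketch: reading in chunks keeps peak memory bounded, which matters on
# 32-bit Python where a process only gets ~2 GB of address space.
csv_text = 'acquisitionDate,CLOUD_COVER_LAND\n2015-09-22,12.0\n2015-10-08,3.5\n'
chunks = [
    chunk for chunk in pd.read_csv(
        io.StringIO(csv_text), parse_dates=['acquisitionDate'], chunksize=1)
]
input_df = pd.concat(chunks, ignore_index=True)
```

Filtering each chunk before concatenating would reduce memory further, since only the matching rows are ever held at once.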
Please create a clear scene list file for each year and project area.
Looks like the main() call is passing a csv argument that is absent from the definition of main().
Looks like Pandas wasn't finding 'LANDSAT_SCENE_ID' in the csv.
I got this to run by commenting out scene_id_col:
dtype_cols = {
acq_date_col: object,
browse_col: object,
browse_url_col: object,
col_number_col: object,
col_category_col: object,
cloud_col: float,
data_type_col: object,
product_id_col: object,
# scene_id_col: object,
sensor_col: object,
time_col: object,
wrs2_path_col: int,
wrs2_row_col: int,
# elevation_col: float,
# azimuth_col: float,
}
See traceback:
C:\cloud-free-scene-counts>python metadata_csv_filter.py -y 2015 -pr p041r027 -d
Filter/reducing Landsat Metadata CSV files
['acquisitionDate', 'browseAvailable', 'browseURL', 'CLOUD_COVER_LAND', 'COLLECTION_CATEGORY', 'COLLECTION_NUMBER', 'DATA_TYPE_L1', 'LANDSAT_PRODUCT_ID', 'LANDSAT_SCENE_ID', 'sensor', 'sceneStartTime', 'path', 'row']
Paths: 41
Rows: 27
WRS2 Tiles: p041r027
LANDSAT_8_C1.csv
Filtering by chunk
Traceback (most recent call last):
File "metadata_csv_filter.py", line 495, in <module>
years=args.years, months=args.months, conus_flag=args.conus)
File "metadata_csv_filter.py", line 301, in main
usecols=list(dtype_cols.keys()), dtype=dtype_cols)):
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1749, in __init__
_validate_usecols_names(usecols, self.orig_names)
File "C:\Users\justclickok\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1134, in _validate_usecols_names
"columns expected but not found: {missing}".format(missing=missing)
ValueError: Usecols do not match columns, columns expected but not found: ['LANDSAT_SCENE_ID']
The script doesn't look for CSV metadata files with the "_filter" suffix. It would be better if the script looked for the filtered lists by default (perhaps even skipping unfiltered CSV files entirely).
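A small sketch of that selection logic, written as a pure function over filenames so the preference is easy to see (the function name is hypothetical, not from the repo):

```python
# Hypothetical sketch: prefer metadata CSVs with the "_filter" suffix and
# skip the unfiltered originals whenever any filtered copy exists.
def select_metadata_csvs(csv_names):
    filtered = sorted(n for n in csv_names if n.endswith('_filter.csv'))
    if filtered:
        return filtered
    return sorted(n for n in csv_names if n.endswith('.csv'))

selected = select_metadata_csvs(
    ['LANDSAT_8_C1.csv', 'LANDSAT_8_C1_filter.csv', 'LANDSAT_ETM_C1.csv'])
```

In the real script the input list would come from globbing the CSV folder.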
When I run make_quicklook_lists.py and output clear_scene_counts.txt, it consistently misses counting the first clear scene of each year, but counts the remaining scenes for the path/row and year.
I'm not sure if it has to do with how the counts = defaultdict(dict) is structured (line 142), or the try and except routine (lines 148-151).
Any tips on getting this to work correctly?
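I haven't seen the exact code, but one common cause of exactly this symptom in a try/except counting loop is initializing the count to 0 in the except branch, which silently drops the scene that triggered the KeyError. A hypothetical reconstruction (the key and month values are illustrative):

```python
from collections import defaultdict

scenes = ['2015-01', '2015-01', '2015-02']

# Buggy pattern: the scene that raises KeyError is never counted.
counts = defaultdict(dict)
for month in scenes:
    try:
        counts['p041r027'][month] += 1
    except KeyError:
        counts['p041r027'][month] = 0   # bug: drops the triggering scene
buggy = dict(counts['p041r027'])

# Fixed pattern: the except branch fires on a real scene, so start at 1.
counts = defaultdict(dict)
for month in scenes:
    try:
        counts['p041r027'][month] += 1
    except KeyError:
        counts['p041r027'][month] = 1   # count the first scene too
fixed = dict(counts['p041r027'])
```

If lines 148-151 follow the buggy pattern, changing the initial value from 0 to 1 would explain and fix the missing first scene.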
This argument could be used to manually control which Landsat types are included in scene counts.
For example, there isn't any reason to even unzip the Landsat 8 CSV if the user sets the years parameter to 2004.
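A sketch of what that argument could look like; the flag name and sensor codes below are illustrative, not existing options:

```python
import argparse

# Hypothetical "--landsat" argument restricting which Landsat types are
# included in the scene counts; by default all three are kept.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--landsat', nargs='+', default=['LT05', 'LE07', 'LC08'],
    choices=['LT04', 'LT05', 'LE07', 'LC08'],
    help='Landsat types to include in the scene counts')

args = parser.parse_args(['--landsat', 'LT05', 'LE07'])
```

The download step could then skip unzipping any CSV whose type is not in `args.landsat`.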
I tried running the first batch of code (python metadata_csv_download.py) and ran into some issues. The errors are listed below. I downloaded Anaconda per the instructions and am using the Jupyter QtConsole to run the code.
SystemExit Traceback (most recent call last)
in <module>()
136
137 if __name__ == '__main__':
--> 138 args = arg_parse()
139
140 logging.basicConfig(level=args.loglevel, format='%(message)s')
in arg_parse()
126 '-d', '--debug', default=logging.INFO, const=logging.DEBUG,
127 help='Debug level logging', action='store_const', dest='loglevel')
--> 128 args = parser.parse_args()
129
130 if args.csv and os.path.isfile(os.path.abspath(args.csv)):
C:\ProgramData\Anaconda3_\lib\argparse.py in parse_args(self, args, namespace)
1731 if argv:
1732 msg = _('unrecognized arguments: %s')
-> 1733 self.error(msg % ' '.join(argv))
1734 return args
1735
C:\ProgramData\Anaconda3_\lib\argparse.py in error(self, message)
2387 self.print_usage(_sys.stderr)
2388 args = {'prog': self.prog, 'message': message}
-> 2389 self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
C:\ProgramData\Anaconda3_\lib\argparse.py in exit(self, status, message)
2374 if message:
2375 self._print_message(message, _sys.stderr)
-> 2376 _sys.exit(status)
2377
2378 def error(self, message):
SystemExit: 2
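The "unrecognized arguments" path in that traceback is typical when argparse runs inside Jupyter: the kernel adds its own command-line arguments (such as a `-f .../kernel.json` connection file), which the script's parser rejects. Running the script from a regular command prompt avoids this; alternatively, parse_known_args() tolerates the extras. A minimal sketch mirroring the script's debug flag:

```python
import argparse
import logging

# Sketch: parse_known_args() returns (namespace, leftover_args) instead of
# exiting on arguments the parser does not recognize, which is what happens
# when a Jupyter kernel injects its own argv entries.
parser = argparse.ArgumentParser()
parser.add_argument(
    '-d', '--debug', default=logging.INFO, const=logging.DEBUG,
    action='store_const', dest='loglevel')

# Simulated argv: the user's "-d" plus a kernel-injected connection file.
args, unknown = parser.parse_known_args(['-d', '-f', 'kernel-1234.json'])
```

The leftover arguments end up in `unknown` rather than triggering SystemExit.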
I emailed the USGS help desk and will update the issue if/when I hear back.
For reference here are examples of broken and working image URLs in case we need to change this in the code.
Broken: https://earthexplorer.usgs.gov/browse/landsat_8/2015/043/030/LC08_L1TP_043030_20150922_20170225_01_T1.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_8_c1/2015/043/030/LC08_L1TP_043030_20150922_20170225_01_T1.jpg
Broken: https://earthexplorer.usgs.gov/browse/etm/43/30/2015/LE07_L1TP_043030_20150322_20160906_01_T1_REFL.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_etm_c1/2015/043/030/LE07_L1TP_043030_20150322_20160906_01_T1.jpg
Broken: https://earthexplorer.usgs.gov/browse/tm/43/30/2000/LT05_L1TP_043030_20000624_20160918_01_T1_REFL.jpg
Works: https://earthexplorer.usgs.gov/browse/landsat_tm_c1/2000/043/030/LT05_L1TP_043030_20000624_20160918_01_T1.jpg
For consistency, we should consider using the full Landsat Collection 1 Product ID (https://landsat.usgs.gov/sites/default/files/images/Scene_ProductID_compare-.jpg). One benefit of this is that it would be much easier to identify and download the Landsat images from the Google storage bucket since the naming would be identical.
One potential problem with doing this is that we could have multiple versions of images on the same date (with different processing dates), which may not be obvious or desirable. Another problem is that the images will not sort by date if the files are named with the product ID since the Landsat type is first.
It would be possible to maintain a lookup file to translate the thumbnail file names to the product IDs.
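A lookup like that mostly amounts to splitting the product ID into its fields; sorting thumbnails chronologically then becomes a sort on the acquisition-date field rather than on the raw filename. A sketch (the function and key names are illustrative):

```python
# Sketch: split a Collection 1 product ID into its fields, which is the
# information a thumbnail-name-to-product-ID lookup file would carry.
def parse_product_id(product_id):
    sensor, level, wrs, acq, proc, coll, category = product_id.split('_')
    return {
        'sensor': sensor, 'level': level,
        'path': wrs[:3], 'row': wrs[3:],
        'acq_date': acq, 'proc_date': proc,
        'collection': coll, 'category': category,
    }

fields = parse_product_id('LC08_L1TP_043030_20150922_20170225_01_T1')
```

The separate `proc_date` field also makes the multiple-versions problem explicit: two IDs can share an `acq_date` and differ only in processing date.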
It seems like some of the input fields have changed in the Landsat CSV files. For example, metadata_quicklook_download.py looks for the "ACQUISITION_DATE" column in the CSV, but that column is now "acquisitionDate". Additionally, I wasn't able to find the equivalent of the "WRS2_TILE" column in the new CSV files, which should have values like "p038r031".
Chris was having a problem and reported it to the USGS. I started looking into this, and I am wondering if it is a result of the download station field having extra commas that are not being parsed correctly.
The raw CSV files will have path and row columns called "path" and "row", whereas the filtered file should have "WRS_PATH" and "WRS_ROW".
I'm thinking something like the following:
make_quicklook_lists.py can probably stay the same for now. The filtering step breaks because the input and output field names are different; this could probably be fixed by not using "usecols" when calling read_csv() and doing the field renaming in a for loop and try/except block.
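A sketch of that rename approach: read without usecols, then map whatever raw columns are present onto the output names, skipping any that are missing. The mapping and CSV text below are illustrative, not the script's actual full list:

```python
import io

import pandas as pd

# Hypothetical mapping from raw USGS column names to the filtered-file names.
RENAME_MAP = {'acquisitionDate': 'ACQUISITION_DATE',
              'path': 'WRS_PATH', 'row': 'WRS_ROW'}

csv_text = 'acquisitionDate,path,row\n2015-09-22,41,27\n'
df = pd.read_csv(io.StringIO(csv_text))
for old, new in RENAME_MAP.items():
    try:
        df[new] = df.pop(old)
    except KeyError:
        pass  # this CSV vintage does not have the raw name; skip it
```

Because missing columns are skipped rather than demanded up front, the same loop handles both old and new CSV vintages without the "Usecols do not match columns" error.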