holoviz-topics / earthml
Tools for working with machine learning in earth science
Home Page: https://earthml.holoviz.org
License: BSD 3-Clause "New" or "Revised" License
At the end of this page of the docs, the resampling section has a couple of bugs.
I think it should read:
import numpy as np

# Target resolution of 1000 m; build the bin edges for each axis
res_1000 = 1000
x_1000 = np.arange(xmin, xmax, res_1000)
y_1000 = np.arange(ymin, ymax, res_1000)

# Bin along each axis, average within each bin, then restore the dimension names
diff_res_1000 = (
    diff_regridded.groupby_bins("x", x_1000, labels=x_1000[:-1])
    .mean(dim="x")
    .groupby_bins("y", y_1000, labels=y_1000[:-1])
    .mean(dim="y")
    .rename(x_bins="x", y_bins="y")
)
diff_res_1000
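For reference, here is a minimal pure-NumPy sketch of the same mean-binning idea along one axis; the array and bin edges below are made up for illustration, not the notebook's data.

```python
import numpy as np

# Synthetic 1-D signal sampled every 1 unit on x in [0, 8)
x = np.arange(8)
values = np.array([0., 1., 2., 3., 4., 5., 6., 7.])

# Coarsen to a resolution of 2 by averaging the samples in each bin,
# mirroring what groupby_bins(...).mean(...) does per axis
edges = np.arange(0, 9, 2)           # bin edges: 0, 2, 4, 6, 8
which = np.digitize(x, edges) - 1    # bin index for each sample
coarse = np.array([values[which == i].mean() for i in range(len(edges) - 1)])

print(coarse)  # [0.5 2.5 4.5 6.5]
```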
I remember a comment made during the presentation that the UC Merced dataset was too large to include in the instructional data. That said, it would be nice to include a note that the data can be downloaded from the site, or possibly to add a commented-out download using intake. Also, since the data is so large, maybe include a small subset that will get the examples working, even if they will produce bad results.
Hi.
Is it possible that the original nee_data_repo is offline? Or did it move?
https://github.com/greyNearing/nee_data_fusion/
Also, I get a kernel restart at the cell
metadata.hvplot.points('lon', 'lat', geo=True, color='vegetation',
                       height=420, width=800, cmap='Category20') * gts.OSM
but that might be my env.
Great notebooks!
Best,
C
In a recent meeting we (@ebo @jsignell @jbednar) came up with some new ideas for public labelled data that can be applied to public satellite imagery (which mostly implies LANDSAT data).
Good criteria for a task are that (1) all the data can be made public, (2) the labelled features are big enough to spot with LANDSAT, and (3) the features can be easily spotted by a human to evaluate the ML performance. The two most promising suggestions were:
Using the National Inventory of Dams database to mark dams on US imagery. This data has latitude/longitude coordinates, so the labels are points. There is one Excel file per state, and there are > 90k dams in total.
Labelling lakes using the Global Lakes and Wetlands Database, which is polygon data. The GLWD-2 dataset has > 250,000 polygons, though since this is a global database I don't know how many of them fall in the US if we want to focus on that.
Another nice thing about these two datasets is that there is a good chance they are correlated with each other!
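To make the dam-labelling idea concrete, here is a minimal sketch of turning point labels into a raster mask aligned with an imagery grid. All names, coordinates, and the grid itself are made up, standing in for a real tile and for points parsed from the NID Excel files:

```python
import numpy as np

# Hypothetical imagery tile: 10x10 pixels covering lon [-100, -99), lat [40, 41)
nrows, ncols = 10, 10
lon0, lat0, res = -100.0, 40.0, 0.1

# Hypothetical dam locations as (lon, lat) points
dams = np.array([[-99.95, 40.05],
                 [-99.45, 40.85]])

# Binary label mask: 1 wherever a dam falls inside a pixel
mask = np.zeros((nrows, ncols), dtype=np.uint8)
cols = ((dams[:, 0] - lon0) / res).astype(int)
rows = ((dams[:, 1] - lat0) / res).astype(int)
mask[rows, cols] = 1

print(mask.sum())  # 2 labelled pixels
```

The same mask-building step would apply unchanged to the GLWD polygons after rasterizing them to the tile's grid.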
I'd like a better explanation of the motivation, and some domain knowledge to know which variables to exclude.
A task for the HoloViz group, as part of the PangeoML project, is to put together examples using the HoloViz tools that support geo-machine-learning workflows, and, when need be, to improve the existing tools to better support these workflows. Such content was already produced a couple of years ago as part of the EarthML project.
Beyond that task, several projects in the Python ecosystem are tackling a similar issue, i.e. producing and maintaining examples in this domain. These include Project Pythia and the Pangeo Gallery. Members of these projects met in September 2022 to discuss best practices and a way forward. One of the outcomes of this discussion was that Project Pythia is best equipped to accept and maintain examples (and that new developments are required to improve the workflow).
The tasks for the HoloViz group are then:
cc @jbednar
crs = gv.util.proj_to_cartopy(landsat_8_da.crs)
generated:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-11-d99b01f392df> in <module>
----> 1 crs = gv.util.proj_to_cartopy(landsat_8_da.crs)
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\geoviews\util.py in proj_to_cartopy(proj)
453 proj = check_crs(proj)
454
--> 455 if proj.is_latlong():
456 return ccrs.PlateCarree()
457
AttributeError: 'Proj' object has no attribute 'is_latlong'
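This error appears because `Proj.is_latlong()` was removed in newer pyproj releases, which expose an `is_geographic` property on the CRS object instead. Until geoviews handles both APIs, a duck-typed helper along these lines can smooth it over; this is only a sketch of the compatibility check, not the library's fix, and the stand-in classes below just mimic the two API generations so the example runs without pyproj installed:

```python
def proj_is_geographic(proj):
    """True if a pyproj Proj/CRS is geographic (lat/lon), across pyproj APIs."""
    if hasattr(proj, "crs"):                # pyproj >= 2.x: Proj wraps a CRS
        return bool(proj.crs.is_geographic)
    if hasattr(proj, "is_geographic"):      # CRS object passed directly
        return bool(proj.is_geographic)
    return bool(proj.is_latlong())          # pyproj 1.x fallback

# Stand-ins for the two API generations (no pyproj needed to illustrate)
class OldProj:
    def is_latlong(self):
        return True

class NewCRS:
    is_geographic = True

class NewProj:
    crs = NewCRS()

print(proj_is_geographic(OldProj()), proj_is_geographic(NewProj()))  # True True
```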
The landsat_spectral_clustering notebook updated in #12 uses a tif file that is missing CRS information, which results in lots of warnings. We should add the CRS and update the S3 file with it to make the example cleaner.
In the EarthML project, we need to apply machine-learning tools like sklearn to multidimensional array data like xarrays and other data that doesn't fit naturally into sklearn's single-column input data format. Of course, arrays can be flattened before running the algorithm, then reshaped in the opposite way afterwards, as in Tom Augspurger's spectral clustering example.
However, doing so is awkward, error prone, and likely to lose metadata such as lat/lon coordinates, especially for more complicated multidimensional arrays whose data needs to be selected along certain ranges of dimensions or sliced inside the array, and then restored to that range and slice afterwards for analysis and visualization. It is presumably especially painful and error prone if the dimensionality changes as a result of any of the sklearn operations (e.g. PCA).
Existing libraries to deal with these issues take one of two approaches:
These two approaches have very different implications, and it would be good if we can tease those out explicitly here before moving forward with a particular choice. They are also at very different stages of development, with phausamann/sklearn-xarray seeming further along than the rest, but I don't know how well any of them handle the end stages in this process (reshaping back to the original shape or some appropriately reduced version of it). @jlstevens, any thoughts?
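As a baseline for comparing those libraries, the manual flatten → fit → reshape round trip can be sketched with plain NumPy; a toy thresholding step stands in for the actual sklearn estimator, so the example runs without sklearn and the shapes are made up:

```python
import numpy as np

# Toy "image": 2 bands, 4x5 pixels, laid out (band, y, x)
data = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)

# Flatten to sklearn's (n_samples, n_features) layout: one row per pixel
flat = data.reshape(2, -1).T            # shape (20, 2)

# Stand-in for model.fit_predict(flat): label pixels by thresholding band 0
labels_flat = (flat[:, 0] >= flat[:, 0].mean()).astype(int)

# Reshape the predictions back to the spatial grid so the lat/lon
# coordinates can be reattached afterwards
labels = labels_flat.reshape(4, 5)

print(flat.shape, labels.shape)  # (20, 2) (4, 5)
```

The bookkeeping the libraries above automate is exactly these two reshapes, plus carrying the coordinate metadata through a dimensionality-changing step such as PCA.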
I tried to update the repository today (with: git pull) and got:
"Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists."
I thought that I might have corrupted the copy on my thumb drive, so I moved it over and tried to re-clone it. I got the same error.
Probably something simple, but I do not think it is on my end.
Just wanted to track this issue dask/dask-ml#230 which was causing failures as described in #81
The original version of landsat_spectral_clustering.ipynb used redding.tif
which was obtained originally from planet.com. As I wasn't sure whether this image could be made available, I updated the notebook to use a landsat example taken from a datashader example.
@ebo then informed me by e-mail that this image is not really suitable, as it only has two bands. We would like some example images that can be made public and that also have a decent number of bands. This is important, as it would let us compute the various indices that we might then want to learn on.
One suggestion from @ebo was to use these images of a disappearing lake, although I only see a link to download them in JPG format?
Lastly, a new notebook has been committed referencing 'Midwest_Mosaic.tif' which I don't think we have discussed yet. Is this something we could slice down and add to the repo as an example?
While driving with my wife for the holidays, we got to talking about some apps she and her field crews recently published in Applications in Plant Sciences. It got me thinking about an example I started playing with one night, which scrapes local forecast information for her field sites so that they can receive site-specific forecasts targeting when they expect to be at the field sites. Since they are often gone for a month or two at a time, these have to be regenerated on the fly...
Anyway, I reached out to the editor regarding copyright and got the following reply:
"...Regarding the copyright/license, articles published in APPS are published under a Creative Commons license. With Creative Commons licenses, the author retains copyright and the public is allowed to reuse the content. The author grants the Botanical Society of America a license to publish the article. ..."
Is CC an acceptable license for one of the notebooks? I figure that I could work this up as an example for either EarthML or EarthSIM, and it could also be published with APPS.
I thought I would ask before spending much time on it as I can easily get it to the point where it is useful for my wife and her field crew, but it is an interesting offshoot to things I already discussed with weather/climate data collection.
training = intake.open_csv('../data/landsat*_training.csv')
worked fine, but
training = intake.open_csv('../data/landsat{version:d}_training.csv')
training_df = training.read()
training_df.head()
produced a value error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-6ade6c87b33e> in <module>
1 training = intake.open_csv('../data/landsat{version:d}_training.csv')
----> 2 training_df = training.read()
3 training_df.head()
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in read(self)
140
141 def read(self):
--> 142 self._get_schema()
143 return self._dataframe.compute()
144
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _get_schema(self)
125
126 if self._dataframe is None:
--> 127 self._open_dataset(urlpath)
128
129 dtypes = self._dataframe._meta.dtypes.to_dict()
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _open_dataset(self, urlpath)
116
117 # add the new columns to the dataframe
--> 118 self._set_pattern_columns(path_column)
119
120 if drop_path_column:
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _set_pattern_columns(self, path_column)
73 col.cat.codes.map(dict(enumerate(values))).astype(
74 "category" if not _HAS_CDT else CategoricalDtype(set(values))
---> 75 ) for field, values in reverse_formats(self.pattern, paths).items()
76 }
77 self._dataframe = self._dataframe.assign(**column_by_field)
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in reverse_formats(format_string, resolved_strings)
126 args = {field_name: [] for field_name in field_names}
127 for resolved_string in resolved_strings:
--> 128 for field, value in reverse_format(format_string, resolved_string).items():
129 args[field].append(value)
130
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in reverse_format(format_string, resolved_string)
193
194 # get a list of the parts that matter
--> 195 bits = _get_parts_of_format_string(resolved_string, literal_texts, format_specs)
196
197 for i, (field_name, format_spec) in enumerate(zip(field_names, format_specs)):
C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in _get_parts_of_format_string(resolved_string, literal_texts, format_specs)
41 if literal_text not in _text:
42 raise ValueError(("Resolved string must match pattern. "
---> 43 "'{}' not found.".format(literal_text)))
44 bit, _text = _text.split(literal_text, 1)
45 if bit:
ValueError: Resolved string must match pattern. '../data/landsat' not found.
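For what it's worth, the traceback says the resolved file path no longer contains the literal prefix `../data/landsat` from the pattern. I'm not certain this is the cause here, but on Windows (as in this environment) paths are often resolved with backslashes, which would break exactly that substring match. A minimal reproduction of the mismatch, in plain Python and independent of intake:

```python
literal = '../data/landsat'

posix_path = '../data/landsat5_training.csv'
win_path = '..\\data\\landsat5_training.csv'

print(literal in posix_path)  # True
print(literal in win_path)    # False: backslashes break the literal match

# Normalizing separators before matching avoids the mismatch
print(literal in win_path.replace('\\', '/'))  # True
```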