Comments (14)
Thinking about L3 and L4 subsetting again.
Concerning only the variable names we would want to retrieve from Hyrax…
Consider the following DDX response.
Should we use the ‘standard_name’ attribute, e.g. ‘sea_surface_subskin_temperature’, or the plain ‘name’, e.g. ‘sea_surface_temperature’? Which of the two would we pass when sending the subset request to Hyrax?
<?xml version="1.0"?>
<Array name="sea_surface_temperature">
<Attribute name="long_name" type="String">
<value>sea surface sub-skin temperature</value>
</Attribute>
<Attribute name="standard_name" type="String">
<value>sea_surface_subskin_temperature</value>
</Attribute>
<Attribute name="units" type="String">
<value>K</value>
</Attribute>
<Attribute name="_FillValue" type="Int16">
<value>-32768</value>
</Attribute>
<Attribute name="add_offset" type="Float32">
<value>273.149994</value>
</Attribute>
<Attribute name="scale_factor" type="Float32">
<value>0.00999999978</value>
</Attribute>
<Attribute name="valid_min" type="Int16">
<value>-5000</value>
</Attribute>
<Attribute name="valid_max" type="Int16">
<value>5000</value>
</Attribute>
<Attribute name="coordinates" type="String">
<value>lon lat</value>
</Attribute>
<Attribute name="source" type="String">
<value>REMSS AMSR2 L2B Version-8</value>
</Attribute>
<Attribute name="comment" type="String">
<value>Microwave SST = approximately the top 1 milimeter</value>
</Attribute>
<Int16/>
<dimension name="time" size="1"/>
<dimension name="nj" size="4193"/>
<dimension name="ni" size="243"/>
</Array>
from podaacpy.
As it turns out, we want to retrieve the name attribute of the Array element, i.e. the value in
<Array name="sea_surface_temperature">
as there is no guarantee that standard_name or long_name match the variable name.
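A minimal sketch of that choice, using only the Python standard library: it reads the name attribute of every Array element and ignores standard_name and long_name. The helper names (local_name, array_names) and the trimmed SAMPLE document are illustrative assumptions, not part of podaacpy. The namespace stripping is there because a full DDX document may declare a default XML namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the DDX fragment quoted above (illustrative only).
SAMPLE = """<Array name="sea_surface_temperature">
  <Attribute name="standard_name" type="String">
    <value>sea_surface_subskin_temperature</value>
  </Attribute>
  <Int16/>
  <dimension name="time" size="1"/>
</Array>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def array_names(ddx_text):
    """Return the name attribute of every Array element in the document."""
    root = ET.fromstring(ddx_text)
    # root.iter() visits the root element itself and all of its descendants.
    return [el.get("name") for el in root.iter() if local_name(el.tag) == "Array"]


print(array_names(SAMPLE))  # ['sea_surface_temperature']
```

Note that the standard_name value never appears in the result, because it lives in an Attribute child rather than in the Array's own name attribute.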
from podaacpy.
Hi @lewismc, can you please share a link that would help me retrieve an OPeNDAP DDX response, or maybe an API link? I would like to explore more. Is pydap one of the utilities for accessing the data? When I googled it, all I could find were some documentation links. Thanks.
from podaacpy.
Hi @Omkar20895 yes one resides here. It's very simple XML.
from podaacpy.
Hi @lewismc,
I see from the attached xml data that the following are the list of variables in the data:
- lat
- lon
- time
- sea_surface_temperature
- sst_dtime
- dt_analysis
- ...
- cloud_liquid_water
- rain_rate
Correct me if I am getting something wrong. We can use XPath via the lxml.etree module in the variable utility function to get the variable names. I will start working on it and write a prototype. Can you please give me a link to the API that the prototype function should call with the dataset name or ID to get the response? Please let me know if you have any questions or concerns about the approach.
I will also look for documentation on L2, L3 and L4 subsetting on the PO.DAAC forums. I need to read more on this, as I have honestly forgotten a lot, so please suggest any documentation that you think would be helpful to me.
Thanks.
from podaacpy.
@Omkar20895 thanks for stepping up here.
We can use XPath via the lxml.etree module in the variable utility function to get the variable names.
The only issue just now is that the Podaac.dataset_variables function is only available for a handful of datasets. This means that, by and large, Level 3 and 4 subsetting is unavailable using the Webservices API. We need to be more creative in the implementation!
I think we need to do as follows:
Edit the function called 'dataset_variables' to do the following:
Execute a granule_search (because we can only obtain a DDX for an OPeNDAP granule) e.g.
p.granule_search(dataset_id='PODAAC-GHGMB-3CO02', start_time='2019-02-12T01:30:00Z', end_time='2019-02-12T01:30:00Z')
This will return an Atom XML response which includes the OPeNDAP URL, as follows:
<entry>
...
<link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>
From that we can replace the trailing .html suffix with the .ddx suffix we want. This will allow us to retrieve the following: https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.ddx
We can then parse out the variables from that XML response and return them to the user as a list.
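The suffix swap described above could be sketched as follows. The function name ddx_url_from_html_url is a hypothetical helper, not an existing podaacpy function, and the example.org URL is a placeholder. The sketch deliberately fails loudly when the URL does not end in .html rather than guessing, and it leaves fetching the DDX to the caller.

```python
def ddx_url_from_html_url(opendap_html_url):
    """Turn an OPeNDAP '.html' form URL into the matching '.ddx' URL."""
    html_suffix = ".html"
    if not opendap_html_url.endswith(html_suffix):
        raise ValueError(
            "expected an OPeNDAP URL ending in '.html', got: %r" % opendap_html_url
        )
    # Replace only the trailing suffix, never an '.html' elsewhere in the path.
    return opendap_html_url[: -len(html_suffix)] + ".ddx"


print(ddx_url_from_html_url("https://example.org/opendap/granule.nc.html"))
# https://example.org/opendap/granule.nc.ddx
```

Slicing off the suffix instead of calling str.replace avoids rewriting an '.html' that happens to appear earlier in the path.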
Add a new function called subset_L3_L4_granules()
Essentially here we design the function as follows
subset_L3_L4_granules(dataset_id='', short_name='', start_time='', end_time='', bbox='', path='', variables='')
This allows us to execute a granule_search, extract the OPeNDAP URL, and then execute the OPeNDAP request with all of the parameters. The response can be saved to wherever path is defined.
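As a rough illustration of the last step, the sketch below only builds the request URL for a set of variables. It assumes the server accepts a DAP2-style projection (a comma-separated variable list after '?') and that appending '.nc' to the granule URL asks for a netCDF response. Both are assumptions to verify against the PO.DAAC server, and build_subset_url is a hypothetical helper. Translating bbox into index ranges is not attempted here, since that needs the lat/lon coordinate values.

```python
def build_subset_url(opendap_html_url, variables, response_suffix=".nc"):
    """Build an OPeNDAP request URL that selects only the given variables.

    Assumption: the server understands '<granule><suffix>?var1,var2'.
    """
    html_suffix = ".html"
    if not opendap_html_url.endswith(html_suffix):
        raise ValueError("expected an OPeNDAP URL ending in '.html'")
    if not variables:
        raise ValueError("at least one variable name is required")
    granule_url = opendap_html_url[: -len(html_suffix)]
    projection = ",".join(variables)
    return "%s%s?%s" % (granule_url, response_suffix, projection)


url = build_subset_url(
    "https://example.org/opendap/granule.nc.html",  # placeholder URL
    ["sea_surface_temperature", "lat", "lon"],
)
print(url)
# https://example.org/opendap/granule.nc.nc?sea_surface_temperature,lat,lon
```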
Does this make sense?
from podaacpy.
@lewismc yes, it makes sense to me.
I have a question: the example you mentioned above returns only one entry because it is restricted by start time and end time. I tried removing the start and end times, and it returned multiple entries for the dataset, with the same set of variables in every entry.
For example, I used:
p.granule_search(dataset_id='PODAAC-GHGMB-3CO02')
then replaced .html of each entry with .ddx and observed the set of variables for each entry.
I see that the entries are basically time series datasets, measuring the same set of variables at different time instances. Still, is there a case where different entries have different variables?
from podaacpy.
Hi @lewismc, I have one last question, please bear with me here. Each XML tag (related to a variable) in the .ddx response has either Array or Grid as the tag name. Are these in any way associated with the different dataset levels (L1, L2, L3)? Having both Array and Grid tags is not common: in most cases the .ddx response has only Array tags, but some datasets have both, for example the .ddx response for the dataset PODAAC-GHGMB-3CO02: here.
If they are associated with different levels, what other tags are possible (besides Array and Grid)?
Please let me know. Once we are clear on these, I can start writing a prototype.
Thanks.
from podaacpy.
Each XML tag (related to a variable) in the .ddx response has either Array or Grid as the tag name. Are these in any way associated with the different dataset levels (L1, L2, L3)?
No.
If you look at the following collapsed XML snippet you will see that the Grid child elements are the variables we are interested in. The top three Array elements define the structural dimensions for the Grids.
from podaacpy.
So really, it is the Grids which we are interested in extracting.
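One way to express this in code is to prefer the top-level Grid names and fall back to top-level Array names when a response contains no Grid at all (the Array-only case mentioned earlier in the thread). The fallback rule, the helper names and the hand-written SAMPLE_DDX (including its namespace) are assumptions for illustration, not taken from a real PO.DAAC response.

```python
import xml.etree.ElementTree as ET

# Hand-written fixture: three dimension Arrays plus one Grid.
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://xml.opendap.org/ns/DAP/3.2#">
  <Array name="time"><Int32/><dimension name="time" size="1"/></Array>
  <Array name="lat"><Float32/><dimension name="lat" size="2"/></Array>
  <Array name="lon"><Float32/><dimension name="lon" size="2"/></Array>
  <Grid name="sea_surface_temperature">
    <Array name="sea_surface_temperature">
      <Int16/>
      <dimension name="time" size="1"/>
      <dimension name="lat" size="2"/>
      <dimension name="lon" size="2"/>
    </Array>
    <Map name="time"><Int32/><dimension name="time" size="1"/></Map>
    <Map name="lat"><Float32/><dimension name="lat" size="2"/></Map>
    <Map name="lon"><Float32/><dimension name="lon" size="2"/></Map>
  </Grid>
</Dataset>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def variable_names(ddx_text):
    """Return top-level Grid names, or top-level Array names if there are no Grids."""
    root = ET.fromstring(ddx_text)
    # Iterating over an element yields only its direct children.
    grids = [c.get("name") for c in root if local_name(c.tag) == "Grid"]
    if grids:
        return grids
    return [c.get("name") for c in root if local_name(c.tag) == "Array"]


print(variable_names(SAMPLE_DDX))  # ['sea_surface_temperature']
```

Restricting the search to direct children of the root keeps the Array nested inside each Grid from being reported a second time.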
from podaacpy.
@lewismc I am almost done rewriting the original dataset_variables function so that it no longer uses the web services API, which does not support all the datasets. However, this adds a dependency on behaviour we do not control, since we are basically providing a workaround. For example, what if replacing .html with .ddx stops working in the future? Feel free to correct me if I am missing something.
Please let me know your thoughts on this; in the meantime I will send a pull request for review.
from podaacpy.
Hi, I would like to help. From my reading of the comments, the function 'dataset_variables' does not work for all L3 and L4 datasets. In the example from the issue titled "Updating dataset_variable to support L3 and L4 datasets #129", the dataset ID PODAAC-SASSX-L3UCD does not have any OPeNDAP URL links, so the check at line 224 of the code never matches:
for link in dataset_links:
    if link.attrib['title'] == "OPeNDAP URL":
Therefore we get an empty dataset_url which gives the error
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
Do we want to handle this error, or do we want to find the variables for this dataset using other methods?
Please correct me if I am wrong.
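If the answer is to handle the error, one option is to detect the missing link before any request is made and raise a descriptive exception instead of letting an empty URL reach requests. The sketch below is an assumption about how that could look: find_opendap_url and require_opendap_url are hypothetical helpers, and the two feeds are hand-written stand-ins for a granule_search response.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="https://example.org/opendap/granule.nc.html"
          rel="enclosure" title="OPeNDAP URL" type="text/html"/>
  </entry>
</feed>"""

EMPTY_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"><entry/></feed>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_opendap_url(atom_xml):
    """Return the href of the first link titled 'OPeNDAP URL', or None."""
    root = ET.fromstring(atom_xml)
    for el in root.iter():
        # .get() returns None for links without a title, so no KeyError here.
        if local_name(el.tag) == "link" and el.get("title") == "OPeNDAP URL":
            return el.get("href")
    return None


def require_opendap_url(atom_xml, dataset_id):
    """Like find_opendap_url, but raise a clear error when no link exists."""
    url = find_opendap_url(atom_xml)
    if not url:
        raise ValueError("no OPeNDAP URL found for dataset %s" % dataset_id)
    return url


print(find_opendap_url(SAMPLE_FEED))  # https://example.org/opendap/granule.nc.html
print(find_opendap_url(EMPTY_FEED))   # None
```

A caller can then catch the ValueError and decide whether to report the dataset as unsupported or try another way of discovering its variables.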
from podaacpy.
Hi @ShubhamShaswat, did you see the proposed solution at the following PR #129 (comment)?
from podaacpy.