Right now there is no standardized, user-friendly mechanism for subsetting level3 or l…

Implement Level3 and Level4 subsetting logic (podaacpy, open issue)

lewismc commented on June 16, 2024

Implement Level3 and Level4 subsetting logic

Comments (14)

lewismc commented on June 16, 2024

Thinking about L3 and L4 subsetting again.
Concerning only the variable names we would want to retrieve from Hyrax, consider the following DDX response.
Should we use the ‘standard_name’ (e.g. ‘sea_surface_subskin_temperature’) or the plain ‘name’ (e.g. ‘sea_surface_temperature’) when issuing the subset request to Hyrax?

<?xml version="1.0"?>
<Array name="sea_surface_temperature">
    <Attribute name="long_name" type="String">
        <value>sea surface sub-skin temperature</value>
    </Attribute>
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
    <Attribute name="units" type="String">
        <value>K</value>
    </Attribute>
    <Attribute name="_FillValue" type="Int16">
        <value>-32768</value>
    </Attribute>
    <Attribute name="add_offset" type="Float32">
        <value>273.149994</value>
    </Attribute>
    <Attribute name="scale_factor" type="Float32">
        <value>0.00999999978</value>
    </Attribute>
    <Attribute name="valid_min" type="Int16">
        <value>-5000</value>
    </Attribute>
    <Attribute name="valid_max" type="Int16">
        <value>5000</value>
    </Attribute>
    <Attribute name="coordinates" type="String">
        <value>lon lat</value>
    </Attribute>
    <Attribute name="source" type="String">
        <value>REMSS AMSR2 L2B Version-8</value>
    </Attribute>
    <Attribute name="comment" type="String">
        <value>Microwave SST = approximately the top 1 milimeter</value>
    </Attribute>
    <Int16/>
    <dimension name="time" size="1"/>
    <dimension name="nj" size="4193"/>
    <dimension name="ni" size="243"/>
</Array>

lewismc commented on June 16, 2024

As it turns out, we want to retrieve the variable named by the Array element itself,
<Array name="sea_surface_temperature">
as there is no guarantee that standard_name or long_name match the variable names.
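A minimal sketch of that rule, using only Python's standard library (xml.etree.ElementTree) on a trimmed copy of the DDX fragment above. It assumes the fragment carries no XML namespace, exactly as shown; full DDX responses declare one, so tag lookups there would need namespace handling. The helper names are hypothetical, not part of podaacpy.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the DDX fragment above (no XML namespace, as in the comment).
DDX_FRAGMENT = """<?xml version="1.0"?>
<Array name="sea_surface_temperature">
    <Attribute name="long_name" type="String">
        <value>sea surface sub-skin temperature</value>
    </Attribute>
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
    <Int16/>
    <dimension name="time" size="1"/>
    <dimension name="nj" size="4193"/>
    <dimension name="ni" size="243"/>
</Array>"""


def attribute_value(element, attr_name):
    """Return the <value> text of the named <Attribute> child, or None if absent."""
    for attr in element.findall("Attribute"):
        if attr.get("name") == attr_name:
            return attr.findtext("value")
    return None


def request_name(array_element):
    """Name to use in an OPeNDAP request: the element's own 'name' attribute."""
    return array_element.get("name")


array = ET.fromstring(DDX_FRAGMENT)
print(request_name(array))                       # sea_surface_temperature
print(attribute_value(array, "standard_name"))   # sea_surface_subskin_temperature
```

The standard_name and long_name values remain available as metadata, but only the name attribute is used to address the variable.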

Omkar20895 commented on June 16, 2024

Hi @lewismc, could you please point me to a link that would help me retrieve an OPeNDAP DDX response, or maybe an API link? I would like to explore more. Is pydap one of the utilities for accessing the data? When I googled it, all I could find were documentation links. Thanks.

lewismc commented on June 16, 2024

Hi @Omkar20895, yes, one resides here. It's very simple XML.

Omkar20895 commented on June 16, 2024

Hi @lewismc,

From the attached XML data, I see that the variables in the data are:

lat
lon
time
sea_surface_temperature
sst_dtime
dt_analysis
.
.
.
cloud_liquid_water
rain_rate

Correct me if I am getting something wrong. We can use XML XPaths via the lxml.etree xpath module in the variable utility function to get the variable names. I will start working on it and write a prototype. Can you please give me a link to the API to call with the dataset name or ID from the prototype function to get the response? Please let me know if you have any questions or concerns about the approach.

I will also look for documentation on L2, L3 and L4 subsetting on the PO.DAAC forums. I need to read more on this; honestly, I have forgotten a lot. Please suggest any documentation you think would be helpful.
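For the XPath step, lxml.etree supports full XPath 1.0 (for example root.xpath('*[local-name()="Array"]/@name')). The sketch below sticks to the standard library instead: xml.etree.ElementTree's findall understands a limited XPath subset, and its {*} wildcard (Python 3.8+) matches a tag in any namespace, so the DAP namespace need not be hard-coded. The sample DDX and its namespace URI are made up for illustration, and ddx_variable_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed DDX; the namespace URI is a placeholder.
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://example.org/placeholder-ns">
    <Array name="lat"><Float32/><dimension name="nj" size="2"/></Array>
    <Array name="lon"><Float32/><dimension name="ni" size="3"/></Array>
    <Array name="sea_surface_temperature">
        <Int16/>
        <dimension name="nj" size="2"/>
        <dimension name="ni" size="3"/>
    </Array>
</Dataset>"""


def ddx_variable_names(ddx_text):
    """Names of the top-level <Array> elements in a DDX document, in document order."""
    root = ET.fromstring(ddx_text)
    # "{*}Array" selects direct children named Array in any (or no) namespace.
    return [element.get("name") for element in root.findall("{*}Array")]


print(ddx_variable_names(SAMPLE_DDX))  # ['lat', 'lon', 'sea_surface_temperature']
```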

Thanks.

lewismc commented on June 16, 2024

@Omkar20895 thanks for stepping up here.

We can use XML XPaths via the lxml.etree xpath module in the variable utility function to get the variable names.

The only issue right now is that the Podaac.dataset_variables function is only available for a handful of datasets. This means that, by and large, level 3 and 4 subsetting is unavailable through the Web Services API. We need to be more creative in the implementation!

I think we need to do the following:

Edit the function called 'dataset_variables' to do the following:

Execute a granule_search (because we can only obtain a DDX for an OPeNDAP granule), e.g.:

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02', start_time='2019-02-12T01:30:00Z', end_time='2019-02-12T01:30:00Z')

This returns an Atom XML response that includes the OPeNDAP URL, as follows:

<entry>
...
   <link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>

From that, we can replace the trailing .html with the .ddx suffix we want. This allows us to retrieve https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.ddx
We can then parse out the variables from that XML response and return them to the user as a list.
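The URL rewrite described above can be sketched with the standard library. This assumes each granule's OPeNDAP link carries title="OPeNDAP URL", as in the entry shown, and it matches link elements in any namespace rather than assuming a particular Atom namespace. opendap_urls and ddx_url are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

OPENDAP_TITLE = "OPeNDAP URL"


def opendap_urls(atom_text):
    """Return the href of every <link title="OPeNDAP URL"> in an Atom response."""
    root = ET.fromstring(atom_text)
    # ".//{*}link" finds link elements at any depth, in any namespace.
    return [
        link.get("href")
        for link in root.findall(".//{*}link")
        if link.get("title") == OPENDAP_TITLE and link.get("href")
    ]


def ddx_url(opendap_html_url):
    """Replace the trailing '.html' of an OPeNDAP URL with '.ddx'."""
    suffix = ".html"
    if not opendap_html_url.endswith(suffix):
        # Fail loudly rather than building a bogus URL if the convention changes.
        raise ValueError("expected an OPeNDAP URL ending in .html: " + opendap_html_url)
    return opendap_html_url[: -len(suffix)] + ".ddx"


# Minimal Atom feed built around the <link> element quoted above.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>
  </entry>
</feed>"""

for url in opendap_urls(SAMPLE_FEED):
    print(ddx_url(url))
```

Checking the suffix before substituting also gives an explicit error, rather than a silently wrong URL, if the .html convention ever changes.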

Add a new function called subset_L3_L4_granules()

We design the function as follows:

subset_L3_L4_granules(dataset_id='', short_name='', start_time='', end_time='', bbox='', path='', variables='')

This allows us to execute a granule_search, extract the OPeNDAP URL, and then execute the OPeNDAP request with all of the parameters. The response can be saved to the location given by path.
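One way subset_L3_L4_granules could turn bbox into an OPeNDAP request is to convert the bounding box into index ranges and append a DAP2 hyperslab constraint of the form var[start:stride:stop] (stop inclusive) to the dataset URL. The sketch below is hypothetical throughout. It assumes 1-D, monotonic lat and lon coordinate arrays already read from the granule, a bbox string ordered 'min_lon,min_lat,max_lon,max_lat', variables dimensioned [time][lat][lon], and that the server returns netCDF when '.nc' replaces the '.html' suffix. It ignores boxes that cross the antimeridian.

```python
def index_range(coords, low, high):
    """Inclusive (first, last) indices of coordinate values inside [low, high].

    For a monotonic coordinate array the matching indices are contiguous,
    so the first and last hit bound the hyperslab.
    """
    hits = [i for i, value in enumerate(coords) if low <= value <= high]
    if not hits:
        raise ValueError(f"no coordinate values between {low} and {high}")
    return hits[0], hits[-1]


def subset_url(opendap_html_url, variables, lat, lon, bbox, time_range=(0, 0), suffix=".nc"):
    """Build a DAP2 constrained URL for a hypothetical subset_L3_L4_granules."""
    if not opendap_html_url.endswith(".html"):
        raise ValueError("expected an OPeNDAP URL ending in .html")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox.split(","))
    lat0, lat1 = index_range(lat, min_lat, max_lat)
    lon0, lon1 = index_range(lon, min_lon, max_lon)
    t0, t1 = time_range
    # [start:stride:stop] per dimension, in the assumed [time][lat][lon] order.
    hyperslab = f"[{t0}:1:{t1}][{lat0}:1:{lat1}][{lon0}:1:{lon1}]"
    constraint = ",".join(name + hyperslab for name in variables)
    base = opendap_html_url[: -len(".html")]
    return f"{base}{suffix}?{constraint}"


lat = [-10.0, -5.0, 0.0, 5.0, 10.0]
lon = [100.0, 110.0, 120.0, 130.0]
print(subset_url("https://example.org/opendap/granule.nc.html",
                 ["analysed_sst"], lat, lon, bbox="105,-6,125,6"))
# https://example.org/opendap/granule.nc.nc?analysed_sst[0:1:0][1:1:3][1:1:2]
```

Depending on the HTTP client, the square brackets in the constraint may need percent-encoding before the request is sent.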

Does this make sense?

Omkar20895 commented on June 16, 2024

@lewismc yes, it makes sense to me.

I have a question: the example you mentioned above returns only one entry because it is filtered by start time and end time. When I removed the start and end times, it returned multiple entries for the dataset, and the set of variables is the same in all of them.

For example, I used:

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02')

I then replaced .html with .ddx for each entry and inspected the set of variables in each.

I see that the entries are basically time series datasets, measuring the same set of variables at different times. But is there ever a case where different entries have different variables?

lewismc commented on June 16, 2024

In short, no, I don't think that scenario is possible. The structure of the granule data products is consistent; only the actual sensor observation values change. Thanks for looking at this.
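Given that, dataset_variables should only need the DDX of a single granule. The hypothetical helper below can still serve as a cheap sanity check during development: it compares variable lists already parsed from a few granules' DDX responses and reports any difference instead of silently picking one.

```python
def consistent_variables(variables_by_granule):
    """Return the shared variable list, or raise if granules disagree.

    variables_by_granule maps a granule label (e.g. its URL) to the list of
    variable names parsed from that granule's DDX.
    """
    if not variables_by_granule:
        raise ValueError("no granules supplied")
    items = iter(variables_by_granule.items())
    first_label, first_vars = next(items)
    reference = set(first_vars)
    for label, names in items:
        current = set(names)
        if current != reference:
            missing = sorted(reference - current)
            extra = sorted(current - reference)
            raise ValueError(
                f"{label} differs from {first_label}: missing {missing}, extra {extra}"
            )
    return list(first_vars)


print(consistent_variables({
    "granule_a.nc": ["lat", "lon", "sea_surface_temperature"],
    "granule_b.nc": ["lat", "lon", "sea_surface_temperature"],
}))
```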

Omkar20895 commented on June 16, 2024

Hi @lewismc, I have one last question, please bear with me here. Each XML tag (related to a variable) in the .ddx response has either Array or Grid as the tag name; are these in any way associated with different levels of datasets (L1, L2, L3)? Having both Array and Grid tags is uncommon: in most cases the .ddx response has only Array tags, but some datasets have both, for example the .ddx response for the dataset PODAAC-GHGMB-3CO02: here.

If they are associated with different levels, what other tags (besides Array or Grid) are possible?
Please let me know. Once we are clear on these, I can start writing a prototype.

Thanks.

lewismc commented on June 16, 2024

@Omkar20895

Each XML tag (related to a variable) in the .ddx response has either Array or Grid as the tag name; are these in any way associated with different levels of datasets (L1, L2, L3)?

No.

If you look at the following collapsed XML snippet, you will see that the Grid child elements are the variables we are interested in. The top three Array elements define the structural dimensions for the Grids.

lewismc commented on June 16, 2024

So really, it is the Grids we are interested in extracting.
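Following that, a variable lister could prefer the top-level Grid elements and fall back to top-level Array elements only when a DDX has no Grids (as with the L2 example at the top of this thread). This is a minimal standard-library sketch; the sample DDX, its placeholder namespace, and the helper name are illustrative only. Because only direct children of the root are inspected, the Array and Map elements nested inside each Grid are never picked up.

```python
import xml.etree.ElementTree as ET

# Made-up DDX: three dimension Arrays followed by one Grid with its maps.
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://example.org/placeholder-ns">
    <Array name="time"><Int32/><dimension name="time" size="1"/></Array>
    <Array name="lat"><Float32/><dimension name="lat" size="2"/></Array>
    <Array name="lon"><Float32/><dimension name="lon" size="2"/></Array>
    <Grid name="analysed_sst">
        <Array name="analysed_sst">
            <Int16/>
            <dimension name="time" size="1"/>
            <dimension name="lat" size="2"/>
            <dimension name="lon" size="2"/>
        </Array>
        <Map name="time"><Int32/><dimension name="time" size="1"/></Map>
        <Map name="lat"><Float32/><dimension name="lat" size="2"/></Map>
        <Map name="lon"><Float32/><dimension name="lon" size="2"/></Map>
    </Grid>
</Dataset>"""


def grid_variable_names(ddx_text):
    """Names of top-level Grids; if there are none, names of top-level Arrays."""
    root = ET.fromstring(ddx_text)
    grids = [grid.get("name") for grid in root.findall("{*}Grid")]
    if grids:
        return grids
    # No Grids (e.g. swath-style L2 products): every top-level Array is a variable.
    return [array.get("name") for array in root.findall("{*}Array")]


print(grid_variable_names(SAMPLE_DDX))  # ['analysed_sst']
```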

Omkar20895 commented on June 16, 2024

@lewismc I am almost done writing new code (a rewrite of the original dataset_variables function) that avoids the API provided by Web Services, since it does not support all datasets. However, this adds a dependency, because we are essentially providing a workaround: for example, what if replacing .html with .ddx stops working in the future? Feel free to correct me if I am missing something.

Please let me know your thoughts on this; in the meantime I will send a pull request for review.

ShubhamShaswat commented on June 16, 2024

Hi, I'd like to help. Going through the comments, my understanding is that the function 'dataset_variables' doesn't work for all L3 and L4 datasets. In the example from #129 (Updating dataset_variable to support L3 and L4 datasets), dataset ID PODAAC-SASSX-L3UCD doesn't have OPeNDAP URL links, which this code at line 224 relies on:

    for link in dataset_links:
        if link.attrib['title'] == "OPeNDAP URL":

As a result, dataset_url is empty, which raises the error:

requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?

Do we want to handle this error, or do we want to find the variables for this dataset some other way?
If I am wrong, please correct me.
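One way to handle it is to fail early with a descriptive error instead of passing an empty string to requests. The sketch below is hypothetical (find_opendap_url is not a podaacpy function). It also reads the title with link.get('title'), which returns None for links without a title, whereas link.attrib['title'] would raise KeyError on such a link.

```python
import xml.etree.ElementTree as ET


def find_opendap_url(dataset_links, dataset_id):
    """Return the href of the link titled 'OPeNDAP URL', or raise a clear error."""
    for link in dataset_links:
        # .get() returns None for a missing attribute instead of raising KeyError.
        if link.get("title") == "OPeNDAP URL" and link.get("href"):
            return link.get("href")
    raise ValueError(
        f"dataset {dataset_id} has no 'OPeNDAP URL' link, so its variables "
        "cannot be read from a DDX"
    )


links = [
    ET.Element("link", {"href": "https://example.org/granule.nc", "rel": "enclosure"}),
    ET.Element("link", {"href": "https://example.org/granule.nc.html", "title": "OPeNDAP URL"}),
]
print(find_opendap_url(links, "PODAAC-EXAMPLE"))  # https://example.org/granule.nc.html

try:
    find_opendap_url(links[:1], "PODAAC-EXAMPLE")
except ValueError as err:
    print(err)
```

Callers can then catch the ValueError and decide whether to skip the dataset or fall back to another way of listing its variables.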

lewismc commented on June 16, 2024

Hi @ShubhamShaswat, did you see the proposed solution in PR #129 (comment)?
