lewismc commented on June 16, 2024

Thinking about L3 and L4 subsetting again, specifically which variable names we would want to retrieve from Hyrax.
Consider the following DDX response.
Should we use the ‘standard_name’ attribute, e.g. ‘sea_surface_subskin_temperature’, or the plain ‘name’, e.g. ‘sea_surface_temperature’? Which one would we use when issuing the subset request to Hyrax?

<?xml version="1.0"?>
<Array name="sea_surface_temperature">
    <Attribute name="long_name" type="String">
        <value>sea surface sub-skin temperature</value>
    </Attribute>
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
    <Attribute name="units" type="String">
        <value>K</value>
    </Attribute>
    <Attribute name="_FillValue" type="Int16">
        <value>-32768</value>
    </Attribute>
    <Attribute name="add_offset" type="Float32">
        <value>273.149994</value>
    </Attribute>
    <Attribute name="scale_factor" type="Float32">
        <value>0.00999999978</value>
    </Attribute>
    <Attribute name="valid_min" type="Int16">
        <value>-5000</value>
    </Attribute>
    <Attribute name="valid_max" type="Int16">
        <value>5000</value>
    </Attribute>
    <Attribute name="coordinates" type="String">
        <value>lon lat</value>
    </Attribute>
    <Attribute name="source" type="String">
        <value>REMSS AMSR2 L2B Version-8</value>
    </Attribute>
    <Attribute name="comment" type="String">
        <value>Microwave SST = approximately the top 1 milimeter</value>
    </Attribute>
    <Int16/>
    <dimension name="time" size="1"/>
    <dimension name="nj" size="4193"/>
    <dimension name="ni" size="243"/>
</Array>

from podaacpy.

lewismc commented on June 16, 2024

As it turns out, we want to retrieve the name from the Array element itself,
<Array name="sea_surface_temperature">
because there is no guarantee that standard_name or long_name matches the variable name.
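Here is a minimal sketch of that distinction. It uses Python's standard xml.etree.ElementTree on a trimmed copy of the Array element above, with most attributes dropped for brevity. The name attribute on the Array element identifies the variable, while standard_name is only one of its Attribute children. The helpers local_name and attribute_value are hypothetical and written just for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Array element from the DDX response above.
ARRAY_XML = """<Array name="sea_surface_temperature">
    <Attribute name="long_name" type="String">
        <value>sea surface sub-skin temperature</value>
    </Attribute>
    <Attribute name="standard_name" type="String">
        <value>sea_surface_subskin_temperature</value>
    </Attribute>
    <Int16/>
</Array>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the code works with or without a DAP namespace."""
    return tag.rsplit("}", 1)[-1]


def attribute_value(array_elem, attr_name):
    """Return the first <value> text of the named <Attribute> child, or None."""
    for child in array_elem:
        if local_name(child.tag) == "Attribute" and child.get("name") == attr_name:
            for value in child:
                if local_name(value.tag) == "value":
                    return value.text
    return None


array = ET.fromstring(ARRAY_XML)
subset_name = array.get("name")                          # the variable name itself
standard_name = attribute_value(array, "standard_name")  # descriptive metadata only
print(subset_name)    # sea_surface_temperature
print(standard_name)  # sea_surface_subskin_temperature
```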

Omkar20895 commented on June 16, 2024

Hi @lewismc, could you point me to a link where I can retrieve an OPeNDAP DDX response, or to the relevant API? I would like to explore further. Is pydap one of the utilities for accessing the data? When I search for it, all I can find are documentation pages. Thanks.

lewismc commented on June 16, 2024

Hi @Omkar20895, yes, one resides here. It's very simple XML.

Omkar20895 commented on June 16, 2024

Hi @lewismc,

From the XML data you linked, I see that it contains the following variables:

  • lat
  • lon
  • time
  • sea_surface_temperature
  • sst_dtime
  • dt_analysis
  • …
  • cloud_liquid_water
  • rain_rate

Correct me if I have something wrong. We can use XML XPaths using the lxml.etree/xpath module in the variable utility function to get the variable names. I will start working on a prototype. Could you give me a link to the API that the prototype function should call, with the dataset name or ID, to get the response? Please let me know if you have any questions or concerns about this approach.
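The extraction could look something like the minimal sketch below. It uses the standard library's xml.etree.ElementTree rather than lxml. lxml.etree would also work and supports full XPath 1.0, for example /*/*[local-name()='Array' or local-name()='Grid']/@name. SAMPLE_DDX is a hypothetical, heavily trimmed DDX, and ddx_variable_names is a hypothetical helper, not an existing podaacpy function. The code matches on the local tag name, so it does not matter whether the response declares a DAP namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed DDX; real Hyrax responses carry many more attributes.
SAMPLE_DDX = """<Dataset name="example.nc" xmlns="http://xml.opendap.org/ns/DAP/3.2#">
    <Attribute name="NC_GLOBAL" type="Container"/>
    <Array name="lat"><Float32/></Array>
    <Array name="lon"><Float32/></Array>
    <Array name="time"><Int32/></Array>
    <Array name="sea_surface_temperature"><Int16/></Array>
    <Array name="rain_rate"><Byte/></Array>
</Dataset>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, e.g. '{http://...}Array' -> 'Array'."""
    return tag.rsplit("}", 1)[-1]


def ddx_variable_names(ddx_text):
    """Return the names of the top-level Array and Grid elements, in document order."""
    root = ET.fromstring(ddx_text)
    return [
        child.get("name")
        for child in root  # direct children of <Dataset> only
        if local_name(child.tag) in ("Array", "Grid")
    ]


print(ddx_variable_names(SAMPLE_DDX))
# ['lat', 'lon', 'time', 'sea_surface_temperature', 'rain_rate']
```

Global Attribute elements are skipped because only Array and Grid tags are kept.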

I will also look for documentation on L2, L3 and L4 subsetting in the PO.DAAC forums. I need to read up on this, since honestly I have forgotten a lot. Please suggest any documentation you think would help.

Thanks.

lewismc commented on June 16, 2024

@Omkar20895, thanks for stepping up here.

We can use XML XPaths using the lxml.etree/xpath module in the variable utility function to get the variable names.

The only issue right now is that the Podaac.dataset_variables function works for only a handful of datasets. This means that, by and large, Level 3 and Level 4 subsetting is unavailable through the Web Services API. We need to be more creative in the implementation!

I think we need to do the following.

Edit the function called 'dataset_variables' to do the following:

Execute a granule_search (because we can only obtain a DDX for an OPeNDAP granule), e.g.

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02', start_time='2019-02-12T01:30:00Z', end_time='2019-02-12T01:30:00Z')

This returns an Atom XML response that includes the OPeNDAP URL as follows:

<entry>
...
   <link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.html" rel="enclosure" title="OPeNDAP URL" type="text/html"/>

From that, we can replace the trailing .html with the .ddx suffix we want. This allows us to retrieve the following: https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L3C/GLOB/AVHRR_SST_METOP_B_GLB/OSISAF/v1/2019/042/20190212000000-OSISAF-L3C_GHRSST-SSTsubskin-AVHRR_SST_METOP_B_GLB-sstglb_metop01_20190212_000000-v02.0-fv01.0.nc.ddx
We can then parse out the variables from that XML response and return them to the user as a list.
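Here is a minimal sketch of the URL step. It assumes the granule_search response is an Atom feed whose entries carry link elements like the one above. opendap_ddx_urls is a hypothetical helper, and SAMPLE_ATOM and its URLs are made up. The helper reads the title with .get(), so links without a title attribute are skipped rather than raising KeyError. It also rejects hrefs that do not end in .html instead of producing a malformed URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom response with one OPeNDAP link and one plain download link.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>example granule</title>
    <link href="https://podaac-opendap.jpl.nasa.gov/opendap/allData/example/granule.nc.html"
          rel="enclosure" title="OPeNDAP URL" type="text/html"/>
    <link href="https://example.org/granule.nc" rel="enclosure"/>
  </entry>
</feed>"""


def local_name(tag):
    """Strip a '{namespace}' prefix such as the Atom namespace."""
    return tag.rsplit("}", 1)[-1]


def opendap_ddx_urls(atom_text):
    """Return one .ddx URL per link titled 'OPeNDAP URL' in an Atom document."""
    root = ET.fromstring(atom_text)
    urls = []
    for elem in root.iter():
        if local_name(elem.tag) != "link" or elem.get("title") != "OPeNDAP URL":
            continue
        href = elem.get("href", "")
        if not href.endswith(".html"):
            raise ValueError(f"unexpected OPeNDAP URL: {href!r}")
        # Swap the trailing .html for .ddx.
        urls.append(href[: -len(".html")] + ".ddx")
    return urls


print(opendap_ddx_urls(SAMPLE_ATOM))
# ['https://podaac-opendap.jpl.nasa.gov/opendap/allData/example/granule.nc.ddx']
```

An empty list means the feed had no OPeNDAP link. That is worth reporting before any HTTP request is made. Each returned URL could then be fetched (for example with requests.get) and its body parsed for variable names.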

Add a new function called subset_L3_L4_granules()

Essentially, we would design the function as follows:

subset_L3_L4_granules(dataset_id='', short_name='', start_time='', end_time='', bbox='', path='', variables='')

This lets us execute a granule_search, extract the OPeNDAP URL, and then execute the OPeNDAP request with all of the parameters. The response can be saved to the location given by path.
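Here is one way the OPeNDAP request could be assembled. It is a minimal sketch under several assumptions:

- The granule lies on a regular grid with ascending one-dimensional lat/lon coordinates, so a bbox can be turned into index ranges with a binary search.
- Every requested variable shares the same dimensions.
- Hyrax's netCDF file-out response is reached by appending .nc4 to the granule URL.

index_range and build_subset_url are hypothetical helpers. The constraint expression uses the DAP2 hyperslab syntax [start:stride:stop], where stop is inclusive.

```python
from bisect import bisect_left, bisect_right
from urllib.parse import quote


def index_range(coords, lo, hi):
    """Inclusive (start, stop) indices of the ascending coords that fall in [lo, hi]."""
    start = bisect_left(coords, lo)
    stop = bisect_right(coords, hi) - 1
    if start > stop:
        raise ValueError(f"no coordinate values fall inside [{lo}, {hi}]")
    return start, stop


def build_subset_url(opendap_html_url, variables, ranges, suffix=".nc4"):
    """Build a DAP2 constraint-expression URL for one granule.

    ranges holds one inclusive (start, stop) pair per dimension and is applied
    to every variable, so all variables must share the same dimensions.
    """
    if not opendap_html_url.endswith(".html"):
        raise ValueError(f"expected an OPeNDAP .html URL, got {opendap_html_url!r}")
    base = opendap_html_url[: -len(".html")] + suffix
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)
    constraint = ",".join(f"{name}{hyperslab}" for name in variables)
    # Percent-encode the square brackets; keep ':' and ',' readable.
    return base + "?" + quote(constraint, safe=":,")


# Hypothetical coordinates and granule URL for illustration.
lats = [-1.0, -0.5, 0.0, 0.5, 1.0]
lons = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
lat_range = index_range(lats, -0.5, 0.5)   # (1, 3)
lon_range = index_range(lons, 11.0, 12.0)  # (2, 4)
url = build_subset_url(
    "https://podaac-opendap.jpl.nasa.gov/opendap/allData/example/granule.nc.html",
    ["sea_surface_temperature"],
    [(0, 0), lat_range, lon_range],  # time, lat, lon
)
print(url)
```

Many L3 products store latitude in descending order, and such an array would need reversing (and the indices mapping back) before the binary search applies. L2 swaths with 2-D lat/lon arrays do not fit this approach at all.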

Does this make sense?

Omkar20895 commented on June 16, 2024

@lewismc yes, it makes sense to me.

I have a question. The example you mentioned above returns only one entry because the search is restricted by start and end time. When I removed the start and end times, the search returned multiple entries for the dataset, and all of the entries share the same set of variables.

For example, I used:

p.granule_search(dataset_id='PODAAC-GHGMB-3CO02')

then replaced the .html suffix in each entry's OPeNDAP URL with .ddx and compared the variables in each response.

I see that the entries are essentially a time series, measuring the same set of variables at different times. Still, is there a case where different entries have different variables?
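One way to check this empirically is sketched below. Collect the variable names from each granule's DDX (however they are parsed) into a dict keyed by granule, then report which names are not shared by every granule. compare_variable_sets is a hypothetical helper, and the sample data is made up.

```python
def compare_variable_sets(per_granule):
    """Return (common, extras): names present in every granule, and per-granule extras."""
    sets = {granule: set(names) for granule, names in per_granule.items()}
    if not sets:
        return set(), {}
    common = set.intersection(*sets.values())
    extras = {g: sorted(s - common) for g, s in sets.items() if s - common}
    return common, extras


# Made-up example: the second granule carries one extra variable.
sample = {
    "granule_a": ["lat", "lon", "time", "sea_surface_temperature"],
    "granule_b": ["lat", "lon", "time", "sea_surface_temperature", "wind_speed"],
}
common, extras = compare_variable_sets(sample)
print(sorted(common))  # ['lat', 'lon', 'sea_surface_temperature', 'time']
print(extras)          # {'granule_b': ['wind_speed']}
```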

lewismc commented on June 16, 2024

Omkar20895 commented on June 16, 2024

Hi @lewismc, I have one last question, please bear with me. Each XML tag (related to a variable) in the .ddx response has either Array or Grid as its tag name. Are these in any way associated with different dataset levels (L1, L2, L3)? Having both Array and Grid tags is not common. In most cases the .ddx response has only Array tags, but some datasets have both, for example the .ddx response for the dataset PODAAC-GHGMB-3CO02: here.

If they are associated with different levels, what other tags (besides Array and Grid) are possible?
Please let me know. Once we are clear on these, I can start writing a prototype.

Thanks.

lewismc commented on June 16, 2024

@Omkar20895

Each XML tag (related to a variable) in the .ddx response has either Array or Grid as its tag name. Are these in any way associated with different dataset levels (L1, L2, L3)?

No.

If you look at the following collapsed XML snippet, you will see that the Grid child elements are the variables we are interested in. The top three Array elements define the structural dimensions for the Grids.

[Screenshot, 2019-02-22: collapsed XML snippet of the DDX response]

lewismc commented on June 16, 2024

So really, it is the Grids that we are interested in extracting.
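Here is a minimal sketch of a helper, ddx_data_variables (a hypothetical name), that prefers the top-level Grid elements. When a DDX has no Grids at all, it falls back to top-level Arrays, as with the L2 response at the start of this thread where the variable itself is an Array. The sample DDX is hypothetical and trimmed. In a DAP2 DDX each Grid wraps its data Array together with Map elements for its coordinates, so only direct children of Dataset are inspected and only each Grid's own name is collected.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed DDX for a gridded (L3-style) granule.
GRIDDED_DDX = """<Dataset name="example_l3.nc" xmlns="http://xml.opendap.org/ns/DAP/3.2#">
    <Array name="time"><Int32/></Array>
    <Array name="lat"><Float32/></Array>
    <Array name="lon"><Float32/></Array>
    <Grid name="sea_surface_temperature">
        <Array name="sea_surface_temperature"><Int16/></Array>
        <Map name="time"><Int32/></Map>
        <Map name="lat"><Float32/></Map>
        <Map name="lon"><Float32/></Map>
    </Grid>
    <Grid name="sst_dtime">
        <Array name="sst_dtime"><Int16/></Array>
        <Map name="time"><Int32/></Map>
        <Map name="lat"><Float32/></Map>
        <Map name="lon"><Float32/></Map>
    </Grid>
</Dataset>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def ddx_data_variables(ddx_text):
    """Names of top-level Grids; if there are none, names of top-level Arrays."""
    root = ET.fromstring(ddx_text)
    grids = [c.get("name") for c in root if local_name(c.tag) == "Grid"]
    if grids:
        return grids
    return [c.get("name") for c in root if local_name(c.tag) == "Array"]


print(ddx_data_variables(GRIDDED_DDX))  # ['sea_surface_temperature', 'sst_dtime']
```

In the fallback case, coordinate Arrays such as lat and lon come back alongside the data variables. Separating them would need extra information, for instance the coordinates attribute seen in the first DDX of this thread.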

Omkar20895 commented on June 16, 2024

@lewismc I am almost done writing new code (a rewrite of the original dataset_variables function) that avoids the Web Services API, since it does not support all datasets. However, this adds a dependency because we are essentially providing a workaround. For example, what if replacing .html with .ddx stops working in the future? Feel free to correct me if I am missing something.

Please let me know your thoughts on this. In the meantime, I will send a pull request for review.
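One way to keep that dependency visible is sketched below: check every .ddx response before using it. If the suffix convention ever stops working, the failure is then reported plainly instead of surfacing as an empty variable list. parse_ddx_or_fail is a hypothetical helper, and its only assumption is that a DDX document's root element is Dataset.

```python
import xml.etree.ElementTree as ET


def parse_ddx_or_fail(url, response_text):
    """Parse a .ddx response body, raising RuntimeError if it is not a DDX document."""
    try:
        root = ET.fromstring(response_text)
    except ET.ParseError as exc:
        raise RuntimeError(f"{url} did not return XML; has the .ddx convention changed?") from exc
    if root.tag.rsplit("}", 1)[-1] != "Dataset":
        raise RuntimeError(f"{url} returned a <{root.tag}> document, not a DDX <Dataset>")
    return root


root = parse_ddx_or_fail("https://example.org/granule.nc.ddx", '<Dataset name="granule.nc"/>')
print(root.get("name"))  # granule.nc
```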

ShubhamShaswat commented on June 16, 2024

Hi, I'd like to help. Going through the comments, my understanding is that the 'dataset_variables' function doesn't work for all L3 and L4 datasets. For example, in Updating dataset_variable to support L3 and L4 datasets #129, dataset ID PODAAC-SASSX-L3UCD has no OPeNDAP URL links, so the check at line 224 of the code never matches:

    for link in dataset_links:
        if link.attrib['title'] == "OPeNDAP URL":

Therefore dataset_url stays empty, which gives the error

    requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?

Do we want to handle this error, or do we want to find the variables for this dataset by other means?
Please correct me if I am wrong.

lewismc commented on June 16, 2024

Hi @ShubhamShaswat, did you see the proposed solution in the following comment on PR #129?
