
Comments (4)

bouweandela commented on August 26, 2024

It is possible to see the search queries that the Python code sends to the ESGF REST API by setting the log level to debug with the following Python code before running your script:

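import logging

# Attach a stderr handler and lower the root logger's threshold so that DEBUG
# records, including urllib3's request lines, are actually printed.
logging.basicConfig(level=logging.DEBUG)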

If you compare those to the query you're using with the web interface, it should be possible to find out why the number of results is different.
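Comparing two long query strings by eye is error-prone. The following is a minimal sketch that uses only the standard library's urllib.parse to list the parameters whose values differ between two search URLs. query_params and diff_queries are hypothetical helper names (not part of esgf-pyclient), and the two example URLs are shortened, illustrative versions of a browser query and a script query.

```python
from urllib.parse import urlsplit, parse_qsl


def query_params(url: str) -> dict[str, str]:
    """Return the query-string parameters of a URL as a dict.

    Percent-escapes are decoded, so "activity_id%21" becomes "activity_id!".
    If a key repeats, the last value wins (good enough for a quick diff).
    """
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def diff_queries(url_a: str, url_b: str) -> dict[str, tuple]:
    """Map each parameter whose value differs to (value_in_a, value_in_b).

    A parameter missing from one URL shows up as None on that side.
    """
    a, b = query_params(url_a), query_params(url_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Illustrative, shortened queries (not the full URLs from this thread).
web = ("https://esgf.ceda.ac.uk/esg-search/search/?type=Dataset&replica=false"
       "&latest=true&variable_id=pr&activity_id%21=input4MIPs")
script = ("https://esgf.ceda.ac.uk/esg-search/search?type=Dataset"
          "&latest=true&variable_id=pr&distrib=false")

for key, (w, s) in diff_queries(web, script).items():
    print(f"{key}: web={w!r} script={s!r}")
```

Parameters present in both URLs with the same value (here type, latest and variable_id) are left out, so only the candidates for explaining a different result count remain.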

from esgf-pyclient.

EmmaHibling commented on August 26, 2024

Thanks for that help. I've added the debugging and got more details about the differences. But it still isn't clear to me what is causing the different behaviour. If I run my query on the web, I match a single dataset id CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r11i1p1f1.Amon.pr.gr, data node esgf.ichec.ie, version 20200201. I added a print(results[0].dataset_id) to my code snippet and I match dataset id CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r1i1p1f1.Amon.pr.gr.v20200310|esgf.ceda.ac.uk, which appears to be a later version of the dataset on CEDA rather than ichec.ie.
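The printed dataset_id packs three things together: the facet path, the version (after the last ".v"), and the data node (after the "|"). A minimal sketch for pulling them apart, so that the version and host can be compared with what the web interface shows. The layout is inferred from the single CMIP6 identifier quoted above, and DatasetId and parse_dataset_id are hypothetical names, not part of esgf-pyclient.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    instance_id: str  # facet path without the version
    version: str      # version digits without the leading "v"
    data_node: str    # host serving this copy of the dataset


def parse_dataset_id(dataset_id: str) -> DatasetId:
    """Split '<facets>.v<version>|<data_node>' into its three parts.

    Assumes the layout seen in the printed CMIP6 dataset_id; other
    projects may format their identifiers differently.
    """
    master, sep, data_node = dataset_id.partition("|")
    if not sep:
        raise ValueError(f"no '|<data_node>' suffix in {dataset_id!r}")
    # rpartition picks the last ".v", i.e. the version suffix.
    instance_id, dot, version = master.rpartition(".v")
    if not dot or not version.isdigit():
        raise ValueError(f"no '.v<digits>' version in {master!r}")
    return DatasetId(instance_id, version, data_node)


parsed = parse_dataset_id(
    "CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r1i1p1f1"
    ".Amon.pr.gr.v20200310|esgf.ceda.ac.uk")
print(parsed.version, parsed.data_node)  # 20200310 esgf.ceda.ac.uk
```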

There are quite a few differences between the queries being run. The browser interface produces:

https://esgf.ceda.ac.uk/esg-search/search/?offset=0&limit=10&type=Dataset&replica=false&latest=true&source_id=EC-Earth3&experiment_id=ssp370&variant_label=r11i1p1f1&table_id=Amon&variable_id=pr&mip_era=CMIP6&activity_id%21=input4MIPs&facets=mip_era%2Cactivity_id%2Cproduct%2Csource_id%2Cinstitution_id%2Csource_type%2Cnominal_resolution%2Cexperiment_id%2Csub_experiment_id%2Cvariant_label%2Cgrid_label%2Ctable_id%2Cfrequency%2Crealm%2Cvariable_id%2Ccf_standard_name%2Cdata_node

whereas my code produces:

DEBUG:urllib3.connectionpool:https://esgf.ceda.ac.uk:443 "GET /esg-search/search?format=application%2Fsolr%2Bjson&limit=50&distrib=false&offset=0&type=Dataset&latest=true&mip_era=CMIP6&source_id=EC-Earth3&experiment_id=ssp370&variant_label=r1i1p1f1&table_id=Amon&variable_id=pr HTTP/1.1" 200 4158

The main differences I can see are:

  1. I don't have replica=false. But when I try adding this to my program it doesn't match any dataset ids.
  2. I don't have activity_id!=input4MIPs - it isn't clear to me how I could specify this using the API?
  3. I don't have the null settings for all the other facets (institution_id etc.). I guess I could try adding those to see if the behaviour changes?
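On point 2: whether esgf-pyclient's new_context forwards a negated facet such as activity_id!=input4MIPs is not something this thread confirms. A minimal sketch that sidesteps the question by building the esg-search REST query directly with urllib.parse.urlencode, reproducing the browser's activity_id%21=input4MIPs parameter. build_search_url and its exclude argument are hypothetical; the facet values are copied from the browser query above, and the URL is only built here, not fetched.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# Search endpoint of the CEDA index node, as used in the queries above.
SEARCH_URL = "https://esgf.ceda.ac.uk/esg-search/search"


def build_search_url(facets: Mapping[str, str],
                     exclude: Optional[Mapping[str, str]] = None) -> str:
    """Build an esg-search REST query URL for datasets.

    facets:  facet -> value pairs that results must match.
    exclude: facet -> value pairs that results must NOT match; each is sent
             as "facet!=value", the form the web interface uses.
    """
    params = [("type", "Dataset"), ("format", "application/solr+json")]
    params += list(facets.items())
    params += [(f"{key}!", value) for key, value in (exclude or {}).items()]
    # urlencode percent-escapes "!" in keys, giving "activity_id%21=...",
    # exactly as in the browser URL.
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url(
    {"mip_era": "CMIP6", "source_id": "EC-Earth3", "experiment_id": "ssp370",
     "variant_label": "r11i1p1f1", "table_id": "Amon", "variable_id": "pr",
     "latest": "true", "replica": "false"},
    exclude={"activity_id": "input4MIPs"},
)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client to check the hit count against the web interface.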

I've tried selecting various options on the web site (such as "show all replicas", "show all versions") but I haven't found any combination of options that finds the v20200310 of the dataset that my program matches.


bouweandela commented on August 26, 2024

In general, it is to be expected that different search queries return different results.

When I run

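from pyesgf.search import SearchConnection

# distrib=True searches across the federation, not only the CEDA index.
conn = SearchConnection(
    "https://esgf.ceda.ac.uk/esg-search", distrib=True)
# replica=False keeps only the original (non-replica) copy of each dataset.
ctx = conn.new_context(
    mip_era="CMIP6", source_id="EC-Earth3", experiment_id="ssp370",
    member_id="r1i1p1f1", table_id="Amon", variable_id="pr",
    latest=True, replica=False)
results = ctx.search(batch_size=1000)
# Count the files in the first matching dataset.
files = results[0].file_context().search()
print(len(files))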

it prints 86, which seems to match what you get from the web interface.

> I haven't found any combination of options that finds the v20200310 of the dataset that my program matches.

If you're having issues with the web search, I would recommend having a look at their tutorial and contacting the ESGF support mailing list if that doesn't help.


EmmaHibling commented on August 26, 2024

It seems to be the replica=False that is changing the results. Without that option I pick up a replica dataset at CEDA with a later version id that has fewer files. I'll tell the scientist who raised this problem to use replica=False.
