Comments (4)
It is possible to see the search queries that the Python code sends to the ESGF REST API by setting the log level to debug with the following Python code before running your script:
import logging
logging.getLogger().setLevel(logging.DEBUG)
If you compare those to the query you're using with the web interface, it should be possible to find out why the number of results is different.
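If setting the level on the root logger alone prints nothing, one possible cause is that no log handler has been configured. A minimal sketch, assuming the HTTP requests are issued through urllib3 (whose connection pool logs each request line at debug level), that attaches a handler and enables debug output for that logger only:

```python
import logging

# Attach a basic handler to the root logger; without any handler,
# DEBUG records are not displayed.
logging.basicConfig(level=logging.WARNING)

# Assumption: the HTTP requests go through urllib3. Its connection pool
# logs the request line (method, path and query string) at DEBUG level.
logging.getLogger("urllib3").setLevel(logging.DEBUG)
```

Restricting the debug level to the urllib3 logger keeps the output limited to the request lines, which is the part needed to compare queries.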
from esgf-pyclient.
Thanks for that help. I've added the debugging and got more details about the differences, but it still isn't clear to me what is causing the different behaviour. If I run my query on the web, I match a single dataset id CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r11i1p1f1.Amon.pr.gr, data node esgf.ichec.ie, version 20200201. I added a print(results[0].dataset_id) to my code snippet and I match dataset id CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370.r1i1p1f1.Amon.pr.gr.v20200310|esgf.ceda.ac.uk, which appears to be a later version of the dataset on CEDA rather than ichec.ie.
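To compare the version and data node of matched datasets more easily, the dataset id string can be split into its parts. This is only a sketch: the helper name `split_dataset_id` is hypothetical, and it assumes the id has the layout seen in the printed value above, i.e. `<dataset>.v<version>|<data_node>`.

```python
def split_dataset_id(dataset_id):
    """Split '<dataset>.v<version>|<data_node>' into its three parts.

    Assumes the layout of the id printed in this thread; if there is no
    '.v' or '|' in the string, the corresponding parts come back empty.
    """
    dataset_and_version, _, data_node = dataset_id.partition("|")
    dataset, _, version = dataset_and_version.rpartition(".v")
    return dataset, version, data_node


example = ("CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3.ssp370."
           "r1i1p1f1.Amon.pr.gr.v20200310|esgf.ceda.ac.uk")
dataset, version, data_node = split_dataset_id(example)
print(dataset)
print(version, data_node)
```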
There are quite a few differences between the two queries being run. The browser interface produces:
https://esgf.ceda.ac.uk/esg-search/search/?offset=0&limit=10&type=Dataset&replica=false&latest=true&source_id=EC-Earth3&experiment_id=ssp370&variant_label=r11i1p1f1&table_id=Amon&variable_id=pr&mip_era=CMIP6&activity_id%21=input4MIPs&facets=mip_era%2Cactivity_id%2Cproduct%2Csource_id%2Cinstitution_id%2Csource_type%2Cnominal_resolution%2Cexperiment_id%2Csub_experiment_id%2Cvariant_label%2Cgrid_label%2Ctable_id%2Cfrequency%2Crealm%2Cvariable_id%2Ccf_standard_name%2Cdata_node
whereas my code produces:
DEBUG:urllib3.connectionpool:https://esgf.ceda.ac.uk:443 "GET /esg-search/search?format=application%2Fsolr%2Bjson&limit=50&distrib=false&offset=0&type=Dataset&latest=true&mip_era=CMIP6&source_id=EC-Earth3&experiment_id=ssp370&variant_label=r1i1p1f1&table_id=Amon&variable_id=pr HTTP/1.1" 200 4158
The main differences I can see are:
- I don't have replica=false. But when I try adding this to my program it doesn't match any dataset ids.
- I don't have activity_id!=input4MIPs - it isn't clear to me how I could specify this using the API?
- I don't have the null settings for all the other facets (institution_id etc). I guess I could try adding those to see if the behaviour is changed?
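Rather than comparing the two URLs by eye, the query strings can be parsed and compared parameter by parameter. The following is a rough sketch using only the standard library; the function names are made up for illustration, and the web URL is abridged (the long `facets` list quoted above is left out).

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url):
    """Return the query parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


def diff_queries(url_a, url_b):
    """Return (only in a, only in b, present in both but with different values)."""
    a, b = query_params(url_a), query_params(url_b)
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


# Query from the browser interface (abridged: facets parameter omitted)
web_url = (
    "https://esgf.ceda.ac.uk/esg-search/search/?offset=0&limit=10"
    "&type=Dataset&replica=false&latest=true&source_id=EC-Earth3"
    "&experiment_id=ssp370&variant_label=r11i1p1f1&table_id=Amon"
    "&variable_id=pr&mip_era=CMIP6&activity_id%21=input4MIPs"
)
# Query taken from the urllib3 debug log line
code_url = (
    "https://esgf.ceda.ac.uk/esg-search/search"
    "?format=application%2Fsolr%2Bjson&limit=50&distrib=false&offset=0"
    "&type=Dataset&latest=true&mip_era=CMIP6&source_id=EC-Earth3"
    "&experiment_id=ssp370&variant_label=r1i1p1f1&table_id=Amon&variable_id=pr"
)

only_web, only_code, changed = diff_queries(web_url, code_url)
print("only in web query: ", sorted(only_web))
print("only in code query:", sorted(only_code))
print("different values:  ", changed)
```

With the two URLs quoted in this comment, the comparison reports not only the parameters that are missing on one side but also `limit` and `variant_label` as present in both queries with different values, which may be worth checking alongside the points listed above.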
I've tried selecting various options on the web site (such as "show all replicas", "show all versions") but I haven't found any combination of options that finds the v20200310 of the dataset that my program matches.
from esgf-pyclient.
In general, it is to be expected that different search queries return different results.
When I run
from pyesgf.search import SearchConnection
conn = SearchConnection(
    "https://esgf.ceda.ac.uk/esg-search", distrib=True)
ctx = conn.new_context(
    mip_era="CMIP6", source_id="EC-Earth3", experiment_id="ssp370",
    member_id="r1i1p1f1", table_id="Amon", variable_id="pr",
    latest=True, replica=False)
results = ctx.search(batch_size=1000)
files = results[0].file_context().search()
print(len(files))
the result 86 is printed, which seems to match what you get from the web API.
"I haven't found any combination of options that finds the v20200310 of the dataset that my program matches."
If you're having issues with the web search, I would recommend having a look at their tutorial and, if that doesn't help, contacting the ESGF support mailing list.
from esgf-pyclient.
It seems to be the replica=False that is changing the results. Without that option I pick up a replica dataset at CEDA with a later version id that has fewer files. I'll tell the scientist who raised this problem to use replica=False.
from esgf-pyclient.