Giter VIP home page Giter VIP logo

Comments (6)

pdeiml avatar pdeiml commented on August 18, 2024

I read through the code which makes the collection and the catalog and I have a few suggestions.

We have https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml which is intended to control which datasets (tev-*.yaml) go into the collection. But - as you can see in https://github.com/gammapy/gamma-cat/blob/master/gammacat/collection.py#L275 - it is not used to control it. The datasets which are being written to the collection are searched by going through the info.yaml files.
I think this is not the idea of https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml
Moreover, https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml is not used for the seds and lightcurves, too.

Therefore, my suggestion is the following:
We keep the CollectionMaker as it is, meaning that the list of leds, seds and datasets which go into the collection are produced by scanning through the info.yaml files and the folders, respectively (I don't see a reason why a sed, lc, dataset which exists in the input should not go to the collection).
To control which sed, lc, dataset goes to the catalog, we use https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml for the datasets and we write two new yaml-files gamma_cat_lightcurve.yaml and gamma_cat_sed.yaml.
Another way to implement it whould be to only have one file for the control of the catalog which could be formatted like this (the references are arbitrary, I only want to show the format):

  • source_id: 1
    status: source
    sed: 2013ApJ...764...38A
    lc: 2015ApJ...456..40H
    dataset: 2017A&A..56...40A

from gamma-cat.

cdeil avatar cdeil commented on August 18, 2024

@pdeiml - gamma_cat_dataset.yaml is the configuration file that controls the content of the catalog.

You can see that it's used from catalog.py here:
https://github.com/gammapy/gamma-cat/search?utf8=%E2%9C%93&q=gamma_cat_dataset.yaml+&type=

Of course, a fix for this issue and generally improvements to the catalog making / configuration are very welcome! If you take on this task, please make small pull requests that improve code / docs / content of the catalog that are easy and quick to review for me. (and not a large restructuring / rewrite, together with a fix for this issue)

Since it seems it wasn't clear to you how the catalog is built, starting by improving the contributor docs (and maybe renaming gamma_cat_dataset.yaml to gamma_cat_catalog_config.yaml or something) on that point would probably be the best first step.

For users, the most useful improvement (apart from fixing this issue which SED is filled in the catalog) would be to explain what's in the catalog here: https://gamma-cat.readthedocs.io/data/catalog.html

Again, I would suggest to make small improvements, e.g. just to link to https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_columns.yaml and say that those columns are available, or add a link to https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml and explain that this controls the content of the catalog.

from gamma-cat.

pdeiml avatar pdeiml commented on August 18, 2024

Well, gamma_cat_dataset.yaml is written down in catalog.py but if you take a closer look to the CatalogMaker, you see in https://github.com/gammapy/gamma-cat/blob/master/gammacat/catalog.py#L576 that not the gamma_cat_dataset.yaml in the input folder is used but the one in the output folder which is basically a sum of all output resources.
This means that in fact input/gamma_cat_dataset.yaml is not used by the catalog (I am sure that make.py will break as soon as I delete the file but it is not really used).

But what about my suggestion above with one single file for all types of data and that we copy the datasets by scanning through the folders?

from gamma-cat.

cdeil avatar cdeil commented on August 18, 2024

That sounds like a regression bug then. Initially when I wrote and used this, gamma_cat_dataset.yaml from the input folder was used. Can you start by making a PR fixing this?
In this case, please also update this output file (or all catalog output files) so that we can see what changes in the catalog:
https://github.com/gammapy/gamma-cat/blob/master/output/gammacat.yaml

It would also be good to have some checks, e.g. you could add a test_catalog.py here:
https://github.com/gammapy/gamma-cat/tree/master/gammacat/tests
which reads the catalog and puts some asserts to make sure it's filled OK.
Especially "regression tests", i.e. some information / number that is currently not filled correctly, is a good candidate to add an assert line.

But what about my suggestion above with one single file for all types of data and that we copy the datasets by scanning through the folders?

I don't understand. IMO the way it should work is that only input folder is scanned once to make an input index file. Then the output collection is made, starting with the input index. Then the catalog is made, from the output collection (via the output index), plus the config file what to put in the catalog. I thought it already worked that way; if not, please fix.

@pdeiml - I'm very busy this week, but if there's still questions, I'd prefer a 10 min call over Github back and forth.

from gamma-cat.

pdeiml avatar pdeiml commented on August 18, 2024

I will take a closer look to this but I think that is a regression bug, yes.

I totally agree with you about the procedure at the end of your comment but I think that is not working like that. I will look at this problem as well the next days.

from gamma-cat.

cdeil avatar cdeil commented on August 18, 2024

@pdeiml - Thank you!

from gamma-cat.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.