Comments (13)

gonzaponte commented on July 29, 2024

In general, how do we deal with merging files with repeated event_id? This issue is certainly present when mixing several MC productions, and we still don't have a good solution for it.

How about adding a sub_event_id column? We would need to add some stuff to deal with MC and data separately and transparently, but it sounds feasible.

from ic.

jmalbos commented on July 29, 2024

While dealing with #693 we ran into a problem merging the configuration-table information of several nexus files. It seems to make sense to have configuration information (such as the Geometry/Physics used to generate the files) that is unique for all concatenated files; however, random_seed is per-file information that needs to be saved, and we are not sure what the best way to save it is. Maybe the best option is to have another table that matches event numbers to random seeds?

There may be cases (e.g. mixing of background events from different sources) in which not only the random seed but also other configuration parameters could be different. It may be simpler to copy all the configuration tables, labelling them with a subrun tag (or something similar).
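
A minimal sketch of the subrun-tag idea, assuming each file's configuration is available as a plain dict and its event ids as a list. The function name `merge_with_subrun`, the row layouts and the example values are hypothetical; none of this is existing IC or nexus API.

```python
def merge_with_subrun(files):
    """Merge per-file configurations and event ids, tagging each with a subrun.

    `files` is a list of (config_dict, event_ids) pairs, one per input file.
    Returns (config_rows, event_map):
      config_rows: list of (subrun, param_key, param_value)
      event_map:   list of (event_id, subrun)
    """
    config_rows = []
    event_map = []
    for subrun, (config, event_ids) in enumerate(files):
        # Copy the whole configuration of this file, labelled by its subrun.
        for key, value in config.items():
            config_rows.append((subrun, key, value))
        # Record which subrun every event came from.
        for evt in event_ids:
            event_map.append((evt, subrun))
    return config_rows, event_map


# Made-up example values, for illustration only.
files = [
    ({"random_seed": "12345", "geometry": "geometry_A"}, [0, 1, 2]),
    ({"random_seed": "67890", "geometry": "geometry_A"}, [3, 4]),
]
config_rows, event_map = merge_with_subrun(files)

# Look up the seed used to generate event 4.
subrun_of = dict(event_map)
seed = next(value for subrun, key, value in config_rows
            if subrun == subrun_of[4] and key == "random_seed")
print(seed)
```

The lookup through `dict(event_map)` only works while event ids are unique across the merged files, which is exactly the condition discussed in this thread.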

andLaing commented on July 29, 2024

I think that this is something that needs to be solved in the medium term but I'd propose a staged approach since we really need to get PR #693 completed so that the integration of the detector simulation code isn't delayed too much.

I think we need to come up with a more general solution (please keep suggesting here), but I'd propose a minimal protection in PR #693 so that the configuration info isn't confusing/clashing (still not that simple, really), and that we deal with it in a more complete way in the PRs related to event splitting etc. I think that, in the short term, the favoured production paradigm needs to remain what we've used up until now: processing single files per job.

A patch could be to add the file number/name to the param_keys somewhere in the configuration information and to add a check for overlap when merging the other MC tables.
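
A rough sketch of what such a patch could look like, assuming the configuration is a dict of param_key to value and the event ids of each file are available as a list. The helper names `tag_config` and `check_no_overlap`, the `file/key` naming scheme and the file names are hypothetical.

```python
def tag_config(config, file_name):
    """Prefix every param_key with the file name so merged keys cannot clash."""
    return {f"{file_name}/{key}": value for key, value in config.items()}


def check_no_overlap(existing_ids, new_ids):
    """Raise if any event id in `new_ids` is already present in `existing_ids`."""
    overlap = set(existing_ids) & set(new_ids)
    if overlap:
        raise ValueError(f"repeated event ids: {sorted(overlap)}")


merged_config = {}
merged_ids = set()
# Made-up inputs: (file name, configuration, event ids).
inputs = [
    ("file_0.h5", {"random_seed": "1"}, [0, 1, 2]),
    ("file_1.h5", {"random_seed": "2"}, [3, 4, 5]),
]
for name, config, ids in inputs:
    check_no_overlap(merged_ids, ids)   # refuse to merge clashing events
    merged_config.update(tag_config(config, name))
    merged_ids.update(ids)

print(merged_config)
```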

Thoughts?

jmalbos commented on July 29, 2024

> How about adding a sub_event_id column? We would need to add some stuff to deal with MC and data separately and transparently, but it sounds feasible.

This would handle the event splitting in detsim well, and events from the same MC production (which by construction have different event ids) could be processed with no issue as part of the same run.

A possible problem would be the event mixing from different MC productions, e.g. what was being done for the mixing of different background sources (@msorel, @paolafer: are we going to continue doing this?). Possible solutions include:

  • Mixing only events from a NEXUS production in which we've ensured that the event ids are unique.
  • Mixing the events outside IC, renumbering them as needed.
  • Handling in the processing the unique combination of run id, event id and subevent id. This would also allow merging files from different data runs.
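
The third option can be sketched as indexing by a composite key. This is only an illustration under the assumption that every record carries the three ids; `EventKey`, `index_events` and the example records are hypothetical names and values, not IC code.

```python
from collections import namedtuple

# Composite key: two events are the same only if all three ids match.
EventKey = namedtuple("EventKey", ["run_id", "event_id", "subevent_id"])


def index_events(records):
    """Index records by (run_id, event_id, subevent_id), rejecting duplicates."""
    index = {}
    for run_id, event_id, subevent_id, payload in records:
        key = EventKey(run_id, event_id, subevent_id)
        if key in index:
            raise KeyError(f"duplicate event key: {key}")
        index[key] = payload
    return index


# Made-up records: the same event id appears in two runs and as two subevents.
records = [
    (1, 10, 0, "run 1, event 10"),
    (2, 10, 0, "run 2, same event id"),
    (2, 10, 1, "run 2, second subevent"),
]
index = index_events(records)
print(index[EventKey(2, 10, 1)])
```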

msorel commented on July 29, 2024

> How about adding a sub_event_id column? We would need to add some stuff to deal with MC and data separately and transparently, but it sounds feasible.
>
> This would handle the event splitting in detsim well, and events from the same MC production (which by construction have different event ids) could be processed with no issue as part of the same run.
>
> A possible problem would be the event mixing from different MC productions, e.g. what was being done for the mixing of different background sources (@msorel, @paolafer: are we going to continue doing this?). Possible solutions include:
>
>   • Mixing only events from a NEXUS production in which we've ensured that the event ids are unique.
>   • Mixing the events outside IC, renumbering them as needed.
>   • Handling in the processing the unique combination of run id, event id and subevent id. This would also allow merging files from different data runs.

We will continue mixing MC events, yes. In case it is relevant for this discussion: until now we have mixed events at the nexus level, and went through all other processing steps only for the mixed files and not for the single-source files. We want to change this, allowing for mixing files at different stages of processing. We have not decided yet whether to mix post-irene, post-esmeralda or somewhere else.

Concerning your possible ways to tackle this, Justo: renumbering events would be fine (and could be done outside IC, as is still the case now) if information were not dropped from one processing step to the next, but only added, so that you would never need to go back to a previous processing step because all the information would be available at the end. But this is not the case: we drop information. If we renumber events, we can no longer relate events across different processing steps. So at first thought I would vote against option 2.

Option 1 should work: some sort of script that runs over nexus output directories, and raises a flag if an event_id is repeated? Option 3 too, I guess, but sounds like it requires some more gymnastics? Beyond these three, perhaps there are more elegant ways to deal with this.
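
A possible shape for such a checking script, as a sketch only. How event ids are stored in the nexus output is not assumed here: `read_event_ids` is a caller-supplied, hypothetical function that returns the event ids of one file, and the file names are made up.

```python
from collections import defaultdict


def find_repeated_event_ids(paths, read_event_ids):
    """Return {event_id: [paths...]} for ids that appear more than once.

    `read_event_ids` maps a file path to an iterable of event ids; it stands
    in for whatever reader the real files need.
    """
    seen = defaultdict(list)
    for path in paths:
        for evt in read_event_ids(path):
            seen[evt].append(path)
    return {evt: where for evt, where in seen.items() if len(where) > 1}


# Stand-in for real files: a dict from file name to its event ids.
fake_files = {
    "bkg_source_1.h5": [0, 1, 2],
    "bkg_source_2.h5": [2, 3],
}
repeated = find_repeated_event_ids(fake_files, fake_files.__getitem__)
if repeated:
    print("repeated event ids found:", repeated)
```

Returning the offending files along with each id makes it easier to see which productions clash, rather than only raising a flag.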

paolafer commented on July 29, 2024

>   • Mixing only events from a NEXUS production in which we've ensured that the event ids are unique.

This is what we're doing now: we're doing the gymnastics of not having repeated event IDs across the full background production, is that right, @msorel?

About the configuration table when merging files, I agree with Justo that there may be more parameters that differ from one file to another, and we may not be able to foresee them now. I understand that if we saved the configuration table of each file, we would need a column somewhere that relates a specific event ID to its correct table. Maybe it would be useful to add another table to the file that holds all the information we need event by event. This table would also be useful to simplify the reading of event IDs in the MC readers (see #693).

andLaing commented on July 29, 2024

I was thinking about the implementation necessary for the long event splitting and I kept hitting mental or physical blocks. The idea to put a sub-event number made sense but I hit a problem when I got to the output of irene where the pmaps are indexed according to event number. Without adding quite a lot of complexity I couldn't think of a way to get the output non-repeating.

The only other, suboptimal, idea I had was to make the event number a float (either for pmaps and above or in general) and make the sub-event be the first decimal.

gonzaponte commented on July 29, 2024

> when I got to the output of irene where the pmaps are indexed according to event number

What should Irene do: merge subevents into a single event or store each subevent separately?

> Without adding quite a lot of complexity I couldn't think of a way to get the output non-repeating.

If Irene merges subevents into a single one this complexity should go away, I think.

> The only other, suboptimal, idea I had was to make the event number a float (either for pmaps and above or in general) and make the sub-event be the first decimal.

I also thought of that. It is not terrible, but I agree it is not optimal. I don't know whether we could also run into precision problems...
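
To make the precision concern concrete, here is a small sketch of the float scheme using only the Python standard library. The `encode`/`decode` helpers are hypothetical, and the 32-bit case is included only as an assumption about how such a column might be stored.

```python
import struct


def encode(event_id, subevent_id):
    """Pack the sub-event as the first decimal of a float event number."""
    if not 0 <= subevent_id <= 9:
        raise ValueError("only one decimal digit is available")
    return event_id + subevent_id / 10


def decode(number):
    """Recover (event_id, subevent_id) from the float.

    Rounding is needed because tenths are not exactly representable in
    binary floating point.
    """
    scaled = round(number * 10)
    return scaled // 10, scaled % 10


def as_float32(x):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


x = encode(123456789, 3)
print(decode(x))              # decoded from a 64-bit float
print(decode(as_float32(x)))  # decoded after storage as a 32-bit float
```

With 64-bit floats the pair survives the round trip for this value, but a 32-bit float cannot represent every integer above 2**24 exactly, so the second call does not return the original pair. The scheme is also limited to ten sub-events per event.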

andLaing commented on July 29, 2024

> What should Irene do: merge subevents into a single event or store each subevent separately?

No, nexus simulates all the activity coming from, for example, a muon and records the times at which energy was deposited or sensors recorded photons. In detsim we want to be able to recognise events which would be two or more triggers in the detector and split them accordingly into subevents. These subevents need to be treated as independent entities by the processing, as would be the case in data. We run into difficulties with indexing, though.
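
As an illustration of the splitting described here, the sketch below groups hit times into subevents with a fixed time window. The real detsim trigger and buffer logic is not described in this thread, so the fixed-window criterion, the function name `split_into_subevents` and the example times are all assumptions.

```python
def split_into_subevents(times, window):
    """Group hit times so that each group spans at most `window`.

    A new subevent starts whenever a hit falls outside the window opened by
    the first hit of the current subevent.
    """
    subevents = []
    for t in sorted(times):
        if subevents and t - subevents[-1][0] <= window:
            subevents[-1].append(t)   # still inside the current window
        else:
            subevents.append([t])     # open a new subevent
    return subevents


# Made-up times in arbitrary units: prompt activity, then delayed activity.
hit_times = [0.0, 0.5, 1.2, 500.0, 500.3, 2000.0]
subevents = split_into_subevents(hit_times, window=10.0)
for subevt_number, hits in enumerate(subevents):
    print(subevt_number, hits)
```

Each group would then need its own identifier downstream, which is where the indexing difficulty comes from.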

andLaing commented on July 29, 2024

I started to have a look at a version of IC that could read events with (evt_number, subevt_number). It's a bit fiddly but it might be an ok starting point. Have a look if you can: https://github.com/andLaing/IC/tree/new-run-table

andLaing commented on July 29, 2024

Hi everyone. I recently came back to thinking about this issue as I'm starting to hit some walls (semi)related to this in the analysis of cosmogenic backgrounds. The attempt I made to solve the problem (in the previous comment) involved a lot of changes to IC and was quite fiddly. I thought about some possible alternatives; it'd be good to have some comments on them, or other suggestions which could be better. The two possible alternatives I came up with yesterday were:

  1. Keep the nexus event number and add a subevent number (basically what I tried above).
     • Pros: Implementation in detsim/bufferization is simple.
     • Cons: Doesn't necessarily solve all possible issues with merging files; requires a lot of underlying changes to IC.

  2. Generate a new event number in detsim/bufferization, structured, for example, as 'run code'0'file number'0'generator code'0'nexus event number'. The run and generator codes could come from some convention or an enum (for generator) or even an IC tag number or production date.
     • Pros: Basically all the complexity is taken by detsim/bufferization; with a bit of work it should be close to unique.
     • Cons: Probably requires changing the MC tables event number in detsim/bufferization (event_mapping table could link backwards?); depending on what structure is chosen for the number, it could still lead to possible overlap.

I currently favour option 2, but that could just be because it's newer, and it should cause fewer upstream issues. Please comment @mmkekic, @paolafer, @jmalbos, @gonzaponte, @jjgomezcadenas and all.

msorel commented on July 29, 2024

Hi @andLaing, without having thought too hard about it, it also looks to me like option 2 is better, so that code downstream is untouched. Basically, anything that falls outside the event time window defined in detsim/bufferization gets a new event id.

I am not sure I understand what you mean by "structured as 000", though. Can you explain?

andLaing commented on July 29, 2024

Sorry, the example didn't render; I've fixed it in the original comment.
