documents's Issues

Licensing TN: finalize contents

Before making the Licensing TN final, a few improvements to the contents are needed:

  • Add a reference to the LLVM discussion on the difficulty of changing licenses and on the specific constraints of collaborating with commercial contributors
  • Add non-CERN examples
  • Add a reference to NSF and DOE policy (see Liz's email from Oct. 13)

Rediscuss organization of this directory

Issue dedicated to a topic started by @davidlange6 in #105:

how are people cloning this documents repository? The entire thing just to get one of the N documents?
Eventually we will all care when there are enough papers there. Should think how to organize for the future.

The repository is currently small (75 MB), but this could become a problem if we continue to add PDFs and other binary files to it. Should we move these files to Git LFS (LFS is a standard Git extension these days, if I'm right)?
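
If we do go that way, tracking new files would look roughly like this (a sketch only; it assumes Git LFS is installed alongside Git, and the PDF path is just a placeholder):

    git lfs install                      # one-time setup in each clone
    git lfs track "*.pdf"                # records the pattern in .gitattributes
    git add .gitattributes papers/example.pdf
    git commit -m "Store PDFs via Git LFS"

Files already committed would stay in the history unless it is rewritten (e.g. with git lfs migrate), which would force everyone to re-clone.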

Author processing script adding \bigskip

@jouvin I see that with your last changes to a2tex.py you added a \bigskip to the output between each line of the institutions. This adds unnecessary vertical space and makes the institution list grow from 3.5 pages to 6.
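
To illustrate the difference (a hypothetical Python sketch, not the actual a2tex.py code or variable names):

    # Hypothetical sketch: joining institution entries with and without \bigskip
    institutions = ["CERN, Geneva, Switzerland",
                    "Fermi National Accelerator Laboratory, Batavia, IL, USA"]
    compact = "\n".join(institutions)              # one entry per line: ~3.5 pages
    spaced  = "\n\\bigskip\n".join(institutions)   # \bigskip between entries: ~6 pages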

Was there a good reason for this or can we revert?

P.S. I reviewed this, so I am also to blame for not noticing!

Technical note on software lacks LaTeX source

The published version of the technical note on software project best practices is evidently generated from a LaTeX version of the document. However, all that is in GitHub is the original Markdown.

Can either @hegner or @drbenmorgan remember how we made this change, or where the LaTeX is?

I would propose that, after accepting the recent change in #148, we simply convert the master document to LaTeX (with pandoc) and consider the Markdown obsolete.

What do you think?
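
For reference, the conversion step could be as simple as something like this (an untested sketch; the real file names in the repo will differ):

    pandoc -f markdown -t latex --standalone project-best-practices.md -o project-best-practices.tex

Without --standalone, pandoc emits only a body fragment, which might actually be preferable if the note should reuse a common preamble.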

Some issues with JHEP output

I found that the JHEP author output made a few mistakes: there was no separator between the authors' names, and the order was wrong (it should be "Forename Surname"). Getting the commas and the final "and Author X" correct is quite fiddly.

There is a fix for that in #72.
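
The fiddly part is roughly the following join logic (an illustrative sketch only, not the actual code from #72):

    # Illustrative sketch of JHEP-style author formatting: "Forename Surname",
    # comma-separated, with "and" before the last author.
    def jhep_author_list(authors):
        names = ["%s %s" % (a["forename"], a["surname"]) for a in authors]
        if len(names) == 1:
            return names[0]
        return ", ".join(names[:-1]) + " and " + names[-1]

    # e.g. three authors render as "Ada Lovelace, Grace Hopper and Emmy Noether"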

One other issue, though, is that no footnotes are printed in the JHEP-styled output. Most papers will want at least to identify the editors, and other authors will want their grants acknowledged.

Potential author list updates for CWP doc

We asked for, and received, some feedback when checking the reco-trigger author list; these changes might be of interest for the full CWP list as well.

EPFL: Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

ETHZurich: ETH Zürich - Institute for Particle Physics and Astrophysics (IPA), Zürich, Switzerland

LMU: Fakultät für Physik, Ludwig-Maximilians-Universität München, München, Germany

MIT: Laboratory for Nuclear Science, Massachusetts Institute of Technology, Cambridge, MA, USA
(MIT is different from UMass)

At least some Nikhef authors are affiliated with the Vrije Universiteit, not the University of Amsterdam (YMMV here):
NIKHEF: Nikhef National Institute for Subatomic Physics and Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

UZurich: Physik-Institut, Universität Zürich, Zürich, Switzerland

Licensing TN: better address the license compatibility issues

In the first version of the TN, there is no explicit paragraph on license compatibility issues. @valassi provided a first draft of such a paragraph in #8. @sextonkennedy also suggested in a comment on #4 that we add a reference to
https://en.wikipedia.org/wiki/GPL_linking_exception
as this clause is relevant to our case, since many of us use frameworks in which dynamic loading of libraries is a core strategy for assembling our large applications.

This TN revision should also address @valassi's comment in #4 (depending on whether #8 is merged or not).

Licensing TN: Copyright by the employing organization?

I read the note on licensing and found it helpful. However, one case is not quite clear to me, and perhaps it could be clarified in the "Recommendations" section. I have in mind the common situation where a software developer is employed by a non-profit organization, such as a national lab or a university, and funded by a government agency like DOE. In such a case, should the organization and/or the agency be mentioned in the copyright notice as an owner, along with the actual contributing authors? Is there a common rule for this, or should it be decided on a case-by-case basis? I would be interested to know how this applies in the US.

CWP version for CSBS journal: comments from reviewers

These are the comments received by email on Sept. 5, turned into a task list for easier tracking. Be sure to update the Google Docs-based draft answer to the CSBS reviewers appropriately before marking a task done.

Introduction

  • line 141: I suggest exchanging the word "contributors" for "stakeholders"; I would also suggest starting a new bullet point before "by the effective"
  • line 201: please provide a reference for the "20 million lines of code"
  • line 296: please provide a reference for "AArch64 may achieve lower power costs..."
  • line 299: please provide a reference or a more specific name/example for "More extreme is an architecture that would see specialised..."
  • line 377: please provide examples incl. references for "which developed novel and major new technologies..."

Section 3.2:

  • line 684: The first action item for detector simulation seems out of place. While it is desirable to extend the validity of the physics modeling towards the FCC, this is not really a computing issue, nor does it impact the speed of the simulations that will be necessary for the HL-LHC, except insofar as making the modeling more accurate results in slower code. The section on current practices is a bit better targeted towards improving accuracy AND efficiency. My suggestion is to at least mention software performance as a goal in this bullet.

  • lines 715-727: it is striking that this section (and section 3.1) discusses human resources, while the following sections do not. It might be more impactful to isolate all human resource discussions to section 4.

Section 3.3:

  • line 1103: remove the double "the"
  • lines 1212-1219 + 1221: in lines 1212-1219 you speak of algorithms etc., but in 1221 you talk about educating physicists in modern coding practices. NO, these are two different things: algorithms and coding practice are like theoretical and experimental particle physics; both are connected, but they require different educational profiles. Concerning algorithms, I advise that HEP collaborate more strongly with computer science, whose practitioners are the experts in algorithms.
  • The last piece of the R&D program, l. 1320, is not very specific. It would be useful to sharpen up the deliverable here, rather than using an "e.g. charged particle tracking" and "a number of such efforts"

Section 3.4:

  • “Scope and Challenges”: I’m sure that HEP Data Management also adheres to the FAIR principles. This is a much-used buzzword, and not using it here might raise the question of why. Hence, you should make a conscious decision whether or not to mention the FAIR principles.

  • line 1334: “quasi-real time”; perhaps better: “near real-time”

  • The first piece of the R&D programme, "enable ... to be plugged in dynamically", could do with some more specificity; the bullet is very general. Is it in conflict with the third-to-last bullet, about interacting and exchanging data?

  • line 1371: provide examples (incl. references) for "...emergence of new analysis tools coming from industry and open source projects..."

  • line 1386 ff.: This is also called “provenance”. You might want to use this term here.

  • lines 1476ff.: please provide references for each given example technology

  • line 1569: I would have expected more milestones, e.g. concerning uptake (or at least an evaluation) of the software mentioned above, or integrating/interfacing it with existing environments

Section 3.5 (Machine Learning):

  • line 1636: “… most CPU intensive elements …”; GPUs are apparently well suited for at least some of the ML algorithms. You might want to mention them here.
  • line 1710: “HSF IML”; for better readability, you might want to write “HEP Software Foundation IML” here.

Section 3.7:

  • line 2150: please provide a reference for "two orders of magnitude..."
  • lines 2153-2161: it would be good to have some "test projects" for SDNs in HEP, but nothing could be found later in the R&D programme from line 2242 ff.
  • lines 2238-2241: please update the numbers and the reference, which is from 2017
  • lines 2257 + 2301: will these working groups be part of the HSF or hosted somewhere else? In any case, state something.

Section 3.8:

  • line 2405: It is striking that this section adds a 2018 programme. Others only discuss 2020 and 2022. Is this an intentional difference?

Section 3.10:

  • lines 2746-2780: the whole section should be improved. The challenge for the R&D is not clear; it seems to be a minor activity compared to the other subsections of section 3.
  • line 2762: the text here is vague, e.g. "the wg will also work towards a more convenient access... through a client-server interface", "...a service to deliver streamed event data would be designed". More details would be nice.
  • line 2774: please provide a reference/link for the mentioned workshop

Section 3.11:

  • In contrast to previous sections, this R&D section does not have dates, only short-term and long-term.
  • line 2861: is it git or GitHub? Is there an effect of the recent purchase by Microsoft? Please elaborate.

Section 3.12:

  • line 3291: please provide examples and/or a reference for "next-generation identity federations". There are some out there, e.g. bwIDM.

Section 3.13 (Security)

  • Line 3171: add data privacy. Data privacy: legal questions, as in unauthorised access to personal data. Data protection: avoiding unauthorised access.
  • line 3286: “… attributes published by each federation …”: Federations do not provide attributes. And attributes are not necessarily linked to authentication.
  • line 3298: “Although federated identity provides…”: X.509 is also a federated ID solution.
  • lines 3322+3333: please also give the long name for WISE and FIM4R
  • line 3333: CERN is not the only partner in AARC; major Tier-1s also participate, e.g. KIT and Nikhef. Please reformulate/add this.

Section 4:

  • line 3432: what does "promoting specific champions in the field" mean here in detail? Please provide examples. How can they be promoted?

Section 4.1:

  • line 3447: please provide examples (plus a reference) from science where "users express their requirements and computer specialists implement solutions"; otherwise this claim comes out of nowhere and the argument cannot be used.

Section 4.2 and 4.3:

  • These are rather weak sections compared to the importance of the topic. After reading sections 3.x, I would have expected to also see a work programme and milestones, but nothing is here, which is a pity. You could generate ideas, e.g. establishing working groups on this topic, organising workshops, etc., but nothing is mentioned. Not good.

These detailed comments were preceded by the following general remarks:

This document provides the High Energy Physics community with a clear picture of the current software and computing practices, established over the last decade in order to exploit the data produced by the four detectors installed at the Large Hadron Collider at CERN. It also, and this is its main objective, proposes a program of work (PoW) in various identified areas that must be carried out in order to successfully process the data that the upgraded LHC (HL-LHC) will produce less than 10 years from now.

Reviewing the document was made more difficult due to the poor formatting of the manuscript. Therefore, some of the line numbers mentioned below might be wrong and we encourage the authors to seek a solution for the formatting problems before resubmission.

In addition, it seems that many or most of the references (judging only by the reference names) come from within the HEP community. A broader view should be taken, i.e. references from outside the HEP community should be considered.

The proposed program of work appears to be exhaustive from a technology point of view, at least given the HEP community's current knowledge of the evolution of hardware and software that may occur over this period.


Some general points were mentioned in the referee reports. These might be addressed in the Introduction and/or Conclusion (or Section 2?):

Closer attention must be paid to:

⁃ carefully taking into account the consequences for the computing facilities (Tiers 0 to 1) of any particular solution arising from the PoW that may lead to extra cost. For instance, HPC or GP/GPU facilities, even if they provide more CPU-efficient ways to do specific processing, are much more expensive than traditional HTC farms;

⁃ the extra cost that may be generated by hybrid computing, especially if it is built on commercial resources;

⁃ not producing solutions that may lead to the setup of dedicated hardware that cannot then be shared with other experiments that are nonetheless supported by the same funding agencies as the HL-LHC.

  • Tier-1s must be part of this work at the earliest possible stage.



Even though it is well understood that this document focuses on HEP computing challenges, the proposed program does not make any reference to the European or worldwide research computing context. It is mandatory that all of this work take into account the evolution of the e-infrastructures it relies on. This may require some close interactions with bodies like EGI, OSG, EUDAT, PRACE, NRENs… This "collaborative" work must be more clearly identified within the relevant PoW, if not within a dedicated one.

Finally, the work package on Security and Authentication and Authorization Infrastructure proposes a summary of the most important actions expected to be achieved. As these issues go beyond the HEP community and depend on the various national security policies, it is of prime importance that this work package be handled at the very first stage of the proposed roadmap.
