
etmc / tmlqcd

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. This is mainly a HMC implementation (including PHMC and RHMC) for Wilson, Wilson Clover and Wilson twisted mass fermions and inverter for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.

Home Page: http://www.itkp.uni-bonn.de/~urbach/software.html

License: GNU General Public License v3.0

C 90.33% C++ 3.54% Python 0.28% Makefile 0.87% Shell 0.14% Perl 0.15% Lex 3.01% Pawn 0.49% Assembly 0.82% R 0.30% POV-Ray SDL 0.08%
hmc lqcd multigrid quda solver ddalphaamg qphix clover rhmc twisted

tmlqcd's People

Contributors

amabdelrehim, aniketsen, finkenrath, florian-burger, grodid, karljansen, kostrzewa, m-schroeck, marcogarofalo, martin-ueding, opene, palao, pittlerf, sbacchio, siebren, simone-romiti, sunpho84, tschew, urbach, uwenger

tmlqcd's Issues

Remove configure script from repository

I have tentatively removed the configure script from the repository in my unit testing branch. A change in configure.in results in thousands of lines of changes in configure. While the configure script should remain in the tarball distribution I don't think a git repository is the right place for it. Are there any objections to this?

define debug levels

Currently we do have the DebugLevel option, but it was never written down which type of message we want to have at which debug level. Siebren, you thought about it already, didn't you? So maybe we can have some sort of list defining this, but right now I don't have a very good idea how!
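
One way to make such a list concrete is an enumeration of levels plus a single gatekeeping function. The sketch below is only an illustration: the level names, `debug_enabled` and `debug_printf` are hypothetical, the variables `g_debug_level` and `g_proc_id` are local stand-ins for the corresponding globals, and printing on process 0 only is an assumption rather than an agreed convention.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical level names; which message belongs to which level is
   exactly what this issue asks to be written down. */
enum debug_level {
  DEBUG_QUIET   = 0, /* errors only */
  DEBUG_SUMMARY = 1, /* one line per trajectory or inversion */
  DEBUG_SOLVER  = 2, /* solver residues per iteration */
  DEBUG_TRACE   = 3  /* function entry/exit, I/O details */
};

/* Stand-ins for the values set from the input file and by MPI. */
static int g_debug_level = DEBUG_SUMMARY;
static int g_proc_id = 0;

/* Returns 1 if a message at `level` should be printed on this process. */
static int debug_enabled(int level) {
  return g_proc_id == 0 && level <= g_debug_level;
}

/* printf-style wrapper that prints only when the level is enabled.
   Returns the number of characters printed, or 0 if suppressed. */
static int debug_printf(int level, const char *fmt, ...) {
  va_list ap;
  int n;
  if (!debug_enabled(level)) return 0;
  va_start(ap, fmt);
  n = vprintf(fmt, ap);
  va_end(ap);
  return n;
}
```

With a wrapper like this, a call such as `debug_printf(DEBUG_SOLVER, "# iteration %d\n", iter)` replaces the usual pair of checks on the debug level and the process id.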

LEMON writer fails

This is using LEMON from the git repository.

# Trajectory is accepted.
# Writing gauge field to .conf.tmp.
# Constructing LEMON writer for file .conf.tmp for append = 0
[LEMON] Node 0 reports in lemonWriteLatticeParallel:
    Could not write the required amount of data.
[LEMON] Node 1 reports in lemonWriteLatticeParallel:
    Could not write the required amount of data.
LEMON write error occurred with status = -5, while writing in gauge_write_binary.c!
[LEMON] Node 1 reports in lemonWriteRecordHeader:
    Writer not ready for header.
KILL_WITH_ERROR on node 1: Header writing error. Aborting
LEMON write error occurred with status = -5, while writing in gauge_write_binary.c!
[LEMON] Node 0 reports in lemonWriteRecordHeader:
    Writer not ready for header.
KILL_WITH_ERROR on node 0: Header writing error. Aborting
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 10128 on
node artemis exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[artemis:10126] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[artemis:10126] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
zsh: exit 1     mpirun -np 2 ./hmc_tm -f ../sample-input/sample-hmc0.input

Tag 5-1-6 is missing

The tmLQCD version currently describes itself as 5.1.6, yet there is no tag for this version in the repository. There should be a version tagged as 5.1.6, and we are currently developing 5.1.7, 5.2 or 6.

util/io.c contains gauge read functionality, can that be dropped

Inside util/io.c there exists a function

int read_lime_gauge_field_doubleprec(double * config, char * filename,
const int T, const int LX, const int LY, const int LZ) {

and the function:

int read_lime_gauge_field_singleprec(float * config, char * filename,
const int T, const int LX, const int LY, const int LZ){

I suspect both are to be removed, because this functionality can be found in the io/ directory. So unless I hear protests, I will do so.

Why .NOTPARALLEL: in Makefile.in?

Is there a reason why we have .NOTPARALLEL in the Makefile? It really slows down the compilation, especially on machines with many cores where one could potentially compile 6 to 8 modules in parallel.

Definition of ALIGN and ALIGN_BASE

These preprocessor defines are currently done in sse.h, if some level of explicit SSE optimization is requested. This means, however, that any code that declares memory for potential use with SSE routines needs to include some #ifdef checks. Those could be skipped if these defines were done in some central location and were simply set to 0 if no SSE optimization was requested.

A corollary to this is the special case of wanting alignment without wanting to use the manual SSE routines. With increasing compiler sophistication, there are indications that automatically generated SSE code is starting to outperform our current implementations. While this is in fact quite awesome, it does make a case for allowing alignment, which is still needed for optimal performance, without necessarily activating the SSE routines themselves. This is currently impossible.

One possible location for alignment definitions is global.h. If we want alignment to be a separate option, maybe an inclusion in the configuration script and config.h might be more appropriate.
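
A minimal sketch of what such a central definition could look like follows. `WANT_ALIGNMENT` is a hypothetical configure switch, the values `0x0f` and 16 bytes are assumptions, and `alloc_aligned` is an illustrative helper rather than an existing routine. The point is that code using the macros needs no `#ifdef` of its own, because they degrade to a no-op when alignment is not requested.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical central definitions: WANT_ALIGNMENT is an assumed
   configure switch, independent of the hand-written SSE routines. */
#if defined(WANT_ALIGNMENT) && defined(__GNUC__)
#  define ALIGN_BASE 0x0f
#  define ALIGN __attribute__((aligned(16)))
#else
#  define ALIGN_BASE 0x00
#  define ALIGN
#endif

/* Allocates n bytes plus padding and returns a pointer rounded up to
   the next (ALIGN_BASE + 1)-byte boundary. *raw receives the pointer
   that must later be passed to free(). With ALIGN_BASE == 0 the
   returned pointer equals *raw. */
static void *alloc_aligned(size_t n, void **raw) {
  *raw = malloc(n + ALIGN_BASE);
  if (*raw == NULL) return NULL;
  return (void *)(((uintptr_t)*raw + ALIGN_BASE) & ~(uintptr_t)ALIGN_BASE);
}
```

Static data would be declared as, for example, `static double buffer[24] ALIGN;`, which compiles unchanged in both configurations.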

commit 9cec2dfa has timestamp in the future

I don't know what happened there and whether it will continue to be a problem... It seems like commit 9cec2df was timestamped with Sat Jan 21 03:15:17 2012 +0000, I don't know why or how this happened. I will close this issue if it goes away on its own.

rebase / merge discussion

I wanted to discuss this outside of Carsten's pull request that created the flurry of comments (mostly by me, sorry about the number!)

This blog post describes succinctly the difference between merge and rebase in day to day git usage. Now as long as work is not shared, rebase certainly keeps the noise down in the repository because there will only be one merge message, that of the pull request once it is merged into the master branch in etmc/tmLQCD.

On the other hand, those extra commits preserve some handy information about the development process which can be seen in the network graph if you compare my three rebased branches (read_input, cgmms_input and urbachFixAutomaticTSDetect) to a branch like Albert's c99_complex or Carsten's AutomaticTSDetect. In the rebased branches it looks as though they have been split off the etmc master branch today (24.01.12) even though in truth they have been split off a week ago.

What are your opinions on merge and rebase? Personally I am quite happy to drop the extra temporal information given by the merge messages for a more streamlined commit history.

configure is buggy

The current configure version does not work on all platforms. On my Ubuntu 10.4 I have to run autoconf, but I have not understood the problem yet. It seems the very same version works on other machines, e.g. on jugene.

CGMMS source+propagator format struct confusion

operator.c defines the following structs for use

paramsSourceFormat *sourceFormat = NULL;
paramsPropagatorFormat *propagatorFormat = NULL;
paramsInverterInfo *inverterInfo = NULL;

io/params.h declares
extern paramsGaugeInfo GaugeInfo;
extern paramsPropInfo PropInfo;
extern paramsSourceInfo SourceInfo;

These structs are fairly similar, and their mixing is partly the source of issue #29.
Some cleaning up, or at least a clearly defined overarching idea, would help here.

P_M_eta.c cleanup

P_M_eta.c needs several cleanups:

  • Multiple functions need their own files
  • printf commands need to be dependent on debug level and processor id
  • Check_Approximation is probably not compiled now (not used), and is also broken in current form
  • comments and indenting are in different formats throughout the code, make it uniform

Some solvers try to read a source even when ReadSource = no

BiCGstab, GCR and maybe also MR try to read a source, or end up in utils_parse_propagator_type.c, when the ReadSource parameter is set to no. This should not happen. Either:

  • If a source must be present, and ReadSource = no, an error should be returned.
  • If a source should not be read, utils_parse_propagator_type should not be entered at all

Schroedinger Functional cleanup

The Schroedinger functional code that currently exists inside the tmLQCD package has several issues.

  • It is not included in existing code very well
    a) For example, sf_get_staples.c replicates the code for get_staples.c 8 times, with only very minor changes (copy the same 13 lines over and over again, modifying only a single line). This is really bad, because the code becomes hard to read (if clause is far separated from its actual effect), and very hard to maintain. What if a bug is found in one of these if clauses, it needs to be fixed everywhere, but where is everywhere?
    b) sf_get_rectangle_staples.c does the same to get_rectangle_staples.c, but then on a whole different scale, one that is SO bad that the compilation time of the ENTIRE tmLQCD package noticeably increases due to this single file. Also here a proper if construction would solve everything in a clean way, and the same story about bug fixing.
    c) Inside hmc_tm, there are a few places where the SF code gets called. Also here a lot of replication of code is present, particularly for the output. Similar things are true for update_tm.c.
  • Debug output appears to be still present.
    a) There are instances of #if 1 or #if 0 preprocessor directives (sf_gauge_monomial.c, sf_observables.c)
    b) printf ("hola"); in sf_gauge_monomial.c
  • Many functions in the same file, mainly in sf_calc_action.c
  • Output not prepared for parallel running, so no preparation for many cores repeating the same statement
  • Inefficiencies in recomputing the same value
    For example, inside hmc_tm.c
    if(g_proc_id==0){
    fprintf(parameterfile,"# First plaquette value for SF: %14.12f \n", plaquette_energy/(6.*VOLUME*g_nproc));
    printf("# First plaquette value for SF: %14.12f \n", plaquette_energy/(6.*VOLUME*g_nproc));
    fprintf(parameterfile,"# First rectangle value for SF: %14.12f \n", rectangle_energy/(12.*VOLUME*g_nproc));
    printf("# First rectangle value for SF: %14.12f \n", rectangle_energy/(12.*VOLUME*g_nproc));
    }
    calls both functions twice without first storing intermediate result. Other examples exist.
  • No tests are supplied, and the sample input file currently does not work. And even when fixed (online measurement error) it is hard to see if the code is working, as all trajectories from the test get rejected (tested the first 250, not going to wait more).

Unless someone steps up claiming to currently use the SF code, and provides usable test cases, applying fixes for these issues is likely to introduce undetected bugs, and unlikely to lead to any clear benefit.
Since the code will exist in the repository anyway, I'd propose to just remove it for now, especially since that does have at least some benefits: faster compilation, smaller files, and cleaner code.
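
Should the code be kept after all, the recomputation listed above could be avoided with a small helper that evaluates the normalised value once and writes the same line to both destinations. This is only a sketch: `report_value` is a hypothetical name, and the format string is copied from the hmc_tm.c snippet quoted in the issue.

```c
#include <stdio.h>

/* Writes the same formatted line to the parameter file and to a second
   stream (typically stdout), so the value is computed and formatted in
   one place only. Returns the normalised value for further use. */
static double report_value(FILE *parameterfile, FILE *out, const char *label,
                           double energy, double norm) {
  const double value = energy / norm; /* computed once */
  fprintf(parameterfile, "# First %s value for SF: %14.12f \n", label, value);
  fprintf(out, "# First %s value for SF: %14.12f \n", label, value);
  return value;
}
```

Inside the `if(g_proc_id==0)` block, the four print statements would then reduce to two calls, one with the label "plaquette" and one with "rectangle", each passing the respective energy and its normalisation factor.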

gauge_input_filename is too short!

This is quite a dangerous problem which goes back to read_input. When your GaugeInputFilename is longer than 100 characters, read_input WILL write into unspecified memory (strcpy!!) as the length of gauge_input_filename is hardcoded to be 100 characters. This is also true for the ranlux input filename.
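
A bounded copy that reports failure instead of overflowing could look like the sketch below. The names `copy_filename` and `GAUGE_FILENAME_LEN` are hypothetical; only the length of 100 characters is taken from the description above.

```c
#include <stdio.h>
#include <string.h>

#define GAUGE_FILENAME_LEN 100 /* mirrors the hardcoded length in the issue */

/* Copies src into dst (capacity dst_size) without writing past the end.
   Returns 0 on success and -1 if src does not fit, so the caller can
   abort with an error rather than continue with a truncated name. */
static int copy_filename(char *dst, size_t dst_size, const char *src) {
  int n = snprintf(dst, dst_size, "%s", src);
  if (n < 0 || (size_t)n >= dst_size) return -1;
  return 0;
}
```

Returning an error rather than silently truncating is a deliberate choice here: a truncated file name could point at a different, existing configuration.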

Add in CGMMSEO solver (Xining Du)

Xining Du has done some work to the CGMMS solver to make it compatible with EO preconditioning:

"Basically I was using the CG solver with evenodd preconditioning, and I did some modifications (I call it CGMMSEO) to let it solve multiple masses one by one. The reason I did this was that I found it is faster than the CGMMS solver without evenodd preconditioning."
"The code I have been using is the tmLQCD version 5.1.6. However, I did some modifications on top of it, mainly reorganizing the CGeo solvers for multiple masses in a single solver. The basic solver routines are not changed."

It would be good to merge these changes back into the general code.

Hosting of related code

There is some code related to tmLQCD, the repositories of which are currently hosted elsewhere. The obvious example is Lemon, which is even an optional dependency of tmLQCD. I'd say it makes sense to move (or at least mirror) these repositories here as well. It makes for a convenient one-stop process for those who just want to use the code. As for development, there are equally good reasons for moving away from SVN in these cases as there were for tmLQCD itself.

The only significant downside I can think of would be that the current addresses may be provided in publications and presentations. We could fairly easily take care of this by providing tarballs there and/or merging patches back into public SVN repositories. In the case of Lemon, we should actually still be able to update the address.

Are there any objections that I am missing?

DUM_SOLVER needs removal

DUM_SOLVER has been removed almost completely from the solver subdirectory, but checks are still needed.
The same must be done for bispinors in solver; after that it can also be removed from hmc_tm.
In invert it is no longer needed.

square_norm question

I don't really understand what's going on in the linalg/square_norm function (at least the unoptimized one). It seems to me like there are four empty operations taking place.

At the third iteration of the loop we have:

tr = ds_3
ts = ds_1 + ds_2 + ds_3
tt = ds_3
ks = ds_1 + ds_2 + ds_3
kc = 0

Since all the variables are overwritten from one call to the next I presume they were declared static for performance purposes rather than data persistence, so I don't really know why the additions and subtractions are carried out and then discarded.
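
The pattern of variables listed above (`tr`, `ts`, `tt`, `ks`, `kc`) matches Kahan compensated summation. If that is what the routine intends, which is an assumption here, the operations are empty only in exact arithmetic: in floating point, `kc` collects the rounding error of each addition and feeds it back into the next one. The sketch below reuses the variable names on a plain array of doubles; `kahan_sum` is a hypothetical name, not the tmLQCD function.

```c
#include <stddef.h>

/* Kahan compensated summation over an array of doubles.
   ks is the running sum, kc the running compensation, i.e. the part
   of earlier additions that was lost to rounding. */
static double kahan_sum(const double *x, size_t n) {
  double ks = 0.0, kc = 0.0;
  double tr, ts, tt;
  size_t i;
  for (i = 0; i < n; i++) {
    tr = x[i] + kc; /* add back what was lost so far */
    ts = tr + ks;   /* new sum, rounded */
    tt = ts - ks;   /* the part of tr that actually arrived in the sum */
    ks = ts;
    kc = tr - tt;   /* zero in exact arithmetic, rounding error otherwise */
  }
  return ks + kc;
}
```

The compensation only survives if the compiler does not reassociate floating-point operations, so options such as `-ffast-math` would defeat it.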

Configuration file parser debugging is not switchable in inverter

It seems like the debugging information of the configuration file parser (which depends on g_proc_id and verbose through myverbose) cannot be switched on and off without changing invert.c. Is everyone OK with me adding a -v flag to the inverter that switches verbose on? If not, why not?

I'm working on reading the CGMMS masses directly from the configuration file and need to see if my debugging messages are correct.

phase_N <-> kaN "redundancy"

There is a certain redundancy from boundary.[c,h] in D_psi.c now that c99 complex is being integrated, but I don't know how much of a performance impact, if any, it would have to simply replace instances of phase_N by -kaN (where N = {0,1,2,3}). Any ideas?

Improved version reporting

For debugging purposes it is often necessary to find out what the exact version of a compiled executable is. We have version numbers such as 5.1.5, 5.1.6, but these encompass quite a few SVN revision numbers. SVN provides some revision number information through use of the keywords $Id$, but these only refer to the last change of a file. So, if the backend of the IO changes, but the interface inside hmc_tm.c does not change, there is no way to know this from within hmc_tm.c, and therefore $Id$ will still report an older version number, even though the newer version of the IO backend is already compiled in. There is no way inside SVN to get global revision numbers; outside tools such as svnversion are needed for this. It would be useful to have a workaround that gives more detailed version information.

A proposal: make the version calls for hmc_tm and invert go to a separate usage function, which is modified in a pre-commit hook script upon every commit.
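
As a rough sketch of the proposal, the version text could live in one small function whose revision part comes from a macro. `TMLQCD_REVISION`, `TMLQCD_RELEASE` and `version_string` are hypothetical names; how the macro gets its value (a hook script rewriting the file, or a `-D` flag from the build system) is left open.

```c
#include <stdio.h>

/* TMLQCD_REVISION is a hypothetical macro, imagined to be rewritten by
   a hook script or passed by the build system (-DTMLQCD_REVISION=...). */
#ifndef TMLQCD_REVISION
#  define TMLQCD_REVISION "unknown"
#endif
#define TMLQCD_RELEASE "5.1.6"

/* Writes "release (revision ...)" into buf and returns the number of
   characters that were needed, as snprintf does. */
static int version_string(char *buf, size_t size) {
  return snprintf(buf, size, "%s (revision %s)",
                  TMLQCD_RELEASE, TMLQCD_REVISION);
}
```

Both hmc_tm and invert would call the same function, so the reported revision reflects the whole build rather than the last change to a single file.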

CGMMS documentation and sample input

Currently, the CGMMS solver is not listed in section 1.4.2 or section 2.7 of the documentation as one of the possible solvers. This should be fixed.
No sample input files are provided with the CGMMS solver either, and since particularly the extra_masses.input file is needed, a sample for this should be provided.

config.h inclusion everywhere

Every .c file needs to start with

#ifdef HAVE_CONFIG_H
# include<config.h>
#endif

before any other includes, otherwise certain defines do not take effect.

I do not know of any case where this is causing problems right now; reported by Carsten.

modenumber computation happens inline in invert.c, subfunction needed

invert.c has most of its functionality handed off to subfunctions, such as eigenvalue computation or plaquette measurement. The modenumber computation is done inline inside invert.c, at around lines 360-400 in revision 1783. This code should be moved to a subfunction, to clean up and to factorize.

Elaborate on collaboration process

In my opinion the section on the website describing the collaboration process for tmLQCD could be a bit more explicit and give some pointers:

git branch branchname
git checkout branchname
work, 
git add [...], 
git commit
work,
git add [...],
git commit...
git push origin branchname    # send work to own fork on github

In addition, before submitting a pull request the person should resolve any conflicts that might have developed with the current state of the code. (this can also be done by the integrator of the pull request, but it will be additional work for us, which is not necessarily a bad thing). Doing this regularly will keep the codes in sync and make integration easier.

git remote add upstream git@github.com:etmc/tmLQCD.git
git fetch upstream -v
git merge upstream/master   # merge any conflicts

Remove legacy SVN keywords

The $Id SVN keyword has been used extensively throughout the code to allow automatic documenting of last changes to a file. It and other SVN keywords (Date, Revision, Author, HeadURL) need to be removed everywhere.

Source types affect writing behaviour

When the input file for the inverter is given the following input line:
SourceType = VOLUME

the inversion proceeds without problems, but the propagator is not written to disk.
If, however,
SourceType = POINT
is selected, the propagator is written to disk. This discrepancy needs to be cleaned up, either with an additional input parameter that governs propagator writing, or just by fixing the VOLUME case.

SSE2 and SSE3 for smearing and clover_leaf.c

Compiling with --enable-sse3 or --enable-sse2 gives errors:

../../tmLQCD/smearing/stout_stout_smear.c:39: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../../tmLQCD/smearing/stout_stout_smear.c:30: error: ‘asm’ operand has impossible constraints

../tmLQCD/clover_leaf.c:719: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../tmLQCD/clover_leaf.c:611: error: ‘asm’ operand has impossible constraints

which is due to problems in the inline assembly implementation of the su3 etc. macros.

Need to either rework the routines or undef SSE macros in those files.

Benchmark segfaults

It seems to have issues in the xchange_deri routine, probably related to the following when compiling that file:
./xchange_deri.h:33:6: note: expected ‘struct su3adj ** const’ but argument is of type ‘struct su3adj ***’. I think we can just remove the ampersand before the argument at both locations where the function is called, but I'm not sure if it's that simple.
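
The mismatch in the compiler note can be reproduced in isolation. In the sketch below, `su3adj` is a simplified stand-in with assumed field names and `xchange_deri_sketch` is a hypothetical function that has the same parameter type as in the note. Whether dropping the ampersand is sufficient in the real code depends on how the variable is declared at the call sites.

```c
#include <stddef.h>

/* Simplified stand-in for tmLQCD's su3adj; the field names are assumptions. */
typedef struct su3adj { double d1, d2, d3, d4, d5, d6, d7, d8; } su3adj;

/* Same parameter type as in the compiler note: su3adj ** const.
   Hypothetical body: zeroes the first component of df[0][0] and
   returns the number of entries it touched. */
static int xchange_deri_sketch(su3adj **const df) {
  df[0][0].d1 = 0.0;
  return 1;
}
```

For a variable declared as `su3adj **df`, the call `xchange_deri_sketch(df)` matches the prototype. Writing `xchange_deri_sketch(&df)` passes a `su3adj ***`, so the function would treat the address of the pointer variable itself as the start of the array, which is a plausible route to a segfault.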

automatic detection of source timeslice

When one wants to treat more than one gauge configuration in a single invert run, reading sources from files that depend on the timeslice, an automatic detection of this value is required. I have implemented this in the branch

AutomaticTimesliceDetect

in my fork of tmLQCD in commit urbach@14ab210

Comments would be helpful!

Graceful exit

Is there already some centralised way to force a graceful exit in case of a fatal error such as a failure to malloc? If not, shouldn't we add one? Doesn't have to be complicated, but it would be nice to be able to write something like if (result == very_bad) fatal_error( "Error_message_goes_here"); and have the system take care of dumping the error message, flush buffers and finalizing MPI before quitting. I think this would encourage error checking and result in better code, as well as remove ugly preprocessor checks for MPI in the middle of random functions.
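
A possible shape for such a function is sketched below. The names `format_fatal` and `fatal_error` and the macro `USE_MPI` are hypothetical. One deliberate deviation from the wording above: the sketch calls `MPI_Abort` rather than finalizing MPI, because a fatal error on one process would otherwise leave the remaining processes waiting in a collective call.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef USE_MPI /* hypothetical macro set when MPI is compiled in */
#  include <mpi.h>
#endif

/* Formats the message that fatal_error prints; split out so that the
   formatting can be checked without terminating the program. */
static int format_fatal(char *buf, size_t size, int proc_id, const char *msg) {
  return snprintf(buf, size, "FATAL ERROR on process %d: %s\n", proc_id, msg);
}

/* Prints the message, flushes the output buffers, shuts down MPI if it
   is compiled in, and exits with a non-zero status. Does not return. */
static void fatal_error(int proc_id, const char *msg) {
  char buf[512];
  format_fatal(buf, sizeof buf, proc_id, msg);
  fputs(buf, stderr);
  fflush(stdout);
  fflush(stderr);
#ifdef USE_MPI
  MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE); /* other ranks may be blocked */
#endif
  exit(EXIT_FAILURE);
}
```

A call site then shrinks to something like `if (ptr == NULL) fatal_error(g_proc_id, "malloc failed in function_name");`, with the MPI preprocessor check confined to this one function.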

Gauge fixing

It would be good to have a gauge fixing routine in the tmLQCD code, for example for Landau/Coulomb gauge fixing. There are several independently written versions circulated around, but this seems like a specific functionality that would benefit greatly from all the parallel functionality existing in the tmLQCD package. Checks can be made against existing gwc, Zpackage and other codes.
