etmc / tmlqcd

tmLQCD is a freely available software suite providing a set of tools to be used in lattice QCD simulations. It is mainly an HMC implementation (including PHMC and RHMC) for Wilson, Wilson clover and Wilson twisted mass fermions, together with an inverter for different versions of the Dirac operator. The code is fully parallelised and ships with optimisations for various modern architectures, such as commodity PC clusters and the Blue Gene family.

Home Page: http://www.itkp.uni-bonn.de/~urbach/software.html

License: GNU General Public License v3.0

C 90.33% C++ 3.54% Python 0.28% Makefile 0.87% Shell 0.14% Perl 0.15% Lex 3.01% Pawn 0.49% Assembly 0.82% R 0.30% POV-Ray SDL 0.08%

hmc lqcd multigrid quda solver ddalphaamg qphix clover rhmc twisted

tmlqcd's Issues

configuration file parser

FLEX is a pain to use, XML is a pain to write, how about finding a usable alternative?

Source location information is not always available in propagators

It would be good to have the source location (and perhaps more info) added in the inverter info lime message or be present in propagator files in some other way.

can linsolve.c|h be removed from suite

solve_cg is not used any longer, so can it be removed?

Send notes of Bonn meeting

Make and send the notes of this meeting around to the larger ETMC group.

Remove configure script from repository

I have tentatively removed the configure script from the repository in my unit testing branch. A change in configure.in results in thousands of lines of changes in configure. While the configure script should remain in the tarball distribution I don't think a git repository is the right place for it. Are there any objections to this?

Remove global spinor field g_spinor_field

Move to something like what is now used for the gauge fields.

define debug levels

Currently we do have the DebugLevel option, but it was never written down which type of message we want to have at which debug level. Siebren, you thought about it already, didn't you? So maybe we can have some sort of list defining this, but right now I don't have a very good idea how!

cgmms solver always writes in single precision

Output propagator precision is hardcoded in solver/cg_mms_tm.c: it is set to 32

Move from own complex to C99 complex
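
For illustration, a minimal sketch of what such a move could look like. The `legacy_complex` struct and both function names are hypothetical stand-ins, assuming the hand-written type is a plain struct of real and imaginary parts; only `<complex.h>` itself is standard C99.

```c
#include <complex.h>

/* Hypothetical hand-rolled complex type of the kind the issue wants to retire. */
typedef struct {
  double re;
  double im;
} legacy_complex;

/* Multiplication has to be spelled out component by component. */
legacy_complex legacy_mul(legacy_complex a, legacy_complex b)
{
  legacy_complex c;
  c.re = a.re * b.re - a.im * b.im;
  c.im = a.re * b.im + a.im * b.re;
  return c;
}

/* With C99 complex the compiler provides the arithmetic operators. */
double complex c99_mul(double complex a, double complex b)
{
  return a * b;
}
```

With the C99 type, values are written as `1.0 + 2.0 * I` and the parts are read back with `creal()` and `cimag()`, so most helper macros for complex arithmetic become unnecessary.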

install some unit testing system

check cunit

Add link to all pages on wiki

As discussed in Bonn we should probably have a link in the menu to display all pages on the wiki. This is achieved in MoinMoin by linking to

https://znwiki3.ifh.de/ETMC/TitleIndex

LEMON writer fails

This is using LEMON from the git repository.

# Trajectory is accepted.
# Writing gauge field to .conf.tmp.
# Constructing LEMON writer for file .conf.tmp for append = 0
[LEMON] Node 0 reports in lemonWriteLatticeParallel:
    Could not write the required amount of data.
[LEMON] Node 1 reports in lemonWriteLatticeParallel:
    Could not write the required amount of data.
LEMON write error occurred with status = -5, while writing in gauge_write_binary.c!
[LEMON] Node 1 reports in lemonWriteRecordHeader:
    Writer not ready for header.
KILL_WITH_ERROR on node 1: Header writing error. Aborting
LEMON write error occurred with status = -5, while writing in gauge_write_binary.c!
[LEMON] Node 0 reports in lemonWriteRecordHeader:
    Writer not ready for header.
KILL_WITH_ERROR on node 0: Header writing error. Aborting
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 10128 on
node artemis exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[artemis:10126] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[artemis:10126] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
zsh: exit 1     mpirun -np 2 ./hmc_tm -f ../sample-input/sample-hmc0.input

Tag 5-1-6 is missing

The tmLQCD version currently describes itself as 5.1.6, yet there is no tag for this version in the repository. There should be a version tagged as 5.1.6, and we are currently developing 5.1.7, 5.2 or 6.

util/io.c contains gauge read functionality, can that be dropped

Inside util/io.c there exists a function

int read_lime_gauge_field_doubleprec(double * config, char * filename,
const int T, const int LX, const int LY, const int LZ) {

and the function:

int read_lime_gauge_field_singleprec(float * config, char * filename,
const int T, const int LX, const int LY, const int LZ){

I suspect both are to be removed, because this functionality can be found in the io/ directory. So unless I hear protests, I will do so.

Supply sample input files for all available solvers

Why .NOTPARALLEL: in Makefile.in?

Is there a reason why we have the .NOTPARALLEL in the Makefile? It really slows down the compilation, especially on machines with many cores where one could potentially compile 6 to 8 modules in parallel.

Definition of ALIGN and ALIGN_BASE

These preprocessor defines are currently done in sse.h, if some level of explicit SSE optimization is requested. This means, however, that any code that declares memory for potential use with SSE routines needs to include some #ifdef checks. Those could be skipped if these defines were done in some central location and were simply set to 0 if no SSE optimization was requested.

A corollary to this, is the special case of wanting alignment without wanting to use the manual SSE routines. With increasing compiler sophistication, there are indications that automatically generated SSE code is starting to outperform our current implementations. While this is in fact quite awesome, it does make a case for allowing for alignment -- still needed for optimal performance -- without necessarily activating the SSE routines themselves. This is currently impossible.

One possible location for alignment definitions is global.h. If we want alignment to be a separate option, maybe an inclusion in the configuration script and config.h might be more appropriate.
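
A rough sketch of what a central definition might look like, as a starting point for discussion only. The `WANT_ALIGNMENT` switch, the helper `alloc_with_base` and the buffer `example_buffer` are hypothetical; the sketch also assumes the SSE build macros are called `SSE2` and `SSE3`, that 16-byte alignment is what is wanted, and that the compiler understands the GCC-style `__attribute__` syntax.

```c
#include <stdint.h>
#include <stdlib.h>

/* Alignment is requested either by an SSE build or by a separate,
   hypothetical WANT_ALIGNMENT switch (e.g. set through config.h). */
#if defined(SSE2) || defined(SSE3) || defined(WANT_ALIGNMENT)
#  define ALIGN_BASE 0x0f
#  define ALIGN __attribute__((aligned(16)))
#else
#  define ALIGN_BASE 0x00
#  define ALIGN
#endif

/* Declarations need no #ifdef: ALIGN expands to nothing when unused. */
double ALIGN example_buffer[8];

/* Allocate n usable bytes. *raw receives the pointer that must later be
   passed to free(); the return value is rounded up to the alignment. */
void *alloc_with_base(size_t n, void **raw)
{
  *raw = malloc(n + ALIGN_BASE);
  if (*raw == NULL) {
    return NULL;
  }
  return (void *)(((uintptr_t)*raw + ALIGN_BASE) & ~(uintptr_t)ALIGN_BASE);
}
```

Because `ALIGN_BASE` is simply 0 when no alignment is requested, the same allocation code compiles and behaves correctly in both cases, which is the point of setting the defines centrally.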

commit 9cec2dfa has timestamp in the future

I don't know what happened there and whether it will continue to be a problem... It seems like commit 9cec2df was timestamped with Sat Jan 21 03:15:17 2012 +0000, I don't know why or how this happened. I will close this issue if it goes away on its own.

rebase / merge discussion

I wanted to discuss this outside of Carsten's pull request that created the flurry of comments (mostly by me, sorry about the number!)

This blog post describes succinctly the difference between merge and rebase in day to day git usage. Now as long as work is not shared, rebase certainly keeps the noise down in the repository because there will only be one merge message, that of the pull request once it is merged into the master branch in etmc/tmLQCD.

On the other hand, those extra commits preserve some handy information about the development process which can be seen in the network graph if you compare my three rebased branches (read_input, cgmms_input and urbachFixAutomaticTSDetect) to a branch like Albert's c99_complex or Carsten's AutomaticTSDetect. In the rebased branches it looks as though they have been split off the etmc master branch today (24.01.12) even though in truth they have been split off a week ago.

What are your opinions on merge and rebase? Personally I am quite happy to drop the extra temporal information given by the merge messages for a more streamlined commit history.

configure is buggy

The current configure version does not work on all platforms. On my Ubuntu 10.4 machine I have to run autoconf, but I do not understand the problem yet. The very same version seems to work on, e.g., jugene.

remove LAPACK dependency

Remove APEnext define

CGMMS source+propagator format struct confusion

operator.c defines the following structs for use

paramsSourceFormat *sourceFormat = NULL;
paramsPropagatorFormat *propagatorFormat = NULL;
paramsInverterInfo *inverterInfo = NULL;

io/params.h declares
extern paramsGaugeInfo GaugeInfo;
extern paramsPropInfo PropInfo;
extern paramsSourceInfo SourceInfo;

These structs are fairly similar, and their mixing is partly the source of issue #29.
Some cleaning up, or at least a clearly defined overarching idea, would help here.

Parallel I/O for propagators needs to be better protected against errors

Similar to what was done for gauge I/O, propagator I/O needs to be protected against I/O errors through readbacks or other checks. Part of this is already implemented, but it needs to be everywhere.

P_M_eta.c cleanup

P_M_eta.c needs several cleanups:

Multiple functions need their own files
printf commands need to be dependent on debug level and processor id
Check_Approximation is probably not compiled now (not used), and is also broken in current form
comments and indenting are in different formats throughout the code, make it uniform

Some solvers try to read a source even when ReadSource = no

BiCGstab, GCR and maybe also MR try to read a source, or end up in utils_parse_propagator_type.c when the ReadSource parameter is set to no. This should not happen, either:

If a source must be present, and ReadSource = no, an error should be returned.
If a source should not be read, utils_parse_propagator_type should not be entered at all

Schroedinger Functional cleanup

The Schroedinger functional code that currently exists inside the tmLQCD package has several issues.

It is not included in existing code very well
a) For example, sf_get_staples.c replicates the code for get_staples.c 8 times, with only very minor changes (copy the same 13 lines over and over again, modifying only a single line). This is really bad, because the code becomes hard to read (if clause is far separated from its actual effect), and very hard to maintain. What if a bug is found in one of these if clauses, it needs to be fixed everywhere, but where is everywhere?
b) sf_get_rectangle_staples.c does the same to get_rectangle_staples.c, but then on a whole different scale, one that is SO bad that the compilation time of the ENTIRE tmLQCD package noticeably increases due to this single file. Also here a proper if construction would solve everything in a clean way, and the same story about bug fixing.
c) Inside hmc_tm, there are a few places where the SF code gets called. Also here a lot of replication of code is present, particularly for the output. Similar things are true for update_tm.c.
Debug output appears to be still present.
a) There are instances of #if 1 or #if 0 preprocessor directives (sf_gauge_monomial.c, sf_observables.c)
b) printf ("hola"); in sf_gauge_monomial.c
Many functions in the same file, mainly in sf_calc_action.c
Output not prepared for parallel running, so no preparation for many cores repeating the same statement
Inefficiencies in recomputing the same value
For example, inside hmc_tm.c
if(g_proc_id==0){
  fprintf(parameterfile,"# First plaquette value for SF: %14.12f \n", plaquette_energy/(6.*VOLUME*g_nproc));
  printf("# First plaquette value for SF: %14.12f \n", plaquette_energy/(6.*VOLUME*g_nproc));
  fprintf(parameterfile,"# First rectangle value for SF: %14.12f \n", rectangle_energy/(12.*VOLUME*g_nproc));
  printf("# First rectangle value for SF: %14.12f \n", rectangle_energy/(12.*VOLUME*g_nproc));
}
calls both functions twice without first storing intermediate result. Other examples exist.
No tests are supplied, and the sample input file currently does not work. And even when fixed (online measurement error) it is hard to see if the code is working, as all trajectories from the test get rejected (tested the first 250, not going to wait more).

Unless someone steps up claiming to currently use the SF code, and provides usable test cases, applying fixes for these issues is likely to introduce undetected bugs, and unlikely to lead to any clear benefit.
Since the code will exist in the repository anyway, I'd propose to just remove it for now, especially since that does have at least some benefits: faster compilation, smaller files, and cleaner code.

clean README file

gauge_input_filename is too short!

This is quite a dangerous problem which goes back to read_input. When your GaugeInputFilename is longer than 100 characters, read_input WILL write into unspecified memory (strcpy!!) as the length of gauge_input_filename is hardcoded to be 100 characters. This is also true for the ranlux input filename.
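
One possible shape for a fix, sketched under assumptions: the helper name `copy_filename` is hypothetical and this is not how read_input currently works. The idea is only to check the length against the destination size before copying, and to report an error instead of overflowing.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dest only if it fits, including the terminating '\0'.
   Returns 0 on success and -1 if src is too long; dest is left
   untouched in the failure case. */
int copy_filename(char *dest, size_t dest_size, const char *src)
{
  size_t len = strlen(src);
  if (len >= dest_size) {
    return -1;
  }
  memcpy(dest, src, len + 1);
  return 0;
}
```

A caller would pass `sizeof(gauge_input_filename)` as `dest_size` and abort with a clear message when -1 is returned, so an overlong GaugeInputFilename becomes a reported input error rather than silent memory corruption.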

Add in CGMMSEO solver (Xining Du)

Xining Du has done some work to the CGMMS solver to make it compatible with EO preconditioning:

"Basically I was using the CG solver with evenodd preconditioning, and I did some modifications (I call it CGMMSEO) to let it solve multiple masses one by one. The reason I did this was that I found it is faster than the CGMMS solver without evenodd preconditioning."
"The code I have been using is the tmLQCD version 5.1.6. However, I did some modifications on top of it, mainly reorganizing the CGeo solvers for multiple masses in a single solver. The basic solver routines are not changed."

It would be good to merge these changes back into the general code.

Add in gradient flow

Hosting of related code

There is some code related to tmLQCD, the repositories of which are currently hosted elsewhere. The obvious example is Lemon, which is even an optional dependency of tmLQCD. I'd say it makes sense to move (or at least mirror) these repositories here as well. It makes for a convenient one-stop process for those who just want to use the code. As for development, there are equally good reasons for moving away from SVN in these cases as there were for tmLQCD itself.

The only significant downside I can think of would be that the current addresses may be provided in publications and presentations. We could fairly easily take care of this by providing tarballs there and/or merging patches back into public SVN repositories. In the case of Lemon, we should actually still be able to update the address.

Are there any objections that I am missing?

DUM_SOLVER needs removal

DUM_SOLVER was removed in solver sub-dir almost completely
checks are needed
and the same must be done for bispinors in solver
then it can be removed also from hmc_tm
in invert it is not needed any longer

square_norm question

I don't really understand what's going on in the linalg/square_norm function (at least the unoptimized one). It seems to me like there are four empty operations taking place.

At the third iteration of the loop we have:

tr = ds_3
ts = ds_1 + ds_2 + ds_3
tt = ds_3
ks = ds_1 + ds_2 + ds_3
kc = 0

Since all the variables are overwritten from one call to the next I presume they were declared static for performance purposes rather than data persistence, so I don't really know why the additions and subtractions are carried out and then discarded.
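
For what it is worth, the update pattern listed above (tr, ts, tt, ks, kc) matches compensated (Kahan) summation, where kc is zero only in exact arithmetic and otherwise carries the rounding error of the previous addition. Assuming that is the intent, which I have not confirmed against the source, a stand-alone sketch with the same variable names looks like this; `naive_sum` and `compensated_sum` are illustrative names, not functions from the package.

```c
#include <stddef.h>

/* Plain accumulation, for comparison. */
double naive_sum(const double *x, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; ++i) {
    s += x[i];
  }
  return s;
}

/* Compensated summation using the variable names from the listing. */
double compensated_sum(const double *x, size_t n)
{
  double ks = 0.0; /* running sum */
  double kc = 0.0; /* running correction */
  for (size_t i = 0; i < n; ++i) {
    double tr = x[i] + kc; /* new term plus the carried correction */
    double ts = tr + ks;   /* tentative new sum, rounded */
    double tt = ts - ks;   /* the part of tr that actually reached ts */
    ks = ts;
    kc = tr - tt;          /* what was lost to rounding in this step */
  }
  return ks + kc;
}
```

In IEEE double arithmetic, adding ten terms of 1e-16 to 1.0 one at a time leaves the naive sum at 1.0, because each term is below half a unit in the last place, while the compensated version recovers their combined contribution. Note that aggressive floating-point optimisation flags that allow reassociation can remove the compensation.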

Configuration file parser debugging is not switchable in inverter

It seems like the debugging information of the configuration file parser (which depends on g_proc_id and verbose through myverbose) cannot be switched on and off without changing invert.c. Is everyone OK with me adding a -v flag to the inverter that switches verbose on? If not, why not?

I'm working on reading the CGMMS masses directly from the configuration file and need to see if my debugging messages are correct.

phase_N <-> kaN "redundancy"

There is a certain redundancy from boundary.[c,h] in D_psi.c now that c99 complex is being integrated, but I don't know how much of a performance impact, if any, it would make to simply replace instances of phase_N by -kaN (where N = {0,1,2,3}). Any ideas?

Improved version reporting

For debugging purposes it is often necessary to find out what the exact version of a compiled executable is. We have version numbers such as 5.1.5, 5.1.6, but these encompass quite a few SVN revision numbers. SVN provides some revision number information through use of the keywords $Id$, but these only refer to the last change of a file. So, if the backend of the IO changes, but the interface inside hmc_tm.c does not change, there is no way to know this from within hmc_tm.c, and therefore $Id$ will still report an older version number, even though the newer version of the IO backend is already compiled in. There is no way inside SVN to get global revision numbers; outside tools such as svnversion are needed for this. It would be useful to have a workaround for this that does give version information in more detail.

A proposal: make the version calls for hmc_tm and invert go to a separate usage function, which is modified in a pre-commit hook script upon every commit.
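
As a variation on the proposal, here is a sketch where the separate version function takes the revision from a macro set at build time rather than from a file rewritten by a hook. The macro names `TMLQCD_VERSION` and `TMLQCD_REVISION` and the function `format_version` are hypothetical; the build system would have to supply the revision, for instance from the output of svnversion.

```c
#include <stdio.h>

/* Hypothetical macros: the build would override TMLQCD_REVISION on the
   compiler command line with the global revision of the working copy. */
#ifndef TMLQCD_VERSION
#define TMLQCD_VERSION "5.1.6"
#endif
#ifndef TMLQCD_REVISION
#define TMLQCD_REVISION "unknown"
#endif

/* Write the version line for the given program into buf.
   Returns the value of snprintf. */
int format_version(char *buf, size_t size, const char *program)
{
  return snprintf(buf, size, "%s: tmLQCD version %s (revision %s)",
                  program, TMLQCD_VERSION, TMLQCD_REVISION);
}
```

Both hmc_tm and invert could call this one function, so the reported revision reflects the whole tree at build time and not the last change to a single file.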

CGMMS documentation and sample input

Currently, the CGMMS solver is not listed in section 1.4.2 or section 2.7 of the documentation as one of the possible solvers. This should be fixed.
No sample input files are provided with the CGMMS solver either, and since particularly the extra_masses.input file is needed, a sample for this should be provided.

config.h inclusion everywhere

Every .c file needs to start with

#ifdef HAVE_CONFIG_H
# include<config.h>
#endif

before any other includes, otherwise certain defines do not take effect.

I do not know of any case where this is going wrong right now; the issue was reported by Carsten.

modenumber computation happens inline in invert.c, subfunction needed

invert.c has most of its functionality handed off to subfunctions, such as eigenvalue computation or plaquette measurement. The modenumber computation is done inline inside invert.c, at around lines 360-400 in revision 1783. This code should be moved to a subfunction, to clean up and to factorize.

Elaborate on collaboration process

In my opinion the section on the website describing the collaboration process for tmLQCD could be a bit more explicit to give some pointers:

git branch branchname
git checkout branchname
work, 
git add [...], 
git commit
work,
git add [...],
git commit...
git push origin branchname    # send work to own fork on github

In addition, before submitting a pull request the person should resolve any conflicts that might have developed with the current state of the code. (this can also be done by the integrator of the pull request, but it will be additional work for us, which is not necessarily a bad thing). Doing this regularly will keep the codes in sync and make integration easier.

git remote add upstream git@github.com:etmc/tmLQCD.git
git fetch upstream -v
git merge upstream/master   # merge any conflicts

Remove legacy SVN keywords

The $Id SVN keyword has been used extensively throughout the code to allow automatic documenting of last changes to a file. It and other SVN keywords (Date, Revision, Author, HeadURL) need to be removed everywhere.

Source types affect writing behaviour

When the input file for the inverter is given the following input line:
SourceType = VOLUME

Inversion works and proceeds without problems, however, the propagator is not written to disk.
If however
SourceType = POINT
is selected, the propagator is written to disk. This discrepancy needs to be cleaned up, either with an additional input parameter that governs propagator writing, or just by fixing the VOLUME case.

SSE2 and SSE3 for smearing and clover_leaf.c

compiling with --enable-sse3|2 gives errors

../../tmLQCD/smearing/stout_stout_smear.c:39: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../../tmLQCD/smearing/stout_stout_smear.c:30: error: ‘asm’ operand has impossible constraints

../tmLQCD/clover_leaf.c:719: error: can't find a register in class ‘GENERAL_REGS’ while reloading ‘asm’
../tmLQCD/clover_leaf.c:611: error: ‘asm’ operand has impossible constraints

which is due to problems in the inline assembly implementation of the su3 etc. macros.

Need to either rework the routines or undef SSE macros in those files.

Benchmark segfaults

It seems to have issues in the xchange_deri routine, probably related to the following when compiling that file:
./xchange_deri.h:33:6: note: expected ‘struct su3adj ** const’ but argument is of type ‘struct su3adj ***’. I think we can just remove the ampersand before the argument at both locations where the function is called, but I'm not sure if it's that simple.

automatic detection of source timeslice

When one wants to treat more than one gauge configuration in a single invert run, reading sources from files that depend on the timeslice, an automatic detection of this value is required. I have implemented this in the branch

AutomaticTimesliceDetect

in my fork of tmLQCD in commit urbach@14ab210

Comments would be helpful!

enabling gprof is currently broken

Enabling gprof is currently broken because both the compiler and the linker need to be called with the -pg flag.

Graceful exit

Is there already some centralised way to force a graceful exit in case of a fatal error such as a failure to malloc? If not, shouldn't we add one? Doesn't have to be complicated, but it would be nice to be able to write something like if (result == very_bad) fatal_error( "Error_message_goes_here"); and have the system take care of dumping the error message, flush buffers and finalizing MPI before quitting. I think this would encourage error checking and result in better code, as well as remove ugly preprocessor checks for MPI in the middle of random functions.
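
A minimal sketch of what such a helper might look like. The names `fatal_error` and `format_fatal_message`, the explicit rank argument and the message format are all hypothetical, and the sketch assumes MPI builds are marked by a macro called `MPI`. It calls MPI_Abort rather than MPI_Finalize, since finalizing is a collective operation and a single failing process cannot complete it on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef MPI
#include <mpi.h>
#endif

/* Build the error line; kept separate so it can be checked on its own. */
int format_fatal_message(char *buf, size_t size, int rank, const char *message)
{
  return snprintf(buf, size, "# Fatal error on process %d: %s", rank, message);
}

/* Print the message, flush the output buffers and stop the whole run.
   This function does not return. */
void fatal_error(int rank, const char *message)
{
  char line[512];
  format_fatal_message(line, sizeof line, rank, message);
  fflush(stdout);
  fprintf(stderr, "%s\n", line);
  fflush(stderr);
#ifdef MPI
  MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
#endif
  exit(EXIT_FAILURE);
}
```

Call sites then reduce to something like `if (ptr == NULL) fatal_error(g_proc_id, "malloc failed");`, and the preprocessor check for MPI lives in exactly one place.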

Gauge fixing

It would be good to have a gauge fixing routine in the tmLQCD code, for example for Landau/Coulomb gauge fixing. There are several independently written versions circulated around, but this seems like a specific functionality that would benefit greatly from all the parallel functionality existing in the tmLQCD package. Checks can be made against existing gwc, Zpackage and other codes.

Minutes archive on wiki

I've started an archive of minutes on the wiki to make it possible to easily recover discussions even in the far past.

https://znwiki3.ifh.de/ETMC/Minutes

Just creating this issue here in case there are any objections to this.
