flamegpu's Introduction

FLAME GPU

http://www.flamegpu.com

Current version: 1.5.0

FLAME GPU (Flexible Large-scale Agent Modelling Environment for Graphics Processing Units) is a high performance Graphics Processing Unit (GPU) extension to the FLAME framework.

It provides a mapping between formal agent specifications, with C-based scripting, and optimised CUDA code. This includes a number of key ABM building blocks such as multiple agent types, agent communication, and birth and death allocation. The advantages of our contribution are threefold.

  1. Agent Based (AB) modellers are able to focus on specifying agent behaviour and run simulations without explicit understanding of CUDA programming or GPU optimisation strategies.
  2. Simulation performance is significantly increased in comparison with desktop CPU alternatives. This allows simulation of far larger model sizes with high performance at a fraction of the cost of grid based alternatives.
  3. Massive agent populations can be visualised in real time as agent data is already located on the GPU hardware.

Documentation

The FLAME GPU documentation and user guide can be found at http://docs.flamegpu.com, with source hosted on GitHub at FLAMEGPU/docs.

Getting FLAME GPU

Pre-compiled Windows binaries are available for the example projects in the FLAME-GPU-SDK, available as an archive for each release.

Source is available from GitHub, either as a zip download or via git:

git clone https://github.com/FLAMEGPU/FLAMEGPU.git

Or

git clone git@github.com:FLAMEGPU/FLAMEGPU.git

Building FLAME GPU

FLAME GPU can be built for Windows and Linux. MacOS should work, but is unsupported.

Dependencies

  • CUDA 8.0 or later
  • A CUDA-capable GPU:
    • Compute Capability 2.0 or greater (CUDA 8)
    • Compute Capability 3.0 or greater (CUDA 9)
  • Windows
    • Microsoft Visual Studio 2015 or later
    • Visualisation:
      • freeglut and glew are included with FLAME GPU.
    • Optional: make
  • Linux
    • make
    • g++ (a version which supports the CUDA version in use)
    • xsltproc
    • Visualisation:
      • GL (deb: libgl1-mesa-dev, yum: mesa-libGL-devel)
      • GLU (deb: libglu1-mesa-dev, yum: mesa-libGLU-devel)
      • GLEW (deb: libglew-dev, yum: glew-devel)
      • GLUT (deb: freeglut3-dev, yum: freeglut-devel)
    • Optional: xmllint

Windows using Visual Studio

Visual Studio 2015 solutions are provided for the example FLAME GPU projects. Release and Debug build configurations are provided, for both console mode and (optionally) visualisation mode. Binary files are placed in bin/x64/<OPT>_<MODE> where <OPT> is Release or Debug and <MODE> is Console or Visualisation.

An additional solution is provided in the examples directory, enabling batch building of all examples.

make for Linux and Windows

make can be used to build FLAME GPU simulations under Linux and Windows (via a Windows implementation of make).

Makefiles are provided for each example project (examples/project/Makefile), and for batch building all examples (examples/Makefile).

To build a console example in release mode:

cd examples/EmptyExample/
make console

Or for a visualisation example in release mode:

cd examples/EmptyExample/
make visualisation

Debug mode executables can be built by specifying debug=1 to make, i.e. make console debug=1.

Binary files are placed in bin/linux-x64/<OPT>_<MODE> where <OPT> is Release or Debug and <MODE> is Console or Visualisation.

For more information on building FLAME GPU via make, run make help in an example directory.

Note on Linux Dependencies

If you are using Linux on a managed system (i.e. you do not have root access to install packages), you can provide shared object files (.so) for the missing dependencies, e.g. libGLEW and libglut.

Download the required shared object files for your system configuration and place them in the lib directory. These will be linked at compile time, and the dynamic linker will check this directory at runtime.

Alternatively, to package FLAME GPU executables with a different file structure, the .so files can be placed adjacent to the executable file.

Usage

FLAME GPU can be executed as either a console application or as an interactive visualisation. Please see the documentation for further details.

# Console mode
usage: executable [-h] [--help] input_path num_iterations [cuda_device_id] [XML_output_override]

# Interactive visualisation
usage: executable [-h] [--help] input_path [cuda_device_id]

For further details, see the documentation or run executable --help.

Running a Simulation on Windows

Assuming the GameOfLife example has been compiled for visualisation, there are several options for running the example.

  1. Run the included batch script in bin/x64/: GameOfLife_visualisation.bat
  2. Run the executable directly with an initial states file
    1. Navigate to the examples/GameOfLife/ directory in a command prompt
    2. Run ..\..\bin\x64\Release_Visualisation\GameOfLife.exe iterations\0.xml

Running a Simulation on Linux

Assuming the GameOfLife example has been compiled for visualisation, there are several options for running the example.

  1. Run the included bash script in bin/linux-x64/: GameOfLife_visualisation.sh
  2. Run the executable directly with an initial states file
    1. Navigate to the examples/GameOfLife/ directory
    2. Run ../../bin/linux-x64/Release_Visualisation/GameOfLife iterations/0.xml

How to Contribute

To report FLAME GPU bugs or request features, please file an issue directly on GitHub. If you wish to make any contributions, please open a Pull Request on GitHub.

Publications

Please cite FLAME GPU using

@article{richmond2010high,
  doi={10.1093/bib/bbp073},
  title={High performance cellular level agent-based simulation with FLAME for the GPU},
  author={Richmond, Paul and Walker, Dawn and Coakley, Simon and Romano, Daniela},
  journal={Briefings in bioinformatics},
  volume={11},
  number={3},
  pages={334--347},
  year={2010},
  publisher={Oxford Univ Press}
}

For an up to date list of publications related to FLAME GPU and its use, visit the flamegpu.com website.

Authors

FLAME GPU is developed as an open-source project by the Visual Computing research group in the Department of Computer Science at the University of Sheffield. The primary author is Dr Paul Richmond.

Copyright and Software Licence

FLAME GPU is copyright the University of Sheffield 2009 - 2018. Version 1.5.X is released under the MIT open source licence. Previous versions were released under a University of Sheffield End User licence agreement.

Release Notes

  • Documentation now hosted on readthedocs, http://docs.flamegpu.com and https://github.com/flamegpu/docs
  • Supports CUDA 8 and CUDA 9
    • Removed SM20 and SM21 support from the default build settings (Deprecated / Removed by CUDA 8.0 / 9.0)
  • Graph communication for agents with new example
  • Updated Visual Studio version to 2015
  • Improved Linux support via upgraded Makefiles
  • Additional example projects
  • Template example has been renamed EmptyExample
  • tools/new_example.py to quickly create a new example project.
  • Various bugfixes
  • Adds step-functions
  • Adds host-based agent creation for init and step functions
  • Adds parallel reductions for use in init, step and exit functions
  • Additional command line options
  • Environment constants can now be loaded from 0.xml
  • Adds the use of a colour agent variable to control agent colour in the default visualisation
  • Additional controls for the default visualisation
  • Macro definitions for default visualisation colours
  • Macro definitions for message partitioning strategy
  • Adds instrumentation for simple performance measurement via preprocessor macros
  • Improved functions.xslt output for generating template functions files.
  • Improved state model diagram generator
  • Updated Circles Example
  • Purged binaries from history, reducing repository size
  • Updated Visual Studio Project files to 2013
  • Improved Visual Studio build customisation
  • Fixed double precision support within spatial partitioning
  • Compile-time spatial partition configuration validation
  • Added support for continuous agents reading discrete messages.
  • Minor bug fixes and added missing media folder
  • FLAME GPU 1.4 for CUDA 7 and Visual Studio 2012

Problem Reports

To report a bug in this documentation or in the software or propose an improvement, please use the FLAMEGPU GitHub issue tracker.

flamegpu's People

Contributors

coolmule0, dentarthur, jiruhameru, mileach, mondus, mozhgan-kch, ptheywood, robadob, twinkarma, willfurnass


flamegpu's Issues

Invalid XML Model File handling

When an XML Model File is incomplete, there is not necessarily a relevant error message produced for the user.

For instance:
If a message partitioning scheme is not specified for a message, the code is generated but will not compile, due to undefined variable message_<message_name>_count.

Is the model file validated at all or is it up to the end user to ensure the model is valid?
If not, should the default of <gpu:partitioningNone></gpu:partitioningNone> be implied?

Limitations on Spatially Partitioned Messaging dimensions

Windows imposes limits on the size of statically allocated data. For 32-bit and 64-bit Windows this is limited to 2 GB (source).

The number of bins required for spatially partitioned messaging is dependent upon the environment bounds and the interaction radius. If the combination of bounds and radius results in a significant number of bins being required (at roughly 4 bytes per bin, 2 GB allows approximately 536,870,912 bins, or roughly an 812*812*812 cube), then the application will not compile.
This should not be a problem for FLAMEGPU2, as dynamically allocated memory can be up to 8 TB on Windows x64.

Furthermore, the use of cudaBindTexture in spatially partitioned messaging imposes an additional, device-dependent limit at runtime.
The total number of elements in a linear address range cannot exceed cudaDeviceProp::maxTexture1DLinear[0] (source), which for a GeForce GTX Titan X is 134,217,728 texels (source).
In practice this limits the number of bins in spatially partitioned messaging to 512*512*512.
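
To make the arithmetic behind these limits explicit, here is a minimal back-of-the-envelope sketch. The 4-bytes-per-bin assumption and the hard-coded limits are illustrative and are not taken from the generated code:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical environment bounds and interaction radius */
    float env_min = 0.0f, env_max = 812.0f, radius = 1.0f;

    /* One bin per radius-sized cell along each axis */
    long long bins_per_axis = (long long)ceil((env_max - env_min) / radius);
    long long total_bins = bins_per_axis * bins_per_axis * bins_per_axis;

    /* Assume 4 bytes of statically allocated data per bin */
    long long static_bytes = total_bins * 4LL;

    printf("bins: %lld, static allocation: %lld bytes\n", total_bins, static_bytes);
    if (static_bytes > 2147483647LL)      /* ~2 GB static data limit on Windows */
        printf("exceeds the ~2 GB static allocation limit\n");
    if (total_bins > 134217728LL)         /* maxTexture1DLinear[0] on a GTX Titan X */
        printf("exceeds the 1D linear texture limit for this device\n");
    return 0;
}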

Instrumentation: Population from disk

When OUTPUT_POPULATION_PER_ITERATION is set, it would be good to output the population after any INIT functions and/or after the initial set of agents have been loaded from disk.

Support loading of environment constants from initial states file

To avoid users having to create init functions which set the initial value of environment constants (either to a fixed value or by loading from disk manually), it would be beneficial to support loading these from the initial states XML file (0.xml).

All required information to handle this is available at compile time from XMLModelFile.xml and can be built into io.cu.
This would be much simpler to support using @Robadob 's updated XML handling branch mentioned in #16

Bug on discrete messaging

Alcione has discovered a bug which causes occasional launch failures for discrete message input (of discrete agents). Code sent to Paul for testing.

Split into multiple compilation units

When FLAME GPU was first set up, the 'relocatable device code' option wasn't available, so device functions had to be in the same compilation unit as their calling function. This meant that functions.c was made to be included from simulation.cu/fgpu_kernals.cu.

This is a bad idea because any pre-processor macros defined within functions.c leak into the remainder of simulation.cu when compiled. It also makes partial compilation slower, as the whole of simulation.cu/fgpu_kernals.cu/functions.c must be recompiled if you change one line in functions.c. Also, being an include marked as 'do not compile' can confuse IDEs, making them fail to detect when a recompilation is required.

@mozhgan-kch hit this issue this afternoon and, short of noticing that the syntax highlighting is off (which only works in an IDE) or working it out yourself, it is a pain to diagnose. In this case the CUDA constant symbol was being replaced with a literal at compile time, compiling fine and then throwing a runtime error of 'invalid device symbol'.

It might be feasible to rename functions.c to functions.cu and enable the relocatable device code option; however, I expect there will be other minor changes required.

Linux Binary tracked bin/x64/Release_Console

An executable binary for linux has been tracked in the bin/x64/Release_Console directory. This should be removed and ideally purged from history.

https://github.com/FLAMEGPU/FLAMEGPU/blob/master/bin/x64/Release_Console/CirclesBruteForce_float_console

  • Purge Binaries which should not be tracked from history
  • Purge XML files which should not be tracked from history
  • Purge shell scripts which should not be tracked from history
  • Update .gitignore to prevent these issues in the future / make people have to use git add -f to force them in (which becomes a conscious decision)

Support static graphs for on-network communication

To support on-network communication, a static network should be defined as part of the environment, along with a new message partitioning technique.

Only IDs will be used in user-facing locations, with indices used internally, as supporting work for dynamic networks. This will require lookups between IDs and indices; however, if ID == index for the entire network structure, this lookup could be skipped.

An example XMLModelFile is included.

Poor visualisation performance for Sugarscape Example

The Sugarscape visualisation performance is very poor, and does not render agents in the expected 2D square grid.

Sugarscape agents are discrete and have the agent variable location_id for visualisation purposes, rather than x/y.
x and y are calculated using location_id and population_width during output_agent_agent_to_VBO in visualisation.cu.

int population_width = (int)floor(sqrt((float)get_agent_agent_default_count()));
....
vbo[index].x = (agents->location_id[index] % population_width) - centralise.x;
vbo[index].y = floor((float)agents->location_id[index] / (float)population_width) - centralise.y;

get_agent_agent_default_count() returns xmachine_memory_agent_MAX defined at compile time

#define xmachine_memory_agent_MAX 1048576

There are 65536 agents in examples/Sugarscape/iterations/0.xml, not 1048576.

Two possible solutions:

  • Adjust the model file to only allow 65536 agents, fixing the visualisation for the case where the number of agents equals the maximum defined at compile time
  • Fix the root issue: the visualisation should use the actual number of agents rather than the compile-time maximum.

Reduce repository file size

The repository includes files which are not well suited to being tracked by git, and which should not be tracked by git.

These files should be removed and purged from the history of the repository to greatly reduce its size, while all the files and tools required to regenerate them will still be available.

Executable files should be provided as an archive with each tagged release, removing the need for them to be updated for all commits.

Files to purge from history and add to .gitignore if not already present:

  • /bin/* other than *.bat files
  • /examples/*/src/dynamic/*

Optionally a script should be added to the /tools/ directory which compiles all examples and produces an archive of the binary folder to be distributed with a tagged release.

Tidy up branches

Some of these branches have additional examples which should be merged. Branches which are not merged should have updated descriptions explaining what they are.

Return/Exit Codes

Currently the exit codes / return value of main() are generally exit(0) for both successful and unsuccessful operations. The exceptions are when a CUDA error occurs, in which case the CUDA error code is used as the exit value, and in main(), where 1 is returned when the device could not be reset.

It would be better to make use of EXIT_FAILURE and EXIT_SUCCESS from <stdlib.h> where appropriate.
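
A minimal sketch of the suggested pattern (this is an illustration, not the existing main()):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    cudaError_t status = cudaSetDevice(0);            /* illustrative CUDA call */
    if (status != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
        return status;                                /* propagate the CUDA error code, as now */
    }

    /* ... run the simulation ... */

    if (cudaDeviceReset() != cudaSuccess)
        return EXIT_FAILURE;                          /* instead of returning a bare 1 */
    return EXIT_SUCCESS;                              /* instead of exit(0) everywhere */
}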

Generated functions.c commented out code formatting

The functions.c template generates commented-out code for each function, e.g. agent output functions or message processing code.

This code uses docblock-style comments rather than normal multi-line comments, which is unnecessary and results in extra work for the end user.

i.e.

/* //Template for agent output functions
*
* int id = 0;
* int integer_value = 0;
* float float_value = 0;
* add_agent(agents, int id, int integer_value, float float_value);
*/

should be replaced with the following (or similar)

/* 
// Template for agent output functions
int id = 0;
int integer_value = 0;
float float_value = 0;
add_agent(agents, id, integer_value, float_value);
*/

Getting more done in GitHub with ZenHub

Hola! @mondus has created a ZenHub account for the FLAMEGPU organization. ZenHub is the only project management tool integrated natively in GitHub – created specifically for fast-moving, software-driven teams.


How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

  • Real-time, customizable task boards for GitHub issues;
  • Multi-Repository burndown charts, estimates, and velocity tracking based on GitHub Milestones;
  • Personal to-do lists and task prioritization;
  • Time-saving shortcuts – like a quick repo switcher, a “Move issue” button, and much more.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @mondus.

ZenHub Board

Project creation / Empty Project

Creating a new model generally involves taking an existing model and stripping out the details.

Providing an empty example would simplify this process.

Another option would be to add the relevant Visual Studio commands to create a new project of this sort.

Support for projects outside of the examples folder, for independent git repositories, would be nice too.

Linux Build Warnings

Warnings are generated during compilation under Linux; these should be resolved where possible.

In file included from src/visualisation/CustomVisualisation.cpp:26:0:
src/visualisation/CustomVisualisation.h:50:8: warning: extra tokens at end of #endif directive
 #endif __VISUALISATION
        ^
In file included from src/visualisation/CustomVisualisation.cpp:27:0:
src/visualisation/GLUTInputController.h:55:8: warning: extra tokens at end of #endif directive
 #endif __GLUT_INPUT_CONTROLLER

...

 src/visualisation/CustomVisualisation.cpp: In function ‘void initVisualisation()’:
src/visualisation/CustomVisualisation.cpp:65:45: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
     char *argv[] = {"GLUT application", NULL}; 

...

src/visualisation/MenuDisplay.cpp: In function ‘void drawInfoDisplay(int, int)’:
src/visualisation/MenuDisplay.cpp:206:63: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
   printInfoLine("********** Simulation Information **********");

Modify Archive creation

Modify archive creation to include all source and binaries (anything other than temporary files), so that only a single download is required when users wish to run examples without compilation and immediately follow development, rather than having to also download the source zip.

This will also mark the next release 1.4.3.

Environment constant getter functions do not compile for arrays

The constant getter functions added in 09c4c67 such as const int* get_CYCLE_LENGTH() from the keratinocyte example do not compile.

As these are fixed-size arrays (i.e. int h_env_CYCLE_LENGTH[5]), this is not simple, and merely changing the prototype to const int** get_CYCLE_LENGTH() is not sufficient.

Incorrect FPS reporter

The average FPS reported using average/(millis/1000.0f) does not match the actual FPS seen. This can be fixed by adding a variable frame_time to global visualisation.cu memory, accumulating frame_time += millis; each frame, and resetting it to zero inside if(frame_count == average){ ... average/(frame_time/1000.0f); frame_time = 0.0f; }

It would also be possible to include the frame time, which can be calculated every average frames from the millis for that frame, i.e. the entire window title becomes:
sprintf(title, "Execution & Rendering Total: %f (FPS), %f milliseconds per frame", average/(frame_time/1000.0f), millis);
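
A minimal sketch of the suggested accumulation (the wrapper function here is hypothetical; the variable names follow the issue text):

#include <stdio.h>

static float frame_time = 0.0f;   /* global accumulator suggested above (visualisation.cu) */

/* Hypothetical helper called once per frame from the visualisation loop */
void update_fps_title(int frame_count, int average, float millis, char *title) {
    frame_time += millis;                                 /* accumulate elapsed ms each frame */
    if (frame_count == average) {
        float fps = average / (frame_time / 1000.0f);     /* FPS averaged over the last 'average' frames */
        sprintf(title, "Execution & Rendering Total: %f (FPS), %f milliseconds per frame",
                fps, millis);
        frame_time = 0.0f;                                /* reset for the next window */
    }
}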

Linux builds report missing glu.h for boids example

ubuntu@ip-10-63-219-58:~/FLAMEGPU/examples$ make
*********************************************************************
*  Copyright 2016 University of Sheffield.  All rights reserved.    *
*********************************************************************
make all -> Processes XML model and builds default modes per example*
           ------------------------------------------------         *
make build -> builds all executables in either release or debug     *
              only use this, if you already have all the .cu files  *
           ------------------------------------------------         *
All scripts are stored in bin/x64. To run, simple exectue the script*
*********************************************************************
ubuntu@ip-10-63-219-58:~/FLAMEGPU/examples$ make all
make[1]: Entering directory `/home/ubuntu/FLAMEGPU/examples/Boids_BruteForce'
xmllint --noout src/model/XMLModelFile.xml --schema ../../FLAMEGPU/schemas/XMMLGPU.xsd
src/model/XMLModelFile.xml validates
xsltproc ../../FLAMEGPU/templates/header.xslt src/model/XMLModelFile.xml > src/dynamic/header.h
xsltproc ../../FLAMEGPU/templates/FLAMEGPU_kernals.xslt src/model/XMLModelFile.xml > src/dynamic/FLAMEGPU_kernals.cu
xsltproc ../../FLAMEGPU/templates/io.xslt src/model/XMLModelFile.xml > src/dynamic/io.cu
xsltproc ../../FLAMEGPU/templates/simulation.xslt src/model/XMLModelFile.xml > src/dynamic/simulation.cu
xsltproc ../../FLAMEGPU/templates/main.xslt src/model/XMLModelFile.xml > src/dynamic/main.cu
xsltproc ../../FLAMEGPU/templates/visualisation.xslt src/model/XMLModelFile.xml > src/dynamic/visualisation.cu
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -I ../../include/ -I src/model/ -I src/dynamic/ -I src/visualisation/ -I ../../include//GL/ -I../../lib/ -o io.o -c src/dynamic/io.cu
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -I ../../include/ -I src/model/ -I src/dynamic/ -I src/visualisation/ -I ../../include//GL/ -I../../lib/ -o simulation.o -c src/dynamic/simulation.cu
"/usr/local/cuda-7.5"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -I ../../include/ -I src/model/ -I src/dynamic/ -I src/visualisation/ -I ../../include//GL/ -I../../lib/ -o main_.o -c -DVISUALISATION src/dynamic/main.cu
In file included from src/dynamic/main.cu:21:0:
../../include/GL/glew.h:1142:20: fatal error: GL/glu.h: No such file or directory
 #include <GL/glu.h>
                    ^
compilation terminated.
make[1]: *** [main_.o] Error 1
make[1]: Leaving directory `/home/ubuntu/FLAMEGPU/examples/Boids_BruteForce'
make: *** [Boids_BruteForce/Makefile.ph_build] Error 2

Multiple Spatial Partitions unsupported in agent_function

If an agent function requires access to two spatially partitioned message lists, both parameters are named partition_matrix within header.h, meaning that compilation fails.

Noticed by Mozhgan.

This should be a quick fix:

a) Remove identifiers from the method prototypes
b) append _<xsl-select....name()> to the identifiers

The rarely used template for functions.c will likely also require updating.
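
As an illustration of option (b), a generated prototype might look something like the following. The function, agent and message names here are hypothetical, and the exact generated types may differ:

/* Hypothetical agent function reading two spatially partitioned message lists,
 * with the message name appended to each partition matrix identifier (option b) */
__FLAME_GPU_FUNC__ int read_two_lists(xmachine_memory_Agent* agent,
        xmachine_message_locationA_list* locationA_messages,
        xmachine_message_locationA_PBM* partition_matrix_locationA,
        xmachine_message_locationB_list* locationB_messages,
        xmachine_message_locationB_PBM* partition_matrix_locationB) {
    /* ... iterate both message lists using their own partition matrices ... */
    return 0;
}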

Upgrade projects + makefiles to use CUDA 8.0 by default

CUDA 8.0 was announced over a year ago, and provides significantly improved compilation time compared to 7.5.

Additionally, CUDA 9.0 has been announced (although with no public release date; likely Q3/Q4 2017, to support Volta GPUs in DGX-1v/DGX Station etc.).

Visual Studio project files and Linux Makefiles should be modified to default to 8.0, or to the latest installed version if possible (Makefiles).

Move to glm for math vectors

The initial change is on the glm-switch feature branch. I've fixed the bugs that appeared when compiling a few different projects (GLM required switching from C to C++, and a double-support function requires internal use of int2). Leaving the whole solution to batch build overnight.

Stable Marriage Example init file is invalid

The Stable Marriage init file incorrectly provides Woman agents with preferred_woman properties; assuming the data is as intended, this should be preferred_man, as specified in the XMLModelFile.

XML Upgrade

Working in the rapidxml branch, I've updated the io.xslt template to use the rapidxml header library for importing initialisation files. This makes the generated code far more approachable and maintainable (example). However, I decided to change io.cu into io.cpp, as the CUDA compiler produces a lot of garbage warnings when including rapidxml.hpp.

In doing so, I added support for a <nowarn/> flag in the root of an init file to suppress warnings about missing agent properties, and new support for setting environment constants from the init file, as visible in the Keratinocyte init file (this was something referenced in a few places but never implemented, with most examples using init functions or other custom code).

I've tested that it still loads things correctly with a few existing models (e.g. the pedestrian visualisation), breakpointing to check variable values (e.g. the Keratinocyte constants). I'm just running a full rebuild to ensure nothing else has been broken, and will merge after @mondus has had a chance to approve next week.

FLAME GPU makefile problem with Iceberg

FLAME GPU makefiles do not work on Iceberg (the University of Sheffield HPC facility). The issue revolves around the call to nvcc.

The makefile includes the CUDA_PATH, so that calls to nvcc end up looking like /usr/local/cuda-7.5/bin/nvcc rather than just nvcc. I think that the former changes where nvcc looks for files to the bin directory rather than the makefile location, so that the CUDA input files are not found. E.g.

/usr/local/cuda-7.5/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -I ../../include/ -I src/model/ -I src/dynamic/ -I src/visualisation/ -I ../../include/GL/ -o io.o -c src/dynamic/io.cu
bash: /usr/local/cuda-7.5/bin/nvcc: No such file or directory

from the makefile fails, whereas modifying the call to

nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -I ../../include/ -I src/model/ -I src/dynamic/ -I src/visualisation/ -I ../../include/GL/ -o io.o -c src/dynamic/io.cu
works correctly.

Not possible to run generic code between simulation steps

This is required in some models where rules are based on statistical data from the population. For example, birth rate may be a function of the total population density, rather than the perceived local density. Such calculations could be performed by each agent, but it makes sense to have host functions which are able to perform things like reductions on agent variables.
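
This is what the step functions added in 1.5 now provide. A rough sketch of such a host function follows, under the assumption that the generated reduction and environment-constant setter follow the naming patterns referenced elsewhere in this document; the agent, variable and constant names are hypothetical:

/* Hypothetical host step function run between simulation iterations */
__FLAME_GPU_STEP_FUNC__ void update_birth_rate() {
    /* reduce_Person_default_density() is assumed to follow the generated
       reduce_<Agent>_<state>_<variable>() pattern shown elsewhere */
    float total_density = reduce_Person_default_density();
    float new_rate = 0.01f * total_density;   /* illustrative rule only */
    set_BIRTH_RATE(&new_rate);                /* assumed environment constant setter */
}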

Pointer swaps for agent xmachine_memory_*_lists

@mondus suggested that under certain conditions it should be possible to avoid using append_<agentname>_Agents kernels to move agents from one state to another during agent functions.

  • If currentState == nextState, it should be safe to do a simple pointer swap.

    • If there are no function conditions and reallocate==false
    • Global conditions should be fine.
  • If there is a state change (currentState != nextState), and the population of the nextState == 0 then a pointer swap should be possible

    • If there are no function conditions and reallocate == false.
    • Possible edge case: multiple agent functions in the same layer have the same nextState
  • If the

  • When reallocate == true, it may be possible to still do the pointer swap

    • It depends on the reallocation process, needs to be looked into.
  • The case of functions which create agents needs to be considered.

The cost of the append function is relatively small compared to typical agent functions; however, for large agent populations with large numbers of agent parameters the cost may no longer be insignificant.
For the keratinocyte model, with only 1024 agents, the append kernel takes only ~4μs per invocation plus any kernel sync overheads; however, this is a very small agent population.
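
A minimal sketch of the pointer-swap idea for the simplest case above (the list and variable names here are hypothetical, not the generated identifiers):

/* Hypothetical: currentState == nextState, no function condition, reallocate == false.
 * Instead of launching the append_<agentname>_Agents kernel, swap the list pointers. */
xmachine_memory_Agent_list *tmp = d_Agents_default;   /* destination state list */
d_Agents_default = d_Agents_working;                  /* working list written by the agent function */
d_Agents_working = tmp;
/* The state's agent count is unchanged in this simple case, so no count update is needed. */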

Texture binding doesn't support doubles correctly.

When binding spatially partitioned messages, the allocated texture memory is sized as count*sizeof(int). This should be count*sizeof(<xsl:value-of select="xmml:type"/>), otherwise non-4-byte values (e.g. doubles) are handled incorrectly, leading to bad message parsing (i.e. a bunch of zeroed messages in the second half).

Simulation.xslt::746

Is:
gpuErrchk( cudaBindTexture(&amp;tex_xmachine_message_<xsl:value-of select="../../xmml:name"/>_<xsl:value-of select="xmml:name"/>_byte_offset, tex_xmachine_message_<xsl:value-of select="../../xmml:name"/>_<xsl:value-of select="xmml:name"/>, d_<xsl:value-of select="../../xmml:name"/>s-><xsl:value-of select="xmml:name"/>, sizeof(int)*xmachine_message_<xsl:value-of select="../../xmml:name"/>_MAX));

Should be:
gpuErrchk( cudaBindTexture(&amp;tex_xmachine_message_<xsl:value-of select="../../xmml:name"/>_<xsl:value-of select="xmml:name"/>_byte_offset, tex_xmachine_message_<xsl:value-of select="../../xmml:name"/>_<xsl:value-of select="xmml:name"/>, d_<xsl:value-of select="../../xmml:name"/>s-><xsl:value-of select="xmml:name"/>, sizeof(<xsl:value-of select="xmml:type"/>)*xmachine_message_<xsl:value-of select="../../xmml:name"/>_MAX));

Notice the change from sizeof(int) to sizeof(<xsl:value-of select="xmml:type"/>) near the start of the last wrapped line.

Linux Compilation

With the majority of national HPC facilities and most research Universities running Linux on their production servers (with GPUs), it would be great if FlameGPU was developed with a multi-platform focus such that compilation/execution was supported on both Windows and Linux (Ubuntu/CentOS). My understanding is that the new NVIDIA DGX system will run a custom version of Ubuntu Linux and the majority of GPU accelerated servers in US University compute centers (such as mine at ND) run CentOS/RedHat.

Potentially incorrect velocity matching in boids model

In https://github.com/FLAMEGPU/FLAMEGPU/blob/master/examples/Boids_Partitioning/src/model/functions.c#L148 the global velocity is normalized over the collision_count. However, the velocity is added for all boids in the interaction radius rather than separation radius, so this should probably be using the global_centre_count.

Additionally https://github.com/FLAMEGPU/FLAMEGPU/blob/master/examples/Boids_Partitioning/src/model/functions.c#L152 assigns match_velocity = match_velocity * MATCH_SCALE, where probably global_velocity was intended on the right-hand-side. Right now match_velocity will always stay zero.

Agents located outside of env bounds read from some bins twice during spatial partitioning.

If an agent is located outside of the environmental bounds (min <= x < max), its grid location is wrapped. Because the wrapping only replaces out-of-bounds locations with the opposite bound, an agent that is out of bounds can search further out-of-bounds cells, causing both cells to be 'wrapped' to the same cell.

Hence causing the agent to double dip.

Potential fixes:

  • Do proper modular wrapping (see the sketch after this list)
  • Throw an assertion error when agents are located out of bounds
  • Increase env max by +radius (and potentially decrease min by the same)
    • If you only increase env max by 1, the number of bins increases, however they are shared across the env width, meaning that the interaction radius is subsequently reduced.
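
A minimal sketch of the first fix, proper modular wrapping (a standalone illustration, not the generated binning code):

#include <math.h>

/* Map a position into [env_min, env_max), even when it is more than one
 * environment width out of bounds, so distinct cells never wrap to the same bin. */
float wrap_position(float x, float env_min, float env_max) {
    float width = env_max - env_min;
    float r = fmodf(x - env_min, width);   /* result lies in (-width, width) */
    if (r < 0.0f) r += width;              /* shift into [0, width) */
    return env_min + r;
}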

Reductions across agent variable arrays

Reductions across agent array variables appear incorrect, as the number of elements in the array is not considered.
i.e.

float reduce_Agent_default_example_array_variable(){
    //reduce in default stream
    return thrust::reduce(thrust::device_pointer_cast(d_Agents_default->example_array),  thrust::device_pointer_cast(d_Agents_default->example_array) + h_xmachine_memory_Agent_default_count);
}

Where each example_array is 4 elements long per agent.

As the array is strided (h_Agents_default->example_array[(j*xmachine_memory_Agent_MAX)+i]), the reduction is only applied to the 0th element of each agent's array.
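
A sketch of a reduction that covers every element of the array variable, given the strided layout described above (this mirrors the generated function, with the array length hard-coded to 4 for illustration):

#include <thrust/reduce.h>
#include <thrust/device_ptr.h>

float reduce_Agent_default_example_array_variable(){
    float total = 0.0f;
    for (int j = 0; j < 4; j++){   /* 4 = elements of example_array per agent */
        float *start = d_Agents_default->example_array + (j * xmachine_memory_Agent_MAX);
        total += thrust::reduce(thrust::device_pointer_cast(start),
            thrust::device_pointer_cast(start) + h_xmachine_memory_Agent_default_count);
    }
    return total;
}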

Incorrect error message with invalid device ID

When specifying a device ID at simulation launch, an appropriate error message should be shown if an invalid device is selected.

The wrong error is generated when a user selects the first device ID which does not exist, i.e. on a system with 2 GPUs, setting the device id parameter to 2 generates the error message

Error setting CUDA device!

rather than

Error selecting CUDA device! Device id '2' is not found?
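
A minimal sketch of the suggested check (illustrative only; the real code would use the existing gpuErrchk-style error handling):

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Validate the requested device id before attempting to select it */
static void select_device(int device_id) {
    int device_count = 0;
    cudaGetDeviceCount(&device_count);
    if (device_id < 0 || device_id >= device_count) {
        fprintf(stderr, "Error selecting CUDA device! Device id '%d' is not found?\n", device_id);
        exit(EXIT_FAILURE);
    }
    if (cudaSetDevice(device_id) != cudaSuccess) {
        fprintf(stderr, "Error setting CUDA device!\n");
        exit(EXIT_FAILURE);
    }
}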

Update FLAME GPU Documentation

The documentation needs to be updated with recent features and developments including:

  • Instrumentation constants
  • Limitations of statically allocated memory (the 2 GB limit on Windows), which affects agent populations, message list size and the number of bins available for spatially partitioned messaging
  • The current Linux build process
  • Step Functions
  • Analytics functions such as reductions
    • Including that these are not available for agent variable arrays
  • Upcoming host-based agent creation (init and step functions)
  • PAUSE_ON_START in visualisation.h

etc.

https://github.com/FLAMEGPU/FLAMEGPU_TechnicalReport
