hwloc's Introduction

This is a truncated and poorly-formatted version of the documentation main page.
See https://www.open-mpi.org/projects/hwloc/doc/ for more.


hwloc Overview

The Hardware Locality (hwloc) software project aims at easing the process of
discovering hardware resources in parallel architectures. It offers
command-line tools and a C API for consulting these resources, their locality,
attributes, and interconnection. hwloc primarily aims at helping
high-performance computing (HPC) applications, but is also applicable to any
project seeking to exploit code and/or data locality on modern computing
platforms.

hwloc provides command line tools and a C API to obtain the hierarchical map of
key computing elements within a node, such as: NUMA memory nodes, shared
caches, processor packages, dies and cores, processing units (logical
processors or "threads") and even I/O devices. hwloc also gathers various
attributes such as cache and memory information, and is portable across a
variety of different operating systems and platforms.

hwloc supports the following operating systems:

  * Linux (with knowledge of cgroups and cpusets, memory targets/initiators,
 etc.) on all supported hardware, including Intel Xeon Phi, ScaleMP vSMP,
 and NumaScale NumaConnect.
  * Solaris (with support for processor sets and logical domains)
  * AIX
  * Darwin / OS X
  * FreeBSD and its variants (such as kFreeBSD/GNU)
  * NetBSD
  * HP-UX
  * Microsoft Windows

Since it uses standard operating system information, hwloc's support is mostly
independent of the processor type (x86, powerpc, ...) and relies only on the
operating system support. The main exception is the BSD operating systems
(NetBSD, FreeBSD, etc.): they do not provide topology information, hence
hwloc uses an x86-only CPUID-based backend (which can be used for other OSes
too, see the Components and plugins section).

To check whether hwloc works on a particular machine, just try to build it and
run lstopo or lstopo-no-graphics. If some things do not look right (e.g. bogus
or missing cache information), see Questions and Bugs.

hwloc only reports the number of processors on unsupported operating systems;
no topology information is available.

For development and debugging purposes, hwloc also offers the ability to work
on "fake" topologies:

  * Symmetrical tree of resources generated from a list of level arities, see
 Synthetic topologies.
  * Remote machine simulation through the gathering of topology as XML files,
 see Importing and exporting topologies from/to XML files.
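
To make the first bullet concrete: a list of level arities fully determines how
many objects a symmetric tree has at each level. The sketch below is plain C
and does not call hwloc; the function name count_objects_per_level is made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hwloc: given the arity (number of
 * children per object) of each level of a symmetric tree, fill counts[]
 * with the number of objects at each level. counts[] must have room for
 * nlevels + 1 entries; counts[0] is the single root. */
void count_objects_per_level(const unsigned *arities, size_t nlevels,
                             unsigned long *counts)
{
    size_t i;

    assert(arities != NULL && counts != NULL);
    counts[0] = 1;
    for (i = 0; i < nlevels; i++)
        counts[i + 1] = counts[i] * arities[i];
}
```

With arities {4, 2, 2} (4 packages, 2 cores per package, 2 PUs per core) the
counts are 1, 4, 8 and 16.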

hwloc can display the topology in a human-readable format, either in graphical
mode (X11), or by exporting in one of several different formats, including:
plain text, LaTeX tikzpicture, PDF, PNG, and FIG (see Command-line Examples
below). Note that some of the export formats require additional support
libraries.

hwloc offers a programming interface for manipulating topologies and objects.
It also brings a powerful CPU bitmap API that is used to describe the location
of topology objects on physical/logical processors. See the Programming
Interface section below. It may also be used to bind applications onto certain
cores or memory nodes. Several utility programs are also provided to ease
command-line manipulation of topology objects, binding of processes, and so on.

Bindings for several other languages are available from the project website.

Command-line Examples

On a 4-package 2-core machine with hyper-threading, the lstopo tool may show
the following graphical output:

[dudley]

Here's the equivalent output in textual form:

Machine
  NUMANode L#0 (P#0)
  Package L#0 + L3 L#0 (4096KB)
 L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
   PU L#0 (P#0)
   PU L#1 (P#8)
 L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
   PU L#2 (P#4)
   PU L#3 (P#12)
  Package L#1 + L3 L#1 (4096KB)
 L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
   PU L#4 (P#1)
   PU L#5 (P#9)
 L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
   PU L#6 (P#5)
   PU L#7 (P#13)
  Package L#2 + L3 L#2 (4096KB)
 L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
   PU L#8 (P#2)
   PU L#9 (P#10)
 L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
   PU L#10 (P#6)
   PU L#11 (P#14)
  Package L#3 + L3 L#3 (4096KB)
 L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
   PU L#12 (P#3)
   PU L#13 (P#11)
 L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
   PU L#14 (P#7)
   PU L#15 (P#15)

Note that there is also an equivalent output in XML that is meant for
exporting/importing topologies, but it is hardly readable by human beings (see
Importing and exporting topologies from/to XML files for details).

On a 4-package 2-core Opteron NUMA machine (with two cores disallowed by
the administrator), the lstopo tool may show the following graphical output
(with --disallowed for displaying disallowed objects):

[hagrid]

Here's the equivalent output in textual form:

Machine (32GB total)
  Package L#0
 NUMANode L#0 (P#0 8190MB)
 L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
 L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  Package L#1
 NUMANode L#1 (P#1 8192MB)
 L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
 L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  Package L#2
 NUMANode L#2 (P#2 8192MB)
 L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
 L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  Package L#3
 NUMANode L#3 (P#3 8192MB)
 L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
 L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)

On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
package):

[emmett]

Here's the same output in textual form:

Machine (total 16GB)
  NUMANode L#0 (P#0 16GB)
  Package L#0
 L2 L#0 (4096KB)
   L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
   L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
 L2 L#1 (4096KB)
   L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
   L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Package L#1
 L2 L#2 (4096KB)
   L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
   L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
 L2 L#3 (4096KB)
   L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
   L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

Programming Interface

The basic interface is available in hwloc.h. Some higher-level functions are
available in hwloc/helper.h to reduce the need to manually manipulate objects
and follow links between them. Documentation for all these is provided later in
this document. Developers may also want to look at hwloc/inlines.h which
contains the actual inline code of some hwloc.h routines, and at this document,
which provides good higher-level topology traversal examples.

To precisely define the vocabulary used by hwloc, a Terms and Definitions
section is available and should probably be read first.

Each hwloc object contains a cpuset describing the list of processing units
that it contains. These bitmaps may be used for CPU binding and Memory binding.
hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
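
As a rough mental model only, a cpuset can be pictured as a bitmask with one
bit per processing unit. The sketch below uses a plain 64-bit integer and
made-up toy_ names; it is not the hwloc/bitmap.h API. toy_singlify mimics the
idea behind the hwloc_bitmap_singlify() call used in the API example (keep a
single PU out of the set); picking the lowest bit is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_cpuset_t;   /* hypothetical: bit i set <=> PU i is in the set */

/* Add PU number pu to the set. */
toy_cpuset_t toy_set(toy_cpuset_t set, unsigned pu)
{
    assert(pu < 64);
    return set | ((toy_cpuset_t)1 << pu);
}

/* Return 1 if PU number pu is in the set, 0 otherwise. */
int toy_isset(toy_cpuset_t set, unsigned pu)
{
    assert(pu < 64);
    return (int)((set >> pu) & 1);
}

/* Count the PUs in the set. */
unsigned toy_weight(toy_cpuset_t set)
{
    unsigned n = 0;

    while (set) {
        set &= set - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Keep only the lowest-numbered PU of the set (an empty set stays empty). */
toy_cpuset_t toy_singlify(toy_cpuset_t set)
{
    return set & (~set + 1);   /* isolate the lowest set bit */
}
```

Binding to a singlified set rather than to a whole core prevents the thread
from migrating between the PUs of that core.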

Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the Interoperability With Other
Software section for details.

The complete API documentation is available in a full set of HTML pages, man
pages, and self-contained PDF files (formatted for both US letter and A4
formats) in the source tarball in doc/doxygen-doc/.

NOTE: If you are building the documentation from a Git clone, you will need to
have Doxygen and pdflatex installed -- the documentation will be built during
the normal "make" process. The documentation is installed during "make install"
to $prefix/share/doc/hwloc/ and your system's default man page tree (under
$prefix, of course).

Portability

Operating systems have varying support for CPU and memory binding: some
provide interfaces for all kinds of CPU and memory bindings, others provide
interfaces for only a limited number of kinds of CPU and memory binding, and
some do not provide any binding interface at all. In the unsupported cases,
hwloc's binding functions simply fail with the ENOSYS error (Function not
implemented), meaning that the underlying operating system does not provide
any interface for them. The CPU binding and Memory binding sections provide
more information on which hwloc binding functions should be preferred because
interfaces for them are usually available on the supported operating systems.
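
A portable caller therefore has to treat a failed binding as a normal outcome
rather than a fatal error. The helper below is a minimal sketch in plain C (its
name is made up and it is not part of hwloc) that turns the errno left by a
failed binding call into a message. ENOSYS comes from the text above; the EXDEV
case ("binding cannot be enforced") is taken from hwloc's binding
documentation and should be checked against the version in use.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper, not part of hwloc: explain the errno value saved
 * right after a binding function reported a failure. */
const char *explain_bind_error(int err)
{
    assert(err != 0);   /* only meaningful after a failure */
    switch (err) {
    case ENOSYS:
        return "binding is not supported by this operating system";
    case EXDEV:
        return "the requested binding cannot be enforced";
    default:
        return strerror(err);
    }
}
```

It would typically be called with the errno value saved immediately after the
binding call returned non-zero, before any other library call can overwrite it.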

Similarly, the ability to report topology information varies from one
platform to another. As shown in Command-line Examples, hwloc can obtain
information on a wide variety of hardware topologies. However, some platforms
and/or operating system versions will only report a subset of this information.
For example, on a PPC64-based system with 8 cores (each with 2 hardware
threads) running a default 2.6.18-based kernel from RHEL 5.4, hwloc is only
able to glean information about NUMA nodes and processor units (PUs). No
information about caches, packages, or cores is available.

Here's the graphical output from lstopo on this platform when Simultaneous
Multi-Threading (SMT) is enabled:

[ppc64-with]

And here's the graphical output from lstopo on this platform when SMT is
disabled:

[ppc64-with]

Notice that hwloc only sees half the PUs when SMT is disabled. PU L#6, for
example, seems to change location from NUMA node #0 to #1. In reality, no PUs
"moved" -- they were simply re-numbered when hwloc only saw half as many (see
also Logical index in Indexes and Sets). Hence, PU L#6 in the SMT-disabled
picture probably corresponds to PU L#12 in the SMT-enabled picture.
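
The renumbering can be written down as simple arithmetic. The sketch below
assumes, purely for illustration, that the hardware threads of a core get
consecutive logical indexes when SMT is enabled and that only the first thread
of each core stays visible when SMT is disabled; the function name is made up.

```c
#include <assert.h>

/* Hypothetical helper: map the logical index of a PU seen with SMT disabled
 * to the logical index the same PU had with SMT enabled, under the
 * assumptions stated above. */
unsigned smt_enabled_index(unsigned disabled_index, unsigned threads_per_core)
{
    assert(threads_per_core > 0);
    return disabled_index * threads_per_core;
}
```

With 2 hardware threads per core, smt_enabled_index(6, 2) gives 12, which
matches the PU L#6 / PU L#12 correspondence described above.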

This same "PUs have disappeared" effect can be seen on other platforms -- even
platforms / OSs that provide much more information than the above PPC64 system.
This is an unfortunate side-effect of how operating systems report information
to hwloc.

Note that after upgrading the Linux kernel on the same PPC64 system mentioned
above to 2.6.34, hwloc is able to discover all the topology information. The
following picture shows the entire topology layout when SMT is enabled:

[ppc64-full]

Developers using the hwloc API or XML output for portable applications should
therefore be extremely careful to not make any assumptions about the structure
of data that is returned. For example, per the above reported PPC topology, it
is not safe to assume that PUs will always be descendants of cores.

Additionally, future hardware may insert new topology elements that are not
available in this version of hwloc. Long-lived applications that are meant to
span multiple different hardware platforms should also be careful about making
structure assumptions. For example, a new element may someday exist between a
core and a PU.
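
In practice this means searching for an ancestor of the wanted type and
handling its absence, instead of assuming a fixed number of parent links. The
sketch below uses toy stand-in types (toy_obj and toy_type are made up; real
code would use hwloc_obj_t and its parent pointer) to show the defensive walk.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only. */
enum toy_type { TOY_MACHINE, TOY_NUMANODE, TOY_CORE, TOY_PU };

struct toy_obj {
    enum toy_type type;
    struct toy_obj *parent;   /* NULL for the root */
};

/* Walk up the parent links and return the closest ancestor of the given
 * type, or NULL if there is no such object above obj. */
struct toy_obj *toy_find_ancestor(struct toy_obj *obj, enum toy_type type)
{
    struct toy_obj *cur;

    for (cur = obj ? obj->parent : NULL; cur != NULL; cur = cur->parent)
        if (cur->type == type)
            return cur;
    return NULL;
}
```

A caller that needs "the core of this PU" has to cope with a NULL result, as
on the PPC64 system above where no core objects were reported, and must not
assume that the core is the direct parent.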

API Example

The following small C example (available in the source tree as
doc/examples/hwloc-hello.c) prints the topology of the machine and performs
some thread and memory binding. More examples are available in the
doc/examples/ directory of the source tree.

/* Example hwloc API program.
 *
 * See other examples under doc/examples/ in the source tree
 * for more details.
 *
 * Copyright (c) 2009-2016 Inria. All rights reserved.
 * Copyright (c) 2009-2011 Université Bordeaux
 * Copyright (c) 2009-2010 Cisco Systems, Inc. All rights reserved.
 * See COPYING in top-level directory.
 *
 * hwloc-hello.c
 */

#include "hwloc.h"

#include <errno.h>
#include <stdio.h>
#include <stdlib.h> /* malloc(), free() */
#include <string.h>

/* Recursively print obj and its children, indented by tree depth. */
static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                           int depth)
{
    char type[32], attr[1024];
    unsigned i;

    hwloc_obj_type_snprintf(type, sizeof(type), obj, 0);
    printf("%*s%s", 2*depth, "", type);
    if (obj->os_index != (unsigned) -1)
        printf("#%u", obj->os_index);
    hwloc_obj_attr_snprintf(attr, sizeof(attr), obj, " ", 0);
    if (*attr)
        printf("(%s)", attr);
    printf("\n");
    for (i = 0; i < obj->arity; i++) {
        print_children(topology, obj->children[i], depth + 1);
    }
}

int main(void)
{
    int depth;
    unsigned i, n;
    unsigned long size;
    int levels;
    char string[128];
    int topodepth;
    void *m;
    hwloc_topology_t topology;
    hwloc_cpuset_t cpuset;
    hwloc_obj_t obj;

    /* Allocate and initialize topology object. */
    hwloc_topology_init(&topology);

    /* ... Optionally, put detection configuration here to ignore
       some objects types, define a synthetic topology, etc....

       The default is to detect all the objects of the machine that
       the caller is allowed to access. See Configure Topology
       Detection. */

    /* Perform the topology detection. */
    hwloc_topology_load(topology);

    /* Optionally, get some additional topology information
       in case we need the topology depth later. */
    topodepth = hwloc_topology_get_depth(topology);

    /*****************************************************************
     * First example:
     * Walk the topology with an array style, from level 0 (always
     * the system level) to the lowest level (always the proc level).
     *****************************************************************/
    for (depth = 0; depth < topodepth; depth++) {
        printf("*** Objects at level %d\n", depth);
        for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
             i++) {
            hwloc_obj_type_snprintf(string, sizeof(string),
                                    hwloc_get_obj_by_depth(topology, depth, i), 0);
            printf("Index %u: %s\n", i, string);
        }
    }

    /*****************************************************************
     * Second example:
     * Walk the topology with a tree style.
     *****************************************************************/
    printf("*** Printing overall tree\n");
    print_children(topology, hwloc_get_root_obj(topology), 0);

    /*****************************************************************
     * Third example:
     * Print the number of packages.
     *****************************************************************/
    depth = hwloc_get_type_depth(topology, HWLOC_OBJ_PACKAGE);
    if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
        printf("*** The number of packages is unknown\n");
    } else {
        printf("*** %u package(s)\n",
               hwloc_get_nbobjs_by_depth(topology, depth));
    }

    /*****************************************************************
     * Fourth example:
     * Compute the amount of cache that the first logical processor
     * has above it.
     *****************************************************************/
    levels = 0;
    size = 0;
    for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
         obj;
         obj = obj->parent)
        if (hwloc_obj_type_is_cache(obj->type)) {
            levels++;
            size += obj->attr->cache.size;
        }
    printf("*** Logical processor 0 has %d caches totaling %luKB\n",
           levels, size / 1024);

    /*****************************************************************
     * Fifth example:
     * Bind to only one thread of the last core of the machine.
     *
     * First find out where cores are, or else smaller sets of CPUs if
     * the OS doesn't have the notion of a "core".
     *****************************************************************/
    depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

    /* Get last core. */
    obj = hwloc_get_obj_by_depth(topology, depth,
                                 hwloc_get_nbobjs_by_depth(topology, depth) - 1);
    if (obj) {
        /* Get a copy of its cpuset that we may modify. */
        cpuset = hwloc_bitmap_dup(obj->cpuset);

        /* Get only one logical processor (in case the core is
           SMT/hyper-threaded). */
        hwloc_bitmap_singlify(cpuset);

        /* And try to bind ourself there. */
        if (hwloc_set_cpubind(topology, cpuset, 0)) {
            char *str;
            int error = errno;
            hwloc_bitmap_asprintf(&str, obj->cpuset);
            printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
            free(str);
        }

        /* Free our cpuset copy */
        hwloc_bitmap_free(cpuset);
    }

    /*****************************************************************
     * Sixth example:
     * Allocate some memory on the last NUMA node, bind some existing
     * memory to the last NUMA node.
     *****************************************************************/
    /* Get last node. There's always at least one. */
    n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE);
    obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, n - 1);

    size = 1024*1024;
    m = hwloc_alloc_membind(topology, size, obj->nodeset,
                            HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
    hwloc_free(topology, m, size);

    m = malloc(size);
    hwloc_set_area_membind(topology, m, size, obj->nodeset,
                           HWLOC_MEMBIND_BIND, HWLOC_MEMBIND_BYNODESET);
    free(m);

    /* Destroy topology object. */
    hwloc_topology_destroy(topology);

    return 0;
}

hwloc supports pkg-config for obtaining the relevant compiler and linker
flags. See Compiling software on top of hwloc's C API for details on building
programs on top of hwloc's API using GNU Make or CMake.

On a machine with 2 processor packages -- each of which has two processing
cores -- the output from running hwloc-hello could be something like the
following:

shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine
*** Objects at level 1
Index 0: Package#0
Index 1: Package#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine
  Package#0
 Core#0
   PU#0
 Core#1
   PU#1
  Package#1
 Core#3
   PU#2
 Core#2
   PU#3
*** 2 package(s)
*** Logical processor 0 has 0 caches totaling 0KB
shell$

Questions and Bugs

Bugs should be reported in the tracker
(https://github.com/open-mpi/hwloc/issues). Opening a new issue automatically
displays lots of hints about how to debug and report issues.

Questions may be sent to the users or developers mailing lists
(https://www.open-mpi.org/community/lists/hwloc.php).

There is also a #hwloc IRC channel on Libera Chat (irc.libera.chat).

History / Credits

hwloc is the evolution and merger of the libtopology project and the Portable
Linux Processor Affinity (PLPA) (https://www.open-mpi.org/projects/plpa/)
project. Because of functional and ideological overlap, these two code bases
and ideas were merged and released under the name "hwloc" as an Open MPI
sub-project.

libtopology was initially developed by the Inria Runtime Team-Project. PLPA was
initially developed by the Open MPI development team as a sub-project. Both are
now deprecated in favor of hwloc, which is distributed as an Open MPI
sub-project.



See https://www.open-mpi.org/projects/hwloc/doc/ for more hwloc documentation,
actual links to related pages, images, etc.

hwloc's People

Contributors

awlauria, bgoglin, cbordage, civodul, clementfoyer, dawid-lukwinski, ggouaillardet, grzegorz-andrejczuk, haampie, hannesweisbach, hzhou, jjhursey, jpeyton52, jsquyres, jyvet, mark-mb, michalbiesek, miketxli, ncorgan, nfurmento, ompiteam, pavanbalaji, philippemilink, pioy, pnacht, roblatham00, scivision, sthibaul, tkoeppe, xiongzubiao

hwloc's Issues

Hwloc build errors on SPARC Solaris with native compiler

Attempting a build on Solaris 10 with Sun compiler tools v 5.9, configured with:
./configure --target=sparc-sun-solaris2.10 --prefix=.../hwloc-sparc --enable-debug=no --disable-xml --disable-cairo --disable-visibility CC="cc -xc99=all" CXX="CC" CFLAGS="-m64" CXXFLAGS="-m64" LDFLAGS=""

hwloc_have_cpuid() was undefined (and referenced), as was hwloc_cpuid(). I couldn't find a configuration option that would fix this, so ended up changing include/private/cpuid.h as shown in the attachment. The config.h comes up with HWLOC_HAVE_CPUID=1.
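
The attachment is not reproduced here, so the following is only a sketch of
one way such a problem can be avoided, not the change that was actually made:
compile the CPUID code only on x86 with a GCC-compatible compiler and give
every other target a stub. The sketch_ names are made up; __get_cpuid_max()
comes from the compiler's <cpuid.h>.

```c
#include <stddef.h>

/* Only reference CPUID on x86 builds with a GCC-compatible compiler;
 * every other target (e.g. SPARC with the Sun compiler) gets a stub. */
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical check: returns 1 if the CPUID instruction is usable,
 * 0 otherwise. */
int sketch_have_cpuid(void)
{
#if SKETCH_HAVE_CPUID
    /* __get_cpuid_max() returns 0 when CPUID is not supported. */
    return __get_cpuid_max(0, NULL) != 0;
#else
    return 0;
#endif
}
```

Because the stub is always defined, callers link on every platform and simply
see "no CPUID" where the instruction does not exist.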

network topology support

How do we gather multiple machine information and store them in the same big topology so that the process manager has a global knowledge of the cluster?

  • Need a way to merge multiple "local" topologies into a single big one

    /* create a topology with only a System object root */
    hwloc_topology_create_empty()
    /* load an XML topology and insert it below a given object */
    hwloc_topology_insert_xml_by_parent()

A new utility would use these to aggregate multiple XML topologies. You
would have to run lstopo foo.xml on each node and run this new utility
to create the global XML topology. Finally, you can run hwloc with the
new global topology and do whatever you want.

mpirun lstopo .xml
hwloc_xml_agregate cluster.xml *.xml
export HWLOC_XMLFILE=cluster.xml

  • Need to extend cpusets, either by extracting the local part before binding, or by adding a "network id" attribute internally, or a local flag to objects (set to 0 by default when aggregating topologies; a process can then set it back to 1 in its own Machine object and children).
  • Need network topology detection

Note: For OFED, ibnetdiscover provides the network topology

fgets() return value not checked

Pavan reported that when compiling with super-picky compiler flags, we get warnings about not checking the return status of fgets():

{{{
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc_parse_sysfs_unsigned':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:241:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc_read_cpuset_mask':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:326:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:346:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'look_cpuinfo':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:863:
warning: ignoring return value of 'fscanf', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc__get_dmi_info':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:917:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:931:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
}}}

Here are the super-picky flags that he used:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I.
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src
-I../include/private -I../include/hwloc
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include
-I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith
-Wcast-align -O2 -Wall -Wextra -Wno-missing-field-initializers
-Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL
-Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations
-Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef
-Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align
-Wwrite-strings -Wno-sign-compare -Waggregate-return
-Wold-style-definition -Wno-multichar -Wno-deprecated-declarations
-Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign
-Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits
-D_POSIX_C_SOURCE=199506L -MT cpuset.lo -MD -MP -MF .deps/cpuset.Tpo -c
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c
-o cpuset.o >/dev/null 2>&1
}}}
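
One way to silence these warnings is to act on the return value instead of
ignoring it. The helper below is a minimal sketch with a made-up name, not the
code in topology-linux.c: it reads one line and parses it as an unsigned
number, reporting failure when fgets() returns NULL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from fp and parse it as an unsigned
 * number. Returns 0 on success and stores the value in *out; returns -1
 * (leaving *out untouched) if nothing could be read or the line does not
 * start with a number. */
int read_unsigned_line(FILE *fp, unsigned long *out)
{
    char buf[64];
    char *end;
    unsigned long value;

    assert(fp != NULL && out != NULL);
    if (fgets(buf, sizeof(buf), fp) == NULL)
        return -1;              /* EOF or read error: buf is unusable */
    value = strtoul(buf, &end, 0);
    if (end == buf)
        return -1;              /* no digits found */
    *out = value;
    return 0;
}
```

Checking the result also avoids parsing an uninitialized buffer when the file
is empty or unreadable.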

utils man pages depend on executables

As mentioned in http://www.open-mpi.org/community/lists/hwloc-devel/2009/09/0060.php, there's a causality issue in "make dist": the man pages depend on the executables (because the man pages are generated via help2man).

This causes a problem with the following (e.g., nightly tarball generation):

{{{
svn co ...
cd ...
./autogen.sh
./configure
make dist
}}}

because the executables will try to build, but fail when there is no libhwloc.la.

A workaround, of course, is to "make all" first and then "make dist". But this is somewhat icky; it would be nice to have a better solution.

hwloc fails to link with gcc >= 4.3 with -fexceptions

This was initially reported by LANL on the OMPI trac (https://svn.open-mpi.org/trac/ompi/ticket/2778). I have similar linking problems if I use a hand-installed gcc 4.5 on RHEL5. Notes:

  • Happens with hand-installed gcc 4.5 on RHEL5, but not the built-in gcc 4.1
  • Happens with hwloc 1.1.2, but not with hwloc 1.2 or trunk

Here's the specific link failure I get if I compile hwloc by itself (i.e., not as part of OMPI):

{{{
[11:36] svbu-mpi:/svn/hwloc-1.1 % ./configure CFLAGS=-fexceptions
...lots of output...
[11:36] svbu-mpi:/svn/hwloc-1.1 % make
Making all in src
make[1]: Entering directory `/home/jsquyres/svn/hwloc-1.1/src'
  CC     topology.lo
  CC     traversal.lo
  CC     topology-synthetic.lo
  CC     bind.lo
  CC     cpuset.lo
  CC     misc.lo
  CC     topology-xml.lo
  CC     topology-linux.lo
  CC     topology-x86.lo
  CCLD   libhwloc.la
.libs/traversal.o: In function `__pthread_cleanup_routine':
traversal.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-synthetic.o: In function `__pthread_cleanup_routine':
topology-synthetic.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/bind.o: In function `__pthread_cleanup_routine':
bind.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/cpuset.o: In function `__pthread_cleanup_routine':
cpuset.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/misc.o: In function `__pthread_cleanup_routine':
misc.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-xml.o: In function `__pthread_cleanup_routine':
topology-xml.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-linux.o: In function `__pthread_cleanup_routine':
topology-linux.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-x86.o: In function `__pthread_cleanup_routine':
topology-x86.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
make[1]: *** [libhwloc.la] Error 1
make[1]: Leaving directory `/home/jsquyres/svn/hwloc-1.1/src'
make: *** [all-recursive] Error 1
}}}

Need to investigate this more to see if it's worthwhile to issue a 1.1.3 or not.

hwloc build fails with strict compiler flags

Here's a snippet of the error:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src -I../include/private -I../include/hwloc -I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include -I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith -Wcast-align -Wall -Wextra -Wno-missing-field-initializers -Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL -Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations -Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef -Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align -Wwrite-strings -Wno-sign-compare -Waggregate-return -Wold-style-definition -Wno-multichar -Wno-deprecated-declarations -Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign -Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits -D_POSIX_C_SOURCE=199506L -g -MT topology.lo -MD -MP -MF .deps/topology.Tpo -c /home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c -fPIC -DPIC -o .libs/topology.o
In file included from /home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:20:
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include/hwloc.h: In function 'hwloc_get_obj_by_type':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include/hwloc.h:425: warning: declaration of 'index' shadows a global declaration
/usr/include/string.h:309: warning: shadowed declaration is here

[...snip...]

/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:1313: warning: ISO C90 forbids mixed declarations and code
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:1337: warning: ISO C90 forbids mixed declarations and code
make[2]: *** [topology.lo] Error 1
make[2]: Leaving directory `/home/balaji/projects/mpich2/hydra/build/tools/bind/hwloc/hwloc/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/balaji/projects/mpich2/hydra/build/tools/bind/hwloc/hwloc'
make: *** [all-recursive] Error 1
}}}

This is causing MPICH2's builds to fail when configured with strict compiler options.
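
Both kinds of warning in the log have mechanical fixes. The function below is
an illustrative sketch (its name and purpose are made up, it is not hwloc
code): the parameter is called idx instead of index, so it cannot shadow the
index() function that some <string.h> headers declare, and every declaration
sits at the top of its block as C90 requires.

```c
#include <stddef.h>

/* Return a pointer to items[idx], or NULL if idx is out of range.
 * Written to compile cleanly with -std=c89 -Wshadow
 * -Wdeclaration-after-statement. */
const int *nth_or_null(const int *items, size_t count, size_t idx)
{
    const int *result = NULL;   /* declared before any statement */

    if (items != NULL && idx < count)
        result = &items[idx];
    return result;
}
```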

vsnprintf warnings

When using super-picky compilation warning flags, hwloc gets warnings about vsnprintf:

{{{
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:
In function 'hwloc_snprintf':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:37:
warning: implicit declaration of function 'vsnprintf'
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:37:
warning: nested extern declaration of 'vsnprintf'
}}}

Here are the flags that Pavan used to generate these warnings:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I.
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src
-I../include/private -I../include/hwloc
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include
-I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith
-Wcast-align -O2 -Wall -Wextra -Wno-missing-field-initializers
-Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL
-Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations
-Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef
-Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align
-Wwrite-strings -Wno-sign-compare -Waggregate-return
-Wold-style-definition -Wno-multichar -Wno-deprecated-declarations
-Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign
-Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits
-D_POSIX_C_SOURCE=199506L -MT cpuset.lo -MD -MP -MF .deps/cpuset.Tpo -c
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c
-o cpuset.o >/dev/null 2>&1
}}}

Pavan suggested the following (on #16):

The vsnprintf warnings occur because snprintf and vsnprintf are present only in C99, not C89. There are a few solutions possible:

  1. Check in configure to (i) add a prototype for snprintf/vsnprintf where needed and (ii) add an alternative implementation for them for platforms that don't provide them.

  2. An alternative (simpler) solution is to include MPL ( https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpl) into hwloc and just use MPL_snprintf and friends everywhere.

  3. Check if snprintf/vsnprintf exist in configure and abort if they don't. Other libraries relying on hwloc can see this error and not build hwloc in those cases.

    Not sure if any of these approaches is acceptable for you guys, so I'm leaving this ticket as closed. Please reopen if appropriate.
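Option 1 above can be sketched as follows. This is only an illustration: it assumes configure ran AC_CHECK_DECLS([vsnprintf]), which defines HAVE_DECL_VSNPRINTF to 0 or 1, and the wrapper name sketch_snprintf is hypothetical (it is not the hwloc_snprintf from cpuset.c).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: config.h normally provides HAVE_DECL_VSNPRINTF (0 or 1).
 * Default to 1 so that this sketch also compiles on its own. */
#ifndef HAVE_DECL_VSNPRINTF
#define HAVE_DECL_VSNPRINTF 1
#endif

#if !HAVE_DECL_VSNPRINTF
/* The headers hide the prototype (e.g. -std=c89), so declare it ourselves. */
int vsnprintf(char *str, size_t size, const char *format, va_list ap);
#endif

/* Hypothetical wrapper: formats into str, never writing more than size bytes. */
static int sketch_snprintf(char *str, size_t size, const char *format, ...)
{
    va_list ap;
    int ret;

    va_start(ap, format);
    ret = vsnprintf(str, size, format, ap);
    va_end(ap);
    return ret;
}
```

The prototype only silences the warning; a platform that truly lacks vsnprintf would still need the alternative implementation mentioned in the same option.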

IRIX support

sysmp(MP_NPROCS/MP_NAPROCS/MP_STAT)
NUMA: /hw : /hw/nodenum/0 -> /hw/module/1/slot/n1/node
/hw/cpunum/0 -> /hw/module/1/slot/n1/node/cpu/a
check through getmntent where hwgfs is mounted
sysmp(MP_MUSTRUN/MP_MUSTRUN_PID)
PTHREAD_SCOPE_BOUND_NP
pthread_setrunon_np()
process_cpulink()
mld_create() mldset_create() numa_acreate() migr_range_migrate()

hwloc-aware top/ps

Based on discussion here, it looks like some people use top to verify the binding of their MPI processes. top reports physical processor indexes, but they might want to easily check that the binding is "hwloc-correct" when MPI uses hwloc for binding.

So having a hwloc-aware top (or ps) would be good. No need to reimplement everything, only showing basic top/ps info would be enough. Something like below should be easy:
<socket1.core5>

With some options for filtering with a userid, process name, ...

Maybe print %CPU if it's easy to retrieve (but we may need to make it refresh the display every second or so, which opens the door to lots of useless requests from users). Probably better to let the user revert to the plain top/ps instead of reinventing the wheel.

add linux cgroup support (seems to be cpuset-exclusive?)

{{{
if /proc/self/cgroup exists and is not empty, take the path from the 3rd ':'-separated field
read cpuset cpulist in /dev/cgroup//cpuset.cpus
read cpuset memlist in /dev/cgroup//cpuset.mems
}}}
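The first step above (taking the path from the third ':'-separated field) could look like the sketch below. The function name is hypothetical, and it handles a single line that the caller has already read from /proc/self/cgroup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the third ':'-separated field of one /proc/self/cgroup line into out.
 * Returns 0 on success, -1 if the line is malformed or out is too small. */
static int sketch_cgroup_line_path(const char *line, char *out, size_t outlen)
{
    const char *p = strchr(line, ':');
    size_t len;

    if (!p)
        return -1;
    p = strchr(p + 1, ':');
    if (!p)
        return -1;
    p++;                       /* first character of the third field */
    len = strcspn(p, "\n");    /* stop at the end of the line */
    if (len + 1 > outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

The resulting path would then be appended to the cgroup mount point before reading cpuset.cpus and cpuset.mems.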

32 bit builds fail

hwloc fails to build when CFLAGS=-m32. It has shown up in nightly Open MPI test builds, but is easy to reproduce in standalone builds:

{{{
$ ./configure CFLAGS=-m32
...
$ make
...
Making all in src
make[1]: Entering directory `/nfs/rinfs/san/homedirs/jsquyres/svn/hwloc/src'
depbase=`echo topology-x86.lo | sed 's|[^/]*$|.deps/&|;s|.lo$||'`;\
/bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../include/private -I../include/hwloc -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -std=gnu99 -fvisibility=hidden -I/usr/include/libxml2 -std=gnu99 -fvisibility=hidden -m32 -pipe -I/u/jsquyres/svn/hwloc/include -Wall -Wunused-parameter -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -MT topology-x86.lo -MD -MP -MF $depbase.Tpo -c -o topology-x86.lo topology-x86.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../include/private -I../include/hwloc -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -std=gnu99 -fvisibility=hidden -I/usr/include/libxml2 -std=gnu99 -fvisibility=hidden -m32 -pipe -I/u/jsquyres/svn/hwloc/include -Wall -Wunused-parameter -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -MT topology-x86.lo -MD -MP -MF .deps/topology-x86.Tpo -c topology-x86.c -fPIC -DPIC -o .libs/topology-x86.o
/u/jsquyres/svn/hwloc/include/private/cpuid.h: In function ‘hwloc_cpuid’:
/u/jsquyres/svn/hwloc/include/private/cpuid.h:54: error: can't find a register in class ‘BREG’ while reloading ‘asm’
make[1]: *** [topology-x86.lo] Error 1
make[1]: Leaving directory `/nfs/rinfs/san/homedirs/jsquyres/svn/hwloc/src'
make: *** [all-recursive] Error 1
}}}

Any ideas?
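The -fPIC -DPIC flags in the log suggest a likely cause: in 32-bit position-independent code GCC reserves %ebx for the GOT pointer, so an asm statement that names %ebx as an operand cannot be allocated. A common workaround, sketched below under that assumption, is to park %ebx in a compiler-chosen register around the cpuid instruction. The function name is hypothetical and this is not the code from cpuid.h.

```c
#include <assert.h>

/* Run CPUID with *eax and *ecx as inputs and fill all four registers.
 * Returns 0 on x86 with a GCC-compatible compiler, -1 elsewhere. */
static int sketch_cpuid(unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *edx)
{
#if defined(__GNUC__) && defined(__i386__)
    /* Never name %ebx as an operand: swap it with a scratch register,
     * run cpuid, then swap back so the PIC register is restored. */
    __asm__ volatile(
        "xchgl %%ebx, %1\n\t"
        "cpuid\n\t"
        "xchgl %%ebx, %1\n\t"
        : "+a"(*eax), "=&r"(*ebx), "+c"(*ecx), "=d"(*edx));
    return 0;
#elif defined(__GNUC__) && defined(__x86_64__)
    /* %rbx is not reserved for PIC in the default 64-bit code model. */
    __asm__ volatile("cpuid"
        : "+a"(*eax), "=b"(*ebx), "+c"(*ecx), "=d"(*edx));
    return 0;
#else
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return -1;
#endif
}
```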

convert between "L2cache"-like string and (type,depth) and actual depth within the tree

We usually cannot convert between a type and a depth when the type is cache or group, because the conversion ignores the corresponding depth attribute. People have to handle HWLOC_TYPE_DEPTH_MULTIPLE manually.

Since r3255, hwloc-calc uses this internally:
int hwloc_obj_type_sscanf(const char *string, hwloc_obj_type_t *typep, unsigned *depthattrp)
It could be added to the public interface. But we need to make sure that unsigned depthattr is the only object attribute that may ever appear in the complete type name. It currently works for L2Cache and Group3. I don't know what else we could ever have.
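To make the idea concrete, here is a self-contained sketch of such a parser for the two forms mentioned above (L2Cache and Group3). It is not the hwloc implementation: the enum and function names are hypothetical stand-ins so that the example compiles without hwloc.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for hwloc_obj_type_t, limited to the two types
 * that carry a depth attribute in their name. */
enum sketch_type { SKETCH_TYPE_CACHE, SKETCH_TYPE_GROUP };

/* Case-insensitive prefix match; returns the text after prefix, or NULL. */
static const char *sketch_skip_prefix(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return NULL;
        s++;
        prefix++;
    }
    return s;
}

/* Parse "L<depth>Cache" or "Group<depth>". Returns 0 on success, -1 otherwise. */
static int sketch_type_sscanf(const char *string, enum sketch_type *typep,
                              unsigned *depthattrp)
{
    const char *rest;
    char *end;

    rest = sketch_skip_prefix(string, "l");
    if (rest && isdigit((unsigned char)*rest)) {
        unsigned long depth = strtoul(rest, &end, 10);
        rest = sketch_skip_prefix(end, "cache");
        if (!rest || *rest != '\0')
            return -1;
        *typep = SKETCH_TYPE_CACHE;
        *depthattrp = (unsigned)depth;
        return 0;
    }

    rest = sketch_skip_prefix(string, "group");
    if (rest && isdigit((unsigned char)*rest)) {
        unsigned long depth = strtoul(rest, &end, 10);
        if (*end != '\0')
            return -1;
        *typep = SKETCH_TYPE_GROUP;
        *depthattrp = (unsigned)depth;
        return 0;
    }
    return -1;
}
```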

Then, we could have something converting between (hwloc_obj_type_t type, unsigned depthattr) and an actual depth in the tree (unsigned depth). But we need good names for these. We already have get_type_depth and get_depth_type. Maybe this:

int hwloc_get_type_depthattr_depth(topology, type, depthattr, &depth);
int hwloc_get_depth_type_depthattr(topology, depth, &type, &depthattr);

Ticket #50 talks about adding instruction caches to hwloc; the cache type would then be needed in the aforementioned functions. depthattr may then become a union containing an int for groups (depth) and two ints for cache (depth + cachetype).

multinode graphical lstopo output

lstopo currently uses boxes for everything, except when a system object contains multiple Machine objects (it draws a network).

With custom topologies, we can now easily get multiple levels of Groups between Machine and System. And we can also get multiple System levels if we assemble multiple times.

Ideally, this special drawing would even be used as soon as we have objects with cpusets above objects with cpusets.

hwloc-calc hierarchical output formatting

From http://www.open-mpi.org/community/lists/hwloc-users/2011/02/0276.php

hwloc-calc may currently convert anything into a list of objects given as type:index. The above message suggests that it may be useful to report as type1:index1.type2:index2 but there is no easy way to guess what type1 and type2 should be (and the user may want more levels).

So maybe do hwloc-calc --ho socket,core to report a hierarchical output as socket:X.core:Y

If multiple cores are included in the input, just append another socket:T.core:Z string.

If the input is smaller than a single core, two solutions:

  • socket:X.core:Y.L1Cache:Z
  • socket:X.core:Y and specify in the doc that the output may be larger than the input
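The output format proposed above can be sketched as a small formatting helper. The function name and its array-based interface are assumptions made for illustration; hwloc-calc would take the types from the --ho option and the indexes from the topology.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "type1:idx1.type2:idx2..." into buf.
 * Returns 0 on success, -1 if the output does not fit. */
static int sketch_format_hier(char *buf, size_t size,
                              const char *const *types,
                              const unsigned *indexes, unsigned nlevels)
{
    size_t used = 0;
    unsigned i;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < nlevels; i++) {
        /* Levels are separated by '.', type and index by ':'. */
        int n = snprintf(buf + used, size - used, "%s%s:%u",
                         i ? "." : "", types[i], indexes[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```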

array of stringified infos

As discussed a while ago, I think we should add something like this to the end of the hwloc_obj structure:

char **infos; /**< \brief Array of string name=value */
unsigned infos_count; /**< \brief Length of the infos array */

We would store in there things like:

DMIBoardVendor=Tyan (currently in obj->attr->machine.dmi_board_vendor)
DMIBoardModel=S4885 (currently in obj->attr->machine.dmi_board_info)
PCIVendor=AMD
PCIModel=Radeon HD4350

Some of them are already used in obj->name but that doesn't need to change.

Some system-wide fields might be interesting too; they should go in the topology or in the widest related object:

Backend=Synthetic
OS=Linux
LinuxCpuset=/foobar
FsRoot=/var/lib/topology/myworderfulmachine
Hostname=foobar
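With such an array of name=value strings, a lookup by name is a short loop. The helper below is a hypothetical sketch operating on a plain array, not a proposed hwloc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the value of the first "name=value" entry matching name,
 * or NULL if no entry has that name. */
static const char *sketch_info_get(char **infos, unsigned infos_count,
                                   const char *name)
{
    size_t len = strlen(name);
    unsigned i;

    for (i = 0; i < infos_count; i++)
        if (!strncmp(infos[i], name, len) && infos[i][len] == '=')
            return infos[i] + len + 1;
    return NULL;
}
```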

get nbprocs on the command-line

As suggested by Samuel, we could have an easy way to get the number of processors from the command line:

shell$ lstopo --n<proc|core|socket|node|machine|system>
4

Pave the way for network support

  • the top object may not always be a system.
  • objects may not have a cpuset.
  • there is no global notion of cpuset, only relative to a tree of
    objects representing a machine.

hwloc_distribute should handle asymmetric topologies

hwloc_distribute currently assumes that all children of an object have
the same weight. There should be at least variants which take into
account the cpuset/gpuset etc.

It should also likely ignore children with empty CPU sets (happens with CPU-less NUMA nodes).
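A weighted variant could split the items proportionally to a per-child weight, for instance the number of PUs in each child's cpuset. The sketch below uses cumulative rounding, which is only one possible choice, and works on plain arrays; the function name is hypothetical. Children with weight 0 (such as CPU-less NUMA nodes) receive nothing.

```c
#include <assert.h>

/* Split nitems among nchildren proportionally to weights[].
 * counts[i] receives the number of items given to child i.
 * The counts always sum to nitems when at least one weight is nonzero. */
static void sketch_distribute(const unsigned *weights, unsigned nchildren,
                              unsigned nitems, unsigned *counts)
{
    unsigned long total = 0, cum = 0, given = 0;
    unsigned i;

    for (i = 0; i < nchildren; i++)
        total += weights[i];

    for (i = 0; i < nchildren; i++) {
        unsigned long upto;

        if (total == 0) {
            counts[i] = 0;
            continue;
        }
        cum += weights[i];
        /* Items that should have been handed out once child i is served. */
        upto = (unsigned long)nitems * cum / total;
        counts[i] = (unsigned)(upto - given);
        given = upto;
    }
}
```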

Windows warning

I'm getting this warning when compiling on RHEL4 with gcc:

{{{
../../src/topology-windows.c: In function `hwloc_look_windows':
../../src/topology-windows.c:194: warning: assignment from incompatible pointer type
}}}

I don't know anything about Windows code to fix it...

PLPA-like API (or at least PLPA-like information retrieval)

Most core/socket/processor-id conversion routines are easy to implement.

One thing that we miss is the number of offline processors.
We could add offline_procs to struct topology_info, but should we put ignored procs there as well (in case of cpuset or other administrator-disabling thing)?
Might be worth fixing before 0.9.1 so that we don't change struct topology_info later.

Or we could just drop struct topology_info since it became very small now (we didn't want ten different accessors but it's not the case anymore). Maybe make it hwloc_get_topology_depth() and hwloc_is_thissystem() ?

Add cpulist-string to/from cpuset conversion routines?
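As an illustration of the string-to-cpuset direction, the sketch below parses a cpulist such as "0-3,8" into a single unsigned long mask. Using one machine word instead of a real cpuset is a simplification, and the function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a cpulist string ("0-3,8") into a bitmask.
 * Returns 0 on success, -1 on syntax error or if an index does not fit. */
static int sketch_cpulist_parse(const char *s, unsigned long *mask)
{
    const unsigned long nbits = sizeof(unsigned long) * CHAR_BIT;

    *mask = 0;
    while (*s) {
        char *end;
        unsigned long first, last, i;

        if (*s < '0' || *s > '9')
            return -1;
        first = last = strtoul(s, &end, 10);
        if (*end == '-') {
            /* A range "first-last". */
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            last = strtoul(s, &end, 10);
        }
        if (last < first || last >= nbits)
            return -1;
        for (i = first; i <= last; i++)
            *mask |= 1UL << i;
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;
        s = end;
    }
    return 0;
}
```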

hwloc-ps core dump on Altix, with CPU set interactions

hwloc 1.2's hwloc-ps dumps core when executed from a user's non-root
CPU set:

{{{
cs@altix-02$ cat /proc/self/cpuset
/
cs@altix-02$ hwloc-ps
cs@altix-02$ echo $$ | sudo cpuset -a /test
cpuset: attached one pid to cpuset
cs@altix-02$ cat /proc/self/cpuset
/test
cs@altix-02$ hwloc-ps
Segmentation fault (core dumped)
}}}

After rebuilding hwloc-1.2 for debugging, here's what I learned
with gdb:

{{{
cs@altix-02$ gdb /usr/local/bin/hwloc-ps
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /usr/local/bin/hwloc-ps

Program received signal SIGSEGV, Segmentation fault.
hwloc_obj_type_snprintf (string=0x60000fffffffad00 "Machine", size=64,
obj=0x0, verbose=1) at traversal.c:188
188 hwloc_obj_type_t type = obj->type;
(gdb) bt
#0 hwloc_obj_type_snprintf (string=0x60000fffffffad00 "Machine", size=64,
obj=0x0, verbose=1) at traversal.c:188
#1 0x4000000000002c20 in main (argc=1, argv=0x60000fffffffb1a8)
at hwloc-ps.c:144
(gdb) up
#1 0x4000000000002c20 in main (argc=1, argv=0x60000fffffffb1a8)
at hwloc-ps.c:144
144 hwloc_obj_type_snprintf(type, sizeof(type), obj, 1);
(gdb) list 140
135 hwloc_bitmap_asprintf(&cpuset_str, cpuset);
136 printf("%s", cpuset_str);
137 } else {
138 hwloc_bitmap_t remaining = hwloc_bitmap_dup(cpuset);
139 int first = 1;
140 while (!hwloc_bitmap_iszero(remaining)) {
141 char type[64];
142 unsigned idx;
143 hwloc_obj_t obj = hwloc_get_first_largest_obj_inside_cpuset(topology, remaining);
144 hwloc_obj_type_snprintf(type, sizeof(type), obj, 1);
(gdb) print topology
$1 = 0x6000000000008010
(gdb) print *topology
$2 = {nb_levels = 3, next_group_depth = 0, level_nbobjects = {1, 2, 1,
0 <repeats 125 times>}, levels = {0x60000000000090d0, 0x600000000000ab70,
0x600000000000acc0, 0x0 <repeats 125 times>}, flags = 0, type_depth = {0,
0, 1, -1, -1, -1, 2, -1, -1}, ignored_types = {HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_KEEP_STRUCTURE, HWLOC_IGNORE_TYPE_NEVER},
is_thissystem = 1, is_loaded = 1, pid = 0,
set_thisproc_cpubind = 0x200000000003ca38 <local+7656>,
get_thisproc_cpubind = 0x200000000003c868 <local+7192>,
set_thisthread_cpubind = 0x200000000003c878 <local+7208>,
get_thisthread_cpubind = 0x200000000003c888 <local+7224>,
set_proc_cpubind = 0x200000000003ca28 <local+7640>,
get_proc_cpubind = 0x200000000003c858 <local+7176>, set_thread_cpubind = 0,
get_thread_cpubind = 0,
get_thisproc_last_cpu_location = 0x200000000003c8a8 <local+7256>,
get_thisthread_last_cpu_location = 0x200000000003c8b8 <local+7272>,
get_proc_last_cpu_location = 0x200000000003ca58 <local+7688>,
set_thisproc_membind = 0, get_thisproc_membind = 0,
set_thisthread_membind = 0x200000000003c8e8 <local+7320>,
get_thisthread_membind = 0x200000000003c8f8 <local+7336>,
set_proc_membind = 0, get_proc_membind = 0,
set_area_membind = 0x200000000003c8c8 <local+7288>, get_area_membind = 0,
alloc = 0x200000000003c828 <local+7128>,
alloc_membind = 0x200000000003c8d8 <local+7304>,
free_membind = 0x200000000003c808 <local+7096>, support = {
discovery = 0x6000000000009070, cpubind = 0x6000000000009090,
membind = 0x60000000000090b0}, os_distances = {{nbobjs = 0, indexes = 0x0,
objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 2, indexes = 0x6000000000009700,
objs = 0x60000000000096c0, distances = 0x60000000000096e0}, {nbobjs = 0,
indexes = 0x0, objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0,
objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}}, backend_type = HWLOC_BACKEND_SYSFS, backend_params = {
sysfs = {root_path = 0x0, root_fd = -1}, synthetic = {arity = {0, 0,
4294967295, 0 <repeats 125 times>}, type = {
HWLOC_OBJ_SYSTEM <repeats 128 times>}, id = {0 <repeats 128 times>},
depth = {0 <repeats 128 times>}}}}
(gdb) print remaining
$3 = 0x6000000000009da0
(gdb) print *remaining
$4 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x600000000000b120,
infinite = 0}
(gdb) break hwloc_get_first_largest_obj_inside_cpuset
Breakpoint 1 at 0x4000000000003092: file helper.h, line 234.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/bin/hwloc-ps

Breakpoint 1, hwloc_get_first_largest_obj_inside_cpuset (
topology=0x6000000000008010, set=0x6000000000009da0) at helper.h:234
234 hwloc_obj_t obj = hwloc_get_root_obj(topology);
(gdb) next
236 if (!hwloc_bitmap_intersects(obj->cpuset, set))
(gdb) next
237 return NULL;
(gdb) break hwloc_bitmap_intersects
Breakpoint 2 at 0x2000000000076ee2: file cpuset.c, line 895.
(gdb) delete 1
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/bin/hwloc-ps

Breakpoint 2, hwloc_bitmap_intersects (set1=0x6000000000009890,
set2=0x6000000000009650) at cpuset.c:895
895 for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
(gdb) print *set1
$5 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x60000000000098b0,
infinite = 0}
(gdb) print *set2
$6 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x6000000000009670,
infinite = 0}
(gdb) next
896 if ((HWLOC_SUBBITMAP_READULONG(set1, i) & HWLOC_SUBBITMAP_READULONG(set2, i)) != HWLOC_SUBBITMAP_ZERO)
(gdb) next
895 for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
(gdb) next
899 if (set1->infinite && set2->infinite)
(gdb) next
902 return 0;
}}}
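The trace shows that the helper returns NULL when the root cpuset does not intersect the requested set, and that the caller then dereferences the result. A defensive loop could look like the sketch below. It is a reduced model: cpusets are plain unsigned long masks and both functions are hypothetical stand-ins, not the hwloc code.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced model of a topology object. */
struct sketch_obj {
    unsigned long cpuset;
};

/* Stand-in for the helper seen in the trace: returns NULL when the root
 * cpuset does not intersect the requested set. A real helper would
 * descend into the tree instead of returning the root. */
static const struct sketch_obj *
sketch_first_largest(const struct sketch_obj *root, unsigned long set)
{
    if (!(root->cpuset & set))
        return NULL;
    return root;
}

/* Count the objects needed to cover remaining.
 * Returns -1 when part of the set lies outside the topology. */
static int sketch_cover(const struct sketch_obj *root, unsigned long remaining)
{
    int n = 0;

    while (remaining) {
        const struct sketch_obj *obj = sketch_first_largest(root, remaining);
        if (!obj)
            return -1;  /* the check that is missing before hwloc-ps.c:144 */
        remaining &= ~obj->cpuset;
        n++;
    }
    return n;
}
```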

Add embedding capabilities

PLPA is "fully embeddable" in larger software projects, meaning:

  • Relevant m4 is available in a standalone file that is m4_include'able.
  • Specific m4 macros are exported in this file that can be called in a higher-level file (e.g., configure.ac).
  • When building in an "embedded" mode, only the library is made (as an LT convenience library); nothing is installed.
  • Prefix name shifting is available for all public symbols.

This capability needs to be brought to hwloc before it can be a wholesale replacement for PLPA.

I/O device support

Updated TODO-list:

  • Add iterators to find GPUs, NICs, ...
  • Update documentation
  • Find a pci lib for MacOSX (neither pciutils nor pciaccess seems available, and pciaccess doesn't expose the hierarchy of bridges anyway)
  • Add some hwloc_insert_object_by_pcisomething, e.g. for a CUDA plugin which provides extended information to the object (e.g. number of streaming processors, etc.), which the core merges with the objects created by the libpci module.
    • provide functions like:

hwloc_obj_t hwloc_get_path_obj(hwloc_topology_t topo, const char *path);
hwloc_obj_t hwloc_get_fd_obj(hwloc_topology_t topo, int fd);

(the latter may return a network device or a disk device, depending on whether it's a socket or a file. Mmm and how about nfs-mounted files!)
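The first decision such a function has to make, socket versus file, can be taken with fstat(). The sketch below only classifies the descriptor; mapping the result to a topology object is left out, and all names in it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <unistd.h>

enum sketch_fd_kind {
    SKETCH_FD_UNKNOWN,
    SKETCH_FD_SOCKET,   /* would map to a network device */
    SKETCH_FD_BLOCK,    /* a block device opened directly */
    SKETCH_FD_FILE      /* a regular file, possibly on a disk or on NFS */
};

/* Classify a file descriptor by the file type reported by fstat(). */
static enum sketch_fd_kind sketch_classify_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return SKETCH_FD_UNKNOWN;
    if (S_ISSOCK(st.st_mode))
        return SKETCH_FD_SOCKET;
    if (S_ISBLK(st.st_mode))
        return SKETCH_FD_BLOCK;
    if (S_ISREG(st.st_mode))
        return SKETCH_FD_FILE;
    return SKETCH_FD_UNKNOWN;
}
```

For a regular file, st_dev would be the starting point for finding the underlying device, which is where the NFS question above comes in.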

heterogeneous topology support

what if we have a machine with different processors? for instance if one socket has a shared L3 and the other one doesn't?

  • sthibaul: should work fine already

support levels that do not cover the whole machine? (no L3 above the cores of the second socket above)

  • sthibaul: I do not understand

support object whose children are not in the exact next level? (socket pointing to cores instead of cache above)

  • sthibaul: should work fine already

need to be decided if we want to put GPUs as hwloc_obj_t, see ticket:5

dynamic cpusets

The dyncpuset branch might be mergeable now. The remaining possible optimizations are orthogonal and not required before entering trunk. I'd like some feedback about the current implementation.

FWIW, below is the duration (in microseconds) of hwloc_topology_load() depending on the topology size/hierarchy. It's a synthetic topo, so load() does pretty much nothing apart from allocating objects and manipulating tons of cpusets to insert in the tree and compute the {allowed,complete,online}_{cpuset,nodeset}.

{{{
topology                                         size    trunk  dyncpusets
synthetic proc:4                                    4      100         100
synthetic proc:32                                  32      745         413
synthetic node:4 die:4 core:4 proc:4              256     7750        5409
synthetic proc:256                                256    41932       34773
synthetic mach:4 node:4 die:4 core:4 proc:4      1024    44215       49406
synthetic proc:1024                              1024  1237049     1442945
synthetic m:4 n:4 d:4 cache:4 core:4 4           4096        X      700547
synthetic m:4 n:4 d:4 cache:4 cache:4 core:4 4  16384        X    11597185
}}}

In short, dyncpusets decrease memory waste, and do not increase CPU cycles.

1024 is the current static size in the trunk, that's why it's faster than dyncpusets. The dyncpusets branch works at least until 16384 in the above test but the lstopo time became too long for me :)
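For readers unfamiliar with the approach, a dynamically sized bitmap grows its word array on demand instead of reserving a fixed 1024 bits. The sketch below reuses the field names ulongs_count, ulongs_allocated and ulongs, but the growth policy and function names are assumptions; it is not the code from the dyncpuset branch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

struct sketch_bitmap {
    unsigned ulongs_count;      /* words currently in use */
    unsigned ulongs_allocated;  /* words currently allocated */
    unsigned long *ulongs;
};

/* Set one bit, growing the word array if needed. Returns 0, or -1 on ENOMEM. */
static int sketch_bitmap_set(struct sketch_bitmap *b, unsigned bit)
{
    unsigned word = bit / SKETCH_BITS_PER_LONG;

    if (word >= b->ulongs_allocated) {
        /* Assumed policy: double the allocation until the word fits. */
        unsigned newalloc = b->ulongs_allocated ? b->ulongs_allocated : 1;
        unsigned long *tmp;

        while (newalloc <= word)
            newalloc *= 2;
        tmp = realloc(b->ulongs, newalloc * sizeof(*tmp));
        if (!tmp)
            return -1;
        b->ulongs = tmp;
        b->ulongs_allocated = newalloc;
    }
    /* Zero the words that become part of the bitmap. */
    while (b->ulongs_count <= word)
        b->ulongs[b->ulongs_count++] = 0;
    b->ulongs[word] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
    return 0;
}

/* Test one bit; bits beyond the used words read as 0. */
static int sketch_bitmap_isset(const struct sketch_bitmap *b, unsigned bit)
{
    unsigned word = bit / SKETCH_BITS_PER_LONG;

    if (word >= b->ulongs_count)
        return 0;
    return (int)((b->ulongs[word] >> (bit % SKETCH_BITS_PER_LONG)) & 1UL);
}
```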

support get_area_membind on Linux

Do get_mempolicy (with MPOL_F_ADDR) on each virtual page in the area and combine the result. This should work because get_mempolicy seems to only look at VMA mempolicy (not at current task policy) in this case.
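The per-page walk and the combination step can be sketched independently of the system call. In the sketch below the query is a callback; on Linux it would wrap get_mempolicy() with MPOL_F_ADDR. The fake callback, the single-word node mask and all names are assumptions made so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback that reports the node mask of the page containing addr. */
typedef int (*sketch_page_query_fn)(const void *page, unsigned long *nodemask);

/* OR together the node masks of every page overlapping [addr, addr+len).
 * pagesize must be a power of two. Returns 0, or -1 if a query fails. */
static int sketch_area_membind(const void *addr, size_t len, size_t pagesize,
                               sketch_page_query_fn query,
                               unsigned long *result)
{
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)pagesize - 1);
    uintptr_t end = (uintptr_t)addr + len;
    uintptr_t p;

    *result = 0;
    for (p = start; p < end; p += pagesize) {
        unsigned long mask = 0;

        if (query((const void *)p, &mask) < 0)
            return -1;
        *result |= mask;
    }
    return 0;
}

/* Fake query for demonstration: pretends that 4 KiB pages alternate
 * between node 0 and node 1. */
static int sketch_fake_query(const void *page, unsigned long *nodemask)
{
    *nodemask = 1UL << (((uintptr_t)page >> 12) & 1);
    return 0;
}
```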

Requested by Alfredo Buttari.

Instruction Cache

We currently only detect Data and Unified caches. Some people want Instruction caches as well.

  • We can easily detect those and add a cache type attribute. But we would add a new level to most existing topologies (L1i above or below L1d).
  • We can make the detection depend on a new topology flag, so that the topology does not change much with the next release.
  • We can store both data and instruction sizes in the same object. Unfortunately, AMD Bulldozer has L1i and L1d with different sharing.

We'll have to take this new attribute into account in #41.

function to get the current cpu number

It can be useful to know where a thread is currently actually executing. Of course, the information may be outdated shortly after being returned, but that's still useful to monitoring applications.
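On Linux, one way to obtain this is the getcpu system call (glibc also exposes it as sched_getcpu()). The sketch below is Linux-only and returns -1 elsewhere; the function name is hypothetical, and other operating systems would need their own backend.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#ifdef __linux__
#include <unistd.h>
#include <sys/syscall.h>
#endif

/* Return the OS index of the CPU the calling thread is running on,
 * or -1 if this cannot be determined on this platform. */
static int sketch_current_cpu(void)
{
#if defined(__linux__) && defined(SYS_getcpu)
    unsigned cpu = 0;

    /* The node and cache arguments are optional and left NULL. */
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) == 0)
        return (int)cpu;
#endif
    return -1;
}
```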

USB tree?

I came across the location of a CD-ROM drive:

/sys/devices/pci0000:00/0000:00:02.1/usb1/1-5/1-5:1.0/host6/target6:0:0/6:0:0:0/block:sr0

throughput distance matrix

Add a throughput matrix alongside the existing latency one (basically the same behavior, except that the grouping code looks at maximum instead of minimum values). set_distance() doesn't have a latency/throughput parameter, so we will look at the matrix to find out whether it's throughput (diagonal is maximum) or latency (all other cases).

If we rework the distance API because of tickets #48, #67 and #68, it might be good to add a parameter specifying whether the given matrix is latency/throughput/number-of-hops/...
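The detection rule can be sketched as follows. It assumes that "diagonal is maximum" means every diagonal entry is strictly greater than all other entries of its row; the enum and function names are hypothetical.

```c
#include <assert.h>

enum sketch_matrix_kind { SKETCH_LATENCY, SKETCH_THROUGHPUT };

/* Guess whether an n x n row-major matrix holds throughputs or latencies.
 * Throughput: each diagonal entry is the strict maximum of its row.
 * Anything else is treated as latency. */
static enum sketch_matrix_kind sketch_guess_kind(const float *m, unsigned n)
{
    unsigned i, j;

    if (n < 2)
        return SKETCH_LATENCY;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (i != j && m[i * n + j] >= m[i * n + i])
                return SKETCH_LATENCY;
    return SKETCH_THROUGHPUT;
}
```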

XLC/AIX build warnings

Reported by Mathieu Faverge.

Summary of warnings:

{{{
"lstopo-text.c", line 292.12: 1506-077 (E) The wchar_t value 0x250c is not valid.
}}}

  • Just a warning that it won't work in a non-UTF-8 locale. We already check that at runtime, so this is not a problem.

Properly gather/support Linux Cgroup/Cpuset in remote topologies

http://www.open-mpi.org/community/lists/hwloc-devel/2010/12/1717.php

We need to:

  • gather /proc/mounts
  • gather the relevant cpuset/cgroup mount point in hwloc-gather-topology.sh (or warn if we didn't gather it)
  • make sure we properly read those in src/topology-linux.c when fsroot was changed
  • update the expected topologies of tests/linux/cpuset (might be wrong right now)
  • stop ignoring failures at the end of test-gather-topology.sh.in (or at least only ignore when Linux cpuset/cgroup are enabled)

Note that you need to mount a cpuset or cgroup/cpuset mount point to reproduce the problem.

Fix icc warnings

There's a truckload of warnings generated when icc 11.1.056 is used to compile hwloc. Most fall into one of three categories:

  • Variable/parameter is never referenced
  • Variable is set but never used
  • Mix enum with another type

The first two should probably be fixed; we may or may not care about fixing the third.

Make hwloc CLI commands all default to same index bias

After #25, make all hwloc CLI commands uniformly default to both output and accept as input either physical/OS or hwloc-logical index values.

Simple example: hwloc-bind should accept as input the index values output by the default output of lstopo.

See also the thread started here:

http://www.open-mpi.org/community/lists/hwloc-devel/2009/12/0456.php

support user-defined processor restriction

Use sched_getaffinity etc. to restrict discovery to the current cpumask.

(1) Add a configuration flag to limit the discovery to the current binding of the process. We could let the user choose between the CPU and the memory binding, and between the current process and the current thread binding. But those variants are not very important and they can be implemented with (2) anyway. So just keep the important variant(s?). I'd say the current thread CPU binding (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_BINDING).

(2) Add a configuration function to limit the discovery to a given cpuset. To get the current binding of the process, one has to run a first discovery, then use get_cpubind, then run a second one with the configuration. This is tedious, but the API works this way.

hwloc_topology_restrict_to_cpuset(topology, cpuset);

(no need for a nodeset flavor, it won't be used often, and we have conversion functions anyway)

(3) Add a function to restrict a discovered topology to a given cpuset. This looks like within the scope of the functions we are thinking about for network use (extract part of a topology, merge topologies).
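The core of option (3) is intersecting each object's cpuset with the allowed set and dropping objects that become empty. The sketch below shows this on a flat array of unsigned long masks; a real implementation would walk the object tree, and the function name is hypothetical.

```c
#include <assert.h>

/* Restrict every cpuset to allowed, removing entries that become empty.
 * The array is compacted in place; the new number of entries is returned. */
static unsigned sketch_restrict(unsigned long *cpusets, unsigned count,
                                unsigned long allowed)
{
    unsigned in, out = 0;

    for (in = 0; in < count; in++) {
        unsigned long kept = cpusets[in] & allowed;

        if (kept)
            cpusets[out++] = kept;
    }
    return out;
}
```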

"./configure --enable-xml" doesn't fail if XML can't be built

Andreas Kupries noticed that if you configure hwloc with:

{{{
./configure --enable-xml ...
}}}

but configure fails to find XML support, it'll still continue and just give you an hwloc without XML support.

This violates the Law of Least Astonishment. If someone asks for an --enable- option and configure fails to find the Right Stuff for it, then configure should abort.

--enable-xml is definitely broken in this regard; the other --enable- and --with- options should be checked for this kind of behavior as well.

add memory binding API

  • add hwloc_set_membind(topology, beginaddr, endaddr, HWLOC_MEMBIND_BIND/FIRSTTOUCH/INTERLEAVE, hwloc_cpuset_t)
    • size instead of endaddr?
    • if beginaddr=endaddr=NULL, setmempolicy?
    • reverse routine?
    • no level = empty mask, and we may want an easy alias for "whole machine"
  • allocation with a given policy
    • get Samuel's code from pm2's marcel_sysdep.c
  • apply a policy to a given area (not all OSes support that).

misc TODO

Tools

  • bind process on 2 cores "near" physical proc id 3 ?
    • hwloc-calc: add an option to request a cpuset consisting of n close entries among the generated cpuset
  • internationalize the output of lstopo? object types and memory size units
  • hwloc-top, like lstopo, but keeps printing every 3s or so, and show bound threads as well as the used CPU%

Doc

  • automatically generate the pngs?
    • see doc/images/HACKING

Support

  • add info about supported instructions (sse, avx, ...)
  • add info about available execution units (fpu)
    • and say if they are shared between threads/cores
      • this could help resolve the current ambiguity between two real cores, one hyperthreaded core, and AMD dual-fake-core compute units
  • reduce distance matrices so that parent objects get distances between them as well (just like we do when computing group distances after inserting groups)
  • parallelize the discovery ? :)

I/O

  • CCI interoperability to get cci_device and/or cci_device->name locality
    • use cci_device->pci.{domain,bus,dev,func} to retrieve the PCI device
    • wait for the CCI API to be stable
  • Add a ofed plugin to gather ofed device info without relying on Linux sysfs
    • Not sure whether ofed works the same on other OSes anyway

Backends and Ports

  • Try to make the distance grouping code a separate component ?
  • QNX
    • _syspage_ptr() SYSPAGE_ENTRY(entry)
    • ThreadCtl/Thread_ctl_r(_NTO_TCTL_RUNMASK)
  • BSD
    • sys/sched.h: sched_bind/sched_unbind, but that's in-kernel only for now.
  • AIX
  • Cray Catamount?

distances vs multinode

Some users want distances in multinode topologies.

  1. It's currently disabled because we use cpusets/nodesets to find/create a common ancestor where the matrix is attached. Multinode objects have no cpusets/nodesets.

One way to solve this would be to add a hostset or machineset bitmap to each object to identify the hosts/machines it corresponds to.

  2. We'll need a better way to identify objects in multinode topologies. Most distance insertion routines currently use physical indexes, but those are meaningless in multinode systems (and not always meaningful even in single-node systems, because core ids are not always unique).

See also ticket #67 when people don't care about grouping.

distances

INTRO:
Some people want a fake/virtual/topological distance between arbitrary pairs of objects, from the tree point of view, not from the physical point of view. The distance between A and B is basically the depth difference between the highest of A and B and their lowest common ancestor. This is probably too simple to deserve discussion here. At most, we'll add a new helper.
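Such a helper could be sketched as below. It reads "the highest of A and B" as the one closest to the root, which is an interpretation rather than something stated above, and it uses a minimal hypothetical node structure instead of hwloc_obj_t.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal tree node: parent pointer and depth (root has depth 0). */
struct sketch_node {
    const struct sketch_node *parent;
    unsigned depth;
};

/* Lowest common ancestor of two nodes of the same tree. */
static const struct sketch_node *
sketch_lca(const struct sketch_node *a, const struct sketch_node *b)
{
    while (a->depth > b->depth)
        a = a->parent;
    while (b->depth > a->depth)
        b = b->parent;
    while (a != b) {
        a = a->parent;
        b = b->parent;
    }
    return a;
}

/* Depth of the shallower of a and b, minus the depth of their
 * lowest common ancestor. */
static unsigned sketch_tree_distance(const struct sketch_node *a,
                                     const struct sketch_node *b)
{
    unsigned top = a->depth < b->depth ? a->depth : b->depth;

    return top - sketch_lca(a, b)->depth;
}
```

With this reading, two cores under the same socket are at distance 1 and two cores under different sockets of the same machine are at distance 2.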

DONE:
Some people want to know the actual physical distance between objects (especially numa nodes). We already had the full matrices of distances between all pairs of objects of the same level (when given by the BIOS/OS). In the distances branch, we are now also exporting this "latency" matrix (after normalization to floats with 1.0 on the diagonal).

If we ever have the distances between a subset of objects, we store the matrix in the common ancestor instead of the root. No problem there, we can group objects and report distances the same.

Distance matrices may also be given by the user between init and load (as unsigned right now, maybe use float there too?).
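The normalization mentioned above (floats with 1.0 on the diagonal) can be sketched by dividing each row by its diagonal entry. Whether the distances branch normalizes per row or by a single value is not stated here, so the per-row choice and the function name are assumptions.

```c
#include <assert.h>

/* Convert an n x n row-major matrix of unsigned distances into floats,
 * scaling each row so that its diagonal entry becomes 1.0.
 * Returns 0 on success, -1 if a diagonal entry is 0. */
static int sketch_normalize(const unsigned *in, float *out, unsigned n)
{
    unsigned i, j;

    for (i = 0; i < n; i++) {
        if (in[i * n + i] == 0)
            return -1;
        for (j = 0; j < n; j++)
            out[i * n + j] = (float)in[i * n + j] / (float)in[i * n + i];
    }
    return 0;
}
```

For example, a two-node matrix with 10 on the diagonal and 20 elsewhere becomes 1.0 on the diagonal and 2.0 elsewhere.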

TODO:
Some people may want the topological graph connecting objects, which means we have a number of hops (or a route?) between peers instead of a latency. It could be another matrix (unsigned, 0 on the diagonal).

NUMA topology could also be exported as a series of proximity domains (like solaris's lgrps). ** TODO explain what this means **
