
compatibility-layer's People

Contributors

bedroge, boegel, ocaisa, omula, peterstol, terjekv, truib, trz42, victorusu

compatibility-layer's Issues

lib vs lib64 in Prefix installation differs between architectures

The /lib and /lib64 directories in the compat layer are handled differently on x86_64 vs aarch64:

$ ls -ld /cvmfs/pilot.eessi-hpc.org/2020.10/compat/{aarch64,x86_64}/{lib,lib64}                                 
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov  3 20:35 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/lib
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov  3 20:39 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/lib64
lrwxrwxrwx 1 ec2-user ec2-user    5 Oct 30 00:32 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/x86_64/lib -> lib64
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov  3 15:20 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/x86_64/lib64

Likewise for /usr/{lib,lib64}.

This is a problem, because some tools (like CMake, cfr. easybuilders/easybuild-easyblocks#2248) only look in lib, and then fail to find libraries located in lib64 if lib is not a symlink to it (as in compat/aarch64).
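To see which layout a given compat layer uses, a small diagnostic like the one below can help. It is a bash sketch, not part of the EESSI tooling, and the function name check_libdirs is hypothetical. For lib and usr/lib under a Prefix root, it reports whether the directory is a symlink or a separate directory next to lib64.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how lib/ and usr/lib/ relate to lib64/
# under a Prefix root (e.g. a compat layer directory).
check_libdirs() {
  local root=${1%/} sub
  for sub in "" "usr/"; do
    local lib="${root}/${sub}lib" lib64="${root}/${sub}lib64"
    if [[ -L "$lib" ]]; then
      # lib is a symlink (the x86_64 layout above): show its target
      echo "${sub}lib -> $(readlink "$lib")"
    elif [[ -d "$lib" && -d "$lib64" ]]; then
      # two real directories: tools that only search lib can miss lib64
      echo "${sub}lib and ${sub}lib64 are separate directories"
    else
      echo "${sub}lib: no lib/lib64 pair found"
    fi
  done
}
```

Based on the listing above, running check_libdirs /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64 should report separate directories for both lib and usr/lib.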

Relevant links from Gentoo Prefix documentation:

Add Gentoo Prefix vs Nix comparison in documentation

Currently there is no documentation on why the project has chosen Gentoo Prefix over Nix.
This would be useful information to retain.

A few notes on the topic from a Slack conversation with Bart Oldeman at Compute Canada:

  • Nix does not provide a good development platform when used via environment variables. A better option would be to use nix-shell, but that would require users to learn how to use it. For example, see NixOS/nixpkgs#44144.
  • With Gentoo Prefix, the lower number of symlinks and the ability to use $EPREFIX/etc/ld.so.cache might improve overall performance, thanks to lower overhead for the dynamic linker/loader $EPREFIX/lib/ld-linux-x86-64.so.2 when locating the required shared libraries.

macOS Prefix builds aren't guaranteed to be portable between (major) macOS versions

A fairly trivial test:

terjekv@minbar:~/projects/eessi$ ls -lad ~/Gentoo
lrwxr-xr-x 1 terjekv 11 des  4 18:08 /Users/terjekv/Gentoo -> Gentoo-11.0/
terjekv@minbar:~/projects/eessi$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H2
terjekv@minbar:~/projects/eessi$ git clone git@github.com:EESSI/compatibility-layer.git
Cloning into 'compatibility-layer'...
dyld: lazy symbol binding failed: Symbol not found: _strtonum
  Referenced from: /Users/terjekv/Gentoo/usr/bin/ssh (which was built for Mac OS X 11.0)
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _strtonum
  Referenced from: /Users/terjekv/Gentoo/usr/bin/ssh (which was built for Mac OS X 11.0)
  Expected in: /usr/lib/libSystem.B.dylib

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
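The failure above is a binary built for Mac OS X 11.0 running on 10.15.7. A guard that compares the host version with the version a Prefix was built for would catch at least this case (host older than the build target). The following is a bash sketch under stated assumptions: PREFIX_BUILT_FOR is a hypothetical setting (its default of 11.0 is taken from the Gentoo-11.0 directory name above), and versions are assumed to be purely numeric and dotted.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if dotted version A >= B, comparing numeric
# components left to right (missing components count as 0).
version_ge() {
  local IFS=.
  local -a a b
  a=($1)
  b=($2)
  local i n=${#a[@]}
  (( ${#b[@]} > n )) && n=${#b[@]}
  for (( i = 0; i < n; i++ )); do
    local x=${a[i]:-0} y=${b[i]:-0}
    (( 10#$x > 10#$y )) && return 0
    (( 10#$x < 10#$y )) && return 1
  done
  return 0
}

# Hypothetical guard: the build target version is an assumption here.
PREFIX_BUILT_FOR=${PREFIX_BUILT_FOR:-11.0}
if command -v sw_vers >/dev/null 2>&1; then
  host_version=$(sw_vers -productVersion)
  if ! version_ge "$host_version" "$PREFIX_BUILT_FOR"; then
    echo "warning: Prefix built for macOS ${PREFIX_BUILT_FOR}, host runs ${host_version}" >&2
  fi
fi
```

The check only runs where sw_vers exists, so sourcing the script on Linux just defines version_ge.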

app-admin/Lmod does not pull in dev-lang/tcl and dev-lang/lua

If I start with a fresh Gentoo Prefix install and use the EESSI repository to install Lmod, I get:

$ emerge app-admin/Lmod
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Emerging (1 of 5) dev-lua/bit32-5.3.0::eessi
 * Fetching files in the background.
 * To view fetch progress, run in another terminal:
 * tail -f /opt/gentoo-prefix/var/log/emerge-fetch.log
 * bitlib-5.3.0.tar.gz BLAKE2B SHA512 size ;-) ...                                                                                 [ ok ]
>>> Unpacking source...
>>> Unpacking bitlib-5.3.0.tar.gz to /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work
>>> Source unpacked in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work
>>> Preparing source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
>>> Source prepared.
>>> Configuring source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
>>> Source configured.
>>> Compiling source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
lbitlib.c:10:10: fatal error: lua.h: No such file or directory
   10 | #include "lua.h"
      |          ^~~~~~~
compilation terminated.
 * ERROR: dev-lua/bit32-5.3.0::eessi failed (compile phase):
 *   (no error message)
 * 
 * Call stack:
 *     ebuild.sh, line 125:  Called src_compile
 *   environment, line 809:  Called die
 * The specific snippet of code:
 *       $(tc-getCC) ${CFLAGS} -fPIC -Ic-api -c -o ${MY_PN}.o ${MY_PN}.c || die;
 * 
 * If you need support, post the output of `emerge --info '=dev-lua/bit32-5.3.0::eessi'`,
 * the complete build log and the output of `emerge -pqv '=dev-lua/bit32-5.3.0::eessi'`.
 * The complete build log is located at '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/temp/build.log'.
 * The ebuild environment file is located at '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/temp/environment'.
 * Working directory: '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0'
 * S: '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0

So I first had to run manually:

$ emerge lua tcl

This is also reflected in the Ansible roles being developed (

)
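Until the ebuilds declare these dependencies themselves (presumably the proper fix), the manual workaround can be scripted. This is a bash sketch with a hypothetical function name; it only orders the two emerge calls described above.

```shell
#!/usr/bin/env bash
# Workaround sketch: install the Lua and Tcl interpreters before Lmod, since
# (per this issue) the EESSI ebuilds did not pull them in automatically.
install_lmod_with_prereqs() {
  # --noreplace skips packages that are already installed
  emerge --noreplace dev-lang/lua dev-lang/tcl || return
  emerge app-admin/Lmod
}
```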

add PyYAML to compat layer

EasyBuild requires PyYAML for the easystack file support, so it would be nice to have this installed in the compat layer.

/etc/passwd userid issue

The user name in the Gentoo Prefix prompt is wrong: it is not taken from the local /etc/passwd.

dell@Innopad-01:~$ /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/startprefix
Entering Gentoo Prefix /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020

henkjan@Innopad-01:~$
Here the user name comes from /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/etc/passwd.

I have no name!@Innopad-01:/mnt/c/Users/Innopad$

Here the user is not in /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/etc/passwd, so no name can be shown.

Can we use the local /etc/passwd file?
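bash shows "I have no name!" in the prompt when the current UID cannot be resolved to a user name. You can check which passwd file knows about a given UID with a small lookup like the one below. This is a sketch: the helper name is hypothetical, and it only parses the standard colon-separated passwd format, ignoring other NSS sources such as LDAP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the user name that a passwd-format file maps
# a UID to; return 1 if the UID is not listed in that file.
passwd_name_for_uid() {
  local file=$1 uid=$2
  awk -F: -v uid="$uid" '
    $3 == uid { print $1; found = 1; exit }
    END { exit !found }
  ' "$file"
}
```

For example, running it with "$(id -u)" against both /etc/passwd and /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/etc/passwd shows whether the Prefix copy knows about the current user.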

Symlink $EPREFIX/etc/hosts to host

I noticed that we don't do this at the moment, which means that tools pick up the hosts file from the Prefix instead of the one from the host. I just did some tests, and this can lead to strange issues, e.g.:

$ /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/startprefix 
Entering Gentoo Prefix /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64

$ hostname
dh-node12

$ which curl
/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/usr/bin/curl

$ curl http://dh-node12:9100            
curl: (6) Couldn't resolve host 'dh-node12'

$ /usr/bin/curl http://dh-node12:9100
<html>
			<head><title>Node Exporter</title></head>
			<body>
			<h1>Node Exporter</h1>
			<p><a href="/metrics">Metrics</a></p>
			</body>
			</html>

$ strace curl http://dh-node12:9100 2>&1 | grep hosts
openat(AT_FDCWD, "/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/etc/hosts", O_RDONLY|O_CLOEXEC) = 5

So I think we should change this by uncommenting:
https://github.com/EESSI/compatibility-layer/blob/main/ansible/playbooks/roles/compatibility_layer/defaults/main.yml#L72
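The same effect can also be reproduced by hand. The sketch below is a hypothetical helper, not the playbook's implementation. It replaces a file under $EPREFIX/etc with a symlink to the host's copy, and the optional third argument only exists to make it testable outside a real host root. Since the repository is read-only on clients, this would have to run on the build host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make $EPREFIX/etc/<name> a symlink to the host's
# /etc/<name>, so that tools inside the Prefix read the host file.
link_etc_to_host() {
  local eprefix=$1 name=$2 host_root=${3:-/}
  local target="${host_root%/}/etc/${name}"
  # -f replaces an existing file, -n avoids descending into a symlinked dir
  ln -sfn "$target" "${eprefix}/etc/${name}"
}
```

For example: link_etc_to_host /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64 hosts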

Support for special interconnects

Slightly related to #47, but we need extra packages in the compatibility layer in order to make UCX support InfiniBand (and other interconnects). All default fabric-related Gentoo packages can be found here: https://packages.gentoo.org/categories/sys-fabric
Which ones do we want/need?

With libibverbs and librdmacm installed, and passing --with-rdmacm=${EPREFIX}/usr to the configure of UCX, I get this:

checking cuda.h usability... no
checking cuda.h presence... no
checking for cuda.h... no
checking cuda_runtime.h usability... no
checking cuda_runtime.h presence... no
checking for cuda_runtime.h... no
configure: WARNING: CUDA not found
configure: ROCm path was not specified. Guessing ...
checking hsa.h usability... no
checking hsa.h presence... no
checking for hsa.h... no
configure: WARNING: ROCm not found
checking hip_runtime.h usability... no
checking hip_runtime.h presence... no
checking for hip_runtime.h... no
configure: WARNING: HIP Runtime not found
checking gdrapi.h usability... no
checking gdrapi.h presence... no
checking for gdrapi.h... no
configure: WARNING: GDR_COPY not found
configure: Compiling with verbs support from /usr
checking infiniband/verbs.h usability... yes
checking infiniband/verbs.h presence... yes
checking for infiniband/verbs.h... yes
checking for ibv_get_device_list in -libverbs... yes
checking whether ibv_wc_status_str is declared... yes
checking whether ibv_event_type_str is declared... yes
checking whether ibv_query_gid is declared... yes
checking whether ibv_get_device_name is declared... yes
checking whether ibv_create_srq is declared... yes
checking whether ibv_get_async_event is declared... yes
checking infiniband/verbs_exp.h usability... no
checking infiniband/verbs_exp.h presence... no
checking for infiniband/verbs_exp.h... no
checking for struct ibv_exp_device_attr.exp_device_cap_flags... no
checking for struct ibv_exp_device_attr.odp_caps... no
checking for struct ibv_exp_device_attr.odp_caps.per_transport_caps.dc_odp_caps... no
checking for struct ibv_exp_device_attr.odp_mr_max_size... no
checking for struct ibv_exp_qp_init_attr.max_inl_recv... no
checking for struct ibv_async_event.element.dct... no
checking whether IBV_CREATE_CQ_ATTR_IGNORE_OVERRUN is declared... no
checking whether IBV_EXP_CQ_IGNORE_OVERRUN is declared... no
configure: Checking for legacy bare-metal support
checking infiniband/mlx5_hw.h usability... no
checking infiniband/mlx5_hw.h presence... no
checking for infiniband/mlx5_hw.h... no
configure: Checking for DV bare-metal support
checking for mlx5dv_query_device in -lmlx5-rdmav2... no
checking for mlx5dv_query_device in -lmlx5... no
checking whether ibv_alloc_td is declared... no
checking whether MLX5DV_CONTEXT_FLAGS_DEVX is declared... no
checking whether IBV_LINK_LAYER_INFINIBAND is declared... yes
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking whether IBV_EVENT_GID_CHANGE is declared... yes
checking whether ibv_create_qp_ex is declared... yes
checking whether ibv_create_srq_ex is declared... yes
checking whether ibv_query_device_ex is declared... no
checking whether IBV_EXP_ACCESS_ALLOCATE_MR is declared... no
checking whether IBV_EXP_ACCESS_ON_DEMAND is declared... no
checking whether IBV_EXP_DEVICE_MR_ALLOCATE is declared... no
checking whether IBV_EXP_WR_NOP is declared... no
checking whether IBV_EXP_DEVICE_DC_TRANSPORT is declared... no
checking whether IBV_EXP_ATOMIC_HCA_REPLY_BE is declared... no
checking whether IBV_EXP_PREFETCH_WRITE_ACCESS is declared... no
checking whether IBV_EXP_QP_OOO_RW_DATA_PLACEMENT is declared... no
checking whether IBV_EXP_DCT_OOO_RW_DATA_PLACEMENT is declared... no
checking whether IBV_EXP_CQ_MODERATION is declared... no
checking whether IBV_EXP_DEVICE_ATTR_PCI_ATOMIC_CAPS is declared... no
checking whether ibv_exp_reg_mr is declared... no
checking whether ibv_exp_create_qp is declared... no
checking whether ibv_exp_prefetch_mr is declared... no
checking whether ibv_exp_create_srq is declared... no
checking whether ibv_exp_setenv is declared... no
checking whether ibv_exp_query_gid_attr is declared... no
checking whether ibv_exp_query_device is declared... no
checking whether ibv_exp_post_send is declared... no
checking whether IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP is declared... no
checking whether IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD is declared... no
checking whether IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG is declared... no
checking whether IBV_EXP_SEND_EXT_ATOMIC_INLINE is declared... no
checking whether IBV_EXP_DEVICE_ATTR_RESERVED_2 is declared... no
checking whether IBV_EXP_MR_INDIRECT_KLMS is declared... no
checking whether IBV_EXP_QP_CREATE_UMR is declared... no
checking for struct ibv_exp_qp_init_attr.umr_caps... no
checking whether IBV_EXP_MR_FIXED_BUFFER_SIZE is declared... no
configure: WARNING: Compiling without extended atomics support
checking for struct ibv_exp_masked_atomic_params.masked_log_atomic_arg_sizes_network_endianness... no
checking whether IBV_EXP_ODP_SUPPORT_IMPLICIT is declared... no
checking whether IBV_EXP_ACCESS_ON_DEMAND is declared... (cached) no
checking whether IBV_ACCESS_ON_DEMAND is declared... no
checking whether ibv_exp_prefetch_mr is declared... (cached) no
checking whether ibv_advise_mr is declared... no
checking for struct mlx5_wqe_av.base... no
checking for struct mlx5_grh_av.rmac... no
checking for struct mlx5_cqe64.ib_stride_index... no
checking whether IBV_EXP_QPT_DC_INI is declared... no
checking infiniband/tm_types.h usability... no
checking infiniband/tm_types.h presence... no
checking for infiniband/tm_types.h... no
checking for struct ibv_exp_tmh.tag... no
checking for struct ibv_tmh.tag... no
checking whether ibv_exp_alloc_dm is declared... no
checking whether ibv_alloc_dm is declared... no
checking whether ibv_cmd_modify_qp is declared... yes
configure: Checking OFED valgrind libs /usr/lib/mlnx_ofed/valgrind
checking for ib_cm_send_req in -libcm... no
configure: WARNING: CM support not found, skipping
checking /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h usability... yes
checking /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h presence... yes
checking for /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h... yes
checking for rdma_create_id in -lrdmacm... yes
checking whether rdma_establish is declared... no
checking whether rdma_init_qp_attr is declared... no
checking sys/uio.h usability... yes
checking sys/uio.h presence... yes
checking for sys/uio.h... yes
checking for process_vm_readv... yes
configure: KNEM path was not found, guessing ...
Package knem was not found in the pkg-config search path.
Perhaps you should add the directory containing `knem.pc'
to the PKG_CONFIG_PATH environment variable
Package 'knem', required by 'virtual:world', not found
checking whether KNEM_CMD_GET_INFO is declared... no
configure: WARNING: KNEM requested but required file (knem_io.h) could not be found
configure: XPMEM - failed to open the requested location (guess), guessing ...
checking cray-ugni... no
checking compiler flag -fno-exceptions... yes
checking compiler flag -fno-rtti... yes
checking compiler flag --no_exceptions... no
checking compiler flag -fno-tree-vectorize... yes
checking compiler flag --diag_suppress 236... no
checking that generated files are newer than configure... done

configure: =========================================================
configure: UCX build configuration:
configure:       Build prefix:   /home/bob/ucx/inst
configure: Preprocessor flags:   -DCPU_FLAGS="|avx" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure:         C compiler:   x86_64-pc-linux-gnu-gcc -O3 -g -Wall -Werror -mavx
configure:       C++ compiler:   x86_64-pc-linux-gnu-g++ -O3 -g -Wall -Werror -mavx
configure:       Multi-thread:   enabled
configure:          MPI tests:   disabled
configure:      Devel headers:   no
configure:           Bindings:   < >
configure:        UCT modules:   < ib rdmacm cma >
configure:       CUDA modules:   < >
configure:       ROCM modules:   < >
configure:         IB modules:   < >
configure:        UCM modules:   < >
configure:       Perf modules:   < >
configure: =========================================================
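While deciding which sys-fabric packages to add, it may help to check the summary above automatically, for example in CI. Here is a bash sketch with a hypothetical helper name. It assumes the configure output was saved to a file, e.g. by piping configure through tee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the module list from a UCX configure summary
# line such as "configure:        UCT modules:   < ib rdmacm cma >".
ucx_modules() {
  local kind=$1 file=$2
  awk -v key="${kind} modules:" '
    index($0, key) {
      s = $0
      sub(/.*</, "", s)        # drop everything up to "<"
      sub(/>.*/, "", s)        # drop ">" and anything after it
      gsub(/^ +| +$/, "", s)   # trim surrounding spaces
      print s
      exit
    }
  ' "$file"
}
```

For the output above, ucx_modules UCT configure.out would print "ib rdmacm cma", and ucx_modules IB configure.out would print an empty line.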

Security updates

How do we apply security updates to the compatibility layer in between releases and how often?
What is the impact on our design, which aims to keep all versions as fixed as possible to prevent issues higher up the stack?

Gentoo provides glsa-check (https://wiki.gentoo.org/wiki/Security_Handbook/Staying_up-to-date), which works well, but requires a sync of the repo.

emerge --sync
glsa-check -t all
This system is affected by the following GLSAs:
202009-01

Or, more verbosely:

glsa-check -p $(glsa-check -t all)

To apply the security updates:

glsa-check -f $(glsa-check -t all)

This workflow could be added to our Ansible scripts.
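The steps above can be combined into a single function with a safe default. This is a bash sketch with a hypothetical name, not existing EESSI tooling. It assumes, as the $(glsa-check -t all) usage above implies, that glsa-check -t writes only the affected GLSA IDs to stdout. Note that the sync step itself moves the repository snapshot forward, which is exactly the tension with fixed versions raised above.

```shell
#!/usr/bin/env bash
# Sketch of the sync/test/fix workflow described above.
# Usage: check_glsas [pretend|fix]   (default: pretend, i.e. change nothing)
check_glsas() {
  local mode=${1:-pretend} affected
  emerge --sync || return
  # assumption: only GLSA IDs on stdout, informational text on stderr
  affected=$(glsa-check -t all 2>/dev/null)
  if [[ -z "$affected" ]]; then
    echo "no GLSAs affect this system"
    return 0
  fi
  case $mode in
    pretend) glsa-check -p $affected ;;  # unquoted on purpose: one arg per ID
    fix)     glsa-check -f $affected ;;
    *)       echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```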

Unmasking packages for aarch64 support.

Lmod and archspec are not supported on aarch64 by default.

emerge --ask --autounmask=y --autounmask-write =app-admin/Lmod-8.3.17
emerge --ask --autounmask=y --autounmask-write =sys-apps/archspec-0.1.1
dispatch-conf
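--autounmask-write followed by dispatch-conf is interactive. For a non-interactive setup (e.g. driven by Ansible), presumably equivalent entries could be written directly to package.accept_keywords. This bash sketch rests on a few assumptions. Both packages only need the ~arm64 testing keyword; a package with no arm64 keyword at all would need ** instead. package.accept_keywords is assumed to be a directory. The helper name and the file name eessi are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept a keyword for specific package atoms by
# appending "atom keyword" lines, skipping lines that are already present.
accept_keywords() {
  local eprefix=$1 keyword=$2
  shift 2
  local dir="${eprefix}/etc/portage/package.accept_keywords"
  local file="${dir}/eessi" atom
  mkdir -p "$dir" || return
  for atom in "$@"; do
    grep -qxF "${atom} ${keyword}" "$file" 2>/dev/null \
      || echo "${atom} ${keyword}" >> "$file"
  done
}
```

For example: accept_keywords "$EPREFIX" "~arm64" =app-admin/Lmod-8.3.17 =sys-apps/archspec-0.1.1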

cache CernVM-FS package in GA workflow checking on pilot repo

It seems like the CernVM-FS package repo doesn't like being hammered by GitHub Actions:

Err:24 https://cvmrepo.web.cern.ch/cvmrepo/apt focal-prod InRelease
  403  Forbidden [IP: 188.185.90.87 443]
Reading package lists...
E: Failed to fetch http://cvmrepo.web.cern.ch/cvmrepo/apt/dists/focal-prod/InRelease  403  Forbidden [IP: 188.185.90.87 443]
E: The repository 'http://cvmrepo.web.cern.ch/cvmrepo/apt focal-prod InRelease' is not signed.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package cvmfs

So we need to avoid downloading the packages again and again; there's some support for caching in GitHub Actions that may help here.
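One possible approach is to download the package into a directory that is persisted between workflow runs (for example with the actions/cache action), and only contact the CernVM-FS package repository on a cache miss. Here is a bash sketch of the shell side with a hypothetical helper name; the package URL and cache directory are left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of a cached copy of URL in CACHE_DIR,
# downloading it only if it is not there yet (cache miss).
fetch_cached() {
  local url=$1 cache_dir=$2
  local dest="${cache_dir}/${url##*/}"
  mkdir -p "$cache_dir" || return
  if [[ -s "$dest" ]]; then
    echo "$dest"            # cache hit: no network access
    return 0
  fi
  # download to a temporary name so a failed transfer never looks cached
  curl -fsSL --retry 3 -o "${dest}.part" "$url" && mv "${dest}.part" "$dest" \
    && echo "$dest"
}
```

A cached .deb file could then be installed by giving apt its local path (./package.deb) instead of a package name.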

checklist for 2021.06 pilot

  • define 2021.06 set (PRs: EESSI/gentoo-overlay#40 + EESSI/gentoo-overlay#41)
  • update Prefix bootstrap snapshot + script (PR: #109)
  • use new build container (PR: #98)
  • switch to git rather than rsync for emerge --sync (issue: #100, PR: #106)
  • fix timezone issue (#93)
  • tweak trusted glibc dirs (cfr. EESSI/software-layer#108)
  • add test step to playbook (PR: #108)
  • generate env file for glibc (issue: #104, PR: #107)
  • build compat layer for:
    • aarch64
    • ppc64le
    • x86_64
  • update CI (.github/workflows/pilot_repo.yml)

CUDA support

I was experimenting with CUDA support within EESSI and ran into the following issue: when CUDA code is compiled with the EESSI stack, the executables created by nvcc do not see the CUDA driver libraries from the host. This is because they look for the CUDA driver libraries in the Prefix, where they do not exist. There are a few viable solutions:

Locale failure warnings

I got the following warnings when starting R:

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 

Probably related to this:
https://wiki.gentoo.org/wiki/Project:Prefix/FAQ#Add_an_en_US.UTF-8_locale
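The linked FAQ entry is about adding an en_US.UTF-8 locale. The sketch below covers the first half of that: it adds the entry to the Prefix's locale.gen idempotently. It assumes that, as in regular Gentoo, locale-gen inside the Prefix reads $EPREFIX/etc/locale.gen. The helper name is hypothetical, and locale-gen still has to be run inside the Prefix afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure a locale entry is listed in the Prefix's
# locale.gen (idempotent), so that a later locale-gen run generates it.
ensure_locale() {
  local eprefix=$1 entry=${2:-"en_US.UTF-8 UTF-8"}
  local conf="${eprefix}/etc/locale.gen"
  grep -qxF "$entry" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
}
```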

Prefix installation method

Here we can discuss how we are going to install Prefix. Do we, for instance, copy the bootstrap script to our repo and customize it to our needs, or do we use some other mechanism to install it?

/etc/localtime is missing in 2021.03 compat layer

This causes timezone confusion for some software, see EESSI/software-layer#79.

We do have /etc/localtime in the 2020.12 compat layer:

cat /cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/etc/localtime
TZif2-00TZif2-00
<-00>0

Should we make this a symlink to the host's /etc/localtime, to get correct time zone information?

ln -s /etc/localtime /cvmfs/pilot.eessi-hpc.org/2021.03/compat/linux/x86_64/etc/localtime
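To see what a given localtime file actually encodes, you can read the POSIX TZ string at the end of TZif files of version 2 and later (RFC 8536). The <-00>0 line in the 2020.12 output above is such a footer. Since tzdata uses -00 for "local time unknown", this suggests that file does not reflect any real time zone. The helper below is a hypothetical bash sketch and does not handle version 1 files, which have no footer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the POSIX TZ string stored in the footer of a
# TZif (version 2+) file such as /etc/localtime.
localtime_tz_string() {
  local f=${1:-/etc/localtime}
  if [[ "$(head -c 4 "$f")" != "TZif" ]]; then
    echo "not a TZif file: $f" >&2
    return 1
  fi
  # the footer is a newline-enclosed TZ string at the very end of the file
  tail -n 1 "$f"
}
```

Comparing its output for the host's /etc/localtime and the Prefix's copy gives a rough check. It is only rough because different zones with the same current rules share a TZ string.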

Lmod installation broken due to Lua+Prefix issues

In #98 I noticed that the action for testing the Prefix installation playbook was failing, and this is (again) due to Lua stuff not installing correctly inside Gentoo Prefix.

It looks like they added a fix for luaposix, which requires us to revert a change in our eclass/lua-utils.eclass (which was taken from here and from https://bugs.gentoo.org/768909):

			LUA_INCLUDE_DIR)
				local val

				val=$($(tc-getPKG_CONFIG) --variable includedir ${impl}) || die
				# ADD THE FOLLOWING LINE:
				val="${val#${ESYSROOT#${SYSROOT}}}"

				export LUA_INCLUDE_DIR=${val}
				debug-print "${FUNCNAME}: LUA_INCLUDE_DIR = ${LUA_INCLUDE_DIR}"

But then dev-lua/lpeg breaks, for which I opened a PR here:
gentoo/gentoo#20576

So if this gets merged, we can add the change to the eclass file, and remove our custom luaposix ebuild.
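The line added to lua-utils.eclass uses nested shell parameter expansion. ${ESYSROOT#${SYSROOT}} is the Prefix part of ESYSROOT (in EAPI 7, ESYSROOT is SYSROOT with EPREFIX appended). Stripping that from the front of the pkg-config includedir leaves a path without the Prefix. Here is a standalone illustration; the function name and example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Standalone illustration of: val="${val#${ESYSROOT#${SYSROOT}}}"
strip_prefix_from_includedir() {
  local val=$1 SYSROOT=$2 ESYSROOT=$3
  # inner expansion: ESYSROOT minus a leading SYSROOT, i.e. the EPREFIX part
  # outer expansion: remove that prefix from the start of val
  echo "${val#${ESYSROOT#${SYSROOT}}}"
}
```

With an empty SYSROOT and ESYSROOT=/opt/gentoo-prefix, this turns /opt/gentoo-prefix/usr/include/lua5.1 into /usr/include/lua5.1.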

Override eclass files

If we need to modify or patch an eclass from the default Gentoo repository, like the Lua one from EESSI/gentoo-overlay#43, we can either copy it into the gentoo directory ourselves (which may break or be overwritten after a sync) or use the eclass-overrides setting (see https://wiki.gentoo.org/wiki//etc/portage/repos.conf). The latter can be done by adding the following snippet to a new or existing file in ${EPREFIX}/etc/portage/repos.conf/:

[DEFAULT]
eclass-overrides = eessi

Make it possible to run install.yml locally without root permissions

Some steps now require root permissions, but in principle it should be possible to run this playbook as a regular user, e.g. by mounting some local folder as /cvmfs in the container. We should try to add some clever checks or use tags to skip certain steps to make this possible.

running Ansible playbook after bootstrapping Prefix fails on RHEL8/aarch64 AWS instance

After bootstrapping Prefix via the container image (singularity run --bind $HOME/cvmfs:/cvmfs docker://eessi/bootstrap-prefix:centos8-$(uname -m)), I tried running the Ansible playbook to take care of the remaining tasks to set up a new version of the compat layer.

Here's what I did:

  • symlink /cvmfs to $HOME/cvmfs on the build host where I performed the Prefix bootstrap
  • run the Ansible playbook from another host, as follows:
    • ensure passwordless ssh to the build host works (by adding the SSH key to the SSH agent via ssh-add)
    • create hosts file:
      $ cd compatibility-layer/ansible/playbooks
      $ cat hosts
      [cvmfsstratum0servers]
      <IP_OF_BUILD_HOST>
    • run playbook:
      ansible-playbook -i hosts -K install.yml -e gentoo_prefix_path=/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64

The playbook failed when running emerge gentoolkit (see details below).

TASK [compatibility_layer : Install equery command (dependency for the portage module)] ***************************************************
fatal: [18.223.116.164]: FAILED! => {"changed": true, "cmd": ["emerge", "gentoolkit"], "delta": "0:00:01.930018", "end": "2020-11-03 18:41:41.190089", "msg": "non-zero return code", "rc": 1, "start": "2020-11-03 18:41:39.260071"}
stderr:
portage: 'portage' user or group missing.
         For the defaults, line 1 goes into passwd, and 2 into group.
         portage:x:250:250:portage:/var/tmp/portage:/bin/false
         portage::250:portage
*** WARNING ***  For security reasons, only system administrators should be
*** WARNING ***  allowed in the portage group.  Untrusted users or processes
*** WARNING ***  can potentially exploit the portage group for attacks such as
*** WARNING ***  local privilege escalation.

mount: /proc: must be superuser to use mount.
Unable to mark /proc slave: 32
stdout:
Performing Global Updates
(Could take a couple of minutes if you have a lot of binary packages.)
  .='update pass'  *='binary update'  #='/var/db update'  @='/var/db move'
  s='/var/db SLOT move'  %='binary move'  S='binary SLOT move'
  p='update /etc/portage/package.*'
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2015.....................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2015..........................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2016............................................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2016.........................................................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2016.......................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2016.............................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2017..............................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2017....
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2017........................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2017.......
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2018.....................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2018..................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2018...
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2018.......
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2019.................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2019...
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2019.......
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2019.......
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2020............
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2020............
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2020....................................................................................................................................................
/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2020...

Calculating dependencies  
 * IMPORTANT: 4 news items need reading for repository 'gentoo'.
 * Use eselect news read to view new items.

... done!

>>> Verifying ebuild manifests

>>> Emerging (1 of 1) app-portage/gentoolkit-0.5.0-r2::gentoo
 * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR
 * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-
 * hpc.org/2020.10/compat/aarch64/var/tmp/portage/app-
 * portage/gentoolkit-0.5.0-r2'

>>> Failed to emerge app-portage/gentoolkit-0.5.0-r2
 * Messages for package app-portage/gentoolkit-0.5.0-r2:
TASK [compatibility_layer : Abort transaction] ********************************************************************************************
skipping: [18.223.116.164]
RUNNING HANDLER [compatibility_layer : Generate locales] **********************************************************************************
changed: [18.223.116.164]
PLAY RECAP ********************************************************************************************************************************
18.223.116.164             : ok=10   changed=4    unreachable=0    failed=0    skipped=3    rescued=1    ignored=0  

Running emerge gentoolkit in the startprefix environment manually works fine.

When I re-run the playbook after doing that, it fails when running emerge eselect-repository (see details below).

TASK [compatibility_layer : Install eselect-repository] ***********************************************************************************
fatal: [18.223.116.164]: FAILED! => {"changed": false, "cmd": ["/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/usr/bin/emerge", "--noreplace", "--ask=n", "eselect-repository"], "msg": "Packages not installed.", "rc": 1, "stderr": "portage: 'portage' user or group missing.\n         For the defaults, line 1 goes into passwd, and 2 into group.\n         portage:x:250:250:portage:/var/tmp/portage:/bin/false\n         portage::250:portage\n*** WARNING ***  For security reasons, only system administrators should be\n*** WARNING ***  allowed in the portage group.  Untrusted users or processes\n*** WARNING ***  can potentially exploit the portage group for attacks such as\n*** WARNING ***  local privilege escalation.\n\nmount: /proc: must be superuser to use mount.\nUnable to mark /proc slave: 32\n", "stderr_lines": ["portage: 'portage' user or group missing.", "         For the defaults, line 1 goes into passwd, and 2 into group.", "         portage:x:250:250:portage:/var/tmp/portage:/bin/false", "         portage::250:portage", "*** WARNING ***  For security reasons, only system administrators should be", "*** WARNING ***  allowed in the portage group.  Untrusted users or processes", "*** WARNING ***  can potentially exploit the portage group for attacks such as", "*** WARNING ***  local privilege escalation.", "", "mount: /proc: must be superuser to use mount.", "Unable to mark /proc slave: 32"], "stdout": "Calculating dependencies  \n * IMPORTANT: 4 news items need reading for repository 'gentoo'.\n * Use eselect news read to view new items.\n\n... 
done!\n\n>>> Verifying ebuild manifests\n\n>>> Emerging (1 of 8) dev-python/certifi-10001-r1::gentoo\n * Fetching files in the background.\n * To view fetch progress, run in another terminal:\n * tail -f /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/log/emerge-fetch.log\n * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR\n * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-\n * hpc.org/2020.10/compat/aarch64/var/tmp/portage/dev-\n * python/certifi-10001-r1'\n\n>>> Failed to emerge dev-python/certifi-10001-r1\n * Messages for package dev-python/certifi-10001-r1:\n\n\n", "stdout_lines": ["Calculating dependencies  ", " * IMPORTANT: 4 news items need reading for repository 'gentoo'.", " * Use eselect news read to view new items.", "", "... done!", "", ">>> Verifying ebuild manifests", "", ">>> Emerging (1 of 8) dev-python/certifi-10001-r1::gentoo", " * Fetching files in the background.", " * To view fetch progress, run in another terminal:", " * tail -f /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/log/emerge-fetch.log", " * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR", " * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-", " * hpc.org/2020.10/compat/aarch64/var/tmp/portage/dev-", " * python/certifi-10001-r1'", "", ">>> Failed to emerge dev-python/certifi-10001-r1", " * Messages for package dev-python/certifi-10001-r1:", "", ""]}

Running emerge eselect-repository manually works fine, as does emerge dev-vcs/git (the next step that fails in the playbook).

After manually running the three emerge commands that were failing, re-running the playbook makes it fail on emerge @2020.10 (which also fails when run manually, due to a different problem; see EESSI/gentoo-overlay#24).

Ansible version info (installed via pip3 install ansible):

$ ansible --version
ansible 2.10.3
  config file = None
  configured module search path = ['/home/ec2-user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ansible
  executable location = /home/linuxbrew/.linuxbrew/bin/ansible
  python version = 3.8.6 (default, Oct 10 2020, 07:54:55) [GCC 5.4.0 20160609]

Failed to emerge dev-lua/luafilesystem-1.7.0.2

Installation fails with a "double prefix" QA error:

* Messages for package dev-lua/luafilesystem-1.7.0.2:

 * ERROR: dev-lua/luafilesystem-1.7.0.2::gentoo failed:
 *   Aborting due to QA concerns: double prefix files installed
>>> Install dev-lua/luafilesystem-1.7.0.2 into /scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image
make -j16 DESTDIR=/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image install
mkdir -p "/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image/scratch/gentoo//scratch/gentoo/usr/lib64/lua/5.1"
cp src/lfs.so "/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image/scratch/gentoo//scratch/gentoo/usr/lib64/lua/5.1"
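The log shows the problem: the computed install directory contains the prefix twice (/scratch/gentoo//scratch/gentoo/usr/lib64/lua/5.1), so lfs.so lands under image/scratch/gentoo//scratch/gentoo/... instead of image/scratch/gentoo/.... A minimal sketch of how such paths can be recognised, assuming the image directory and EPREFIX (/scratch/gentoo here) are known; `is_double_prefixed` is a hypothetical helper, not Portage's own QA check:

```python
import os


def is_double_prefixed(path: str, image_dir: str, eprefix: str) -> bool:
    """Return True if `path` inside `image_dir` contains the prefix twice.

    A correctly installed file lives under <image>/<EPREFIX>/...; a file
    under <image>/<EPREFIX>/<EPREFIX>/... had the prefix applied twice.
    """
    image_dir = os.path.normpath(image_dir)
    path = os.path.normpath(path)  # also collapses the "//" seen in the log
    if not path.startswith(image_dir + os.sep):
        return False  # not inside the image directory at all
    rel = path[len(image_dir):]  # keeps the leading "/"
    eprefix = os.path.normpath(eprefix)
    doubled = eprefix + eprefix
    return rel == doubled or rel.startswith(doubled + os.sep)
```

Applied to the mkdir target from the log, this returns True; for the expected location image/scratch/gentoo/usr/lib64/lua/5.1 it returns False.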

Location to store a copy of gentoo snapshot for bootstrapping the prefix

By default, the prefix installation script uses the latest portage and gentoo snapshots from http://distfiles.gentoo.org/snapshots/.

We would like to use a specific snapshot to install the prefix, for reproducibility. Since snapshots are only kept on gentoo.org for a limited time, an alternative location is required. Because of space concerns, our GitHub repository is not considered suitable. The location will be referenced by setting SNAPSHOT_URL in the prefix bootstrap script.

Intermediate aarch64 build notes

These are notes taken by a novice while installing the aarch64 compatibility layer on a test machine.

Aarch64 compatibility layer compile

ssh some-machine.somewhe.re
[...]
Failed to set locale, defaulting to C.UTF-8
[...]

The machine is

uname -a
Linux ip-172-31-42-108.eu-west-1.compute.internal 4.18.0-240.15.1.el8_3.aarch64 #1 SMP Wed Feb 3 03:16:05 EST 2021 aarch64 aarch64 aarch64 GNU/Linux

cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"

Install attempt on machine itself

This seems to be a lot of work...

  • screen is not installed; it could help with long-running installs started from localhost
  • git is not installed, but is required to clone the repository locally:
sudo yum install git

mkdir src; cd src
git clone https://github.com/EESSI/compatibility-layer
  • ansible needs to be installed, but sudo yum install ansible fails
  • …

Remote install

On my local machine, fetch the repository:

git clone https://github.com/EESSI/compatibility-layer
cd compatibility-layer/ansible/playbooks
cat README.md
vim hosts

Credentials were provided, and the key was added to ssh-agent.

[cvmfsstratum0servers]
ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com ansible_ssh_user=ec2-user eessi_host_arch=aarch64 eessi_host_os=linux

Error /usr/bin/python not found

Error

fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "module_stderr": "Shared connection to ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com closed.\r\n", "module_stdout": "/bin/sh: /usr/bin/python: No such file or directory\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 127}

Resolved by

sudo yum install python38

and adding ansible_python_interpreter=/usr/bin/python3 to hosts

I still had Ansible installed in a conda environment, so I reused that one.

ansible-playbook -i hosts -b install.yml

'ansible_os_family' is undefined

Error

TASK [compatibility_layer : Fail if host OS is not supported]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"msg": "The conditional check 'not(ansible_os_family == \"RedHat\" and ansible_distribution_major_version is version(\"8\", \"==\"))' failed. The error was: error while evaluating conditional (not(ansible_os_family == \"RedHat\" and ansible_distribution_major_version is version(\"8\", \"==\"))): 'ansible_os_family' is undefined\n\nThe error appears to have been in '/home/.../eessi/compatibility-layer/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Fail if host OS is not supported\n  ^ here\n"}

When debug-printing all {{ ansible_facts }}, none of the distribution-related variables are defined, so they are not injected into the global ansible_ namespace either.

Resolved by manual verification in /etc/os-release and commenting out the OS check in the playbook.
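The manual verification can also be scripted. A minimal sketch, assuming that "supported" means what the playbook condition checks (RedHat family, major version 8), and treating `rhel` in `ID` or `ID_LIKE` of /etc/os-release as the RedHat family; `parse_os_release` and `is_supported_host` are hypothetical helpers, not part of the playbook:

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release KEY=value lines into a dict with quotes removed."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            # shlex handles both quoted ("8.3") and unquoted (8.3) values
            value = " ".join(shlex.split(value))
        except ValueError:
            value = value.strip("\"'")  # unbalanced quotes: best effort
        info[key] = value
    return info


def is_supported_host(info: dict) -> bool:
    """Assumed rule: RedHat family (rhel in ID/ID_LIKE) with major version 8."""
    family = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    major = info.get("VERSION_ID", "").split(".")[0]
    return "rhel" in family and major == "8"
```

Feeding it the /etc/os-release shown earlier (ID="rhel", VERSION_ID="8.3") yields a supported host, matching the manual check.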

Could not detect which major revision of yum is in use

Error

TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": ["Could not detect which major revision of yum is in use, which is required to determine module backend.", "You can manually specify use_backend to tell the module whether to use the yum (yum3) or dnf (yum4) backend})"]}

Manually checked the yum version:

yum --version
Failed to set locale, defaulting to C.UTF-8
4.2.23
Installed: dnf-0:4.2.23-4.el8.noarch at Tue Feb  9 15:46:00 2021
Since yum here is really DNF 4 (yum4), the backend can be set explicitly:

--- a/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml
+++ b/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml
 - name: "Install EPEL"
   yum:
     name:
       - https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
     disable_gpg_check: yes
     state: present
+    use_backend: yum4
   tags:
     - build_prefix
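Rather than hard-coding the backend, it could in principle be derived from the `yum --version` output shown above. A minimal sketch, assuming that a major version of 4 or higher means yum is a front end to DNF; `yum_backend` is a hypothetical helper, and its return values are the `use_backend` choices of Ansible's yum module (`yum` for yum3, `yum4` for DNF):

```python
import re


def yum_backend(version_output: str) -> str:
    """Map `yum --version` output to an Ansible yum `use_backend` value.

    Assumption: major version >= 4 means DNF ("yum4"); older is yum3 ("yum").
    """
    for line in version_output.splitlines():
        # The version is the first line that starts with "<major>.<minor>";
        # noise such as "Failed to set locale, ..." is skipped.
        match = re.match(r"^\s*(\d+)\.\d+", line)
        if match:
            return "yum4" if int(match.group(1)) >= 4 else "yum"
    raise ValueError("no version number found in yum --version output")
```

For the output above (4.2.23) this returns "yum4", the value used in the patch.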

python3-dnf package required

Error

TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Could not import the dnf python module. Please install `python3-dnf` package.", "results": []}

Note, however, that python3-dnf is already installed:

sudo yum install python3-dnf
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 2:29:36 ago on Wed Feb 24 14:07:03 2021.
Package python3-dnf-4.2.23-4.el8.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!

However, the package is still not found. NOT SOLVED!

update ansible 2.7 -> 3

Maybe the old Ansible version does not detect packages correctly?

python3-dnf package required

Error

TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:37:06 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:37:06 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}

nope.

sudo pip3 install dnf

try again

TASK [compatibility_layer : Install EPEL] ****************************************************************
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:39:52 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:39:52 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}

Manually testing the import:

/usr/bin/python3
Python 3.8.3 (default, Aug 18 2020, 13:06:44)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dnf
/usr/local/lib/python3.8/site-packages/dnf.py:15: UserWarning: The DNF Python API is not currently available via PyPI.

Please install it with your distro package manager (typically called
'python2-dnf' or 'python3-dnf'), and ensure that any virtual environments
needing the API are configured to be able to see the system site packages
directory.

  warnings.warn(warning_msg)
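The warning shows that `import dnf` in this Python 3.8 picks up the placeholder dnf.py that `sudo pip3 install dnf` put in /usr/local, not the real DNF bindings. A minimal sketch for checking which interpreters can import the real bindings, assuming that importing the `dnf.cli` submodule is a reasonable test (the placeholder is a single module, so a submodule import fails) and that /usr/libexec/platform-python is RHEL 8's platform Python; `can_import` is a hypothetical helper:

```python
import subprocess
import sys


def can_import(interpreter: str, module: str, timeout: float = 30.0) -> bool:
    """Return True if `interpreter` can import `module` (e.g. "dnf.cli")."""
    try:
        result = subprocess.run(
            [interpreter, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter missing, not executable, or hanging.
        return False
    return result.returncode == 0


# Assumed interpreter paths on a RHEL 8 host; adjust as needed.
CANDIDATES = ["/usr/libexec/platform-python", "/usr/bin/python3", sys.executable]

if __name__ == "__main__":
    for interp in CANDIDATES:
        print(f"{interp}: dnf.cli importable = {can_import(interp, 'dnf.cli')}")
```

Run on the target host, this shows which interpreter ansible_python_interpreter should point at, or whether it is better left unset.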

Really strange. Try reinstalling the already installed package...

sudo dnf reinstall python3-dnf
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 2:43:07 ago on Wed Feb 24 14:07:03 2021.
Dependencies resolved.
==========================================================================================================
 Package               Architecture     Version                   Repository                         Size
==========================================================================================================
Reinstalling:
 python3-dnf           noarch           4.2.23-4.el8              rhel-8-baseos-rhui-rpms           526 k

Transaction Summary
==========================================================================================================

Total download size: 526 k
Installed size: 1.8 M
Is this ok [y/N]: y
Downloading Packages:
python3-dnf-4.2.23-4.el8.noarch.rpm                                       4.7 MB/s | 526 kB     00:00----
----------------------------------------------------------------------------------------------------------
Total                                                                     2.3 MB/s | 526 kB     00:00-----
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                  1/1-
  Reinstalling     : python3-dnf-4.2.23-4.el8.noarch                                                  1/2-
  Cleanup          : python3-dnf-4.2.23-4.el8.noarch                                                  2/2-
  Running scriptlet: python3-dnf-4.2.23-4.el8.noarch                                                  2/2-
  Verifying        : python3-dnf-4.2.23-4.el8.noarch                                                  1/2-
  Verifying        : python3-dnf-4.2.23-4.el8.noarch                                                  2/2-

Reinstalled:
  python3-dnf-4.2.23-4.el8.noarch-------------------------------------------------------------------------

Complete!

Still the same error:

TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:43:32 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:43:32 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}

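Solved by removing ansible_python_interpreter from the hosts file: python3-dnf is built for RHEL 8's platform Python (3.6), not for the Python 3.8 at /usr/bin/python3, and without the override Ansible's interpreter discovery uses /usr/libexec/platform-python on RHEL 8.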

RUN STARTS: Wed Feb 24 17:53:07 CET 2021
RUN BREAKS: Wed Feb 24 20:54:00 CET 2021 (very approximate time)

Error lmod masked by missing keyword

TASK [compatibility_layer : Install package set ['eessi-2021.02-linux-aarch64']]
failed: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com] (item=eessi-2021.02-linux-aarch64) => {"ansible_loop_var": "item", "changed": false, "cmd": ["/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/usr/bin/emerge", "--noreplace", "--ask=n", "@eessi-2021.02-linux-aarch64"], "item": "eessi-2021.02-linux-aarch64", "msg": "Packages not installed.", "rc": 1, "stderr": "\n!!! All ebuilds that could satisfy \"sys-cluster/lmod\" have been masked.\n!!! One of the following masked packages is required to complete your request:\n- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)\n- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)\n\n(dependency required by \"@eessi-2021.02-linux-aarch64\" [argument])\nFor more information, see the MASKED PACKAGES section in the emerge\nman page or refer to the Gentoo Handbook.\n\n", "stderr_lines": ["", "!!! All ebuilds that could satisfy \"sys-cluster/lmod\" have been masked.", "!!! One of the following masked packages is required to complete your request:", "- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)", "- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)", "", "(dependency required by \"@eessi-2021.02-linux-aarch64\" [argument])", "For more information, see the MASKED PACKAGES section in the emerge", "man page or refer to the Gentoo Handbook.", ""], "stdout": "Calculating dependencies  \n * IMPORTANT: 4 news items need reading for repository 'gentoo'.\n * Use eselect news read to view new items.\n\n... done!\n", "stdout_lines": ["Calculating dependencies  ", " * IMPORTANT: 4 news items need reading for repository 'gentoo'.", " * Use eselect news read to view new items.", "", "... done!"]}

That is:

!!! All ebuilds that could satisfy "sys-cluster/lmod" have been masked.
!!! One of the following masked packages is required to complete your request:
- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)
- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)

(dependency required by "@eessi-2021.02-linux-aarch64" [argument])

The package should have an arm64 or ~arm64 keyword (Gentoo's name for the aarch64 architecture), but it doesn't:

/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/usr/bin/equery meta lmod
 * sys-cluster/lmod [gentoo]
Maintainer:  [email protected] (Aisha Tammy)
Maintainer:  [email protected] (Gentoo Science Project)
Upstream:    Remote-ID:   TACC/Lmod ID: github
Homepage:    https://lmod.readthedocs.io/en/latest
Homepage:    https://github.com/TACC/Lmod
Location:    /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/var/db/repos/gentoo/sys-cluster/lmod
Keywords:    8.4.20:0: ~amd64 ~x86
Keywords:    9999:0:
License:     MIT

Suggestion by Bob:

EPREFIX=/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64
vi $EPREFIX/etc/portage/package.accept_keywords
+sys-cluster/lmod ~amd64
+dev-lua/luaposix ~amd64
+dev-lua/lua-bit32 amd64

RUN RESTART: Thu Feb 25 16:35:24 CET 2021

RUN FINISHED: Thu Feb 25 16:59:34 CET 2021
SUCCESS.

No bzip2 in overlay so `man` does not work

[user01@login1 ~]$ man echo
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
[user01@login1 ~]$ 

Duplicate repository name entries.

Very low priority, but worth noting anyway:

WARNING: One or more repositories have been ignored due to duplicate
  profiles/repo_name entries:

  /cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/, gentoo, /cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/var/db/repos/gentoo overrides
    /cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/tmp/var/db/repos/gentoo

  All profiles/repo_name entries must be unique in order to avoid having
  duplicates ignored. Set PORTAGE_REPO_DUPLICATE_WARN="0" in
  /etc/portage/make.conf if you would like to disable this warning.
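To see which directories share a name, the profiles/repo_name files mentioned in the warning can be compared directly. A minimal sketch, assuming the candidate repository directories are known (e.g. the two paths in the warning); `duplicate_repo_names` is a hypothetical helper:

```python
import os
from collections import defaultdict


def duplicate_repo_names(repo_dirs: list) -> dict:
    """Map each repo_name used by more than one repository to its directories.

    Reads <repo>/profiles/repo_name; directories without that file are skipped.
    """
    by_name = defaultdict(list)
    for repo in repo_dirs:
        name_file = os.path.join(repo, "profiles", "repo_name")
        try:
            with open(name_file, encoding="utf-8") as fh:
                name = fh.readline().strip()
        except OSError:
            continue  # no readable repo_name file
        if name:
            by_name[name].append(repo)
    return {name: dirs for name, dirs in by_name.items() if len(dirs) > 1}
```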

bash: eval: line 32: syntax error near unexpected token `newline'

When entering the Gentoo Prefix login shell on Ubuntu, the following error message appears:

bash: eval: line 32: syntax error near unexpected token `newline'
bash: eval: line 32: `Usage: lesspipe '

This happens because Ubuntu's default .bashrc contains the following lines:

# make less more friendly for non-text input files, see lesspipe(1)

[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"

Ubuntu's lesspipe, run without arguments, prints environment variable settings such as LESSOPEN for eval to pick up. Inside the prefix, however, lesspipe resolves to Gentoo's version, which does not support running without arguments; it prints a usage message instead, and eval fails to parse that. The solution is to call Ubuntu's lesspipe explicitly by its full path, as follows (or simply comment out this line):

[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh /usr/bin/lesspipe)"

Gentoo Prefix issue?

Use git instead of rsync for gentoo repo

WSL2: WSL_INTEROP within Gentoo - ERROR: UtilConnectToInteropServer:300: connect failed 2

After running startprefix, the following environment settings are gone, so interoperability with Windows 10 is lost and Windows applications can no longer be run from the Linux environment:

  • The default PATH is no longer available
  • WSL_INTEROP is not available

Is it possible to add these environment settings to the Gentoo environment?

Workaround:

export PATH="old-path:${PATH}"
export WSL_INTEROP=/run/WSL/XXXX_interop  # look up the console port XXXX with: pstree -p

Windows 10 commands / applications are working again.
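Finding a value for XXXX can be partly automated. A minimal sketch, assuming the sockets follow the /run/WSL/<number>_interop naming used in the workaround; `find_interop_sockets` is a hypothetical helper that only lists candidates, so with several sessions open the right one still has to be picked using pstree -p:

```python
import glob
import os


def find_interop_sockets(run_dir: str = "/run/WSL") -> list:
    """List candidate WSL interop sockets (<run_dir>/<number>_interop), sorted by number."""
    candidates = []
    for path in glob.glob(os.path.join(run_dir, "*_interop")):
        prefix = os.path.basename(path)[: -len("_interop")]
        if prefix.isdigit():  # ignore names that do not start with a number
            candidates.append((int(prefix), path))
    return [path for _, path in sorted(candidates)]
```

Each returned path is a candidate value for WSL_INTEROP.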

Can we fix this within the startprefix?

Add `/opt/eessi/2021.03/lib` to `prefix_user_defined_trusted_dir`

prefix_user_defined_trusted_dir currently only accepts /opt/eessi/lib, which rules out release-specific changes (for example, overriding the MPI library with another ABI-compatible one). We should also allow /opt/eessi/2021.03/lib to address this.

I guess the option should then become prefix_user_defined_trusted_dirs, with some documentation on how the list is separated.
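A list-valued prefix_user_defined_trusted_dirs would have to be turned into a single string at some point. A minimal sketch, in which the ':' separator is purely an assumption (which separator glibc's user-defined-trusted-dirs accepts is exactly what the docs would need to state) and `trusted_dirs_value` is a hypothetical helper:

```python
def trusted_dirs_value(dirs: list, sep: str = ":") -> str:
    """Join trusted library dirs into one string, dropping duplicates.

    Assumptions: every entry must be an absolute path, the given order is
    kept (e.g. /opt/eessi/2021.03/lib before /opt/eessi/lib), and the
    separator is ':'.
    """
    result = []
    for d in dirs:
        d = d.rstrip("/") or "/"  # treat "/opt/eessi/lib/" like "/opt/eessi/lib"
        if not d.startswith("/"):
            raise ValueError(f"not an absolute path: {d!r}")
        if d not in result:
            result.append(d)
    return sep.join(result)
```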

Magic Castle triggering usage of `dbus-launch` from compatibility layer

One of the things Magic Castle does is allow you to start a desktop session via JupyterHub. This triggers the command

dbus-launch websockify -v --web /opt/jupyterhub/lib64/python3.6/site-packages/jupyter_desktop/share/web/noVNC-1.1.0 --heartbeat 30 --unix-target /tmp/tmpmkuclos5/vnc-socket 38569 -- /bin/sh -c cd /home/ocaisa && vncserver -verbose -xstartup /opt/jupyterhub/lib64/python3.6/site-packages/jupyter_desktop/share/xstartup -SecurityTypes None -rfbunixpath /tmp/tmpmkuclos5/vnc-socket -fg -nolisten tcp

but with our compatibility layer this command fails, spawning a flood of /cvmfs/pilot.eessi-hpc.org/2021.03/compat/linux/x86_64/usr/bin/dbus-daemon processes and producing lots of errors in the logs about

Apr  8 14:37:15 gpu-node1 dbus-daemon[11779]: Cannot initialize inotify

Installation playbook should make a portage env file for glibc

This would ensure that we keep the user-defined-trusted-dirs setting whenever glibc has to be reinstalled outside of the playbook, e.g. for security updates. The file could be generated by the playbook (instead of keeping a static file in our overlay, as we did in the past).

@amadio thanks for the suggestion!

Don't remove the gentoo overlay dir if it's already using git

I just noticed that the playbook always removes the gentoo repository directory in the compat layer. Though this is harmless, it is only needed on the first run, when we actually switch from rsync to git. On subsequent runs, it should check whether the switch has already been made.
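The check could be as simple as looking for a .git entry in the repository directory (e.g. $EPREFIX/var/db/repos/gentoo). A minimal sketch of that logic, assuming a top-level .git means the switch to git has already been made; `needs_git_conversion` is a hypothetical helper, and in the playbook itself the same test would more naturally be an Ansible stat task guarding the removal:

```python
import os


def needs_git_conversion(repo_dir: str) -> bool:
    """Return True if `repo_dir` still has to be replaced by a git clone.

    Assumption: a repository that already uses git has a `.git` entry at its
    top level; anything else (including a missing directory) needs conversion.
    """
    return not os.path.exists(os.path.join(repo_dir, ".git"))
```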

bootstrapping Prefix on macOS Catalina (10.15.5)

I gave that a quick shot, but didn't get very far:

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.5
BuildVersion:	19F101
$ ./bootstrap-prefix.sh
...

I'm excited!  Seems we can finally do something productive now.

Ok, I'm going to do a little bit of guesswork here.  Thing is, your
machine appears to be identified by CHOST=x86_64-apple-darwin19.

Great!  You appear to have a compiler in your PATH

You don't have /usr/include, this thwarts me to build stuff.
Please execute:
  xcode-select --install
or install /usr/include in another way and try running me again.

$ xcode-select --install
xcode-select: error: command line tools are already installed, use "Software Update" to install updates

It's a known issue in Gentoo Prefix, see https://bugs.gentoo.org/show_bug.cgi?id=730476 for details.

installation procedure for /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020

@bedroge @peterstol I think it makes sense to document how the Gentoo Prefix installation we now have at /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020 came to be: which procedure was followed, which "version" of the bootstrap script was used (the one in #4, I presume), which customizations were made compared to upstream, which problems we ran into (ideally including links to upstream bug reports or fixes), etc.

Also, if any additional commands were run (like "emerge ..."), we should document them somewhere for future reference.
That can be done in here, by adding comments to this issue...

When we do another Prefix installation later (we will probably start over at some point during the pilot, but also for production later), we can use this issue as a reference of lessons learned...

Validate the state of the compatibility layer.

We should probably have something that validates the compatibility layer to ensure that it is ready to accept software. We can check that certain pieces of software are installed, but is there also software we want to ensure is not present?

This script could perhaps also be extended to verify a new version of the compatibility layer before release, and to strip caches and such?
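A validation script could start from two lists: paths that must exist in the compat layer and paths that must not. A minimal sketch under that assumption; `validate_compat_layer` is a hypothetical helper, and which paths belong in each list (e.g. usr/bin/emerge as required, leftover build directories such as var/tmp/portage as forbidden) is still to be decided:

```python
import os


def validate_compat_layer(eprefix: str, required: list, forbidden: list) -> list:
    """Return a list of problems found under `eprefix` (empty list means OK).

    `required` and `forbidden` are paths relative to the prefix root; a
    leading "/" is ignored so that os.path.join keeps `eprefix`.
    """
    problems = []
    for rel in required:
        if not os.path.exists(os.path.join(eprefix, rel.lstrip("/"))):
            problems.append(f"missing: {rel}")
    for rel in forbidden:
        # lexists also catches dangling symlinks that should not be there
        if os.path.lexists(os.path.join(eprefix, rel.lstrip("/"))):
            problems.append(f"unexpected: {rel}")
    return problems
```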
