eessi / compatibility-layer
Compatibility layer of the EESSI project
Home Page: https://eessi.github.io/docs/compatibility_layer
License: GNU General Public License v2.0
Ansible has a portage module that can be used to automate package installation.
https://docs.ansible.com/ansible/latest/modules/portage_module.html
The /lib and /lib64 directories in the compat layer are handled differently on x86_64 vs aarch64:
$ ls -ld /cvmfs/pilot.eessi-hpc.org/2020.10/compat/{aarch64,x86_64}/{lib,lib64}
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov 3 20:35 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/lib
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov 3 20:39 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/lib64
lrwxrwxrwx 1 ec2-user ec2-user 5 Oct 30 00:32 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/x86_64/lib -> lib64
drwxr-xr-x 2 ec2-user ec2-user 4096 Nov 3 15:20 /cvmfs/pilot.eessi-hpc.org/2020.10/compat/x86_64/lib64
Likewise for /usr/{lib,lib64}.
This is a problem, because some tools (like CMake, cf. easybuilders/easybuild-easyblocks#2248) only look in lib, and then fail to find libraries located in lib64 if lib is not a symlink to it (as in compat/aarch64)...
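The layout difference described above can be checked mechanically. The following is a minimal sketch, assuming a POSIX-like shell with readlink available; the function name check_lib_symlink is made up for illustration and is not part of the EESSI tooling.

```shell
# check_lib_symlink PREFIX
# Hypothetical helper: reports whether PREFIX/lib and PREFIX/usr/lib are
# symlinks to lib64 (the layout seen in the x86_64 compat layer).
# Returns 0 if both are, 1 otherwise.
check_lib_symlink() {
    prefix=$1
    rc=0
    for dir in "$prefix/lib" "$prefix/usr/lib"; do
        if [ -L "$dir" ] && [ "$(readlink "$dir")" = "lib64" ]; then
            echo "OK: $dir -> lib64"
        else
            echo "MISMATCH: $dir is not a symlink to lib64"
            rc=1
        fi
    done
    return $rc
}
```

Calling it as check_lib_symlink /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64 would flag the aarch64 layout shown in the listing, where lib is a real directory.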
Relevant links from Gentoo Prefix documentation:
We now add /opt/eessi/lib as (primary) search path for libraries, by baking it into glibc (see EESSI/gentoo-overlay#8).
@amadio pointed out to us that a better alternative may be simply adding these paths to $EPREFIX/etc/ld.so.conf instead.
@bartoldeman: Is there a particular reason you didn't take that approach for the ComputeCanada software stack?
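For reference, the ld.so.conf alternative could look roughly like the sketch below. It assumes the usual one-directory-per-line ld.so.conf format and that ldconfig from the Prefix is run afterwards to refresh the cache; add_ld_search_path is a hypothetical helper name, and whether an appended entry gets the priority that the glibc patch provides has not been verified here.

```shell
# add_ld_search_path CONF_FILE DIR
# Hypothetical helper: appends DIR to an ld.so.conf-style file,
# unless an identical line is already present.
add_ld_search_path() {
    conf=$1
    dir=$2
    # -x matches whole lines only, -F treats DIR as a fixed string
    if ! grep -qxF -- "$dir" "$conf" 2>/dev/null; then
        printf '%s\n' "$dir" >> "$conf"
    fi
}
```

A call such as add_ld_search_path "$EPREFIX/etc/ld.so.conf" /opt/eessi/lib is idempotent, so it can be repeated safely from a provisioning script.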
Currently there is no documentation on why the project has chosen Gentoo Prefix over Nix.
This would be useful information to retain.
A few notes on the topic from a Slack conversation with Bart Oldeman at Compute Canada:
- nix-shell, but that would require users to learn how to use it. For example, see NixOS/nixpkgs#44144.
- $EPREFIX/etc/ld.so.cache with Gentoo Prefix might improve overall performance thanks to the lower overhead for the dynamic linker/loader $EPREFIX/lib/ld-linux-x86-64.so.2 to locate the required dynamic libraries.
A fairly trivial test:
terjekv@minbar:~/projects/eessi$ ls -lad ~/Gentoo
lrwxr-xr-x 1 terjekv 11 des 4 18:08 /Users/terjekv/Gentoo -> Gentoo-11.0/
terjekv@minbar:~/projects/eessi$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.7
BuildVersion: 19H2
terjekv@minbar:~/projects/eessi$ git clone git@github.com:EESSI/compatibility-layer.git
Cloning into 'compatibility-layer'...
dyld: lazy symbol binding failed: Symbol not found: _strtonum
Referenced from: /Users/terjekv/Gentoo/usr/bin/ssh (which was built for Mac OS X 11.0)
Expected in: /usr/lib/libSystem.B.dylib
dyld: Symbol not found: _strtonum
Referenced from: /Users/terjekv/Gentoo/usr/bin/ssh (which was built for Mac OS X 11.0)
Expected in: /usr/lib/libSystem.B.dylib
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
If I start with a fresh Gentoo Prefix installation and use the EESSI repository to install Lmod, I get:
$ emerge app-admin/Lmod
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Emerging (1 of 5) dev-lua/bit32-5.3.0::eessi
* Fetching files in the background.
* To view fetch progress, run in another terminal:
* tail -f /opt/gentoo-prefix/var/log/emerge-fetch.log
* bitlib-5.3.0.tar.gz BLAKE2B SHA512 size ;-) ... [ ok ]
>>> Unpacking source...
>>> Unpacking bitlib-5.3.0.tar.gz to /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work
>>> Source unpacked in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work
>>> Preparing source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
>>> Source prepared.
>>> Configuring source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
>>> Source configured.
>>> Compiling source in /opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0 ...
lbitlib.c:10:10: fatal error: lua.h: No such file or directory
10 | #include "lua.h"
| ^~~~~~~
compilation terminated.
* ERROR: dev-lua/bit32-5.3.0::eessi failed (compile phase):
* (no error message)
*
* Call stack:
* ebuild.sh, line 125: Called src_compile
* environment, line 809: Called die
* The specific snippet of code:
* $(tc-getCC) ${CFLAGS} -fPIC -Ic-api -c -o ${MY_PN}.o ${MY_PN}.c || die;
*
* If you need support, post the output of `emerge --info '=dev-lua/bit32-5.3.0::eessi'`,
* the complete build log and the output of `emerge -pqv '=dev-lua/bit32-5.3.0::eessi'`.
* The complete build log is located at '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/temp/build.log'.
* The ebuild environment file is located at '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/temp/environment'.
* Working directory: '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0'
* S: '/opt/gentoo-prefix/var/tmp/portage/dev-lua/bit32-5.3.0/work/lua-compat-5.2-bitlib-5.3.0
So I first needed to run the following manually:
$ emerge lua tcl
This is also reflected in the Ansible roles being developed.
EasyBuild requires PyYAML for the easystack file support, so it would be nice to have this installed in the compat layer.
The Gentoo Prefix user prompt is wrong: the user name is not taken from the local /etc/passwd.
dell@Innopad-01:~$ /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/startprefix
Entering Gentoo Prefix /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020
henkjan@Innopad-01:~$
Here the user name comes from /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/etc/passwd.
I have no name!@Innopad-01:/mnt/c/Users/Innopad$
Here the user is not listed in /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020/etc/passwd, so no name can be shown.
Can we use the local /etc/passwd file?
All our playbooks should be ready to support this, but the Prefix installation still needs some manual intervention to work around the issues described here:
https://bugs.gentoo.org/755551
Furthermore, the Lmod ebuild needs a ppc64 keyword:
https://bugs.gentoo.org/773313
I noticed that we don't do this at the moment, which means that tools will now pick up the hosts file from the Prefix. I just did some tests, and this could lead to strange issues, e.g.:
$ /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/startprefix
Entering Gentoo Prefix /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64
$ hostname
dh-node12
$ which curl
/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/usr/bin/curl
$ curl http://dh-node12:9100
curl: (6) Couldn't resolve host 'dh-node12'
$ /usr/bin/curl http://dh-node12:9100
<html>
<head><title>Node Exporter</title></head>
<body>
<h1>Node Exporter</h1>
<p><a href="/metrics">Metrics</a></p>
</body>
</html>
$ strace curl http://dh-node12:9100 2>&1 | grep hosts
openat(AT_FDCWD, "/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64/etc/hosts", O_RDONLY|O_CLOEXEC) = 5
So I think we should change this by uncommenting:
https://github.com/EESSI/compatibility-layer/blob/main/ansible/playbooks/roles/compatibility_layer/defaults/main.yml#L72
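The contents of the referenced defaults/main.yml line are not shown here, so the following only illustrates the general idea of pointing a file in the Prefix at its host counterpart. It is a sketch under assumptions: link_to_host is a hypothetical helper, the optional third argument exists only so the function can be tried against a scratch directory, and ln is assumed to support -n (GNU and BSD versions do).

```shell
# link_to_host PREFIX ABS_PATH [HOST_ROOT]
# Hypothetical helper: replaces PREFIX/ABS_PATH with a symlink to the same
# absolute path on the host, e.g. /etc/hosts.
# HOST_ROOT is empty by default (the real host root); it is only meant
# for experiments in a scratch directory.
link_to_host() {
    prefix=$1
    path=$2
    host_root=${3:-}
    target="$host_root$path"
    # -s: symbolic link, -f: replace an existing file,
    # -n: do not descend into an existing symlink to a directory
    ln -sfn "$target" "$prefix$path"
}
```

With this in place, link_to_host /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/x86_64 /etc/hosts would make tools inside the Prefix read the host's hosts file, which is the behaviour the strace output above suggests is missing.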
Slightly related to #47: we need extra packages in the compatibility layer in order to make UCX support InfiniBand (and others). All default fabric-related Gentoo packages can be found here: https://packages.gentoo.org/categories/sys-fabric
Which ones do we want/need?
With libibverbs and librdmacm installed, and passing --with-rdmacm=${EPREFIX}/usr to the configure script of UCX, I get this:
checking cuda.h usability... no
checking cuda.h presence... no
checking for cuda.h... no
checking cuda_runtime.h usability... no
checking cuda_runtime.h presence... no
checking for cuda_runtime.h... no
configure: WARNING: CUDA not found
configure: ROCm path was not specified. Guessing ...
checking hsa.h usability... no
checking hsa.h presence... no
checking for hsa.h... no
configure: WARNING: ROCm not found
checking hip_runtime.h usability... no
checking hip_runtime.h presence... no
checking for hip_runtime.h... no
configure: WARNING: HIP Runtime not found
checking gdrapi.h usability... no
checking gdrapi.h presence... no
checking for gdrapi.h... no
configure: WARNING: GDR_COPY not found
configure: Compiling with verbs support from /usr
checking infiniband/verbs.h usability... yes
checking infiniband/verbs.h presence... yes
checking for infiniband/verbs.h... yes
checking for ibv_get_device_list in -libverbs... yes
checking whether ibv_wc_status_str is declared... yes
checking whether ibv_event_type_str is declared... yes
checking whether ibv_query_gid is declared... yes
checking whether ibv_get_device_name is declared... yes
checking whether ibv_create_srq is declared... yes
checking whether ibv_get_async_event is declared... yes
checking infiniband/verbs_exp.h usability... no
checking infiniband/verbs_exp.h presence... no
checking for infiniband/verbs_exp.h... no
checking for struct ibv_exp_device_attr.exp_device_cap_flags... no
checking for struct ibv_exp_device_attr.odp_caps... no
checking for struct ibv_exp_device_attr.odp_caps.per_transport_caps.dc_odp_caps... no
checking for struct ibv_exp_device_attr.odp_mr_max_size... no
checking for struct ibv_exp_qp_init_attr.max_inl_recv... no
checking for struct ibv_async_event.element.dct... no
checking whether IBV_CREATE_CQ_ATTR_IGNORE_OVERRUN is declared... no
checking whether IBV_EXP_CQ_IGNORE_OVERRUN is declared... no
configure: Checking for legacy bare-metal support
checking infiniband/mlx5_hw.h usability... no
checking infiniband/mlx5_hw.h presence... no
checking for infiniband/mlx5_hw.h... no
configure: Checking for DV bare-metal support
checking for mlx5dv_query_device in -lmlx5-rdmav2... no
checking for mlx5dv_query_device in -lmlx5... no
checking whether ibv_alloc_td is declared... no
checking whether MLX5DV_CONTEXT_FLAGS_DEVX is declared... no
checking whether IBV_LINK_LAYER_INFINIBAND is declared... yes
checking whether IBV_LINK_LAYER_ETHERNET is declared... yes
checking whether IBV_EVENT_GID_CHANGE is declared... yes
checking whether ibv_create_qp_ex is declared... yes
checking whether ibv_create_srq_ex is declared... yes
checking whether ibv_query_device_ex is declared... no
checking whether IBV_EXP_ACCESS_ALLOCATE_MR is declared... no
checking whether IBV_EXP_ACCESS_ON_DEMAND is declared... no
checking whether IBV_EXP_DEVICE_MR_ALLOCATE is declared... no
checking whether IBV_EXP_WR_NOP is declared... no
checking whether IBV_EXP_DEVICE_DC_TRANSPORT is declared... no
checking whether IBV_EXP_ATOMIC_HCA_REPLY_BE is declared... no
checking whether IBV_EXP_PREFETCH_WRITE_ACCESS is declared... no
checking whether IBV_EXP_QP_OOO_RW_DATA_PLACEMENT is declared... no
checking whether IBV_EXP_DCT_OOO_RW_DATA_PLACEMENT is declared... no
checking whether IBV_EXP_CQ_MODERATION is declared... no
checking whether IBV_EXP_DEVICE_ATTR_PCI_ATOMIC_CAPS is declared... no
checking whether ibv_exp_reg_mr is declared... no
checking whether ibv_exp_create_qp is declared... no
checking whether ibv_exp_prefetch_mr is declared... no
checking whether ibv_exp_create_srq is declared... no
checking whether ibv_exp_setenv is declared... no
checking whether ibv_exp_query_gid_attr is declared... no
checking whether ibv_exp_query_device is declared... no
checking whether ibv_exp_post_send is declared... no
checking whether IBV_EXP_WR_EXT_MASKED_ATOMIC_CMP_AND_SWP is declared... no
checking whether IBV_EXP_WR_EXT_MASKED_ATOMIC_FETCH_AND_ADD is declared... no
checking whether IBV_EXP_QP_INIT_ATTR_ATOMICS_ARG is declared... no
checking whether IBV_EXP_SEND_EXT_ATOMIC_INLINE is declared... no
checking whether IBV_EXP_DEVICE_ATTR_RESERVED_2 is declared... no
checking whether IBV_EXP_MR_INDIRECT_KLMS is declared... no
checking whether IBV_EXP_QP_CREATE_UMR is declared... no
checking for struct ibv_exp_qp_init_attr.umr_caps... no
checking whether IBV_EXP_MR_FIXED_BUFFER_SIZE is declared... no
configure: WARNING: Compiling without extended atomics support
checking for struct ibv_exp_masked_atomic_params.masked_log_atomic_arg_sizes_network_endianness... no
checking whether IBV_EXP_ODP_SUPPORT_IMPLICIT is declared... no
checking whether IBV_EXP_ACCESS_ON_DEMAND is declared... (cached) no
checking whether IBV_ACCESS_ON_DEMAND is declared... no
checking whether ibv_exp_prefetch_mr is declared... (cached) no
checking whether ibv_advise_mr is declared... no
checking for struct mlx5_wqe_av.base... no
checking for struct mlx5_grh_av.rmac... no
checking for struct mlx5_cqe64.ib_stride_index... no
checking whether IBV_EXP_QPT_DC_INI is declared... no
checking infiniband/tm_types.h usability... no
checking infiniband/tm_types.h presence... no
checking for infiniband/tm_types.h... no
checking for struct ibv_exp_tmh.tag... no
checking for struct ibv_tmh.tag... no
checking whether ibv_exp_alloc_dm is declared... no
checking whether ibv_alloc_dm is declared... no
checking whether ibv_cmd_modify_qp is declared... yes
configure: Checking OFED valgrind libs /usr/lib/mlnx_ofed/valgrind
checking for ib_cm_send_req in -libcm... no
configure: WARNING: CM support not found, skipping
checking /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h usability... yes
checking /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h presence... yes
checking for /cvmfs/pilot.eessi-hpc.org/2020.09/compat/x86_64/usr/include/rdma/rdma_cma.h... yes
checking for rdma_create_id in -lrdmacm... yes
checking whether rdma_establish is declared... no
checking whether rdma_init_qp_attr is declared... no
checking sys/uio.h usability... yes
checking sys/uio.h presence... yes
checking for sys/uio.h... yes
checking for process_vm_readv... yes
configure: KNEM path was not found, guessing ...
Package knem was not found in the pkg-config search path.
Perhaps you should add the directory containing `knem.pc'
to the PKG_CONFIG_PATH environment variable
Package 'knem', required by 'virtual:world', not found
checking whether KNEM_CMD_GET_INFO is declared... no
configure: WARNING: KNEM requested but required file (knem_io.h) could not be found
configure: XPMEM - failed to open the requested location (guess), guessing ...
checking cray-ugni... no
checking compiler flag -fno-exceptions... yes
checking compiler flag -fno-rtti... yes
checking compiler flag --no_exceptions... no
checking compiler flag -fno-tree-vectorize... yes
checking compiler flag --diag_suppress 236... no
checking that generated files are newer than configure... done
configure: =========================================================
configure: UCX build configuration:
configure: Build prefix: /home/bob/ucx/inst
configure: Preprocessor flags: -DCPU_FLAGS="|avx" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure: C compiler: x86_64-pc-linux-gnu-gcc -O3 -g -Wall -Werror -mavx
configure: C++ compiler: x86_64-pc-linux-gnu-g++ -O3 -g -Wall -Werror -mavx
configure: Multi-thread: enabled
configure: MPI tests: disabled
configure: Devel headers: no
configure: Bindings: < >
configure: UCT modules: < ib rdmacm cma >
configure: CUDA modules: < >
configure: ROCM modules: < >
configure: IB modules: < >
configure: UCM modules: < >
configure: Perf modules: < >
configure: =========================================================
How do we apply security updates to the compatibility layer in between releases, and how often?
What is the impact on our design, which aims to keep all versions as fixed as possible to prevent issues higher up the stack?
Gentoo provides glsa-check (https://wiki.gentoo.org/wiki/Security_Handbook/Staying_up-to-date), which works well, but requires a sync of the repo.
emerge --sync
glsa-check -t all
This system is affected by the following GLSAs:
202009-01
Or, more verbosely:
glsa-check -p $(glsa-check -t all)
Apply security updates:
glsa-check -f $(glsa-check -t all)
This workflow might be added to our Ansible scripts.
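Until that happens, the three steps above could be wrapped in a small shell function. This is a sketch only: apply_glsa_fixes and the GLSA_CHECK and EMERGE override variables are inventions of this example (mainly to allow dry runs), and it assumes that glsa-check -t all prints the affected GLSA IDs on stdout, as the command substitutions above rely on.

```shell
# apply_glsa_fixes
# Sketch of the workflow above: sync, list affected GLSAs, apply fixes.
# GLSA_CHECK and EMERGE may be set to alternative commands for dry runs;
# they default to the real tools.
apply_glsa_fixes() {
    glsa_check=${GLSA_CHECK:-glsa-check}
    emerge_cmd=${EMERGE:-emerge}

    "$emerge_cmd" --sync || return 1

    # Only the output is used; the exit status of the test run is ignored.
    affected=$("$glsa_check" -t all 2>/dev/null) || true
    if [ -z "$affected" ]; then
        echo "No GLSAs affect this system"
        return 0
    fi

    echo "Affected by: $affected"
    # Intentionally unquoted so that each ID becomes a separate argument.
    "$glsa_check" -f $affected
}
```

Keeping the sync inside the function reflects the note above that glsa-check needs an up-to-date repository; for a compat layer that should stay as fixed as possible, that sync is exactly the step that needs a policy decision.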
Lmod and archspec are not supported on aarch64 by default.
emerge --ask --autounmask=y --autounmask-write =app-admin/Lmod-8.3.17
emerge --ask --autounmask=y --autounmask-write =sys-apps/archspec-0.1.1
dispatch-conf
It seems like the CernVM-FS package repo doesn't like being hammered by GitHub Actions:
Err:24 https://cvmrepo.web.cern.ch/cvmrepo/apt focal-prod InRelease
403 Forbidden [IP: 188.185.90.87 443]
Reading package lists...
E: Failed to fetch http://cvmrepo.web.cern.ch/cvmrepo/apt/dists/focal-prod/InRelease 403 Forbidden [IP: 188.185.90.87 443]
E: The repository 'http://cvmrepo.web.cern.ch/cvmrepo/apt focal-prod InRelease' is not signed.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package cvmfs
So we need to avoid downloading the packages again and again; GitHub Actions has some support for caching that may help here...
We're currently serving them from the Stratum 0, but that should be changed...
git rather than rsync for emerge --sync (issue: #100, PR: #106)
aarch64
ppc64le
x86_64
.github/workflows/pilot_repo.yml)
I was experimenting with CUDA support within EESSI and ran into the issue that, when using CUDA compiled with the EESSI stack, the CUDA libraries from the host are not seen by the executables created by nvcc. This is because it looks for the CUDA driver libraries in the prefix, where they do not exist. There are a few viable solutions:
- LD_PRELOAD ...but this requires that you know exactly which libraries to preload.
- LD_LIBRARY_PATH ...but in the system that I was on, the CUDA libraries are in /usr/lib64, so this would drag in a lot of unwanted libraries. This could be worked around by symlinking the libraries to a more unique location on the host (Compute Canada uses /usr/lib64/nvidia and has a script that creates the necessary symlinks).
emerge Lmod fails with:
Error: The follow lua module(s) are missing: posix
You can not run Lmod without: posix
This was fixed after running emerge dev-lua/luaposix.
Right now https://hub.docker.com/r/eessi/client-pilot/tags uses a bit of a hodgepodge of tags. We should standardise the format and have architecture tags for all archs, not just "non-x86_64".
I got the following warnings when starting R:
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
6: Setting LC_PAPER failed, using "C"
7: Setting LC_MEASUREMENT failed, using "C"
Probably related to this:
https://wiki.gentoo.org/wiki/Project:Prefix/FAQ#Add_an_en_US.UTF-8_locale
Here we can discuss how we are going to install Prefix. Do we, for instance, copy the bootstrap script to our repo and customize it to our needs, or do we use some other mechanism to install it?
This causes timezone confusion for some software, see EESSI/software-layer#79.
We do have /etc/localtime in the 2020.12 compat layer:
cat /cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/etc/localtime
TZif2-00TZif2-00
<-00>0
Should we make this a symlink to the host's file, to get correct time zone information?
ln -s /etc/localtime /cvmfs/pilot.eessi-hpc.org/2021.03/compat/linux/x86_64/etc/localtime
In #98 I noticed that the action for testing the Prefix installation playbook was failing, and this is (again) due to Lua stuff not installing correctly inside Gentoo Prefix.
It looks like they added a fix for luaposix, which requires us to revert a change in our eclass/lua-utils.eclass (which was taken from here and: https://bugs.gentoo.org/768909):
LUA_INCLUDE_DIR)
local val
val=$($(tc-getPKG_CONFIG) --variable includedir ${impl}) || die
# ADD THE FOLLOWING LINE:
val="${val#${ESYSROOT#${SYSROOT}}}"
export LUA_INCLUDE_DIR=${val}
debug-print "${FUNCNAME}: LUA_INCLUDE_DIR = ${LUA_INCLUDE_DIR}"
But then dev-lua/lpeg breaks, for which I opened a PR here:
gentoo/gentoo#20576
So if this gets merged, we can add the change to the eclass file, and remove our custom luaposix ebuild.
startprefix was missing in the 2021.02 compat layer for aarch64 for some reason, and this was not detected by the playbook...
In case we need to modify/patch an eclass from the default gentoo overlay, for instance like the lua one from EESSI/gentoo-overlay#43, we either need to copy it to the gentoo directory ourselves (which may break/be replaced after a sync?) or we can use the eclass-overrides setting (see https://wiki.gentoo.org/wiki//etc/portage/repos.conf). The latter can be done by adding the following snippet to an existing/new file in ${EPREFIX}/etc/portage/repos.conf/:
[DEFAULT]
eclass-overrides = eessi
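If this ends up being scripted rather than done by hand, writing the snippet could look like the sketch below. The function name write_eclass_overrides and the default file name eclass-overrides.conf are assumptions of this example; only the [DEFAULT] section and the eclass-overrides key come from the snippet above.

```shell
# write_eclass_overrides EPREFIX REPO [FILE_NAME]
# Hypothetical helper: writes a repos.conf fragment that lets eclasses
# from REPO override those of the other repositories.
# FILE_NAME defaults to eclass-overrides.conf (a name chosen for this sketch).
write_eclass_overrides() {
    eprefix=$1
    repo=$2
    conf_dir="$eprefix/etc/portage/repos.conf"
    conf_file="$conf_dir/${3:-eclass-overrides.conf}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_file" <<EOF
[DEFAULT]
eclass-overrides = $repo
EOF
}
```

For the case described above this would be called as write_eclass_overrides "$EPREFIX" eessi. Using a dedicated file keeps the override separate from the repository definitions, so it should survive a sync of the gentoo repository.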
Some steps now require root permissions, but in principle it should be possible to run this playbook as a regular user, e.g. by mounting some local folder as /cvmfs in the container. We should try to add some clever checks or use tags to skip certain steps to make this possible.
After bootstrapping Prefix via the container image (singularity run --bind $HOME/cvmfs:/cvmfs docker://eessi/bootstrap-prefix:centos8-$(uname -m)), I tried running the Ansible playbook to take care of the remaining tasks to set up a new version of the compat layer.
Here's what I did:
- /cvmfs to $HOME/cvmfs on the build host where I performed the Prefix bootstrap
- ssh to build host works (by adding SSH key to SSH agent via ssh-add)
- hosts file:
$ cd compatibility-layer/ansible/playbooks
$ cat hosts
[cvmfsstratum0servers]
<IP_OF_BUILD_HOST>
ansible-playbook -i hosts -K install.yml -e gentoo_prefix_path=/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64
The playbook failed when running emerge gentoolkit (see details below).
TASK [compatibility_layer : Install equery command (dependency for the portage module)] ***************************************************
fatal: [18.223.116.164]: FAILED! => {"changed": true, "cmd": ["emerge", "gentoolkit"], "delta": "0:00:01.930018", "end": "2020-11-03 18:41:41.190089", "msg": "non-zero return code", "rc": 1, "start": "2020-11-03 18:41:39.260071", "stderr": "portage: 'portage' user or group missing.\n For the defaults, line 1 goes into passwd, and 2 into group.\n portage:x:250:250:portage:/var/tmp/portage:/bin/false\n portage::250:portage\n*** WARNING *** For security reasons, only system administrators should be\n*** WARNING *** allowed in the portage group. Untrusted users or processes\n*** WARNING *** can potentially exploit the portage group for attacks such as\n*** WARNING *** local privilege escalation.\n\nmount: /proc: must be superuser to use mount.\nUnable to mark /proc slave: 32", "stderr_lines": ["portage: 'portage' user or group missing.", " For the defaults, line 1 goes into passwd, and 2 into group.", " portage:x:250:250:portage:/var/tmp/portage:/bin/false", " portage::250:portage", "*** WARNING *** For security reasons, only system administrators should be", "*** WARNING *** allowed in the portage group. 
Untrusted users or processes", "*** WARNING *** can potentially exploit the portage group for attacks such as", "*** WARNING *** local privilege escalation.", "", "mount: /proc: must be superuser to use mount.", "Unable to mark /proc slave: 32"], "stdout": "\nPerforming Global Updates\n(Could take a couple of minutes if you have a lot of binary packages.)\n .='update pass' *='binary update' #='/var/db update' @='/var/db move'\n s='/var/db SLOT move' %='binary move' S='binary SLOT move'\n p='update /etc/portage/package.*'\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2015.....................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2015..........................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2016............................................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2016.........................................................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2016.......................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2016.............................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2017..............................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2017....\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2017........................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2017.......\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-201
8.....................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2018..................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2018...\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2018.......\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2019.................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2019...\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2019.......\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2019.......\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2020............\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2020............\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2020....................................................................................................................................................\n/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2020...\n\n\nCalculating dependencies \n * IMPORTANT: 4 news items need reading for repository 'gentoo'.\n * Use eselect news read to view new items.\n\n... 
done!\n\n>>> Verifying ebuild manifests\n\n>>> Emerging (1 of 1) app-portage/gentoolkit-0.5.0-r2::gentoo\n * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR\n * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-\n * hpc.org/2020.10/compat/aarch64/var/tmp/portage/app-\n * portage/gentoolkit-0.5.0-r2'\n\n>>> Failed to emerge app-portage/gentoolkit-0.5.0-r2\n * Messages for package app-portage/gentoolkit-0.5.0-r2:", "stdout_lines": ["", "Performing Global Updates", "(Could take a couple of minutes if you have a lot of binary packages.)", " .='update pass' *='binary update' #='/var/db update' @='/var/db move'", " s='/var/db SLOT move' %='binary move' S='binary SLOT move'", " p='update /etc/portage/package.*'", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2015.....................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2015..........................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2016............................................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2016.........................................................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2016.......................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2016.............................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2017..............................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2017....", 
"/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2017........................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2017.......", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2018.....................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2018..................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2018...", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2018.......", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2019.................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2019...", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2019.......", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2019.......", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/1Q-2020............", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/2Q-2020............", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/3Q-2020....................................................................................................................................................", "/home/ec2-user/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/db/repos/gentoo/profiles/updates/4Q-2020...", "", "", "Calculating dependencies ", " * IMPORTANT: 4 news items need reading for repository 'gentoo'.", " * Use eselect news read to view new 
items.", "", "... done!", "", ">>> Verifying ebuild manifests", "", ">>> Emerging (1 of 1) app-portage/gentoolkit-0.5.0-r2::gentoo", " * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR", " * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-", " * hpc.org/2020.10/compat/aarch64/var/tmp/portage/app-", " * portage/gentoolkit-0.5.0-r2'", "", ">>> Failed to emerge app-portage/gentoolkit-0.5.0-r2", " * Messages for package app-portage/gentoolkit-0.5.0-r2:"]}
TASK [compatibility_layer : Abort transaction] ********************************************************************************************
skipping: [18.223.116.164]
RUNNING HANDLER [compatibility_layer : Generate locales] **********************************************************************************
changed: [18.223.116.164]
PLAY RECAP ********************************************************************************************************************************
18.223.116.164 : ok=10 changed=4 unreachable=0 failed=0 skipped=3 rescued=1 ignored=0
Running emerge gentoolkit in the startprefix environment manually works fine.
When I re-run the playbook after doing that, it fails when running emerge eselect-repository (see details below).
TASK [compatibility_layer : Install eselect-repository] ***********************************************************************************
fatal: [18.223.116.164]: FAILED! => {"changed": false, "cmd": ["/cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/usr/bin/emerge", "--noreplace", "--ask=n", "eselect-repository"], "msg": "Packages not installed.", "rc": 1, "stderr": "portage: 'portage' user or group missing.\n For the defaults, line 1 goes into passwd, and 2 into group.\n portage:x:250:250:portage:/var/tmp/portage:/bin/false\n portage::250:portage\n*** WARNING *** For security reasons, only system administrators should be\n*** WARNING *** allowed in the portage group. Untrusted users or processes\n*** WARNING *** can potentially exploit the portage group for attacks such as\n*** WARNING *** local privilege escalation.\n\nmount: /proc: must be superuser to use mount.\nUnable to mark /proc slave: 32\n", "stderr_lines": ["portage: 'portage' user or group missing.", " For the defaults, line 1 goes into passwd, and 2 into group.", " portage:x:250:250:portage:/var/tmp/portage:/bin/false", " portage::250:portage", "*** WARNING *** For security reasons, only system administrators should be", "*** WARNING *** allowed in the portage group. Untrusted users or processes", "*** WARNING *** can potentially exploit the portage group for attacks such as", "*** WARNING *** local privilege escalation.", "", "mount: /proc: must be superuser to use mount.", "Unable to mark /proc slave: 32"], "stdout": "Calculating dependencies \n * IMPORTANT: 4 news items need reading for repository 'gentoo'.\n * Use eselect news read to view new items.\n\n... 
done!\n\n>>> Verifying ebuild manifests\n\n>>> Emerging (1 of 8) dev-python/certifi-10001-r1::gentoo\n * Fetching files in the background.\n * To view fetch progress, run in another terminal:\n * tail -f /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/log/emerge-fetch.log\n * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR\n * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-\n * hpc.org/2020.10/compat/aarch64/var/tmp/portage/dev-\n * python/certifi-10001-r1'\n\n>>> Failed to emerge dev-python/certifi-10001-r1\n * Messages for package dev-python/certifi-10001-r1:\n\n\n", "stdout_lines": ["Calculating dependencies ", " * IMPORTANT: 4 news items need reading for repository 'gentoo'.", " * Use eselect news read to view new items.", "", "... done!", "", ">>> Verifying ebuild manifests", "", ">>> Emerging (1 of 8) dev-python/certifi-10001-r1::gentoo", " * Fetching files in the background.", " * To view fetch progress, run in another terminal:", " * tail -f /cvmfs/pilot.eessi-hpc.org/2020.10/compat/aarch64/var/log/emerge-fetch.log", " * The ebuild phase 'die_hooks' has been aborted since PORTAGE_BUILDDIR", " * does not exist: '/home/ec2-user/cvmfs/pilot.eessi-", " * hpc.org/2020.10/compat/aarch64/var/tmp/portage/dev-", " * python/certifi-10001-r1'", "", ">>> Failed to emerge dev-python/certifi-10001-r1", " * Messages for package dev-python/certifi-10001-r1:", "", ""]}
Running emerge eselect-repository manually works fine, and so does emerge dev-vcs/git (which is the next step that fails in the playbook).
Re-running the playbook after manually running those 3 emerge commands that were failing makes it fail on emerge @2020.10 (which also fails manually due to another problem, see EESSI/gentoo-overlay#24).
Ansible version info (installed via pip3 install ansible):
$ ansible --version
ansible 2.10.3
config file = None
configured module search path = ['/home/ec2-user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ansible
executable location = /home/linuxbrew/.linuxbrew/bin/ansible
python version = 3.8.6 (default, Oct 10 2020, 07:54:55) [GCC 5.4.0 20160609]
double prefix error installing
* Messages for package dev-lua/luafilesystem-1.7.0.2:
* ERROR: dev-lua/luafilesystem-1.7.0.2::gentoo failed:
* Aborting due to QA concerns: double prefix files installed
>>> Install dev-lua/luafilesystem-1.7.0.2 into /scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image
make -j16 DESTDIR=/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image install
mkdir -p "/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image/scratch/gentoo//scratch/gentoo/usr/lib64/lua/5.1"
cp src/lfs.so "/scratch/gentoo/var/tmp/portage/dev-lua/luafilesystem-1.7.0.2/image/scratch/gentoo//scratch/gentoo/usr/lib64/lua/5.1"
The prefix installation script uses by default the latest portage and gentoo snapshots from http://distfiles.gentoo.org/snapshots/.
We would like to use a specific snapshot to install the prefix, for reproducibility. As the snapshots are only kept for a limited time on gentoo.org, an alternative location is required. Due to space concerns, our GitHub repo is not considered suitable. The location will be referenced by setting SNAPSHOT_URL in the prefix bootstrap script.
These are the notes taken by a novice while installing the compatibility layer for aarch64 on a test machine.
ssh some-machine.somewhe.re
[...]
Failed to set locale, defaulting to C.UTF-8
[...]
The machine is
uname -a
Linux ip-172-31-42-108.eu-west-1.compute.internal 4.18.0-240.15.1.el8_3.aarch64 #1 SMP Wed Feb 3 03:16:05 EST 2021 aarch64 aarch64 aarch64 GNU/Linux
cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Seems to be a lot of work...
screen is not installed; it could help for long-running installs from localhost
git is not installed; it is required to clone the repo locally
sudo yum install git
mkdir src; cd src
git clone https://github.com/EESSI/compatibility-layer
sudo yum install ansible
On my local machine, fetch repository.
git clone https://github.com/EESSI/compatibility-layer
cd compatibility-layer/ansible/playbooks
cat README.md
vim hosts
Credentials as provided, key added to ssh-agent.
[cvmfsstratum0servers]
ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com ansible_ssh_user=ec2-user eessi_host_arch=aarch64 eessi_host_os=linux
Error
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "module_stderr": "Shared connection to ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com closed.\r\n", "module_stdout": "/bin/sh: /usr/bin/python: No such file or directory\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 127}
Resolved by
sudo yum install python38
and adding ansible_python_interpreter=/usr/bin/python3 to hosts
I still had Ansible installed in a conda environment, so I reused that one.
ansible-playbook -i hosts -b install.yml
Error
TASK [compatibility_layer : Fail if host OS is not supported]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"msg": "The conditional check 'not(ansible_os_family == \"RedHat\" and ansible_distribution_major_version is version(\"8\", \"==\"))' failed. The error was: error while evaluating conditional (not(ansible_os_family == \"RedHat\" and ansible_distribution_major_version is version(\"8\", \"==\"))): 'ansible_os_family' is undefined\n\nThe error appears to have been in '/home/.../eessi/compatibility-layer/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml': line 4, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Fail if host OS is not supported\n ^ here\n"}
When debug-printing all of {{ ansible_facts }}, no distribution-related variables are defined, so they are also not injected into the global ansible_ namespace.
Resolved by manually verifying the OS in /etc/os-release and commenting out the OS check in the playbook.
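The manual check against /etc/os-release can be sketched roughly as follows. This is only an illustration of the idea, not what the playbook does: the function names are made up, and treating "rhel" in ID or ID_LIKE as equivalent to Ansible's RedHat OS family is an assumption.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_supported_rhel8(info):
    """Approximate the playbook's check: RedHat family, major version 8.

    Assumption: "rhel" appearing in ID or ID_LIKE stands in for
    ansible_os_family == "RedHat".
    """
    ids = [info.get("ID", "")] + info.get("ID_LIKE", "").split()
    major = info.get("VERSION_ID", "").split(".")[0]
    return "rhel" in ids and major == "8"
```

To use it on the host, read /etc/os-release and pass its contents to parse_os_release, then feed the result to is_supported_rhel8.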
Error
TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": ["Could not detect which major revision of yum is in use, which is required to determine module backend.", "You can manually specify use_backend to tell the module whether to use the yum (yum3) or dnf (yum4) backend})"]}
Manually checked
yum --version
Failed to set locale, defaulting to C.UTF-8
4.2.23
Installed: dnf-0:4.2.23-4.el8.noarch at Tue Feb 9 15:46:00 2021
--- a/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml
+++ b/ansible/playbooks/roles/compatibility_layer/tasks/install_prefix.yml
- name: "Install EPEL"
yum:
- https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
disable_gpg_check: yes
state: present
+ use_backend: yum4
tags:
- build_prefix
python3-dnf package required
Error
TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Could not import the dnf python module. Please install `python3-dnf` package.", "results": []}
Notice however, that
sudo yum install python3-dnf
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 2:29:36 ago on Wed Feb 24 14:07:03 2021.
Package python3-dnf-4.2.23-4.el8.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!
However the package is still not found. NOT SOLVED!
Maybe old Ansible does not detect packages right?
python3-dnf package required
Error
TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:37:06 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:37:06 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}
nope.
sudo pip3 install dnf
try again
TASK [compatibility_layer : Install EPEL] ****************************************************************
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:39:52 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:39:52 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}
Manually test install:
/usr/bin/python3
Python 3.8.3 (default, Aug 18 2020, 13:06:44)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dnf
/usr/local/lib/python3.8/site-packages/dnf.py:15: UserWarning: The DNF Python API is not currently available via PyPI.
Please install it with your distro package manager (typically called
'python2-dnf' or 'python3-dnf'), and ensure that any virtual environments
needing the API are configured to be able to see the system site packages
directory.
warnings.warn(warning_msg)
Really strange. Try overwriting already installed package...
sudo dnf reinstall python3-dnf
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 2:43:07 ago on Wed Feb 24 14:07:03 2021.
Dependencies resolved.
==========================================================================================================
Package Architecture Version Repository Size
==========================================================================================================
Reinstalling:
python3-dnf noarch 4.2.23-4.el8 rhel-8-baseos-rhui-rpms 526 k
Transaction Summary
==========================================================================================================
Total download size: 526 k
Installed size: 1.8 M
Is this ok [y/N]: y
Downloading Packages:
python3-dnf-4.2.23-4.el8.noarch.rpm 4.7 MB/s | 526 kB 00:00
----------------------------------------------------------------------------------------------------------
Total 2.3 MB/s | 526 kB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Reinstalling : python3-dnf-4.2.23-4.el8.noarch 1/2
Cleanup : python3-dnf-4.2.23-4.el8.noarch 2/2
Running scriptlet: python3-dnf-4.2.23-4.el8.noarch 2/2
Verifying : python3-dnf-4.2.23-4.el8.noarch 1/2
Verifying : python3-dnf-4.2.23-4.el8.noarch 2/2
Reinstalled:
python3-dnf-4.2.23-4.el8.noarch
Complete!
Still the error
TASK [compatibility_layer : Install EPEL]
fatal: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com]: FAILED! => {"changed": false, "cmd": "dnf install -y python3-dnf", "msg": "Could not import the dnf python module using /usr/bin/python3 (3.8.3 (default, Aug 18 2020, 13:06:44) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]). Please install `python3-dnf` package or ensure you have specified the correct ansible_python_interpreter.", "rc": 0, "results": [], "stderr": "", "stderr_lines": [], "stdout": "Last metadata expiration check: 2:43:32 ago on Wed Feb 24 14:07:03 2021.\nPackage python3-dnf-4.2.23-4.el8.noarch is already installed.\nDependencies resolved.\nNothing to do.\nComplete!\n", "stdout_lines": ["Last metadata expiration check: 2:43:32 ago on Wed Feb 24 14:07:03 2021.", "Package python3-dnf-4.2.23-4.el8.noarch is already installed.", "Dependencies resolved.", "Nothing to do.", "Complete!"]}
Remove ansible_python_interpreter from the hosts file.
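The manual import test above suggests that /usr/bin/python3 (3.8) was picking up the stub dnf.py that pip placed under /usr/local, presumably because the python3-dnf RPM is installed for a different Python than the 3.8 interpreter. A small sketch for checking where a module would be imported from, to be run with the same interpreter Ansible uses. The function names are hypothetical, and the "/usr/local means pip stub" rule is only a heuristic assumption:

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def looks_like_pip_stub(origin):
    """Heuristic (an assumption): modules from system RPMs do not live
    under /usr/local, whereas pip installs run via sudo do."""
    return origin is not None and origin.startswith("/usr/local/")
```

For example, looks_like_pip_stub(module_origin("dnf")) would flag the /usr/local/lib/python3.8/site-packages/dnf.py seen in the warning above.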
RUN STARTS: Wed Feb 24 17:53:07 CET 2021
RUN BREAKS: Wed Feb 24 20:54:00 CET 2021 (very approximate time)
TASK [compatibility_layer : Install package set ['eessi-2021.02-linux-aarch64']]
failed: [ec2-aaa-bbb-ccc-ddd.eu-west-1.compute.amazonaws.com] (item=eessi-2021.02-linux-aarch64) => {"ansible_loop_var": "item", "changed": false, "cmd": ["/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/usr/bin/emerge", "--noreplace", "--ask=n", "@eessi-2021.02-linux-aarch64"], "item": "eessi-2021.02-linux-aarch64", "msg": "Packages not installed.", "rc": 1, "stderr": "\n!!! All ebuilds that could satisfy \"sys-cluster/lmod\" have been masked.\n!!! One of the following masked packages is required to complete your request:\n- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)\n- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)\n\n(dependency required by \"@eessi-2021.02-linux-aarch64\" [argument])\nFor more information, see the MASKED PACKAGES section in the emerge\nman page or refer to the Gentoo Handbook.\n\n", "stderr_lines": ["", "!!! All ebuilds that could satisfy \"sys-cluster/lmod\" have been masked.", "!!! One of the following masked packages is required to complete your request:", "- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)", "- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)", "", "(dependency required by \"@eessi-2021.02-linux-aarch64\" [argument])", "For more information, see the MASKED PACKAGES section in the emerge", "man page or refer to the Gentoo Handbook.", ""], "stdout": "Calculating dependencies \n * IMPORTANT: 4 news items need reading for repository 'gentoo'.\n * Use eselect news read to view new items.\n\n... done!\n", "stdout_lines": ["Calculating dependencies ", " * IMPORTANT: 4 news items need reading for repository 'gentoo'.", " * Use eselect news read to view new items.", "", "... done!"]}
That is:
!!! All ebuilds that could satisfy "sys-cluster/lmod" have been masked.
!!! One of the following masked packages is required to complete your request:
- sys-cluster/lmod-9999::gentoo (masked by: missing keyword)
- sys-cluster/lmod-8.4.20::gentoo (masked by: missing keyword)
(dependency required by "@eessi-2021.02-linux-aarch64" [argument])
The package should have an aarch64 or ~aarch64 keyword (arm64 or ~arm64 in Gentoo's keyword naming), but it doesn't:
/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/usr/bin/equery meta lmod
* sys-cluster/lmod [gentoo]
Maintainer: [email protected] (Aisha Tammy)
Maintainer: [email protected] (Gentoo Science Project)
Upstream: Remote-ID: TACC/Lmod ID: github
Homepage: https://lmod.readthedocs.io/en/latest
Homepage: https://github.com/TACC/Lmod
Location: /cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/var/db/repos/gentoo/sys-cluster/lmod
Keywords: 8.4.20:0: ~amd64 ~x86
Keywords: 9999:0:
License: MIT
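To check this kind of thing for several packages at once, the Keywords lines of the equery meta output can be parsed. A minimal sketch, assuming the output format shown above (version:slot followed by the keywords); the function names are made up for illustration:

```python
def parse_keywords(equery_meta_output):
    """Map each "version:slot" to its list of keywords."""
    result = {}
    for line in equery_meta_output.splitlines():
        line = line.strip()
        if not line.startswith("Keywords:"):
            continue
        rest = line[len("Keywords:"):].strip()
        # Format assumed from the output above: "<version>:<slot>: kw1 kw2 ..."
        version_slot, _, keywords = rest.rpartition(":")
        result[version_slot] = keywords.split()
    return result


def versions_keyworded_for(parsed, arch):
    """Return the versions that carry either the stable or ~testing keyword."""
    return [
        version
        for version, keywords in parsed.items()
        if arch in keywords or "~" + arch in keywords
    ]
```

With the output above, no version is keyworded for arm64, while 8.4.20:0 is keyworded for amd64 (as ~amd64).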
Suggestion by Bob:
EPREFIX=/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64
vi $EPREFIX/etc/portage/package.accept_keywords
+sys-cluster/lmod ~amd64
+dev-lua/luaposix ~amd64
+dev-lua/lua-bit32 amd64
RUN RESTART: Thu Feb 25 16:35:24 CET 2021
RUN FINISHED: Thu Feb 25 16:59:34 CET 2021
SUCCESS.
[user01@login1 ~]$ man echo
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
man: can't execute bzip2: No such file or directory
[user01@login1 ~]$
Very low priority, but worth noting anyway:
WARNING: One or more repositories have been ignored due to duplicate
profiles/repo_name entries:
/cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/, gentoo, /cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/var/db/repos/gentoo overrides
/cvmfs/pilot.eessi-hpc.org/2020.09/compat/aarch64/tmp/var/db/repos/gentoo
All profiles/repo_name entries must be unique in order to avoid having
duplicates ignored. Set PORTAGE_REPO_DUPLICATE_WARN="0" in
/etc/portage/make.conf if you would like to disable this warning.
When you enter the login bash of Gentoo Prefix under Ubuntu, you see the following error message:
bash: eval: line 32: syntax error near unexpected token `newline'
bash: eval: line 32: `Usage: lesspipe '
This happens because Ubuntu's default .bashrc contains the following command:
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"
On Ubuntu, running lesspipe without arguments prints environment variable settings such as LESSOPEN, whereas the lesspipe in Gentoo does not support being run without arguments and prints a usage message instead, which eval then fails to parse. The solution is to change the command above so that it explicitly calls Ubuntu's lesspipe by its full path, as follows (or should this line simply be commented out?):
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh /usr/bin/lesspipe)"
Gentoo Prefix issue?
Running emerge --sync can be incredibly slow in, for instance, GitHub Actions (it usually takes ~3-4h). It could be worth trying to use git instead, see:
https://forums.gentoo.org/viewtopic-t-1101600-start-0.html
https://wiki.gentoo.org/wiki/Project:Portage/Repository_verification
https://wiki.gentoo.org/wiki/Portage_Security#git-mirror_repositories
https://www.reddit.com/r/Gentoo/comments/k9b8zx/switched_from_rsync_to_git_for_gentoo_repo_got/
https://www.serra.me/en/2020/01/gentoo-use-git-for-portage-synchronization/
https://www.reddit.com/r/Gentoo/comments/7xa6p9/emerge_sync_taking_too_long/
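As a rough idea of what switching to git would involve: Portage reads its sync settings from repos.conf, where sync-type and sync-uri select the method and source. The sketch below only renders such a file as text. The function name is made up, the gentoo-mirror URL is an assumption (one of the mirrors discussed in the links above), and the location would need to point at the repo dir inside the prefix.

```python
import configparser
import io


def render_git_repos_conf(location,
                          sync_uri="https://github.com/gentoo-mirror/gentoo.git"):
    """Render a repos.conf entry that syncs the gentoo repo via git.

    The default sync_uri is an assumption; any git mirror of the
    Gentoo repository could be used instead.
    """
    config = configparser.ConfigParser()
    config["DEFAULT"] = {"main-repo": "gentoo"}
    config["gentoo"] = {
        "location": location,
        "sync-type": "git",
        "sync-uri": sync_uri,
        "auto-sync": "yes",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

The returned text would go into a file such as $EPREFIX/etc/portage/repos.conf/gentoo.conf. Note that an existing rsync-based repo dir has to be removed before the first git sync.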
After running startprefix, the following environment information is gone and interoperability with Windows 10 is lost.
You can no longer run Windows applications from the Linux environment.
Is it possible to add these environment settings to the Gentoo environment?
Workaround:
export PATH="old-path:${PATH}"
export WSL_INTEROP=/run/WSL/XXXX_interop (check the console port XXXX with pstree -p)
With this, Windows 10 commands and applications work again.
Can we fix this within startprefix?
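A possible way to automate the workaround is to capture the settings before entering the prefix and replay them afterwards. The following is only a sketch: the function names are hypothetical, and it assumes the interop sockets are the files matching *_interop under /run/WSL, as in the workaround above.

```python
import glob
import os


def find_interop_sockets(run_dir="/run/WSL"):
    """List candidate WSL interop sockets (files named <XXXX>_interop)."""
    return sorted(glob.glob(os.path.join(run_dir, "*_interop")))


def interop_exports(old_path, socket_path):
    """Build the two export lines from the workaround above."""
    return [
        'export PATH="{}:${{PATH}}"'.format(old_path),
        "export WSL_INTEROP={}".format(socket_path),
    ]
```

Which socket belongs to the current console still has to be determined (for example with pstree -p); the sketch does not try to guess that.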
10GB (default in AWS) is not sufficient, and leads to bootstrap failures like:
/cvmfs/pilot.eessi-hpc.org/2021.02/compat/linux/aarch64/usr/aarch64-unknown-linux-gnu/bin/ld: final link failed: No space left on device
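A pre-flight check before starting the bootstrap could catch this early. A minimal sketch using only the Python standard library; the function names are made up, and the required amount is left as a parameter because the report above only establishes that 10GB is too little, not how much is enough:

```python
import shutil

GIB = 1024 ** 3


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(path, required_gib):
    """True if at least required_gib GiB are free under path.

    required_gib is a value the caller must choose; no known-good
    threshold is implied here.
    """
    return free_gib(path) >= required_gib
```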
See https://bugs.gentoo.org/728682
For now we can use an older version of the script, e.g. this one:
https://gitweb.gentoo.org/repo/proj/prefix.git/plain/scripts/bootstrap-prefix.sh?id=ad5ab57d7ce52784d597298c13265c617dbf507c
pip (to install ReFrame on top of "system" Python)
GitPython + keyring + keyrings.alt (for EasyBuild GitHub integration)
pycodestyle (for eb --check-style)
We should try to remove Python 2 from our compatibility layer.
prefix_user_defined_trusted_dir currently only takes /opt/eessi/lib as an option, but this does not allow for release-specific changes (for example, overriding MPI with another ABI-compatible one). We should also allow /opt/eessi/2021.03/lib to address this.
I guess the option should then be prefix_user_defined_trusted_dirs, with some docs about how the list is separated.
One of the things that Magic Castle does is allow you to start a Desktop session via JupyterHub. This triggers the command
dbus-launch websockify -v --web /opt/jupyterhub/lib64/python3.6/site-packages/jupyter_desktop/share/web/noVNC-1.1.0 --heartbeat 30 --unix-target /tmp/tmpmkuclos5/vnc-socket 38569 -- /bin/sh -c cd /home/ocaisa && vncserver -verbose -xstartup /opt/jupyterhub/lib64/python3.6/site-packages/jupyter_desktop/share/xstartup -SecurityTypes None -rfbunixpath /tmp/tmpmkuclos5/vnc-socket -fg -nolisten tcp
but with our compatibility layer this command fails, spawning a flood of /cvmfs/pilot.eessi-hpc.org/2021.03/compat/linux/x86_64/usr/bin/dbus-daemon processes and producing lots of errors in the logs like
Apr 8 14:37:15 gpu-node1 dbus-daemon[11779]: Cannot initialize inotify
This should make sure that we keep the user-defined-trusted-dirs whenever we have to reinstall glibc outside of the playbook, e.g. for security updates. The file could be generated by the playbook (instead of having a static file in our overlay, as we had in the past).
@amadio thanks for the suggestion!
To help spot problems like the one fixed by EESSI/gentoo-overlay#39 early on...
I just noticed that the playbook always removes the gentoo repo dir in the compat layer. Though it's quite harmless, it only needs to do so in the first run, when we are actually changing from rsync to git. For other runs, it should check whether this change has already been done.
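The check itself could be as simple as testing whether the repo dir is already a git checkout. A sketch of that logic, with a made-up function name and the assumption that a .git directory inside the repo dir means the switch from rsync to git has already happened:

```python
import os


def needs_rsync_to_git_switch(repo_dir):
    """True if repo_dir exists but is not (yet) a git checkout.

    Assumption: the presence of a .git directory means the repo was
    already synced via git, so it does not need to be removed again.
    """
    if not os.path.isdir(repo_dir):
        # Nothing to remove; the first git sync will create it.
        return False
    return not os.path.isdir(os.path.join(repo_dir, ".git"))
```

In the playbook, the same condition could guard the task that removes the directory.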
I gave that a quick shot, didn't get very far:
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.5
BuildVersion: 19F101
$ ./bootstrap-prefix.sh
...
I'm excited! Seems we can finally do something productive now.
Ok, I'm going to do a little bit of guesswork here. Thing is, your
machine appears to be identified by CHOST=x86_64-apple-darwin19.
Great! You appear to have a compiler in your PATH
You don't have /usr/include, this thwarts me to build stuff.
Please execute:
xcode-select --install
or install /usr/include in another way and try running me again.
$ xcode-select --install
xcode-select: error: command line tools are already installed, use "Software Update" to install updates
It's a known issue in Gentoo Prefix, see https://bugs.gentoo.org/show_bug.cgi?id=730476 for details.
What extra tools do we want/need here? vi(m) was requested by @boegel.
@bedroge @peterstol I think it makes sense to document how the Gentoo Prefix installation we have now at /cvmfs/pilot.eessi-hpc.org/test/gentoo/2020 came to be: which procedure was followed, which "version" of the bootstrap script was used (the one in #4, I presume), which customizations were made compared to upstream, which problems we ran into (ideally incl. a link to the upstream bug report or fix), etc.
Also, if any additional commands were run (like "emerge ..."), we should document them somewhere for future reference.
That can be done in here, by adding comments to this issue...
When we do another Prefix installation later (we will probably start over at some point during the pilot, but also for production later), we can use this issue as a reference of lessons learned...
We should probably have something that validates the compatibility layer to ensure that it is ready to accept software. We can check that certain pieces of software are installed, but is there also software that we want to ensure isn't present?
Also, could this script be extended to verify a new version of the compatibility layer for release, including stripping caches and such?
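A first version of such a validation could just check for the presence and absence of paths relative to the prefix. A minimal sketch; the function name and the example paths in the usage note are placeholders, and the actual lists would still need to be agreed on:

```python
import os


def validate_prefix(eprefix, required=(), forbidden=()):
    """Return a list of problems found in the prefix (empty if it looks fine).

    required and forbidden are paths relative to eprefix.
    """
    problems = []
    for relpath in required:
        if not os.path.exists(os.path.join(eprefix, relpath)):
            problems.append("missing: " + relpath)
    for relpath in forbidden:
        if os.path.exists(os.path.join(eprefix, relpath)):
            problems.append("unexpected: " + relpath)
    return problems
```

For example, validate_prefix(eprefix, required=("usr/bin/emerge",), forbidden=("usr/bin/python2",)) would report a missing emerge or a leftover Python 2 binary; both paths are only illustrative.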
Rebuilding the Docker images available at https://hub.docker.com/r/eessi/bootstrap-prefix/tags is now done "manually" (using the docker_build_bootstrap_prefix.sh script from this repo); this should be done automatically (for example when a new version is tagged in this repo).
We should also consider switching to the GitHub Container Registry.