tilde-lab / yascheduler


Yet another cloud computing scheduler for the high-throughput cloud scientific simulations

Home Page: https://mpds.io/search/ab%20initio%20calculations

License: MIT License

Python 92.87% Shell 5.32% PowerShell 1.81%
ab-initio materials-informatics materials-science scheduler hetzner hetzner-cloud hetzner-api upscale queues python

yascheduler's Introduction

Yet another computing scheduler & cloud orchestration engine

Yascheduler is a simple job scheduler designed for submitting scientific calculations and copying back the results from the computing clouds.

Currently it supports several scientific simulation codes in chemistry and solid state physics. Any other scientific simulation code can be supported via the declarative control template system (see yascheduler.conf settings file). There is an example dummy C++ code with its configuration template.

Installation

Use pip and PyPI: pip install yascheduler.

The latest updates and bug fixes can be obtained by cloning the repository:

git clone https://github.com/tilde-lab/yascheduler.git
pip install yascheduler/

The installation procedure creates the configuration file at /etc/yascheduler/yascheduler.conf. The file contains the credentials for PostgreSQL database access, the directories used, the cloud providers and the scientific simulation codes (called engines). Check this file and amend it with the correct credentials. The database and the system service should then be initialized with the yainit script.

Usage

from yascheduler import Yascheduler

yac = Yascheduler()
label = "test assignment"
engine = "pcrystal"
struct_input = str(...)  # simulation control file: crystal structure
setup_input = str(...)  # simulation control file: main setup, can include struct_input
result = yac.queue_submit_task(
    label, {"fort.34": struct_input, "INPUT": setup_input}, engine
)
print(result)
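In practice the two control files are usually read from disk. The following is a minimal sketch, not part of the yascheduler API: the helper name submit_crystal_task and the file paths are hypothetical, and only queue_submit_task with its three arguments is taken from the example above.

```python
from pathlib import Path


def submit_crystal_task(yac, label, struct_path, setup_path, engine="pcrystal"):
    """Read both control files and submit them as one task.

    `yac` is expected to be a Yascheduler instance. The helper name and the
    file paths are illustrative assumptions, not part of the library.
    """
    inputs = {
        "fort.34": Path(struct_path).read_text(),  # crystal structure
        "INPUT": Path(setup_path).read_text(),  # main setup
    }
    return yac.queue_submit_task(label, inputs, engine)
```

With a configured scheduler, this could be called as submit_crystal_task(Yascheduler(), "test assignment", "fort.34", "INPUT").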

Alternatively, run the scheduler directly in the console with the yascheduler command (use the -l DEBUG option to change the log level).

A Supervisor configuration may look like this:

[program:scheduler]
command=/usr/local/bin/yascheduler
user=root
autostart=true
autorestart=true
stderr_logfile=/data/yascheduler.log
stdout_logfile=/data/yascheduler.log

File paths can be set using the environment variables:

  • YASCHEDULER_CONF_PATH

    Configuration file.

    Default: /etc/yascheduler/yascheduler.conf

  • YASCHEDULER_LOG_PATH

    Log file path.

    Default: /var/log/yascheduler.log

  • YASCHEDULER_PID_PATH

    PID file.

    Default: /var/run/yascheduler.pid
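The lookup order described above (environment variable first, documented default otherwise) can be sketched as follows. The helper resolve_paths is hypothetical and only illustrates the idea; how yascheduler itself reads these variables is not shown here.

```python
import os

# Defaults as documented above.
DEFAULTS = {
    "YASCHEDULER_CONF_PATH": "/etc/yascheduler/yascheduler.conf",
    "YASCHEDULER_LOG_PATH": "/var/log/yascheduler.log",
    "YASCHEDULER_PID_PATH": "/var/run/yascheduler.pid",
}


def resolve_paths(environ=None):
    """Return the effective path for each variable (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # An environment variable, when set, overrides the documented default.
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```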

Configuration File Reference

Database Configuration [db]

Connection to a PostgreSQL database.

  • user

    The username to connect to the PostgreSQL server with.

  • password

    The user password to connect to the server with. This parameter is optional.

  • host

    The hostname of the PostgreSQL server to connect with.

  • port

    The TCP/IP port of the PostgreSQL server instance.

    Default: 5432

  • database

    The name of the database instance to connect with.

    Default: Same as user
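The defaults listed above can be illustrated with Python's standard configparser. This is a sketch under the assumption that the [db] section is read as a plain INI section; the function db_params and the example values are hypothetical.

```python
import configparser

# Example values only; replace with real credentials.
EXAMPLE_CONF = """
[db]
user = yascheduler
password = example-password
host = localhost
"""


def db_params(conf_text):
    """Collect the [db] settings, applying the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    db = parser["db"]
    user = db["user"]
    return {
        "user": user,
        "password": db.get("password"),  # optional, None when absent
        "host": db["host"],
        "port": db.getint("port", fallback=5432),  # documented default
        "database": db.get("database", fallback=user),  # same as user
    }
```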

Local Settings [local]

  • data_dir

    Path to root directory of local data files. Can be relative to the current working directory.

    Default: ./data (but it's always a good idea to set up explicitly!)

    Example: /srv/yadata

  • tasks_dir

    Path to directory with tasks results.

    Default: tasks under data_dir

    Example: %(data_dir)s/tasks

  • keys_dir

    Path to directory with SSH keys.

    Default: keys under data_dir

    Example: %(data_dir)s/keys

  • engines_dir

    Path to directory with engines repository.

    Default: engines under data_dir

    Example: %(data_dir)s/engines

  • webhook_reqs_limit

    Maximum number of in-flight webhook http requests.

    Default: 5

  • conn_machine_limit

    Maximum number of concurrent SSH connection requests.

    Default: 10

  • conn_machine_pending

    Maximum number of pending SSH connection requests.

    Default: 10

  • allocate_limit

    Maximum number of concurrent task or node allocation requests.

    Default: 20

  • allocate_pending

    Maximum number of pending task or node allocation requests.

    Default: 1

  • consume_limit

    Maximum number of concurrent downloads of task results.

    Default: 20

  • consume_pending

    Maximum number of pending downloads of task results.

    Default: 1

  • deallocate_limit

    Maximum number of concurrent node deallocation requests.

    Default: 5

  • deallocate_pending

    Maximum number of pending node deallocation requests.

    Default: 1
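The %(data_dir)s references in the examples above follow the interpolation syntax of Python's configparser. A small sketch, assuming the file is parsed with the default interpolation and using the example path /srv/yadata:

```python
import configparser

EXAMPLE_CONF = """
[local]
data_dir = /srv/yadata
tasks_dir = %(data_dir)s/tasks
keys_dir = %(data_dir)s/keys
engines_dir = %(data_dir)s/engines
"""

parser = configparser.ConfigParser()  # BasicInterpolation is the default
parser.read_string(EXAMPLE_CONF)
local = parser["local"]

# Each %(data_dir)s is replaced by the value of data_dir in the same section.
print(local["tasks_dir"])
print(local["keys_dir"])
print(local["engines_dir"])
```

Changing data_dir therefore moves all three derived directories at once.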

Remote Settings [remote]

  • data_dir

    Path to root directory of data files on remote node. Can be relative to the remote current working directory (usually $HOME).

    Default: ./data

    Example: /src/yadata

  • tasks_dir

    Path to directory with tasks results on remote node.

    Default: tasks under data_dir

    Example: %(data_dir)s/tasks

  • engines_dir

    Path to directory with engines on remote node.

    Default: engines under data_dir

    Example: %(data_dir)s/engines

  • user

    Default ssh username.

    Default: root

  • jump_user

    Username of default SSH jump host (if used).

  • jump_host

    Host of default SSH jump host (if used).

Providers [clouds]

All cloud provider settings are set in the [clouds] group. Each provider has its own settings prefix.

These settings are common to all the providers:

  • *_max_nodes

    The maximum number of nodes for a given provider. The provider is not used if the value is less than 1.

  • *_user

    Per provider override of remote.user.

  • *_priority

    Per provider priority of node allocation. Sorted in descending order, so the cloud with the highest value is the first.

  • *_idle_tolerance

    Per provider idle tolerance (in seconds) for deallocation of nodes.

    Default: different for providers, starting from 120 seconds.

  • *_jump_user

    Username of this cloud SSH jump host (if used).

  • *_jump_host

    Host of this cloud SSH jump host (if used).
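The interplay of *_max_nodes and *_priority can be sketched as below. The function usable_providers is hypothetical, and treating a missing value as 0 is an assumption of this sketch, not documented behaviour; the prefixes are the ones described in this reference.

```python
def usable_providers(clouds, prefixes=("hetzner", "az", "upcloud")):
    """Return the provider prefixes in allocation order.

    `clouds` maps setting names (e.g. "hetzner_max_nodes") to their values
    as they would appear in the configuration file.
    """
    active = [
        prefix
        for prefix in prefixes
        # A provider with fewer than 1 allowed node is not used.
        if int(clouds.get(f"{prefix}_max_nodes", 0)) >= 1
    ]
    # Descending order: the highest priority value comes first.
    return sorted(
        active,
        key=lambda prefix: int(clouds.get(f"{prefix}_priority", 0)),
        reverse=True,
    )
```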

Hetzner

Settings prefix is hetzner.

  • hetzner_token

    API token with Read & Write permissions for the project.

  • hetzner_server_type

    Server type (size).

    Default: cx51

  • hetzner_image_name

    Image name for new nodes.

    Default: debian-10

Azure

Azure Cloud should be pre-configured for yascheduler. See Cloud Providers.

Settings prefix is az.

  • az_tenant_id

    Tenant ID of Azure Active Directory.

  • az_client_id

    Application ID.

  • az_client_secret

    Client Secret value from the Application Registration.

  • az_subscription_id

    Subscription ID.

  • az_resource_group

    Resource Group name.

    Default: yascheduler-rg

  • az_user

    SSH username. root is not supported.

  • az_location

    Default location for resources.

    Default: westeurope

  • az_vnet

    Virtual network name.

    Default: yascheduler-vnet

  • az_subnet

    Subnet name.

    Default: yascheduler-subnet

  • az_nsg

    Network security group name.

    Default: yascheduler-nsg

  • az_vm_image

    OS image name.

    Default: Debian

  • az_vm_size

    Machine size.

    Default: Standard_B1s

UpCloud

Settings prefix is upcloud.

  • upcloud_login

    Username.

  • upcloud_password

    Password.

Engines [engine.*]

Supported engines should be defined in the section(s) [engine.name]. The name is an alphanumeric string that represents the real engine name. Once set, it cannot be changed later.

  • platforms

    List of supported platforms, separated by space or newline.

    Default: debian-10

    Example: mY-cOoL-OS another-cool-os

  • platform_packages

    A list of required packages, separated by space or newline, which will be installed by the system package manager.

    Default: []

    Example: openmpi-bin wget

  • deploy_local_files

    A list of filenames, separated by space or newline, which will be copied from local %(engines_dir)s/%(engine_name)s to remote %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_archive and deploy_remote_archive.

    Example: dummyengine

  • deploy_local_archive

    The name of a local archive (.tar.gz) which will be copied from local %(engines_dir)s/%(engine_name)s to the remote machine and then unarchived to %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_files and deploy_remote_archive.

    Example: dummyengine.tar.gz

  • deploy_remote_archive

    The URL of the engine archive (.tar.gz) which will be downloaded to the remote machine and then unarchived to %(engines_dir)s/%(engine_name)s. Conflicts with deploy_local_files and deploy_local_archive.

    Example: https://example.org/dummyengine.tar.gz

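  • spawn

    Command that starts the engine on the remote machine. The placeholders {task_path}, {engine_path} and {ncpus} are substituted before the command is run.

    Example: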

    cp {task_path}/INPUT OUTPUT && mpirun -np {ncpus} --allow-run-as-root \
      -wd {task_path} {engine_path}/Pcrystal >> OUTPUT 2>&1
    

    Example: {engine_path}/gulp < INPUT > OUTPUT

  • check_pname

    Process name used to check that the task is still running. Conflicts with check_cmd.

    Example: dummyengine

  • check_cmd

    Command used to check that the task is still running. Conflicts with check_pname. See also check_cmd_code.

    Example: ps ax -ocomm= | grep -q dummyengine

  • check_cmd_code

    Expected exit code of the command from check_cmd. If the code matches, the task is considered to be running.

    Default: 0

  • sleep_interval

    Interval in seconds between the task checks. Set to a higher value if you are expecting long running jobs.

    Default: 10

  • input_files

    A list of task input file names, separated by a space or new line, that will be copied to the remote directory of the task before it is started. The first input is considered as the main input.

    Example: INPUT sibling.file

  • output_files

    A list of task output file names, separated by a space or new line, that will be copied from the remote directory of the task after it is finished.

    Example: INPUT OUTPUT
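Several of the engine settings above are mutually exclusive (the three deploy_* options, and check_pname versus check_cmd). A minimal validation sketch, assuming one [engine.*] section has already been loaded into a dictionary; find_conflicts is a hypothetical helper, not part of yascheduler:

```python
DEPLOY_KEYS = ("deploy_local_files", "deploy_local_archive", "deploy_remote_archive")
CHECK_KEYS = ("check_pname", "check_cmd")


def find_conflicts(engine):
    """Return a list of problems found in one engine section (a dict)."""
    problems = []
    for group in (DEPLOY_KEYS, CHECK_KEYS):
        # Collect the settings of this group that are actually set.
        used = [key for key in group if engine.get(key)]
        if len(used) > 1:
            problems.append("conflicting settings: " + ", ".join(used))
    return problems
```

An empty list means that at most one option of each group is set.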

AiiDA Integration

See the detailed instructions for the MPDS-AiiDA-CRYSTAL workflows as well as the ansible-mpds repository. In essence:

ssh aiidauser@localhost # important
reentry scan
verdi computer setup
verdi computer test $COMPUTER
verdi code setup

yascheduler's Issues

CloudAPIManager.(De)AllocatorThreads should not be started if no active cloud APIs

# python yascheduler/yascheduler/scheduler.py 
INFO:yascheduler.CloudAPIManager:Active cloud APIs: -
INFO:yascheduler.CloudAPIManager.AllocatorThread[0]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[1]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[2]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[3]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[4]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[5]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[6]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[7]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[8]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[9]:Thread started
INFO:yascheduler.CloudAPIManager.DellocatorThread[0]:Thread started
INFO:yascheduler.CloudAPIManager.DellocatorThread[1]:Thread started
INFO:Yascheduler.WebhookThread[0]:Thread started
INFO:Yascheduler.WebhookThread[1]:Thread started

CVE-2023-23931 (Medium) detected in cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl

CVE-2023-23931 - Medium Severity Vulnerability

Vulnerable Library - cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/37/19/234484df6fc7bdf4cf81cd4a89f600fce9f8f7a4bc1b307d7abbcd382b64/cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/tmp/ws-scm/yascheduler

Dependency Hierarchy:

  • cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers. In affected versions Cipher.update_into would accept Python objects which implement the buffer protocol, but provide only immutable buffers. This would allow immutable objects (such as bytes) to be mutated, thus violating fundamental rules of Python and resulting in corrupted output. This now correctly raises an exception. This issue has been present since update_into was originally introduced in cryptography 1.8.

Publish Date: 2023-02-07

URL: CVE-2023-23931

CVSS 3 Score Details (4.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: Low
    • Availability Impact: Low

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2023-23931

Release Date: 2023-02-07

Fix Resolution: cryptography - 39.0.1
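Whether a pinned version already satisfies this fix resolution can be checked with a simple comparison. The sketch below handles only plain dotted numeric versions such as 38.0.3; both function names are hypothetical, and a dedicated version-parsing library is more robust for general use.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '38.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_fixed(pinned, fixed="39.0.1"):
    """Return True when the pinned version is at least the fix resolution."""
    return parse_version(pinned) >= parse_version(fixed)
```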


CVE-2022-23491 (High) detected in certifi-2022.9.24-py3-none-any.whl

CVE-2022-23491 - High Severity Vulnerability

Vulnerable Library - certifi-2022.9.24-py3-none-any.whl

Python package for providing Mozilla's CA Bundle.

Library home page: https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/tmp/ws-scm/yascheduler

Dependency Hierarchy:

  • certifi-2022.9.24-py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

Certifi is a curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts. Certifi 2022.12.07 removes root certificates from "TrustCor" from the root store. These are in the process of being removed from Mozilla's trust store. TrustCor's root certificates are being removed pursuant to an investigation prompted by media reporting that TrustCor's ownership also operated a business that produced spyware. Conclusions of Mozilla's investigation can be found in the linked google group discussion.

Publish Date: 2022-12-07

URL: CVE-2022-23491

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: High
    • Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2022-23491

Release Date: 2022-12-07

Fix Resolution: certifi - 2022.12.07


CVE-2022-24302 (Medium) detected in paramiko-2.9.2-py2.py3-none-any.whl

CVE-2022-24302 - Medium Severity Vulnerability

Vulnerable Library - paramiko-2.9.2-py2.py3-none-any.whl

SSH2 protocol library

Library home page: https://files.pythonhosted.org/packages/60/3e/84c52fb09db84548c5d366bac8863125c6db099b87495e04c8af5527e6f1/paramiko-2.9.2-py2.py3-none-any.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/tmp/ws-scm/yascheduler

Dependency Hierarchy:

  • fabric-2.6.0-py2.py3-none-any.whl (Root Library)
    • paramiko-2.9.2-py2.py3-none-any.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

In Paramiko before 2.10.1, a race condition (between creation and chmod) in the write_private_key_file function could allow unauthorized information disclosure.

Publish Date: 2022-03-17

URL: CVE-2022-24302

CVSS 3 Score Details (5.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Local
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: https://www.paramiko.org/changelog.html

Release Date: 2022-03-17

Fix Resolution: paramiko - 2.10.1


WS-2022-0365 (High) detected in cryptography-37.0.4-cp36-abi3-manylinux_2_24_x86_64.whl - autoclosed

WS-2022-0365 - High Severity Vulnerability

Vulnerable Library - cryptography-37.0.4-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/86/82/5e81dbf8a94c011e5240595149626d92e78a110f01311face1ab08431566/cryptography-37.0.4-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/tmp/ws-scm/yascheduler

Dependency Hierarchy:

  • cryptography-37.0.4-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

pyca/cryptography's wheels include a statically linked copy of OpenSSL. The versions of OpenSSL included in cryptography 37.0.0-38.0.3 are vulnerable to a number of security issues. If you are building cryptography source ("sdist") then you are responsible for upgrading your copy of OpenSSL. Only users installing from wheels built by the cryptography project (i.e., those distributed on PyPI) need to update their cryptography versions.

Publish Date: 2022-11-02

URL: WS-2022-0365

CVSS 3 Score Details (9.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: GHSA-39hc-v87j-747x

Release Date: 2022-11-02

Fix Resolution: cryptography - 38.0.3


PlatformGuessFailed MacOS

When I tried to create a worker using the command:
yasetnode [email protected] --skip-setup
I get the following error:

File "/Users/my_user_name/anaconda3/lib/python3.11/site-packages/yascheduler/remote_machine/remote_machine.py", line 228, in create
    raise PlatformGuessFailed()
yascheduler.remote_machine.exc.PlatformGuessFailed

I'm running on MacOS. The platform was not specified in the config (as it was not suitable).
@knopki , I'm guessing it wasn't tested well enough on MacOS.

paramiko.ssh_exception.SSHException: No authentication methods available

On my Linux laptop there is an issue starting the scheduler via Supervisor:

INFO:yascheduler.CloudAPIManager:Active cloud APIs: -
INFO:yascheduler.CloudAPIManager.AllocatorThread[0]:Thread started
INFO:yascheduler.CloudAPIManager.AllocatorThread[1]:Thread started
INFO:yascheduler.CloudAPIManager.DeallocatorThread[0]:Thread started
INFO:yascheduler.CloudAPIManager.DeallocatorThread[1]:Thread started
INFO:Yascheduler.WebhookThread[0]:Thread started
DEBUG:yascheduler:Available computing engines: dummy
Traceback (most recent call last):
  File "/usr/bin/yascheduler", line 619, in <module>
    daemonize()
  File "/usr/bin/yascheduler", line 595, in daemonize
    step()
  File "/usr/bin/yascheduler", line 491, in step
    yac.ssh_connect(all_nodes)
  File "/usr/bin/yascheduler", line 261, in ssh_connect
    self.remote_machines[ip] = MyParamikoMachine.create_machine(
  File "/usr/local/lib/python3.8/dist-packages/yascheduler/ssh.py", line 40, in create_machine
    return connect()
  File "/usr/local/lib/python3.8/dist-packages/plumbum/machines/paramiko_machine.py", line 270, in __init__
    self._client.connect(host, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/paramiko/client.py", line 435, in connect
    self._auth(
  File "/usr/local/lib/python3.8/dist-packages/paramiko/client.py", line 767, in _auth
    raise SSHException("No authentication methods available")
paramiko.ssh_exception.SSHException: No authentication methods available

Supervisor config is

[supervisord]
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
childlogdir=/var/log/supervisor

[inet_http_server]
port = 127.0.0.1:7060

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=http://127.0.0.1:7060
username=
password=

[program:scheduler]
command=/usr/bin/yascheduler
user=postgres
autostart=false
autorestart=true
stderr_logfile=/data/yascheduler.log
stdout_logfile=/data/yascheduler.log

$ yanodes
ip=127.0.0.1 ncpus=MAX enabled=True occupied_by=- (task_id=-)

CVE-2023-0286 (High) detected in cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl - autoclosed

CVE-2023-0286 - High Severity Vulnerability

Vulnerable Library - cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl

cryptography is a package which provides cryptographic recipes and primitives to Python developers.

Library home page: https://files.pythonhosted.org/packages/37/19/234484df6fc7bdf4cf81cd4a89f600fce9f8f7a4bc1b307d7abbcd382b64/cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl

Path to dependency file: /requirements.txt

Path to vulnerable library: /requirements.txt,/tmp/ws-scm/yascheduler

Dependency Hierarchy:

  • cryptography-38.0.3-cp36-abi3-manylinux_2_24_x86_64.whl (Vulnerable Library)

Found in base branch: master

Vulnerability Details

There is a type confusion vulnerability relating to X.400 address processing inside an X.509 GeneralName. X.400 addresses were parsed as an ASN1_STRING but the public structure definition for GENERAL_NAME incorrectly specified the type of the x400Address field as ASN1_TYPE. This field is subsequently interpreted by the OpenSSL function GENERAL_NAME_cmp as an ASN1_TYPE rather than an ASN1_STRING. When CRL checking is enabled (i.e. the application sets the X509_V_FLAG_CRL_CHECK flag), this vulnerability may allow an attacker to pass arbitrary pointers to a memcmp call, enabling them to read memory contents or enact a denial of service. In most cases, the attack requires the attacker to provide both the certificate chain and CRL, neither of which need to have a valid signature. If the attacker only controls one of these inputs, the other input must already contain an X.400 address as a CRL distribution point, which is uncommon. As such, this vulnerability is most likely to only affect applications which have implemented their own functionality for retrieving CRLs over a network.

Publish Date: 2023-02-08

URL: CVE-2023-0286

CVSS 3 Score Details (7.4)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: None
    • Availability Impact: High

Suggested Fix

Type: Upgrade version

Origin: https://www.openssl.org/news/vulnerabilities.html

Release Date: 2023-02-08

Fix Resolution: openssl-3.0.8


Keys directory was not automatically created

root@metis:/data/metis-gui# yasetnode 10.99.245.5 --skip-setup
Traceback (most recent call last):
  File "/usr/local/bin/yasetnode", line 8, in <module>
    sys.exit(manage_node())
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 430, in manage_node
    asyncio.run(_manage_node())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 414, in _manage_node
    client_keys=config.local.get_private_keys(),
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/config/local.py", line 43, in get_private_keys
    return list(filepaths)
  File "/usr/lib/python3.9/pathlib.py", line 1149, in iterdir
    for name in self._accessor.listdir(self):
FileNotFoundError: [Errno 2] No such file or directory: '/data/yadata/keys'
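The traceback shows iterdir() failing because the keys directory does not exist. One possible defensive approach, given here as a sketch and not as the project's actual fix, is to create the directory before listing it; list_private_keys is a hypothetical stand-in for the failing function.

```python
from pathlib import Path


def list_private_keys(keys_dir):
    """Return the files in keys_dir, creating the directory if it is missing."""
    path = Path(keys_dir)
    # Creating the directory first avoids FileNotFoundError in iterdir().
    path.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in path.iterdir() if p.is_file())
```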

Keys are not automatically loaded

A normal connection succeeds:

root@aiida9:~# ssh X.X.X.X
Linux labs 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64

...
root@labs:~# logout
Connection to X.X.X.X closed.

but scheduler connection fails:

root@aiida9:~# yasetnode X.X.X.X~4
Traceback (most recent call last):
  File "/usr/local/bin/yasetnode", line 8, in <module>
    sys.exit(manage_node())
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 430, in manage_node
    asyncio.run(_manage_node())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 411, in _manage_node
    machine = await RemoteMachine.create(
  File "/usr/local/lib/python3.9/dist-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/remote_machine/remote_machine.py", line 192, in create
    conn = await asyncssh.connection.connect(
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/usr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 447, in _connect
    await options.waiter
asyncssh.misc.PermissionDenied: Permission denied

Hetzner: calc engine gets asleep and outputs nothing approx. in 10% cases

Manifested as:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1872 root      20   0  269344  20472  13680 S   7.0   0.1 174:17.89 Pcrystal
 1874 root      20   0  269344  20172  13380 S   6.7   0.1 173:49.32 Pcrystal
 1875 root      20   0  269344  20128  13336 S   6.7   0.1 174:53.19 Pcrystal
 1876 root      20   0  269344  20388  13600 S   6.7   0.1 173:32.07 Pcrystal
 1886 root      20   0  269344  20204  13412 S   6.3   0.1 174:08.48 Pcrystal
 1877 root      20   0  269344  20132  13340 S   6.0   0.1 174:18.18 Pcrystal
 1881 root      20   0  269344  20404  13612 S   5.7   0.1 175:58.19 Pcrystal

and

root@node-dwsxhftb:~# cat /data/20201212_194555_dury/OUTPUT
[node-dwsxhftb][[36298,1],6][btl_tcp.c:559:mca_btl_tcp_recv_blocking] recv(16) failed: Connection reset by peer (104)
[node-dwsxhftb][[36298,1],7][btl_tcp.c:559:mca_btl_tcp_recv_blocking] recv(16) failed: Connection reset by peer (104)
[node-dwsxhftb][[36298,1],7][btl_tcp.c:559:mca_btl_tcp_recv_blocking] recv(16) failed: Connection reset by peer (104)

Fail to handle connection issues

The -v (and -o) option of yastatus lists excerpts from the output logs on each active machine. The problem is that the status sometimes changes right at the time of the listing, so the machine is no longer available. Errors like the one below then occur (this should be easy to handle).

..................................................ID2456 aiida-33160 at [email protected]:hetzner:data/tasks/20221231_044118_2456
INFO:backoff:Backing off create(...) for 0.8s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 0.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.9s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 2.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.8s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 6.3s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.5s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 4.2s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.9s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 3.7s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
INFO:backoff:Backing off create(...) for 1.1s (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
ERROR:backoff:Giving up create(...) after 12 tries (OSError: [Errno 113] Connect call failed ('65.109.143.81', 22))
Traceback (most recent call last):
  File "/usr/local/bin/yastatus", line 8, in <module>
    sys.exit(check_status())
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 237, in check_status
    asyncio.run(_check_status())
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/utils.py", line 148, in _check_status
    machine = await RemoteMachine.create(
  File "/usr/local/lib/python3.9/dist-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/yascheduler/remote_machine/remote_machine.py", line 192, in create
    conn = await asyncssh.connection.connect(
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 7834, in connect
    return await asyncio.wait_for(
  File "/usr/lib/python3.9/asyncio/tasks.py", line 442, in wait_for
    return await fut
  File "/usr/local/lib/python3.9/dist-packages/asyncssh/connection.py", line 437, in _connect
    _, session = await loop.create_connection(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1056, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.9/asyncio/base_events.py", line 1041, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.9/asyncio/base_events.py", line 955, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 502, in sock_connect
    return await fut
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 537, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
OSError: [Errno 113] Connect call failed ('65.109.143.81', 22)
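One way to handle a machine that disappears during the listing is to catch the OSError per machine and continue with the rest. The sketch below is an assumption about a possible approach: collect_status is hypothetical, and the connect argument stands in for the real SSH connection code, which is not reproduced here.

```python
import asyncio


async def collect_status(machines, connect):
    """Query every machine, skipping those that cannot be reached.

    `connect` is any coroutine function that takes an address and either
    returns a status string or raises OSError.
    """
    results = {}
    for address in machines:
        try:
            results[address] = await connect(address)
        except OSError as exc:
            # Report the failure for this machine and keep going.
            results[address] = f"unreachable ({exc})"
    return results
```

It would be driven with asyncio.run(collect_status(addresses, connect)), so that a single unreachable node no longer aborts the whole status report.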

win32 version of dummy engine works differently

The arguments are explicitly stated:

root@debian:~/data/tasks/20220708_052032_106# ~/data/engines/dummy/dummyengine 1.input 2.input 3.input
Dummy engine output
processing arg as a file: 1.input
filename to be used: 1.input.out
processing arg as a file: 2.input
filename to be used: 2.input.out
processing arg as a file: 3.input
filename to be used: 3.input.out
sleeping 5 seconds

but the win32 version in PowerShell behaves differently:

PS C:\Users\yascheduler\data\tasks\20220708_061242_114> C:\Users\yascheduler\data\engines\dummy-win\dummy.exe 1.input 2.input 3.input
Dummy engine output
processing arg as a file: 1.input
filename to be used: └1.input.out
processing arg as a file: 2.input
filename to be used: Éü▄2.input.out
processing arg as a file: 3.input
filename to be used: `↕▄3.input.out
sleeping 0 seconds

^^^ File names are broken.

With full paths:

PS C:\Users\yascheduler\data\tasks\20220708_034324_71> ~/data/engines/dummy-win/dummy (Resolve-Path '*')
Dummy engine output
processing arg as a file: C:\Users\yascheduler\data\tasks\20220708_034324_71\1.input
filename to be used: └C:\Users\yascheduler\data\tasks\20220708_034324_71\1.input.out
processing arg as a file: C:\Users\yascheduler\data\tasks\20220708_034324_71\2.input
filename to be used: ╚éùC:\Users\yascheduler\data\tasks\20220708_034324_71\2.input.out
processing arg as a file: C:\Users\yascheduler\data\tasks\20220708_034324_71\3.input
filename to be used: αûùC:\Users\yascheduler\data\tasks\20220708_034324_71\3.input.out
sleeping 3 seconds
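The broken names in the transcripts above can be detected mechanically: each "filename to be used" line should equal the preceding argument plus `.out`. The following is a minimal diagnostic sketch, assuming the dummy engine always prints the two-line pattern shown above. The helper name `find_broken_outputs` is hypothetical and not part of yascheduler.

```python
from typing import List, Tuple

ARG_PREFIX = "processing arg as a file: "
OUT_PREFIX = "filename to be used: "


def find_broken_outputs(engine_output: str) -> List[Tuple[str, str]]:
    """Return (argument, reported_name) pairs where reported_name != argument + '.out'."""
    broken = []
    current_arg = None
    for line in engine_output.splitlines():
        if line.startswith(ARG_PREFIX):
            # Remember the argument until its output-name line arrives.
            current_arg = line[len(ARG_PREFIX):]
        elif line.startswith(OUT_PREFIX) and current_arg is not None:
            reported = line[len(OUT_PREFIX):]
            if reported != current_arg + ".out":
                broken.append((current_arg, reported))
            current_arg = None
    return broken


# Excerpts copied from the Linux and win32 transcripts above.
linux_log = """Dummy engine output
processing arg as a file: 1.input
filename to be used: 1.input.out
"""
win32_log = """Dummy engine output
processing arg as a file: 1.input
filename to be used: └1.input.out
"""

print(find_broken_outputs(linux_log))  # []
print(find_broken_outputs(win32_log))  # [('1.input', '└1.input.out')]
```

This only reports the symptom (stray characters prepended to the name). It does not identify the cause inside the win32 build.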

AiiDA-Yascheduler consistency CI check

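#!/bin/bash

# Compare the number of RUNNING processes seen by AiiDA and by yascheduler.
aiida_running=$(verdi process list | grep -c RUNNING)
yasched_running=$(yastatus | grep -c RUNNING)

if [ "$aiida_running" -ne "$yasched_running" ]; then
    echo "AiiDA-Yascheduler consistency error!" >&2
    exit 1
fi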
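The same comparison can be sketched in Python, which makes the counting logic testable without a live AiiDA or yascheduler installation. This is an illustrative sketch only: the function names are hypothetical, and `run_check` assumes that the `verdi` and `yastatus` commands are on `PATH`, as in the shell script.

```python
import subprocess


def count_running(listing: str) -> int:
    """Count lines that contain RUNNING, like `grep RUNNING | wc -l`."""
    return sum(1 for line in listing.splitlines() if "RUNNING" in line)


def is_consistent(verdi_output: str, yastatus_output: str) -> bool:
    """True when both listings report the same number of RUNNING entries."""
    return count_running(verdi_output) == count_running(yastatus_output)


def run_check() -> bool:
    """Run both commands and compare their RUNNING counts (needs both tools installed)."""
    verdi = subprocess.run(
        ["verdi", "process", "list"], capture_output=True, text=True, check=True
    )
    yastatus = subprocess.run(
        ["yastatus"], capture_output=True, text=True, check=True
    )
    return is_consistent(verdi.stdout, yastatus.stdout)


# Made-up listings for illustration only; real command output looks different.
aiida_sample = "101 RUNNING\n102 RUNNING\n103 FINISHED\n"
yascheduler_sample = "7 RUNNING\n8 DONE\n"
print(is_consistent(aiida_sample, yascheduler_sample))  # False
```

Like the shell version, this compares counts only, so two different sets of tasks with equal sizes would still pass.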

CVE-2022-40899 (High) detected in future-0.18.2.tar.gz

CVE-2022-40899 - High Severity Vulnerability

Vulnerable Library - future-0.18.2.tar.gz

Clean single-source support for Python 3 and 2

Library home page: https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz

Path to dependency file: /tmp/ws-scm/yascheduler

Path to vulnerable library: /tmp/ws-scm/yascheduler,/requirements.txt

Dependency Hierarchy:

  • future-0.18.2.tar.gz (Vulnerable Library)

Found in base branch: master

Vulnerability Details

An issue discovered in Python Charmers Future 0.18.2 and earlier allows remote attackers to cause a denial of service via crafted Set-Cookie header from malicious web server.

Publish Date: 2022-12-23

URL: CVE-2022-40899

CVSS 3 Score Details (7.5)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High


Azure installation is fragile

root@corvus:/home/eb# pip install --upgrade azure-mgmt-network
Requirement already up-to-date...

but

root@corvus:/home/eb# yascheduler
Traceback (most recent call last):
  File "/usr/bin/yascheduler", line 335, in <module>
    daemonize()
  File "/usr/bin/yascheduler", line 254, in daemonize
    yac, clouds = clouds.yascheduler, yac.clouds = Yascheduler(config), CloudAPIManager(config)
  File "/usr/local/lib/python3.8/dist-packages/yascheduler/clouds/__init__.py", line 182, in __init__
    self.apis[name] = load_cloudapi(name)(config)
  File "/usr/local/lib/python3.8/dist-packages/yascheduler/clouds/__init__.py", line 217, in load_cloudapi
    cloudapi_mod = import_module('.' + name, package='yascheduler.clouds')
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/yascheduler/clouds/az.py", line 48, in <module>
    from azure.mgmt.network.v2021_03_01 import NetworkManagementClient
ModuleNotFoundError: No module named 'azure.mgmt.network.v2021_03_01'
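The traceback shows the whole daemon failing because one cloud module imports a versioned subpackage (`azure.mgmt.network.v2021_03_01`) that the installed `azure-mgmt-network` release does not provide. One way to make such failures easier to diagnose is to wrap the dynamic import and report which package to check. The sketch below is an assumption about how that could look, not the code in `yascheduler/clouds/__init__.py`; `load_optional` and its `pip_hint` parameter are hypothetical.

```python
from importlib import import_module


def load_optional(module_name: str, pip_hint: str):
    """Import a module, turning ModuleNotFoundError into an actionable message."""
    try:
        return import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the first module in the dotted path that could not be found.
        raise RuntimeError(
            f"Cannot import {module_name!r} (missing module: {exc.name!r}). "
            f"Check the installed version of {pip_hint}."
        ) from exc


# A module that exists imports normally.
json_mod = load_optional("json", "the Python standard library")
print(json_mod.__name__)  # json

# A missing module produces a readable error instead of a bare traceback.
try:
    load_optional("no_such_package_for_demo.v2021_03_01", "no-such-package-for-demo")
except RuntimeError as err:
    print(err)
```

This does not fix the version mismatch itself. It only replaces the raw `ModuleNotFoundError` with a hint about which dependency to inspect.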

Daemon exits on Enter press

(Maybe this is not a bug but a feature?)
Pressing ENTER leads to:

"/lib/python3.6/site-packages/invoke/terminals.py", line 231, in bytes_to_read
    fionread = fcntl.ioctl(input_, termios.FIONREAD, "  ")
AttributeError: module 'termios' has no attribute 'FIONREAD'

CVE-2020-14422 (Medium) detected in ipaddress-1.0.23-py2.py3-none-any.whl

CVE-2020-14422 - Medium Severity Vulnerability

Vulnerable Library - ipaddress-1.0.23-py2.py3-none-any.whl

IPv4/IPv6 manipulation library

Library home page: https://files.pythonhosted.org/packages/c2/f8/49697181b1651d8347d24c095ce46c7346c37335ddc7d255833e7cde674d/ipaddress-1.0.23-py2.py3-none-any.whl

Path to dependency file: yascheduler

Path to vulnerable library: yascheduler,yascheduler/requirements.txt

Dependency Hierarchy:

  • ipaddress-1.0.23-py2.py3-none-any.whl (Vulnerable Library)

Found in HEAD commit: 0de1620738529bbd5f2928bd299fbe3304a41ee4

Found in base branch: master

Vulnerability Details

Lib/ipaddress.py in Python through 3.8.3 improperly computes hash values in the IPv4Interface and IPv6Interface classes, which might allow a remote attacker to cause a denial of service if an application is affected by the performance of a dictionary containing IPv4Interface or IPv6Interface objects, and this attacker can cause many dictionary entries to be created. This is fixed in: v3.5.10, v3.5.10rc1; v3.6.12; v3.7.9; v3.8.4, v3.8.4rc1, v3.8.5, v3.8.6, v3.8.6rc1; v3.9.0, v3.9.0b4, v3.9.0b5, v3.9.0rc1, v3.9.0rc2.

Publish Date: 2020-06-18

URL: CVE-2020-14422

CVSS 3 Score Details (5.9)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: High
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: None
    • Integrity Impact: None
    • Availability Impact: High


Suggested Fix

Type: Upgrade version

Origin: https://security-tracker.debian.org/tracker/CVE-2020-14422

Release Date: 2020-06-18

Fix Resolution: 3.5.3-1+deb9u2, 3.7.3-2+deb10u2, 3.8.4~rc1-1
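The upstream fix versions listed under Vulnerability Details can be turned into a quick interpreter check. The sketch below is a hedged illustration: `has_ipaddress_fix` is a hypothetical helper, it considers final releases only (pre-releases such as 3.9.0b4 are ignored), and it covers only the standard-library `ipaddress` module of the running interpreter. It says nothing about the separately installed `ipaddress-1.0.23` wheel that the report flags.

```python
import sys

# First patched micro release per minor series, from the advisory text above.
FIRST_FIXED_MICRO = {(3, 5): 10, (3, 6): 12, (3, 7): 9, (3, 8): 4}


def has_ipaddress_fix(version) -> bool:
    """Return True if a final CPython release of this version includes the fix."""
    major, minor, micro = version[:3]
    if (major, minor) >= (3, 9):
        # The advisory lists 3.9.0 as fixed, so later series are treated as fixed too.
        return True
    fixed_micro = FIRST_FIXED_MICRO.get((major, minor))
    # Series not in the table (for example 2.7 or 3.4) have no listed fix.
    return fixed_micro is not None and micro >= fixed_micro


print(has_ipaddress_fix((3, 8, 3)))  # False
print(has_ipaddress_fix((3, 8, 4)))  # True
print(has_ipaddress_fix(sys.version_info))
```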



Option provider_max_nodes was not respected

root@aiida-cloud-rev-2:~# yanodes | grep hetzner | wc -l
48
root@aiida-cloud-rev-2:~# yanodes | grep upcloud | wc -l
42
root@aiida-cloud-rev-2:~# cat /etc/yascheduler/yascheduler.conf | grep max_
upcloud_max_nodes = 40
hetzner_max_nodes = 50
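In the transcript above, upcloud runs 42 nodes against a limit of 40, while hetzner stays under its limit (48 of 50). A check for this condition can be sketched as follows. It is an illustration under stated assumptions: the helper names are hypothetical, the `<provider>_max_nodes = N` line format is taken from the `grep` output above, and the node counts are passed in rather than read from `yanodes`.

```python
from typing import Dict

SUFFIX = "_max_nodes"


def parse_limits(conf_text: str) -> Dict[str, int]:
    """Extract {provider: limit} from lines like 'upcloud_max_nodes = 40'."""
    limits = {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.endswith(SUFFIX):
            limits[key[: -len(SUFFIX)]] = int(value)
    return limits


def excess_nodes(counts: Dict[str, int], limits: Dict[str, int]) -> Dict[str, int]:
    """Return {provider: nodes_over_limit} for every provider above its limit."""
    return {
        provider: count - limits[provider]
        for provider, count in counts.items()
        if provider in limits and count > limits[provider]
    }


# Values copied from the transcript above.
conf = "upcloud_max_nodes = 40\nhetzner_max_nodes = 50\n"
counts = {"hetzner": 48, "upcloud": 42}
print(excess_nodes(counts, parse_limits(conf)))  # {'upcloud': 2}
```

A check like this only reports the overshoot after the fact. Respecting the limit would require the allocator to compare against it before creating each node.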
