Comments (29)

alexander-veit commented on July 18, 2024

Please use v3.2.1 (or higher) from now on. We are looking at the outdated instance types next.

from tibanna.

alexander-veit commented on July 18, 2024

I see. It's probably coming from here.

Since mem and cpu are set, a suitable instance type is determined by our Benchmark package. It hasn't been updated in quite a while, and it probably does not take into account the region where you are trying to run the job. Therefore, it might suggest instances that are not available in your region, which would cause the error you are seeing.

I will bring this up internally. This is certainly something we need to look at.

alexander-veit commented on July 18, 2024

Yeah, I briefly looked at it. Definitely needs an update. Thanks for bringing the issue to our attention.

nhartwic commented on July 18, 2024

Can I also recommend that when instance_type is set, tibanna should skip trying to automatically determine instance type? This current behavior strikes me as counterintuitive and undesirable.
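
As a rough illustration of the behavior requested above (a minimal sketch, not Tibanna's actual code; `choose_instance_types`, `benchmark_lookup`, and `stale_lookup` are hypothetical names), an explicitly set instance_type could short-circuit the automatic lookup:

```python
from typing import Callable, List


def choose_instance_types(
    instance_type: str,
    mem_gb: float,
    cpu: int,
    benchmark_lookup: Callable[[float, int], List[str]],
) -> List[str]:
    """Return the instance types to try for a job.

    Hypothetical helper: if the user named an instance type, use only
    that one and never consult the resource-based lookup.
    """
    if instance_type:
        return [instance_type]
    # Fall back to the resource-based recommendation only when no
    # instance type was given (empty string, as in the logged config).
    return benchmark_lookup(mem_gb, cpu)


def stale_lookup(mem_gb: float, cpu: int) -> List[str]:
    # Hypothetical stand-in for an outdated recommendation table.
    return ["cr1.8xlarge"]


explicit = choose_instance_types("m5a.4xlarge", 234.375, 32, stale_lookup)
automatic = choose_instance_types("", 234.375, 32, stale_lookup)
```

With this ordering, a stale recommendation can only affect jobs that did not name an instance type.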

nhartwic commented on July 18, 2024

I've upgraded to 3.3 and think the update has resolved all outstanding issues. I'm not the person who opened this issue, but I'd call it closed.

alexander-veit commented on July 18, 2024

It looks like you are trying to launch one of these instance types: [m1.medium, m3.medium, t4g.medium.search]. Are you sure these are valid? I can't find them here. Could you just try t4g.medium?

Bioinf-usr commented on July 18, 2024

Thanks for reverting.

Here is what I tried now

snakemake --tibanna --tibanna-config spot_instance=true behavior_on_capacity_limit=retry_without_spot instance_type=t4g.medium availability_zone=ap-south-1 --default-remote-prefix=<bucketname> -s test.yaml --jobs 1

The error

"errorMessage": "An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [m1.medium, m3.medium]"
Could it be that "m1.medium, m3.medium" are hardcoded somewhere?

alexander-veit commented on July 18, 2024

Hm... not in Tibanna. What does your test.yaml look like?

Bioinf-usr commented on July 18, 2024

Here is what I have

rule a:
    output:
        "test.pdf"
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

alexander-veit commented on July 18, 2024

Looking at the Snakemake docs, do you have a config.json in your folder or anything that specifies instance_type?

Bioinf-usr commented on July 18, 2024

No, nothing on my side. I don't have a config file.

Bioinf-usr commented on July 18, 2024

Hi,

As a follow-up, when I tried to use the API, I got the following error.

Traceback (most recent call last):
  File "/home/ec2-user/.local/bin/tibanna", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 580, in main
    subcommandf(*sc_args)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/tibanna/__main__.py", line 449, in log
    top=top, top_latest=top_latest))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 844-845: ordinal not in range(256)
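
The traceback above suggests that the log text contained characters outside latin-1 while the output stream was using that encoding. The sketch below only reproduces that failure mode in isolation; `render` and `sample` are hypothetical and not part of Tibanna:

```python
import io

# Hypothetical log line containing a character that latin-1 cannot represent.
sample = "job log \u2713 done"


def render(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write text through a stream with the given encoding and return the bytes."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


utf8_bytes = render(sample, "utf-8")  # succeeds

try:
    render(sample, "latin-1")  # strict latin-1 cannot encode U+2713
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

# With errors="replace", unencodable characters become "?" instead of raising.
lossy_bytes = render(sample, "latin-1", errors="replace")
```

A possible workaround, not confirmed anywhere in this thread, is to run the command with the standard Python environment variable PYTHONIOENCODING set to utf-8 so that stdout uses UTF-8.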

Here is my Snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}"

Here is my config json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy"
  }

}

Please let me know if it helps.

Thank you.

Bioinf-usr commented on July 18, 2024

Hi,

Just wondering if you had a chance to take a look at the errors. Please let me know if you need any further information.

Thank you.

SooLee commented on July 18, 2024

Hi @Bioinf-usr, your shell command doesn't look executable: https://www.cyberciti.biz/files/sticker/sticker_book.pdf -o {output}. You might have missed a binary or something in the command?

Bioinf-usr commented on July 18, 2024

Hi,

Thanks, you are right, that was an issue. But even after fixing it, I am running into another error while using the API. I used the following command to get the error log.

tibanna log --job-id=4Gh64ovZcaXq

Below is the error.

Error: you need to specify the maximum number of CPU cores to be used at the same time. If you want to use N cores, say --cores N or -cN. For all cores on your system (be sure that this is appropriate) use --cores all. For no parallelization use --cores 1 or -c1. <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

Here is my command

API().run_workflow(input_json="test.json")

My test.json

{
  "args": {
    "language": "snakemake",
    "container_image": "snakemake/snakemake",
    "command": "snakemake",
    "snakemake_main_filename": "Snakefile",
    "snakemake_directory_local":"/home/ec2-user",
    "output_S3_bucket": "dummy"
  },
  "config": {
    "instance_type": "t3.micro",
    "ebs_size": 10,
    "EBS_optimized": true,
    "log_bucket": "dummy",
    "cores": 1
  }

}

Here is my snakefile

rule a:
    output:
        "test.pdf"
    retries: 3
    shell:
        "wget https://www.cyberciti.biz/files/sticker/sticker_book.pdf -O {output}"

Hope it helps. Please note that this is only for the api. The problem with using snakemake as a standalone is still the same.

Thank you.

SooLee commented on July 18, 2024

Try pinning an older tag in "container_image": "snakemake/snakemake", e.g. "container_image": "snakemake/snakemake:v6.1.0" (or some other version). This may be related to a newer version of snakemake.
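
To make the suggestion concrete, here is a hedged sketch of two possible adjustments to the job description. The helper names `pin_image` and `add_cores` are hypothetical. Whether an older image avoids the --cores error is untested here, and the second helper simply follows the error message's own hint; it assumes the "command" string is passed through to snakemake unchanged.

```python
import copy

# Trimmed copy of the "args" section posted earlier in the thread.
job = {
    "args": {
        "language": "snakemake",
        "container_image": "snakemake/snakemake",
        "command": "snakemake",
    },
}


def pin_image(job_description: dict, tag: str) -> dict:
    """Return a copy whose container image carries an explicit tag."""
    patched = copy.deepcopy(job_description)
    # Drop any existing tag; this simple split assumes the image name
    # has no registry port such as "localhost:5000/...".
    repository = patched["args"]["container_image"].split(":", 1)[0]
    patched["args"]["container_image"] = f"{repository}:{tag}"
    return patched


def add_cores(job_description: dict, cores: int) -> dict:
    """Return a copy whose snakemake command passes --cores explicitly."""
    patched = copy.deepcopy(job_description)
    patched["args"]["command"] += f" --cores {cores}"
    return patched


pinned = pin_image(job, "v6.1.0")
with_cores = add_cores(job, 1)
```

Either dictionary could then be written out with json.dump and used as the input JSON in place of the original file.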

nhartwic commented on July 18, 2024

I'm encountering a similar error. It seems like snakemake/tibanna is trying to use an instance_type that doesn't exist. It isn't obvious where that specific instance_type is coming from, though. Relevant messages from the CloudWatch logs are below.

[tibanna.ec2_utils] DEBUG: 23-02-27 20:01:16 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[DEBUG]	2023-02-27T20:01:16.522Z	342b9645-c2d3-48d4-82f9-0e0e47bd99da	self.cfg.as_dict() = {
    "run_name": "snakemake-job-frADkYJMPBna-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2a",
    "use_benchmark": False,
    "instance_type": "",
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

...It sure doesn't seem like "cr1.8xlarge" is coming from snakemake, as it isn't in any of the configs, but I can't see where it would be coming from within Tibanna either.

nhartwic commented on July 18, 2024

Attempting to manually set instance_type didn't work. Relevant cloudwatch log messages below....

[tibanna.ec2_utils] DEBUG: 23-02-27 20:52:23 - self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[DEBUG]	2023-02-27T20:52:23.936Z	25305e82-9267-4340-9d7d-ec3cc9769018	self.cfg.as_dict() = {
    "run_name": "snakemake-job-nEyH0pfXEE8l-group-racon.ecoli.v4.miniasm",
    "mem": 234.375,
    "cpu": 32,
    "ebs_size": 3907,
    "log_bucket": "salk-tm-logs",
    "root_ebs_size": 32,
    "availability_zone": "us-west-2",
    "instance_type": "m5a.4xlarge",
    "use_benchmark": False,
    "EBS_optimized": False,
    "ebs_iops": "",
    "ebs_throughput": "",
    "password": "",
    "key_name": "",
    "spot_duration": "",
    "security_group": "",
    "subnet": "",
    "ebs_type": "gp3",
    "shutdown_min": "now",
    "spot_instance": False,
    "behavior_on_capacity_limit": "fail",
    "cloudwatch_dashboard": False,
    "public_postrun_json": False,
    "encrypt_s3_upload": False,
    "awsf_image": "4dndcic/tibanna-awsf:3.1.0",
    "mem_as_is": False,
    "ebs_size_as_is": False,
    "ami_id": "",
    "ami_per_region": {
        "x86": {
            "us-east-1": "ami-06e2266f85063aabc",
            "us-east-2": "ami-03a4e3e84b6a1813d",
            "us-west-1": "ami-0c5e8147be760a354",
            "us-west-2": "ami-068589fed9c8d5950",
            "ap-south-1": "ami-05ef59bc4f359c93b",
            "ap-northeast-2": "ami-0d8618a76aece8a8e",
            "ap-southeast-1": "ami-0c22dc3b05714bda1",
            "ap-southeast-2": "ami-03dc109bbf412aac5",
            "ap-northeast-1": "ami-0f4c520515c41ff46",
            "ca-central-1": "ami-01af127710fadfe74",
            "eu-central-1": "ami-0887bcb1c901c1769",
            "eu-west-1": "ami-08db59692e4371ea6",
            "eu-west-2": "ami-036d3ce7a21e07012",
            "eu-west-3": "ami-0cad0ec4160a6b940",
            "eu-north-1": "ami-00a6f0f9fee951aa0",
            "sa-east-1": "ami-0b2164f9680f97099",
            "me-south-1": "ami-03479b7a590f97945",
            "af-south-1": "ami-080baa4ec59c456aa",
            "ap-east-1": "ami-0a9056eb817bc3928",
            "eu-south-1": "ami-0a72279e56849415e"
        },
        "Arm": {
            "us-east-1": "ami-0f3e90ad8e76c7a32",
            "us-east-2": "ami-03359d89f311a015e",
            "us-west-1": "ami-00ffd20b39dbfb6ea",
            "us-west-2": "ami-08ab3015c1bc36d24",
            "ap-south-1": "ami-01af9ec07fed38a38",
            "ap-northeast-2": "ami-0ee2af459355dd917",
            "ap-southeast-1": "ami-0d74dc5af4bf74ea8",
            "ap-southeast-2": "ami-08ab7201c83209fe8",
            "ap-northeast-1": "ami-07227003bfa0565e3",
            "ca-central-1": "ami-0cbf87c80362a058e",
            "eu-central-1": "ami-09cfa59f75e88ad54",
            "eu-west-1": "ami-0804bdeafd8af01f8",
            "eu-west-2": "ami-0db05a333dc02c1c8",
            "eu-west-3": "ami-0ceab436f882fe36a",
            "eu-north-1": "ami-04ba962c974ddd374",
            "sa-east-1": "ami-0fc9a9dec0f3df318",
            "me-south-1": "ami-0211bc858eb163594",
            "af-south-1": "ami-0d6a4af087f83899d",
            "ap-east-1": "ami-0d375f2ce688d16b9",
            "eu-south-1": "ami-0b1db84f31597a70f"
        }
    },
    "script_url": "https://raw.githubusercontent.com/4dn-dcic/tibanna/master/awsf3/",
    "json_bucket": "salk-tm-logs",
    "language": "snakemake",
    "job_tag": ""
}
[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [cr1.8xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 374, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 426, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

alexander-veit commented on July 18, 2024

Hm... Could you remove

"mem": 234.375,
"cpu": 32,

from the input and only specify the instance type? Also, make sure the instance type you choose actually exists in us-west-2 (not all instance types are available in all regions).

nhartwic commented on July 18, 2024

The config is being automatically generated by snakemake. I can't remove those fields.

Mostly I want to know where "cr1.8xlarge" is coming from. Is it snakemake or tibanna that is trying to match ec2 type based on resources?

nhartwic commented on July 18, 2024

It may be worth clarifying that "cr1.8xlarge" does not appear to be an instance type anymore. It doesn't exist in any of the regions I've checked, and Amazon now lists it as a "previous generation instance".

After poking around, looks like this is the csv that needs to be updated...

https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv

Bioinf-usr commented on July 18, 2024

Awesome!! great to see this issue being addressed. Would be happy to do some debugging if needed.

Thanks!!

alexander-veit commented on July 18, 2024

Please do not use version 3.0.0 or 3.1.0. We identified a critical bug that can cause inflated costs when running spot. We are working on a solution.

alexander-veit commented on July 18, 2024

Version 3.3.0 should fix this issue. Furthermore, when instance_type is set, Tibanna will use only that instance type for the workflow.

trahsemaj commented on July 18, 2024

I am running on 4.0.0 and running into the same issue described above: an outdated CSV (even in the latest Benchmark release, Benchmark-4dn-0.5.23). I am using snakemake 7.3.1 with the --tibanna option.
My pipeline has pretty diverse resource needs for different rules, so setting a single instance_type for all steps is not a viable option.
My current workaround is to manually play with the mem_gb and threads until a valid instance type is selected, but that doesn't seem ideal.

alexander-veit commented on July 18, 2024

Which instance type is causing issues?

trahsemaj commented on July 18, 2024

from the run_task_awsem_* cloudwatch logs:

[ERROR] ClientError: An error occurred (InvalidInstanceType) when calling the DescribeInstanceTypes operation: The following supplied instance types do not exist: [r6a.xlarge, r6id.xlarge]
Traceback (most recent call last):
  File "/var/task/service.py", line 20, in handler
    return run_task(event)
  File "/var/task/tibanna/run_task.py", line 63, in run_task
    execution = Execution(input_json)
  File "/var/task/tibanna/ec2_utils.py", line 375, in __init__
    self.create_instance_type_list()
  File "/var/task/tibanna/ec2_utils.py", line 427, in create_instance_type_list
    results = ec2.describe_instance_types(
  File "/var/task/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/task/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)

fwiw, the file on the Benchmark GitHub (https://github.com/SooLee/Benchmark/blob/master/Benchmark/aws/Amazon%20EC2%20Instance%20Comparison.csv) does seem to be up-to-date (at least it seems to contain valid instance types), but the version packaged with Benchmark-4dn-0.5.23 is outdated. Unsure how/if this is tweakable without manually uploading my own lambda function with a corrected version of this file.
I see r6g* and r6i* instance types available, but not r6a* or r6id* instances in my region.

alexander-veit commented on July 18, 2024

I think the Benchmark-4dn-0.5.23 list is fine, but the problem is that this list is not region-specific. It returns instance types that are valid in us-east-1. Currently, Tibanna does not cross-check what's actually available in the active region and just takes the list from Benchmark. This certainly needs to be improved. I will add it to my todo list.
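
For anyone who wants to experiment in the meantime, a region cross-check could look roughly like the sketch below. This is an assumption about one possible approach, not what Tibanna does; `filter_to_region` and `offered_instance_types` are hypothetical helpers, and the second one needs boto3 plus AWS credentials.

```python
from typing import Iterable, List, Set


def filter_to_region(candidates: Iterable[str], offered: Set[str]) -> List[str]:
    """Keep only candidates offered in the region, preserving their order."""
    return [instance_type for instance_type in candidates if instance_type in offered]


def offered_instance_types(region: str) -> Set[str]:
    """List every instance type EC2 reports as offered in the given region."""
    import boto3  # imported lazily so the pure filter works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instance_type_offerings")
    offered: Set[str] = set()
    for page in paginator.paginate(LocationType="region"):
        for offering in page["InstanceTypeOfferings"]:
            offered.add(offering["InstanceType"])
    return offered


# Offline example using the instance types mentioned in this thread;
# the "offered" set here is made up for illustration.
recommended = ["r6a.xlarge", "r6id.xlarge", "r6i.xlarge"]
usable = filter_to_region(recommended, {"r6i.xlarge", "r6g.xlarge"})
```

Filtering the recommendation list before it is passed to describe_instance_types would avoid the InvalidInstanceType error, at the cost of one extra paginated API call per job.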

trahsemaj commented on July 18, 2024

ah, great to know, might consider migrating to us-east-1 if that list will be kept up-to-date. Maybe this is an issue better raised in Benchmark, but could imagine a fix might take some adjustments to both.
