Comments (9)

krokoko commented on August 24, 2024

Hi @airmonitor, thank you for reporting this. Have you looked at the associated sample we have for this construct? https://github.com/aws-samples/generative-ai-cdk-constructs-samples/blob/main/samples/sagemaker_custom_endpoint/lib/sagemaker_custom_endpoint-stack.ts

from generative-ai-cdk-constructs.

airmonitor commented on August 24, 2024

Hello @krokoko

Thx for reaching out.

I will try your suggestion.

Do you also have a working example of adding application autoscaling?

I need scalability there, so that the number of endpoint instances scales based on a metric.

I tried application autoscaling, but it failed because the add_dependency method did not work.

In short, the endpoint was still being created when application autoscaling tried to configure scaling for it.

AWS Support added a DependsOn section to the synthesised template, but I was not able to do the same in CDK because of the missing add_dependency method.

Regards
Tom
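The DependsOn fix that AWS Support applied can be illustrated by the CloudFormation fragment it produces. The snippet below is a minimal sketch that builds a trimmed `AWS::ApplicationAutoScaling::ScalableTarget` resource as a plain dict with an explicit `DependsOn` on the endpoint, so CloudFormation only registers the scalable target after the endpoint has finished creating. The logical ID `WhisperEndpoint`, the endpoint name, the capacity values, and the variant name `AllTraffic` are assumptions for illustration, not values taken from the construct. In CDK itself, the same ordering is normally declared with `scalable_target.node.add_dependency(endpoint)`, which every construct exposes through its `node`.

```python
import json

# Application Auto Scaling dimension for the instance count of a SageMaker endpoint variant.
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def scalable_target_resource(
    endpoint_logical_id: str,
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumption: name of the endpoint's production variant
    min_capacity: int = 1,
    max_capacity: int = 2,
) -> dict:
    """Build a ScalableTarget resource that CloudFormation creates only after the endpoint."""
    return {
        "Type": "AWS::ApplicationAutoScaling::ScalableTarget",
        # Explicit ordering: the endpoint resource must finish creating first.
        "DependsOn": [endpoint_logical_id],
        "Properties": {
            "ServiceNamespace": "sagemaker",
            "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
            "ScalableDimension": SCALABLE_DIMENSION,
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
    }


if __name__ == "__main__":
    # Hypothetical logical IDs and endpoint name, for illustration only.
    fragment = {
        "WhisperScalableTarget": scalable_target_resource(
            "WhisperEndpoint", "dev-backend-whisper-largev2vanilla"
        )
    }
    print(json.dumps(fragment, indent=2))
```

A scaling policy attached to this target would in turn depend on the target, so ordering the target after the endpoint is enough to keep the whole chain from starting too early.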

airmonitor commented on August 24, 2024

@krokoko

I've tested the proposed solution of using the IAM role created by CustomSageMakerEndpoint, but apparently I'm doing something wrong here.

For the scenario below I'm getting errors from CloudFormation:

        custom_sagemaker_endpoint = CustomSageMakerEndpoint(
            self,
            id=model_name,
            container=ContainerImage.from_ecr_repository(repository=self.whisper_ecr_repo, tag=config_vars.image_tag),
            enable_operational_metric=True,
            endpoint_name=f"{config_vars.stage}-{config_vars.project}-{model_name}",
            instance_count=1,
            instance_type=SageMakerInstanceType.ML_G5_2_XLARGE  # type: ignore
            if config_vars.stage == "prod"
            else SageMakerInstanceType.ML_G5_XLARGE,  # type: ignore
            model_data_url=f"s3://{self.model_data_s3_bucket.bucket_name}/model.tar.gz",
            model_id=f"{config_vars.stage}-{config_vars.project}-{model_name}",
            environment={"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": f"whisper-{model_name}"},
        )

        self.whisper_ecr_repo.grant_pull(custom_sagemaker_endpoint.role) # ECR Repository
        self.model_data_s3_bucket.grant_read_write(custom_sagemaker_endpoint.role) # S3 bucket

Error from CloudFormation

9:56:57 AM | CREATE_FAILED        | AWS::SageMaker::Model                       | devbackendwhisperl...odellargev2vanilla
Could not access model data at s3://s3-bucket/model.tar.gz. Please ensure that the role "arn:aws:iam::11223344:role/dev-backend-whisper-core--largev2vanillaRoleC7CE1EE-xaSDM2XaG6fE" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in us-east-1. If your Model uses multiple models or uncompressed models, please ensure that the role has "s3:ListBucket" permission. (Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: 591d86b1-0e24-4d83-a7ec-f3f12b30402d; Proxy: null)


 ❌  backend-whisper-pipeline/dev-backend-whisper-core-stage/core-stack (dev-backend-whisper-core-stage-core-stack) failed: Error: The stack named dev-backend-whisper-core-stage-core-stack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Could not access model data at s3://s3-bucket/model.tar.gz. Please ensure that the role "arn:aws:iam::11223344:role/dev-backend-whisper-core--largev2vanillaRoleC7CE1EE-xaSDM2XaG6fE" exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com". Also ensure that the role has "s3:GetObject" permissions and that the object is located in us-east-1. If your Model uses multiple models or uncompressed models, please ensure that the role has "s3:ListBucket" permission. (Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: 591d86b1-0e24-4d83-a7ec-f3f12b30402d; Proxy: null)
    at FullCloudFormationDeployment.monitorDeployment (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:435:10568)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:438:199515)
    at async /opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:438:181237

krokoko commented on August 24, 2024

Hi @airmonitor, did you give your role permissions to access the S3 bucket storing your artifacts? We demonstrate this in the sample here: https://github.com/aws-samples/generative-ai-cdk-constructs-samples/blob/main/samples/sagemaker_custom_endpoint/lib/sagemaker_custom_endpoint-stack.ts#L43C5-L57C7

airmonitor commented on August 24, 2024

Hey @krokoko.

I'm not sure I understand what you mean.

The example for extending IAM permissions uses a non-optimal way of granting the necessary permissions:

    customEndpoint.addToRolePolicy(
      new iam.PolicyStatement({
        effect: iam.Effect.ALLOW,
        actions: [
          's3:GetObject',
          's3:GetObject*',
          's3:GetBucket*',
          's3:List*',
        ],
        resources: [
          'BUCKET_ARN',
          'BUCKET_ARN/*',
        ],
      }),
    );

In CDK there is a better, shorter way to achieve the same result:

        self.whisper_ecr_repo.grant_pull(custom_sagemaker_endpoint.role) # ECR Repository
        self.model_data_s3_bucket.grant_read_write(custom_sagemaker_endpoint.role) # S3 bucket

The grant_pull and grant_read_write methods on the ECR repository and the S3 bucket should do the trick.
I'm not sure why we would go back to how this was done a few years ago, before those methods were available in CDK.

Regards
Tom
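Both approaches can be compared on the point the SageMaker error actually names: the role needs `s3:GetObject`, plus `s3:ListBucket` for multi-model or uncompressed artifacts. Below is a minimal sketch of a hypothetical helper, `uncovered_actions`, that reports which of those actions a list of granted action patterns leaves uncovered, assuming IAM's case-insensitive matching with `*` and `?` wildcards in action names. It only looks at action names, not at resource ARNs, bucket policies, or other conditions.

```python
from fnmatch import fnmatchcase


def uncovered_actions(required: list[str], granted_patterns: list[str]) -> list[str]:
    """Return the required IAM actions that no granted pattern matches.

    IAM action names are case-insensitive and accept '*' and '?' wildcards,
    so both sides are lowercased before matching with fnmatchcase.
    """
    patterns = [p.lower() for p in granted_patterns]
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Actions named in the CloudFormation error in this thread.
REQUIRED = ["s3:GetObject", "s3:ListBucket"]

# Actions from the TypeScript sample's addToRolePolicy statement.
SAMPLE_ACTIONS = ["s3:GetObject", "s3:GetObject*", "s3:GetBucket*", "s3:List*"]

if __name__ == "__main__":
    print(uncovered_actions(REQUIRED, SAMPLE_ACTIONS))    # []
    print(uncovered_actions(REQUIRED, ["s3:GetObject"]))  # ['s3:ListBucket']
```

If a check like this reports nothing missing and the deployment still fails, the cause is more likely outside the role's action list, which is where the region and bucket-configuration questions later in the thread come in.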

krokoko commented on August 24, 2024

Hi @airmonitor, sorry, I missed the last lines of your previous example; I thought the role was missing permissions to access the model artifacts in S3.
Is the endpoint created in the same region as the bucket?
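One way to answer the region question is to read the bucket's location and compare it with the stack's region. Below is a minimal sketch with hypothetical helpers `split_s3_uri` and `bucket_region`, assuming the location value comes from boto3's `get_bucket_location`, whose `LocationConstraint` is `None` for buckets in us-east-1 and can be the legacy value `EU` for eu-west-1. The AWS call itself is only shown in a comment so the logic runs offline.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def bucket_region(location_constraint: Optional[str]) -> str:
    """Normalize a LocationConstraint value to a region name."""
    if not location_constraint:
        return "us-east-1"  # us-east-1 buckets report no location constraint
    if location_constraint == "EU":
        return "eu-west-1"  # legacy alias for eu-west-1
    return location_constraint


if __name__ == "__main__":
    bucket, key = split_s3_uri("s3://s3-bucket/model.tar.gz")
    # With AWS credentials, the constraint would come from:
    #   boto3.client("s3").get_bucket_location(Bucket=bucket)["LocationConstraint"]
    print(bucket, key, bucket_region(None))  # s3-bucket model.tar.gz us-east-1
```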

airmonitor commented on August 24, 2024

Hello,

Yes, all resources are deployed in us-east-1

github-actions commented on August 24, 2024

This issue is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

krokoko commented on August 24, 2024

I am not able to reproduce the issue. I tested both with the role provided by the construct and with a role created outside the construct and passed through the constructor:

import * as iam from 'aws-cdk-lib/aws-iam';
import * as genai from '@cdklabs/generative-ai-cdk-constructs';

const roleSG = new iam.Role(this, 'Role', {
  assumedBy: new iam.ServicePrincipal('sagemaker.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('AmazonSageMakerFullAccess'),
  ],
});

const endpoint = new genai.CustomSageMakerEndpoint(this, 'testllavaendpoint', {
  // ...
  role: roleSG,
  // ...
});

In both cases, the construct is successfully deployed.

One thing to note: for our samples and for the tests I ran, I use the default bucket created by the SageMaker SDK (see https://sagemaker.readthedocs.io/en/stable/api/utility/session.html, default_bucket) when preparing the model. It might be worth creating the same default session bucket on your end, comparing it with the bucket you pass to the construct, and checking whether any configuration difference stands out.

I will close the issue. If you get more details and/or can upload a sample stack to a public repo that we can use to consistently reproduce the issue, please reopen this ticket. Thank you!
