- For deploying an encoder-decoder model such as T5/BART with Triton TensorRT-LLM, see enc_dec_sagemaker.ipynb
- For deploying a decoder-only model such as Mistral-7B v0.2 with Triton TensorRT-LLM, see mistral_sagemaker.ipynb. Deployment with both FP16 and INT4 AWQ quantization is shown
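
Once one of the notebooks above has deployed an endpoint, it can be invoked through the SageMaker runtime API. The sketch below is a minimal illustration, assuming a hypothetical endpoint name (`triton-trtllm-endpoint`) and illustrative Triton input tensor names (`text_input`, `max_tokens`); the actual names are defined by the model repository built in the notebooks.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Triton on SageMaker accepts KServe v2-style JSON inference requests.
# The input tensor names below are illustrative; use the names defined
# by the model repository built in the example notebooks.
payload = {
    "inputs": [
        {"name": "text_input", "shape": [1, 1], "datatype": "BYTES",
         "data": ["Summarize: TensorRT-LLM accelerates LLM inference."]},
        {"name": "max_tokens", "shape": [1, 1], "datatype": "INT32",
         "data": [64]},
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="triton-trtllm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```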
- Log in to AWS and navigate to the Amazon SageMaker service
- Configure a SageMaker notebook using instance type `g5.xlarge`. Other instance types such as g6, g6e, p4d, p4de, p5, and p5e are also supported.
- Configure the instance with enough storage to accommodate container image pull(s) and model weights; `100GB` should be adequate
- Ensure the IAM role `AmazonSageMakerServiceCatalogProductsUseRole` is associated with your notebook. Note that you may need to attach additional permissions to this role to permit ECR `CreateRepository` and image push operations; see the policy sketch after this list.
- Configure the Default repository to reference this repo: https://github.com/aws-samples/awsome-inference.git
- Click Create notebook instance (a boto3 equivalent of these console steps is sketched below)
- Within the notebook instance, navigate to `2.projects` and the example notebooks under the `triton-trtllm-sagemaker` project.
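
If the role lacks the ECR permissions mentioned above, an inline policy along the following lines can be attached. This is a sketch: the policy name is arbitrary, and `Resource` can be scoped down to a specific repository ARN rather than `*`.

```python
import json

import boto3

iam = boto3.client("iam")

# Inline policy granting the ECR actions needed to create a repository
# and push container images. Scope "Resource" down where possible
# (ecr:GetAuthorizationToken always requires "*").
ecr_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:CreateRepository",
                "ecr:BatchCheckLayerAvailability",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:PutImage",
            ],
            "Resource": "*",
        }
    ],
}

iam.put_role_policy(
    RoleName="AmazonSageMakerServiceCatalogProductsUseRole",
    PolicyName="ecr-push-access",  # arbitrary policy name
    PolicyDocument=json.dumps(ecr_policy),
)
```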
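
The console steps above can also be performed programmatically. A minimal boto3 sketch, assuming a notebook instance name of your choosing and a placeholder account ID in the role ARN:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Programmatic equivalent of the console walkthrough above. The
# notebook instance name and account ID are placeholders.
sagemaker.create_notebook_instance(
    NotebookInstanceName="triton-trtllm-notebook",
    InstanceType="ml.g5.xlarge",  # g6/g6e/p4d/p4de/p5/p5e also supported
    VolumeSizeInGB=100,           # room for container images and model weights
    RoleArn=("arn:aws:iam::111122223333:role/"
             "AmazonSageMakerServiceCatalogProductsUseRole"),
    DefaultCodeRepository="https://github.com/aws-samples/awsome-inference.git",
)
```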