- For deploying an encoder-decoder model such as T5/BART with Triton TensorRT-LLM, see enc_dec_sagemaker.ipynb
- For deploying a decoder-only model such as Mistral-7B v0.2 with Triton TensorRT-LLM, see mistral_sagemaker.ipynb. Deployment with both FP16 and INT4 AWQ quantization is shown
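
Once one of the notebooks above has deployed an endpoint, it can be invoked through the SageMaker runtime API. The sketch below is a minimal illustration, assuming a hypothetical endpoint name (`triton-trtllm-endpoint`) and illustrative Triton input tensor names (`text_input`, `max_tokens`); the actual names are defined by the model repository built in the notebooks.

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Triton on SageMaker accepts KServe v2-style JSON inference requests.
# The input tensor names below are illustrative; use the names defined
# by the model repository built in the example notebooks.
payload = {
    "inputs": [
        {"name": "text_input", "shape": [1, 1], "datatype": "BYTES",
         "data": ["Summarize: TensorRT-LLM accelerates LLM inference."]},
        {"name": "max_tokens", "shape": [1, 1], "datatype": "INT32",
         "data": [64]},
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="triton-trtllm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```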
- Log in to AWS and navigate to the Amazon SageMaker service
- Configure a SageMaker notebook using instance type `g5.xlarge`. Other instance types such as g6, g6e, p4d, p4de, p5, and p5e are also supported.
- Configure the instance with enough storage to accommodate container image pull(s) and model weights; `100GB` should be adequate
- Ensure the IAM role `AmazonSageMakerServiceCatalogProductsUseRole` is associated with your notebook. Note that you may need to attach additional permissions to this role to permit ECR `CreateRepository` and image push operations; see the policy sketch after this list.
- Configure the Default repository to reference this repo: https://github.com/aws-samples/awsome-inference.git
- Click Create notebook instance (a boto3 equivalent of these console steps is sketched below)
- Within the notebook instance, navigate to `2.projects` and the example notebooks under the `triton-trtllm-sagemaker` project.
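
If the role lacks the ECR permissions mentioned above, an inline policy along the following lines can be attached. This is a sketch: the policy name is arbitrary, and `Resource` can be scoped down to a specific repository ARN rather than `*`.

```python
import json

import boto3

iam = boto3.client("iam")

# Inline policy granting the ECR actions needed to create a repository
# and push container images. Scope "Resource" down where possible
# (ecr:GetAuthorizationToken always requires "*").
ecr_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:CreateRepository",
                "ecr:BatchCheckLayerAvailability",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:PutImage",
            ],
            "Resource": "*",
        }
    ],
}

iam.put_role_policy(
    RoleName="AmazonSageMakerServiceCatalogProductsUseRole",
    PolicyName="ecr-push-access",  # arbitrary policy name
    PolicyDocument=json.dumps(ecr_policy),
)
```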
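
The console steps above can also be performed programmatically. A minimal boto3 sketch, assuming a notebook instance name of your choosing and a placeholder account ID in the role ARN:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Programmatic equivalent of the console walkthrough above. The
# notebook instance name and account ID are placeholders.
sagemaker.create_notebook_instance(
    NotebookInstanceName="triton-trtllm-notebook",
    InstanceType="ml.g5.xlarge",  # g6/g6e/p4d/p4de/p5/p5e also supported
    VolumeSizeInGB=100,           # room for container images and model weights
    RoleArn=("arn:aws:iam::111122223333:role/"
             "AmazonSageMakerServiceCatalogProductsUseRole"),
    DefaultCodeRepository="https://github.com/aws-samples/awsome-inference.git",
)
```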