Thanks for the awesome work!
Currently I've been struggling with an issue while working with Speedster, which I'll lay out below:
1. I've been able to optimize an ONNX model (from HuggingFace, based on Donut: https://github.com/clovaai/donut)
code used:
import numpy as np
import torch

from speedster import optimize_model, save_model

# Provide input data for the model: 100 samples of
# ((int64 tensor, float32 tensor), label) placeholder inputs
input_data = [
    (
        (
            np.array(torch.randn(5, 3), dtype=np.int64),
            np.array(torch.randn(5, 3, 1024), dtype=np.float32),
        ),
        torch.tensor([0, 1, 0, 1, 1]),
    )
    for _ in range(100)
]
# Run Speedster optimization
optimized_model = optimize_model(
    "./models/onnx/decoder_model.onnx",
    input_data=input_data,
    optimization_time="unconstrained",
    device="gpu:0",
    metric_drop_ths=0.8,
)
save_model(optimized_model, "./models/speedster")
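As an aside, the torch.randn-to-np.array round-trip above isn't strictly needed; here is a minimal equivalent that builds the calibration data directly with NumPy (the (5, 3) ids / (5, 3, 1024) hidden-state shapes are just carried over from my script, not confirmed against the model's signature):

import numpy as np
import torch

rng = np.random.default_rng(0)
input_data = [
    (
        (
            rng.integers(0, 100, size=(5, 3), dtype=np.int64),    # int64 ids
            rng.standard_normal((5, 3, 1024), dtype=np.float32),  # float32 states
        ),
        torch.tensor([0, 1, 0, 1, 1]),
    )
    for _ in range(100)
]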
output:
2023-07-19 14:22:43 | INFO | Running Speedster on GPU:0
2023-07-19 14:25:33 | INFO | Benchmark performance of original model
2023-07-19 14:26:10 | INFO | Original model latency: 0.023933820724487305 sec/iter
2023-07-19 14:26:11 | INFO | [1/1] Running ONNX Optimization Pipeline
2023-07-19 14:26:11 | INFO | Optimizing with ONNXCompiler and q_type: None.
2023-07-19 14:26:14 | WARNING | TensorrtExecutionProvider for onnx is not available. If you want to use it, please add the path to tensorrt to the LD_LIBRARY_PATH environment variable. CUDA provider will be used instead.
2023-07-19 14:26:16 | INFO | Optimized model latency: 0.02505326271057129 sec/iter
2023-07-19 14:26:16 | INFO | Optimizing with ONNXCompiler and q_type: QuantizationType.HALF.
2023-07-19 14:26:44 | INFO | Optimized model latency: 0.3438906669616699 sec/iter
2023-07-19 14:26:44 | INFO | Optimizing with ONNXTensorRTCompiler and q_type: None.
2023-07-19 14:28:18 | INFO | Optimized model latency: 0.004456996917724609 sec/iter
2023-07-19 14:28:18 | INFO | Optimizing with ONNXTensorRTCompiler and q_type: QuantizationType.HALF.
2023-07-19 14:28:51 | INFO | Optimized model latency: 0.003861665725708008 sec/iter
2023-07-19 14:28:51 | INFO | Optimizing with ONNXTensorRTCompiler and q_type: QuantizationType.STATIC.
2023-07-19 14:33:56 | INFO | Optimized model latency: 0.004480838775634766 sec/iter
[Speedster results on Tesla V100-SXM2-16GB]
Metric Original Model Optimized Model Improvement
----------- ---------------- ----------------- -------------
backend NUMPY TensorRT
latency 0.0239 sec/batch 0.0039 sec/batch 6.20x
throughput 208.91 data/sec 1294.78 data/sec 6.20x
model size 743.98 MB 254.43 MB -65%
metric drop 0.5291
techniques fp16
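Two things stand out in this output: the TensorrtExecutionProvider warning (ONNX Runtime fell back to the CUDA provider), and the final metric drop of 0.5291, which the loose metric_drop_ths=0.8 allowed through together with fp16. As a quick sanity check (just a diagnostic sketch, not part of my script), the providers visible to ONNX Runtime can be listed with:

import onnxruntime as ort

# "TensorrtExecutionProvider" should appear here once the TensorRT
# libraries are on LD_LIBRARY_PATH
print(ort.get_available_providers())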
I'm just hitting a wall when trying to perform inference, though. code used:
import torch
from speedster import load_model
from nebullvm.tools.benchmark import benchmark

optimized_model = load_model("../opt/models/speedster/")
print('speedster onnx model loaded')

device = "cuda" if torch.cuda.is_available() else "cpu"
dummy_input = torch.randn(1, 3, 300, 400, dtype=torch.float).to(device)
print(type(dummy_input))

# Use the accelerated version of your ONNX model in production
output = optimized_model(dummy_input)
print(output)
observation:
2023-07-19 14:35:43 | WARNING | Debug: Got extra keywords in NvidiaInferenceLearner::from_engine_path: {'class_name': 'NumpyONNXTensorRTInferenceLearner', 'module_name': 'nebullvm.operations.inference_learners.tensor_rt'}
speedster onnx model loaded
<class 'torch.Tensor'>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-ea33d0034b2d> in <cell line: 20>()
18
19 # Use the accelerated version of your ONNX model in production
---> 20 output = optimized_model(dummy_input)
21 print(output)
5 frames
/usr/local/lib/python3.10/dist-packages/polygraphy/cuda/cuda.py in dtype(self, new)
296 def dtype(self, new):
297 self._dtype = new
--> 298 self.itemsize = np.dtype(new).itemsize
299
300 @property
TypeError: Cannot interpret 'torch.float32' as a data type
So my question is: what input types am I supposed to pass to the optimized_model() call here? Previously, I had been passing the following to the original model to get it working.
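From the traceback it looks like polygraphy calls np.dtype() on the input's dtype, so the NumpyONNXTensorRTInferenceLearner presumably expects NumPy arrays with the same dtypes and shapes used during optimization, rather than torch tensors. A minimal sketch of what I would try (shapes carried over from my calibration data; this is an assumption on my part, not a confirmed API):

import numpy as np

# NumPy inputs matching the optimization-time signature
ids = np.zeros((5, 3), dtype=np.int64)
hidden = np.zeros((5, 3, 1024), dtype=np.float32)
output = optimized_model(ids, hidden)

# or, starting from torch tensors:
# output = optimized_model(t1.cpu().numpy(), t2.cpu().numpy())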
@mfumanelli sorry for the interruption, but I was hoping you could point me in the right direction; I've been struggling with this issue while trying to optimize an ONNX model via Speedster, and I might be doing something wrong here.
I already have a script that replicates the issue in my Google Colab account if you want to take a look. Thanks.
Please let me know if you need any additional information. Thanks.