[Quant tool] Handle input models with pre-quantized weights #22633

adrianlizarraga · 2024-10-28T23:59:28Z

Description

Allows the QDQ quantizer to handle input models that already contain some pre-quantized weights. In this case, the QDQ quantizer skips the pre-quantized weights instead of quantizing them again.

Also handles an operator (e.g., Conv) with a pre-quantized weight and a float bias. The tool will read the pre-quantized weight's quantization scale to compute the bias's scale (bias_scale = input_scale * weight_scale).
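The bias-scale rule above can be illustrated with a minimal sketch. This is not the quantization tool's code: the function names `compute_bias_scales` and `quantize_bias` are hypothetical, the numeric values are made up, and the sketch assumes the common convention of a per-channel weight scale with an int32 bias whose zero-point is 0.

```python
# Illustrative sketch only; names and values are assumptions, not the tool's API.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def compute_bias_scales(input_scale, weight_scales):
    """bias_scale = input_scale * weight_scale, one entry per output channel."""
    return [input_scale * ws for ws in weight_scales]


def quantize_bias(bias, bias_scales):
    """Quantize a float bias to int32 (zero-point assumed to be 0)."""
    quantized = []
    for b, s in zip(bias, bias_scales):
        q = round(b / s)
        # Clamp to the int32 range.
        quantized.append(max(INT32_MIN, min(INT32_MAX, q)))
    return quantized


# Scale of the Conv input activation (for example, from calibration).
input_scale = 0.5
# Per-channel scales read from the pre-quantized weight.
weight_scales = [0.25, 0.125]
# Float bias, one value per output channel.
bias = [1.0, -0.5]

bias_scales = compute_bias_scales(input_scale, weight_scales)
q_bias = quantize_bias(bias, bias_scales)
# Dequantizing recovers the float bias when the values are exactly representable.
dequantized = [q * s for q, s in zip(q_bias, bias_scales)]
```

The point of reusing the pre-quantized weight's scale is that the bias scale is fully determined by the input and weight scales, so the weight does not need to be re-quantized to obtain it.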

Input model (pre-quantized Conv weight):

Output QDQ model (everything is quantized):

Motivation and Context

Customers may use external tools to quantize some weights (e.g., int4 for Conv/MatMul). In this case, the QDQ quantizer should still be able to quantize the rest of the model (float weights and activations).

adrianlizarraga · 2024-10-29T00:24:34Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

adrianlizarraga · 2024-10-29T03:34:39Z

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

adrianlizarraga added 4 commits October 28, 2024 16:51

First working version that handles pre-quantized weights

e104ef4

Remove extra options; just handle pre-quantized weights if they exist

7af7f3a

Undo import changes to run unittests locally

33b0266

Use temp files to store unittest models

b653ccc

adrianlizarraga marked this pull request as ready for review October 29, 2024 00:34

Merge branch 'main' into adrianl/quant-tool-handle-prequantized-weights

5d65b4c

sophies927 added release:1.20.1 triage:approved Approved for cherrypicks for release release:1.20.0 and removed triage:approved Approved for cherrypicks for release release:1.20.1 labels Nov 5, 2024

adrianlizarraga added 8 commits November 12, 2024 21:09

Merge main and fix conflicts

29b839b

Check for pre-quantized tensor instead of pre-quantized initializer specifically.

e87b53f

Update unit tests for pre-quantized weights

eb8981e

Merge branch 'main' into adrianl/quant-tool-handle-prequantized-weights

7adcbd7

Check that the scale/zero-point do not change for pre-quantized weights after quantization

0f0a3e0

import annotations in onnx_model.py

b70729d

Simplify and reduce diff

5096e7e

Merge branch 'main' into adrianl/quant-tool-handle-prequantized-weights

3df2ddb

adrianlizarraga requested review from baijumeswani, fajin-corp and yihonglyu November 14, 2024 19:17

baijumeswani approved these changes Nov 14, 2024

adrianlizarraga merged commit 0733733 into main Nov 14, 2024
128 checks passed

adrianlizarraga deleted the adrianl/quant-tool-handle-prequantized-weights branch November 14, 2024 21:48
