Add framework for version converter API #1926

Open · wants to merge 11 commits into main

Conversation

shubhambhokare1 (Contributor)

No description provided.

@github-advanced-security bot left a comment

lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

codecov bot commented Oct 31, 2024

❌ 12 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --------------: | -----: | -----: | ------: |
| 12381           | 12     | 12369  | 1201    |
View the full list of 3 ❄️ flaky tests
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime::test_function_all_input_by_kwargs

Flake rate in main: 39.05% (Passed 7265 times, Failed 4655 times)

Stack Traces | 0.002s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:109: in test_function_all_input_by_kwargs
    self.assertEqual(add_with_alpha(this=1.0, other=2.0), 3.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime::test_function_input_and_attribute_by_kwargs_out_of_order

Flake rate in main: 39.05% (Passed 7265 times, Failed 4655 times)

Stack Traces | 0.003s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:115: in test_function_input_and_attribute_by_kwargs_out_of_order
    self.assertEqual(add_with_alpha(alpha=3.0, other=2.0, this=1.0), 7.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime::test_function_some_input_by_kwargs

Flake rate in main: 39.05% (Passed 7265 times, Failed 4655 times)

Stack Traces | 0.003s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:106: in test_function_some_input_by_kwargs
    self.assertEqual(add_with_alpha(1.0, other=2.0), 3.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').


@gramalingam (Collaborator)

One of the main design questions relates to the "adapter" signature: what form should it take? Essentially, it is a function that takes a single node as a parameter and modifies it in some way. The changes are typically a simple mutation of a node, potentially along with other changes (such as the insertion of extra nodes).

For now, I think it might be fine to follow the pattern used in the optimizer and rewriter, which are based on node-transformers that, given an input node, return a sequence of replacement nodes or None (if no replacement is required). This allows a simple loop over all nodes in the graph that transforms each node in sequence. This can be generalized later if necessary.
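
A minimal sketch of that node-transformer loop, for readers unfamiliar with the optimizer/rewriter pattern; the `NodeTransformer` alias, the registry keying, and the splicing step are illustrative assumptions, not the API in this PR:

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

from onnxscript import ir

# Hypothetical adapter signature: given a node, return the replacement nodes
# to splice in, or None when the node can be kept unchanged.
NodeTransformer = Callable[[ir.Node], Optional[Sequence[ir.Node]]]


def convert_graph(graph: ir.Graph, adapters: Dict[Tuple[str, str], NodeTransformer]) -> None:
    """One pass over the graph, applying the matching adapter to each node."""
    for node in list(graph):  # copy the node list, since adapters may mutate the graph
        adapter = adapters.get((node.domain, node.op_type))
        if adapter is None:
            continue  # nothing to do for this op
        replacement = adapter(node)
        if replacement is not None:
            # Splice `replacement` in where `node` was; a real pass would use the
            # IR's replacement helpers to rewire uses and remove the old node.
            pass
```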

@shubhambhokare1 self-assigned this Nov 6, 2024
@shubhambhokare1 added the enhancement (New feature or request), topic: api, and topic: IR (Intermediate representation) labels Nov 6, 2024
        self.custom_adapters = custom_adapter_list

    def graph_version_convert(self, graph: ir.Graph, target_version: int) -> None:
        if self.model_version == target_version:
Collaborator

(Extension) I think we will soon need to support the case where the incoming model has nodes with different opset versions. At that point, such checks should happen at the node level, not at the model level.

@justinchuby (Collaborator) commented Nov 7, 2024

Questions related to the design doc that come to mind:

  1. How do we ensure down-conversion can be supported in the future?
  2. What invariants do we preserve in the nodes? An opset version may or may not be associated with an ir.Node; how are both cases handled?
  3. How does the design relate to the rewriter? Conceptually, version conversion is a model-rewriting process. How do we plan to maintain a consistent dev/user experience when authoring and debugging subgraph replacement logic?
  4. When version conversion fails, how should it fail: succeed partially, abort, etc.? What are the guarantees/invariants of the model state when conversion is not possible?
  5. Any performance considerations?
  6. What is the path for supporting new opsets?
  7. How is the conversion logic tested so that it is robust? How is it designed so that future maintenance is simple and scalable?

@justinchuby (Collaborator) commented Nov 11, 2024

An op can stay the same for many versions. For example, Acosh-9 is the same all the way to 21. Would it make sense to mark the adapter as supporting the "base to next" version (e.g. 9->22), so that the range of opsets it supports is clear? IMO this is a clearer notation than what the ONNX version converter uses (21->22), which can be confusing because in this case Acosh-21 is implicitly defined by Acosh-9.

We can also identify whether an adapter is usable for a node at a given opset version without having to consult the ONNX defs to know that Acosh-9 is the same as Acosh-21, because from the 9->22 adapter we know the supported range is 9-21. Of course, this assumes we don't skip any opset versions in between. If we want to do that, we can also be more explicit with <supported range> -> <target>, i.e. ((9, 21), 22).
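
To make the ((9, 21), 22) idea concrete, here is a minimal sketch of a registry keyed on an explicit supported range; the class and method names are hypothetical, not the registration API proposed in this PR:

```python
from typing import Callable, Dict, Optional, Tuple

from onnxscript import ir

AdapterFunction = Callable[[ir.Node], None]


class RangeRegistry:
    """Hypothetical registry where each adapter declares the opset range it accepts."""

    def __init__(self) -> None:
        # (op_type, (min_supported, max_supported), target_version) -> adapter
        self._adapters: Dict[Tuple[str, Tuple[int, int], int], AdapterFunction] = {}

    def register(
        self, op_type: str, supported: Tuple[int, int], target: int
    ) -> Callable[[AdapterFunction], AdapterFunction]:
        def decorator(func: AdapterFunction) -> AdapterFunction:
            self._adapters[(op_type, supported, target)] = func
            return func

        return decorator

    def lookup(self, op_type: str, node_version: int, target: int) -> Optional[AdapterFunction]:
        # Any node whose version falls inside the declared range can use the
        # adapter, without consulting the ONNX defs for identical versions.
        for (op, (lo, hi), tgt), adapter in self._adapters.items():
            if op == op_type and lo <= node_version <= hi and tgt == target:
                return adapter
        return None


registry = RangeRegistry()


@registry.register("Acosh", supported=(9, 21), target=22)
def acosh_9_through_21_to_22(node: ir.Node) -> None:
    # Acosh is unchanged from opset 9 through 21, so there is nothing to rewrite;
    # only the node's declared opset version moves forward.
    pass
```
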

@register("DFT", node_version=19, upgrade_version=20)
def dft_19_20(node: ir.Node, op):
    input = node.inputs[0]
    inverse = _get_int_attribute(node, "inverse", 0)
Collaborator

There is a complication here, relating to the handling of functions. In functions, we can have attribute-parameters, which means we don't know the value of the attribute. The version conversion, as written, will be wrong in such cases.

For now, I think it is okay to inline all functions (see relevant comment below) and ignore this issue.

@justinchuby (Collaborator), Nov 11, 2024

Would it be possible to just promote the attribute to an input and use it in the upgraded op?
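
A minimal sketch of that idea for the DFT 19->20 case quoted above, where the axis attribute becomes an input in opset 20; the builder calls (`op.Constant`, `op.DFT`), the default handling, and passing None for the optional dft_length input are assumptions, not verified against this PR:

```python
def dft_19_to_20(node: ir.Node, op):
    x = node.inputs[0]
    dft_length = node.inputs[1] if len(node.inputs) > 1 else None
    # Promote the "axis" attribute of DFT-19 to the axis input of DFT-20.
    # A concrete attribute value can become a Constant; an attribute-parameter
    # (RefAttr) would still need special handling, since its value is only
    # known when the enclosing function is instantiated.
    axis_attr = node.attributes.get("axis")
    axis_value = axis_attr.value if axis_attr is not None else 1  # 1 is the opset-17/19 default
    axis = op.Constant(value_int=axis_value)
    # inverse and onesided remain attributes in opset 20.
    inverse = node.attributes.get("inverse")
    onesided = node.attributes.get("onesided")
    return op.DFT(
        x,
        dft_length,
        axis,
        inverse=inverse.value if inverse is not None else 0,
        onesided=onesided.value if onesided is not None else 0,
    )
```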

    def __init__(self, target_version: int):
        self.target_version = target_version

    def process_node(self, node: ir.Node, opset_version, up_conversion: bool = True):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
    def __init__(self, target_version: int):
        self.target_version = target_version

    def process_node(self, node: ir.Node, opset_version, up_conversion: bool = True):

Check notice

Code scanning / lintrunner

PYLINT/R1710 Note

Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)
See inconsistent-return-statements. To disable, use # pylint: disable=inconsistent-return-statements
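
For context, the fix lintrunner is asking for is mechanical: make every path return explicitly. A simplified sketch (not the PR's actual logic):

```python
def process_node(node, opset_imports, adapt):
    # Before: one path returns a value, the other falls through and implicitly
    # returns None, which is what PYLINT/R1710 flags.
    if node.domain in opset_imports:
        return adapt(node)


def process_node_explicit(node, opset_imports, adapt):
    # After: every path returns explicitly, so the intent is unambiguous.
    if node.domain not in opset_imports:
        return None
    return adapt(node)
```
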
        for graph in attr.value:
            self.visit_graph(graph, opset_version, up_conversion)  # type: ignore[arg-type]

    def visit_node(

Check notice

Code scanning / lintrunner

PYLINT/R1710 Note

Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)
See inconsistent-return-statements. To disable, use # pylint: disable=inconsistent-return-statements
            self.visit_node(node, graph, opset_version, up_conversion)
            node.version = self.target_version

    def visit_model(self, model: ir.Model) -> None:

Check notice

Code scanning / lintrunner

PYLINT/R1710 Note

Either all return statements in a function should return an expression, or none of them should. (inconsistent-return-statements)
See inconsistent-return-statements. To disable, use # pylint: disable=inconsistent-return-statements
@shubhambhokare1 changed the title from "[WIP] Add framework for version converter API" to "Add framework for version converter API" Nov 14, 2024
__all__ = [
    # Functions
    "convert_version",
    "inline",
Collaborator

Suggested change
-     "inline",

inline is not part of the converter

]

from onnxscript import ir
from onnxscript.optimizer._inliner import inline
Collaborator

Import modules only

Suggested change
- from onnxscript.optimizer._inliner import inline
+ from onnxscript.optimizer import _inliner


from onnxscript import ir
from onnxscript.optimizer._inliner import inline
from onnxscript.version_converter.version_converter import version_convert
Collaborator

nit: Prefer calling it convert_version; prefer using a verb as a function name.

import logging
from typing import Callable, Sequence, Union

import onnxscript.ir._convenience as _convenience
@justinchuby (Collaborator), Nov 15, 2024

Suggested change
- import onnxscript.ir._convenience as _convenience
+ import onnxscript.ir.convenience as ir_convenience

conventional naming; import the public module

# A version-adapter function takes a node, a RewriterContext and returns
# a Replacement for the node or None (if no replacement is needed).

ReturnValue = Union[Replacement, Sequence[ir.Value], ir.Value, None]
Collaborator

When would a function return a Replacement over a Value?
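
As a rough illustration of the two return forms that ReturnValue permits; the builder usage and the Replacement construction mirror the process_node snippet quoted later in this review, but are still assumptions:

```python
def adapter_returning_a_value(node: ir.Node, op):
    # Common case: return the new output value; the caller wraps it into a
    # Replacement together with the nodes recorded on the context builder.
    return op.Identity(node.inputs[0])


def adapter_returning_a_replacement(node: ir.Node, op):
    # Explicit case: build the Replacement yourself, e.g. to control exactly
    # which values become the node's outputs and which new nodes are added.
    new_value = op.Identity(node.inputs[0])
    return Replacement([new_value], op.nodes)
```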

@justinchuby (Collaborator), Nov 15, 2024

This module can be private. All public functions are exposed in __init__

        opname: str,
        original_version: int,
        up_conversion: bool = True,
    ) -> Union[AdapterFunction, None]:
Collaborator

Suggested change
- ) -> Union[AdapterFunction, None]:
+ ) -> AdapterFunction | None:

        original_version: int,
        up_conversion: bool = True,
    ) -> Union[AdapterFunction, None]:
        adapter = self.op_adapters.get((domain, opname, original_version, up_conversion), None)
Collaborator

Suggested change
- adapter = self.op_adapters.get((domain, opname, original_version, up_conversion), None)
+ adapter = self.op_adapters.get((domain, opname, original_version, up_conversion))

default is None

Comment on lines +65 to +66
        else:
            return None
Collaborator

Suggested change
-         else:
-             return None
+         return None

        else:
            return None

    def register(
Collaborator

Add a docstring?

    def register(
        self, opname: str, domain: str = "", node_version=None, up_conversion=True
    ) -> Callable[[AdapterFunction], AdapterFunction]:
        def decorator(function: AdapterFunction) -> AdapterFunction:
Collaborator

Decorate with functools.wraps so the typing information is preserved
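
A minimal sketch of that suggestion, assuming the decorator wraps the adapter rather than returning it unchanged (the module-level registry and names here are hypothetical):

```python
import functools
from typing import Callable

_OP_ADAPTERS: dict = {}  # hypothetical module-level registry


def register(
    opname: str, domain: str = "", node_version=None, up_conversion=True
) -> Callable:
    def decorator(function: Callable) -> Callable:
        @functools.wraps(function)  # preserve __name__, __doc__, and annotations
        def wrapped(*args, **kwargs):
            return function(*args, **kwargs)

        _OP_ADAPTERS[(domain, opname, node_version, up_conversion)] = wrapped
        return wrapped

    return decorator
```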

    return None


def _get_output(node: ir.Node, index: int) -> ir.Value | None:
Collaborator

Was this used?

    def __init__(self, target_version: int):
        self.target_version = target_version

    def process_node(self, node: ir.Node, opset_version, up_conversion: bool = True):
@justinchuby (Collaborator), Nov 15, 2024

Suggested change
- def process_node(self, node: ir.Node, opset_version, up_conversion: bool = True):
+ def process_node(self, node: ir.Node, opset_version: int, up_conversion: bool = True) -> Replacement | None:


    def process_node(self, node: ir.Node, opset_version, up_conversion: bool = True):
        if node.domain not in self.opset_imports:
            return None
Collaborator

Create a test to cover this line?

            output = adapter(node, context)
            if output is not None:
                if isinstance(output, Replacement):
                    return output
Collaborator

Create a test to cover this line?

                    output = [output]
                return Replacement(output, context.nodes)

    def replace_node(self, node: ir.Node, replacement, root: ir.Graph | ir.Function):
Collaborator

Suggested change
- def replace_node(self, node: ir.Node, replacement, root: ir.Graph | ir.Function):
+ def replace_node(self, node: ir.Node, replacement, root: ir.Graph | ir.Function) -> None:

    scale = _get_input(node, 1)
    bias = _get_input(node, 2)
    if x is None or scale is None or bias is None:
        return None
Collaborator

Some branches are missing coverage

Labels: enhancement (New feature or request), topic: api, topic: IR (Intermediate representation)

3 participants