[tuner] Reduce maintenance burden and prepare for more codegen pipelines #453
Comments
Thanks @kuhar. Could you also add a pointer to where the tuner lives today? The tasks you described are about making the tuner easy to maintain. One more thing we need to think about is how to make it easy to maintain and update the tuner script. Specifically, the questions to answer are
For now just noting some top-of-mind questions.
@MaheshRavishankar The tuner lives under the |
…ources (#19069) This PR aims to address the first task in nod-ai/SHARK-Platform#453: adding an iree-opt pass that removes configuration from executable sources. The corresponding test is also included to ensure its correct functionality. --------- Signed-off-by: Bangtian Liu <[email protected]>
…9124) This PR aims to address the task listed in nod-ai/shark-ai#453: add a utility function (`QueryMMAIntrinsics`) to query supported MMA intrinsics. A new test pass `TestLLVMGPUQueryMMAPass` has been added to validate the correctness of this utility function, along with a corresponding test to ensure reliable functionality. TODO: The function will be exposed to both the C API and Python in a follow-up PR. --------- Signed-off-by: Bangtian Liu <[email protected]>
This is an uber-issue for making the tuner easier to maintain. The current implementation has a few issues that make the tuner library fragile and prone to getting out of sync with the IREE compiler. Specifically, we identified the following issues:
There are two ways to (re-)configure executable sources:
a. By updating the lowering config and translation info in-situ. This is used when producing candidate dispatches using executable benchmarks as the source of truth.
b. By using the transform dialect library script to match root ops and apply compilation info attributes to them. This is used during the model candidate compilation and benchmarking stage.
As a result, we have duplicate logic to apply configurations found by the constraint solver. The fix is to write a pass that strips existing configuration from executable sources, and then use transform dialect to re-configure them. This can be done as a separate invocation of `iree-opt`.
The MLIR processing is mostly string-based. While this allowed us to quickly prototype, it makes the code prone to getting out of sync with the IREE compiler. The lowering configs and translation info attributes are considered compiler internals, and there is no stability guarantee as to the exact structure and format of these attributes. As a result, every time the format changes, we have to update the parsing and printing logic in the tuner to match the new format in the compiler.
Here, the proposed solution is to expose these key attributes (translation info, compilation info, and MFMA intrinsic info) through Python bindings. We already have bindings for the GPU pipeline options, which can be used as a template for future bindings: Reland #18804 iree-org/iree#18840.
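To make the fragility of string-based processing concrete, here is a minimal sketch. Both attribute strings are illustrative spellings written for this example and may not match the compiler's exact output in any given version; `parse_tile_sizes` and `LoweringConfigView` are hypothetical names, not tuner or IREE APIs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Illustrative attribute spellings (assumptions for this sketch, not
# guaranteed to match any particular IREE version).
OLD_SPELLING = "#iree_codegen.lowering_config<tile_sizes = [[64, 128, 0]]>"
NEW_SPELLING = "#iree_gpu.lowering_config<{workgroup = [64, 128, 0]}>"

# String-based approach: a regex tied to one exact textual format.
_TILE_SIZES_RE = re.compile(r"tile_sizes = \[\[([0-9, ]+)\]\]")


def parse_tile_sizes(attr_text: str) -> Optional[List[int]]:
    """Extract tile sizes from attribute text; None if the format is unknown."""
    match = _TILE_SIZES_RE.search(attr_text)
    if match is None:
        return None
    return [int(size) for size in match.group(1).split(",")]


@dataclass(frozen=True)
class LoweringConfigView:
    """Hypothetical stand-in for an attribute object returned by bindings."""

    workgroup_tile_sizes: List[int]


# The regex handles the spelling it was written for...
print(parse_tile_sizes(OLD_SPELLING))
# ...and silently fails once the printed format changes.
print(parse_tile_sizes(NEW_SPELLING))
# Structured access reads a field and does not depend on printed syntax.
print(LoweringConfigView([64, 128, 0]).workgroup_tile_sizes)
```

With bindings, the tuner would read fields from an attribute object along the lines of the stand-in above, so a change in the printed syntax would no longer require a matching change in the tuner.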
Make it easier to identify 'root ops'. We can make the IREE compiler annotate the root linalg ops with a new attribute that the tuner can use to recognize them, without having to duplicate the compiler logic.
The `Configuration` representation is modeled after the requirements of the `LLVMGPUVectorDistribute` pipeline. As a result, the surrounding code makes implicit assumptions about the problem representation. Instead, we should define an interface that allows us to support multiple compilation pipelines, such that the generated SMT constraints are specific to both the pipeline and the dispatch kind. Further, the constraint generation code should be decoupled from the parsing/printing code, such that projects like TKW can use just the constraint generation and benchmarking infra.
Move from two stages of compile-and-benchmark to just one. Two stages made sense for SDXL, where the best isolated dispatch does not necessarily perform best across the whole model, but they may not be necessary or even sufficiently general for other applications. This is related to the `libtuner.TuningClient` class; clients should be able to define their own tuning stages, with libtuner providing the interface to specify the compilation and benchmarking commands.
Tasks
- `iree_gpu` Python bindings (`GPUPipelineOptionsAttr`) iree-org/iree#18804
- `iree_codegen.translation_info` iree-org/iree#19128
- `iree_codegen.compilation_info` iree-org/iree#19129
- `ir.(Integer|Float)Type` for element types #554
- `TileAndFuse` pipeline. @bangtianliu and @Max191
- `candidate_gen.py`. @kuhar
- `TuningCandidate`. Update the existing example to adapt to this change.
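As a rough sketch of the pipeline-specific constraint interface proposed in the issue description, the code below keeps constraint generation separate from any parsing or printing and selects a generator by pipeline and dispatch kind. Everything here is an assumption made for illustration: `ConstraintGenerator`, `MatmulProblem`, `REGISTRY`, and the divisibility constraints are hypothetical, and a brute-force enumeration stands in for the SMT solver.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Candidate = Dict[str, int]
Constraint = Callable[[Candidate], bool]


@dataclass(frozen=True)
class MatmulProblem:
    """Hypothetical problem description for a matmul dispatch."""

    m: int
    n: int
    k: int


class ConstraintGenerator(ABC):
    """Hypothetical interface: one implementation per (pipeline, dispatch kind)."""

    pipeline: str
    dispatch_kind: str

    @abstractmethod
    def domains(self, problem: MatmulProblem) -> Dict[str, range]:
        """Return the search range of each tunable variable."""

    @abstractmethod
    def constraints(self, problem: MatmulProblem) -> List[Constraint]:
        """Return predicates that every valid candidate must satisfy."""


class VectorDistributeMatmul(ConstraintGenerator):
    pipeline = "LLVMGPUVectorDistribute"
    dispatch_kind = "matmul"

    def domains(self, problem: MatmulProblem) -> Dict[str, range]:
        return {
            "tile_m": range(16, problem.m + 1, 16),
            "tile_n": range(16, problem.n + 1, 16),
        }

    def constraints(self, problem: MatmulProblem) -> List[Constraint]:
        # Illustrative constraints only, not the compiler's actual rules.
        return [
            lambda c: problem.m % c["tile_m"] == 0,
            lambda c: problem.n % c["tile_n"] == 0,
        ]


REGISTRY: Dict[Tuple[str, str], ConstraintGenerator] = {}


def register(generator: ConstraintGenerator) -> None:
    REGISTRY[(generator.pipeline, generator.dispatch_kind)] = generator


def enumerate_candidates(
    pipeline: str, dispatch_kind: str, problem: MatmulProblem
) -> Iterator[Candidate]:
    """Brute-force stand-in for the SMT solver: yield all valid candidates."""
    generator = REGISTRY[(pipeline, dispatch_kind)]
    domains = generator.domains(problem)
    names = list(domains)
    predicates = generator.constraints(problem)
    for values in product(*(domains[name] for name in names)):
        candidate = dict(zip(names, values))
        if all(pred(candidate) for pred in predicates):
            yield candidate


register(VectorDistributeMatmul())
candidates = list(
    enumerate_candidates("LLVMGPUVectorDistribute", "matmul", MatmulProblem(64, 32, 128))
)
print(len(candidates), candidates[0])
```

Supporting another pipeline such as `TileAndFuse` would then mean registering another generator, and a project that only needs constraint generation could use this layer without touching any MLIR parsing or printing code.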