tensor.concat of dynamic length tensors always results in slow memcopys #19092
Comments
Well yes. The concats aren't contiguous? If the concat cannot be made contiguous on ingestion, you can try
I thought they are, right? The concat is happening on the outermost dim. I think the issue is that this gets decomposed into a
The choice to decompose
```mlir
func.func @main(%arg0: tensor<4x?x4x64xf32>, %arg1: tensor<4x?x4x64xf32>) -> tensor<8x?x4x64xf32> {
  %1 = tensor.concat dim(0) %arg0, %arg1 : (tensor<4x?x4x64xf32>, tensor<4x?x4x64xf32>) -> tensor<8x?x4x64xf32>
  return %1 : tensor<8x?x4x64xf32>
}
```

Wait, isn't this sample degenerate? We always have to copy because the only thing here is a concat. I think we need some surrounding dispatches to see if this actually always happens.
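For context, the decomposition being discussed turns the `tensor.concat` into a destination tensor plus a chain of `tensor.insert_slice`s. A rough sketch for the example above (illustrative only, not actual compiler output; the dynamic-dim handling here assumes both inputs share the same dynamic size):

```mlir
// Illustrative sketch: tensor.concat dim(0) decomposed into
// insert_slices on a fresh destination. Each slice covers one
// contiguous span of the row-major result.
%c1 = arith.constant 1 : index
%d = tensor.dim %arg0, %c1 : tensor<4x?x4x64xf32>
%empty = tensor.empty(%d) : tensor<8x?x4x64xf32>
%s0 = tensor.insert_slice %arg0 into %empty[0, 0, 0, 0] [4, %d, 4, 64] [1, 1, 1, 1]
    : tensor<4x?x4x64xf32> into tensor<8x?x4x64xf32>
%s1 = tensor.insert_slice %arg1 into %s0[4, 0, 0, 0] [4, %d, 4, 64] [1, 1, 1, 1]
    : tensor<4x?x4x64xf32> into tensor<8x?x4x64xf32>
```

Because the concat is on the outermost dim, each `insert_slice` writes one contiguous range of the output, which is what makes a fast bulk copy possible in principle.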
@qedawkins I think the problem is that it gets lowered in iree/compiler/src/iree/compiler/Dialect/Flow/Conversion/TensorToFlow/Utils.cpp (lines 111 to 120 at 915b06b): ValueBoundsOpInterface doesn't work with hal buffer dim ops.
Sorry, I missed that it is contiguous. @IanWood1 let's look at this deeper today in our 1:1.
Again, this might be a degenerate case.
I haven't looked at this in a while; I'll need some context. Even for this degenerate case we should be able to avoid generating slow memcpys, even if it means we keep the
OK, for now I am taking this from Ian. I'll start with trying to push
…pdate`. (#19126) These are in preparation to delay the decomposition of `tensor.concat` into `tensor.insert_slice`s. This patch just adds the patterns to lower a `tensor.concat` along the outer dimension to `flow.tensor.update`. Future changes will delay the decomposition of `tensor.concat` to allow non-outer-dimension concatenation to be converted into `tensor.insert_slice`s before dispatch formation, with the `tensor.insert_slice` fused into its producers. Towards #19092 --------- Signed-off-by: MaheshRavishankar <[email protected]>
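As a rough sketch of what that lowering aims for: the outermost-dim concat becomes whole-slice updates into a destination, each expressible as one contiguous copy. The op names here are real, but the exact assembly syntax is approximated from memory and may not parse as written:

```mlir
// Illustrative only (syntax approximated): outer-dim concat as two
// flow.tensor.update ops, each a single contiguous copy into %dest.
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%d = tensor.dim %arg0, %c1 : tensor<4x?x4x64xf32>
%dest = tensor.empty(%d) : tensor<8x?x4x64xf32>
%t0 = flow.tensor.update %arg0, %dest[%c0, %c0, %c0, %c0]
    : tensor<4x?x4x64xf32>{%d} -> %dest as tensor<8x?x4x64xf32>{%d}
%t1 = flow.tensor.update %arg1, %t0[%c4, %c0, %c0, %c0]
    : tensor<4x?x4x64xf32>{%d} -> %t0 as tensor<8x?x4x64xf32>{%d}
```

The design point is that `flow.tensor.update` carries the "this is a contiguous overwrite" semantics directly, so the runtime can issue a fast copy instead of a slow elementwise dispatch.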
The tensor.concat block shown earlier always generates a slow memory copy for performing the concatenation.