[feat] Add support for nonlinear operations #27

HobbitQia · 2024-10-25T03:18:22Z

I'm so excited to share my updates on the mapper with you. The main changes are listed below:

In DFG.cpp, I added a function called nonlinear_combine(), which fuses the common patterns that occur in nonlinear operations.
For special functions (e.g., LUT, FP2FX), I recognize them in the DFG by the names of the function calls, after demangling those names. Take LUT as an example: in the C++ kernel code, we should define LUT as a function like:
```
__attribute__((noinline)) DATA_TYPE lut(DATA_TYPE x) { ... }
```
Then our mapper traverses all function calls and finds special functions by name, e.g. demangle(newName) == "lut(float)". We determine whether a DFG node contains a special function by calling DFGNode::getOpcodeName() and comparing the operation name with the names of the predefined special functions. Details can be seen in DFG.cpp:408-432 and DFGNode::isLut().
For CGRA mapping, I also added special functions so that we can choose whether or not to equip a tile with LUT/FP2FX. We can specify which CGRA nodes (tiles) support them through additionalFunc in param.json. Here's an example:
```
"additionalFunc"        : {    "lut" : [1, 2, ...], ...    }
```
For vectorization, I leverage the original mapper to mark the vectorized operations through DFGNode::isVectorized. However, division cannot be vectorized, since it is hard to support an efficient vector divider from the hardware perspective. Thus I chose to split a division node into multiple nodes and reconnect the predecessors and successors in DFG::tuneDivPattern(). Notably, different vectorization factors lead to different numbers of nodes (e.g. if VF=4 we should split a division into 4 nodes), so I added a parameter in param.json so that we can specify the VF (default 1).
Support for fine-grained fusion. When combining different patterns into a single node, I added a user-specified parameter that determines the "class" of the pattern (a "class" can contain multiple fused patterns). Different tiles can support different "classes" of fused patterns, which are also specified in param.json. Here's an example:
```
"additionalFunc"        : {     "complex1" : [4,5,6,7],    "complex2" : [8,9,10,11],    "complex3" : [0,1,2,3], ...   }
```
In this configuration there are three classes of fused patterns, and each class is supported by a set of tiles.
Above are my main changes, along with some bug fixes. (sry that I cannot remember everything...)


tancheng

Really appreciate the PR! Can you also provide a param.json to enable the functionality provided by your PR as an example, and include in the action (i.e., github testing automation)?

src/CGRA.cpp

tancheng · 2024-10-26T05:56:41Z

src/CGRA.cpp

-    }
+    // for (int r=0; r<t_rows; ++r) {
+    //   for (int c=0; c<t_columns; ++c) {
+    //     nodes[r][c]->enableCall();


This change is a bug fix, right? So from now on, call is only supported if the user specifies it in param.json, or else the user needs to modify this CGRA.cpp file?

And is this call actually how the lut is recognized? i.e., instead of supporting call in general, the user provides the lut func that is actually called in the IR.

Yes, call is only supported if the user specifies it. For lut, it should be written as call-lut in param.json after I refactored the code.

tancheng · 2024-10-26T05:57:23Z

src/CGRA.cpp

+      // for (int r=0; r<t_rows; ++r) {
+      //   for (int c=0; c<t_columns; ++c) {
+      //     // if(c == 0 || (r%2==1 and c%2 == 1))
+      //       nodes[r][c]->enableComplex();
+      //   }
+      // }


Means heterogeneity in the param.json won't take effect any more?

No. Here I mean complex operations should be manually specified in param.json rather than enabled by default.

src/CGRANode.cpp

src/DFGNode.cpp

src/DFGNode.h

tancheng · 2024-10-26T06:54:36Z

Plz also resolve the conflict. Thanks!

tancheng · 2024-11-12T18:42:49Z

Thanks @HobbitQia, plz also put response for each of my comments and tag them as fixed/solved (if there is such tag). Thanks a lot!

tancheng · 2024-11-13T06:19:10Z

Can we include at least one .cpp that leverages your nonlinear_param.json for testing the new features?

HobbitQia · 2024-11-13T08:30:27Z

I included a nonlinear_test.cpp to test the new features. Later I will explain the structure of param.json and show some examples.

For previous comments that I have solved (e.g. issues about comments, code that should be deleted), I marked them as resolved. For other comments that I think we should discuss, I responded to them and didn't mark them.

tancheng · 2024-11-13T17:09:42Z

> I included a nonlinear_test.cpp to test the new features. Later I will explain the structure of param.json and show some examples.
>
> For previous comments that I have solved (e.g. issues about comments, code that should be deleted), I marked them as resolved. For other comments that I think we should discuss, I responded to them and didn't mark them.

Thanks a lot Jiajun~! Let me know when you wanna set up meeting for discussion~

HobbitQia · 2024-11-16T03:11:40Z

Glad to share my improvement in detail.

param.json

The main change to param.json is the parameter additionalFunc. If we want to enable a special function call in the CGRA, we can write call-<function name>: [tile numbers] in additionalFunc. Then the corresponding tiles will be able to execute this function. Similarly, for complex operations (i.e. fused operations like phi-add-add), we can write complex-<function name>: [tile numbers] in additionalFunc. The corresponding tiles will be able to perform this complex operation.

For compatibility with previous code, we can also write complex:[tile numbers] (i.e. no specific function name). Then all complex operations like phi-add, mul-add... are regarded as the same kind, which I call general fusion as opposed to fine-grained fusion.

Take test/nonlinear_param.json as an example. In the code below, there is one special function call, fp2fx, enabled in tiles 4,8,7,11, and two complex operations: BrT, enabled in tiles 4,5,6,7, and CoT, enabled in tiles 8,9,10,11.
```
"additionalFunc"        : {
                            "call-fp2fx" : [4,8,7,11],
                            "load" : [0,1,2,3],
                            "store": [0,1,2,3],
                            "complex-BrT" : [4,5,6,7],
                            "complex-CoT" : [8,9,10,11],
                            "div" : [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
                          }
```
It's worth noting that param.json only configures the tiles; the kernel code is not affected by this file. So whether a special function call or complex operation actually ends up on a tile also depends on the fusion process during DFG manipulation in the mapper, which I will illustrate in the next section.

Fusion in the mapper

Currently, I still fuse operations manually, which means we need to change the code of DFG.cpp and add the fusion patterns in C++. When we do fusion, we need to pass a name for the new combined pattern, and the name must be consistent with param.json. Take nonlinear_combine() in DFG.cpp:42-53 as an example. In the code below, there are 7 fused patterns, and I classify them into 2 categories, BrT and CoT, which are supported by tiles 4,5,6,7 and 8,9,10,11 respectively, as configured in param.json.
```
combineMulAdd("CoT");
combinePhiAdd("BrT");
combine("fcmp", "select", "BrT");
combine("icmp", "select", "BrT");
combine("icmp", "br", "CoT");
combine("fcmp", "br", "CoT");
combineAddAdd("BrT");
```
Similarly, for compatibility, we can pass the empty string as the parameter when calling combine(), which means this pattern is combined under general fusion.

The special function call

The special function call is a little different from the complex operation. Its name is determined by the function name in the kernel code. Take fp2fx as an example. The code below will be regarded as a special function call, and the mapper gets its function name through the method demangle. To support this call, there must be a call-fp2fx entry in param.json.
```
__attribute__((noinline)) float fp2fx(float x) {
    return x + 1.0;    
}
...
float x = fp2fx(1.0);
...
```

Example of tuning division pattern

The left is a snapshot of the original DFG and the right is the new one under a vectorization factor of 4. We can see that the division is split into 4 nodes.

Example of nonlinear_test.cpp

The DFG is shown below, and we can see the fp2fx and faddmuladd (i.e. CoT)

HobbitQia · 2024-11-16T04:07:55Z

One more point:
I want to discuss whether we should provide an interface for users to specify the fused patterns so that they don't need to change the code of the mapper. I remember there are similar functions in the mlir-cgra project, so I am not sure whether it's necessary to have a similar interface in the mapper.

tancheng · 2024-11-16T05:06:49Z

> when calling combine() we can pass the empty string as the parameter, which means this pattern is combined in the general fusion.

You mean empty string for type, right? like combine("fcmp", "select", ""); And we currently don't have such use-case, right?

Can you please help to include the vectorFactorForIdiv into the nonlinear_test.cpp as nonlinear_div_test.cpp, so all the three features are tested.
And include this test into the testing flow:

CGRA-Mapper/.github/workflows/cmake.yml

Line 37 in e76bf4e

run: make

I really appreciate the contribution!

User interface for custom pattern.

Sure, but this could be our future work in another PR when you have bandwidth.

@cwz920716 Just FYI :-) Jiajun is one of the best students we have worked with :-)

HobbitQia · 2024-11-16T06:32:35Z

> You mean empty string for type, right? like combine("fcmp", "select", ""); And we currently don't have such use-case, right?

Yes, and actually there are many cases in DFG.cpp written in the past (since the default value of type is the empty string).

I tried to test vectorFactorForIdiv in nonlinear_test.cpp; however, I found it hard to test all three features in a single file. To test function calls, we need to call non-inline special functions, which are regarded as non-vectorizable, so the LLVM auto-vectorization pass cannot vectorize the whole loop. I didn't find a way to solve it. Do you have any ideas?

tancheng · 2024-11-16T06:39:17Z

Then let's have nonlinear_test.cc and div_test.cc?

.github/workflows/cmake.yml

HobbitQia · 2024-11-16T07:01:59Z

There was a problem with the test... However, I can run the test in my local environment. I am trying to figure out the reason.

I think I found the problem! It's because param.json is not updated and is not compatible with the current workflow. @tancheng Can I replace the original param.json with nonlinear_param.json? Or do u have a more elegant way to handle it?

tancheng · 2024-11-16T07:08:41Z

Right, the file name is hardcoded in the .cpp...
I think you can create another two folders in the test folder, and provide corresponding param.json.

tancheng · 2024-11-16T07:14:32Z

.github/workflows/cmake.yml


    - name: Test Idiv Feature
-      working-directory: ${{github.workspace}}/test
-      run: clang-12 -emit-llvm -O3 -fno-unroll-loops -fno-vectorize -o idiv_test.bc -c idiv_test.cpp && opt-12 -load ../build/src/libmapperPass.so -mapperPass idiv_test.bc
+      working-directory: ${{github.workspace}}/test/nonlinear_test


HobbitQia · 2024-11-16T07:15:01Z

A little strange...let me check it carefully

tancheng · 2024-11-16T07:16:56Z

> A little strange...let me check it carefully

No worry~ Thanks!

tancheng · 2024-11-16T18:09:22Z

Based on the error msg, seems failed at isVectorized().

HobbitQia · 2024-11-17T11:28:09Z

> Based on the error msg, seems failed at isVectorized().

I found something strange... We used raw_string_ostream() in DFGNode::isVectorized() and raw_fd_ostream() in DFG::generateDot. I deleted them and the workflow runs correctly. However, I don't know why they interrupt our program, since everything is OK when I use these functions in my local environment...

tancheng · 2024-11-17T17:59:42Z

> Based on the error msg, seems failed at isVectorized().
>
> I found something strange... We used raw_string_ostream() in DFGNode::isVectorized() and raw_fd_ostream() in DFG::generateDot. I deleted them and the workflow runs correctly. However, I don't know why they interrupt our program, since everything is OK when I use these functions in my local environment...

Seems some library missing: jupyter-xeus/xeus-cling#234 (comment)?

Or you can try to use some C++ standard string write/read/stream functions to replace the LLVM raw_xx_ostream()?

HobbitQia and others added 13 commits July 17, 2024 19:12

[feat] add support for look-up-table

ede67e7

merge

3d060fd

[fix] change initialization for call and lut, then we can enable it on specific tiles.

32a9671

[update] add special functional support for div

219d1d8

[update] add special functional support for quantization

b7f2926

[fix] combine getelemtnptr + load/store in DFG

e26df4f

[update] add support for int

7fb0f61

[feat] add support for fine-grained fusion

7fbd95f

[update] add support for new special funtionalities in dfg

be01199

[update] Add more detailed comments

c7df862

[feat] split non-vectorized operations into seperate nodes rather than a single node. e.g. division

f405e5e

[fix] only enable vectorization in integer format

42df212

[feat] parameterizable vectorization factor for tuning division patterns

73df393

tancheng reviewed Oct 26, 2024


HobbitQia added 2 commits November 12, 2024 16:03

[feat] refactor codes for fine-grained fusion and special functions

e9d6c61

[fix] resolve conflicts

67ffd08

[update] fix bugs & add a reference param.json

050f675

[update] add nonlinear_test.cpp

daceefd

[update] rename vectorFactor to vectorFactorForIdiv

88840d2

[update] add idiv_test and enable github testing flow

f0967de

[fix] correct path in cmake.yml

553c187

tancheng reviewed Nov 16, 2024


.github/workflows/cmake.yml

[fix] adjust file struct of test

514b909

tancheng reviewed Nov 16, 2024


HobbitQia closed this Nov 17, 2024

HobbitQia reopened this Nov 17, 2024
