Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Add support for nonlinear operations #27

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

HobbitQia
Copy link
Contributor

I'm so excited to share my updates on the mapper with you. The main changes are listed below:

  1. In DFG.cpp, I added a function called nonlinear_combine(), which will fuse the common patterns occurring in nonlinear operations.
  2. For special functions (e.g., LUT, FP2FX), I recognize them in the DFG through the names of function calls, then demangle their names. Take LUT as an example: In C++ kernel codes, we should define LUT as a function like:
    __attribute__((noinline)) DATA_TYPE lut(DATA_TYPE x) { ... }
    Then our mapper will traverse all functional calls, find special functions by the name like demangle(newName) == "lut(float)". We determine a DFG node which contains a special function through DFGNode::getOpcodeName() and compare the operation name with names of predefined special functions. Details can be seen in DFG.cpp:408-432 and DFGNode::isLut(). l For CGRA mapping I also added special functions so that we can choose to configure a tile to equip with LUT/FP2FX or not. We can specify CGRA's nodes through additionalFunc in param.json. Here's an example:
    "additionalFunc"        : {    "lut" : [1, 2, ...], ...    }
  3. For vectorization, I leverage the original mapper to mark the vectorized operations through DFGNode::isVectorized. However for divison we cannot vectorize it since it's hard to support efficient vector divisor from the hardware perspective. Thus I chose to split a divison nodes into multiple nodes and reconnect the precursors and successors in DFG::tuneDivPattern(). Notably, different vectorization factors will lead to different number of node (e.g. if VF=4 we should split a divison into 4 nodes) thus I added a parameter in the params.json so that we can specify the VF (default to 1).
  4. Support for fine-grained fusion. When combining different patterns into a single node, I added a paramter specified by users to determine the "class" of the patterns (a "class" can have multiple fused patterns). Different tiles can support different "classes" of fused patterns, which are also specified in param.json. Here's an example:
    "additionalFunc"        : {     "complex1" : [4,5,6,7],    "complex2" : [8,9,10,11],    "complex3" : [0,1,2,3], ...   }
    In this configurations there are three classes of fused patterns and each class is supported by a set of tiles.
    Above are my main changes, along with some bug fixed. (sry that I cannot remember everything...)

Copy link
Owner

@tancheng tancheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate the PR! Can you also provide a param.json to enable the functionality provided by your PR as an example, and include in the action (i.e., github testing automation)?

src/CGRA.cpp Outdated Show resolved Hide resolved
}
// for (int r=0; r<t_rows; ++r) {
// for (int c=0; c<t_columns; ++c) {
// nodes[r][c]->enableCall();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is a bug fix, right? So from now on, call can only be supported if user specify it in the param.json or user needs to modify this CGRA.cpp file?

And is this call actually how is the lut is recognized? i.e., instead of support call, user provide the lut func that is actually called in the IR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, call can only be supported if user specify. For lut, it should be writed as call-lut in param.json after I refactored the code.

Comment on lines +171 to +176
// for (int r=0; r<t_rows; ++r) {
// for (int c=0; c<t_columns; ++c) {
// // if(c == 0 || (r%2==1 and c%2 == 1))
// nodes[r][c]->enableComplex();
// }
// }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Means heterogeneity in the param.json won't take effect any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Here I mean complex operations should be manually specified in the param.json rather than we configure it for the default.

src/CGRANode.cpp Outdated Show resolved Hide resolved
src/CGRANode.cpp Outdated Show resolved Hide resolved
src/DFGNode.cpp Outdated Show resolved Hide resolved
src/DFGNode.cpp Outdated Show resolved Hide resolved
src/DFGNode.h Outdated Show resolved Hide resolved
src/DFGNode.h Outdated Show resolved Hide resolved
src/DFGNode.h Outdated Show resolved Hide resolved
@tancheng
Copy link
Owner

Plz also resolve the conflict. Thanks!

@tancheng
Copy link
Owner

Thanks @HobbitQia, plz also put response for each of my comments and tag them as fixed/solved (if there is such tag). Thanks a lot!

@tancheng
Copy link
Owner

Can we include at least one .cpp that leverages your nonlinear_param.json for testing the new features?

@HobbitQia
Copy link
Contributor Author

HobbitQia commented Nov 13, 2024

I included a nonlinear_test.cpp to test the new features. Later I will explain the structure of param.json and show some examples.

For previous comments that I have solved (e.g. issues about comments, codes that should be deleted), I marked them as resolved. For other comments that I think we should discuss about, I responsed to them and didn't mark them.

@tancheng
Copy link
Owner

I included a nonlinear_test.cpp to test the new features. Later I will explain the structure of param.json and show some examples.

For previous comments that I have solved (e.g. issues about comments, codes that should be deleted), I marked them as resolved. For other comments that I think we should discuss about, I responsed to them and didn't mark them.

Thanks a lot Jiajun~! Let me know when you wanna set up meeting for discussion~

@HobbitQia
Copy link
Contributor Author

Glad to share my improvement in detail.

  1. param.json

    The main change of param.json is the paramter additionalFunc. If we want to enable a special function call in CGRA, we can write call-<function name>: [tile numbers] in additionalFunc. Then the corresponding tiles will be able to execute this function. Similar to the complex operations (i.e. the fused operations like phi-add-add), we can also write complex-<function name>: [tile numbers] in additionalFunc. The corresponding tiles will be able to perform this complex operation.

    For compatibility with previous code, we can also write complex:[tile numbers] (i.e. no specific function name). Then all complex operations like phi-add, mul-add... will be regarded as the same kind, which I called general fusion rather than fine-grained fusion.

    Take test/nonlinear_param.json as an example. In the code below, there is a special function call fp2fx, enabled in the tile 4,8,7,11, and two complex operations BrT enabled in the tile 4,5,6,7 and CoT enabled in the tile 8,9,10,11.

    "additionalFunc"        : {
                                "call-fp2fx" : [4,8,7,11],
                                "load" : [0,1,2,3],
                                "store": [0,1,2,3],
                                "complex-BrT" : [4,5,6,7],
                                "complex-CoT" : [8,9,10,11],
                                "div" : [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
                              }

    It's worth noting that param.json only configures the tiles, and the kernel codes are not effected by this file. So whether a tile can support a special function call or complex operation is also determined by the fusion process in the DFG manipulation in the mapper, which I will illustrate in the next section.

  2. Fusion in the mapper

    Currently, I still choose to fuse operation manually, which means we need to change the code of DFG.cpp and add the fusion patterns in C++. When we do fusion, we need to pass a name for the new combined pattern, and the name should be consistent with the param.json. Take nonlinear_combine() in DFG.cpp:42-53 as an example. In the code below, there are 7 fused patterns and I classify them into 2 categories: BrT and CoT, which are supported by the tiles 4,5,6,7 and 8,9,10,11 respectively, as we have configured in the param.json.

    combineMulAdd("CoT");
    combinePhiAdd("BrT");
    combine("fcmp", "select", "BrT");
    combine("icmp", "select", "BrT");
    combine("icmp", "br", "CoT");
    combine("fcmp", "br", "CoT");
    combineAddAdd("BrT");

    Similiary, to be compatible, when calling combine() we can pass the empty string as the paramter, which means this pattern is combined in the general fusion.

  3. The special function call

    The special function call is a little different from the complex operation. The name is determined by the name of kernel code. Take fp2fx as an example. The code below will be regarded as a special function call and in the mapper will get its function name through the method demangle. Then to support this call, there must be call-fp2fx in the param.json.

    __attribute__((noinline)) float fp2fx(float x) {
        return x + 1.0;    
    }
    ...
    float x = fp2fx(1.0);
    ...
  4. Example of tuning division pattern.

    The left is the snapshot of the original DFG and the right is the new one under a vector factor of 4. We can see that the division is splitted into 4 nodes.

  5. Example of nonlinear_test.cpp

    The DFG is shown below, and we can see the fp2fx and faddmuladd (i.e. CoT)

@HobbitQia
Copy link
Contributor Author

One more point:
I want to discuss whether we should provide interface for users to specify the fused patterns so that they don't need to change the code of the mapper. Since I remember there are similar functions in the project mlir-cgra and I am not sure about it's necessary to have the similar interface in the mapper.

@tancheng
Copy link
Owner

when calling combine() we can pass the empty string as the paramter, which means this pattern is combined in the general fusion.

You mean empty string for type, right? like combine("fcmp", "select", ""); And we currently don't have such use-case, right?

  • Can you please help to include the vectorFactorForIdiv into the nonlinear_test.cpp as nonlinear_div_test.cpp, so all the three features are tested.
  • And include this test into the testing flow:

I really appreciate the contribution!

User interface for custom pattern.

Sure, but this could be our future work in another PR when you have bandwidth.

@cwz920716 Just FYI :-) Jiajun is one of the best students we have worked with :-)

@HobbitQia
Copy link
Contributor Author

You mean empty string for type, right? like combine("fcmp", "select", ""); And we currently don't have such use-case, right?

Yes, and actually there are many cases in DFG.cpp written in the past (since the default value of type is the empty string).

I tried to test vectorFactorForIdiv in nonlinear_test.cpp, however, I found it hard to test all three features in a single file. Since to test function calls, we need to call non-inline special functions which are regared as non-vectorized, thus LLVM Auto-Vectorization Pass could not vectorize the whole loop. I didn't find a method to solve it. Do you have any ideas?

@tancheng
Copy link
Owner

Then let's have nonlinear_test.cc and div_test.cc?

@HobbitQia
Copy link
Contributor Author

HobbitQia commented Nov 16, 2024

There was a problem with the test...However, I can run test on my local environment. I am trying to figure out the reason.

I think I find the problem! It's due to param.json is not updated and not compatible with current workflow. @tancheng Can I replace the orignal param.json with nonlinear_param.json? Or do u have other elegant methods to handle it?

@tancheng
Copy link
Owner

Right, the file name is hardcoded in the .cpp...
I think you can create another two folders in the test folder, and provide corresponding param.json.


- name: Test Idiv Feature
working-directory: ${{github.workspace}}/test
run: clang-12 -emit-llvm -O3 -fno-unroll-loops -fno-vectorize -o idiv_test.bc -c idiv_test.cpp && opt-12 -load ../build/src/libmapperPass.so -mapperPass idiv_test.bc
working-directory: ${{github.workspace}}/test/nonlinear_test
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idiv_test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@HobbitQia
Copy link
Contributor Author

A little strange...let me check it carefully

@tancheng
Copy link
Owner

A little strange...let me check it carefully

No worry~ Thanks!

@tancheng
Copy link
Owner

Based on the error msg, seems failed at isVectorized().

@HobbitQia HobbitQia closed this Nov 17, 2024
@HobbitQia HobbitQia reopened this Nov 17, 2024
@HobbitQia
Copy link
Contributor Author

HobbitQia commented Nov 17, 2024

Based on the error msg, seems failed at isVectorized().

I found something strange...We used raw_string_ostream() in DFGNode::isVectorized() and raw_fd_ostream() in DFG::generateDot. I deleted them and the workflow can be run correctly. However, I don't know why they will cause interrupt of our programs and when I can use these functions in my local environment everything is ok...

@tancheng
Copy link
Owner

Based on the error msg, seems failed at isVectorized().

I found something strange...We used raw_string_ostream() in DFGNode::isVectorized() and raw_fd_ostream() in DFG::generateDot. I deleted them and the workflow can be run correctly. However, I don't know why they will cause interrupt of our programs and when I can use these functions in my local environment everything is ok...

Seems some library missing: jupyter-xeus/xeus-cling#234 (comment)?

Or you can try to use some C++ standard string write/read/stream functions to replace the LLVM raw_xx_ostream()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants