Recreate llvm-backend gcroots code on develop branch #1077

Open
31 tasks done
dwightguth opened this issue May 31, 2024 · 2 comments

Collaborator

dwightguth commented May 31, 2024

A body of code was once applied to a branch of the LLVM backend so that the allocator could trigger garbage collection when it ran out of space, instead of collection being triggered only after a rewrite step. This can prevent leaks when a long sequence of function calls occurs between two rewrite steps.
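To make the intent concrete, here is a minimal sketch of an allocator that triggers collection itself when a request does not fit. Everything in it (`Arena`, `collect`, `allocate`, the `gc_enabled` field) is a hypothetical name chosen for illustration and is not the backend's actual arena code; `collect` just discards the whole block as a stand-in for a real collector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator; none of these names come from the llvm-backend.
struct Arena {
  std::vector<char> block;
  std::size_t used = 0;
  bool gc_enabled = true;
  int collections = 0;

  explicit Arena(std::size_t size) : block(size) {}

  // Stand-in for a real collection. A real collector would copy live
  // objects; this sketch assumes nothing is live and frees everything.
  void collect() {
    used = 0;
    ++collections;
  }

  void *allocate(std::size_t n) {
    // Collect from inside the allocator when the request does not fit,
    // rather than waiting for the next rewrite step.
    if (n > block.size() - used && gc_enabled) {
      collect();
    }
    if (n > block.size() - used) {
      return nullptr; // a real allocator would grab a new block here
    }
    void *p = block.data() + used;
    used += n;
    return p;
  }
};
```

With collection disabled, the same request simply fails in this sketch, which is why the places that disable collection matter for the rest of the plan.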

However, that code has bit-rotted and it is not at all trivial to resurrect the branch. Instead, we are going to attempt to replay the changes of the branch onto develop behind a feature flag.

This is pretty low priority and also very time consuming, so we should not pick up this work unless it is likely to directly affect the current milestone or we have a lot of free time on our hands. With that being said, I am going to attempt to describe the changes that were made, in the interest of documenting the work that will be required.

Right now this is going to be pretty low level because I don't have access to a clear picture of the high-level changes that were made due to the messiness of the branch. I will try to separate this out into a high level work plan in the future.

Here are the low-level changes that need to be replayed:

  • Created a sort category for Set iterators and Map iterators
  • Added libunwind as a dependency
  • Applied the following additional LLVM passes during optimization, some of which are new in this branch:
    • Mark tail calls as gc leaf
    • No omit frame pointer for tailcc
    • Rewrite statepoints for GC
    • Emit GC layout info
  • Applied llvm-link to opaque.ll (see below) after Rewrite Statepoints for GC is applied
  • Added functions to the code generator that emit code to enable and disable the garbage collector
  • Added new functions to the code generator to emit an allocation that will not be relocated
  • Added new functions to the code generator to cast between address spaces 0 and 1
  • Added new function to reset the alwaysgcspace only
  • Disable garbage collection during allocation if we are allocating via immer allocator
  • Move garbage collected global variables to address space 1
  • Add set iterator and map iterator types to llvm bitcode
  • Added function to bitcode to return the stack maps provided by LLC
  • Modified allocations to koreAllocAlwaysGC to use the no-reloc allocator in some cases
  • Disable garbage collection during hook execution
  • Set garbage collector to "statepoint-example" on functions that should be given stack maps
  • Abstract out ptr to int, int to ptr, and address space casts into a separate opaque.ll bitcode file that is not visible to the statepoints algorithm
  • Mark finish_rewriting as does not return
  • Remove gc logic from stepFunctionHeader
  • Call koreClear on each rewrite step
  • Copy "getMangledTypeStr" function from llvm codebase into our own
  • Add version of allocation routine for each llvm type allocated
  • Create 3 llvm passes:
    • Emit GC Layout Info: this one is a problem. We are using it to get the sort category of all relocated pointer types, but this is impossible with opaque pointer types, so we will need to figure out a new way to do this.
    • Mark Tail Calls as GC Leaf: sets the function attributes on all tailcc functions with tailcc tail calls to include the gc-leaf-function attribute.
    • No Omit Frame Pointer for Tailcc: sets frame pointer retention to "non-leaf" on tailcc functions.
  • Disable gc during allocations by GMP and MPFR
  • Trigger collection during allocation if gc is enabled and we don't have enough room.
  • Leave 1 KB in the block before attempting to collect.
  • Added code to parse stack map
  • Added code to find base of allocation from derived pointer to map/set
  • Added GC code to handle map and set iterator roots.
  • Use libunwind and stack map to find gc roots
  • Modify GC to relocate gc roots
  • Disable garbage collection during initial configuration construction
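Several of the items above disable collection for the duration of some operation (immer allocation, hook execution, GMP and MPFR allocation, initial configuration construction). One way to picture that is a scoped guard that restores the previous state on exit. This is only a sketch: `gc_enabled`, `GcDisabledScope` and `run_without_gc` are hypothetical names, and the original branch emitted enable/disable code from the code generator rather than necessarily using a C++ guard.

```cpp
#include <cassert>

// Hypothetical global flag; the issue only says that functions to enable
// and disable the collector were added, not how the flag is stored.
static bool gc_enabled = true;

// Disables collection for the lifetime of the object and restores the
// previous state afterwards, so nested scopes behave correctly.
class GcDisabledScope {
public:
  GcDisabledScope() : previous_(gc_enabled) { gc_enabled = false; }
  ~GcDisabledScope() { gc_enabled = previous_; }
  GcDisabledScope(const GcDisabledScope &) = delete;
  GcDisabledScope &operator=(const GcDisabledScope &) = delete;

private:
  bool previous_;
};

// Hypothetical wrapper: runs a callable (for example a hook) with
// collection disabled and returns its result.
template <typename F>
auto run_without_gc(F &&f) -> decltype(f()) {
  GcDisabledScope guard;
  return f();
}
```

Restoring the saved value, instead of unconditionally re-enabling, keeps an inner scope from turning collection back on while an outer scope still needs it off.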
Collaborator Author

dwightguth commented Jun 3, 2024

I am going to attempt to break this down into some slightly more atomic changes that can be applied as a sequence of high level changes to the backend. This can constitute a sequence of pull requests.

  1. Add SetIterator and MapIterator sort categories
    • sort category
    • llvm bitcode type declarations
  2. Add libunwind as dependency to project
  3. gc_enabled flag:
    • Add functions to code generator
    • Disable during immer allocation
    • Disable during hook execution
    • Disable during GMP and MPFR allocation
    • Disable during initial configuration construction
  4. koreClear: add function and call it every rewrite step
  5. mark finish_rewriting as noreturn in code generator
  6. Add conditional compilation flag
  7. Use addrspace 1
    • For globals
    • Add cast functions to code generator
    • Add function for addrspace 0 allocations
    • Change call sites for addrspace 0 allocations
    • getMangledTypeStr
    • Add typed allocation functions
    • Abstract out casts into opaque.ll
    • Use llvm-link to link opaque.ll
  8. Generate stack maps
    • Add llvm passes
    • Apply llvm passes during code generation
    • Set gc attribute on functions
    • Add function to get stack maps
    • Add function to parse stack maps
  9. Modify collector
    • Remove calls to collector from stepFunctionHeader
    • Trigger collection during allocation if block is near full
    • Parse stack maps
    • Find base pointers
    • Handle setiter and mapiter roots
    • Find and relocate gc roots
    • search roots
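As a starting point for the "parse stack maps" items, here is a minimal sketch that reads only the fixed-size header of a stack map section. The layout assumed here (a one-byte version, three reserved bytes, then three 32-bit counts, with version 3) is my reading of LLVM's documented stack map format and should be checked against the LLVM version in use. `StackMapHeader` and `parse_stack_map_header` are hypothetical names, not functions from the original branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fixed-size header fields of a stack map section (assumed version 3 layout).
struct StackMapHeader {
  std::uint8_t version;
  std::uint32_t num_functions;
  std::uint32_t num_constants;
  std::uint32_t num_records;
};

// Bounds-checked read of a value in host byte order at the given offset.
template <typename T>
static bool read_at(const std::uint8_t *data, std::size_t size,
                    std::size_t offset, T &out) {
  if (offset > size || size - offset < sizeof(T)) {
    return false;
  }
  std::memcpy(&out, data + offset, sizeof(T));
  return true;
}

// Returns true and fills `out` if the buffer starts with a version 3 header.
// The function, constant and record tables that follow are not parsed here.
static bool parse_stack_map_header(const std::uint8_t *data, std::size_t size,
                                   StackMapHeader &out) {
  StackMapHeader h{};
  if (!read_at(data, size, 0, h.version)) {
    return false;
  }
  // Bytes 1 to 3 are reserved and skipped.
  if (!read_at(data, size, 4, h.num_functions) ||
      !read_at(data, size, 8, h.num_constants) ||
      !read_at(data, size, 12, h.num_records)) {
    return false;
  }
  if (h.version != 3) {
    return false;
  }
  out = h;
  return true;
}
```

A full parser would go on to walk the per-function size records and the per-call-site records; those are what the collector would match against return addresses found while unwinding with libunwind.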

Collaborator Author

dwightguth commented Jun 5, 2024

For reference, I am including here the sequence of PRs that comprise most of the changes applied on the original branch. This is useful because, as I get into the meat of the changes, it is helpful to refer to the descriptions of these PRs to make sure I understand the motivation behind the changes.
