Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider clearing state of iterator state machine when disposal completes #75666

Open
jcouv opened this issue Oct 29, 2024 · 1 comment · May be fixed by #75908
Open

Consider clearing state of iterator state machine when disposal completes #75666

jcouv opened this issue Oct 29, 2024 · 1 comment · May be fixed by #75908
Assignees
Labels
3 - Working Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Milestone

Comments

@jcouv
Copy link
Member

jcouv commented Oct 29, 2024

In the example below from @stephentoub, the local values2 in the iterator state machine holds a reference to a large allocation but the reference is kept alive past the foreach.

using System.Reflection;
using System.Collections.Generic;
using System;
using System.Linq;

IEnumerable<int> values = Produce();

Console.Write("here");
foreach (int value in values)
{
}

while (true)
{
    GC.Collect();
    Console.WriteLine(((int[])values.GetType().GetField("<values2>5__1", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(values)).Length);
}

static IEnumerable<int> Produce()
{
    int[] values2 = Enumerable.Range(0, 100_000_000).ToArray();
    foreach (int value in values2) yield return value;
}

The foreach calls GetEnumerator which may return this.

        IEnumerator<int> IEnumerable<int>.GetEnumerator()
        {
            if (<>1__state == -2 && <>l__initialThreadId == Environment.CurrentManagedThreadId)
            {
                <>1__state = 0;
                return this;
            }
            return new <<<Main>$>g__Produce|0_0>d(0);
        }

Then the state machine does the allocation:

    [CompilerGenerated]
    private sealed class <<<Main>$>g__Produce|0_0>d : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
    {
        private int[] <values2>5__1;

        private bool MoveNext()
        {
                ...
                <values2>5__1 = Enumerable.ToArray(Enumerable.Range(0, 100000000));
                ...
       }

That field for hoisted local values2 never gets reset, so we hold onto the allocation for as long as the enumerable values is kept around.
The proposal would be for the Dispose method to reset all the enumerator state in the state machine once disposal completes.

@jcouv jcouv added Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code labels Oct 29, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Oct 29, 2024
@jaredpar jaredpar added this to the 17.13 milestone Oct 29, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged Issues and PRs which have not yet been triaged by a lead label Oct 29, 2024
@jcouv
Copy link
Member Author

jcouv commented Nov 11, 2024

Taking some notes from investigation:
We already have logic to clear unmanaged hoisted locals when we exit scopes. But that logic has a number of limitations:

  1. we don't clear top-level locals on normal exit
  2. we don't clear any locals on yield break
  3. we don't clear any locals on early exit (from yield return, ie. foreach (var i in coll) { break; })
  4. we don't clear any locals on throw

So the current design only works for nested locals on normal exit from the scope...

static IEnumerable<int> Produce()
{
    {
        string hoistedNestedLocal = "";
        // existing clearing logic doesn't work for:
        // `yield break;`
        // `yield return;` and the caller breaks `foreach (var i in coll) { break; }`
        // `throw`
    }

    string hoistedTopLevelLocal = "";
    // existing clearing local doesn't work for:
    // normal exit
    // `yield break;`
    // `yield return;` and the caller breaks `foreach (var i in coll) { break; }`
    // `throw`
}

From discussion with Jared, letting go of references is best effort. It doesn't need to be complete/perfect. In particular, we don't want to pay the cost of an extra try/finally to achieve it. So I'm not proposing to address scenario 4 (thrown exception) perfectly. If an exception is thrown during disposal, the hoisted locals won't be cleared.

We discussed two options for iterators with Aleksey, which would address all non-exceptional scenarios, for both top-level and nested locals:

  1. clear hoisted all locals as last thing in Dispose (could be made cheap by holding all enumerator state with a nested struct)
  2. replicate block structure in Dispose, clear locals in that scope at end of block

For async-iterators, there are similar alternatives which could work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Working Area-Compilers Code Gen Quality Room for improvement in the quality of the compiler's generated code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants