Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing activity in case of "broken" ActivityID in events #2122

Open
chrisnas opened this issue Oct 21, 2024 · 4 comments
Open

Missing activity in case of "broken" ActivityID in events #2122

chrisnas opened this issue Oct 21, 2024 · 4 comments

Comments

@chrisnas
Copy link
Contributor

There is a bug in the BCL that generates 00 byte in an event ActivityID GUID that leads to stopping the parsing and "hidding" final activities.
Image
Since each Start/Stop action is adding an activity, I was expecting to see them (i.e. /1/4/2000/xx/ instead of /1/4/2000/
However, the GUID seemed to contain the missing xx value like in the second 32 value -0010- or -0020-.

 23360 | 00d0c714-0000-0000-0000-00003d2d6f5a =       /1/4/2000/ > event   1 __ [ 1| Start] RequestStart
       |> http://localhost:5500/Products/Info/1990
 23360 | 00d0c714-0010-0000-0000-00002d2d6f5a =       /1/4/2000/ > event   7 __ [ 1| Start] RequestHeadersStart
       |QH[ 20]
 23360 | 00d0c714-0010-0000-0000-00002d2d6f5a =       /1/4/2000/ > event   8 __ [ 2|  Stop] RequestHeadersStop
      <|QH
 44048 | 00d0c714-0020-0000-0000-00001d2d6f5a =       /1/4/2000/ > event  11 __ [ 1| Start] ResponseHeadersStart
       |RH>
 44048 | 00d0c714-0020-0000-0000-00001d2d6f5a =       /1/4/2000/ > event  12 __ [ 2|  Stop] ResponseHeadersStop
  200 <|RH
 44048 | 00d0c714-0030-0000-0000-00000d2d6f5a =       /1/4/2000/ > event  13 __ [ 1| Start] ResponseContentStart
       |RC>
 44048 | 00d0c714-0030-0000-0000-00000d2d6f5a =       /1/4/2000/ > event  14 __ [ 2|  Stop] ResponseContentStop
      <|RC
 44048 | 00d0c714-0000-0000-0000-00003d2d6f5a =       /1/4/2000/ > event   2 __ [ 2|  Stop] RequestStop
  200 <|

These output were generated to analyze network events (HTTP, socket, security, DNS).

This happens only when a 12 bits value is stored AND the high bits are stored in the low nibble of a byte. The following code update in StartStopActivityComputer.cs seems to avoid the problem and supports the proposed fix in the BCL:

// Compute the number (little endian) (thus backwards).
for (int i = (int)numBytes - 1; 0 <= i; --i)
{
    value = (value << 8) + bytePtr[i];
}

// Print the value
sb.Append(separator).Append(value);

bytePtr += numBytes;        // Advance past the bytes.

// FIX: there is a special case for a value < 4096/0xFFF where the encoder made a mistake
// It is encoded with 1 nibble + 1 byte + 1 byte that contains 0 (hence stopping the parsing)
if ((value >= 0xFF) && (value < 0xFFF) && (bytePtr + 1 < endPtr) && (bytePtr[0] == 0) && (bytePtr[1] != 0))
{
    bytePtr++;  // Advance past the 00 byte
}

Note: feel free to use this test app to debug and validate the fix.

@noahfalk
Copy link
Member

@brianrob - do you think we should treat this guid encoding convention as a public convention of the runtime? Currently our docs warn against tools making use of this encoding but it feels wrong if we are telling other tools not to use it but then PerfView appears to be relying on it.

I agree with all the analysis except I think the boundary checks are off-by-one? Looking at the BCL code I'd expect the issue could occur anywhere (value > 0xFF) && (value <= 0xFFF).

@brianrob
Copy link
Member

@brianrob - do you think we should treat this guid encoding convention as a public convention of the runtime?

If I recall correctly, it's not guaranteed that the GUID provided will be decodable. If it's not, then we'd just show the GUID. That being said, I think it's reasonable to treat it as public with that caveat. Also reasonable to make fixes to make it work better. Tools just need to know that if it doesn't work, you don't crash or misbehave, you just treat it as a GUID.

@chrisnas
Copy link
Contributor Author

I agree with all the analysis except I think the boundary checks are off-by-one? Looking at the BCL code I'd expect the issue could occur anywhere (value > 0xFF) && (value <= 0xFFF).

You're right @noahfalk :^)

@chrisnas
Copy link
Contributor Author

@brianrob - do you think we should treat this guid encoding convention as a public convention of the runtime?

If I recall correctly, it's not guaranteed that the GUID provided will be decodable. If it's not, then we'd just show the GUID. That being said, I think it's reasonable to treat it as public with that caveat. Also reasonable to make fixes to make it work better. Tools just need to know that if it doesn't work, you don't crash or misbehave, you just treat it as a GUID.

The "meaning" of the GUID (i.e. its encoding of information) is important because is allows correlations. For example, network events are connected thanks to child activities stacked from an initial HTTP request but not with RelatedActivityID. This is how I will be able to show the duration of each phase from DNS, HTTPS to request/response times. Without the activities correlation, there is not much one can extract from these events.

BTW, it is possible to find a document that describe this kind of need - https://studylib.net/doc/6851938/eventsource-activity-support

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants