-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
excess CPU time taken (Chrome, Brave) #112
Comments
There's a small chance this is relevant. I hesitate to even mention it, but the involvement of WebSockets has me perking up my ears, but the age of that ticket tells me to just keep it in mind until I know more. |
|
Even though this is ostensibly about CPU, I just played a tourney and used Chrome's Task Manager to monitor memory and sure enough, we have a leak. Additionally, right at the end, I got a panic similar to what I reported in #105 but it was right as one of Chrome's fields ticked over from 1.9 GB to 2.0 GB. WASM is 32 bits and it's not inconceivable that 2 GB is a magic number. So we may have three separate bugs, one with excess CPU consumption, one at the end of tourneys and one being a memory leak, but the memory leak is pretty much guaranteed to make the game an unpleasant experience for everyone, so that's what I'll be concentrating on. For the record, although I forgot to copy his info verbatim, Quick was using a 32 GB water cooled machine (Alienware, I think) and was using the latest Chrome and he was getting behavior similar to Grizz, where he started getting a lot of lag. I had no perceptible lag up until the panic. However, I'm running MacOS, Quick and Grizz are running Windows. Quick also mentioned a different memory stat:
I wasn't running Activity Monitor, so I don't know if I had similar memory bloat. All of this is trivially reproducible, although it takes time when we play for real. If I knew my web development tools better there's a good chance it handed us the answer on a platter. However, I can read up on the available tools and I can also write some test code. |
Oh, and I have a hunch that the memory leak has to do with the addition of text to my view, because the consumption appeared to be accelerating. At the time of the crash, the hand-history had 1,358 lines in it. I don't keep track of how many lines are generated per action, but my guess is the average would be about 5. As it is, I saved a copy of the buffer (currently in /tmp) and could even use it as a test case (not that it holds anything special that Lorem Ipsum doesn't). |
My hunch was wrong. I wrote a little test program that sends a bunch of lines to a div and there's essentially no memory growth. I then tried routing them through a websocket echo server and no memory loss. FWIW, in my real code I'm using serde_cbor, but it doesn't have any unsafe code, so I don't think that's an issue. At this point, my inability to get geiger to work with It can't be user input, because I was observing a table after I bust and the memory was climbing as Grizz repeatedly raised the two remaining players on vacation. At that point all I'm doing is receiving data from the socket, deserializing it, translating it to text, putting it in the window and scrolling. Translating it to text is done without any unsafe code. I guess scrolling could be it, although that seems exceedingly unlikely. Wait. Now that I think about it, the way I scroll could actually be it. I wouldn't think so, but who knows and it's easy to test. |
There are a ton of little differences between my sample program that doesn't leak and my table code. Each one seems unlikely, but something is causing the leak. In no particular order, here are the differences I'm finding by doing a cursory examination of my table code:
Of the above, I think my next step should be to temporarily put my little test program aside and just make a branch of mb2 where I hack up the lobby and the server so that as soon as a connection is made the server sends a bunch of lobby messages and we display them. I haven't seen memory issues in the lobby, but the lobby really doesn't get many messages compared to a game. Once that's going, we can go back from wss to ws, Json from CBorMinimal, and text vs. binary. OTOH, if I can't get the lobby leaking memory, I'll need to explore the possibility that the table being in a pop-up matters. One thing that is a little strange is that Chrome's Task Manager doesn't show the table as a separate entity, like it does different tabs. Anyway, if I go any further tonight I won't be able to easily able to stop, so I'll do all of the above first thing tomorrow. |
I couldn't sleep, so I hacked away and I have a modified lobby that leaks memory, but it no longer does most of the things it used to. However, it still:
I modified my toy program to be very much like my modified lobby/server in that it simply sends one command to the "echo server", an then the echo server then echos it back 1,500 times. I also tried commenting out the code in the lobby that pushd the line to my log_data and when I did, there was no memory leak, but when I made it so the actix-web server only sent back one chat line after receiving the log in but made it so that when we got that chat line we pushed it to log_data 1,500 times, we also didn't have a memory leak. So it's not just the log_data having 1,500 lines in it (which I didn't think was the case). There may be other differences other than the two bullet items above, but I can probably test each of them without too much difficulty, although I'm pretty tired right now. |
Ugh. Turns out it was wee-alloc. That was a difference, but I didn't notice it until I went to remove yew-router from Cargo.toml, not because I was suspicious of yew-router, but because I was running out of differences. My guess is I can make my tiny self-contained example show this problem and then I can make it even more minimal (for example, I doubt that it's necessary to bounce messages through a WebSocket, but who knows, wee_alloc must work in most cases, or someone else would have already discovered this issue). However, before I do anything else, I'm going to get a new client out. |
Deployed and slightly tested on https://devctm.com:8081 However, simply adding wee_alloc to my small test program did not invoke the leak, so there's something else necessary as well. I have a hunch. I'll keep this issue open until I figure out what that other piece is. |
I created yew_wee_alloc_leak_demo and reported it. |
Grizz was having a lot of trouble with his client yesterday, where it just got laggy and eventually he had to "reboot". I don't know what he meant by that and will need to seek clarification, but at minimum I believe he restarted his browser and there's.a chance he restarted his OS.
I didn't have that problem, per-se, but there have been a few times where I've been playing and my fan has come on and sure enough Activity Monitor shows a Brave process running at 100% of CPU. In every case when I've force-quit that process, my GMail tab instantly shows that it died. However, I haven't had this problem with GMail when I haven't been playing games on mb2, so I suspect Grizz's issue and mine are essentially the same ones (Brave is a Chrome variant).
What makes this especially weird is that this morning when I got up I had a GMail process consuming a bunch of CPU time and not only did I not have any window/tab running the mb2 client, I hadn't had one up in hours (since I had been sleeping), although my computer had been sleeping too.
I don't recall this ever happening back when we were doing Sunday 2pm play-testing. Since then I've upgraded to newer versions of Yew and I've also moved from std-web to web-sys. I suspect that a change to my stack is responsible for this, but right now I don't even know for sure that it's even mb2 related (although Grizz's issue is mb2 related!).
This one is going to be a PITA to track down, however if it is a problem with my stack then it's quite possible someone else will hit it and it will get resolved all by itself. If not, even though it will be a PITA, I believe that will just be because it will be wall-clock time consuming, but not that it will require a lot of mental work, per-se. Either way, although this is a huge issue, it's not yet a huge priority.
The text was updated successfully, but these errors were encountered: