Random/sporadic failures running test suite interactively against Spike #1049
I just noticed that most if not all of the failures that I have encountered seem to have this in common:
Does anybody know why this might happen sporadically/randomly? |
Ah... sporadic failures. I've never encountered sporadic failures on these exact tests you've mentioned, but I have some experience with debugging sporadic failures in the test suite. For example, on the current TOT I periodically get sporadic failures of some tests. A while ago there was an issue like this: riscv-software-src/riscv-tests#520, which was fixed here: riscv-software-src/riscv-tests#522. Most likely there are still issues of this nature in the test suite. I also remember that some tests depend on host machine performance. For example, I know that MemorySampleSingle is affected, see the call to:
Internally at Syntacore we just patch out the source code of this test so that it works on our CI machine. Again, I had no time to fix this one; it was easier to patch it out and forget about the problem. I suspect you may be encountering an issue of a similar nature.
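To make the host-performance dependence above concrete, here is a rough, purely hypothetical sketch (the function names and the threshold are invented and are not the actual gdbserver.py code): a test that asserts on work done within a fixed wall-clock window will pass on a fast machine and can fail sporadically on a slow or heavily loaded one.

```python
import time

# Hypothetical illustration of a timing-sensitive test check.  If the test
# requires that a certain number of samples be collected within a fixed
# wall-clock window, a slow or heavily loaded host can fail the assertion
# even though the debug functionality itself works correctly.
def collect_samples(read_sample, duration_s=1.0):
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(read_sample())
    return samples

def check_sampling(read_sample, minimum=100):
    samples = collect_samples(read_sample)
    # Holds easily on a fast host; may fail on a slow VM.
    assert len(samples) >= minimum, f"only {len(samples)} samples collected"
```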
I would recommend filing the issue there. That being said, both @en-sc and I have write permissions to the riscv-tests repository with the expectation that we do reviews/fixes, so at least you've notified us about the issue. Now, if you need help debugging these, it would be great if you could produce test execution logs that capture this sporadic failure. Sometimes careful analysis of these logs is enough to debug the issue. If you choose to attach log files, please run the tests as follows (below is just an example):
Important arguments are:
Passing these extra arguments may simplify log analysis a little bit |
I edited the
to this:
And then ran the smoke tests:
Zipped |
@TommyMurphyTM1234 I've finally managed to look at the 20240421-222154-spike64_2-DownloadTest.log.zip log in greater detail.
Why the test fails: I think I know the reason for this specific failure.
So, when GDB requested the CRC calculation, the operation resulted in a timeout.
Indeed, it took 20 seconds (29895 - 9011) to calculate the sum. This is why we have the test failure.
The root cause: spike is a rather slow simulator. In addition to that, halt mode in spike is implemented as a busy loop that constantly executes ROM code (at address 0x1000 if I remember correctly). This means that whenever you have spike halted, the CPU usage by spike is still 100%.
How to fix: gdbserver.py has an additional
|
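The name of the gdbserver.py option is cut off in the comment above, so it is not reproduced here. As a purely illustrative sketch (the --timeout-scale flag and its plumbing are invented for this example), the idea of stretching per-operation timeouts for a slow simulator like spike could look like this:

```python
import argparse

# Hypothetical sketch only: the real gdbserver.py option referred to above
# is not shown, so this invents a "--timeout-scale" knob to illustrate the
# idea of stretching timeouts when the target is a slow simulator.
parser = argparse.ArgumentParser()
parser.add_argument("--timeout-scale", type=float, default=1.0,
                    help="multiply every per-operation timeout by this factor")
args = parser.parse_args()

BASE_CRC_TIMEOUT_S = 10  # nominal budget for the CRC/compare-sections step
crc_timeout_s = BASE_CRC_TIMEOUT_S * args.timeout_scale

# e.g. running with --timeout-scale 4 on a slow VirtualBox VM would give the
# 20-second CRC computation seen in the log plenty of headroom.
print(f"CRC timeout: {crc_timeout_s:.0f} s")
```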
Thanks a lot @aap-sc - as ever, you've provided excellent analysis/insight. I've been meaning to revisit this issue but just haven't had a chance yet. When I do, I'll definitely take cognizance of your analysis/advice above. Perhaps the test scripts need tweaking (e.g. to increase the relevant timeout(s)) as I would have expected the tests to run successfully out of the box barring "real" issues. But maybe it's difficult to ensure that this happens on all possible hosts. Mine is a VirtualBox VM running Zorin OS 17.1 Core on a 4th generation i5 host, so obviously not the most powerful hardware available these days. Anyway, I'll try to revisit this issue again soon. Thanks again. 👍 |
Actually, you are right! If the Now, looking at the value of
|
Should be addressed by riscv-software-src/riscv-tests#553 |
Just to recap - my test host here is:
Without this PR I ran the "smoke test" twice and each time it failed: Then I applied the changes in riscv-software-src/riscv-tests#553 and ran it again, but it still failed. It seems to be a different failure to the timeout failures though? |
The set of failed tests is different, right? Regarding this one:
Yes, this seems to be another issue. In fact if you read my comment here: #1049 (comment) :
So yes, this test is known to be affected by some other issue that needs to be weeded out. |
Just to clarify - when the test run fails in these cases, it fails on one test, which terminates the test run altogether.
Ah - sorry, I forgot that you had posted about this. For what it's worth I did a 4th run after a reset of the board and this time all tests completed. So...
|
I did a 5th run and got another failure: Maybe this is the same class of failure as the aforementioned |
Another run and another failure... I need to go back through these failures to see if they are all of the same class - and maybe I should shift this issue/discussion to the |
@TommyMurphyTM1234 well,
|
Thanks - I will log that upstream in the
Let me first (a) collect verbose OpenOCD/GDB remote protocol logs for such test failures and (b) see if I can have a look sooner. :-) |
The issue there is with:
This makes me think that there is not enough sanitization of the command output.
For now I suggest to:
then we should re-run these tests with verbose logs. |
Thanks @aap-sc.
Is there any point in me manually applying that patch and running the tests to see what happens? |
This looks like way too much hassle for my taste. We'll get these changes anyway in due time. Changes from upstream are merged fairly regularly these days, so it should not be too long. |
Ok, thanks. That's fine so. BTW, running the tests repeatedly again I sometimes get other tests failing (and terminating the test run) and not just the three mentioned in the first post in this issue. I'm not sure if I should keep posting logs for such failures? |
I suggest waiting till the situation with MemorySampleSingle is resolved. I've managed to reproduce the problem in our lab environment with MemorySampleSingle and MemorySampleMixed. In addition, I plan to merge the fixes for some sporadic failures observed in other tests (UnavailableMultiTest, for example). If we keep constantly updating this ticket it may be hard to follow the status of the current situation. |
Ok, thanks - I'll do that so.
That's great. 👍
I understand and agree. Thanks again @aap-sc. |
The respective fixes are:
Once these fixes are applied I do not observe sporadic failures in our lab environment anymore (just FYI). Now, I suggest waiting till these are merged in and till https://review.openocd.org/c/openocd/+/8227 is available in riscv-openocd (as discussed above). |
@TommyMurphyTM1234 all the changes are merged to riscv-openocd (including https://review.openocd.org/c/openocd/+/8227). The number of sporadic failures should decrease now. We can try to re-run the testsuite on your machine to weed out remaining bugs. Please, let me know if you still have the will and desire to debug this. |
Thanks a lot @aap-sc. Yes, I'm happy and able to do further testing on this. I might get to it later today but, if not, then tomorrow. |
@TommyMurphyTM1234 no, these messages are intentional. The intention is to report the selected seed to the user so that we can reproduce the run. This is important because even the compiled binaries sometimes depend on this seed, so it's quite confusing to debug issues and get different binaries every time. |
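As a rough illustration of the seed-reporting idea (a generic sketch, not the test suite's actual code):

```python
import random
import sys
import time

# Pick a seed (from the command line if given, otherwise from the clock)
# and report it, so that a failing run can be reproduced exactly --
# including any generated test binaries that depend on the seed.
seed = int(sys.argv[1]) if len(sys.argv) > 1 else int(time.time())
print(f"using random seed {seed}")
random.seed(seed)
```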
I manually did a run of all tests against the four targets that the
Note that the previous tests were performed on a VirtualBox VM running Zorin OS Core 17.1 as a guest OS on a Windows 10 host, whereas these were performed on WSL2/Ubuntu 22.04 on the same Windows 10 host. I'm not sure whether the probably lower effective resources of the former may have given different results (and more sporadic failures?) than the latter. |
@TommyMurphyTM1234 so these ones do not show any signs of sporadic failures (for now). Do you have the capability to run this experiment on VirtualBox? |
Yes - I'll revert to my VirtualBox setup and run the tests again. I'm not sure why there are a few Python exceptions in the logs, if they are sporadic or reproducible (I suspect the former), and if they are anything to be concerned about? |
They are sporadic. They are somewhat reproducible. The reason for these warnings is described in greater detail here: riscv-software-src/riscv-tests#555. We should not be concerned about these warnings. If someone should be concerned - it's the Python developers (I mean, literally, the developers of the CPython interpreter) who don't follow their own guidelines and allow exceptions to propagate from finalizers of TemporaryFile objects. One day I may file a bug against that - but right now I still hold a grudge against them (since I had to debug this). |
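In other words, the warning appears when a TemporaryFile is cleaned up by the garbage collector and an exception escapes its finalizer. Closing the file deterministically avoids the finalizer path altogether; a generic sketch (not the test suite's code):

```python
import tempfile

# Relying on garbage collection: the file is deleted whenever the finalizer
# happens to run, and any exception raised there only surfaces as an
# "unraisable exception" warning.
tmp = tempfile.TemporaryFile()
tmp.write(b"scratch data")
# ... tmp is eventually finalized by the GC ...

# Deterministic cleanup: the file is closed (and removed) at the end of the
# with-block, on the current thread, where errors can actually be handled.
with tempfile.TemporaryFile() as scratch:
    scratch.write(b"scratch data")
    scratch.seek(0)
    data = scratch.read()
```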
@aap-sc - thanks a lot for that and for the link to the related/explanatory PR which you had previously tagged me on but which I must've missed. 👍 |
Hi @aap-sc, I've reverted to my original VirtualBox/Zorin OS 17.1 setup and have run the tests (just once) there with the latest of everything (
I will run them a few more times just to see if there are any sporadic failures. |
Second run and one failure this time:
|
Regarding MemorySampleMixed: I see the issue. I'll dig to find the root cause. Hopefully it won't take long (no promises, though). @TommyMurphyTM1234 if you don't mind - can I ask you to run these 10-20 more times and report if there are more failures? |
Great - thanks @aap-sc.
Will do - I'll do 10 anyway as it takes quite a while to run each set of four target tests on the virtual machine. |
Here are the results of 10 further runs:
|
ghrrrr. That's way more failures than I expected :( |
I'll add further results here:
|
That's the results of 20 runs of these tests - I won't do any more until requested. |
@TommyMurphyTM1234 thanks a lot! I hope it's not that big of a deal to run these... Sorry if it took more effort/time than one could expect :(. It will take some time to fix. At least some of these failures are easy to understand (not necessarily easy to fix, though). For example: RepeatReadTest tests fail because the test infrastructure expects only the output of the underlying monitor command to be present in the "monitor " statement. However, for whatever abysmal reason, OpenOCD adds target warnings to the resulting output (missing keep-alive messages, for example). This results in "invalid literal for int" exceptions that cause the test failure. There is no consensus on how to fix this (@en-sc and I have a disagreement at the moment) - hopefully, this should be resolved soon. Other cases are more tricky and will require more effort to debug. |
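For illustration only, one way a test harness could tolerate such interleaved diagnostics is to filter them out before parsing. This is a hedged sketch, not the actual gdbserver.py code, and the exact OpenOCD warning prefixes may differ:

```python
# Hypothetical sketch: strip OpenOCD diagnostic lines from the text returned
# by a "monitor" command before parsing the single integer we asked for.
def parse_monitor_int(output: str) -> int:
    values = []
    for line in output.splitlines():
        line = line.strip()
        # Skip interleaved diagnostics such as keep-alive warnings.
        if not line or line.startswith(("Warn :", "Error:", "Info :")):
            continue
        try:
            values.append(int(line, 0))
        except ValueError:
            continue  # other non-numeric noise
    if len(values) != 1:
        raise ValueError(f"expected exactly one integer in {output!r}")
    return values[0]
```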
No problem. It was a bit tedious on the hardware that I had available for this but not a big deal - just slow! 🙂
That's interesting, thanks. Perhaps some changes to upstream OpenOCD functionality may be required in order to address some of these failures/exceptions?
Ok, thanks. |
yeah. I'm not sure why it was decided to send warnings along with the useful data for "monitor" commands sent by GDB. |
Could it be a |
Kind of, but not exactly. OpenOCD sends data (the reply) to GDB over a dedicated channel (socket/pipe/whatever) using the GDB serial protocol, so the output will go to the same sink anyway. That being said - the warnings should probably be sent as the "asynchronous notifications" that the GDB serial protocol should support. I'm not quite sure what OpenOCD does at the moment. If it sends these warnings as notifications - we should have facilities to differentiate between these notifications and the actual command output on the client side (that is, the GDB client). If not - this should be fixed. If these warnings are sent along with the actual output - this should be tackled too (in my opinion). The above still requires investigation - I'm not ready to make a definitive conclusion yet. I'll post a separate notification once I get a more substantial understanding of what is going on. |
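For reference, the GDB remote serial protocol does distinguish the two kinds of traffic on the wire: normal reply packets start with '$', while asynchronous notifications start with '%'. A minimal, purely illustrative sketch of telling them apart on the client side:

```python
# Minimal sketch: classify raw GDB remote-serial-protocol packets.
#   normal packet:   $<data>#<2-hex-digit checksum>
#   notification:    %<data>#<2-hex-digit checksum>
def classify_packet(raw: bytes) -> str:
    if raw.startswith(b"$"):
        return "reply"          # command response, e.g. monitor command output
    if raw.startswith(b"%"):
        return "notification"   # asynchronous event, delivered out of band
    if raw in (b"+", b"-"):
        return "ack"            # transport-level acknowledgement
    return "unknown"
```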
Ah - sorry - I was thinking of the "console" |
I'm not sure if this should be logged here or in the test suite repo - but I'll log it here for now. :-)
I have built the latest toolchain and OpenOCD and am running the basic smoke tests against Spike:
but have been getting random/sporadic failures - e.g. the following from three separate attempts to run the smoke tests:
MemorySampleMixed: 20240421-125738-spike64-MemorySampleMixed.log
Registers: 20240421-130859-spike64-Registers.log
ProgramSwWatchpoint: 20240421-131950-spike64-ProgramSwWatchpoint.log
Does anybody know why this is happening?
In case it matters, I'm doing this on a VirtualBox VM (host OS = Windows 10, guest OS = Zorin OS 17.1 Core) with what seems to be adequate resources (CPUs, memory, disk space etc.).
Maybe it's due to some local issue, since the GitHub Actions automated tests seem to be running OK?