Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor our test separation and how we run them #12458

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nagisa
Copy link
Collaborator

@nagisa nagisa commented Nov 14, 2024

This fundamentally removes the differentiation between "integration" and "unit" tests from the test runner perspective, as well as removes the reliance on the "expensive-tests" compile-time feature.

Instead tests are categorized by the underlying reason for which we had the separation in the first place -- test execution speed. These are now annotated with slow_test_ and ultraslow_test_ prefixes, and their execution time is enforced by time limits in nextest.toml.

By using nextest filters we can cleanly filter out tests from these three categories to be run in various contexts. For instance, maybe quick checks locally don't need to run the slow tests every time -- just nextest by default now takes just 9 seconds (on my machine) to run all the tests. Whereas previously (due to e.g. estimator smoke test) it would take at least a minute.

Then a just nextest-slow is added in the rarer cases where one might want to run the full suite that runs on CI. CI will run these mildly slower tests for each PR.

Then what remains are the ultraslow tests which only run on nayduck and can take quite some time to complete. These can now be trivially run locally as well with just nextest-all without necessarily rebuilding the entire project or "unignoring" tests that are marked as ignored for other reasons (like for instance because they're just broken.)

This fundamentally removes the differentiation between "integration"
and "unit" tests from the test runner perspective, as well as removes
the reliance on the "expensive-tests" compile-time feature.

Instead tests are categorized by the underlying reason for which we had
the separation in the first place -- test execution speed. These are now
annotated with `slow_test_` and `ultraslow_test_` prefixes, and their
execution time is enforced by time limits in `nextest.toml`.

By using `nextest` filters we can cleanly filter out tests from these
three categories to be run in various contexts. For instance, maybe
quick checks locally don't need to run the slow tests every time --
`just nextest` by default now takes just 9 seconds (on my machine) to
run all the tests. Whereas previously (due to e.g. estimator smoke test)
it would take at least a minute.

Then a `just nextest-slow` is added in the rarer cases where one might
want to run the full suite that runs on CI. CI will run these mildly
slower tests for each PR.

Then what remains are the ultraslow tests which only run on nayduck and
can take quite some time to complete. These can now be trivially run
locally as well with `just nextest-all` without necessarily rebuilding
the entire project or "unignoring" tests that are marked as ignored for
other reasons (like for instance because they're just broken.)
@nagisa
Copy link
Collaborator Author

nagisa commented Nov 14, 2024

cc @wacban -- I pushed out a draft cause I gotta run now, but feedback would be neat if this addresses your problems with the bls tests.

Copy link
Contributor

@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great, thanks!

Did you check that check_nightly.py still does what it's supposed to do?

nit: ultra_slow_ would be a bit easier on my eyes than ultraslow_

[profile.default]
slow-timeout = { period = "60s", terminate-after = 3, grace-period = "0s" }
slow-timeout = { period = "5s", terminate-after = 3, grace-period = "1s" }
default-filter = "not (test(/^(.*::slow_test|slow_test)/) | test(/^(.*::ultraslow_test|ultraslow_test)/))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need two regexes, one with .*:: and one without it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the tests that are at the package root do not have any prefix to their name, but I haven't actually checked this.

@nagisa
Copy link
Collaborator Author

nagisa commented Nov 15, 2024

Did you check that check_nightly.py still does what it's supposed to do?

You can find it failing even now at https://github.com/near/nearcore/actions/runs/11845706954/job/33011766898?pr=12458 due to me having missed to include one of the tests!

That said, one of the things we could do here at this point is to get rid of expensive.txt as cargo nextest list can list all the tests that match the ultraslow pattern without having to write it all down explicitly.

@nagisa nagisa marked this pull request as ready for review November 15, 2024 12:20
@nagisa nagisa requested a review from a team as a code owner November 15, 2024 12:20
Copy link

codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.85%. Comparing base (f4db1e0) to head (e129fbf).
Report is 34 commits behind head on master.

Files with missing lines Patch % Lines
...e-params-estimator/estimator-warehouse/src/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #12458       +/-   ##
===========================================
+ Coverage   39.74%   69.85%   +30.11%     
===========================================
  Files         842      838        -4     
  Lines      170756   169343     -1413     
  Branches   170756   169343     -1413     
===========================================
+ Hits        67872   118303    +50431     
+ Misses      98635    45793    -52842     
- Partials     4249     5247      +998     
Flag Coverage Δ
backward-compatibility 0.16% <ø> (+<0.01%) ⬆️
db-migration 0.16% <ø> (+<0.01%) ⬆️
genesis-check 1.29% <ø> (+0.02%) ⬆️
integration-tests ?
linux 69.16% <75.00%> (+30.47%) ⬆️
linux-nightly 69.42% <75.00%> (+30.13%) ⬆️
macos 51.01% <100.00%> (?)
pytests 1.60% <ø> (+0.02%) ⬆️
sanity-checks 1.40% <ø> (+0.02%) ⬆️
unittests 69.68% <75.00%> (?)
upgradability 0.21% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nagisa nagisa added this pull request to the merge queue Nov 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 15, 2024
@wacban
Copy link
Contributor

wacban commented Nov 15, 2024

I've seen this nayduck error, the fix for me was to use with underscore instead of dash in one place in the expensive.txt:
expensive estimator-warehouse estimator_warehouse tests::ultra_slow_test_full_estimator

@nagisa nagisa added this pull request to the merge queue Nov 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 15, 2024
@nagisa
Copy link
Collaborator Author

nagisa commented Nov 15, 2024

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow 🤔 But wait, there are others that take 12… how does that work?

@wacban
Copy link
Contributor

wacban commented Nov 16, 2024

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow 🤔 But wait, there are others that take 12… how does that work?

From expensive.txt it seems like you can increase the default timeout like so:

expensive --timeout=1200 near-chain near_chain tests::garbage_collection::test_gc_not_remove_fork_large

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants