refactor our test separation and how we run them #12458

nagisa · 2024-11-14T14:44:21Z

This fundamentally removes the differentiation between "integration" and "unit" tests from the test runner perspective, as well as removes the reliance on the "expensive-tests" compile-time feature.

Instead tests are categorized by the underlying reason for which we had the separation in the first place -- test execution speed. These are now annotated with slow_test_ and ultraslow_test_ prefixes, and their execution time is enforced by time limits in nextest.toml.

By using nextest filters we can cleanly filter out tests from these three categories to be run in various contexts. For instance, maybe quick checks locally don't need to run the slow tests every time -- just nextest by default now takes just 9 seconds (on my machine) to run all the tests. Whereas previously (due to e.g. estimator smoke test) it would take at least a minute.

Then a just nextest-slow is added in the rarer cases where one might want to run the full suite that runs on CI. CI will run these mildly slower tests for each PR.

Then what remains are the ultraslow tests which only run on nayduck and can take quite some time to complete. These can now be trivially run locally as well with just nextest-all without necessarily rebuilding the entire project or "unignoring" tests that are marked as ignored for other reasons (like for instance because they're just broken.)

This fundamentally removes the differentiation between "integration" and "unit" tests from the test runner perspective, as well as removes the reliance on the "expensive-tests" compile-time feature. Instead tests are categorized by the underlying reason for which we had the separation in the first place -- test execution speed. These are now annotated with `slow_test_` and `ultraslow_test_` prefixes, and their execution time is enforced by time limits in `nextest.toml`. By using `nextest` filters we can cleanly filter out tests from these three categories to be run in various contexts. For instance, maybe quick checks locally don't need to run the slow tests every time -- `just nextest` by default now takes just 9 seconds (on my machine) to run all the tests. Whereas previously (due to e.g. estimator smoke test) it would take at least a minute. Then a `just nextest-slow` is added in the rarer cases where one might want to run the full suite that runs on CI. CI will run these mildly slower tests for each PR. Then what remains are the ultraslow tests which only run on nayduck and can take quite some time to complete. These can now be trivially run locally as well with `just nextest-all` without necessarily rebuilding the entire project or "unignoring" tests that are marked as ignored for other reasons (like for instance because they're just broken.)

nagisa · 2024-11-14T14:45:16Z

cc @wacban -- I pushed out a draft cause I gotta run now, but feedback would be neat if this addresses your problems with the bls tests.

wacban

It's great, thanks!

Did you check that check_nightly.py still does what it's supposed to do?

nit: ultra_slow_ would be a bit easier on my eyes than ultraslow_

wacban · 2024-11-15T10:59:03Z

.config/nextest.toml

 [profile.default]
-slow-timeout = { period = "60s", terminate-after = 3, grace-period = "0s" }
+slow-timeout = { period = "5s", terminate-after = 3, grace-period = "1s" }
+default-filter = "not (test(/^(.*::slow_test|slow_test)/) | test(/^(.*::ultraslow_test|ultraslow_test)/))"


Why do you need two regexes, one with .*:: and one without it?

My understanding is that the tests that are at the package root do not have any prefix to their name, but I haven't actually checked this.

nagisa · 2024-11-15T11:42:11Z

Did you check that check_nightly.py still does what it's supposed to do?

You can find it failing even now at https://github.com/near/nearcore/actions/runs/11845706954/job/33011766898?pr=12458 due to me having missed to include one of the tests!

That said, one of the things we could do here at this point is to get rid of expensive.txt as cargo nextest list can list all the tests that match the ultraslow pattern without having to write it all down explicitly.

codecov · 2024-11-15T13:12:29Z

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.85%. Comparing base (f4db1e0) to head (e129fbf).
Report is 34 commits behind head on master.

Files with missing lines	Patch %	Lines
...e-params-estimator/estimator-warehouse/src/main.rs	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #12458       +/-   ##
===========================================
+ Coverage   39.74%   69.85%   +30.11%     
===========================================
  Files         842      838        -4     
  Lines      170756   169343     -1413     
  Branches   170756   169343     -1413     
===========================================
+ Hits        67872   118303    +50431     
+ Misses      98635    45793    -52842     
- Partials     4249     5247      +998

Flag	Coverage Δ
backward-compatibility	`0.16% <ø> (+<0.01%)`	⬆️
db-migration	`0.16% <ø> (+<0.01%)`	⬆️
genesis-check	`1.29% <ø> (+0.02%)`	⬆️
integration-tests	`?`
linux	`69.16% <75.00%> (+30.47%)`	⬆️
linux-nightly	`69.42% <75.00%> (+30.13%)`	⬆️
macos	`51.01% <100.00%> (?)`
pytests	`1.60% <ø> (+0.02%)`	⬆️
sanity-checks	`1.40% <ø> (+0.02%)`	⬆️
unittests	`69.68% <75.00%> (?)`
upgradability	`0.21% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

wacban · 2024-11-15T15:57:55Z

I've seen this nayduck error, the fix for me was to use with underscore instead of dash in one place in the expensive.txt:
expensive estimator-warehouse estimator_warehouse tests::ultra_slow_test_full_estimator

nagisa · 2024-11-15T21:11:50Z

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow 🤔 But wait, there are others that take 12… how does that work?

wacban · 2024-11-16T09:54:13Z

Umm… so apparently nayduck's tests are limited to 3 minutes?? So much for ultra slow 🤔 But wait, there are others that take 12… how does that work?

From expensive.txt it seems like you can increase the default timeout like so:

expensive --timeout=1200 near-chain near_chain tests::garbage_collection::test_gc_not_remove_fork_large

fix ci workflow

474e20e

nagisa force-pushed the refactors-test branch from 29c7497 to 40e9d3a Compare November 14, 2024 21:20

fix format and prefixes

ee8768f

wacban approved these changes Nov 15, 2024

View reviewed changes

nagisa added 2 commits November 15, 2024 14:18

add estimator-warehouse to nayduck

46a21b1

fix coverage uploading

e0593f4

nagisa force-pushed the refactors-test branch from 40e9d3a to e0593f4 Compare November 15, 2024 12:19

nagisa marked this pull request as ready for review November 15, 2024 12:20

nagisa requested a review from a team as a code owner November 15, 2024 12:20

nagisa requested a review from saketh-are November 15, 2024 12:20

s/ultraslow/ultra_slow/

fcfcefc

nagisa force-pushed the refactors-test branch from 79497b0 to fcfcefc Compare November 15, 2024 12:33

restore coverage notifications

38a1404

nagisa added this pull request to the merge queue Nov 15, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 15, 2024

fix nayduck??

e129fbf

nagisa enabled auto-merge November 15, 2024 17:39

nagisa added this pull request to the merge queue Nov 15, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor our test separation and how we run them #12458

refactor our test separation and how we run them #12458

nagisa commented Nov 14, 2024

nagisa commented Nov 14, 2024

wacban left a comment

wacban Nov 15, 2024

nagisa Nov 15, 2024

nagisa commented Nov 15, 2024 •

edited

Loading

codecov bot commented Nov 15, 2024 •

edited

Loading

wacban commented Nov 15, 2024

nagisa commented Nov 15, 2024 •

edited

Loading

wacban commented Nov 16, 2024

refactor our test separation and how we run them #12458

Are you sure you want to change the base?

refactor our test separation and how we run them #12458

Conversation

nagisa commented Nov 14, 2024

nagisa commented Nov 14, 2024

wacban left a comment

Choose a reason for hiding this comment

wacban Nov 15, 2024

Choose a reason for hiding this comment

nagisa Nov 15, 2024

Choose a reason for hiding this comment

nagisa commented Nov 15, 2024 • edited Loading

codecov bot commented Nov 15, 2024 • edited Loading

Codecov Report

wacban commented Nov 15, 2024

nagisa commented Nov 15, 2024 • edited Loading

wacban commented Nov 16, 2024

nagisa commented Nov 15, 2024 •

edited

Loading

codecov bot commented Nov 15, 2024 •

edited

Loading

nagisa commented Nov 15, 2024 •

edited

Loading