Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faketime tests stall when S3 support is enabled #4956

Open
jdblischak opened this issue May 7, 2024 · 12 comments
Open

faketime tests stall when S3 support is enabled #4956

jdblischak opened this issue May 7, 2024 · 12 comments
Assignees

Comments

@jdblischak
Copy link
Contributor

jdblischak commented May 7, 2024

x-ref https://app.shortcut.com/tiledb-inc/story/47373


In my nightly builds, if I build with S3 support (TILEDB_S3=ON), the faketime tests added in #4883 stall. They pass when S3 support is disabled.

xref: https://github.com/jdblischak/centralized-tiledb-nightlies/issues/5#issuecomment-2096691606

Below is a reproducible example I created in an Ubuntu Docker image. Note that I had to install more dependencies than those listed in the Prerequisites of BUILD.md.

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes cmake curl g++ gcc git pkg-config tar unzip zip
git clone https://github.com/TileDB-Inc/TileDB.git
cd TileDB
git log -n 1 --oneline
## 9a4b88517 (HEAD -> dev, origin/dev, origin/HEAD) Remove deprecated APIs from tests, part 2. (#4951)

# without S3 support
cmake -B build-libtiledb \
  -D TILEDB_WERROR=ON \
  -D TILEDB_SERIALIZATION=ON \
  -D CMAKE_BUILD_TYPE=Release \
  -D VCPKG_TARGET_TRIPLET=x64-linux-release

cmake --build build-libtiledb -j $(nproc) --config Release

cmake --build build-libtiledb --target check --config Release
## 2: 0.066 s: C API: Test consolidation, fragments/commits out of order
## 2: 0.000 s: - sparse array
## 2:
## 2: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## 2: tiledb_unit is a Catch2 v3.4.0 host application.
## 2: Run with -? for options
## 2:
## 2: -------------------------------------------------------------------------------
## 2: C API: Test vacuuming leaves array dir in a consistent state
## 2: -------------------------------------------------------------------------------
## 2: /TileDB/test/src/unit-capi-consolidation.cc:7378
## 2: ...............................................................................
## 2:
## 2: /TileDB/test/src/unit-capi-consolidation.cc:4656: FAILED:
## 2:   REQUIRE( rc == (expect_fail ? (-1) : (0)) )
## 2: with expansion:
## 2:   0 == -1
## 2:
## 2: terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
## 2:   what():  filesystem error: cannot set permissions: No such file or directory [/TileDB/test/tiledb_test/test_consolidate_sparse_array/__fragments/__1715102401556_1715102401556_5b878658b88da5efa423d8eb94514abe_21]
## 2: /TileDB/test/src/unit-capi-consolidation.cc:4656: FAILED:
## 2:   {Unknown expression after the reported line}
## 2: due to a fatal error condition:
## 2:   SIGABRT - Abort (abnormal termination) signal
## 2:
## 2: 0.000 s: C API: Test vacuuming leaves array dir in a consistent state
## 2: ===============================================================================
## 2: test cases:     341 |     340 passed | 1 failed
## 2: assertions: 2798035 | 2798033 passed | 2 failed
## 2:
## 2/3 Test #2: tiledb_unit ......................Subprocess aborted***Exception:  51.68 sec

# I don't know what the above test failure is about. I haven't observed it
# in my nightlies

# with S3 support
cmake -B build-libtiledb-s3 \
  -D TILEDB_WERROR=ON \
  -D TILEDB_SERIALIZATION=ON \
  -D CMAKE_BUILD_TYPE=Release \
  -D VCPKG_TARGET_TRIPLET=x64-linux-release \
  -D TILEDB_S3=ON

cmake --build build-libtiledb-s3 -j $(nproc) --config Release

cmake --build build-libtiledb-s3 --target check --config Release
## [100%] Built target tests
## UpdateCTestConfiguration  from :/TileDB/build-libtiledb-s3/tiledb/test/DartConfiguration.tcl
## UpdateCTestConfiguration  from :/TileDB/build-libtiledb-s3/tiledb/test/DartConfiguration.tcl
## Test project /TileDB/build-libtiledb-s3/tiledb/test
## Constructing a list of tests
## Done constructing a list of tests
## Updating test list for fixtures
## Added 0 tests to meet fixture requirements
## Checking test dependency graph...
## Checking test dependency graph end
## test 1
##     Start 1: tiledb_timing_unit
##
## 1: Test command: /TileDB/build-libtiledb-s3/tiledb/test/tiledb_unit "--durations=yes" "[sub-millisecond]"
## 1: Environment variable modifications:
## 1:  FAKETIME=set:2020-12-24 20:30:00
## 1:  LD_PRELOAD=set:/TileDB/build-libtiledb-s3/vcpkg_installed/x64-linux-release/lib/libfaketime.so
## 1: Test timeout computed to be: 10000000

# it stalls at this point and never completes
@KiterLuc
Copy link
Contributor

KiterLuc commented May 9, 2024

@teo-tsirpanis can you take a look?

@teo-tsirpanis
Copy link
Member

Will take a look tomorrow.

@teo-tsirpanis
Copy link
Member

@jdblischak do you have credentials for S3 configured? It is likely that because S3 is enabled, tiledb_unit tries to connect and it stalls due to retrying. You can make the tests run on local files by passing the --vfs native option.

<opinion>
I don't like this behavior. tiledb_unit should by default run only on local files (or maybe even better MemFS but there are some issues that currently prevent it) and running tests on cloud services should be opt-in. We should also remove support for running tests on many cloud services at once; it has few valid use cases (you can just run tiledb_unit many times) and its existence complicates some code.
</opinion>

@jdblischak
Copy link
Contributor Author

do you have credentials for S3 configured?

No, and I'd prefer not to

It is likely that because S3 is enabled, tiledb_unit tries to connect and it stalls due to retrying.

That makes sense. Thanks for diagnosing the problem!

You can make the tests run on local files by passing the --vfs native option.

Can I pass that flag directly to CMake? Here is how I am currently invoking the TileDB tests:

cmake --build build-libtiledb --target check --config Release

# Is this right?
cmake --build build-libtiledb --target check --config Release --vfs native

@teo-tsirpanis
Copy link
Member

teo-tsirpanis commented May 10, 2024

It's not currently possible within CMake; you will have to manually run ./build-libtiledb/tiledb/test/tiledb_unit --vfs native.

@jdblischak
Copy link
Contributor Author

jdblischak commented May 10, 2024

you will have to manually run ./build-libtiledb/tiledb/test/tiledb_unit --vfs native

How can I manually build the tests in order to manually run them?

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes cmake curl g++ gcc git pkg-config tar unzip zip
git clone https://github.com/TileDB-Inc/TileDB.git
cd TileDB
git log -n 1 --oneline
## 474fc1ef5 (HEAD -> dev, origin/dev, origin/HEAD) Migrate APIs out of StorageManager: array_get_encryption. (#4950)

# with S3 support
cmake -B build-libtiledb-s3 \
  -D TILEDB_WERROR=ON \
  -D TILEDB_SERIALIZATION=ON \
  -D CMAKE_BUILD_TYPE=Release \
  -D VCPKG_TARGET_TRIPLET=x64-linux-release \
  -D TILEDB_S3=ON

cmake --build build-libtiledb-s3 -j $(nproc) --config Release

./build-libtiledb/tiledb/test/tiledb_unit --vfs native
## bash: ./build-libtiledb/tiledb/test/tiledb_unit: No such file or directory

I tried searching the CMake files, but didn't have much luck. I found where the custom target check is defined, but it just calls cmake with --target check, which is circular:

# make check
if (TILEDB_TESTS)
add_custom_target(check
COMMAND ${CMAKE_COMMAND} --build . --target check --config $<CONFIG>
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/tiledb
)
endif()

I also tried the targets tests and ordinary_unit_tests, but neither of those worked at all.

TileDB/CMakeLists.txt

Lines 550 to 558 in 474fc1e

add_custom_target(ordinary_unit_tests)
add_dependencies(ordinary_unit_tests ${Unit_Tests})
add_dependencies(all_unit_tests ordinary_unit_tests)
add_dependencies(all_unit_tests experimental_unit_tests)
# ----------------------------------
# Target `tests` build and runs all test executables
add_custom_target(tests)
add_dependencies(tests all_unit_tests)

cmake --build build-libtiledb-s3 --target tests --config Release
## gmake: *** No rule to make target 'tests'.  Stop.

cmake --build build-libtiledb-s3 --target ordinary_unit_tests --config Release
## gmake: *** No rule to make target 'ordinary_unit_tests'.  Stop.

@teo-tsirpanis
Copy link
Member

it just calls cmake with --target check, which is circular

CMake is being called in the tiledb subdirectory. You will have to first run a build on the outer build directory, and then to build/tiledb. Here's an example of how we do it in CI:

cmake .. \
-DTILEDB_AZURE=${TILEDB_AZURE}\
-DTILEDB_GCS=${TILEDB_GCS} \
-DTILEDB_S3=${TILEDB_S3} \
-DTILEDB_SERIALIZATION=${TILEDB_SERIALIZATION} \
-DTILEDB_ASSERTIONS=${TILEDB_ASSERTIONS}
make -j4
# Build all unit tests
make -C tiledb tests -j4

@jdblischak
Copy link
Contributor Author

You will have to first run a build on the outer build directory, and then to build/tiledb

@teo-tsirpanis Thanks for the explanation! I was able to build and execute the tests locally. However, there were 3 failed tests. Are these known failures?

docker run --rm -it ubuntu:22.04

# Setup
apt-get update
apt-get install --yes cmake curl g++ gcc git pkg-config tar unzip zip
git clone https://github.com/TileDB-Inc/TileDB.git
cd TileDB
git log -n 1 --oneline
## 8de7d1ca4 (HEAD -> dev, origin/dev, origin/HEAD) Add v2_23_0 arrays to backward compatibility matrix. (#4965)

# with S3 support
cmake -B build-libtiledb-s3 \
  -D TILEDB_WERROR=ON \
  -D TILEDB_SERIALIZATION=ON \
  -D CMAKE_BUILD_TYPE=Release \
  -D VCPKG_TARGET_TRIPLET=x64-linux-release \
  -D TILEDB_S3=ON

cmake --build build-libtiledb-s3 -j $(nproc) --config Release

# Build unit tests
make -C build-libtiledb-s3/tiledb tests -j $(nproc)
## [100%] Building CXX object test/CMakeFiles/tiledb_unit.dir/src/unit.cc.o
## [100%] Built target ordinary_unit_tests
## [100%] Built target all_unit_tests
## [100%] Linking CXX executable tiledb_unit
## [100%] Built target tiledb_unit
## [100%] Built target tests

# Run unit tests
./build-libtiledb-s3/tiledb/test/tiledb_unit --vfs native
## Randomness seeded to: 158965112
##
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## tiledb_unit is a Catch2 v3.4.0 host application.
## Run with -? for options
##
## -------------------------------------------------------------------------------
## C API: Test vacuuming leaves array dir in a consistent state
## -------------------------------------------------------------------------------
## /TileDB/test/src/unit-capi-consolidation.cc:7378
## ...............................................................................
##
## /TileDB/test/src/unit-capi-consolidation.cc:4656: FAILED:
##   REQUIRE( rc == (expect_fail ? (-1) : (0)) )
## with expansion:
##   0 == -1
##
## terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
##   what():  filesystem error: cannot set permissions: No such file or directory [/TileDB/tiledb_test/test_consolidate_sparse_array/__fragments/__1715616656780_1715616656780_71deef232b74224ba6dfd2553ec29ece_21]
## /TileDB/test/src/unit-capi-consolidation.cc:4656: FAILED:
##   {Unknown expression after the reported line}
## due to a fatal error condition:
##   SIGABRT - Abort (abnormal termination) signal
##
## ===============================================================================
## test cases:     341 |     340 passed | 1 failed
## assertions: 3084887 | 3084885 passed | 2 failed
##
## Aborted

@jdblischak
Copy link
Contributor Author

Next I tried running the tests this way in my nightly setup.

Without S3 support, the tests passed

All tests passed (13941361 assertions in 1427 test cases)

However, when I enabled S3 support, there were even more test failures than when I ran it locally above:

test cases:     1434 |     1409 passed | 25 failed
assertions: 13863516 | 13863491 passed | 25 failed

@teo-tsirpanis
Copy link
Member

teo-tsirpanis commented May 13, 2024

Apparently --vfs has no effect in certain newly added tests, see footnote of #4126 (comment). (tracked internally in SC-47373)

@jdblischak
Copy link
Contributor Author

Ok, so it seems that currently it is not possible to run the test suite with TILEDB_S3=ON if you haven't configured S3 credentials.

@jdblischak
Copy link
Contributor Author

Ok, for now I enabled S3 support in order to run the tiledbvcf-py tests and stopped running the libtiledb tests (jdblischak/centralized-tiledb-nightlies@3febc7b). Ideally in the future I can update this to run a subset of the tests that don't require S3 authentication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants