-
Notifications
You must be signed in to change notification settings - Fork 50
Core Slice
Slices are a mechanism to access a tuple member across all tuples in an AoSoA as if it were one large multidimensional array. We will overview the way in which a user can generate a slice and access it's data. Slices, like the AoSoA from which they are derived, can be accessed via 1-dimensional and 2-dimensional indices with the latter exposing vectorizable loops over the inner elements of each SoA in the data structure.
A slice does not copy the AoSoA data - rather it simply points to the data as an unmanaged shallow copy. Because there is no reference counting associated with a slice, the user must take care not to delete the AoSoA from which the slice is derived before they are done using the slice. In addition, if the slice memory is changed in any way (i.e. from resizing, changing the capacity, or any other types of changes to the allocation) the slice is no longer valid and must be recreated from the new AoSoA data.
A member slice allows for member data to be accessed with a simpler syntax:
enum FieldNames { A = 0,
B,
C };
using DataTypes = Cabana::MemberDataTypes<double[2], // A
float, // B
int>; // C
Cabana::AoSoA<DataTypes,Kokkos::HostSpace> aosoa( 6 );
auto slice_A = slice<A>( aosoa );
auto slice_B = slice<B>( aosoa );
auto slice_C = slice<C>( aosoa );
for ( int i = 0; i < aosoa.size(); ++i )
{
for ( int n = 0; n < 2; ++n )
slice_A( i, n ) = 1.2;
slice_B( i ) = 3.4;
slice_C( i ) = 9;
}
In addition to simpler data access syntax, it allows for the composition of interfaces and function signatures which operate on only subsets of AoSoA data. For example, say we wanted to write a function that takes in particle velocities, a time step size, and updates the particle positions. With member slices we can now write a function like this:
template<class VelocitySlice, class PositionSlice>
void updatePosition( const double delta_t,
const VelocitySlice& velocity,
PositionSlice& position )
{
int space_dim = velocity.extent(2);
for ( int i = 0; i < velocity.size(); ++i )
for ( int d = 0; d < space_dim; ++d )
position( i, d ) += velocity( i, d ) * delta_t;
}
By writing functions such as this with member slices instead of full AoSoA data structures we alleviate three important issues: not knowing which member variables in the user's AoSoA represent position and velocity, using member slices from possibly different AoSoA's (assuming they are compatible), and reduction of template parameters.
Slices, like the AoSoA from which they are derived, can be accessed via 1-dimensional and 2-dimensional indices with the later exposing vectorizable loops over the inner elements of each SoA in the data structure.
Let's initialize the data using the 2D indexing scheme. Slice data can
be accessed in 2D using the access()
function. Note that both the SoA
index and the array index are passed to this function. Also note that the slice object has a similar interface to the AoSoA for accessing the total number of tuples in the data structure and the array sizes.
using DataTypes = Cabana::MemberTypes<double[3][3],
float[4],
int>;
using MemorySpace = Kokkos::HostSpace;
int num_tuple = 5;
Cabana::AoSoA<DataTypes,MemorySpace,VectorLength> aosoa( num_tuple );
auto slice_0 = slice<0>( aosoa );
auto slice_1 = slice<1>( aosoa );
auto slice_2 = slice<2>( aosoa );
for ( int s = 0; s < slice_0.numSoA(); ++s )
{
for ( int i = 0; i < 3; ++i )
for ( int j = 0; j < 3; ++j )
for ( int a = 0; a < slice_0.arraySize(s); ++a )
slice_0.access(s,a,i,j) = 1.0 * (a + i + j);
for ( int i = 0; i < 4; ++i )
for ( int a = 0; a < slice_0.arraySize(s); ++a )
slice_1.access(s,a,i) = 1.0 * (a + i);
for ( int a = 0; a < slice_0.arraySize(s); ++a )
slice_2.access(s,a) = a + 1234;
}
Now let's do the same thing but this time we will access the data tuple-by-tuple using 1-dimensional indexing rather than using the 2-dimensional indexing. As with the AoSoA, the upside to this approach is that the indexing is easy. The downside is that we lose the stride-1 vector-length loops over the array index. Note that the slice data access syntax in 1D uses operator()
.
for ( int t = 0; t < num_tuple; ++t )
{
for ( int i = 0; i < 3; ++i )
for ( int j = 0; j < 3; ++j )
slice_0(t,i,j) = ...;
for ( int i = 0; i < 4; ++i )
slice_1(t,i) = ...;
slice_2(t) = ...;
}
Slice data members are of trivial type and can be multidimensional. The number of dimensions in a given member data type is defined as its rank and the size of a given dimension is defined as its extent. For example, consider the following particle data types:
using DataTypes = Cabana::MemberTypes<double[3][3], // member 0
float[2], // member 1
int>; // member 2
int num_tuple = 12;
Cabana::AoSoA<DataTypes,MemorySpace,8> aosoa( num_tuple );
In this case member 0 is of rank 2 and dimensions 0 and 1 both have an extent
of 3. Member 1 is of rank 1 it its 0 dimension has an extent of 2. Member 2 is
a scalar and is therefore of rank 0. A Slice
provides access to these values
through the Slice::rank
and Slice::extent
functions in the context of its 2D decomposition meaning that two extra data dimensions are added to the front of the list - one for the struct index and one for the array index. For example, for the member data types above:
int m0_rank = slice_0.rank(); // returns 4
int m0_extent0 = slice_0.extent( 0 ); // returns 2 (# of SoAs)
int m0_extent1 = slice_0.extent( 1 ); // returns 8 (vector length)
int m0_extent2 = slice_0.extent( 2 ); // returns 3
int m0_extent3 = slice_0.extent( 3 ); // returns 3
int m1_rank = slice_1.rank(); // returns 3
int m1_extent0 = slice_1.extent( 0 ); // returns 2 (# of SoAs)
int m1_extent1 = slice_1.extent( 1 ); // returns 8 (vector length)
int m1_extent2 = slice_1.extent( 2 ); // returns 2
int m2_rank = slice_2.rank(); // returns 2;
int m2_extent0 = slice_2.extent( 0 ); // returns 2 (# of SoAs)
int m2_extent1 = slice_2.extent( 1 ); // returns 8 (vector length)
For all data types, using Cabana::Slice::operator()
or Cabana::Slice::access()
with an incorrect number of dimension arguments will cause a compile error. Consider using accessing data in the 3x3 member defined above:
double s_1 = slice_0(i,2); // Error! Too few arguments.
double s_2 = slice_0(i,0,1); // No errors.
double s_3 = slice_0(i,1,1,0); // Error! Too many arguments.
In addition to compile time checking of multidimensional data access users
should also have an expectation of run time checks in a debug build that
ensure all memory accesses are within bounds for a given call to
Cabana::Slice::operator()
or Cabana::Slice::access()
.
For the composition of C and Fortran interfaces as well as for potential
(although not recommended) use by expert C++ users, access to the raw AoSoA data
block and data for individual members is exposed through a general and
template-free pointer interface in Slices. A pointer to the first element of each member
is accessed through the Slice::data()
function which users can then cast to the appropriate type for that data
member.
To allow users to stream through a single member of the AoSoA data block via a Slice and operate on the data for a single member as if the data block were entirely composed of the member's data type we have introduced the concept of strides. The first stride gives the distance between the beginning of each inner array of a given member in terms of the number of elements for that member's type. The remaining strides give the distance between individual dimensions of a data member, including the array dimension. Using these strides, a user can access the member data block in a given struct, operate on the member array in that struct, and then use the first stride to move forward to the next block. This is possible because the particle data is composed of trivial types and therefore the data within each struct is aligned on a byte boundary which is a multiple of all of the types within the tuple. The total number of bytes between each struct is the same, however, when those bytes are interpreted to be data of the same type as the member variable, the effective number of member variable elements represented by those bytes changes.
To visualize this assume we have defined a particle structure as follows:
enum FieldNames { A = 0,
B,
C };
using DataTypes = Cabana::MemberDataTypes<double[2], // A
float, // B
int>; // C
Cabana::AoSoA<DataTypes,Kokkos::HostSpace> aosoa( 6 );
A visual representation of this data layout is given in the following figure:
AoSoA pointers and strides via the Slice. The raw data block of an AoSoA can be interpreted as a raw pointer to a block of data of the same type as any one of its members. The Slice provides pointers to the first element the member it wraps as well as the number of elements of that member's type between the start of each struct.
In the case of this example, all of the data will be aligned on 8-byte
boundaries within the structs as our largest data member is of type
double
. If member A
would have been a single precision float
as
well, the data would have been aligned on 4-byte boundaries. The total size of
the struct is 48 bytes (4 doubles, 2 floats, and 2 ints) and therefore so is
the byte size between structs. Based on this, the stride for each member is
the number of elements of that member's type that would consume 48 bytes of
memory. For member A
this is 6, and for members B
and C
this
is 12.
Using pointers and strides a C-style approach may be used to access the data:
double* A_ptr = slice_A.data();
float* B_ptr = slice_B.data();
int* C_ptr = slice_C.data();
int A_stride_0 = slice_A.stride( 0 );
int B_stride_0 = slice_B.stride( 0 );
int C_stride_0 = slice_C.stride( 0 );
int num_soa = slice_A.extent( 0 );
int vector_length = slice_A.extent( 1 );
int A_dim_2 = slice_A.extent( 2 );
int A_stride_2 = slice_A.stride( 2 );
int array_size = 2;
for ( int s = 0; s < num_soa; ++s )
{
for ( int i = 0; i < vector_length; ++i )
{
for ( int n = 0; n < A_dim; ++n )
A_ptr[ s * A_stride_0 + n * A_stride_2 + i ] = 1.2;
B_ptr[ s * B_stride_0 + i ] = 3.4;
C_ptr[ s * C_stride_0 + i ] = 9;
}
}
Also note in this example the use of multidimensional data for member
A
. Let's look more closely at the indexing into member A
in the
code above:
A_ptr[ s * A_stride_0 + n * A_stride_2 + i ] = 1.2;
Here we can think of this index as broken into three pieces. The first piece,
s * A_stride_0
calculates the offset from the first overall element of
member A
to the first element of member A
in the current struct
s
. The second piece, n * A_stride_0
, calculates the offset from the
first element of member A
in the current struct the first array value for element n
in the SoA. The last piece, i
, indicates stride-1 access to the array dimension (i.e. slice_A.stride(1) == 1
).
template<typename DataType,
typename MemorySpace,
typename MemoryAccessType,
int VectorLength,
int Stride>
class Slice
NOTE: Arguments should be automatically generated/implied by the internal Cabana accessors which generate slices for you. It's unlikely you want to spend a lot of effort making your own slices without auto
type detection
- Slice Tutorial
- Slice Unit Test
- Slice Advanced Tutorial - OpenMP atomic
- Slice Advanced Tutorial - Cuda atomic
using T0 = float[3];
using T1 = double;
using member_types = Cabana::MemberTypes<T0,T1>;
using memspace = Kokkos::HostSpace;
const int vector_length = 16;
Cabana::AoSoA<member_types,memspace,vector_length> particle_list( num_particles );
// Get slice to first parameter
auto slice_0 = slice<0>( particle_list );
Usage: To access the particle data stored in an AoSoA. Most notably the class Slice
is returned by the accessor in Cabana_AoSoA.hpp
NOTE: This aims to document the user's view of Slice
as returned by the AoSoA accessor slice()
, rather then the implementation details of it under the covers.
This is part of the Programming Guide series
Cabana - A Co-Designed Library for Exascale Particle Simulations