Skip to content

Core AoSoA

Sam Reeve edited this page May 18, 2023 · 2 revisions

Overview

An AoSoA represents tuples and their data via an array-of-structs-of-arrays. Tuples in Cabana are not normally represented individually by a singular data structure (although Cabana::Tuple is available for convenience). Rather, they are always represented in groups within a data structure known as AoSoA. An AoSoA is a generalization of the more commonly known Struct-of-Arrays (SoA) and Array-of-Structs (AoS) with both being common choices in practice depending on the type of computation and the architecture on which the simulation is performed.

First consider the AoS data layout. In this case we can define a Particle data structure (the struct) to hold our particle information. We will define a particle with three data members: a double precision velocity, a single precision position, and an integer material id:

   struct Particle
   {
       double vel_x;
       double vel_y;
       double vel_z;
       float pos_x;
       float pos_y;
       float pos_z;
       int matid;
   };

An AoS is then very simply an array of the struct (e.g. std::vector<Particle>). This particle data structure fits well with object-oriented programming paradigm, and is commonly employed in many legacy particle codes. From a programming point of view this is very convenient as the data for a single particle is encapsulated in one data structure. For example, basic operations such as sorting can be performed with many standard library routines and communicating particles between MPI ranks is simple as the contiguous nature of particle data allows for easy serialization.

An SoA is an inversion of the AoS principle. We re-write our Particle struct from before to now be a Struct-of-Arrays, e.g.:

   struct SoA
   {
       double vel_x[ARRAY_SIZE];
       double vel_y[ARRAY_SIZE];
       double vel_z[ARRAY_SIZE];
       float pos_x[ARRAY_SIZE];
       float pos_y[ARRAY_SIZE];
       float pos_z[ARRAY_SIZE];
       int matid[ARRAY_SIZE];
   };

Here each data member is an array holding the data for a few particles instead of an individual particle. In the context of performance, looping over this data structure will incur a very different memory access pattern than that of an AoS (e.g. std::vector<Particle>).

An AoSoA is a generalization of both of these principles. It is simply an array of SoA data structures (e.g. std::vector<SoA>). Conveniently, this type of data structure encapsulates both AoS and SoA memory access patterns by simply altering the ARRAY_SIZE:

  • If ARRAY_SIZE is 1, the AoSoA is identical to an AoS.
  • If ARRAY_SIZE is equal to the number of tuples, it is identical to an SoA.

When ARRAY_SIZE is somewhere in between 1 and the number of particles, different types of memory access patterns can be exposed allowing for performance tuning both on different architectures and in different types of computational kernels. In addition, the 2-dimensional nature of this data structure exposes opportunities for additional types of parallelism on architectures that support 2-dimensional thread indexing concepts.

To further elucidate the difference between the data structures consider the following figure:

AoSoA Layout A comparison of data layouts in Array-of-Structs (AoS), Struct-of-Arrays (SoA), and Array-of-Structs-of-Arrays (AoSoA). In AoS, particle data (x,y,w) is represented in a contiguous block for every particle. In SoA, there is a contiguous array of x, an array of y, and an array of w, each of the size of the number of particles. In AoSoA, each component is stored in smaller contiguous chunks (perhaps the size of a vectorization unit).

From a performance and usability point of view, there is a potential benefit to all three layouts. In the case of AoS, this allows for a very clean set of abstractions on a particle-by-particle basis, ease of programmability, and cache efficiency. In the case of SoA, the large contiguous chunks of memory for each variable may have potential benefit both in terms of contiguous memory accesses as well as efficient vectorization. AoSoA can be configured to support vectorization through smaller contiguous blocks of member data variables as well as potential improvements to cache efficiency due to the smaller sizes of those blocks.

Within Cabana, the AoSoA is represented by the Cabana::AoSoA class. Next we will overview this class and its API.

Defining Tuple Data Types

Users may assign any data to their tuples within Cabana as long as that data is composed of trivial types and classes. A trivial type is a type whose:

  • storage is contiguous (trivially copyable),
  • which only supports static default initialization (trivially default constructible), either cv-qualified or not.

Trivial types include scalar types, trivial classes and arrays of any such types. A trivial class is a class (defined with class, struct or union) that is both trivially default constructible and trivially copyable, which implies that:

  • uses the implicitly defined default copy and move constructors, copy and move assignments, and destructor,
  • has no virtual members,
  • has no non-static data members with brace- or equal- initializers,
  • its base class and non-static data members (if any) are themselves also trivial types.

Cabana provides a type to users, Cabana::MemberTypes, which is then used to define the data associated with a tuple. For example, using our particle example from above we would define the following types of tuple data:

   using DataTypes = Cabana::MemberTypes<double, // vel_x
                                         double, // vel_y
                                         double, // vel_z
                                         float,  // pos_x
                                         float,  // pos_y
                                         float,  // pos_z
                                         int>;   // matid

There are a few things to note in this definition. First, Cabana::MemberTypes is itself simply just a list of types and the comments associated with each entry in the list indicates which type relates to which tuple variable. Second, the types of a larger size (e.g. sizeof(double) > sizeof(float)) are listed first. We recommend listing larger data types first and like types sequentially to ensure a minimal amount of storage of tuple data. Storage sizes of tuple data (even for the SoA and AoS examples above) can vary depending on the order of the data members due to padding to satisfy alignment requirements.

One benefit of allowing for general trivial data types to be associated with tuples is that we can also define multidimensional data on tuples. For example, we can re-write our particle data types above to use multidimensional data instead of representing each data component individually:

   using DataTypes = Cabana::MemberTypes<double[3], // velocity
                                         float[3],  // position
                                         int>;      // matid

This provides two distinct advantages over the scalar data approach. First, it reduces the overall amount of code as multiple components of the same variable are now concatenated. This also reduces the amount of code needed to implement work units as operations on individual components can be replaced with loops over components. Second, this can offer potential performance advantages as all components for a single variable can be accessed in a single memory transaction.

Note: See the current implementation of Cabana::AoSoA for the maximum number of supported tuple data dimensions.

Vector Length

The size of the static arrays within each struct of a Cabana::AoSoA are defined via a VectorLength template parameter. This vector length is simply a compile-time integral constant which defines the static size of each SoA within the AoSoA. Users may define their own vector length as long is a power of 2. If the user does not specify the vector length, the library will auto-select a (hopefully) reasonably performing vector length based on the vector length of the hardware on which the library was compiled.

Creating an AoSoA

There are several template parameters associated with the Cabana::AoSoA class, however, the most basic definition of the structure only requires the member data types of the tuples and a memory space in which to allocate the tuples. For example, allocating 10^7 particles using CUDA UVM with a double precision velocity, a single precision position, and an integer material id would be achieved by:

   using DataTypes = Cabana::MemberTypes<double[3], // velocity
                                         float[3],  // position
                                         int>;      // matid

   int num_tuple = static_cast<int>(1e7);
   Cabana::AoSoA<DataTypes,Kokkos::CudaUVMSpace> aosoa( num_tuple );

In this case, the inner array size will be auto-selected based on the CUDA execution space associated with the CUDA UVM memory space. If a user wanted the AoSoA using the same particle types to be compatible with OpenMP execution and to use an inner array size of 128 they would write:

   using DataTypes = Cabana::MemberTypes<double[3], // velocity
                                         float[3],  // position
                                         int>;      // matid

   int num_tuple = static_cast<int>(1e7);
   Cabana::AoSoA<DataTypes,ArraySize,Kokkos::HostSpace,128> aosoa( num_tuple );

In this case, the tuples will be allocated in the host space. Users may assign any arbitrary number of tuples to the AoSoA independent of the choice of inner array size. Note, however, that although the size may be the requested number of particles, the amount of memory allocated by the AoSoA will always be in multiples of the inner array size and therefore more memory may actually be allocated. The number of structs allocated within an AoSoA can be accessed via AoSoA::numSoA().

Resizing and Reserving Memory

AoSoA memory is defined in a single contiguous block. Users can both resize the AoSoA as well as reserve memory in this block. In both cases, the behavior is identical to std::vector within the C++ standard library. The size of the container refers to the number of elements in the container over which computations can be performed (e.g. the number of particles in the container) and the capacity of the container refers to the size of the storage space currently allocated for the container (e.g. the maximum number of tuples the container can currently store). The size of the container is not necessarily equal to the capacity of the container. The capacity of the container is always greater than or equal to the size of the container. This allows for an up-front allocation of memory in the AoSoA which can be followed by incremental addition of tuples to the container without the need for reallocation. When more capacity is needed, the container automatically expands it by reallocating a new, contiguous storage space.

To query the current size of an AoSoA and then to resize it:

   int old_num_tuple = aosoa.size();
   int new_num_tuple = static_cast<int>(6.5e6);
   aosoa.resize( new_num_tuple );

When resizing the AoSoA to a new size n, it behaves in the following manner:

  • If n is smaller than the current container size, the content is reduced to its first n elements.
  • If n is greater than the current container size, the content is expanded by inserting at the end as many elements as needed to reach a size of n.
  • If n is also greater than the current container capacity, an automatic reallocation of the allocated storage space takes place.

To query the current capacity of an AoSoA and the to reserve more memory (thereby increasing its capacity):

   int old_tuple_capacity = aosoa.capacity();
   int new_tuple_capacity = static_cast<int>(9.4e7);
   aosoa.reserve( new_tuple_capacity );

When reserving a new capacity n the container behaves in the following manner:

  • If n is greater than the current container capacity, the function causes the container to reallocate its storage increasing its capacity to n (or greater to the nearest multiple of the inner array size).
  • In all other cases, the function call does not cause a reallocation and the container capacity is not affected.

In all use cases of AoSoA::resize() and AoSoA::reserve(), the allocated storage of the container never decreased - it will either stay the same or increase.

Implementation

Cabana_AoSoA.hpp

Interface

template<class DataTypes,
         class MemorySpace,
         int VectorLength>
class AoSoA

Examples

Usage

  using T0 = float[3];
  using T1 = double;
  using member_types = Cabana::MemberTypes<T0,T1>;
  using memspace = Kokkos::HostSpace;
  const int vector_length = 16;
  
  Cabana::AoSoA<member_types,memspace,vector_length> particle_list( num_particles );

This is part of the Programming Guide series

Clone this wiki locally