
Struct of arrays & array of structs

25. 5. 2021 #D #memory #CPU #en

Hey, look at this: another article claiming that OOP is pure evil, mainstream programming languages are bad and our mental models are hopelessly obsolete. I've read a dozen just like it every day before breakfast. Nothing new in general, but this one mixes things up a little. The mentioned evils are catastrophes because they lead to a data representation that is a priori inefficient on modern CPUs. The logical conclusion of OOP-style objects is the array-of-structs layout, but processors are efficient only with a struct of arrays. That is, of course, nonsense.

Array of structs (AoS) is the typical object layout where the fields of one object are packed next to each other in memory.

struct Ant {
  string color;
  int age;
  bool warrior;
}

Ant[] aos;

On the other hand, struct of arrays (SoA) stores data field-wise: the first fields of all objects are packed together, then the second ones, and so on.

struct AntSoA {
  string[] color;
  int[] age;
  bool[] warrior;
}

AntSoA soa;

The SoA approach is beneficial when a program needs to access only a few fields of a large set of objects. These are packed closely together, so reading is nice and snappy: few cache misses, every byte of every cache line is used and hardware prefetch works best.

An important detail is that SoA and AoS have different uses, not completely dissimilar to column and row databases. They assume locality in different dimensions, and sometimes one approach works better, sometimes the other. The article seems to claim that SoA is always better for the CPU. That's rather lazy iconoclasm: smash an old dogma, replace it with a new one.

Also, we need to take constant evolution into account. SW is being adapted to HW; HW is being improved to match SW requirements. Guessing the speed and behavior of a program just from the number of cache lines touched is insufficient in most cases. It's an approach a decade past its sell-by date. Today's CPUs are much more complicated: out-of-order monsters with large, deep caches and aggressive prefetch units. All of this is there to mask main memory latency. The OOO engine can speculate hundreds of instructions ahead in order to initiate as many cache misses as possible and wait on them in parallel. Ideally the prefetcher and the OOO engine mask RAM latency altogether and loads are limited only by memory throughput. On my cheap old laptop a full cache miss can take up to 130 ns, but only about 5 ns if the same cache line is delivered by the prefetcher. The article talks about data locality and caches, and that's all very important, but it avoids the topic of data dependency. In some sense a true data dependency is a much bigger showstopper, because HW cannot speculate through it.

One example: the following test program touches cache lines in random order, leading to a steady stream of cache misses. First with a data dependency, then without it. The difference is staggering.

import std.algorithm, std.array, std.random;

struct Line { ubyte[64] data; }

auto len = 1 << 24;
Line[] lines = new Line[len];
Line*[] points = lines.map!((ref l) => &l).array;
points.randomShuffle();  // touch cache lines in random order

ulong sink;

// with data dependency
ulong prev;
foreach (p; points) {
  // adding prev introduces a data dependency
  // prev is always 0, but the CPU has no knowledge of it and is forced to wait
  // for data to arrive from memory. RAM accesses are serialized as a result.
  sink += prev = (*(p + prev)).data[0];
}

// without data dependency
foreach (p; points) {
  sink += (*p).data[0];
}

Both variants iterate through the data in the same random order. What's different is that the first loop has a data dependency and as such is forced to serialize its memory accesses. Conversely, the second loop can speculate freely.

The data-dependent loop runs about 3.5× slower on a rather historic i5-2520M machine. It's safe to assume that newer CPUs with more aggressive OOO and larger memory parallelism will show an even greater difference. As usual, it depends on details. SoA and AoS are tools; one is not universally more efficient than the other.

But on the other hand, it is true that programming languages should offer some means of choosing between SoA and AoS layouts without too many changes to the code. Memory layout is just another abstraction.

And it isn't true that no language can offer this capability. Any language with sufficiently strong meta-programming should be able to do it. I quickly wrote a sketch of an SoA-layout array in the D language. Here's an abridged version of it.

import std.string : replace;
import std.traits : FieldNameTuple;

struct StructOfArrays(T) if (is(T == struct)) {

  private static string forEachField(string code) {
    string res;
    static foreach (field; FieldNameTuple!T) { res ~= code.replace("$f", field); }
    return res;
  }

  mixin(forEachField(q{ typeof(T.$f)[] $f; }));

  struct FakeItem {
    ulong i;
    StructOfArrays* soa;

    mixin(forEachField(q{ auto $f() { return soa.$f[i]; } }));

    T convertToT() {
      T res;
      mixin(forEachField(q{ res.$f = $f; }));
      return res;
    }

    alias convertToT this;
  }

  ulong length() {
    return mixin((FieldNameTuple!T)[0]).length;
  }

  auto opIndex(ulong i) {
    return FakeItem(i, &this);
  }

  auto opIndexAssign(U)(U x, ulong i) if (is(U == T) || is(U == FakeItem)) {
    mixin(forEachField(q{ $f[i] = x.$f; }));
  }

  auto opOpAssign(string op)(T x) if (op == "~") {
    mixin(forEachField(q{ $f ~= x.$f; }));
  }

  // more methods in similar spirit
}

It's no beauty, for sure. It mostly just generates methods for packing and unpacking structs to arrays and back. Usage is very similar to handling ordinary data structures. The only difference is that there's SoA under the hood.

struct Ant {
  string color;
  int age;
  bool warrior;
}

auto ants = StructOfArrays!Ant();
ants ~= Ant("red", 11, false);
ants ~= Ant("blue", 1, true);

writeln(ants[0]);       // soa.Ant(color: red, age: 11, warrior: false)
writeln(ants.warrior);  // [false, true]
writeln(ants.length);   // 2
ants[1] = ants[0];
writeln(ants);          // StructOfArrays!(Ant)(["red", "red"], [11, 11], [false, false])
Ant x = ants[0];
writeln(x);             // Ant("red", 11, false)

When Andrei Alexandrescu proclaimed that introspection, meta-programming and code generation would be the next big thing, he was onto something. It's an incredibly powerful tool that can bridge the chasm between expressivity on one side and efficiency on the other.


Note: It seems the term data-oriented programming is used in two rather different contexts: on one hand as a method of organizing data to better fit the CPU cache, and on the other as a realization of Rich Hickey's monstrous dream of naked data without any abstractions.

píše k47 (@kaja47, k47)