Performance Gains with Union

Using unions for more efficient memory usage and better performance.

1. Union Basics

In C++, a union is a special data structure that allows multiple members to share the same memory location. All members of a union start at the same memory address, meaning that at any given time, only one member of the union can hold a value. This contrasts with a struct, where each member has its own memory location.

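  • The memory allocated for a union is at least the size of its largest member (plus any padding required for alignment), and all members share that same memory space.
  • Only one member is active, i.e. holds a valid value, at any time. Writing to a different member makes that member the active one, and reading a member that is not active is generally undefined behavior in C++.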

2. Use Case

Here is a typical scenario where a union can make code more efficient. Our goal is to store values of several types, such as int, float, bool, and short strings. In the less efficient implementation, we dynamically allocate memory for each value. In the optimized version, we use a union to hold all the required types. Values of different types then share the same inline storage, which eliminates the per-value memory allocations.

#include <iostream>
#include <chrono>
#include <vector>
#include <memory>
#include <cstring>

const int numOperations = 1000000;

// Without union - traditional struct approach
// Note: this struct never frees what it allocates, so the benchmark leaks memory.
// Production code would need a type tag, a destructor, and copy/move handling.
struct RegularData {
    // Pointer to hold the actual data
    void* value;

    // Constructor for int
    RegularData(int val) {
        value = new int(val);
    }

    // Constructor for float
    RegularData(float val) {
        value = new float(val);
    }

    // Constructor for bool
    RegularData(bool val) {
        value = new bool(val);
    }

    // Constructor for string
    RegularData(const char* val) {
        // Allocate memory and copy the string
        size_t len = std::strlen(val) + 1; // Include null terminator
        value = new char[len];
        std::strcpy(static_cast<char*>(value), val);
    }
};

// With union - optimized approach
struct OptimizedData {
    // Tag recording which union member is currently active
    enum class Type {INT, FLOAT, BOOL, STRING} type;
    union DataUnion { // Size of largest member: 16 bytes
        int a;
        float b;
        bool c;
        char d[16];
    } data;

    OptimizedData(int val) : type(Type::INT) {
        data.a = val;
    }

    OptimizedData(float val) : type(Type::FLOAT) {
        data.b = val;
    }

    OptimizedData(bool val) : type(Type::BOOL) {
        data.c = val;
    }

    OptimizedData(const char* val) : type(Type::STRING) {
        // Copy at most 15 characters and always null-terminate
        std::strncpy(data.d, val, sizeof(data.d) - 1);
        data.d[sizeof(data.d) - 1] = '\0';
    }
};

void performanceTestRegular() {
    std::vector<RegularData> data;

    auto start = std::chrono::high_resolution_clock::now();

    // Perform operations using heap-allocated values
    for (int i = 0; i < numOperations; ++i) {
        switch (i % 4) {
            case 0: // Use int data
                data.emplace_back(1);
                break;
            case 1: // Use float data
                data.emplace_back(1.f);
                break;
            case 2: // Use bool data
                data.emplace_back(true);
                break;
            case 3: // Use string data
                data.emplace_back("hello");
                break;
            default:
                break;
        }
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Performance Test (Regular):\n";
    std::cout << "Time taken for " << numOperations << " operations: "
              << duration.count() << " milliseconds\n";
}

void performanceTestOptimized() {
    std::vector<OptimizedData> data;

    auto start = std::chrono::high_resolution_clock::now();

    // Perform operations using the union
    for (int i = 0; i < numOperations; ++i) {
        switch (i % 4) {
            case 0: // Use int data
                data.emplace_back(1);
                break;
            case 1: // Use float data
                data.emplace_back(1.f);
                break;
            case 2: // Use bool data
                data.emplace_back(true);
                break;
            case 3: // Use string data
                data.emplace_back("hello");
                break;
            default:
                break;
        }
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Performance Test (Optimized):\n";
    std::cout << "Time taken for " << numOperations << " operations: "
              << duration.count() << " milliseconds\n";
}

int main() {
    performanceTestOptimized();
    performanceTestRegular();
    return 0;
}
Performance Test (Optimized):
Time taken for 1000000 operations: 74 milliseconds
Performance Test (Regular):
Time taken for 1000000 operations: 96 milliseconds

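In this run, the union version took about 23% less time than the regular version (74 ms vs. 96 ms; (96 - 74) / 96 ≈ 0.23). Exact timings vary with the compiler, optimization flags, and machine.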

3. Conclusion

The optimized version using a union is faster because:

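  1. The regular version calls new for every value, which requires a dynamic allocation on the heap. The optimized version stores each value inline in the union, so the only allocations are the occasional growth of the vector's buffer. A heap allocation per element is much slower than writing into storage that already exists.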

  2. Dynamically allocated memory is scattered across the heap, making memory access less predictable and potentially resulting in cache misses. The optimized version stores data within the object, making it contiguous in memory and improving cache locality.

  3. In the regular version, every access requires dereferencing the pointer, introducing additional overhead. A correct RegularData would also need to deep-copy or free its allocation whenever it is copied or destroyed, leading to more heap operations (the version shown here only copies the pointer and never frees it). The union stores the data inline, so no pointer dereferencing is required, and copying or moving an OptimizedData object only involves copying a small fixed-size object.

Author

Joe Chu

Posted on

2024-11-23

Updated on

2025-01-09
