This article focuses on some of the non-obvious and easy to make mistakes non-experienced C programmers are likely to make and are/can not completely be covered by tooling without going into edge cases relevant to performance and covering the most simple and conservative approach:
Compiler flags or implementation may provide workarounds to these problems to prevent optimizations based on introduced Undefined Behavior (UB). Review used C compilers with flags used including tests and platforms before reusing of any code. The SEI wiki covers some basic cases without covering compiler workarounds. Pointer construction is widely unspecified in earlier C standards before C11 and up to this day with C23 pointer semantics have no formal model, see also item pointer construction requirements.
char*
pointer.
void*
pointer. ptr
must uphold (&array[0] <= ptr && ptr < &array[len+1])
for access with array being the “memory origin range” on stack or heap. Pointers must point to the same array, when being used for arithmetic.restrict
(__restrict__
for C++ in clang/gcc), but pointers of different types are not allowed to have those regions. Pointer comparison must be done via identical alignments, unless comparison of a pointer against pointer to 0, usually abbreviated via macro NULL
.Pointer access in practice.
#include <stdio.h>
#include <stdlib.h>
void use_ptr(int *arr) { printf("0: %d, 9: %d\n", arr[0], arr[9]); }
int main() {
int arr1[10];
use_ptr(arr1);
int *arr2 = malloc(sizeof(int));
use_ptr(arr2);
free(arr2);
}
#include <stdint.h>
#include <string.h>
void use_bytes(uint8_t *bytes, int32_t len_bytes, uint32_t *output, int32_t len_output) {
for (int i = 0; i * 4 < len_bytes && i < len_output; i += 4) {
memcpy(&output[i], &bytes[4 * i], sizeof(len_output));
}
}
#include <stdint.h>
#include <string.h>
int ptr_no_reinterpret_cast() {
uint8_t arr[4] = {0, 0, 0, 1};
// unnecessary variable hopefully elided
uint32_t u32_arr = 0;
memcpy(&u32_arr, &arr[0], 4);
uint32_t *u32_arr_ptr = &u32_arr;
// <use u32_arr_ptr here>
// Footgun: Dont return stack local variables
(void)u32_arr_ptr;
return 0;
}
#include <stdint.h>
#include <stdlib.h>
struct sStruct1 {
uint8_t a1;
uint8_t a2;
uint32_t b1;
uint32_t b2;
};
void padding() {
struct sStruct1 *str1 = malloc(sizeof(struct sStruct1));
str1->a1 = 5;
free(str1);
}
#include <stdint.h>
void allowed_aliasing(uint16_t *bytes, int32_t len_bytes, uint16_t *lim) {
for (int i = 0; i < len_bytes; i += 1) {
if (bytes == lim)
break;
bytes[i] = 42;
}
}
#include <stdint.h>
void non_allowed_aliasing(uint16_t *bytes, int32_t len_bytes, uint8_t *lim) {
for (int i = 0; i < len_bytes; i += 1) {
if (bytes == lim)
break;
bytes[i] = 42;
}
}
The Exceptions
-fno-strict-aliasing
, msvc and tcc do disable type-based aliasing analysis based optimizations.restrict
can be en/disabled in all compilers via #pragma optimize("", on/off)
. It can also be disabled in all compilers via #define restrict
, using an according optimization level (typical -O1
) or via separating header and implementation and disabling link time optimziations.void *
to function pointers and back. This also means that sizeof (function pointer) == sizeof (void *)
must be uphold, which is not true for microcontrollers with separate address space for code and data or CHERI in mixed capability mode/hybrid compilation mode. Address space annotations are mandatory for this to work and it is unfortunate that standards do not reflect this as of 2024-04-28. #include <assert.h>
#include <stdint.h>
uint8_t external_memory[1024];
typedef int (*pfn_add_one)(int);
int add_one(int x) { return x + 1; }
void usage(int x) {
// read fn ptr from external code
void *pv_add_one = (void *)external_memory;
pfn_add_one pfn_add_one_casted = (pfn_add_one)pv_add_one;
int res = pfn_add_one_casted(1);
assert(res == 2);
}
Pointer construction requirements are unspecified in all C standards with potentially some hints and nothing concrete up to including C23 which further implies that pointer semantics have no formal model. At least a few possible formal models exist (paper VIP: Verifying Real-World C Idioms with Integer-Pointer Casts, N2676, P2318R1: A Provenance-aware Memory Object Model for C) so far without taking into account CHERI in mixed capability mode/hybrid compilation mode and from what I understand without taking all equivalence classes of pointer operations into account.
Therefore it is best to use the most conservative approach xor to provide the set of chosen (non-portable) compiler semantics in the build system next to the code to remove room for ambiguity.
For further information about this, take a look into paper “Subtleties of the ANSI/ISO C standard” and “n2263: Clarifying Pointer Provenance v4”.
To simplify things, we can however extend the strict aliasing rule pointer construction with shortcomings regarding “effective type” on type punning for hardware related programming: This would mean that generated pointers must uphold (&array[0] <= ptr && ptr <= &array[len+1]) || ptr == 0 || ptr = undefined)
with ptr == 0
and undefined pointers being the exceptions.
Standards up to including C23 do not specify this behavior explicitly. For example C23 specifies that operations on pointers to a object must remain in the above given range and temporal pointer overflow behavior is undefined. Expected behavior of exposed (externally readable and writable) addresses via headers and object files including possible future C standard direction can be found in “A Provenance-aware Memory Object Model for C”.
Temporal out of bounds behavior, linker semantics with guaranteed addresses or address regions and all other constrains remain unspecified.
It is not discussed here how the optimizer would prove how serialized and deserialized pointer have the same provenance regions (integer cast, memory copy or external usage), because there are multiple algorithms and this article is already too long.
Rust decided to allow programmers experimental low level control over provenance with experimenting on CHERI and an interpreter for iterating on the provenance model and to work(around) with backends, see Rust RFC 3559 title “rust_has_provenance” and section “Drawbacks”.
The following special cases of pointer operations can be taken into account, when discussing provenance-based optimizations (in contrast to type-based aliasing analysis):
Pointer construction in practice. The original intention was to explain provenance based rules, but due to long standing bugs in LLVM and gcc and no formal model with performance safety, compilation time and other implications, I would suggest the reader to write thorough tests and on doubts about testability to disable provenance based optimizations, especially in production code.
Optimizers with provenance based optimization steps are unfortunately not build with controllability and debuggability in mind and standard bodies so far can not recommend any extensive test corpus to derive how frontend and backend optimizer tests would need to be designed.
Other more elaborative examples can be seen in the github gist “What is the Strict Aliasing Rule and Why do we care?”.
Opaque type idiom. #include <stddef.h>
#include <stdint.h>
struct item;
size_t item_size(void);
void id_setid(struct item *it, int32_t id);
int item_getid(struct item *it);
#include "opaque.h"
struct item {
int32_t id;
};
size_t item_size(void) { return sizeof(struct item); }
void id_setid(struct item *it, int32_t id) { it->id = id; }
int item_getid(struct item *it) { return it->id; }
Pointer to integer and integer to pointer conversion. #include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void memset_16aligned(void *ptr, char byte, size_t size_bytes, uint16_t alignment) {
assert((size_bytes & (alignment - 1)) == 0); // Size aligned
assert(((uintptr_t)ptr & (alignment - 1)) == 0); // Pointer aligned
memset(ptr, byte, size_bytes);
}
// 1. Careful with segmented address spaces: lookup uintptr_t semantics
// 2. Careful with long standing existing optimization compiler bugs pointer to
// integer and back optimizations in for example clang and gcc
// 3. Careful with LTO potentially creating problem 2.
// 4. Consider C11 aligned_alloc or posix_memalign
void ptrtointtoptr() {
uint16_t const alignment = 16;
uint16_t const align_min_1 = alignment - 1;
void *mem = malloc(1024 + align_min_1);
// C89: void *ptr = (void *)(((INT_WITH_PTR_SIZE)mem+align_min_1) & ~(INT_WITH_PTR_SIZE)align_min_1);
// ie void *ptr = (void *)(((uint64_t)mem+align_min_1) & ~(uint64_t)align_min_1);
// offset ptr to next alignment byte boundary
void *ptr = (void *)(((uintptr_t)mem + align_min_1) & ~(uintptr_t)align_min_1);
printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
memset_16aligned(ptr, 0, 1024, alignment);
free(mem);
}
Link time optimization (LTO) usage and problems. One can use ptrtoint_inttoptr.c
with flags for strong LTO to optimize the bit code of the complete program, for example via clang -flto -funified-lto -fuse-ld=lld ptrtoint_inttoptr.c
.
IO Compiler intrinsic semantics example. It would be helpful to have a way to add alignment to pointers to have th compiler automatically do runtime selection of the best SIMD routine instead of being forced to do this manually. __attribute__(aligned(ALIGNMENT)
, _Alignas(ALIGNMENT)
, alignas(ALIGNMENT)
do offer no guarantee that the code is vectorized and one has to check for example via clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
or gcc -O3 -ftree-vectorizer-verbose=3
and use clang extensions and gcc extensions like __builtin_assume_aligned
. #include <immintrin.h>
void memcpy_avx(__m256i *__restrict src, __m256i *__restrict dest, size_t n);
#include "extern_avx.h"
/// requires 32 byte aligned src, dest; src and dest must not overlap
void memcpy_avx(__m256i *__restrict src, __m256i *__restrict dest, size_t n) {
size_t n_vec = n / sizeof(__m256i);
for (size_t i = 0; i < n_vec; i += 1) {
__m256i const temp = _mm256_load_si256(src);
_mm256_store_si256(dest, temp);
src += 1;
dest += 1;
}
}
#include <stdint.h>
#include <stdio.h>
#include "extern_avx.h"
int main(void) {
uint8_t mem_src[1024] = {0};
uint8_t mem_dest[1024] = {0};
uint16_t const alignment = 32;
uint16_t const align_min_1 = alignment - 1;
__m256i *p_src = (void *)(((uintptr_t)mem_src + align_min_1) & ~(uintptr_t)align_min_1);
__m256i *p_dest = (void *)(((uintptr_t)mem_dest + align_min_1) & ~(uintptr_t)align_min_1);
memcpy_avx(p_src, p_dest, 4);
fprintf(stdout, "p_src: %p, p_dest: %p\n", (void *)p_src, (void *)p_dest);
return 0;
}
// clang -Weverything -O3 -march=native memcpy_avx.c extern.c && ./a.out
// Output (contains C++ warnings):
// extern.c:8:5: warning: unsafe pointer arithmetic [-Wunsafe-buffer-usage]
// 8 | src += 1;
// | ^~~
// extern.c:9:5: warning: unsafe pointer arithmetic [-Wunsafe-buffer-usage]
// 9 | dest += 1;
// | ^~~~
// 2 warnings generated.
// p_src: 0x7ffceb985a60, p_dest: 0x7ffceb985660
Checking C code validity with Cerberus does not imply absence of compiler miscompilations. #!/usr/bin/env bash
# Install opam with ocaml
git clone https://github.com/rems-project/cerberus
opam install --deps-only ./cerberus-lib.opam ./cerberus.opam
make
make install DESTDIR=$HOME/.local/cerberus
echo 'PATH=${PATH}:"$HOME/.local/cerberus/bin"' >> ~/.bashrc
eval (opam env)
cerberus --help
#include <stddef.h>
extern size_t x;
#include "extern_miscompilation.h"
size_t x = 0;
#include <stdio.h>
#include "extern_miscompilation.h"
// Removing restrict makes the miscompilation go away
size_t f(size_t *restrict ptr_to_x);
size_t f(size_t *restrict ptr_to_x) {
size_t *p = ptr_to_x;
*p = 1;
if (p == &x) {
// Expected branch, taken only in Debug mode
*p = 2;
}
return *p;
}
int main(void) {
if (f(&x) == 1)
fprintf(stderr, "panic : p != &x\n");
}
// clang -O0 -Weverything ptr_provenance_miscompilation.c extern.c && ./a.out
// output:
// clang -O1 -Weverything ptr_provenance_miscompilation.c extern.c && ./a.out
// output: panic : p != &x
// cerberus ptr_provenance_miscompilation.c extern.c
// output:
// merging everything into ptr_provenance_miscompilation.c
// cerberus ptr_provenance_miscompilation.c
// output:
CHERI usage is left as task for the reader. Useful links are https://github.com/CTSRD-CHERI/cheribuild
, https://github.com/CTSRD-CHERI/cheri-c-programming
and https://github.com/capablevms/cheri-examples
.
Sequence Points in simple case and with storage lifetime extension. #include <stdint.h>
#include <stdio.h>
int f(int *a) {
*a = *a + 1;
return *a;
}
void simple_sequence_points() {
int a = 0;
// warning: Multiple unsequenced modifications to a
// a = a++ + a++;
// Problem without warnings
a = f(&a) + f(&a);
a = f(&a);
a += f(&a);
}
struct sExample {
int32_t a[1];
};
struct sExample create_sExample(void) {
struct sExample res = {{1}};
return res;
}
int storage_lifetime_footgun(void) {
// undefined behavior introduced if temporary is missing
// printf("%x", ++(create_fail().a[0]));
struct sExample res = create_sExample();
printf("%x", ++(res.a[0]));
return 0;
}
Bit-fields should not be used unless for non-portable code regarding compilers and CPUs and do not make assumptions regarding the layout of structures with bit-fields and use static_assert
/_Static_assert
on every struct. Keep bit-fields as simple as possible, meaning prefer not to nest them or also static_assert the layout. Reasons from ISO/IEC 9899:TC3
> An implementation may allocate any addressable storage unit large enough to hold a bit
> field. If enough space remains, a bit-field that immediately follows another bit-field in a
> structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
> whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
> implementation-defined. The order of allocation of bit-fields within a unit (high-order to
> low-order or low-order to high-order) is implementation-defined. The alignment of the
> addressable storage unit is unspecified.
or in other words: