C shenanigans: Pointers, sequence points and bit-fields.

This text covers some non-obvious, easy-to-make mistakes that inexperienced C programmers are likely to run into and that tooling cannot fully catch. It does not go into edge cases relevant to performance and sticks to the simplest and most conservative approach:
  1. Pointer semantics
  2. Sequence points
  3. Bit-fields
Compiler flags or implementation details may provide workarounds that prevent optimizations based on introduced Undefined Behavior (UB). Before reusing any code, review the C compilers, flags, tests and platforms it was written for. The SEI wiki covers some basic cases, but not the compiler workarounds. Pointer construction is largely unspecified in the C standards before C11, and even as of C23 pointer semantics have no formal model; see also the item on pointer construction requirements.
  1. Pointer semantics in C.
    1. Pointer access requirements
    2. Pointer access in practice
    3. The Exceptions
    4. Pointer construction requirements
    5. Pointer construction in practice

    1. Pointer access requirements have been fairly well specified since C89, in strong contrast to pointer construction requirements. Programmers who know how the processing hardware works can derive the information below.
      1. Proper alignment
        • Cleanly accessing data through a pointer type with a stricter alignment requires a temporary filled via memcpy.
        • To only compare pointers, reduce the alignment requirement by converting to char *.
        • To erase type information for generics, use void *.
        • You are responsible for providing correct alignment, either by calling a function that guarantees it or by ensuring it yourself.
      2. Sufficient storage (pointer must point to valid object)
      3. Sufficient padding (i.e. within structs).
      4. Correct aliasing ("Strict Aliasing Rule")
        • C23 6.5 Expressions paragraph 7
          "An object shall have its stored value accessed only by an lvalue expression
          that has one of the following types - a type compatible with the effective type of the object,
          - a qualified version of a type compatible with the effective type of the object,
          - a type that is the signed or unsigned type corresponding to the effective type of the object,
          - a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
          - an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
          - a character type."
        • What does this mean in practice?
          Each pointer has an associated "provenance", the memory region it is allowed to point to. For access, a pointer ptr must satisfy (&array[0] <= ptr && ptr < &array[len]), with array being the "memory origin range" of len elements on the stack or heap. Pointers used together in pointer arithmetic must point into the same array.
          Function arguments of identical pointer types are allowed to have overlapping provenance regions unless they are annotated with restrict (__restrict__ for C++ in clang/gcc), but pointers of different types are not allowed to overlap (character types excepted, see the rule above). Pointers must be compared at identical alignment, i.e. as the same pointer type, except when comparing a pointer against the null pointer, usually written via the macro NULL.
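As a minimal sketch of the aliasing rule quoted above, assuming a platform where float is 32 bits wide: reading a float through a uint32_t lvalue violates the rule, while copying its bytes with memcpy does not. The function name float_bits is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Assumption of this sketch: float is 32 bits wide (checked at compile time).
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

// Violates the aliasing rule: the object has effective type float,
// but would be read through an lvalue of type uint32_t.
// uint32_t float_bits_ub(float f) { return *(uint32_t *)&f; }

// Allowed: memcpy accesses the object representation bytewise.
uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) yields 0x3F800000. Compilers typically turn the memcpy into a plain register move.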
    2. Pointer access in practice.
      • Provenance as regions pointer is allowed to point to for access.
        provenance.c
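        #include <stdio.h>
        #include <stdlib.h>
        // arr must point to at least 10 initialized ints
        void use_ptr(int * arr) {
          printf("0: %d, 9: %d\n", arr[0], arr[9]);
        }
        int main(void) {
          int arr1[10] = { 0 };
          use_ptr(arr1);
          // 10 zeroed elements, because use_ptr reads arr[9]
          int * arr2 = calloc(10, sizeof(int));
          if (arr2 == NULL) return 1;
          use_ptr(arr2);
          free(arr2);
          return 0;
        }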
        
      • Copy bytes between non-overlapping regions (use memmove if they may overlap).
        copy_bytes.c
        #include <string.h>
        #include <stdint.h>
        // Copies groups of 4 bytes into output; the two regions must not overlap
        void use_bytes(uint8_t * bytes, int32_t len_bytes, uint32_t * output, int32_t len_output) {
          for(int32_t i=0; i<len_bytes/4 && i<len_output; i+=1) {
            memcpy(&output[i], &bytes[4*i], sizeof(output[i]));
          }
        }
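For the overlapping case mentioned above, a small sketch with memmove may help. The helper name shift_right and its parameters are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Shifts the first `count` bytes of `buf` right by `shift` positions.
// Source [0, count) and destination [shift, shift + count) overlap
// whenever shift < count, so memcpy would be undefined behavior here.
void shift_right(unsigned char * buf, size_t buf_len, size_t count, size_t shift) {
  assert(buf != NULL);
  assert(shift <= buf_len && count <= buf_len - shift); // stay inside buf
  memmove(buf + shift, buf, count);
}
```

memmove behaves as if the bytes were first copied to a temporary buffer, which is why it tolerates the overlap.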
        
      • Correct alignment of pointers with temporary, when necessary.
        correct_alignment.c
        #include <string.h>
        #include <stdint.h>
        int ptr_no_reinterpret_cast(void) {
          uint8_t arr[4] = {0,0,0,1};
          // temporary with correct alignment, hopefully elided by the optimizer
          uint32_t u32_arr = 0;
          memcpy(&u32_arr, &arr[0], sizeof(u32_arr));
          uint32_t * u32_arr_ptr = &u32_arr;
          // <use u32_arr_ptr here>
          (void)u32_arr_ptr;
          // Footgun: Don't return pointers to stack-local variables
          return 0;
        }
        
      • Ensure correct storage and padding size for pointers via sizeof.
        storage_padding.c
        #include <stdint.h>
        #include <stdlib.h>
        struct sStruct1 {
          uint8_t a1;
          uint8_t a2;
          uint32_t b1;
          uint32_t b2;
        };
        void padding(void) {
          // sizeof includes any padding between a2 and b1
          struct sStruct1 * str1 = malloc(sizeof(struct sStruct1));
          if (str1 == NULL) return;
          str1->a1 = 5;
          free(str1);
        }
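To make the padding visible, the following sketch repeats the struct from storage_padding.c and checks its layout with sizeof, offsetof and _Alignof (C11). MEMBER_BYTES and padding_bytes are names invented here; the actual amount of padding depends on the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sStruct1 {
  uint8_t a1;
  uint8_t a2;
  uint32_t b1;
  uint32_t b2;
};

// Sum of the member sizes, ignoring any padding the compiler inserts.
enum { MEMBER_BYTES = 2 * sizeof(uint8_t) + 2 * sizeof(uint32_t) };

// The struct may be larger than its members because of padding.
static_assert(sizeof(struct sStruct1) >= MEMBER_BYTES, "struct smaller than its members");
// b1 must start at an offset suitable for uint32_t.
static_assert(offsetof(struct sStruct1, b1) % _Alignof(uint32_t) == 0, "b1 misaligned");

// Number of padding bytes in the struct on this implementation.
size_t padding_bytes(void) {
  return sizeof(struct sStruct1) - MEMBER_BYTES;
}
```

Always allocating with sizeof(struct sStruct1) rather than a hand-computed byte count keeps the allocation correct regardless of how much padding the compiler chooses.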
        
      • Allowed aliasing of pointers (type-based aliasing analysis)
        allowed_aliasing.c
        #include <stdint.h>
        // bytes and lim have the same type, so they may alias
        void allowed_aliasing(uint16_t * bytes, int32_t len_bytes, uint16_t * lim) {
          for(int i=0; i<len_bytes; i+=1) {
            if (&bytes[i] == lim) break;
            bytes[i] = 42;
          }
        }
        
      • Non-allowed aliasing of pointers (type-based aliasing analysis)
        non_allowed_aliasing.c
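        #include <stdint.h>
        // lim has a different pointer type than bytes: comparing them
        // directly is not allowed and compilers warn about it
        void non_allowed_aliasing(uint16_t * bytes, int32_t len_bytes, uint8_t * lim) {
          for(int i=0; i<len_bytes; i+=1) {
            if (&bytes[i] == lim) break;
            bytes[i] = 42;
          }
        }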
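One possible way to write such a comparison, following the advice to compare through a character pointer type, is sketched below. The function name fill_until and the returned element count are additions for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Writes 42 into buf until len elements are written or the address of the
// next element equals lim. Returns the number of elements written.
int32_t fill_until(uint16_t * buf, int32_t len, const uint8_t * lim) {
  assert(buf != NULL && len >= 0);
  int32_t i = 0;
  for (; i < len; i += 1) {
    // Both sides are converted to a character pointer type before the
    // comparison; nothing is dereferenced through the converted pointers.
    if ((const unsigned char *)&buf[i] == (const unsigned char *)lim) break;
    buf[i] = 42;
  }
  return i;
}
```

Only the addresses are compared here. Writing to buf through lim would be fine as well, since character types may alias any object, but the reverse (accessing a uint8_t buffer through uint16_t *) would not.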
        
    3. The Exceptions.
      • Controlling the build system and compiler invocation to opt out of provenance-based optimizations.
        1. Clang and gcc provide -fno-strict-aliasing; MSVC and tcc do not perform optimizations based on type-based aliasing analysis.
        2. As of 20240603, no compiler (clang, gcc, msvc, tcc) offers a switch to disable provenance-based alias analysis.
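        3. Optimizations relying on restrict can be enabled or disabled per code region via pragmas. These are compiler-specific: #pragma optimize("", on/off) in MSVC, #pragma clang optimize on/off in clang, and #pragma GCC optimize in gcc.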
        4. It can also be disabled in all compilers via #define restrict (defining it to nothing), by using a suitable optimization level (typically -O1), or by separating header and implementation and disabling link-time optimizations.
      • POSIX extensions and Windows in practice enable dynamic linking by casting void * pointers to function pointers and back. This implies that sizeof(function pointer) == sizeof(void *) must hold, which is not true for microcontrollers with separate address spaces for code and data, or for CHERI in mixed capability mode/hybrid compilation mode. Address space annotations are mandatory for this to work, and it is unfortunate that the standards do not reflect this as of 20240428.
        aliasing_exceptions_uniform_address_space.c
        #include <assert.h>
        typedef int(*pfn_add_one)(int);
        static int add_one(int x) { return x+1; }
        void usage(void) {
          // Stand-in for a loader call such as dlsym(), which returns void *.
          // ISO C does not guarantee this conversion; POSIX and Windows
          // rely on it in practice.
          void * pv_add_one = (void*)add_one;
          pfn_add_one pfn_add_one_casted = (pfn_add_one)pv_add_one;
          int res = pfn_add_one_casted(1);
          assert(res == 2);
        }
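Instead of redefining the restrict keyword itself, a project-specific macro can make the opt-out explicit. The following is only a sketch: the macro MY_RESTRICT, the switch NO_RESTRICT and the function add_into are hypothetical names, not part of any compiler or library.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical project-wide switch: build with -DNO_RESTRICT to drop the
// no-overlap promise from all annotated functions.
#ifdef NO_RESTRICT
#define MY_RESTRICT
#else
#define MY_RESTRICT restrict
#endif

// With MY_RESTRICT expanding to restrict, the caller promises that dst and
// src do not overlap; the compiler may then reorder or vectorize the loop.
void add_into(int * MY_RESTRICT dst, const int * MY_RESTRICT src, size_t n) {
  assert(dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i += 1) {
    dst[i] += src[i];
  }
}
```

Building the same sources once with and once without -DNO_RESTRICT and comparing test results is one way to find out whether a suspected miscompilation is related to restrict.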
        
    4. Pointer construction requirements are unspecified in all C standards up to and including C23, apart from a few hints and nothing concrete, which further implies that pointer semantics have no formal model. At least a few possible formal models exist (the paper "VIP: Verifying Real-World C Idioms with Integer-Pointer Casts", N2676, and P2318R1 "A Provenance-aware Memory Object Model for C"), so far without taking into account CHERI in mixed capability mode/hybrid compilation mode and, from what I understand, without covering all equivalence classes of pointer operations.
      Therefore it is best either to use the most conservative approach or to record the chosen (non-portable) compiler semantics in the build system next to the code, to remove room for ambiguity.
      For further information, see the paper "Subtleties of the ANSI/ISO C standard" and "n2263: Clarifying Pointer Provenance v4".
      To simplify things, we can however extend the strict aliasing rule to pointer construction, with shortcomings regarding "effective type" on type punning for hardware-related programming: constructed pointers must then satisfy ((&array[0] <= ptr && ptr <= &array[len]) || ptr == 0 || ptr is undefined), with ptr == 0 and undefined pointers being the exceptions. Note that the one-past-the-end pointer &array[len] may be constructed, but not dereferenced.
      Standards up to and including C23 do not specify this behavior explicitly. For example, C23 specifies that operations on pointers to an object must remain in the range given above, and the behavior of temporarily overflowing pointers is undefined. The expected behavior of exposed (externally readable and writable) addresses via headers and object files, including the possible future direction of the C standard, can be found in "A Provenance-aware Memory Object Model for C".
      Temporarily out-of-bounds behavior, linker semantics with guaranteed addresses or address regions, and all other constraints remain unspecified.
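The difference between constructing and accessing a pointer can be sketched with a one-past-the-end pointer. The function name sum_ints is chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

// Sums len elements. `end` points one past the last element: it may be
// constructed and compared, but never dereferenced.
long sum_ints(const int * array, size_t len) {
  assert(array != NULL);
  const int * end = array + len;   // valid to construct: &array[len]
  long total = 0;
  for (const int * ptr = array; ptr != end; ptr += 1) {
    total += *ptr;                 // access only while array <= ptr < end
  }
  // Computing array + len + 1 would already be undefined behavior,
  // even without dereferencing the result.
  return total;
}
```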
      How the optimizer would prove that serialized and deserialized pointers have the same provenance regions (integer cast, memory copy or external usage) is not discussed here, because there are multiple algorithms and this article is already too long.
      Rust decided to give programmers experimental low-level control over provenance, experimenting on CHERI and with an interpreter for iterating on the provenance model, and to work with (or around) backends; see Rust RFC 3559 titled "rust_has_provenance" and its section "Drawbacks".
      The following special cases of pointer operations can be taken into account, when discussing provenance-based optimizations (in contrast to type-based aliasing analysis):
      1. Opaque type idiom.
        Opaque types provide a way to guarantee correct usage of object and pointer properties for a library or API user and thus should be preferred, if feasible.
      2. Pointer to integer and integer to pointer conversion.
        In all suggested models for pointer semantics (of C), pointer-to-integer and integer-to-pointer conversions must prevent provenance-based optimizations unless the optimizer can prove the origin of the pointer's provenance with certainty, and/or programmers must or can annotate pointers with provenance information to tell the optimizer which memory relations can and cannot be optimized (unstandardised).
      3. Headers/exports exposing data structures, pointers to data structures and void pointers.
        Link time optimization (LTO) works across header and object boundaries if sufficient information/artefacts for caller and callee are given, so constructing exposed aliasing pointers may lead to undefined behavior depending on the build system flags and the compiler used.
      4. Compiler intrinsics for IO: memcmp, memcpy, memmove, memset.
        The semantics of IO compiler intrinsics are yet to be taken into account portably, because a lot of legacy code relies on certain properties and because pointer properties like alignment are implicit. Technically, optimizations are possible when pointers are annotated with sufficient information, which is useful for acceleration via SIMD. Tracking provenance along pointers, for example for different addressing modes or capabilities in CHERI, would be useful as well.
      5. Checking C code validity with Cerberus.
        Cerberus allows checking C code semantics for most common idioms, but does not support the complete corpus of C syntax. It also offers checking semantics of multithreaded code, but this is out of scope for this article.
      6. CHERI rules for pointers.
        In CHERI mixed-capability mode, pointers may be raw pointers or pointers with annotated capabilities, which can include things like lower and upper address bounds, permission masks and flags usable for OS or application tasks; see "Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 9)". Since there is no formal model of how CHERI pointer semantics work, examples are not included. A work-in-progress semantics for CHERI C is given in the paper "Formal Mechanised Semantics of CHERI C: Capabilities, Undefined Behaviour, and Provenance". CHERI offers (scalable) compartmentalization and spatial memory safety, with opt-in temporal memory safety via runtime support that mandates pointer capability revocation on freeing memory; the latest example is CheriBSD's experimental userspace temporal memory safety (20240602).
      7. What to expect for the future.
        LLVM support for full restrict has been merged, but it has design and quality problems. It looks like the C++ code base of LLVM prevents faster iterations and/or more fundamental changes, since the feature has been in development for about 5 years now (since 2019).
        Semantic implications are neither communicated nor formalized, so the future path remains unclear. This is reflected by long-standing miscompilations not getting fixed.
    5. Pointer construction in practice. The original intention was to explain provenance-based rules. However, due to long-standing bugs in LLVM and gcc and the lack of a formal model covering performance, safety, compilation time and other implications, I suggest that the reader write thorough tests and, when in doubt about testability, disable provenance-based optimizations, especially in production code.
      Optimizers with provenance-based optimization steps are unfortunately not built with controllability and debuggability in mind, and standards bodies so far cannot recommend any extensive test corpus from which to derive how frontend and backend optimizer tests would need to be designed.
      Other, more elaborate examples can be found in the GitHub gist "What is the Strict Aliasing Rule and Why do we care?".
      1. Opaque type idiom.
        opaque.h
        #ifndef OPAQUE_H
        #define OPAQUE_H
        #include <stddef.h>
        #include <stdint.h>
        struct item; // incomplete type: layout is hidden from users
        size_t item_size(void);
        void item_setid(struct item * it, int32_t id);
        int32_t item_getid(const struct item * it);
        #endif
        
        opaque.c
        #include "opaque.h"
        struct item { int32_t id; };
        size_t item_size(void) { return sizeof(struct item); }
        void item_setid(struct item * it, int32_t id) { it->id = id; }
        int32_t item_getid(const struct item * it) { return it->id; }
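How a user of such an opaque type might look is sketched below in a single file. item_create and item_destroy are hypothetical additions that are not part of the listing above; they keep allocation inside the library so that callers never need to know the layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// --- what the header would expose ---
struct item;                                 // incomplete type: layout hidden
struct item * item_create(int32_t id);       // hypothetical constructor
void item_destroy(struct item * it);         // hypothetical destructor
int32_t item_getid(const struct item * it);

// --- what the implementation file would contain ---
struct item { int32_t id; };

struct item * item_create(int32_t id) {
  struct item * it = malloc(sizeof *it);
  if (it != NULL) it->id = id;
  return it;                                 // NULL on allocation failure
}

void item_destroy(struct item * it) { free(it); }

int32_t item_getid(const struct item * it) {
  assert(it != NULL);
  return it->id;
}
```

Because callers only ever hold a struct item *, they cannot construct misaligned or undersized objects, and the library is free to change the layout later.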
        
      2. Pointer to integer and integer to pointer conversion.
        ptrtoint_inttoptr.c
        #include <assert.h>
        #include <inttypes.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        static void memset_16aligned(void * ptr, char byte, size_t size_bytes, uint16_t alignment) {
            assert((size_bytes & (alignment-1)) == 0); // Size aligned
            assert(((uintptr_t)ptr & (alignment-1)) == 0); // Pointer aligned
            memset(ptr, byte, size_bytes);
        }
        // 1. Careful with segmented address spaces: look up uintptr_t semantics
        // 2. Careful with long-standing compiler bugs in optimizations of pointer to
        // integer and back conversions, for example in clang and gcc
        // 3. Careful with LTO potentially creating problem 2.
        // 4. Consider C11 aligned_alloc or posix_memalign
        void ptrtointtoptr(void) {
          const uint16_t alignment = 16;
          const uint16_t align_min_1 = alignment - 1;
          // over-allocate so that an aligned block of 1024 bytes always fits
          void * mem = malloc(1024+align_min_1);
          if (mem == NULL) return;
          // C89: void *ptr = (void *)(((INT_WITH_PTR_SIZE)mem+align_min_1) & ~(INT_WITH_PTR_SIZE)align_min_1);
          // i.e. void *ptr = (void *)(((uint64_t)mem+align_min_1) & ~(uint64_t)align_min_1);
          // offset ptr to the next alignment byte boundary
          void * ptr = (void *)(((uintptr_t)mem+align_min_1) & ~(uintptr_t)align_min_1);
          printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
          memset_16aligned(ptr, 0, 1024, alignment);
          // free the original pointer, not the aligned one
          free(mem);
        }
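The alternative mentioned in the comments, C11 aligned_alloc, avoids the integer round trip entirely. A minimal sketch, assuming a C11 library that provides aligned_alloc (MSVC's runtime does not); the helper name alloc_zeroed_16 is invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Returns zeroed, 16-byte aligned memory of `size` bytes or NULL on failure.
// Release the result with free().
void * alloc_zeroed_16(size_t size) {
  const size_t alignment = 16;
  // C11 requires size to be a multiple of the alignment.
  assert(size % alignment == 0);
  void * ptr = aligned_alloc(alignment, size);
  if (ptr != NULL) memset(ptr, 0, size);
  return ptr;
}
```

The returned pointer can be passed directly to free, and no pointer is ever reconstructed from an integer, so the provenance questions from above do not arise.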
        
      3. Link time optimization (LTO) usage and problems. One can use ptrtoint_inttoptr.c with flags for strong LTO to optimize the bit code of the complete program, for example via clang -flto -funified-lto -fuse-ld=lld ptrtoint_inttoptr.c.
      4. IO compiler intrinsic semantics example. It would be helpful to have a way to attach alignment to pointers so that the compiler automatically does the runtime selection of the best SIMD routine, instead of forcing programmers to do this manually. __attribute__((aligned(ALIGNMENT))), _Alignas(ALIGNMENT) and alignas(ALIGNMENT) offer no guarantee that the code is vectorized. One has to check, for example via clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize or gcc -O3 -fopt-info-vec-all (-ftree-vectorizer-verbose=3 in old gcc releases), and use extensions (https://clang.llvm.org/docs/LanguageExtensions.html and https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) like __builtin_assume_aligned.
        extern.h
        #include <stddef.h>
        #include <immintrin.h>
        void memcpy_avx(__m256i * __restrict src, __m256i * __restrict dest, size_t n);
        
        extern.c
        #include "extern.h"
        /// requires 32 byte aligned src, dest; src and dest must not overlap
        void memcpy_avx(__m256i * __restrict src, __m256i * __restrict dest, size_t n) {
          size_t n_vec = n / sizeof(__m256i);
          for(size_t i=0; i<n_vec; i+=1) {
            const __m256i temp = _mm256_load_si256(src);
            _mm256_store_si256(dest, temp);
            src += 1;
            dest += 1;
          }
        }
        
        memcpy_avx.c
        #include <stdio.h>
        #include <stdint.h>
        #include "extern.h"
        int main(void) {
          uint8_t mem_src[1024] = { 0 };
          uint8_t mem_dest[1024] = { 0 };
          const uint16_t alignment = 32;
          const uint16_t align_min_1 = alignment - 1;
          __m256i * p_src = (void *)(((uintptr_t)mem_src+align_min_1) & ~(uintptr_t)align_min_1);
          __m256i * p_dest = (void *)(((uintptr_t)mem_dest+align_min_1) & ~(uintptr_t)align_min_1);
          // n is a byte count: copy 4 vectors (n = 4 would copy nothing)
          memcpy_avx(p_src, p_dest, 4 * sizeof(__m256i));
          fprintf(stdout, "p_src: %p, p_dest: %p\n", (void*)p_src, (void*)p_dest);
          return 0;
        }
        // clang -Weverything -O3 -march=native memcpy_avx.c extern.c && ./a.out
        // Output (contains C++ warnings):
        // extern.c:8:5: warning: unsafe pointer arithmetic [-Wunsafe-buffer-usage]
        //     8 |     src += 1;
        //       |     ^~~
        // extern.c:9:5: warning: unsafe pointer arithmetic [-Wunsafe-buffer-usage]
        //     9 |     dest += 1;
        //       |     ^~~~
        // 2 warnings generated.
        // p_src: 0x7ffceb985a60, p_dest: 0x7ffceb985660
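As a sketch of the __builtin_assume_aligned extension mentioned above (GCC and clang only), the following function tells the optimizer about an alignment it cannot see from the parameter types. The function name add_f32_aligned is made up; whether the loop is actually vectorized still has to be checked with the compiler remarks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Adds src into dst element-wise. Both pointers must be 32-byte aligned
// and must not overlap. __builtin_assume_aligned is a GCC/clang extension.
void add_f32_aligned(float * restrict dst, const float * restrict src, size_t n) {
  // Debug-build check of the alignment promise made below.
  assert(((uintptr_t)dst & 31u) == 0 && ((uintptr_t)src & 31u) == 0);
  float * d = __builtin_assume_aligned(dst, 32);
  const float * s = __builtin_assume_aligned(src, 32);
  for (size_t i = 0; i < n; i += 1) {
    d[i] += s[i];
  }
}
```

Callers can obtain suitably aligned buffers with _Alignas(32) on arrays or with an aligned allocation function. Passing a misaligned pointer breaks the promise and is undefined behavior in release builds.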
        
      5. Checking C code validity with Cerberus does not imply absence of compiler miscompilations.
        cerberus_install.sh
        # Install opam with ocaml
        git clone https://github.com/rems-project/cerberus
        cd cerberus
        opam install --deps-only ./cerberus-lib.opam ./cerberus.opam
        make
        make install DESTDIR=$HOME/.local/cerberus
        echo 'PATH=${PATH}:"$HOME/.local/cerberus/bin"' >> ~/.bashrc
        eval $(opam env)
        cerberus --help
        
        extern.h
        #include <stddef.h>
        extern size_t x;
        
        extern.c
        #include "extern.h"
        size_t x = 0;
        
        ptr_provenance_miscompilation.c
        #include <stddef.h>
        #include <stdio.h>
        #include "extern.h"
        // Removing restrict makes the miscompilation go away
        size_t f(size_t * restrict ptr_to_x);
        size_t f(size_t * restrict ptr_to_x) {
          size_t * p = ptr_to_x;
          *p = 1;
          if (p == &x) {
              // Expected branch, taken only in Debug mode
              *p = 2;
          }
          return *p;
        }
        int main(void) {
          if (f(&x) == 1) fprintf(stderr, "panic : p != &x\n");
        }
        // clang -O0 -Weverything ptr_provenance_miscompilation.c extern.c && ./a.out
        // output:
        // clang -O1 -Weverything ptr_provenance_miscompilation.c extern.c && ./a.out
        // output: panic : p != &x
        // cerberus ptr_provenance_miscompilation.c extern.c
        // output:
        // merging everything into ptr_provenance_miscompilation.c
        // cerberus ptr_provenance_miscompilation.c
        // output:
        
      6. CHERI usage is left as a task for the reader. Useful links are https://github.com/CTSRD-CHERI/cheribuild, https://github.com/CTSRD-CHERI/cheri-c-programming and https://github.com/capablevms/cheri-examples.
  2. Sequence Points in simple case and with storage lifetime extension.
    sequence_points.c
    #include <stdint.h>
    #include <stdio.h>
    int f(int* a) {
      *a=*a+1;
      return *a;
    }
    void simple_sequence_points() {
      int a = 0;
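      // warning: Multiple unsequenced modifications to a
      // a = a++ + a++;
      // Problem without warnings: both calls modify a and are
      // indeterminately sequenced, so their evaluation order is unspecified
      a = f(&a) + f(&a);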
      a = f(&a);
      a += f(&a);
    }
    struct sExample { int32_t a[1]; };
    struct sExample create_sExample(void) {
      struct sExample res = { { 1 } };
      return res;
    }
    int storage_lifetime_footgun(void) {
      // undefined behavior is introduced if the temporary is missing:
      // printf("%x", ++(create_sExample().a[0]));
      struct sExample res = create_sExample();
      printf("%x", ++(res.a[0]));
      return 0;
    }
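When the order of side effects matters, separate statements make it explicit, since every full expression ends with a sequence point. A small sketch with invented names next_value and diff_ordered:

```c
#include <assert.h>
#include <stddef.h>

// Increments *counter and returns the new value.
static int next_value(int * counter) {
  assert(counter != NULL);
  *counter += 1;
  return *counter;
}

// Unspecified result: the two calls are indeterminately sequenced, so the
// compiler may evaluate either operand first (1 - 2 or 2 - 1).
// int diff_unspecified(int * c) { return next_value(c) - next_value(c); }

// Each statement is a full expression, so the call order is fixed.
int diff_ordered(int * counter) {
  int first = next_value(counter);
  int second = next_value(counter);
  return first - second;
}
```

Starting from a counter of 0, diff_ordered always computes 1 - 2, while the commented-out variant may legally produce either -1 or 1.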
    
  3. Bit-fields should only be used in code that is deliberately non-portable across compilers and CPUs. Do not make assumptions about the layout of structures with bit-fields, and use static_assert/_Static_assert on every such struct. Keep bit-fields as simple as possible: prefer not to nest them, or static_assert the nested layout as well.
    Reasons from ISO/IEC 9899:TC3
    > An implementation may allocate any addressable storage unit large enough to hold a bit
    > field. If enough space remains, a bit-field that immediately follows another bit-field in a
    > structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
    > whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
    > implementation-defined. The order of allocation of bit-fields within a unit (high-order to
    > low-order or low-order to high-order) is implementation-defined. The alignment of the
    > addressable storage unit is unspecified.
    or in other words:
    1. Order of allocation within a unit is implementation-defined.
    2. Position of the most significant bit is not specified.
    3. Alignment of the storage unit is unspecified.
    4. Implementations decide whether bit-fields cross a storage unit boundary.
    5. Structs may contain padding bytes anywhere except before the first member.
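The two options can be sketched side by side: a bit-field struct pinned down with static_assert, and a portable variant using shifts and masks. The struct sFlags, its field names and the flags_* helpers are invented for this example, and the size assertion assumes a compiler that packs all three fields into one 32-bit unit.

```c
#include <assert.h>
#include <stdint.h>

// Bit-field variant: the layout is implementation-defined, so pin down at
// least the size. Assumption: all three fields share one 32-bit unit.
struct sFlags {
  unsigned int kind : 4;
  unsigned int level : 12;
  unsigned int id : 16;
};
static_assert(sizeof(struct sFlags) == sizeof(uint32_t), "unexpected bit-field layout");

// Portable variant: explicit shifts and masks on a fixed-width integer.
// Layout chosen here: kind in bits 0..3, level in bits 4..15, id in bits 16..31.
uint32_t flags_pack(uint32_t kind, uint32_t level, uint32_t id) {
  assert(kind <= 0xFu && level <= 0xFFFu && id <= 0xFFFFu);
  return kind | (level << 4) | (id << 16);
}
uint32_t flags_kind(uint32_t packed) { return packed & 0xFu; }
uint32_t flags_level(uint32_t packed) { return (packed >> 4) & 0xFFFu; }
uint32_t flags_id(uint32_t packed) { return packed >> 16; }
```

The static_assert only checks the size, not the bit order, so data exchanged with hardware or other programs is better served by the shift-and-mask variant, whose layout is the same on every compiler.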