This article is intended as an overview of software-based debugging techniques, so that the reader can efficiently mix and match the appropriate technique for system-level debugging. It focuses on languages with statically optimizing compilers to keep complexity and scope limited. The reader may notice several documented deficits across platforms and tooling regarding documentation or functionality, which will be improved. The author accepts the irony of such statements given “C having no ABI” and many systems in practice having no stable or formally specified ABI, but reality is simplified in this text for brevity and sanity.
Section 1 (theory) feels complete aside from simulation and hard-/software replacement techniques, and contains good first drafts for bug, debugging and debugging process. Section 2 (practical) is tailored towards non-micro Kernels, which are based on process abstraction, but is currently missing some content and scalability numbers for tooling. The idea is to provide understanding and numbers to estimate, for system design, 1 whether formal proof of correctness is feasible and on what parts, and 2 which problems and methods are applicable for dynamic system analysis. Section 3 (future) will wrap up practical problems of what is currently not well usable or not possible in Section 2 and, for brevity, speculate about more advanced ideas without numbers. Those ideas are planned to cover how to design systems for rewriting and debugging using formal methods, compilers and code synthesis.
A (software) system can be represented as an (often non-deterministic) state machine, such that a bug is a bad transition rule between those states. It is usually assumed that the developer/user knows correct and incorrect (bad) system states and that the code represents a somewhat correct model of the intended semantics. An execution witness then consists of the states and state transitions encountered on a specific program run. If the execution witness shows a “bad state”, then there must be a bug. Thus a debugger can be seen as a query engine over states and transitions of a buggy execution witness.
In simpler terms, debugging means not making bugs or removing them.
Frequent operations are bug source isolation to deterministic components, where encapsulation of non-determinism usually simplifies the process. In contrast to that, concurrent code is tricky to debug, because one needs to trace multiple execution flows to estimate where the origin of the incorrect state is.
The process of debugging means using static and dynamic (software) system analysis, together with its automation and adaptation, to speed up the elimination of bugs (or bug classes) for the target systems (or classes of target systems).
One can generally categorize methods into the following list [automate, simplify, observe, understand, learn] (asoul)
with the fundamental constraints being [finding, ensuring, limited] (feel)
Common static and dynamic (software) system analysis methods to run the system to feel a soul for the purpose of eliminating the bug (classes) are:
The core ideas for what software system to run based on code with its semantics are then typically a mix of
Further, isolation and simplification are typically applied on all potential sub-components including, but not limited to hardware, code versioning including dependencies, source system, compiler framework and target system. Methods are usually
Debugging is domain- and design-specific and relies on core component(s) of the system to be debugged to provide the necessary debug functionality. For example, software-based hardware debugging relies on interfaces to the hardware like JTAG, kernel debugging on kernel compilation or configuration and elevated (user) permissions, and user-space debugging on process and user permissions, system configuration or, on Posix systems, a child process to be debugged via `ptrace`.
Without costly hardware devices to trace and physical access to the computing unit for exact recording of the system behavior including time information, dynamic (software) system analysis (to run the system) requires trade-offs on what program parts and aspects to inspect and collect data from. Therefore, it depends on many factors, for example bug classes and target systems, to what degree the process of debugging can and should be automated or optimized.
Depending on the domain and environment, problematic behavior of hardware or software components must be (more or less) 1 avoided or 2 traceable, and there exist various (domain) metrics as decision helpers. Very well designed systems explain to users how to debug with regard to functional behavior and time behavior with internal and external system resources, up to the degree to which system usage and task execution correctness are intended. Access restrictions limit or rule out stepping, whereas storage limitations limit or rule out logging, tracing and recording.
Formal methods, Specification, (software) system synthesis and Formal Verification
(Highly) safety-critical systems or hardware are typically created from a formal Specification by (software) system synthesis or, when (full) synthesis is infeasible, implementations are formally verified. Standards for (highly) security-critical systems (like Common Criteria Evaluation Assurance Levels) provide customer assurances of the security policy according to the specification and are to my knowledge typically realized via Specification and Formal Verification without synthesis (2025-09-28).
For non safety- or security-critical or hardware (sub)systems, usually semantics are not “set into stone”, so Formal Verification or (software) system synthesis is rarely an option. Formal models and (semi-)formal specifications are however commonly used for design, planning, testing, review and validation of fail-safe or core (software) system functionality.
Typically used models for C, C++, Zig and compiler backends are Integer Arithmetic, Modular Arithmetic and Saturation Arithmetic for integers, and Floating point arithmetic (with possible rough edge cases like signaling NaN propagation) and Fixed-Point Arithmetic for real numbers. (Simplified) instances of Separation Logic may be used to model and check pointers and resources, for example Safe Rust uses separation logic with lifetime inference and user annotations based on strict aliasing of Unsafe Rust.
Typical relevant unsolved or incomplete models for compilers are
and typical problems more related to platforms like Kernels are
For Validation, Sanitizers are typically the most efficient and simplest debugging tools for C and C++, whereas Zig implements them, except for the thread sanitizer, as allocator and safety mode. Instrumented sanitizers have a 2x-4x slowdown, whereas dynamic ones have a 20x-50x slowdown.
Nr | Clang usage | Zig usage | Memory | Runtime | Comments |
---|---|---|---|---|---|
1 | -fsanitize=address | alloc + safety | 1x (3x stack) | 2x | Clang 16+ TB of virt mem |
2 | -fsanitize=leak | allocator | 1x | 1x | on exit ?x? more mem+time |
3 | -fsanitize=memory | unimplemented | 2-3x | 3x | |
4 | -fsanitize=thread | -fsanitize=thread | 5-10x+1MB/thread | 5-15x | Clang ?x? (“lots of”) virt mem |
5 | -fsanitize=type | unimplemented | ? | ? | not enough data |
6 | -fsanitize=undefined | safety mode | 1x | ~1x | |
7 | -fsanitize=dataflow | unimplemented | 1-2x? | 1-4x? | wip, get variable dependencies |
8 | -fsanitize=memtag | unimplemented | ~1.0Yx? | ~1.0Yx? | wip, address cheri-like ptr tagging |
9 | -fsanitize=cfi | unimplemented | 1x | ~1x | forward edge ctrl flow protection |
10 | -fsanitize=safe-stack | unimplemented | 1x | ~1x | backward edge ctrl flow protection |
11 | -fsanitize=shadow-call-stack | unimplemented | 1x | ~1x | backward edge ctrl flow protection |
Sanitizers 1-6 are recommended by LLVM for testing purposes and 7-11 for production. Memory and slowdown numbers are only reported for LLVM sanitizers. Zig does not report its own numbers yet (2025-01-11). Slowdown for dynamic sanitizer versions increases by a factor of 10x compared to the listed static usage costs. The leak sanitizer only checks for memory leaks, not for leaks of other system resources. Besides various kernel-specific tools to track system resources, Valgrind can be used on Posix systems for non-memory resources and Application Verifier on Windows. Address and thread sanitizers cannot be combined in Clang, and combined usage of the Zig implementation is limited by virtual memory usage. In Zig, aliasing can currently not be sanitized against, whereas in Clang only type-based aliasing can be sanitized, without any numbers reported by LLVM yet.
Besides adjusting source code semantics via 1 sanitizers, one can do 2 own dynamic source code adjustments or use 3 tooling that uses kernel APIs to trace and optionally 3.1 run-time check information or 3.2 run-time check kernel APIs together with the underlying state. Kernels may further simplify access to information, for example the `proc` file system simplifies access to process information.
TODO proper benchmarks
Testing is very context- and use-case-dependent. Typical separations are pure/impure, time-invariant/variant, accurate/approximate and hardware/simulation/software (sub)system separation, ranging from simple unit tests up to integration and end-to-end tests, based on statistical/probability analysis and system intuition on deterministic expected behavior derived from explicit or implicit requirements.
TODO tools, hardware, software, mixed hw/sw examples
Stepping
Stepping is generally based on temporary substitution of the debugger target assembly with interrupt instructions (`INT3` on x86). Typically, and simplifying here for brevity, control is afterwards switched by the Kernel to the debugger, which executes the interrupt logic, such as conditional breakpoints, other logical checks, querying registers or variables based on debug information, resuming execution or dumping the complete program state. However, Kernels abstract access, typically restrict debugging to one debugger per debuggee process, add custom events and make things much slower due to Interrupt Routine execution and Kernel logic execution for data flow. Faster alternatives are either read/write buffers and asynchronous execution done from within the debuggee and debugger as fast path (also called non-stop debugging) or instruction emulation for tracing use cases. Such fast paths can be triggered via “soft interrupts” at user-specified program states and/or timeouts or cycle detection. Customization (for user-implemented Recording etc), visualization and automation of the control logic and information is in the process of implementation by RAD Debugger without tackling the core bottlenecks yet (2025-09-27). Other implementations like gdb or lldb focus on functionality, like remote debugging, portability and utilities (record and replay, etc), over performance.
TODO potential hardware improvements based on simulation
Logging and Tracing
Logging is typically applied to resolve problems of long-running and (intentionally) hard-to-introspect systems and uses persistent or temporary storage. Logging typically follows a log level convention with compile-time and/or run-time configuration.
Tracers are used where more user control or logic is needed to track down problematic behavior, and for short-running and (intentionally) easy-to-introspect systems. dtrace is closest to being a cross-platform tracing solution via binary instrumentation based on debug information, but does not handle virtualization use cases yet. babeltrace is closest to being a unified (Linux) Kernel tracing solution. Accurate hardware-based tracing can be done via CPU sampling as used by Linux perf, Windows ETW and macOS dtrace, or on bare metal via frequency control and executing the respective assembly instructions. General Kernel space tracing solutions (with less overhead or more flexibility) such as systemtap, bcc and bpftrace are inspired by dtrace, and Kernels have lots of specialized tracing solutions to observe specific subsystems efficiently with a variety of application interfaces. OpenTelemetry can be used for logging, tracing and metrics of (cloud) distributed applications when storage, performance and network bandwidth are of no concern, because its (very) verbose JSON without compression offers neither human readability nor high information density.
To my knowledge, no structured encoding of system log, trace or metrics via ontologies or based on time synchronization models (for distributed systems) exists (2025-09-25).
TODO proof read tooling, + typical memory,runtime,latency overhead https://www.blackhat.com/presentations/bh-europe-08/Beauchamp-Weston/Presentation/bh-eu-08-beauchamp-weston.pdf
Recording
Recording is typically applied to investigate and eliminate problem causes of a system and realized via 1 state snapshots based on upper bound states reachability in case of non-determinism and/or 2 elimination of non-determinism via 2.1 logging non-deterministic choices and/or 2.2 logging/pre-selection of choices. Typical examples are user input recording (GUI, keyboard) and Kernel input/output recording (rr, time travel debugging). One excellent example, which utilizes recording, incremental compilation and live patching, is the Tomorrow Corporation Tech Demo.
Scheduling
Scheduling to debug requires sufficient control over the scheduler and typically simplification methods. These are extending the time duration of synchronization areas, simplifying state (like testing a sub-system with edge cases), using artificial synchronization between operations and/or extracting or specifying synchronization and timing relations based on scheduler configuration, hardware and empirical observations. Debuggers like gdb, lldb, WinDbg provide only very clumsy and slow ways for such functionality. To my knowledge, no models or standards for synchronization, timing relations or scheduler configuration exist. Likewise, I know of no project attempting a type 1 hypervisor with an API for debugging purposes, similar to what a PLC (SPS) allows, and of no project to annotate and extract synchronization and timing relations between tasks for optimizing scheduler decisions and (formal) model generation (2025-09-27).
Reversal computing
Reversal computing is a typical explicit tactic in programs on error paths to undo the operation and is usually fairly simple without Kernel/external input/output. When Kernel/external input/output is involved, high performance code uses batching and users of more “safety”-aware languages typically utilize the type system (linear/affine types in C++/Rust) or verify cleanup (Frama-C in C), but usually this only covers memory and not other effects.
TODO check database integrity + kernel/database security (integrity) strategies before making baseless claims. also check “let it crash”/actor systems
To my knowledge, no widely known strategy of “in-between cleanups” besides controlled shutdown via linear setup and teardown has been proposed (2025-09-27).
TODO complexity comparison
Time-reversal computing
TODO
The following is a list of typical problems with simple solution tactics. To keep analysis simple, no virtual machine/emulator and simulation approaches are given.
`clang -Werror -Weverything -fsanitize="undefined,type"`, `zig -OReleaseSafe`, `zig -ODebug`
`zig --verbose-llvm-ir test.zig` (so far without an option to store LTO artifacts) and `clang -O3 -Xclang -disable-llvm-optzns -emit-llvm -S test.c` with (if needed) LTO artifact storing via `-plugin-opt=save-temps`.
Getting optimized LLVM IR works via `clang -O3 -emit-llvm -S test.c` and `zig -femit-llvm-ir test.zig`.
`clang -fsanitize=address`, `zig -ODebug/-OReleaseSafe`
`clang -fsanitize=address`, `zig -ODebug/-OReleaseSafe`
`clang -fsanitize="address,undefined"`, `zig -ODebug/-OReleaseSafe`
`clang -fsanitize=undefined`, `zig -ODebug/-OReleaseSafe`
`clang -fsanitize=address`, Zig allocator configuration
`clang -fsanitize=address` and `ASAN_OPTIONS=detect_stack_use_after_return=1` with 1.3-2x runtime and 11MB fake stack per thread, unimplemented in Zig.
`clang -fsanitize=memory`, unimplemented in Zig for partial initialization (implementation only checks against any initialization, if value is used in branch and only if memory is not coerced to different types).
`-fsanitize=thread`, but Zig offers no annotation for "intentionally racy reads and writes" as Clang does via `__attribute__((no_sanitize("thread")))`.
`clang -fsanitize=type`, unimplemented in Zig.
(`valgrind --tool=massif prog; ms_print massif.out.12345`), for memory checks Valgrind MemCheck (`valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose prog`), for memory analysis at runtime `gdb` with `pwndbg` (for example using `vmmap`) or memory analysis after runtime using coredumps, meaning `gcore -o $TMPDIR/process $PID`, `cat /proc/$pid/smaps > $TMPDIR/TimeMemAction.txt` or `gdb -p $pid; dump memory memory.dump 0xSTART 0xEND; hexdump -C memory.dump`.
Windows has for memory profiling VMMap and RAMMap, DrMemory as graphical tools, for memory leaks UMDH `gflags /i prog.exe +ust; $Env=_NT_EXECUTABLE_IMAGE_PATH="url_ms_sym_server"; umdh -p:$PID -f:b4leak.log; umdh b4leak.log afterleak.log > res.diff`, DrMemory, for memory analysis at runtime Visual Studio (Code) with "Memory Usage" and analysis after runtime with windbg `gflags /i prog.exe +ust; WinDbgX.exe prog.exe; .dump /ma b4leak.dmp; .opendump leak.dmp; f5; ||1s; ||.; !heap -s; !heap -h HANDLE; !heap -p -a ADDRESS; !heap -flt s SIZE` (find stack to allocation).
`valgrind --track-fds=yes prog` and on Windows with manually checking Handle, ProcessExplorer, ETW traces or automatically with proprietary solutions.
`/proc/PID_OF_PROCESS`, on Windows `NtQuerySystemInformation` with `SYSTEM_HANDLE_INFORMATION` and `SYSTEM_HANDLE_TABLE_ENTRY_INFO`, on BSDs `sysctl`, `kvm`, `procmap` and there exist various other kernel specific trace options.
`TSAN_OPTIONS=detect_deadlocks=1:second_deadlock_stack=1`.
`ptrace(PTRACE_GETSIGINFO, ..)`, `WaitForDebugEvent` are options to trace signals besides kernel tracers like ktrace, dtrace or on Windows ETW, but usually it is simpler to reproduce the behavior in a debugger with simplified code.
As shown before, modern languages simplify detection or elimination of memory problems and runtime-detectable undefined behavior. So far undetectable undefined behavior may be automatically reduced if backend optimizers are redesigned with corresponding reduction APIs. Detecting miscompilations requires strict formal reasoning about executing the source code semantics or formal verification of the compiler itself, which shall not be discussed here. This leaves hardware problems, kernel problems, resource leaks, freezes, performance problems and logic problems.
TODO check