Towards an optimal debugging framework library.

This text is intended as an overview of debugging techniques and as motivation for a uniform execution representation and setup that makes it possible to efficiently mix and match the appropriate techniques for system-level debugging. The focus is on languages with statically optimizing compilers, to keep complexity and scope limited. The author accepts the irony of such statements given that "C has no ABI" and that many systems in practice have no ABI, but reality is simplified in this text for brevity and sanity.
  1. Theory of debugging.
  2. Practical methods with tradeoffs.
  3. Uniform execution representation.
  4. Abstraction problems during problem isolation.
  5. Possible implementations.
  1. Theory of debugging.
    A program can be represented as an (often non-deterministic) state machine, such that a bug is a bad transition rule between those states. It is usually assumed that the developer/user knows the correct and incorrect (bad) system states and that the code represents a somewhat correct model of the intended semantics. An execution witness then consists of the states and state transitions encountered on a specific program run. If the execution witness shows a "bad state", then there must be a bug. Thus a debugger can be seen as a query engine over the states and transitions of a buggy execution witness.
    A frequent operation is isolating the bug source to deterministic components, where encapsulating non-determinism usually simplifies the process. In contrast, concurrent code is tricky to debug, because one needs to trace multiple execution flows to estimate where the incorrect state originates.
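The view of a debugger as a query engine over an execution witness can be illustrated with a minimal sketch. The counter model, the rule names and the helper `first_bad_transition` are hypothetical and chosen only for illustration; a real witness would come from tracing or recording an actual program.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    step: int      # position in the run
    rule: str      # name of the transition rule that fired
    before: int    # state before the rule fired
    after: int     # state after the rule fired

def run(rules, state, schedule):
    """Apply the named rules in order and record the execution witness."""
    witness = []
    for step, name in enumerate(schedule):
        new_state = rules[name](state)
        witness.append(Transition(step, name, state, new_state))
        state = new_state
    return witness

def first_bad_transition(witness, is_bad):
    """Query: earliest transition leading from a good state into a bad state."""
    for t in witness:
        if not is_bad(t.before) and is_bad(t.after):
            return t
    return None

# Hypothetical model: a counter that must never become negative.
RULES = {
    "inc": lambda s: s + 1,
    "dec": lambda s: s - 1,  # bug: no guard against going below zero
}

witness = run(RULES, 0, ["inc", "dec", "dec", "inc"])
culprit = first_bad_transition(witness, is_bad=lambda s: s < 0)
print(culprit)
```

The query only needs the recorded transitions and a predicate for bad states, which is the knowledge the developer is assumed to have.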
    One can generally categorize methods into the following list (asoul: automate, simplify, observe, understand, learn)
    1. automate the process to minimize errors/oversights during debugging, guard against probabilistic errors, document the process etc
    2. simplify and isolate system components and changes over time
    3. observe the system while running it to trace state or state changes
    4. understand the expected and actual code semantics to the degree necessary
    5. learn, extend and ensure how and which system invariants are satisfied by the involved systems (for example userspace processes, kernel, build system, compiler, source code, linker, object code, assembly, hardware etc)
    with the fundamental constraints being (feel)
    1. finding out the correct semantics of system components
    2. ensuring deterministic reproducibility of the problem
    3. limited time and effort
    Common debugging methods to "feel a soul", with various tradeoffs, ordered from compile-time to run-time debugging and from less to more run-time data collection:
    1. Formal verification as ahead-of-time or compile-time invariant resolving.
    2. Validation as runtime invariant checks.
    3. Testing as sample based runtime invariant checks.
    4. Stepping via a "classical debugger" to manipulate the task execution context and memory, optionally with source code location translation, driven via REPL commands, graphically, via scripting or (rarely) freely programmable.
    5. Logging as dumping (a simplification of) state with context from bugs (usually timestamps in production systems).
    6. Tracing as dumping (a simplification of) runtime behavior via temporal relations (usually timestamps).
    7. Recording as encoded dumping of runtime behavior, to replay the run with previously specified time and state determinism.
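As a rough illustration of validation, tracing and recording from the list above, the following minimal sketch records the only source of non-determinism of a small program and replays it deterministically. The names `Recorder`, `Replayer` and `program` are hypothetical; real record/replay tools capture far more (system calls, scheduling, signals) than this sketch does.

```python
import random

class Recorder:
    """Wraps a source of non-determinism and records every value it returns."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def next(self):
        value = self.source()
        self.log.append(value)
        return value

class Replayer:
    """Feeds previously recorded values back in the same order."""
    def __init__(self, log):
        self._values = iter(log)

    def next(self):
        return next(self._values)

def program(env, trace):
    """Hypothetical program under test; its only non-determinism comes from env."""
    total = 0
    for _ in range(5):
        sample = env.next()
        total += sample
        # Tracing: dump a simplification of the behavior in temporal order.
        trace.append(("sample", sample, "total", total))
        # Validation: runtime invariant check.
        assert total >= 0, "running total must not be negative"
    return total

# First run: record the non-deterministic inputs.
recorder = Recorder(lambda: random.randint(0, 9))
recorded_trace = []
recorded_result = program(recorder, recorded_trace)

# Second run: replay them, which makes the run deterministic.
replayed_trace = []
replayed_result = program(Replayer(recorder.log), replayed_trace)
print(recorded_result == replayed_result)
```

The replayed run can be repeated as often as needed, so stepping or additional logging can be applied to exactly the same execution.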
    Simplification and isolation mean applying both ideas to all potential sub-components including, but not limited to, hardware, code versioning including dependencies, source system, compiler framework and target system. Typical methods are
    1. Bisection via git or the actual binaries.
    2. Reduction via removal of system parts or trying to reproduce with (a minimal) example.
    3. Statistical analysis from collected data on how the problem manifests on given environment(s) etc.
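Bisection and reduction can be sketched in a few lines. The version names and failure predicates below are hypothetical stand-ins for "check out this commit and run the reproducer"; `git bisect` automates the same search over a commit history. The bisection assumes that the first version is good, the last is bad and every version after the first bad one is bad as well.

```python
def bisect_first_bad(versions, is_bad):
    """Return the index of the first bad version.

    Assumes versions[0] is good, versions[-1] is bad and badness is monotonic.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return bad

def reduce_input(items, still_fails):
    """Greedy reduction: drop single items while the failure still reproduces."""
    items = list(items)
    i = 0
    while i < len(items):
        candidate = items[:i] + items[i + 1:]
        if still_fails(candidate):
            items = candidate   # item i was irrelevant, keep it removed
        else:
            i += 1              # item i is needed to reproduce the failure
    return items

# Hypothetical history: the bug was introduced in v6.
versions = [f"v{n}" for n in range(10)]
first_bad = bisect_first_bad(versions, is_bad=lambda v: int(v[1:]) >= 6)

# Hypothetical failure: reproduces only if both 3 and 7 are part of the input.
minimal = reduce_input(range(10), still_fails=lambda xs: 3 in xs and 7 in xs)
print(versions[first_bad], minimal)
```

Bisection needs a number of checks logarithmic in the number of versions, whereas the greedy reduction shown here needs one check per removal attempt; more elaborate reduction strategies exist but are not shown.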
    Debugging is domain- and design-specific and relies on core component(s) of the system to be debugged to provide the necessary debug functionality. For example, software-based hardware debugging relies on interfaces to the hardware like JTAG, kernel debugging relies on kernel compilation or configuration and elevated (user) privileges, and userspace debugging relies on process and user permissions, system configuration or, on POSIX systems, a child process to be debugged via ptrace.
  2. Practical methods with tradeoffs.
    Usually semantics are not "set in stone" or do not offer sufficient tradeoffs, so formal verification is rarely an option aside from using models as a design and planning tool. Depending on the domain and environment, problematic behavior of hardware or software components must be more or less 1. avoided and 2. traceable, and there exist various (domain) metrics to help with the decision. Very well designed systems themselves explain to users how to debug bugs regarding functional behavior, time behavior and internal and external system resources, up to the degree to which system usage and task execution correctness are intended. Access restrictions limit or rule out stepping, whereas storage limitations limit or rule out logging, tracing and recording.
    TODO: requirements on system design for formal verification vs debugging. No-surprise rule: the core system enabling debugging (in any form) must be correct to the degree necessary.
    TODO: good argumentation on ignoring linker speak, language footguns etc.
    1. Bugs related to functional behavior.
    2. Bugs related to time behavior.
    3. Internal and external system resources.
    1. Debugging hard(ware) problems
      Hardware design reviews with extensive focus on core components (power, battery, periphery, buses, memory/flash and debug/test infrastructure) to enable debugging and component tests against product and assembly defects. TODO: Elimination or mitigation of time channels, formal methods? attack fuzzing? software toggles?
    2. Kernel and platform problems.
      The managing environment the code is running on can vary a lot. As an example, the typical four phases of the Linux boot process (system startup, bootloader stage, kernel stage, and init process) each have their own debugging infrastructure and methods. Generally, working with (introspection-restricted) platforms requires 1. reverse engineering and "trying to find info" and/or 2. "use some tracing tool" and 3. for open source, "adjust the source and stare at kernel dumps/use debugger". Kernels are rarely designed for tracing, recording or formal verification due to internal complexity, and virtualization is slow and hides many classes of synchronization bugs.
    3. Detectable Undefined Behavior
      Compiler and runtime sanitizers
      1. C
      2. C++
      3. Zig with -OReleaseSafe turns "undefined behavior" into runtime-checked disallowed behavior except for
        1. TODO
        2. TODO
        3. TODO
        4. TODO
    4. Undetectable Undefined Behavior
      Staring at source code, backend intermediate representation like LLVM IR or the resulting assembly, and reducing the problem. Unfortunately, backend optimizers like LLVM do not offer frontend language writers debug APIs and related tooling, because they were not designed for that purpose.
    5. Miscompilations
      Tools like Miri or Cerberus run the program in an interpreter, but may not cover all possible program semantics due to ambiguity and may not be feasible, so the only good chance is to reduce the program.
    6. Memory problems
      Sanitizers, validators, simulators, tracers: TODO which, configurability and costs
      1. out-of-bounds access: sanitizer
      2. null pointer dereference: sanitizer
      3. type confusion: sanitizer
      4. integer overflow: sanitizer
      5. use after free: sanitizer
      6. invalid stack access: sanitizer
      7. usage of uninitialized memory: sanitizer
      8. data races: sanitizer
    7. Resource leaks (Freestanding/Kernel): sanitizer
    8. Freezes (deadlocks, softlocks, signal safety, unbounded loops etc): sanitizer, validator
    9. Performance problems: simulator, tracer
    10. Logic problems of software systems can be described as problems related to incorrectly applied logic of how the code is solving the intended and follow-up problems, ignoring hardware problems, kernel problems, different types of UB, miscompilations, memory problems, resource leaks, freezes and performance issues.
      1. (temporary) inconsistency of state (relations)
      2. incorrect math, for example in edge cases
      3. incorrect modeling of external and internal state and synchronization
      4. incorrect protocol handling
      5. insufficient handling of the software requirements, or insufficient requirements themselves
      The sources of these problems are usually
      1. incorrect constraints on the design, meaning how the different parts should interact and work towards the goals for the use cases
      2. unclear, unspecified or incorrectly assumed hardware or software guarantees by components
      3. implementation oversights, unintended use cases, infeasibility of a general solution due to time and/or money constraints
      Formal modeling of the design, model checking, code review, writing tests for edge cases or runtime validation are typically used, with the best practice being to write code in a risk-aware, testable and debuggable way. The methods and scope here are very wide and very domain- and use-case-specific, so no general or short recommendation can be made.
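To give an impression of what model checking of a design means in the small, the following sketch exhaustively explores all interleavings of a hypothetical lock protocol and reports the states in which no thread can make progress. The model (each thread acquires a fixed sequence of locks and releases all of them after the last acquisition) and the function names are assumptions made for this illustration; it is not how any particular model checker works internally.

```python
from collections import deque

def successors(state, programs):
    """Yield all states reachable by letting one thread take one step."""
    pcs, held = state
    owner = dict(held)                        # lock name -> owning thread index
    for tid, program in enumerate(programs):
        pc = pcs[tid]
        if pc == len(program):                # thread already finished
            continue
        lock = program[pc]
        if lock in owner:                     # lock is taken: thread is blocked
            continue
        new_owner = dict(owner)
        new_owner[lock] = tid
        new_pc = pc + 1
        if new_pc == len(program):            # last lock acquired: release all
            new_owner = {l: t for l, t in new_owner.items() if t != tid}
        new_pcs = pcs[:tid] + (new_pc,) + pcs[tid + 1:]
        yield (new_pcs, frozenset(new_owner.items()))

def find_deadlocks(programs):
    """Breadth-first search over the state space; returns stuck, unfinished states."""
    start = (tuple(0 for _ in programs), frozenset())
    seen, queue, deadlocks = {start}, deque([start]), []
    while queue:
        state = queue.popleft()
        next_states = list(successors(state, programs))
        finished = all(pc == len(p) for pc, p in zip(state[0], programs))
        if not next_states and not finished:
            deadlocks.append(state)
        for s in next_states:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return deadlocks

# Opposite lock order: a deadlock is reachable.
print(find_deadlocks([["L1", "L2"], ["L2", "L1"]]))
# Same lock order in both threads: no deadlock is reachable.
print(find_deadlocks([["L1", "L2"], ["L1", "L2"]]))
```

Exhaustive exploration is only feasible for small models, which is one reason why such models are mostly used as a design and planning tool.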
    TODOs:
    1. Tooling and performance tradeoffs.
    2. minimal descriptions for C, Rust, Zig; Posix, Linux, Windows
    Ideally, only the system behavior and the interactions with domain- and use-case-specific parts need cognitive load from the programmer, whereas the other error classes have standard approaches for isolation and elimination. Unifying debug tooling simplifies usage and increases developer productivity, and exposing it as a library allows this process to be automated.
  3. Uniform execution representation.
    As shown before, modern languages simplify the detection or elimination of memory problems and runtime-detectable undefined behavior. So far undetectable undefined behavior may become detectable if backend optimizers are redesigned with corresponding APIs. Detecting miscompilations requires strict formal reasoning about executing the source code semantics or formal verification of the compiler itself, which shall not be discussed here. This leaves hardware problems, kernel problems, resource leaks, freezes, performance problems and logic problems.
    TODO: what they have in common + motivation
    TODO: Uniform execution representation and queries over program execution.
  4. Abstraction problems during problem isolation.
    TODO: origin detection, isolation and abstraction
  5. Possible implementations.
    TODO: (query system data vs modify the system vs other) to validate approaches; Program modification and validation language, query language and alternatives.