Towards an optimal debugging framework library.
This text is intended as an overview of debugging techniques and as motivation
for a uniform execution representation and setup to efficiently mix and match
the appropriate techniques for system-level debugging, with a focus on
statically optimizing compiled languages to keep complexity and scope limited.
The author accepts the irony of such statements, given "C having no ABI" and
many systems in practice having no (stable) ABI, but reality is simplified in
this text for brevity and sanity.
- Theory of debugging.
- Practical methods with tradeoffs.
- Uniform execution representation.
- Abstraction problems during problem isolation.
- Possible implementations.
- Theory of debugging.
A program
can be represented as an (often non-deterministic) state machine,
such that a bug is a bad transition rule between those states.
It is usually assumed that the developer/user knows correct and incorrect
(bad) system states and that the code represents a somewhat correct model of
the intended semantics.
An execution witness is then the set of states and state transitions
encountered on a specific program run. If the execution witness shows a
"bad state", then there must be a bug.
Thus a debugger can be seen as a query engine over the states and transitions
of a buggy execution witness.
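To make this view concrete, the following is a minimal C sketch (with invented
names, not the interface of any existing debugger) of an execution witness as
a recorded state sequence and of a "debugger query" that searches it for the
first invariant violation:

    /* Minimal sketch: an execution witness as a recorded sequence of
       states that can be queried for the first "bad" state. All names
       are illustrative, not taken from an existing debugger API. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        long step;    /* which transition produced this state */
        int balance;  /* example program state: an account balance */
    } State;

    /* Invariant defining "good" states: the balance never goes negative. */
    static bool state_ok(const State *s) { return s->balance >= 0; }

    /* Query over the witness: index of the first bad state, or -1. */
    static long first_bad_state(const State *witness, size_t len) {
        for (size_t i = 0; i < len; i++)
            if (!state_ok(&witness[i]))
                return (long)i;
        return -1;
    }

    int main(void) {
        /* A recorded run where transition 2 (a buggy withdrawal rule)
           drives the balance below zero. */
        State witness[] = { {0, 10}, {1, 5}, {2, -3}, {3, -3} };
        long bad = first_bad_state(witness, sizeof witness / sizeof witness[0]);
        if (bad >= 0)
            printf("bad state after transition %ld (balance=%d)\n",
                   witness[bad].step, witness[bad].balance);
        return 0;
    }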
A frequent operation is isolating the bug source to deterministic components,
where encapsulating non-determinism usually simplifies the process.
In contrast, concurrent code is tricky to debug, because one needs to trace
multiple execution flows to estimate where the incorrect state originates.
One can generally categorize debugging methods into the following list (asoul)
- automate the process to minimize errors/oversights during debugging,
  to guard against probabilistic errors, to document the process etc
- simplify and isolate system components and changes over time
- observe the system while running it to trace state or state changes
- understand the expected and actual code semantics of the involved
  systems (for example userspace processes, kernel, build system,
  compiler, source code, linker, object code, assembly, hardware etc)
  to the degree necessary
- learn, extend and ensure how and which system invariants are satisfied
with the fundamental constraints being (feel)
- finding out correct system component semantics
- ensuring deterministic reproducibility of the problem
- limited time and effort
Common debugging methods to "feel a soul" come with various tradeoffs and are
listed here from compile-time to runtime debugging, with less to more run-time
data collection:
- Formal verification as ahead-of-time or compile-time invariant resolution.
- Validation as runtime invariant checks (see the sketch after this list).
- Testing as sample-based runtime invariant checks.
- Stepping via a "classical debugger" to manipulate task execution context
  and memory, optionally with source code location translation, via REPL
  commands, graphically, via scripting or (rarely) freely programmable.
- Logging as dumping (a simplification of) state with context
  around bugs (usually with timestamps in production systems).
- Tracing as dumping (a simplification of) runtime behavior
  via temporal relations (usually timestamps).
- Recording as encoded dumping of the runtime so that it can be replayed
  with the time and state determinism specified beforehand.
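To make the validation and logging entries above concrete, here is a minimal,
hypothetical C sketch (CHECK and log_event are invented names, not an existing
library) combining a runtime invariant check with timestamped logging:

    /* Minimal sketch: runtime invariant validation plus timestamped
       logging. CHECK() and log_event() are illustrative names only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void log_event(const char *msg) {
        /* Logging: dump (a simplification of) state with a timestamp. */
        fprintf(stderr, "[%ld] %s\n", (long)time(NULL), msg);
    }

    /* Validation: a runtime invariant check that aborts on violation. */
    #define CHECK(cond, msg) \
        do { \
            if (!(cond)) { \
                log_event("invariant violated: " msg); \
                abort(); \
            } \
        } while (0)

    static int withdraw(int balance, int amount) {
        CHECK(amount >= 0, "withdraw amount must be non-negative");
        int result = balance - amount;
        CHECK(result >= 0, "balance must not go negative");
        return result;
    }

    int main(void) {
        int balance = 10;
        balance = withdraw(balance, 4);   /* passes both checks */
        log_event("withdraw of 4 succeeded");
        balance = withdraw(balance, 20);  /* trips the second CHECK */
        return balance;
    }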
Simplification and isolation mean applying both ideas to all potential
sub-components, including but not limited to
hardware, code versioning including dependencies, the source system,
the compiler framework and the target system. Typical methods are
- Bisection via git or the actual binaries.
- Reduction via removal of system parts or trying to reproduce
with (a minimal) example.
- Statistical analysis of collected data on how the problem
  manifests in the given environment(s) etc.
Debugging is domain- and design-specific and relies on core component(s)
of the system to be debugged to provide the necessary debug functionality.
For example, software-based hardware debugging relies on interfaces to
the hardware like JTAG, kernel debugging on kernel compilation or
configuration and elevated (user) permissions, and userspace debugging on
process and user permissions, system configuration or, on Posix systems,
on a child process being debugged via ptrace.
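As a concrete illustration of the last point, the following minimal C sketch
(error handling mostly omitted) shows a parent process on Linux using ptrace
to single-step a child process:

    /* Minimal sketch: debugging a child process via ptrace on Linux. */
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t child = fork();
        if (child == 0) {
            /* Child: allow the parent to trace it, then exec the debuggee. */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execlp("echo", "echo", "hello", (char *)NULL);
            _exit(127);
        }
        /* Parent: wait for the stop caused by the exec, then single-step
           until the child exits. */
        int status;
        waitpid(child, &status, 0);
        long steps = 0;
        while (WIFSTOPPED(status)) {
            steps++;
            if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1)
                break;
            waitpid(child, &status, 0);
        }
        printf("child exited after %ld single steps\n", steps);
        return 0;
    }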
- Practical methods with tradeoffs.
Usually semantics are not "set in stone" or do not offer sufficient
tradeoffs, so formal verification is rarely an option aside from using
models as a design and planning tool.
Depending on the domain and environment, problematic behavior of hardware
or software components must to a greater or lesser degree be 1. avoided and
2. traceable, and there exist various (domain) metrics as decision helpers.
Very well designed systems explain to their users how to debug bugs
regarding functional behavior and time behavior, including internal and
external system resources, up to the degree to which correct system usage
and task execution are intended.
Access restrictions limit or rule out stepping, whereas storage limitations
limit or rule out logging, tracing and recording.
TODO: requirements on system design for formal verification vs debugging.
No-surprise rule: the core system enabling debugging (in any form) must be
correct to the degree necessary.
TODO: good argumentation on ignoring linker speak, language footguns etc.
- Bugs related to functional behavior.
- Bugs related to time behavior.
- Internal and external system resources.
- Debugging hard(ware) problems.
Hardware design reviews with extensive focus on core components
(power, battery, periphery, buses, memory/flash and debug/test infrastructure)
enable debugging and component tests against product and assembly defects.
TODO Elimination or mitigation of time channels, formal methods? attack fuzzing?
software toggles?
- Kernel and platform problems.
The managing environment the code is running on can vary a lot.
As an example, the typical four phases of the Linux boot process
(system startup, bootloader stage, kernel stage, and init process)
each have their own debugging infrastructure and methods.
Generally, working with (introspection-restricted) platforms requires
1. reverse engineering and "trying to find info" and/or 2. "using some
tracing tool" and, 3. for open source, "adjusting the source and staring
at kernel dumps / using a debugger".
Kernels are rarely designed for tracing, recording or formal verification
due to their internal complexity, and virtualization is slow and hides many
classes of synchronization bugs.
- Detectable Undefined Behavior
Compiler and runtime sanitizers
- C
- C++
- Zig with -OReleaseSafe turns "undefined behavior" into
runtime-checked disallowed behavior except for
- TODO
- TODO
- TODO
- TODO
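As a small C illustration of runtime-detectable undefined behavior (the exact
report depends on the compiler and sanitizer version), signed integer overflow
is caught at runtime by UndefinedBehaviorSanitizer when the program is built
with -fsanitize=undefined (clang or gcc):

    /* Minimal sketch: detectable undefined behavior. Built with e.g.
       "cc -fsanitize=undefined overflow.c", the signed overflow below is
       reported at runtime by UBSan instead of silently wrapping or being
       optimized away. */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        int y = x + 1;  /* signed integer overflow: undefined behavior */
        printf("%d\n", y);
        return 0;
    }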
- Undetectable Undefined Behavior
Staring at the source code, at backend intermediate representations like
LLVM IR or at the resulting assembly, and reducing the problem. Unfortunately,
backend optimizers like LLVM do not offer frontend language authors debug
APIs and related tooling, because they were not designed for that purpose.
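For contrast, here is a hedged C example of undefined behavior that common
sanitizers (ASan, UBSan) typically do not flag: type punning through an
incompatible pointer cast violates strict aliasing and may only surface
indirectly, e.g. as an optimization-dependent wrong result:

    /* Minimal sketch: undefined behavior that typical sanitizers do not
       detect. Reading a float object through an int pointer violates the
       strict aliasing rule; optimizers may reorder or cache accesses based
       on the assumption that the two pointers cannot alias. */
    #include <stdio.h>

    static int read_as_int(float *f) {
        return *(int *)f;  /* strict aliasing violation: undefined behavior */
    }

    int main(void) {
        float f = 1.0f;
        printf("%d\n", read_as_int(&f));
        return 0;
    }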
- Miscompilations
Tools like Miri or Cerberus run the program in an interpreter,
but may not cover all possible program semantics due to ambiguity
and may not be feasible to apply, so the best remaining option is usually
to reduce the problem.
- Memory problems
sanitizers, validators, simulators, tracers; TODO: which ones, their
configurability and costs (a sanitizer sketch follows this list)
- out-of-bounds access
sanitizer
- null pointer dereference
sanitizer
- type confusion
sanitizer
- integer overflow
sanitizer
- use after free
sanitizer
- invalid stack access
sanitizer
- usage of uninitialized memory
sanitizer
- data races
sanitizer
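As one concrete data point for the list above (a minimal sketch; report
details and overhead depend on the toolchain), a heap use-after-free is
reported at runtime by AddressSanitizer when building with -fsanitize=address;
data races instead need ThreadSanitizer (-fsanitize=thread) and reads of
uninitialized memory MemorySanitizer (-fsanitize=memory, clang):

    /* Minimal sketch: a heap use-after-free that AddressSanitizer reports
       at runtime when the program is built with "cc -fsanitize=address". */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int *p = malloc(sizeof *p);
        if (!p)
            return 1;
        *p = 42;
        free(p);
        printf("%d\n", *p);  /* use after free: reported by ASan */
        return 0;
    }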
- Resource leaks (Freestanding/Kernel)
sanitizer
- Freezes (deadlocks, softlocks, signal safety, unbounded loops etc)
sanitizer, validator
- Performance problems
simulator, tracer
- Logic problems of software systems can be described as problems
related to incorrectly applied logic in how the code solves the
intended and follow-up problems, ignoring hardware problems, kernel
problems, different types of UB, miscompilations, memory problems,
resource leaks, freezes and performance issues.
- (temporary) inconsistency of state (relations)
- incorrect math, e.g. for edge cases (see the sketch below)
- incorrect modeling of external and internal state and synchronization
- incorrect protocol handling
- insufficient handling of the software requirements, or insufficient
  requirements themselves
The sources of these problems are usually
- incorrect constraints on the design, meaning how the different
  parts should interact and work towards the goals for the use
  cases
- unclear, unspecified or incorrectly assumed hardware or software
  guarantees by components
- implementation oversights, unintended use cases, infeasibility
  of a general solution due to time and/or money constraints
Formal modeling of the design, model checking, code review, writing
tests for edge cases or runtime validation are typically used, with the
best practice being to write code in a risk-aware, testable and
debuggable way. The methods and their scope are very wide and very
domain- and use-case-specific, so no general or short recommendation
can be made.
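As a small, generic illustration of "writing tests for edge cases" (the
function names are hypothetical), the classic midpoint computation is only
wrong near the edges of the integer range, so it is the edge-case test that
exposes the math/logic bug:

    /* Minimal sketch: an edge-case test exposing an "incorrect math"
       logic bug. midpoint_buggy() overflows for large inputs;
       midpoint_fixed() avoids the overflow. Both names are illustrative. */
    #include <assert.h>
    #include <limits.h>

    static int midpoint_buggy(int lo, int hi) { return (lo + hi) / 2; }
    static int midpoint_fixed(int lo, int hi) { return lo + (hi - lo) / 2; }

    int main(void) {
        /* Typical-case test: both versions pass. */
        assert(midpoint_buggy(0, 10) == 5);
        assert(midpoint_fixed(0, 10) == 5);
        /* Edge-case test near INT_MAX: the buggy version would overflow
           (undefined behavior), so only the fixed version is exercised
           here and it passes. */
        assert(midpoint_fixed(INT_MAX - 2, INT_MAX) == INT_MAX - 1);
        return 0;
    }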
TODOs:
- Tooling and performance tradeoffs.
- minimal descriptions for C, Rust, Zig; Posix, Linux, Windows
Ideally, only the system behavior and the interactions with domain- and
use-case-specific parts require cognitive load from the programmer, whereas
the other error classes have standard approaches to isolate and eliminate
them. Unifying debug tooling simplifies usage for greater developer
productivity, and exposing it as a library allows this process to be
automated.
- Uniform execution representation.
As was shown before, modern languages simplify the detection or elimination
of memory problems and of runtime-detectable undefined behavior.
So far undetectable undefined behavior may become detectable, if backend
optimizers are redesigned with appropriate APIs.
Detecting miscompilations requires strict formal reasoning about the executed
source code semantics or formal verification of the compiler itself, which
shall not be discussed here.
This leaves hardware problems, kernel problems, resource leaks, freezes,
performance problems and logic problems.
TODO: what they have in common + motivation
TODO: Uniform execution representation and queries over program execution.
- Abstraction problems during problem isolation.
TODO: origin detection, isolation and abstraction
- Possible implementations.
TODO: (query system data vs modify the system vs other) to validate approaches;
Program modification and validation language, query language and alternatives.