1
0
mirror of https://github.com/openshift/installer.git synced 2026-02-05 15:47:14 +01:00
Files
installer/docs/dev/rr-debugging.md
Joseph Callen fc462861f8 dev docs: add rr capi debugging
This is some quick docs to allow the
use of `rr` to troubleshoot and trace
difficult cluster api provider problems.

Assisted-by: cursor
2025-08-14 15:39:29 -04:00

121 lines
3.8 KiB
Markdown

# RR (Record and Replay) Debugging
This document describes the implementation of RR (Record and Replay) debugging capabilities in the OpenShift installer's Cluster API system. RR is a powerful debugging tool that allows you to record program execution and then replay it deterministically for debugging purposes.
## Overview
The implementation includes modifications to enable RR debugging for Cluster API controllers. This allows developers to:
1. Record the execution of Cluster API controllers with deterministic replay
2. Debug complex timing-dependent issues that are difficult to reproduce
3. Step through execution multiple times with identical behavior
4. Analyze race conditions and concurrency issues
## Implementation Details
### Key Changes
The implementation modifies two main components:
1. **Process Management** (`pkg/clusterapi/internal/process/process.go`)
2. **System Controller** (`pkg/clusterapi/system.go`)
### Process Management Modifications
The process management system has been modified to:
- Use process group signaling (`syscall.Kill(-ps.Cmd.Process.Pid, syscall.SIGTERM)`) instead of direct process signaling
- Add enhanced logging for process exit states
- Temporarily disable timeout-based process termination for debugging purposes
### Controller Execution Modifications
The controller execution system has been modified to:
- Replace capi provider with `rr` for CPU binding and execution recording
- Add RR-specific flags for optimal debugging:
- `--wait`: Ensures RR waits for the recorded process
- `--disable-avx-512`: Disables AVX-512 instructions for compatibility
- `--bind-to-cpu=0`: Binds execution to CPU 0 for deterministic behavior (or CPUs with P and E cores)
## Setup
### Installing RR
```bash
# On Fedora/RHEL/CentOS
sudo dnf install rr
# On Ubuntu/Debian
sudo apt install rr
```
### Installing Delve
```bash
go install github.com/go-delve/delve/cmd/dlv@latest
```
### Applying the Patch
Apply the RR debugging patch to enable RR recording:
```bash
# Apply the patch (patch file location: docs/dev/rr-debugging.patch)
git apply docs/dev/rr-debugging.patch
```
### Building the Installer
Build the installer with RR debugging enabled:
```bash
MODE=dev TAGS="release" ./hack/build.sh
```
## Usage
### Recording Execution
`rr` requires these kernel parameters to trace:
```bash
sudo sysctl kernel.perf_event_paranoid=-1;
sudo sysctl kernel.kptr_restrict=0
```
When running the installer with RR debugging enabled, Cluster API controllers will automatically be recorded using RR. The recording process:
1. Captures all system calls, memory accesses, and timing information
2. Stores the trace in `~/.local/share/rr/latest-trace`
3. Maintains deterministic replay capability
### Replaying with Delve
To replay a recorded trace using Delve (dlv), use the following command:
```bash
dlv replay --listen=:2345 --headless=true --api-version=2 --accept-multiclient ~/.local/share/rr/latest-trace
```
**Command Options:**
- `--listen=:2345`: Listens on port 2345 for debugger connections
- `--headless=true`: Runs in headless mode without requiring a terminal
- `--api-version=2`: Uses Delve API version 2
- `--accept-multiclient`: Allows multiple debugger clients to connect
- `~/.local/share/rr/latest-trace`: Path to the recorded RR trace
### Connecting a Debugger
After starting the replay session, you can connect your preferred debugger:
- **VS Code**: Configure launch.json to connect to `localhost:2345`
- **GoLand/IntelliJ**: [Use the remote debugging configuration](https://www.jetbrains.com/help/go/attach-to-running-go-processes-with-debugger.html#attach-to-a-process-on-a-remote-machine)
- **Command line**: Use `dlv connect localhost:2345`
## References
- [RR Documentation](https://rr-project.org/)
- [Delve Documentation](https://github.com/go-delve/delve)
- [Cluster API Documentation](https://cluster-api.sigs.k8s.io/)