mirror of
https://github.com/openshift/installer.git
synced 2026-02-05 06:46:36 +01:00
121 lines
3.8 KiB
Markdown
# RR (Record and Replay) Debugging

This document describes how RR (Record and Replay) debugging is wired into the OpenShift installer's Cluster API system. RR records a program's execution once and then replays that recording deterministically, so the same run can be debugged as many times as needed.

## Overview

The implementation modifies the installer so that Cluster API controllers run under RR. This allows developers to:

1. Record the execution of Cluster API controllers for deterministic replay
2. Debug complex timing-dependent issues that are difficult to reproduce
3. Step through the same execution multiple times with identical behavior
4. Analyze race conditions and concurrency issues

## Implementation Details

### Key Changes

The implementation modifies two main components:

1. **Process Management** (`pkg/clusterapi/internal/process/process.go`)
2. **System Controller** (`pkg/clusterapi/system.go`)

### Process Management Modifications

The process management system has been modified to:

- Signal the whole process group (`syscall.Kill(-ps.Cmd.Process.Pid, syscall.SIGTERM)`) instead of only the direct child process; the negative PID targets every process in the group
- Add enhanced logging for process exit states
- Temporarily disable timeout-based process termination for debugging purposes

### Controller Execution Modifications

The controller execution system has been modified to:

- Launch `rr` in place of the CAPI provider binary for CPU binding and execution recording
- Add RR-specific flags for debugging:
  - `--wait`: Makes RR wait for all child processes to exit, not just the initial process
  - `--disable-avx-512`: Masks the AVX-512 CPUID bits for compatibility
  - `--bind-to-cpu=0`: Binds execution to CPU 0 for deterministic behavior (relevant for CPUs with P and E cores)

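
To make the flag list above concrete, the following sketch shows one way such a command line could be assembled. It assumes the flags are passed to the `rr record` subcommand with the provider binary as the recorded program; the helper name `rrRecordArgs`, the provider path, and the example flag are hypothetical, and the actual patch may build the command differently.

```go
package main

import (
	"fmt"
	"os/exec"
)

// rrRecordArgs returns the arguments for `rr record` using the flags
// listed in this document, followed by the program to record and its
// own arguments. (Hypothetical helper, not taken from the patch.)
func rrRecordArgs(binary string, args ...string) []string {
	rrArgs := []string{
		"record",
		"--wait",            // wait for all child processes to exit
		"--disable-avx-512", // mask the AVX-512 CPUID bits
		"--bind-to-cpu=0",   // pin the recording to CPU 0
		binary,
	}
	return append(rrArgs, args...)
}

func main() {
	// The provider path and its flag are placeholders for illustration.
	cmd := exec.Command("rr", rrRecordArgs("/path/to/provider", "--example-flag")...)
	fmt.Println(cmd.Args)
}
```

The program only prints the assembled command line; it does not start `rr`.
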
## Setup

### Installing RR

```bash
# On Fedora/RHEL/CentOS
sudo dnf install rr

# On Ubuntu/Debian
sudo apt install rr
```

### Installing Delve

```bash
go install github.com/go-delve/delve/cmd/dlv@latest
```

### Applying the Patch

Apply the RR debugging patch to enable RR recording:

```bash
# Apply the patch (patch file location: docs/dev/rr-debugging.patch)
git apply docs/dev/rr-debugging.patch
```

### Building the Installer

Build the installer with RR debugging enabled:

```bash
MODE=dev TAGS="release" ./hack/build.sh
```

## Usage

### Recording Execution

Before recording, set these kernel parameters so that `rr` can trace:

```bash
sudo sysctl kernel.perf_event_paranoid=-1
sudo sysctl kernel.kptr_restrict=0
```

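
As a quick pre-flight check, the current values of these parameters can be read back from `/proc/sys`. The sketch below is an illustration only: `parseSysctl` and `readSysctl` are hypothetical helpers, and the comparison is against the exact values set by the commands above rather than against any threshold that `rr` itself enforces.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSysctl converts the raw contents of a /proc/sys file to an int.
func parseSysctl(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// readSysctl reads an integer kernel parameter, given as a path relative
// to /proc/sys (for example "kernel/perf_event_paranoid").
func readSysctl(name string) (int, error) {
	raw, err := os.ReadFile("/proc/sys/" + name)
	if err != nil {
		return 0, err
	}
	return parseSysctl(string(raw))
}

func main() {
	// Values taken from the sysctl commands in this document.
	documented := map[string]int{
		"kernel/perf_event_paranoid": -1,
		"kernel/kptr_restrict":       0,
	}
	for name, want := range documented {
		got, err := readSysctl(name)
		if err != nil {
			fmt.Printf("%s: cannot read: %v\n", name, err)
			continue
		}
		fmt.Printf("%s = %d (documented value %d, match: %t)\n", name, got, want, got == want)
	}
}
```
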
When you run an installer built with RR debugging enabled, the Cluster API controllers are recorded automatically. The recording process:

1. Captures the nondeterministic inputs to the process, such as system call results, signals, and scheduling decisions
2. Stores the trace under `~/.local/share/rr`, with `latest-trace` pointing at the most recent recording
3. Maintains deterministic replay capability

### Replaying with Delve

To replay a recorded trace using Delve (dlv), use the following command:

```bash
dlv replay --listen=:2345 --headless=true --api-version=2 --accept-multiclient ~/.local/share/rr/latest-trace
```

**Command Options:**
- `--listen=:2345`: Listens on port 2345 for debugger connections
- `--headless=true`: Runs only the debug server, without an interactive terminal client
- `--api-version=2`: Uses Delve API version 2
- `--accept-multiclient`: Allows multiple debugger clients to connect
- `~/.local/share/rr/latest-trace`: Path to the recorded RR trace

### Connecting a Debugger

After starting the replay session, you can connect your preferred debugger:

- **VS Code**: Configure `launch.json` to connect to `localhost:2345`
- **GoLand/IntelliJ**: [Use the remote debugging configuration](https://www.jetbrains.com/help/go/attach-to-running-go-processes-with-debugger.html#attach-to-a-process-on-a-remote-machine)
- **Command line**: Use `dlv connect localhost:2345`

## References

- [RR Documentation](https://rr-project.org/)
- [Delve Documentation](https://github.com/go-delve/delve)
- [Cluster API Documentation](https://cluster-api.sigs.k8s.io/)