How trace sampling affects production debugging, covering head sampling, parent-based sampling, and tail sampling trade-offs; the handling of error traces, rare paths, and async work; and metrics that show whether sampling is hiding incidents.
Observability vs logging in production, with a practical guide to which debugging questions logs, metrics, traces, and correlation IDs each answer.
How to run background jobs safely in production with replay-safe handlers, bounded retries, dead-letter triage, visibility timeouts, queue dashboards, and business-level correctness checks.
Why request timeouts limit waiting but do not stop cascading failures unless they are paired with admission control, bounded queues, backpressure, and load shedding.
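
As a rough illustration of the last point, here is a minimal sketch in which a timeout is paired with a bounded queue and load shedding. It assumes a single in-process queue; the names `Request`, `AdmissionController`, `submit`, and `take` are hypothetical and chosen only for this example. The bounded queue rejects new work once it is full (admission control and load shedding), and the worker side discards requests whose deadline has already passed, so capacity is not spent on answers nobody is waiting for.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    deadline: float  # absolute time after which the caller has given up


class AdmissionController:
    """Hypothetical sketch: bounded queue plus load shedding."""

    def __init__(self, max_queued: int) -> None:
        # A bounded queue caps how much work can pile up behind slow workers.
        self._pending = queue.Queue(maxsize=max_queued)
        self.shed_count = 0     # requests rejected at admission
        self.expired_count = 0  # requests dropped because their deadline passed

    def submit(self, request: Request) -> bool:
        """Admit the request if there is room; otherwise shed it immediately."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            self.shed_count += 1
            return False

    def take(self, now: float) -> Optional[Request]:
        """Return the next request that is still worth serving, or None."""
        while True:
            try:
                request = self._pending.get_nowait()
            except queue.Empty:
                return None
            if request.deadline > now:
                return request
            # The caller has already timed out, so skip the work entirely.
            self.expired_count += 1


controller = AdmissionController(max_queued=2)
controller.submit(Request("a", deadline=1.0))  # admitted
controller.submit(Request("b", deadline=5.0))  # admitted
controller.submit(Request("c", deadline=5.0))  # shed: the queue is full
served = controller.take(now=2.0)              # "a" has expired, "b" is served
```

A timeout alone would only stop the caller of "a" from waiting; without the bound and the deadline check, the server would still queue and process every request. The fast rejection in `submit` also acts as a backpressure signal to the caller.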