How to plan database migration rollback in production with compatibility windows, forward fixes, backfill recovery, and clear stop points before destructive schema changes.
How database connection pool exhaustion happens under production load, how to distinguish pool wait from slow SQL, and how to size pools across instances without overloading PostgreSQL.
How retry budgets keep microservice retries useful without letting clients amplify overload, including per-request limits, client retry ratios, token buckets, retry metadata, and production metrics.
Observability vs. logging in production: a practical guide to which debugging questions logs, metrics, traces, and correlation IDs each answer.
A practical OpenTelemetry guide for backend engineers: what to instrument first, and how traces, metrics, logs, context propagation, attributes, sampling, and collectors make production debugging clearer.