Fleshed out documentation
@@ -0,0 +1,9 @@
# Runbooks

Operational procedures live here. Every new production-impacting workflow should come with a new or updated runbook.

Existing runbooks:

- `docs/runbooks/auth_otp_failures.md`
- `docs/runbooks/booking_failures.md`
- `docs/runbooks/payments_sanity_check.md`
@@ -0,0 +1,39 @@
# Runbook: Auth OTP Failures

## Summary

Guide for diagnosing and mitigating OTP send or verify failures in phone-first authentication.

## Symptoms

- Users report not receiving OTP codes.
- `/api/auth/otp/request/` or `/api/auth/phone/request/` returns HTTP 500 or rate-limit errors.
- `/api/auth/otp/verify/` or `/api/auth/phone/verify/` returns invalid or expired OTP errors unexpectedly.

## Impact

- Users cannot sign in or complete phone verification.
- Booking and payment flows are blocked when auth is required.

## Quick Checks

- Confirm which provider is configured via `OTP_PROVIDER` in `backend/salon_api/settings.py`.
- Check recent application logs for OTP send errors.
- Verify provider credentials are present in `backend/.env` for the active provider.
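
The credential check above can be sketched as a small helper that reports which variables are unset for the active provider. The variable names `AUTHENTICA_API_KEY` and `AUTHENTICA_BASE_URL`, and the `REQUIRED_VARS` mapping itself, are placeholders assumed for illustration; substitute whatever `backend/salon_api/settings.py` actually reads.

```python
from typing import List, Mapping

# Hypothetical mapping of provider name -> environment variables it needs.
# These names are illustrative assumptions, not taken from the project.
REQUIRED_VARS = {
    "authentica": ["AUTHENTICA_API_KEY", "AUTHENTICA_BASE_URL"],
    "console": [],  # assumed: the console provider needs no credentials
}


def missing_otp_credentials(env: Mapping[str, str]) -> List[str]:
    """Return required variables that are unset or blank for the active provider."""
    provider = env.get("OTP_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"Unknown or unset OTP_PROVIDER: {provider!r}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name, "").strip()]
```

Calling `missing_otp_credentials(os.environ)` from a shell inside the API process environment (after `import os`) would list anything that still needs to be set; an empty list means every assumed variable is present.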

## Mitigation Steps

- If provider credentials are missing or invalid, fix the environment variables and restart the API process.
- If the provider is down, temporarily switch to `OTP_PROVIDER=console` for non-production environments and notify support.
- If rate limits are triggered, review the `OTP_MAX_PER_WINDOW`, `OTP_WINDOW_MINUTES`, and `OTP_RESEND_COOLDOWN_SECONDS` values and confirm that clients are not retrying aggressively.
- If verification is failing, confirm the server clock is correct and `OTP_EXPIRY_MINUTES` is appropriate.
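
To make the interaction between the four settings easier to reason about, here is a minimal sketch of one plausible model: a resend cooldown, a sliding send window, and a fixed expiry. It is an assumption for illustration, not the logic in `backend/apps/accounts/services/otp.py`; the function names and the numeric values are invented.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Illustrative values only; the real ones come from the project settings.
OTP_MAX_PER_WINDOW = 5
OTP_WINDOW_MINUTES = 15
OTP_RESEND_COOLDOWN_SECONDS = 60
OTP_EXPIRY_MINUTES = 5


def send_block_reason(sent_at: List[datetime], now: datetime) -> Optional[str]:
    """Return why a new send would be refused, or None if it is allowed."""
    # Cooldown: the most recent send must be old enough.
    if sent_at and now - max(sent_at) < timedelta(seconds=OTP_RESEND_COOLDOWN_SECONDS):
        return "cooldown"
    # Window limit: count sends inside the trailing window.
    window_start = now - timedelta(minutes=OTP_WINDOW_MINUTES)
    recent = [t for t in sent_at if t > window_start]
    if len(recent) >= OTP_MAX_PER_WINDOW:
        return "window_limit"
    return None


def is_expired(issued_at: datetime, now: datetime) -> bool:
    """True once the code is older than OTP_EXPIRY_MINUTES."""
    return now - issued_at > timedelta(minutes=OTP_EXPIRY_MINUTES)
```

Under this model, a server clock that runs ten minutes fast makes every freshly issued code look expired, since `is_expired(issued_at, issued_at + timedelta(minutes=10))` is true with a five-minute expiry. That is why the clock check comes before tuning `OTP_EXPIRY_MINUTES`.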

## Rollback / Escalation

- Roll back recent auth/OTP changes if the failure coincides with a deployment.
- Escalate to the provider (Authentica) with request IDs and timestamps if external API errors persist.

## Notes

- Authentica is the primary OTP provider for MVP; the console provider is for local development.
- OTP send/verify logic lives in `backend/apps/accounts/services/otp.py`.
@@ -0,0 +1,40 @@
# Runbook: Booking Failures

## Summary

Guide for diagnosing booking creation or status update failures (availability, overlap prevention, or validation errors).

## Symptoms

- `POST /api/bookings/` returns HTTP 400 or 500.
- `PATCH /api/bookings/<id>/` fails when confirming or cancelling.
- Users report bookings not appearing or showing an incorrect status.

## Impact

- Customers cannot place bookings.
- Staff schedules become inconsistent.
- Notification and payment flows may not trigger.

## Quick Checks

- Confirm the request payload includes a valid `service`, `staff`, and scheduled time.
- Check server logs for booking validation errors or integrity exceptions.
- Verify that staff availability and overlap prevention rules are behaving as expected.
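
When checking overlap prevention by hand, it helps to have a reference predicate to compare against. The sketch below assumes bookings are half-open intervals `[start, end)`, so back-to-back slots do not conflict; whether the code in `backend/apps/bookings/` uses the same convention is not stated here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def overlaps(start_a: datetime, end_a: datetime, start_b: datetime, end_b: datetime) -> bool:
    """True when two half-open intervals [start, end) share any time."""
    return start_a < end_b and start_b < end_a


# Example slots for one staff member.
ten = datetime(2024, 1, 1, 10, 0)
hour = timedelta(hours=1)

back_to_back = overlaps(ten, ten + hour, ten + hour, ten + 2 * hour)      # 10-11 vs 11-12
partial = overlaps(ten, ten + hour, ten + hour / 2, ten + 2 * hour)       # 10-11 vs 10:30-12
```

If a rejected booking would pass this predicate against every existing booking for the same staff member, the rejection is more likely caused by availability rules or time handling than by a genuine overlap.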

## Mitigation Steps

- Reproduce with a known test user and staff member to isolate data issues.
- If overlap rules are too strict, review booking validation logic and confirm time zone assumptions.
- If status updates are blocked, verify role checks and serializer permissions in `backend/apps/bookings/`.
- If notifications are expected but missing, confirm `NOTIFICATION_PROVIDER` configuration and notification records.
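
A common way for overlap rules to look too strict is a naive timestamp being interpreted in the wrong zone. The sketch below shows the effect with a fixed UTC+3 offset, which is an assumption chosen for illustration; the project's real time zone configuration is not stated in this runbook, and `to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed local offset for illustration only (UTC+3).
LOCAL_TZ = timezone(timedelta(hours=3))


def to_utc(value: datetime, assumed_tz: timezone = LOCAL_TZ) -> datetime:
    """Normalize a datetime to UTC, treating naive values as assumed_tz."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=assumed_tz)
    return value.astimezone(timezone.utc)


client_naive = datetime(2024, 1, 1, 10, 0)                      # "10:00" sent without an offset
existing_utc = datetime(2024, 1, 1, 7, 0, tzinfo=timezone.utc)  # stored booking start

# The same payload lands on different slots depending on the assumed zone.
same_slot_if_local = to_utc(client_naive) == existing_utc
same_slot_if_utc = to_utc(client_naive, timezone.utc) == existing_utc
```

Here the request collides with the stored booking only when the naive value is read as local time. When reproducing a failure, compare the raw payload timestamp with the stored value after normalizing both to UTC.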

## Rollback / Escalation

- Roll back recent booking-related changes if failures started after a deployment.
- Escalate to engineering with the booking ID, user ID, and timestamps.

## Notes

- Booking validation and status transitions live in `backend/apps/bookings/`.
- Notifications for the booking lifecycle are handled in `backend/apps/notifications/`.