chore: condense all docs and markdown files
@@ -1,40 +1,34 @@
# Runbook: Auth OTP Failures

## Summary

Guide for diagnosing and mitigating OTP send or verify failures in phone-first authentication.

## Symptoms

- Users report not receiving OTP codes.
- `/api/auth/otp/request/` or `/api/auth/phone/request/` returns HTTP 500 or rate-limit errors.
- `/api/auth/otp/verify/` or `/api/auth/phone/verify/` returns invalid or expired OTP errors unexpectedly.
- Users do not receive OTP.
- `/api/auth/otp/request` or `/api/auth/phone/request` fails.
- `/api/auth/otp/verify` or `/api/auth/phone/verify` shows invalid/expired unexpectedly.

## Impact

- Users cannot sign in or complete phone verification.
- Booking and payment flows are blocked when auth is required.
Users cannot sign in/verify phone; booking/payment flows may block.

## Quick Checks
- Confirm `OTP_PROVIDER` in `backend/salon_api/settings.py`.
- Check OTP provider credentials in `backend/.env`.
- Check app logs for provider/timeouts/rate-limit errors.
- Validate OTP rate-limit settings:
  - `OTP_MAX_PER_WINDOW`
  - `OTP_WINDOW_MINUTES`
  - `OTP_RESEND_COOLDOWN_SECONDS`
  - `PHONE_AUTH_IP_MAX_PER_WINDOW`
  - `PHONE_AUTH_DEVICE_MAX_PER_WINDOW`

- Confirm the provider configured in `backend/salon_api/settings.py` via `OTP_PROVIDER`.
- Check recent application logs for OTP send errors.
- Verify provider credentials are present in `backend/.env` for the active provider.
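
The quick checks above could be scripted roughly as follows. This is a minimal sketch, not code from the repository. The setting names come from the runbook, but the provider value `"authentica"`, the credential variable `AUTHENTICA_API_KEY`, and the rule that every rate-limit value should be a positive integer are assumptions made for illustration.

```python
from typing import Mapping

# Setting names taken from the runbook's Quick Checks list.
RATE_LIMIT_SETTINGS = (
    "OTP_MAX_PER_WINDOW",
    "OTP_WINDOW_MINUTES",
    "OTP_RESEND_COOLDOWN_SECONDS",
    "PHONE_AUTH_IP_MAX_PER_WINDOW",
    "PHONE_AUTH_DEVICE_MAX_PER_WINDOW",
)

# ASSUMPTION: hypothetical provider keys and credential variable names.
# Replace them with whatever backend/.env actually defines.
REQUIRED_ENV = {
    "authentica": ("AUTHENTICA_API_KEY",),
    "console": (),
}


def quick_check(settings: Mapping[str, object], env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious."""
    problems = []

    provider = settings.get("OTP_PROVIDER")
    if provider not in REQUIRED_ENV:
        problems.append(f"OTP_PROVIDER is unset or unknown: {provider!r}")
    else:
        for name in REQUIRED_ENV[provider]:
            if not env.get(name):
                problems.append(f"missing credential in env: {name}")

    for name in RATE_LIMIT_SETTINGS:
        value = settings.get(name)
        # ASSUMPTION: non-positive or non-integer values are treated as suspicious.
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{name} should be a positive integer, got {value!r}")

    return problems
```

In a Django shell, one might pass a dict built from `django.conf.settings` and `os.environ`; an empty result only means the configuration looks plausible, not that the provider is reachable.
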
## Mitigation
1. Fix env/config mismatch; restart API.
2. If provider outage, use `console` only in non-prod.
3. If abuse spike/false positives, tune IP/device thresholds.
4. Verify server clock and `OTP_EXPIRY_MINUTES`.

## Mitigation Steps
## Escalation
- Roll back recent auth changes if correlated with deployment.
- Escalate to Authentica with request IDs + timestamps.

- If provider credentials are missing or invalid, fix the environment variables and restart the API process.
- If the provider is down, temporarily switch to `OTP_PROVIDER=console` for non-production environments and notify support.
- If rate limits are triggered, validate `OTP_MAX_PER_WINDOW`, `OTP_WINDOW_MINUTES`, and `OTP_RESEND_COOLDOWN_SECONDS` values and confirm client behavior is not retrying aggressively.
- For phone-login abuse spikes, also validate `PHONE_AUTH_IP_MAX_PER_WINDOW`, `PHONE_AUTH_DEVICE_MAX_PER_WINDOW`, and `PHONE_AUTH_RISK_WINDOW_MINUTES`.
- If verification is failing, confirm server time is correct and `OTP_EXPIRY_MINUTES` is appropriate.
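
When reasoning about the rate-limit and expiry settings named above, it can help to model how they might interact. The sketch below is illustrative only and is not the logic in `backend/apps/accounts/services/otp.py`. It assumes a per-phone sliding window, a resend cooldown measured from the most recent send, and expiry measured from the issue time; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def send_allowed(sent_at, now, max_per_window, window_minutes, cooldown_seconds):
    """Return (allowed, reason) for one phone number.

    sent_at is a list of timezone-aware datetimes of earlier OTP sends.
    The parameters mirror OTP_MAX_PER_WINDOW, OTP_WINDOW_MINUTES and
    OTP_RESEND_COOLDOWN_SECONDS (ASSUMPTION: this is how they combine).
    """
    last = max(sent_at, default=None)
    if last is not None and (now - last).total_seconds() < cooldown_seconds:
        return False, "cooldown"

    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in sent_at if t > window_start]
    if len(recent) >= max_per_window:
        return False, "window"

    return True, "ok"


def otp_expired(issued_at, now, expiry_minutes):
    """True when the code is older than OTP_EXPIRY_MINUTES as seen by `now`."""
    return now - issued_at > timedelta(minutes=expiry_minutes)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sends = [now - timedelta(minutes=m) for m in (9, 5, 2)]
    print(send_allowed(sends, now, 3, 10, 60))

    # A server clock running 4 minutes fast makes a 2-minute-old code look
    # 6 minutes old, so it is rejected under a 5-minute expiry.
    issued = now - timedelta(minutes=2)
    print(otp_expired(issued, now, 5), otp_expired(issued, now + timedelta(minutes=4), 5))
```

The second half of the demo shows why the runbook pairs the server-clock check with `OTP_EXPIRY_MINUTES`: under this model, clock skew shortens the effective lifetime of a code without any setting having changed.
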

## Rollback / Escalation

- Roll back recent auth/OTP changes if the failure coincides with a deployment.
- Escalate to the provider (Authentica) with request IDs and timestamps if external API errors persist.

## Notes

- Authentica is the primary OTP provider for MVP; console provider is for local development.
- OTP send/verify logic lives in `backend/apps/accounts/services/otp.py`.
## References
- OTP logic: `backend/apps/accounts/services/otp.py`
- Risks: `docs/risks.md`