chore: condense all docs and markdown files
@@ -1,9 +1,8 @@
# Runbooks

Operational procedures live here. Each new production-impacting workflow should add or update a runbook.

Existing runbooks:
# Runbooks Index

Runbooks for production-impacting flows:
- `docs/runbooks/auth_otp_failures.md`
- `docs/runbooks/booking_failures.md`
- `docs/runbooks/payments_sanity_check.md`

Rule: if a new production flow is added, add or update a runbook in same change.

@@ -1,40 +1,34 @@
# Runbook: Auth OTP Failures

## Summary

Guide for diagnosing and mitigating OTP send or verify failures in phone-first authentication.

## Symptoms

- Users report not receiving OTP codes.
- `/api/auth/otp/request/` or `/api/auth/phone/request/` returns HTTP 500 or rate-limit errors.
- `/api/auth/otp/verify/` or `/api/auth/phone/verify/` returns invalid or expired OTP errors unexpectedly.
- Users do not receive OTP.
- `/api/auth/otp/request` or `/api/auth/phone/request` fails.
- `/api/auth/otp/verify` or `/api/auth/phone/verify` shows invalid/expired unexpectedly.

## Impact

- Users cannot sign in or complete phone verification.
- Booking and payment flows are blocked when auth is required.

Users cannot sign in/verify phone; booking/payment flows may block.

## Quick Checks
- Confirm `OTP_PROVIDER` in `backend/salon_api/settings.py`.
- Check OTP provider credentials in `backend/.env`.
- Check app logs for provider/timeouts/rate-limit errors.
- Validate OTP rate-limit settings:
  - `OTP_MAX_PER_WINDOW`
  - `OTP_WINDOW_MINUTES`
  - `OTP_RESEND_COOLDOWN_SECONDS`
  - `PHONE_AUTH_IP_MAX_PER_WINDOW`
  - `PHONE_AUTH_DEVICE_MAX_PER_WINDOW`

- Confirm the provider configured in `backend/salon_api/settings.py` via `OTP_PROVIDER`.
- Check recent application logs for OTP send errors.
- Verify provider credentials are present in `backend/.env` for the active provider.
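
To make the rate-limit settings listed above concrete, here is a minimal Python sketch of how a per-phone window cap combined with a resend cooldown could be evaluated. It is not the logic in `backend/apps/accounts/services/otp.py`: the function name `check_otp_request`, the placeholder values, and the in-memory list of timestamps are assumptions made for illustration. Only the three setting names come from the runbook.

```python
from datetime import datetime, timedelta, timezone

# Setting names come from the runbook; the values are placeholders.
OTP_MAX_PER_WINDOW = 5
OTP_WINDOW_MINUTES = 15
OTP_RESEND_COOLDOWN_SECONDS = 60


def check_otp_request(previous_sends, now):
    """Return (allowed, reason) for a new OTP send request.

    previous_sends: timezone-aware datetimes of earlier sends for one phone.
    Hypothetical helper; the real service may store and count differently.
    """
    window_start = now - timedelta(minutes=OTP_WINDOW_MINUTES)
    recent = [t for t in previous_sends if t > window_start]

    # Cooldown: block a resend that follows the latest send too closely.
    if recent and (now - max(recent)).total_seconds() < OTP_RESEND_COOLDOWN_SECONDS:
        return False, "cooldown"

    # Window cap: block once the number of recent sends reaches the limit.
    if len(recent) >= OTP_MAX_PER_WINDOW:
        return False, "window_limit"

    return True, "ok"
```

Under this model, a client that retries aggressively shows up as repeated `cooldown` results, while ordinary users hitting `window_limit` would suggest the cap is too low for the configured window.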
## Mitigation
1. Fix env/config mismatch; restart API.
2. If provider outage, use `console` only in non-prod.
3. If abuse spike/false positives, tune IP/device thresholds.
4. Verify server clock and `OTP_EXPIRY_MINUTES`.

## Mitigation Steps
## Escalation
- Roll back recent auth changes if correlated with deployment.
- Escalate to Authentica with request IDs + timestamps.

- If provider credentials are missing or invalid, fix the environment variables and restart the API process.
- If the provider is down, temporarily switch to `OTP_PROVIDER=console` for non-production environments and notify support.
- If rate limits are triggered, validate `OTP_MAX_PER_WINDOW`, `OTP_WINDOW_MINUTES`, and `OTP_RESEND_COOLDOWN_SECONDS` values and confirm client behavior is not retrying aggressively.
- For phone-login abuse spikes, also validate `PHONE_AUTH_IP_MAX_PER_WINDOW`, `PHONE_AUTH_DEVICE_MAX_PER_WINDOW`, and `PHONE_AUTH_RISK_WINDOW_MINUTES`.
- If verification is failing, confirm server time is correct and `OTP_EXPIRY_MINUTES` is appropriate.
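
The dependency on server time in the last step can be illustrated with a small sketch. It assumes an OTP is valid for `OTP_EXPIRY_MINUTES` after its creation timestamp; `is_otp_expired` and the 5-minute value are hypothetical and not taken from the codebase.

```python
from datetime import datetime, timedelta, timezone

OTP_EXPIRY_MINUTES = 5  # placeholder value; check settings.py for the real one


def is_otp_expired(created_at, now):
    """Hypothetical expiry check: expired once the validity period has elapsed."""
    return now >= created_at + timedelta(minutes=OTP_EXPIRY_MINUTES)


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
accurate_now = created + timedelta(minutes=2)
skewed_now = accurate_now + timedelta(minutes=4)  # server clock running 4 minutes fast

print(is_otp_expired(created, accurate_now))  # False
print(is_otp_expired(created, skewed_now))    # True
```

A code entered two minutes after it was sent is rejected as expired when the verifying server's clock runs four minutes fast, which is why unexpected "expired" errors are worth checking against clock drift before changing the expiry setting.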

## Rollback / Escalation

- Roll back recent auth/OTP changes if the failure coincides with a deployment.
- Escalate to the provider (Authentica) with request IDs and timestamps if external API errors persist.

## Notes

- Authentica is the primary OTP provider for MVP; console provider is for local development.
- OTP send/verify logic lives in `backend/apps/accounts/services/otp.py`.
## References
- OTP logic: `backend/apps/accounts/services/otp.py`
- Risks: `docs/risks.md`

@@ -1,40 +1,28 @@
# Runbook: Booking Failures

## Summary

Guide for diagnosing booking creation or status update failures (availability, overlap prevention, or validation errors).

## Symptoms

- `POST /api/bookings/` returns HTTP 400 or 500.
- `PATCH /api/bookings/<id>/` fails when confirming or cancelling.
- Users report bookings not appearing or incorrect status.
- `POST /api/bookings/` fails (400/500).
- Booking status update fails.
- Booking missing/incorrect in listing.

## Impact

- Customers cannot place bookings.
- Staff schedules become inconsistent.
- Notification and payment flows may not trigger.

Customers cannot book; staff schedule integrity degrades; dependent flows break.

## Quick Checks
- Validate payload: `service`, `staff`, `start_time`, `end_time`.
- Check logs for validation/integrity errors.
- Confirm staff availability + overlap expectations.
- If notifications expected, confirm provider config + notification rows.

- Confirm the request payload includes a valid `service`, `staff`, and scheduled time.
- Check server logs for booking validation errors or integrity exceptions.
- Verify that staff availability and overlap prevention rules are behaving as expected.
## Mitigation
1. Reproduce with known test data.
2. Inspect booking validation service and serializer permissions.
3. Confirm timezone assumptions for failing case.
4. If regression after deploy, roll back booking-related change.

## Mitigation Steps
## Escalation
Share booking id, user id, timestamps, and failing payload/response with engineering.

- Reproduce with a known test user and staff member to isolate data issues.
- If overlap rules are too strict, review booking validation logic and confirm time zone assumptions.
- If status updates are blocked, verify role checks and serializer permissions in `backend/apps/bookings/`.
- If notifications are expected but missing, confirm `NOTIFICATION_PROVIDER` configuration and notification records.
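
For the overlap and time zone checks above, a minimal sketch of a half-open interval comparison on timezone-aware datetimes may help when reproducing a failing case. It is an assumption about how overlap could be defined, not the validation code in `backend/apps/bookings/`; `bookings_overlap` is a hypothetical name and the UTC+3 fixed offset is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def bookings_overlap(start_a, end_a, start_b, end_b):
    """Half-open intervals [start, end): back-to-back bookings do not overlap."""
    return start_a < end_b and start_b < end_a


UTC_PLUS_3 = timezone(timedelta(hours=3))  # fixed offset used only for this example

# Existing booking: 10:00-11:00 local time, which is 07:00-08:00 UTC.
existing_start = datetime(2024, 1, 1, 10, 0, tzinfo=UTC_PLUS_3)
existing_end = datetime(2024, 1, 1, 11, 0, tzinfo=UTC_PLUS_3)

# Same clock digits sent as UTC: 10:00-11:00 UTC is 13:00-14:00 local, no clash.
utc_start = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
utc_end = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)

# 07:30-08:30 UTC is 10:30-11:30 local, which clashes with the existing booking.
clash_start = datetime(2024, 1, 1, 7, 30, tzinfo=timezone.utc)
clash_end = datetime(2024, 1, 1, 8, 30, tzinfo=timezone.utc)
```

If a payload's times are interpreted in a different zone than the one the client intended, the same request can flip between "free" and "overlapping", so it is worth logging the parsed, offset-qualified values of `start_time` and `end_time` for the failing case. Python raises `TypeError` when a naive datetime is compared with an aware one, which is another symptom to look for in the logs.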

## Rollback / Escalation

- Roll back recent booking-related changes if failures started after a deployment.
- Escalate to engineering with the booking ID, user ID, and timestamps.

## Notes

- Booking validation and status transitions live in `backend/apps/bookings/`.
- Notifications for booking lifecycle are handled in `backend/apps/notifications/`.
## References
- Booking logic: `backend/apps/bookings/`
- Notification logic: `backend/apps/notifications/`

@@ -1,136 +1,37 @@
# Payments Sanity Check (Moyasar Mock + Demo Data)
# Runbook: Payments Sanity Check (Local Mock)

This runbook documents the end-to-end sanity check for the Moyasar payments flow using demo data and a local mock provider. It is intended for developers and agents validating payment creation + webhook reconciliation before merging to `main`.

## Purpose

Verify that the payment creation endpoint and webhook processing work end-to-end in a local environment without hitting Moyasar.

Validate payment create + webhook reconciliation without hitting Moyasar.

## Preconditions

- Backend dependencies installed in the Python venv.
- Frontend is not required for this check.
- `backend/` database is migrated and uses SQLite for local dev.

## High-level Flow

1. Start a local mock Moyasar server (HTTP) that emulates `/v1/payments` responses.
2. Run migrations and seed demo data.
3. Start Django with a local payment configuration pointing to the mock server.
4. Obtain a JWT access token for the demo customer.
5. Create a payment for an existing booking.
6. Send a webhook payload to mark it as paid.
7. Verify the payment status updates.
- Venv + backend deps installed.
- DB migrated.
- Run from repo root unless noted.

## Steps
1. Start local mock server on `127.0.0.1:8001` exposing `POST /v1/payments`.
2. Seed data:
   - `source venv/bin/activate`
   - `cd backend`
   - `python3 manage.py migrate`
   - `python3 manage.py seed_demo`
3. Run API with mock settings:
   - `DJANGO_DEBUG=1 MOYASAR_SECRET_KEY=sk_test MOYASAR_PUBLISHABLE_KEY=pk_test MOYASAR_BASE_URL=http://127.0.0.1:8001 MOYASAR_WEBHOOK_SECRET=whsec python3 manage.py runserver 8000`
4. Generate JWT in shell (demo user) and store as `<ACCESS>`.
5. Create payment:
   - `POST /api/payments/` with `booking_id`, `provider=moyasar`, `idempotency_key`, valid source.
6. Send paid webhook:
   - `POST /api/payments/webhook/` with `{"type":"payment_paid","secret_token":"whsec","data":{"id":"<external_id>"}}`
7. Verify `GET /api/payments/` shows status `paid` and `paid_at` set.

### 1) Start the mock Moyasar server
## Expected Results
- Create payment returns `status=initiated` + provider `external_id` + `redirect_url`.
- Webhook returns `{"detail":"Webhook processed"}`.
- Payment transitions to `paid` idempotently.

The mock server responds to `POST /v1/payments` with a static `id` and `transaction_url`.

Create the mock server at `/tmp/moyasar_mock.py` and run it:

    python3 /tmp/moyasar_mock.py

Expected: the process stays running, listening on `http://127.0.0.1:8001`.
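
The contents of the mock script are not shown in this diff. As a rough sketch of what such a server could look like using only the Python standard library: the static `id` and `transaction_url` values are taken from the expected results in this runbook, while the `status` field, the `201` response code, and the placement of `transaction_url` (returned both at the top level and under `source`, since where the backend reads it from is not stated here) are assumptions. `MockMoyasarHandler` and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Static values from the runbook's expected results; the overall shape is assumed.
MOCK_PAYMENT = {
    "id": "pay_mock_123",
    "status": "initiated",
    "transaction_url": "https://moyasar.example/tx/mock",
    "source": {"transaction_url": "https://moyasar.example/tx/mock"},
}


class MockMoyasarHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and discard the request body so the connection is left clean.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)

        if self.path.rstrip("/") != "/v1/payments":
            self.send_error(404, "Unknown path")
            return

        body = json.dumps(MOCK_PAYMENT).encode("utf-8")
        self.send_response(201)  # assumed status code for a created payment
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8001):
    """Build the mock server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), MockMoyasarHandler)
```

To use it as the step describes, save it as `/tmp/moyasar_mock.py` and add `make_server().serve_forever()` as the last line, so the process stays in the foreground until stopped with `Ctrl+C`.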

### 2) Run migrations and seed demo data

    source venv/bin/activate
    cd backend
    python3 manage.py migrate
    python3 manage.py seed_demo

Expected: `Demo data seeded.`

### 3) Start Django with the mock provider

Run the backend with environment variables pointing to the mock server:

    DJANGO_DEBUG=1 \
    MOYASAR_SECRET_KEY=sk_test \
    MOYASAR_PUBLISHABLE_KEY=pk_test \
    MOYASAR_BASE_URL=http://127.0.0.1:8001 \
    MOYASAR_WEBHOOK_SECRET=whsec \
    python3 manage.py runserver 8000

Expected: server starts at `http://127.0.0.1:8000/`.

### 4) Obtain a JWT access token

Password token login at `/api/auth/token/` is deprecated for phone-first auth. For this runbook, mint a local JWT in Django shell.

The demo customer is:

- `customer@example.com`
- `Customer123!`

Generate an access token:

    python3 manage.py shell -c "from django.contrib.auth import get_user_model; from rest_framework_simplejwt.tokens import RefreshToken; u=get_user_model().objects.get(email='customer@example.com'); print(str(RefreshToken.for_user(u).access_token))"

Expected: a JWT string printed to stdout. Use it as `<ACCESS>`.

### 5) Create a payment

Pick a booking (demo data creates bookings; you can list them):

    curl -s -H "Authorization: Bearer <ACCESS>" http://127.0.0.1:8000/api/bookings/

Then create a payment (example uses booking id `3`):

    curl -s -X POST http://127.0.0.1:8000/api/payments/ \
      -H "Authorization: Bearer <ACCESS>" \
      -H "Content-Type: application/json" \
      -d '{
        "booking_id": 3,
        "provider": "moyasar",
        "idempotency_key": "<UUID>",
        "source": {"type": "stcpay", "mobile": "0500000000"}
      }'

Expected: response includes:

- `status: initiated`
- `external_id: pay_mock_123`
- `redirect_url: https://moyasar.example/tx/mock`
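
The `<UUID>` placeholder can be filled with any freshly generated UUID. As a rough Python equivalent of the `curl` call in this step, assuming only the endpoint and fields shown above: `build_payment_request` and `API_BASE` are hypothetical names, and the access token still has to come from step 4.

```python
import json
import urllib.request
import uuid

API_BASE = "http://127.0.0.1:8000"


def build_payment_request(access_token, booking_id, idempotency_key=None):
    """Build (but do not send) the POST /api/payments/ request from step 5."""
    payload = {
        "booking_id": booking_id,
        "provider": "moyasar",
        # A fresh UUID per logical payment; pass the same key again to
        # exercise the idempotency behavior.
        "idempotency_key": idempotency_key or str(uuid.uuid4()),
        "source": {"type": "stcpay", "mobile": "0500000000"},
    }
    return urllib.request.Request(
        f"{API_BASE}/api/payments/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_payment_request(token, 3))` and decoding the JSON response. Keeping the request construction separate from the send makes it easy to repeat the call with an explicit `idempotency_key`.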

### 6) Send webhook for paid state

    curl -s -X POST http://127.0.0.1:8000/api/payments/webhook/ \
      -H "Content-Type: application/json" \
      -d '{"type":"payment_paid","secret_token":"whsec","data":{"id":"pay_mock_123"}}'

Expected: `{ "detail": "Webhook processed" }`

### 7) Verify payment state

    curl -s -H "Authorization: Bearer <ACCESS>" http://127.0.0.1:8000/api/payments/

Expected: payment record shows:

- `status: paid`
- `paid_at` set
- `metadata.last_webhook` populated
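
The three checks above can be scripted once the payment records have been pulled out of the `GET /api/payments/` response. The sketch assumes each record is a JSON object with `external_id`, `status`, `paid_at`, and `metadata` keys; the exact response envelope (for example, whether it is paginated) is not shown in this runbook, so the hypothetical `find_paid_problems` helper works on an already extracted list.

```python
def find_paid_problems(payments, external_id):
    """Return a list of problems for the payment with the given external_id."""
    matches = [p for p in payments if p.get("external_id") == external_id]
    if not matches:
        return [f"no payment with external_id={external_id}"]

    payment = matches[0]
    problems = []
    if payment.get("status") != "paid":
        problems.append(f"status is {payment.get('status')!r}, expected 'paid'")
    if not payment.get("paid_at"):
        problems.append("paid_at is not set")
    # metadata may be missing or null before the first webhook arrives.
    if not (payment.get("metadata") or {}).get("last_webhook"):
        problems.append("metadata.last_webhook is empty")
    return problems
```

An empty list means the record matches the expectations of this step; anything else names the field that did not update.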

## Considerations and Edge Cases

- **Webhook secret**: `MOYASAR_WEBHOOK_SECRET` must be set. Requests missing or mismatching `secret_token` return `401`.
- **Idempotency**: reuse the same `idempotency_key` to verify the API returns the existing payment without creating another provider charge.
- **Unsupported sources**: `creditcard` is rejected by the backend. Use `stcpay`, `token`, or `applepay`.
- **Callback URL**: required for `token` payments; otherwise validation fails.
- **Demo data**: `seed_demo` creates a payment with `external_id=None` (not empty string) to avoid violating unique constraints.
- **Debug mode**: `DJANGO_DEBUG=1` is required for local `runserver` if `ALLOWED_HOSTS` is not set.
- **JWT warnings**: short JWT secret keys can trigger warnings in logs; this is acceptable for local sanity checks but should be hardened in production.
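
The webhook secret rule in the first item can be pictured as follows. This is an assumed shape, not the project's view code: `webhook_status` is a hypothetical function, treating an unset secret as `401` is a guess, and the constant-time comparison via `hmac.compare_digest` is common practice rather than something this runbook confirms the backend does.

```python
import hmac


def webhook_status(payload, configured_secret):
    """Return the HTTP status a webhook request would get under the rule above."""
    token = payload.get("secret_token")

    # Missing secret on either side: reject (assumed behavior for an unset secret).
    if not configured_secret or not isinstance(token, str):
        return 401

    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(token.encode("utf-8"), configured_secret.encode("utf-8")):
        return 401

    return 200
```

With the values used in this runbook, a payload carrying `"secret_token": "whsec"` passes when `MOYASAR_WEBHOOK_SECRET=whsec`, and any other token, or none at all, is rejected.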

## What to Look For

- Payment creation returns `external_id` from the mock server.
- Webhook transitions the payment to `paid` and populates `paid_at`.
- `metadata.last_webhook` persists the payload for audit.
## Edge Checks
- Wrong/missing webhook secret -> `401`.
- Reused idempotency key -> same payment reused, no duplicate charge.
- Unsupported sources rejected by validation.

## Cleanup

- Stop the Django server (`Ctrl+C`).
- Stop the mock server (`Ctrl+C`).
- Optionally delete `/tmp/moyasar_mock.py`.

Stop Django + mock processes.
