From aa607b9b6e3873b85260e896daaef37d91af864a Mon Sep 17 00:00:00 2001
From: mohammad
Date: Sat, 28 Feb 2026 17:41:00 +0300
Subject: [PATCH] Fleshed out documentation

---
 README.md                                  |  1 +
 docs/README.md                             | 43 ++++++++++++----
 .../0001-synchronous-external-calls-mvp.md | 28 ++++++++++
 docs/adr/0002-moyasar-payment-gateway.md   | 30 +++++++++++
 docs/adr/0003-authentica-otp-provider.md   | 30 +++++++++++
 docs/adr/README.md                         |  5 ++
 docs/architecture.md                       | 11 ++++
 docs/documentation.md                      | 51 +++++++++++++++++++
 docs/runbooks/README.md                    |  9 ++++
 docs/runbooks/auth_otp_failures.md         | 39 ++++++++++++++
 docs/runbooks/booking_failures.md          | 40 +++++++++++++++
 docs/templates/adr.md                      | 25 +++++++++
 docs/templates/runbook.md                  | 29 +++++++++++
 13 files changed, 332 insertions(+), 9 deletions(-)
 create mode 100644 docs/adr/0001-synchronous-external-calls-mvp.md
 create mode 100644 docs/adr/0002-moyasar-payment-gateway.md
 create mode 100644 docs/adr/0003-authentica-otp-provider.md
 create mode 100644 docs/adr/README.md
 create mode 100644 docs/documentation.md
 create mode 100644 docs/runbooks/README.md
 create mode 100644 docs/runbooks/auth_otp_failures.md
 create mode 100644 docs/runbooks/booking_failures.md
 create mode 100644 docs/templates/adr.md
 create mode 100644 docs/templates/runbook.md

diff --git a/README.md b/README.md
index e2cc029..3c78ac0 100644
--- a/README.md
+++ b/README.md
@@ -64,3 +64,4 @@ The dev server proxies `/api` to `http://localhost:8000`.
 
 - Known gaps and risks: `docs/risks.md`
 - Architecture and async/observability decisions: `docs/architecture.md`
+- Documentation index and standards: `docs/README.md` and `docs/documentation.md`
diff --git a/docs/README.md b/docs/README.md
index 1ebdb2f..806cd6e 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,11 +1,36 @@
-# Docs Notes (MVP Alignment)
+# Documentation Index
 
-## High-Level Takeaways
-- The MVP roadmap aligns with Phase 1 goals but needs tighter documentation around provider readiness and async strategy.
-- ExecPlan references drift between `AGENTS.md` and `PLANS.md` should be resolved to avoid conflicting guidance.
-- Observability and operational visibility are thin; errors are stored but not surfaced through clear runbooks/dashboards.
+This directory is the source of truth for product, engineering, and ops documentation. Keep it current as features change.
 
-## Near-Term Focus
-- Make ExecPlan references consistent and keep active plans clearly labeled.
-- Document whether MVP uses async jobs (and which system) or remains synchronous with strict timeouts.
-- Keep `docs/risks.md` current as gaps are closed.
+## Start Here
+
+- Project overview and setup: `README.md` (repo root)
+- Architecture overview: `docs/architecture.md`
+- Active ExecPlan: `docs/execplans/booking-notifications.md`
+- Known risks and gaps: `docs/risks.md`
+
+## Documentation Standards
+
+See `docs/documentation.md` for documentation goals, update triggers, and templates.
+
+## Docs Map
+
+- `docs/architecture.md`: System architecture, boundaries, and MVP async/observability decision.
+- `docs/adr/`: Architecture Decision Records (ADRs). New cross-cutting decisions must land here.
+- `docs/execplans/`: Execution plans for significant features or refactors.
+- `docs/runbooks/`: Operational runbooks and production checklists.
+- `docs/risks.md`: Tracked risks and gaps.
+- `docs/templates/`: Reusable templates (ADR, runbook).
+
+## Update Triggers (Quick Reference)
+
+- New external dependency, provider, or major flow: add an ADR in `docs/adr/`.
+- Change to booking/payment/auth logic: update `docs/architecture.md` and relevant runbook(s).
+- New operational procedure: add a runbook in `docs/runbooks/`.
+- Close or add a significant risk: update `docs/risks.md`.
+
+## Ownership And Review
+
+- Authors own freshness: if you touch an area, update the docs in the same PR.
+- New production flows require at least one runbook.
+- Avoid duplicating instructions; link to the single source of truth.
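
The update triggers in `docs/README.md` amount to a mapping from the kind of change to the docs that should move with it. A minimal sketch of how a reviewer script might surface that mapping is shown here. The path prefixes mirror paths named elsewhere in this patch, but the rule table and the function name `docs_to_review` are hypothetical and not part of the repository.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical rule table: source path prefix -> docs that likely need review.
# The prefixes appear in this patch; the mapping itself is an assumption.
DOC_RULES = {
    "backend/apps/bookings/": ("docs/architecture.md", "docs/runbooks/booking_failures.md"),
    "backend/apps/payments/": ("docs/architecture.md", "docs/runbooks/payments_sanity_check.md"),
    "backend/apps/accounts/": ("docs/architecture.md", "docs/runbooks/auth_otp_failures.md"),
}


def docs_to_review(changed_paths: Iterable[str]) -> list[str]:
    """Return the sorted docs whose trigger rules match the changed paths."""
    changed = list(changed_paths)
    needed: set[str] = set()
    for path in changed:
        for prefix, docs in DOC_RULES.items():
            if path.startswith(prefix):
                needed.update(docs)
    # Docs already touched in the same change need no reminder.
    return sorted(needed - set(changed))
```

Such a helper could run in CI against the list of files changed in a PR and print a reminder rather than fail the build, since whether a doc update is truly needed remains a reviewer judgment.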
diff --git a/docs/adr/0001-synchronous-external-calls-mvp.md b/docs/adr/0001-synchronous-external-calls-mvp.md
new file mode 100644
index 0000000..3e95cf1
--- /dev/null
+++ b/docs/adr/0001-synchronous-external-calls-mvp.md
@@ -0,0 +1,28 @@
+# ADR 0001: Synchronous External Calls For MVP
+
+## Status
+
+Accepted
+
+## Context
+
+The MVP relies on OTP delivery, booking notifications, and payment gateway calls. Introducing a task queue (Celery/RQ) would add infrastructure (Redis, workers, retries) and operational complexity that is not required for the early launch.
+
+## Decision
+
+For the MVP, OTP sends, booking notifications, and payment gateway calls run synchronously in the request/response path with strict timeouts. A task queue will be revisited when traffic grows or operational needs change.
+
+## Consequences
+
+- Faster initial delivery with fewer moving parts.
+- Increased latency risk on endpoints that call external providers.
+- Failures are immediately visible to clients and logged for support.
+
+## Alternatives Considered
+
+- Celery + Redis for all external calls: rejected for MVP due to infra overhead.
+- Hybrid async for notifications only: rejected to keep the execution model consistent.
+
+## Related
+
+- `docs/architecture.md`
diff --git a/docs/adr/0002-moyasar-payment-gateway.md b/docs/adr/0002-moyasar-payment-gateway.md
new file mode 100644
index 0000000..e77d4ec
--- /dev/null
+++ b/docs/adr/0002-moyasar-payment-gateway.md
@@ -0,0 +1,30 @@
+# ADR 0002: Moyasar As The Payment Gateway
+
+## Status
+
+Accepted
+
+## Context
+
+The platform needs a payment gateway that supports Saudi Arabia, SAR currency defaults, and local payment methods (e.g. STC Pay, Apple Pay, Samsung Pay). The backend already implements a `MoyasarGateway` integration and models `payments.Payment` with a `moyasar` provider option.
+
+## Decision
+
+Use Moyasar as the payment gateway for the MVP. Payment creation, capture, refund, and webhook reconciliation are implemented through `apps.payments.services.gateway.MoyasarGateway`.
+
+## Consequences
+
+- Supports KSA-focused payment methods and SAR by default.
+- Operational dependency on Moyasar uptime and API stability.
+- Payment flows and webhooks are tied to the Moyasar API surface until a gateway abstraction is expanded.
+
+## Alternatives Considered
+
+- Other regional gateways: deferred until the MVP is validated.
+- Stripe or similar global providers: not selected for MVP due to KSA-specific coverage priorities.
+
+## Related
+
+- `backend/apps/payments/services/gateway.py`
+- `docs/runbooks/payments_sanity_check.md`
+- `docs/architecture.md`
diff --git a/docs/adr/0003-authentica-otp-provider.md b/docs/adr/0003-authentica-otp-provider.md
new file mode 100644
index 0000000..7351e55
--- /dev/null
+++ b/docs/adr/0003-authentica-otp-provider.md
@@ -0,0 +1,30 @@
+# ADR 0003: Authentica As Primary OTP Provider
+
+## Status
+
+Accepted
+
+## Context
+
+The platform requires phone-first authentication with OTP delivery for KSA. The codebase includes multiple provider adapters (`console`, `twilio`, `unifonic`, `authentica`) but only Authentica is implemented for provider-managed OTP delivery (send/verify) and direct SMS messaging. Twilio and Unifonic adapters are partial or unimplemented; a console provider exists for local development.
+
+## Decision
+
+Use Authentica as the primary OTP provider for the MVP, with `OTP_PROVIDER=authentica` in production environments. Keep `console` for local development and tests, and retain Twilio/Unifonic adapters as scaffolds for future expansion.
+
+## Consequences
+
+- OTP verification relies on Authentica APIs and credentials in production.
+- Local development remains simple with the console provider.
+- Adding a second production provider will require completing adapters and updating operational runbooks.
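
The ADR 0003 decision comes down to a single setting that selects the adapter. A minimal sketch of that selection follows, assuming the value is read from the `OTP_PROVIDER` environment variable and defaults to `console` when unset (the default is an assumption). The class names, the `send_otp` method, and the `get_otp_provider` helper are hypothetical and do not reflect the actual code in `backend/apps/accounts/services/otp.py`.

```python
import os


class ConsoleOTPProvider:
    """Hypothetical local-development provider: prints the code instead of sending it."""

    def send_otp(self, phone, code):
        print(f"[console OTP] {phone}: {code}")


class AuthenticaOTPProvider:
    """Hypothetical placeholder for the production adapter; real API calls are omitted."""

    def send_otp(self, phone, code):
        raise NotImplementedError("provider call intentionally left out of this sketch")


# Only the two providers the ADR treats as usable are registered here.
PROVIDERS = {"console": ConsoleOTPProvider, "authentica": AuthenticaOTPProvider}


def get_otp_provider(name=None):
    """Resolve a provider from an explicit name or OTP_PROVIDER; fail loudly on unknown values."""
    key = (name or os.environ.get("OTP_PROVIDER", "console")).strip().lower()
    try:
        return PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unsupported OTP_PROVIDER: {key!r}") from None
```

Rejecting unregistered names such as `twilio` at startup, rather than at the first OTP send, would make the "scaffold only" status of those adapters visible immediately.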
+
+## Alternatives Considered
+
+- Twilio as primary provider: not selected due to KSA-focused delivery needs and current adapter gaps.
+- Unifonic as primary provider: deferred until the adapter is fully implemented and validated.
+
+## Related
+
+- `backend/apps/accounts/services/otp.py`
+- `backend/salon_api/settings.py`
+- `docs/architecture.md`
diff --git a/docs/adr/README.md b/docs/adr/README.md
new file mode 100644
index 0000000..58f4608
--- /dev/null
+++ b/docs/adr/README.md
@@ -0,0 +1,5 @@
+# Architecture Decision Records
+
+ADRs capture cross-cutting or hard-to-reverse decisions. Add a new ADR when changing providers, async strategy, data model boundaries, or other architectural choices.
+
+Use the template in `docs/templates/adr.md` and increment the numeric prefix (`0002`, `0003`, ...).
diff --git a/docs/architecture.md b/docs/architecture.md
index 53daeec..9fd0cb5 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -14,6 +14,16 @@ The Salon platform is a Django REST API backend with a React/Vite frontend, opti
 | **payments** | Payment model, Moyasar integration (create, capture, refund), webhook reconciliation, idempotency. |
 | **notifications** | Booking lifecycle notifications (SMS/WhatsApp). Reuses OTP providers; sends on booking created/confirmed/cancelled. |
 
+## Data Model Overview
+
+The core data model centers on users, salons, and time-bound bookings. A booking ties a customer to a service, a staff member, and a scheduled time. Payments are recorded per booking and reconcile to the external gateway. Notifications are stored for every booking lifecycle message for auditability.
+
+- `accounts.User` owns phone, locale, and auth preferences.
+- `salons.Salon`, `salons.Service`, and `salons.Staff` define the catalog and scheduling surface.
+- `bookings.Booking` links customer, staff, service, and scheduled time, with status transitions.
+- `payments.Payment` tracks gateway state and idempotency per booking.
+- `notifications.Notification` records each SMS/WhatsApp send attempt tied to a booking event.
+
 ## Data Flow
 
 ```
@@ -28,6 +38,7 @@ User → React Frontend → Django API
 ## Async and Observability (MVP Decision)
 
 **Decision (MVP):** All OTP sends, booking notifications, and payment gateway calls run **synchronously** in the request/response path. No Celery, RQ, or other task queue for the initial launch.
+This is captured in ADR 0001 (`docs/adr/0001-synchronous-external-calls-mvp.md`).
 
 **Rationale:**
 - Reduces deployment complexity (no Redis, no worker processes).
diff --git a/docs/documentation.md b/docs/documentation.md
new file mode 100644
index 0000000..e924b27
--- /dev/null
+++ b/docs/documentation.md
@@ -0,0 +1,51 @@
+# Documentation Practices
+
+These standards aim to keep documentation reliable as the codebase grows.
+
+## Principles
+
+- Single source of truth: one canonical doc per topic; link instead of duplicating.
+- Proximity: keep docs close to the code they describe when possible.
+- Freshness: update docs in the same PR as the code change.
+- Observable behavior: describe what someone can see or run to validate the behavior.
+
+## Required Docs By Area
+
+- Architecture and major decisions: `docs/architecture.md` and `docs/adr/`.
+- Feature delivery plans: `docs/execplans/` (required by `PLANS.md`).
+- Operational procedures: `docs/runbooks/`.
+- Risks and gaps: `docs/risks.md`.
+
+## When To Write An ADR
+
+Use an ADR for any decision that is cross-cutting or hard to reverse, including:
+
+- External providers or payment/auth strategy changes.
+- Async vs synchronous execution decisions.
+- Data model changes that affect multiple apps or services.
+
+ADRs live in `docs/adr/` and use the template in `docs/templates/adr.md`.
+
+## Runbook Expectations
+
+Every production-impacting flow should have a runbook that covers:
+
+- Symptoms and impact.
+- Detection and quick checks.
+- Safe remediation steps.
+- Rollback or escalation path.
+
+Use the template in `docs/templates/runbook.md`.
+
+## Writing Style
+
+- Be explicit: include exact commands, paths, and expected output where useful.
+- Keep sections short and focused.
+- Avoid unstated assumptions; if a step needs a specific directory, say so.
+
+## Review Checklist
+
+- Docs updated or explicitly confirmed unnecessary.
+- New runbook added when operational behavior changes.
+- ADR added for new cross-cutting decisions.
+- `docs/risks.md` updated for meaningful gaps added or closed.
diff --git a/docs/runbooks/README.md b/docs/runbooks/README.md
new file mode 100644
index 0000000..9d0ecff
--- /dev/null
+++ b/docs/runbooks/README.md
@@ -0,0 +1,9 @@
+# Runbooks
+
+Operational procedures live here. Each new production-impacting workflow should add or update a runbook.
+
+Existing runbooks:
+
+- `docs/runbooks/auth_otp_failures.md`
+- `docs/runbooks/booking_failures.md`
+- `docs/runbooks/payments_sanity_check.md`
diff --git a/docs/runbooks/auth_otp_failures.md b/docs/runbooks/auth_otp_failures.md
new file mode 100644
index 0000000..0d6f7c9
--- /dev/null
+++ b/docs/runbooks/auth_otp_failures.md
@@ -0,0 +1,39 @@
+# Runbook: Auth OTP Failures
+
+## Summary
+
+Guide for diagnosing and mitigating OTP send or verify failures in phone-first authentication.
+
+## Symptoms
+
+- Users report not receiving OTP codes.
+- `/api/auth/otp/request/` or `/api/auth/phone/request/` returns HTTP 500 or rate-limit errors.
+- `/api/auth/otp/verify/` or `/api/auth/phone/verify/` returns invalid or expired OTP errors unexpectedly.
+
+## Impact
+
+- Users cannot sign in or complete phone verification.
+- Booking and payment flows are blocked when auth is required.
+
+## Quick Checks
+
+- Confirm the provider configured in `backend/salon_api/settings.py` via `OTP_PROVIDER`.
+- Check recent application logs for OTP send errors.
+- Verify provider credentials are present in `backend/.env` for the active provider.
+
+## Mitigation Steps
+
+- If provider credentials are missing or invalid, fix the environment variables and restart the API process.
+- If the provider is down, temporarily switch to `OTP_PROVIDER=console` for non-production environments and notify support.
+- If rate limits are triggered, validate `OTP_MAX_PER_WINDOW`, `OTP_WINDOW_MINUTES`, and `OTP_RESEND_COOLDOWN_SECONDS` values and confirm client behavior is not retrying aggressively.
+- If verification is failing, confirm server time is correct and `OTP_EXPIRY_MINUTES` is appropriate.
+
+## Rollback / Escalation
+
+- Roll back recent auth/OTP changes if the failure coincides with a deployment.
+- Escalate to the provider (Authentica) with request IDs and timestamps if external API errors persist.
+
+## Notes
+
+- Authentica is the primary OTP provider for MVP; console provider is for local development.
+- OTP send/verify logic lives in `backend/apps/accounts/services/otp.py`.
diff --git a/docs/runbooks/booking_failures.md b/docs/runbooks/booking_failures.md
new file mode 100644
index 0000000..f28284f
--- /dev/null
+++ b/docs/runbooks/booking_failures.md
@@ -0,0 +1,40 @@
+# Runbook: Booking Failures
+
+## Summary
+
+Guide for diagnosing booking creation or status update failures (availability, overlap prevention, or validation errors).
+
+## Symptoms
+
+- `POST /api/bookings/` returns HTTP 400 or 500.
+- `PATCH /api/bookings//` fails when confirming or cancelling.
+- Users report bookings not appearing or incorrect status.
+
+## Impact
+
+- Customers cannot place bookings.
+- Staff schedules become inconsistent.
+- Notification and payment flows may not trigger.
+
+## Quick Checks
+
+- Confirm the request payload includes a valid `service`, `staff`, and scheduled time.
+- Check server logs for booking validation errors or integrity exceptions.
+- Verify that staff availability and overlap prevention rules are behaving as expected.
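
When verifying overlap prevention for bookings, it helps to have the expected rule written down. A common formulation treats bookings as half-open intervals `[start, end)` so that back-to-back appointments do not conflict. It is assumed here because the patch does not show the actual validation in `backend/apps/bookings/`, and the function name `bookings_overlap` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def bookings_overlap(start_a, end_a, start_b, end_b):
    """Return True when half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    if start_a >= end_a or start_b >= end_b:
        raise ValueError("each booking must end after it starts")
    return start_a < end_b and start_b < end_a


# Timezone-aware datetimes keep comparisons unambiguous across time zones.
ten = datetime(2026, 3, 1, 10, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)
print(bookings_overlap(ten, ten + hour, ten + hour, ten + 2 * hour))      # False: back-to-back
print(bookings_overlap(ten, ten + hour, ten + hour / 2, ten + 2 * hour))  # True: 30 minutes shared
```

Python refuses to compare naive and timezone-aware datetimes with `<`, so a time zone mix-up in test data tends to surface as a `TypeError` rather than a silently wrong answer.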
+
+## Mitigation Steps
+
+- Reproduce with a known test user and staff member to isolate data issues.
+- If overlap rules are too strict, review booking validation logic and confirm time zone assumptions.
+- If status updates are blocked, verify role checks and serializer permissions in `backend/apps/bookings/`.
+- If notifications are expected but missing, confirm `NOTIFICATION_PROVIDER` configuration and notification records.
+
+## Rollback / Escalation
+
+- Roll back recent booking-related changes if failures started after a deployment.
+- Escalate to engineering with the booking ID, user ID, and timestamps.
+
+## Notes
+
+- Booking validation and status transitions live in `backend/apps/bookings/`.
+- Notifications for booking lifecycle are handled in `backend/apps/notifications/`.
diff --git a/docs/templates/adr.md b/docs/templates/adr.md
new file mode 100644
index 0000000..a2c2818
--- /dev/null
+++ b/docs/templates/adr.md
@@ -0,0 +1,25 @@
+# ADR :
+
+## Status
+
+Proposed | Accepted | Deprecated | Superseded
+
+## Context
+
+Explain the problem and the forces at play. Include constraints, risks, or user needs.
+
+## Decision
+
+State the decision clearly and explicitly.
+
+## Consequences
+
+List the expected positive and negative outcomes, including operational impact.
+
+## Alternatives Considered
+
+Briefly document viable alternatives and why they were rejected.
+
+## Related
+
+Link to relevant PRs, runbooks, or architecture sections.
diff --git a/docs/templates/runbook.md b/docs/templates/runbook.md
new file mode 100644
index 0000000..2e99b5d
--- /dev/null
+++ b/docs/templates/runbook.md
@@ -0,0 +1,29 @@
+# Runbook: <Short Title>
+
+## Summary
+
+One or two sentences describing the situation this runbook covers.
+
+## Symptoms
+
+Describe what an operator or user will observe.
+
+## Impact
+
+Who or what is affected.
+
+## Quick Checks
+
+Exact commands or checks that confirm the issue.
+
+## Mitigation Steps
+
+Step-by-step actions to resolve or reduce impact.
+
+## Rollback / Escalation
+
+How to revert or who to contact if the issue persists.
+
+## Notes
+
+Any caveats, dependencies, or follow-up actions.
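
Because every runbook is expected to follow the sections in `docs/templates/runbook.md`, a small check could flag missing sections before review. A minimal sketch follows, assuming runbooks use exactly the level-two headings from the template. The function name `missing_sections` is hypothetical, and no such script is part of this patch.

```python
import re

# Headings taken from docs/templates/runbook.md in this patch.
REQUIRED_SECTIONS = (
    "Summary",
    "Symptoms",
    "Impact",
    "Quick Checks",
    "Mitigation Steps",
    "Rollback / Escalation",
    "Notes",
)

# Matches lines of the form "## Heading"; deeper levels such as "###" are ignored.
HEADING_RE = re.compile(r"^##[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(markdown_text):
    """Return required template headings that do not appear as '## <name>' lines."""
    found = {match.group(1) for match in HEADING_RE.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

Run against the text of each file under `docs/runbooks/`, an empty result would mean the runbook covers every template section; it says nothing about whether the content of each section is adequate.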