Fleshed out documentation
@@ -64,3 +64,4 @@ The dev server proxies `/api` to `http://localhost:8000`.
- Known gaps and risks: `docs/risks.md`
- Architecture and async/observability decisions: `docs/architecture.md`
- Documentation index and standards: `docs/README.md` and `docs/documentation.md`

@@ -1,11 +1,36 @@
# Documentation Index

This directory is the source of truth for product, engineering, and ops documentation. Keep it current as features change.

## Start Here

- Project overview and setup: `README.md` (repo root)
- Architecture overview: `docs/architecture.md`
- Active ExecPlan: `docs/execplans/booking-notifications.md`
- Known risks and gaps: `docs/risks.md`

## Documentation Standards

See `docs/documentation.md` for documentation goals, update triggers, and templates.

## Docs Map

- `docs/architecture.md`: System architecture, boundaries, and MVP async/observability decision.
- `docs/adr/`: Architecture Decision Records (ADRs). New cross-cutting decisions must land here.
- `docs/execplans/`: Execution plans for significant features or refactors.
- `docs/runbooks/`: Operational runbooks and production checklists.
- `docs/risks.md`: Tracked risks and gaps.
- `docs/templates/`: Reusable templates (ADR, runbook).

## Update Triggers (Quick Reference)

- New external dependency, provider, or major flow: add an ADR in `docs/adr/`.
- Change to booking/payment/auth logic: update `docs/architecture.md` and relevant runbook(s).
- New operational procedure: add a runbook in `docs/runbooks/`.
- Close or add a significant risk: update `docs/risks.md`.

## Ownership And Review

- Authors own freshness: if you touch an area, update the docs in the same PR.
- New production flows require at least one runbook.
- Avoid duplicating instructions; link to the single source of truth.

@@ -0,0 +1,28 @@
# ADR 0001: Synchronous External Calls For MVP

## Status

Accepted

## Context

The MVP relies on OTP delivery, booking notifications, and payment gateway calls. Introducing a task queue (Celery/RQ) would add infrastructure (Redis, workers, retries) and operational complexity that is not required for the early launch.

## Decision
|
For the MVP, OTP sends, booking notifications, and payment gateway calls run synchronously in the request/response path with strict timeouts. A task queue will be revisited when traffic grows or operational needs change.
|
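A minimal sketch of one such synchronous call with a strict timeout, assuming a plain HTTP request made with Python's standard `urllib.request`. `call_provider`, `ProviderUnavailable`, and the 5-second budget are hypothetical; this ADR does not fix a timeout value, and the real OTP, notification, and Moyasar integrations may use a different HTTP client.

```python
import socket
import urllib.error
import urllib.request
from typing import Any, Callable

# Hypothetical per-call budget; the ADR does not fix a value.
DEFAULT_TIMEOUT_SECONDS = 5.0


class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider times out or cannot be reached."""


def call_provider(
    request: urllib.request.Request,
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bytes:
    """Make the external call inline, but never wait longer than `timeout` seconds."""
    try:
        with opener(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError:
        # The provider answered with an error status; let the caller map it explicitly.
        raise
    except urllib.error.URLError as exc:
        # DNS failures, refused connections, and connect timeouts surface here.
        raise ProviderUnavailable(f"provider unreachable: {exc.reason}") from exc
    except (TimeoutError, socket.timeout) as exc:
        # Read timeouts can surface as a bare TimeoutError / socket.timeout.
        raise ProviderUnavailable(f"provider timed out after {timeout}s") from exc
```

A view would catch `ProviderUnavailable`, log it, and return an explicit error (for example HTTP 503), which keeps failures visible to the client as this ADR intends. The `opener` parameter only exists so the function can be exercised without network access.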
|
## Consequences

- Faster initial delivery with fewer moving parts.
- Increased latency risk on endpoints that call external providers.
- Failures are immediately visible to clients and logged for support.

## Alternatives Considered

- Celery + Redis for all external calls: rejected for MVP due to infra overhead.
- Hybrid async for notifications only: rejected to keep the execution model consistent.

## Related

- `docs/architecture.md`

@@ -0,0 +1,30 @@
# ADR 0002: Moyasar As The Payment Gateway

## Status

Accepted

## Context

The platform needs a payment gateway that supports Saudi Arabia, SAR currency defaults, and local payment methods (e.g. STC Pay, Apple Pay, Samsung Pay). The backend already implements a `MoyasarGateway` integration and models `payments.Payment` with a `moyasar` provider option.

## Decision
|
Use Moyasar as the payment gateway for the MVP. Payment creation, capture, refund, and webhook reconciliation are implemented through `apps.payments.services.gateway.MoyasarGateway`.
|
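`payments.Payment` tracks idempotency per booking. A minimal in-memory sketch of that idea, assuming one payment per booking: a retried request for the same booking returns the existing record instead of creating a second charge. `PaymentLedger`, `create_once`, the `booking-<id>` key format, and the `initiated` status are hypothetical; the real model and `MoyasarGateway` calls live in `backend/apps/payments/`, and this sketch does not reproduce Moyasar's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PaymentRecord:
    booking_id: int
    gateway_payment_id: str
    status: str


@dataclass
class PaymentLedger:
    """Hypothetical stand-in for payments.Payment rows, keyed by booking ID."""

    by_booking: Dict[int, PaymentRecord] = field(default_factory=dict)

    def create_once(
        self, booking_id: int, create_at_gateway: Callable[[str], str]
    ) -> PaymentRecord:
        existing = self.by_booking.get(booking_id)
        if existing is not None:
            # A retry for the same booking reuses the first payment; no second charge.
            return existing
        idempotency_key = f"booking-{booking_id}"  # hypothetical key format
        gateway_id = create_at_gateway(idempotency_key)
        record = PaymentRecord(booking_id, gateway_id, status="initiated")
        self.by_booking[booking_id] = record
        return record
```

In the Django app, the lookup-then-create step would also need a database-level guarantee (for example a unique constraint on the booking column) so two concurrent requests cannot both pass the check; a dict cannot demonstrate that part.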
|
## Consequences

- Supports KSA-focused payment methods and SAR by default.
- Operational dependency on Moyasar uptime and API stability.
- Payment flows and webhooks are tied to the Moyasar API surface until a gateway abstraction is expanded.

## Alternatives Considered

- Other regional gateways: deferred until the MVP is validated.
- Stripe or similar global providers: not selected for MVP due to KSA-specific coverage priorities.

## Related

- `backend/apps/payments/services/gateway.py`
- `docs/runbooks/payments_sanity_check.md`
- `docs/architecture.md`

@@ -0,0 +1,30 @@
# ADR 0003: Authentica As Primary OTP Provider

## Status

Accepted

## Context

The platform requires phone-first authentication with OTP delivery for KSA. The codebase includes multiple provider adapters (`console`, `twilio`, `unifonic`, `authentica`) but only Authentica is implemented for provider-managed OTP delivery (send/verify) and direct SMS messaging. Twilio and Unifonic adapters are partial or unimplemented; a console provider exists for local development.

## Decision
|
Use Authentica as the primary OTP provider for the MVP, with `OTP_PROVIDER=authentica` in production environments. Keep `console` for local development and tests, and retain Twilio/Unifonic adapters as scaffolds for future expansion.
|
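A minimal sketch of selecting the provider by name, assuming the value comes from an `OTP_PROVIDER` environment variable (the project reads it through `backend/salon_api/settings.py`; this sketch skips Django so it runs standalone). The class and function names are hypothetical, and `AuthenticaProvider` is a placeholder rather than the real adapter. Because Authentica manages the OTP itself, the interface is `send_otp`/`verify_otp` on a phone number rather than "send this code"; the console stand-in generates and checks its own code. Twilio and Unifonic are left out of the registry to match their incomplete adapters.

```python
import os
import secrets
from typing import Dict, Optional, Protocol


class OTPProvider(Protocol):
    def send_otp(self, phone: str) -> None: ...
    def verify_otp(self, phone: str, code: str) -> bool: ...


class ConsoleProvider:
    """Dev/test stand-in: generates a 6-digit code locally and prints it."""

    def __init__(self) -> None:
        self._codes: Dict[str, str] = {}

    def send_otp(self, phone: str) -> None:
        code = f"{secrets.randbelow(10**6):06d}"
        self._codes[phone] = code
        print(f"[console OTP] {phone}: {code}")

    def verify_otp(self, phone: str, code: str) -> bool:
        expected = self._codes.get(phone)
        # Constant-time comparison on bytes, so non-ASCII input cannot raise.
        if expected is None or not secrets.compare_digest(expected.encode(), code.encode()):
            return False
        del self._codes[phone]  # codes are single use
        return True


class AuthenticaProvider:
    """Placeholder: the real adapter delegates send/verify to Authentica's API."""

    def send_otp(self, phone: str) -> None:
        raise NotImplementedError("see backend/apps/accounts/services/otp.py")

    def verify_otp(self, phone: str, code: str) -> bool:
        raise NotImplementedError("see backend/apps/accounts/services/otp.py")


PROVIDERS = {"console": ConsoleProvider, "authentica": AuthenticaProvider}


def get_otp_provider(name: Optional[str] = None) -> OTPProvider:
    """Resolve the provider from an explicit name or the OTP_PROVIDER env var."""
    key = (name or os.environ.get("OTP_PROVIDER", "")).strip().lower()
    if key not in PROVIDERS:
        raise ValueError(f"unsupported OTP_PROVIDER {key!r}; expected one of {sorted(PROVIDERS)}")
    return PROVIDERS[key]()
```

Failing loudly on a missing or unknown name, instead of silently falling back to `console`, keeps a configuration typo in production from quietly disabling real OTP delivery.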
|
## Consequences

- OTP verification relies on Authentica APIs and credentials in production.
- Local development remains simple with the console provider.
- Adding a second production provider will require completing adapters and updating operational runbooks.

## Alternatives Considered

- Twilio as primary provider: not selected due to KSA-focused delivery needs and current adapter gaps.
- Unifonic as primary provider: deferred until the adapter is fully implemented and validated.

## Related

- `backend/apps/accounts/services/otp.py`
- `backend/salon_api/settings.py`
- `docs/architecture.md`

@@ -0,0 +1,5 @@
# Architecture Decision Records

ADRs capture cross-cutting or hard-to-reverse decisions. Add a new ADR when changing providers, async strategy, data model boundaries, or other architectural choices.
|
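ADR files carry a four-digit numeric prefix, as in `docs/adr/0001-synchronous-external-calls-mvp.md`. A small sketch for picking the next number, assuming every ADR filename follows that `NNNN-kebab-case-title.md` pattern; the helper names and the slug rule are illustrative, not an existing project script. Passing `(p.name for p in Path("docs/adr").iterdir())` from `pathlib` would feed it the real directory listing, and non-ADR files such as `README.md` are ignored.

```python
import re
from typing import Iterable

# Assumed filename pattern: four digits, a dash, a kebab-case slug, ".md".
ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def next_adr_number(filenames: Iterable[str]) -> str:
    """Return the next zero-padded ADR prefix given the files in docs/adr/."""
    numbers = [int(m.group(1)) for name in filenames if (m := ADR_NAME.match(name))]
    return f"{max(numbers, default=0) + 1:04d}"


def adr_filename(number: str, title: str) -> str:
    """Build a kebab-case filename such as 0004-async-jobs-for-notifications.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{number}-{slug}.md"
```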
Use the template in `docs/templates/adr.md` and increment the numeric prefix (`0002`, `0003`, ...).
@@ -14,6 +14,16 @@ The Salon platform is a Django REST API backend with a React/Vite frontend, opti
| **payments** | Payment model, Moyasar integration (create, capture, refund), webhook reconciliation, idempotency. |
| **notifications** | Booking lifecycle notifications (SMS/WhatsApp). Reuses OTP providers; sends on booking created/confirmed/cancelled. |

## Data Model Overview

The core data model centers on users, salons, and time-bound bookings. A booking ties a customer to a service, a staff member, and a scheduled time. Payments are recorded per booking and reconcile to the external gateway. Notifications are stored for every booking lifecycle message for auditability.

- `accounts.User` owns phone, locale, and auth preferences.
- `salons.Salon`, `salons.Service`, and `salons.Staff` define the catalog and scheduling surface.
- `bookings.Booking` links customer, staff, service, and scheduled time, with status transitions.
- `payments.Payment` tracks gateway state and idempotency per booking.
- `notifications.Notification` records each SMS/WhatsApp send attempt tied to a booking event.
|
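`bookings.Booking` moves through status transitions, and the notifications app sends messages on created, confirmed, and cancelled events. A minimal sketch of an explicit transition table, assuming three statuses named `pending`, `confirmed`, and `cancelled` plus hypothetical event names; creation is handled elsewhere in this sketch, and the real statuses and rules live in `backend/apps/bookings/` and may differ.

```python
from enum import Enum
from typing import Dict, FrozenSet, Optional


class BookingStatus(str, Enum):
    # Hypothetical status names for illustration.
    PENDING = "pending"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


# Which target statuses each status may move to; CANCELLED is terminal.
ALLOWED_TRANSITIONS: Dict[BookingStatus, FrozenSet[BookingStatus]] = {
    BookingStatus.PENDING: frozenset({BookingStatus.CONFIRMED, BookingStatus.CANCELLED}),
    BookingStatus.CONFIRMED: frozenset({BookingStatus.CANCELLED}),
    BookingStatus.CANCELLED: frozenset(),
}

# Hypothetical notification event emitted when a booking enters a status.
NOTIFY_ON: Dict[BookingStatus, str] = {
    BookingStatus.CONFIRMED: "booking_confirmed",
    BookingStatus.CANCELLED: "booking_cancelled",
}


def apply_transition(current: BookingStatus, target: BookingStatus) -> Optional[str]:
    """Validate a status change and return the notification event to send, if any."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move booking from {current.value} to {target.value}")
    return NOTIFY_ON.get(target)
```

Keeping the rules in one table means a rejected status update can be traced to a single definition of which changes are legal.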
|
## Data Flow

```
|
```
@@ -28,6 +38,7 @@ User → React Frontend → Django API
## Async and Observability (MVP Decision)

**Decision (MVP):** All OTP sends, booking notifications, and payment gateway calls run **synchronously** in the request/response path. No Celery, RQ, or other task queue for the initial launch.

This is captured in ADR 0001 (`docs/adr/0001-synchronous-external-calls-mvp.md`).

**Rationale:**

- Reduces deployment complexity (no Redis, no worker processes).

@@ -0,0 +1,51 @@
# Documentation Practices

These standards aim to keep documentation reliable as the codebase grows.

## Principles

- Single source of truth: one canonical doc per topic; link instead of duplicating.
- Proximity: keep docs close to the code they describe when possible.
- Freshness: update docs in the same PR as the code change.
- Observable behavior: describe what someone can see or run to validate the behavior.

## Required Docs By Area

- Architecture and major decisions: `docs/architecture.md` and `docs/adr/`.
- Feature delivery plans: `docs/execplans/` (required by `PLANS.md`).
- Operational procedures: `docs/runbooks/`.
- Risks and gaps: `docs/risks.md`.

## When To Write An ADR

Use an ADR for any decision that is cross-cutting or hard to reverse, including:

- External providers or payment/auth strategy changes.
- Async vs synchronous execution decisions.
- Data model changes that affect multiple apps or services.

ADRs live in `docs/adr/` and use the template in `docs/templates/adr.md`.

## Runbook Expectations

Every production-impacting flow should have a runbook that covers:

- Symptoms and impact.
- Detection and quick checks.
- Safe remediation steps.
- Rollback or escalation path.

Use the template in `docs/templates/runbook.md`.

## Writing Style

- Be explicit: include exact commands, paths, and expected output where useful.
- Keep sections short and focused.
- Avoid unstated assumptions; if a step needs a specific directory, say so.

## Review Checklist

- Docs updated or explicitly confirmed unnecessary.
- New runbook added when operational behavior changes.
- ADR added for new cross-cutting decisions.
- `docs/risks.md` updated for meaningful gaps added or closed.

@@ -0,0 +1,9 @@
# Runbooks

Operational procedures live here. Each new production-impacting workflow should add or update a runbook.

Existing runbooks:

- `docs/runbooks/auth_otp_failures.md`
- `docs/runbooks/booking_failures.md`
- `docs/runbooks/payments_sanity_check.md`

@@ -0,0 +1,39 @@
# Runbook: Auth OTP Failures

## Summary

Guide for diagnosing and mitigating OTP send or verify failures in phone-first authentication.

## Symptoms

- Users report not receiving OTP codes.
- `/api/auth/otp/request/` or `/api/auth/phone/request/` returns HTTP 500 or rate-limit errors.
- `/api/auth/otp/verify/` or `/api/auth/phone/verify/` unexpectedly returns invalid or expired OTP errors.

## Impact

- Users cannot sign in or complete phone verification.
- Booking and payment flows are blocked when auth is required.

## Quick Checks

- Confirm which provider is active by checking `OTP_PROVIDER` in `backend/salon_api/settings.py`.
- Check recent application logs for OTP send errors.
- Verify that credentials for the active provider are present in `backend/.env`.

## Mitigation Steps

- If provider credentials are missing or invalid, fix the environment variables and restart the API process.
- If the provider is down, notify support. In non-production environments only, you can temporarily switch to `OTP_PROVIDER=console`.
- If rate limits are triggered, check the `OTP_MAX_PER_WINDOW`, `OTP_WINDOW_MINUTES`, and `OTP_RESEND_COOLDOWN_SECONDS` values and confirm clients are not retrying aggressively.
- If verification is failing, confirm the server clock is accurate and that `OTP_EXPIRY_MINUTES` leaves users enough time to enter the code.
|
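When rate-limit or expiry errors look wrong, it helps to restate the rules being enforced. A minimal sketch, assuming the settings mean what their names suggest: at most `OTP_MAX_PER_WINDOW` sends per phone in the last `OTP_WINDOW_MINUTES`, at least `OTP_RESEND_COOLDOWN_SECONDS` between sends, and codes valid for `OTP_EXPIRY_MINUTES`. The numeric values and function names are placeholders; the authoritative logic is in `backend/apps/accounts/services/otp.py`. All timestamps are timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence

# Placeholder values; read the real ones from settings / backend/.env.
OTP_MAX_PER_WINDOW = 5
OTP_WINDOW_MINUTES = 15
OTP_RESEND_COOLDOWN_SECONDS = 60
OTP_EXPIRY_MINUTES = 5


def utc_now() -> datetime:
    """Timezone-aware current time; naive datetimes hide clock and offset mistakes."""
    return datetime.now(timezone.utc)


def otp_request_block_reason(sent_at: Sequence[datetime], now: datetime) -> Optional[str]:
    """Return why a new OTP send for one phone would be refused, or None if allowed."""
    window_start = now - timedelta(minutes=OTP_WINDOW_MINUTES)
    sends_in_window = [t for t in sent_at if t >= window_start]
    if len(sends_in_window) >= OTP_MAX_PER_WINDOW:
        return "window_limit"
    if sent_at and (now - max(sent_at)).total_seconds() < OTP_RESEND_COOLDOWN_SECONDS:
        return "cooldown"
    return None


def otp_is_expired(issued_at: datetime, now: datetime) -> bool:
    """A code is expired once OTP_EXPIRY_MINUTES have passed since it was issued."""
    return now >= issued_at + timedelta(minutes=OTP_EXPIRY_MINUTES)
```

Every rule compares against `now`, so if the server that issues a code and the server that verifies it disagree about the time, fresh codes can look expired; that is why the clock belongs on the checklist.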
|
## Rollback / Escalation

- Roll back recent auth/OTP changes if the failure coincides with a deployment.
- Escalate to the provider (Authentica) with request IDs and timestamps if external API errors persist.

## Notes

- Authentica is the primary OTP provider for MVP; console provider is for local development.
- OTP send/verify logic lives in `backend/apps/accounts/services/otp.py`.

@@ -0,0 +1,40 @@
# Runbook: Booking Failures

## Summary

Guide for diagnosing booking creation or status update failures (availability, overlap prevention, or validation errors).

## Symptoms

- `POST /api/bookings/` returns HTTP 400 or 500.
- `PATCH /api/bookings/<id>/` fails when confirming or cancelling.
- Users report missing bookings or incorrect booking status.

## Impact

- Customers cannot place bookings.
- Staff schedules become inconsistent.
- Notification and payment flows may not trigger.

## Quick Checks

- Confirm the request payload includes a valid `service`, `staff`, and scheduled time.
- Check server logs for booking validation errors or integrity exceptions.
- Verify that staff availability and overlap prevention rules are behaving as expected.

## Mitigation Steps

- Reproduce with a known test user and staff member to isolate data issues.
- If valid bookings are rejected as overlapping, review the booking validation logic and confirm time zone assumptions.
- If status updates are blocked, verify role checks and serializer permissions in `backend/apps/bookings/`.
- If notifications are expected but missing, confirm `NOTIFICATION_PROVIDER` configuration and notification records.
|
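Overlap prevention usually reduces to an interval check. A minimal sketch, assuming bookings are half-open `[start, end)` intervals per staff member and that all times are timezone-aware; `Slot`, `overlaps`, and `find_conflict` are hypothetical names, not the code in `backend/apps/bookings/`. Comparing aware datetimes accounts for offsets, so a slot entered in KSA time (UTC+3) is compared correctly against one stored in UTC.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Slot:
    """Hypothetical booking slot: one staff member, half-open [start, end)."""

    staff_id: int
    start: datetime
    end: datetime


def overlaps(a: Slot, b: Slot) -> bool:
    """True when two slots for the same staff member share any time."""
    if a.staff_id != b.staff_id:
        return False
    # Half-open intervals: a booking ending at 11:00 does not clash with one starting at 11:00.
    return a.start < b.end and b.start < a.end


def find_conflict(new: Slot, existing: Iterable[Slot]) -> Optional[Slot]:
    """Return the first existing slot that clashes with `new`, or None."""
    if new.start.tzinfo is None or new.end.tzinfo is None:
        raise ValueError("booking times must be timezone-aware")
    if new.end <= new.start:
        raise ValueError("booking must end after it starts")
    return next((slot for slot in existing if overlaps(new, slot)), None)
```

If back-to-back bookings are being rejected, the real validation may be using a closed comparison (`<=`) where a half-open one was intended.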
|
## Rollback / Escalation

- Roll back recent booking-related changes if failures started after a deployment.
- Escalate to engineering with the booking ID, user ID, and timestamps.

## Notes

- Booking validation and status transitions live in `backend/apps/bookings/`.
- Notifications for booking lifecycle are handled in `backend/apps/notifications/`.

@@ -0,0 +1,25 @@
# ADR <NNNN>: <Title>

## Status

Proposed | Accepted | Deprecated | Superseded

## Context

Explain the problem and the forces at play. Include constraints, risks, or user needs.

## Decision

State the decision clearly and explicitly.

## Consequences

List the expected positive and negative outcomes, including operational impact.

## Alternatives Considered

Briefly document viable alternatives and why they were rejected.

## Related

Link to relevant PRs, runbooks, or architecture sections.

@@ -0,0 +1,29 @@
# Runbook: <Short Title>

## Summary

One or two sentences describing the situation this runbook covers.

## Symptoms

Describe what an operator or user will observe.

## Impact

Who or what is affected.

## Quick Checks

Exact commands or checks that confirm the issue.

## Mitigation Steps

Step-by-step actions to resolve or reduce impact.

## Rollback / Escalation

How to revert or who to contact if the issue persists.

## Notes

Any caveats, dependencies, or follow-up actions.