KillerKing93's picture
Sync from GitHub 8f6d598
7cd14d8 verified

Project Rules and Workflow (Python FastAPI + Transformers)

These rules are binding for every change. Keep code, docs, and behavior synchronized at all times.

Files referenced below:

1) Documentation rules (must-do on every change)

Always update documentation when code or behavior changes.

Minimum documentation checklist:

  • What changed and where (filenames, sections, or callable links like Python.function chat_completions()).
  • Why the change was made (problem or requirement).
  • How to operate or verify (commands, endpoints, examples).
  • Follow-ups or known limitations.

Where to update:

Never skip documentation. If a change is reverted, document the revert.

2) Git discipline (mandatory)

  • Always use Git. Every change or progress step MUST be committed and pushed.
    • Windows CMD example:
      • git add .
      • git commit -m "type(scope): short description"
      • git push
  • No exceptions. If no remote exists, commit locally and configure a remote as soon as possible. Record any temporary push limitations in README.md and CLAUDE.md, but commits are still required locally.
  • Commit style:
    • Conventional types: chore, docs, feat, fix, refactor, perf, test, build, ci
    • Keep commits small and atomic (one concern per commit).
    • Reference important files in the commit body, for example: updated Python.function chat_completions(), README.md.
  • After updating code or docs, commit immediately. Do not batch unrelated changes.

2.1) Progress log (mandatory)

  • Every commit MUST include a corresponding entry in CLAUDE.md under a “Progress Log” section.
  • Each entry must include:
    • Date/time (Asia/Jakarta)
    • Scope and short summary of the change
    • The final Git commit hash and commit message
    • Files and exact callable anchors touched (use clickable anchors), e.g. Python.function chat_completions(), README.md, ARCHITECTURE.md
    • Verification steps and results (curl examples, expected vs actual, notes)
  • Required sequence:
    1. Make code changes
    2. Update docs: README.md, ARCHITECTURE.md, TODO.md, and add a new progress log entry in CLAUDE.md
    3. Run Git commands:
      • git add .
      • git commit -m "type(scope): short description"
      • git push
    4. Append the final commit hash to the CLAUDE.md entry if it was not known at authoring time
  • No code change may land without a synchronized progress log entry.

3) Large artifacts policy (.gitignore)

Never commit large/generated artifacts. Keep the repository lean and reproducible.

Must be ignored:

  • models/ (downloaded by HF/Transformers cache or tools at runtime)
  • .venv/, venv/
  • pycache/
  • .cache/
  • uploads/, data/, tmp/

See .gitignore and extend as needed for new generated outputs. If you add ignores, document the rationale in CLAUDE.md.

4) Model policy (Hugging Face / Transformers)

Target default model:

  • Qwen/Qwen3-VL-2B-Thinking (Transformers; multimodal).

Rules:

  • Use Hugging Face Transformers (AutoModelForCausalLM + AutoProcessor) with trust_remote_code=True.
  • Do not commit model weights or caches. Let from_pretrained() download to local caches.
  • Handle authentication for gated models via HF_TOKEN in .env.example.
  • The server must remain OpenAI-compatible at /v1/chat/completions and support multimodal inputs (text, images, videos).
  • Keep configuration via environment variables (see Python.os.getenv()).

5) API contract

Provide an OpenAI-compatible endpoint:

  • POST /v1/chat/completions

Minimum behavior:

  • Accept model and messages per OpenAI schema (we honor messages; model is informational since server is pinned via env).
  • Non-streaming JSON response.
  • Streaming SSE response when body.stream=true:
    • Emit OpenAI-style chat.completion.chunk deltas.
    • Include SSE id lines "session_id:index" to support resume via Last-Event-ID.

Resume semantics:

  • Client provides a session_id (or server generates one).
  • Client may reconnect and send Last-Event-ID: session_id:index to replay missed chunks.
  • Session data can be persisted (SQLite) if enabled.

Manual cancel (custom extension):

  • POST /v1/cancel/{session_id} cancels a streaming generation.
  • Note: Not part of legacy OpenAI Chat Completions spec. It mirrors the spirit of the newer OpenAI Responses API cancel endpoint.

All endpoints must validate inputs, handle timeouts/failures, and return structured JSON errors.

6) Streaming, persistence, and cancellation

7) Logging and error handling

  • Log key lifecycle stages (startup, model load, stream start/stop, resume).
  • Redact sensitive fields (e.g., tokens, credentials).
  • User errors → 400; model-not-ready → 503; unexpected failures → 500.
  • Optionally add structured logging and request IDs in a follow-up.

8) Architecture documentation

Keep ARCHITECTURE.md authoritative for:

  • Startup flow and lazy model load
  • Multimodal preprocessing (images/videos)
  • Streaming, resume, persistence, and cancellation flows
  • Error/timeout handling
  • Extensibility (persistence strategies, cancellation hooks, scaling patterns)

Update when code paths or data flows change.

9) TODO hygiene

Track all planned work in TODO.md:

  • Update statuses immediately when tasks start/complete.
  • Add newly discovered tasks as soon as they are identified.
  • Keep TODO focused, scoped, and prioritized.

10) Operational requirements and environment

Required:

  • Python: >= 3.10
  • pip
  • PyTorch: install a wheel matching platform/CUDA (see requirements.txt notes)

Recommended:

  • GPU with sufficient VRAM for the chosen model
  • Windows 11 supported; Linux/macOS should also work

Environment variables (see .env.example):

  • PORT=3000
  • MODEL_REPO_ID=Qwen/Qwen3-VL-2B-Thinking
  • HF_TOKEN=
  • MAX_TOKENS=256
  • TEMPERATURE=0.7
  • MAX_VIDEO_FRAMES=16
  • DEVICE_MAP=auto
  • TORCH_DTYPE=auto
  • PERSIST_SESSIONS=1|0, SESSIONS_DB_PATH, SESSIONS_TTL_SECONDS
  • CANCEL_AFTER_DISCONNECT_SECONDS=3600 (0 to disable)

11) File responsibilities overview

12) Workflow example (single iteration)

  1. Make a small, isolated change (e.g., enable SQLite persistence).
  2. Update docs:
  3. Commit and push:
    • git add .
    • git commit -m "feat(stream): add SQLite persistence for SSE resume"
    • git push
  4. Verify locally; record any issues or follow-ups in CLAUDE.md.

13) Compliance checklist (pre-merge / pre-push)

  • Code runs locally (uvicorn main:app …).
  • Docs updated (README.md, CLAUDE.md, ARCHITECTURE.md, TODO.md).
  • No large artifacts added to git.
  • Commit message follows conventional style.
  • Endpoint contract honored (including streaming/resume semantics and cancel extension).