Deployment

Picochat includes a local OpenAI-compatible server and a Docker path for smoke-testing trained checkpoints. This is not yet a high-throughput production serving stack; vLLM, TGI, TensorRT-LLM, and llama.cpp adapters remain future work.

Local Workbench Container

Build and run the dashboard:

docker compose up --build picochat-web

Then open:

http://127.0.0.1:8765

The container mounts ./runs into /workspace/runs, so local run artifacts stay outside the image.

Local OpenAI-Compatible API

After a demo or training run writes a checkpoint, set an API key and start the server:

export PICOCHAT_API_KEY="replace-me"

picochat serve \
  --checkpoint runs/pico-demo/sft/checkpoint \
  --tokenizer runs/pico-demo/tokenizer.json \
  --host 127.0.0.1 \
  --port 8000 \
  --api-key-env PICOCHAT_API_KEY

The same server can also be started through the Docker smoke-test path:

PICOCHAT_CHECKPOINT=/workspace/runs/pico-demo/sft/checkpoint \
PICOCHAT_TOKENIZER=/workspace/runs/pico-demo/tokenizer.json \
docker compose --profile serve up --build picochat-serve

Then call:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'content-type: application/json' \
  -H "authorization: Bearer $PICOCHAT_API_KEY" \
  -d '{"model":"picochat","messages":[{"role":"user","content":"What is Picochat?"}],"max_tokens":80}'
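
For integrations that prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not part of Picochat: the helper names `build_chat_request` and `chat` are hypothetical, and the response shape (`choices[0].message.content`) is assumed from the OpenAI chat completions format rather than confirmed for this server.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host and port from the serve command above


def build_chat_request(prompt, api_key, max_tokens=80):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": "picochat",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def chat(prompt):
    """Send the request and return the reply text.

    Assumes the OpenAI response shape: choices[0].message.content.
    """
    request = build_chat_request(prompt, os.environ["PICOCHAT_API_KEY"])
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running and PICOCHAT_API_KEY exported, `print(chat("What is Picochat?"))` should print the model's reply, provided the response follows the assumed shape.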

Streaming response framing is available for local integrations:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H 'content-type: application/json' \
  -H "authorization: Bearer $PICOCHAT_API_KEY" \
  -d '{"model":"picochat","stream":true,"messages":[{"role":"user","content":"Stream one sentence."}],"max_tokens":80}'
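
A client has to reassemble the streamed reply from individual frames. The sketch below assumes the server follows the OpenAI streaming convention: server-sent event lines of the form `data: {json}` carrying `choices[].delta.content`, terminated by `data: [DONE]`. That framing is an assumption based on the "OpenAI-compatible" description, and `iter_stream_text` is a hypothetical helper, not a Picochat function.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel (OpenAI convention)
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` iterates line by line, so it can be passed directly as `lines` after sending the same payload with `"stream": true`.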

Release Rule

Do not deploy a checkpoint as a product claim until the run includes:

The server can load an experimental model for smoke tests, but serving does not turn an experimental run into a release.

Hub Publishing

For public review, export the checkpoint and publish the exact folder that contains the model card, release manifest, serving manifest, tokenizer, and weights:

export HF_TOKEN="hf_..."

picochat export hf \
  --checkpoint runs/<run>/sft/checkpoint \
  --tokenizer runs/<run>/tokenizer.json \
  --out-dir exports/<run> \
  --model-name picochat-<run> \
  --license mit \
  --dataset-summary "See release_manifest.json and preflight report." \
  --eval-summary "See release gate and external benchmark reports." \
  --push-to-hub \
  --repo-id <user-or-org>/picochat-<run>

This is a publication path, not a production-serving path. The exported Transformers adapter requires trust_remote_code=True, and high-throughput serving still needs a native vLLM/TGI/llama.cpp adapter.
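
Because the published folder is supposed to contain the model card, release manifest, serving manifest, tokenizer, and weights, a quick completeness check before pushing can catch a partial export. The sketch below is illustrative only: `missing_export_files` is a hypothetical helper, and every file name except release_manifest.json (mentioned in the command above) is an assumption about what the export writes, so adjust the list to the actual output.

```python
from pathlib import Path

# Hypothetical file names; adjust to what `picochat export hf` actually writes.
EXPECTED_FILES = (
    "README.md",              # model card (Hub convention)
    "release_manifest.json",  # named in --dataset-summary above
    "serving_manifest.json",  # assumed name
    "tokenizer.json",         # assumed to keep the training-time name
    "model.safetensors",      # assumed weights file name
)


def missing_export_files(export_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in export_dir."""
    root = Path(export_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_export_files("exports/<run>")` with the real run name returns an empty list when every expected file is present and non-empty.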

Current Limits

Those limits are explicit so users do not confuse the local server with a production inference fleet.