Training Paths

Picochat separates three workflows that are often mixed together in small-model projects.

Path A: Train From Scratch

Use this when the goal is to create a Picochat-native model.

picochat run tiny \
  --dataset-pack runs/my-pack/dataset_pack.json \
  --out-dir runs/my-scratch-model

This path owns the full factory:

This is the right path for the 100M/1B proof runs.

Path B: Fine-Tune an Existing Hugging Face Model

Use this when the goal is to start from an existing model such as SmolLM, Qwen, or Llama and adapt it to a task.

picochat train hf-sft \
  --model HuggingFaceTB/SmolLM2-135M-Instruct \
  --input runs/my-pack/chat_benchmark.jsonl \
  --out-dir runs/smollm-hf-sft-v1 \
  --max-steps 200 \
  --max-length 1024 \
  --device cuda \
  --precision bf16 \
  --gradient-checkpointing \
  --peft lora \
  --done-file done.txt

This path uses Picochat chat JSONL and assistant-only loss masking, but it writes Hugging Face model folders:

It does not train a Picochat-native base model and does not claim a release gate by itself. Use it for hackathons, adapter experiments, and quick task models where pretraining from zero is not the objective.

For tool-calling or agent tasks, prefer multi-turn rows with a messages array, and train only on the final assistant target. Picochat supports that shape directly:

{"system":"You are a tool-calling assistant.","tools":[{"name":"search_schedule"}],"messages":[{"role":"user","content":"Find tomorrow's meeting and draft an email."},{"role":"assistant","content":"I will check the schedule first."},{"role":"tool","content":"search_schedule returned Standup at 9 AM."},{"role":"assistant","content":"Tool call: send_email\nArguments: {\"subject\":\"Standup\",\"time\":\"9 AM\"}"}]}

The previous turns, system prompt, and tool definitions are context. The loss is masked so the model is trained only on the final assistant message. For Qwen-like models, start with BF16 LoRA before experimenting with 4-bit quantization; some Qwen checkpoints can degrade noticeably under QLoRA.
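The masking described above can be sketched in a few lines of Python. This is an illustration only, not Picochat's implementation: the character-level tokenizer, the "role: content" prompt layout, and the helper name build_masked_example are all hypothetical stand-ins. A real run would use the model's own tokenizer and chat template. The one grounded detail is the label value -100, which PyTorch's cross-entropy loss ignores by default.

```python
import json

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def toy_tokenize(text):
    """Hypothetical stand-in tokenizer: one integer token per character."""
    return [ord(ch) for ch in text]


def build_masked_example(row):
    """Return (tokens, labels); only the final assistant message is supervised."""
    messages = row["messages"]
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("the last message must be the assistant target")

    # Context: system prompt, tool definitions, and every turn before the target.
    # The "role: content" layout is an assumption made for this sketch.
    context_parts = []
    if row.get("system"):
        context_parts.append("system: " + row["system"])
    if row.get("tools"):
        context_parts.append("tools: " + json.dumps(row["tools"]))
    for message in messages[:-1]:
        context_parts.append(message["role"] + ": " + message["content"])

    context_tokens = toy_tokenize("\n".join(context_parts) + "\nassistant: ")
    target_tokens = toy_tokenize(messages[-1]["content"])

    tokens = context_tokens + target_tokens
    # Context positions get IGNORE_INDEX, so they contribute nothing to the loss.
    labels = [IGNORE_INDEX] * len(context_tokens) + target_tokens
    return tokens, labels


row = {
    "system": "You are a tool-calling assistant.",
    "tools": [{"name": "search_schedule"}],
    "messages": [
        {"role": "user", "content": "Find tomorrow's meeting."},
        {"role": "assistant", "content": "I will check the schedule first."},
        {"role": "tool", "content": "search_schedule returned Standup at 9 AM."},
        {"role": "assistant", "content": "Tool call: send_email"},
    ],
}

tokens, labels = build_masked_example(row)
supervised_text = "".join(chr(i) for i in labels if i != IGNORE_INDEX)
print(supervised_text)
```

Note that the earlier assistant turn ("I will check the schedule first.") is masked along with the user and tool turns. Only the last message carries a training signal.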

Path C: Evaluate, Gate, and Serve

Use this after a native Picochat run when the goal is release evidence.

picochat eval chat \
  --checkpoint runs/my-scratch-model/sft/checkpoint \
  --tokenizer runs/my-scratch-model/tokenizer.json \
  --input runs/my-pack/eval_benchmark.jsonl \
  --out-dir runs/my-scratch-model/eval

Native release evidence should include:

Existing Hugging Face models can still use Picochat-generated SFT/eval data, but direct HF release gating is intentionally separate until Picochat has a full HF eval bridge.