Add short integration tests and enforce agents to run them on a pod before merging code/running an experiment

kind: infra

track:

Highlight text on any card (body, plan, or an event) to anchor a comment, or leave a whole-task comment here. Mention @claude to summon a reply.

No comments yet.

These tests should test whatever pipeline they are testing end to end but with minimal training steps/eval questions. But they should test that all the models are being properly uploaded, results logged, and evals run

Activity