EPS Dashboard

Parent: #190

Motivation

A quick analysis of the #190 spill matrix shows that hand-coded typological distance predicts contamination magnitude (Spearman rho=-0.52, p=0.003, N=30 bystander cells). But the hand-coded distances are crude (only 4 levels). The proper test: extract actual language-output direction vectors from Qwen-2.5-7B-Instruct's hidden states, compute pairwise cosine similarity of those directions and pairwise JS divergence of the output logit distributions, then correlate both with the spill rates.

Key questions:

Does representation-space distance (cosine similarity between language directions) predict spill better than typological distance?
Do the German outliers (66-95% contamination despite being "different branch") sit closer to Romance languages in representation space than typology suggests?
Is there a linear relationship between representation distance and spill magnitude, or is it a threshold effect?
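
One rough way to approach the linear-vs-threshold question is to compare the residual error of a straight-line fit against the best single-threshold step function. The sketch below is only an illustration under assumptions: the function names are hypothetical, the inputs are placeholder numbers rather than #190 results, and the step model has one more free parameter than the line, so a lower error for the step fit is not conclusive on its own with N=30.

```python
import math

def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def linear_sse(x, y):
    """Residual sum of squares of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def step_sse(x, y):
    """Best single-threshold step fit: returns (residual SS, threshold on x)."""
    pairs = sorted(zip(x, y))
    best_sse, best_thr = math.inf, None
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # cannot place a threshold between identical x values
        left = [p[1] for p in pairs[:k]]
        right = [p[1] for p in pairs[k:]]
        sse = sum_sq_dev(left) + sum_sq_dev(right)
        if sse < best_sse:
            best_sse = sse
            best_thr = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_sse, best_thr

# Placeholder data only: representation distance vs. spill rate per bystander cell.
distance = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
spill = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
print("linear SSE:", linear_sse(distance, spill))
print("step SSE, threshold:", step_sse(distance, spill))
```

If the step fit wins by a wide margin and the threshold sits in a gap between language clusters, that would point toward a threshold effect; comparable errors would favor the simpler linear reading.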

Design sketch

Load Qwen-2.5-7B-Instruct on an H100
Run forward passes on "Speak in {language}." for all 7 languages × multiple prompts
Extract hidden-state vectors at the last token position (or mean-pool across response tokens from baseline completions)
Compute pairwise cosine similarity of the hidden-state vectors and pairwise JS divergence of the output distributions (softmax over the output logits)
Correlate with the 7×7 spill matrix from #190
Visualize: 2D MDS/UMAP of the 7 language directions, colored by contamination received
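
The metric and correlation steps above can be sketched in plain Python, assuming the hidden-state vectors and output logits have already been extracted. All function names here are hypothetical, and the demo vectors and spill values are placeholders, not data from #190. In practice NumPy/SciPy would likely replace these helpers; the standard library is used only to keep the sketch self-contained.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; 0 for identical inputs, at most ln 2."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(x), ranks(y))

def offdiag_pairs(matrix_a, matrix_b):
    """Flatten matching off-diagonal cells of two square matrices."""
    n = len(matrix_a)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    return [matrix_a[i][j] for i, j in cells], [matrix_b[i][j] for i, j in cells]

# Placeholder inputs: 3 toy "language direction" vectors and a toy spill matrix.
directions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
spill = [[0.0, 0.8, 0.1], [0.7, 0.0, 0.2], [0.1, 0.1, 0.0]]
sim = [[cosine_similarity(u, v) for v in directions] for u in directions]
sim_cells, spill_cells = offdiag_pairs(sim, spill)
print("Spearman rho (similarity vs. spill):", spearman(sim_cells, spill_cells))
```

The sketch drops every diagonal cell; the actual analysis would need to apply whatever bystander-cell selection #190 used to arrive at N=30.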

Compute

~1 GPU-hr (forward passes only, no training). compute:small.

Artifacts needed

Base model: Qwen-2.5-7B-Instruct (already cached on pods)
Spill matrix: from #190 eval results (already on git)
Baseline completions: from #162/#190 eval results (already on git)