⚛️ PhET‑Physics‑VideoQA

Controllable physics videos for world‑model evaluation (blind review)

Abstract

We present PhET‑Physics‑VideoQA, a controlled benchmark for assessing how well vision‑language models understand physics from video. The corpus consists of short clips sourced from PhET Interactive Simulations and covers four topic groups: mechanics & fluids, optics, electromagnetism & circuits, and quantum mechanics. Each clip is paired with three expert‑validated questions (conceptual, numerical, and error‑detection), yielding a diverse test bed for pixel‑grounded reasoning.
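The abstract implies a simple record layout: one clip, one topic group, and exactly three questions of fixed types. The Python sketch below models that layout under stated assumptions. The class and field names (Clip, Question, clip_id, video_path, reference_answer) are hypothetical and are not taken from the released files.

```python
from dataclasses import dataclass, field
from enum import Enum


class Topic(Enum):
    # The four topic groups named in the abstract.
    MECHANICS_FLUIDS = "Mechanics & Fluids"
    OPTICS = "Optics"
    EM_CIRCUITS = "E&M & Circuits"
    QUANTUM = "Quantum Mechanics"


class QuestionType(Enum):
    # The three question types paired with every clip.
    CONCEPTUAL = "conceptual"
    NUMERICAL = "numerical"
    ERROR_DETECTION = "error-detection"


@dataclass
class Question:
    qtype: QuestionType
    prompt: str
    reference_answer: str  # hypothetical field; the real answer format may differ


@dataclass
class Clip:
    # Hypothetical field names; the released files may use a different layout.
    clip_id: str
    video_path: str
    topic: Topic
    questions: list[Question] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True if the clip carries exactly one question of each type."""
        types = [q.qtype for q in self.questions]
        return len(types) == len(QuestionType) and set(types) == set(QuestionType)


if __name__ == "__main__":
    clip = Clip(
        clip_id="example-0001",
        video_path="videos/example-0001.mp4",
        topic=Topic.OPTICS,
        questions=[Question(t, "...", "...") for t in QuestionType],
    )
    print(clip.is_complete())  # True
```

Storing the question type as an enum rather than a free string makes it easy to check that every clip really carries the full conceptual / numerical / error‑detection triad.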

Dataset

Figure: topic coverage and overview
How to cite

Use the BibTeX in the References section or download the full entry: PHET‑PHYSICS‑VIDEOQA.bib.

All assets are released for research use. Please keep links to the dataset and code when you reference this work.

Results (LLM‑as‑a‑judge; 1–5 scale)

Category | Question type | GPT‑4o‑mini | Gemini‑2.5‑Flash‑Lite | Qwen‑VL‑Plus | Type Avg. (mean of 3 models)
Mechanics & Fluids | Conceptual | 4.6 | 4.5 | 2.3 | 3.80
Mechanics & Fluids | Error Detection | 3.0 | 3.2 | 2.5 | 2.90
Mechanics & Fluids | Numerical | 4.0 | 4.2 | 2.1 | 3.43
Quantum Mechanics | Conceptual | 3.7 | 3.8 | 1.6 | 3.03
Quantum Mechanics | Error Detection | 2.4 | 2.5 | 1.5 | 2.13
Quantum Mechanics | Numerical | 3.3 | 3.5 | 1.6 | 2.80
E&M & Circuits | Conceptual | 4.7 | 4.6 | 3.3 | 4.20
E&M & Circuits | Error Detection | 3.8 | 3.4 | 3.1 | 3.43
E&M & Circuits | Numerical | 4.2 | 4.0 | 3.2 | 3.80
Optics | Conceptual | 4.2 | 4.2 | 3.7 | 4.03
Optics | Error Detection | 2.6 | 3.3 | 2.3 | 2.73
Optics | Numerical | 4.6 | 4.3 | 3.9 | 4.27
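The Type Avg. value in each row of the results table appears to be the unweighted mean of the three model scores; every reported value matches that reading after rounding to two decimals. The Python sketch below re‑derives those averages from the table and also computes a per‑model mean over all twelve rows. The names RESULTS, row_mean, check_reported, and model_means are illustrative and are not part of any released code.

```python
from statistics import mean

# Transcription of the results table (LLM-as-a-judge, 1-5 scale):
# (category, question type) -> ((GPT-4o-mini, Gemini-2.5-Flash-Lite, Qwen-VL-Plus), reported Type Avg.)
MODELS = ("GPT-4o-mini", "Gemini-2.5-Flash-Lite", "Qwen-VL-Plus")
RESULTS = {
    ("Mechanics & Fluids", "Conceptual"): ((4.6, 4.5, 2.3), 3.80),
    ("Mechanics & Fluids", "Error Detection"): ((3.0, 3.2, 2.5), 2.90),
    ("Mechanics & Fluids", "Numerical"): ((4.0, 4.2, 2.1), 3.43),
    ("Quantum Mechanics", "Conceptual"): ((3.7, 3.8, 1.6), 3.03),
    ("Quantum Mechanics", "Error Detection"): ((2.4, 2.5, 1.5), 2.13),
    ("Quantum Mechanics", "Numerical"): ((3.3, 3.5, 1.6), 2.80),
    ("E&M & Circuits", "Conceptual"): ((4.7, 4.6, 3.3), 4.20),
    ("E&M & Circuits", "Error Detection"): ((3.8, 3.4, 3.1), 3.43),
    ("E&M & Circuits", "Numerical"): ((4.2, 4.0, 3.2), 3.80),
    ("Optics", "Conceptual"): ((4.2, 4.2, 3.7), 4.03),
    ("Optics", "Error Detection"): ((2.6, 3.3, 2.3), 2.73),
    ("Optics", "Numerical"): ((4.6, 4.3, 3.9), 4.27),
}


def row_mean(scores):
    """Unweighted mean over the three models, rounded to 2 decimals like the table."""
    return round(mean(scores), 2)


def check_reported():
    """Return the rows whose reported Type Avg. differs from the recomputed mean."""
    return [key for key, (scores, reported) in RESULTS.items()
            if row_mean(scores) != reported]


def model_means():
    """Mean score per model over all 12 rows, each row weighted equally."""
    return {model: round(mean(scores[i] for scores, _ in RESULTS.values()), 2)
            for i, model in enumerate(MODELS)}


if __name__ == "__main__":
    print(check_reported())  # empty list: every reported average matches
    print(model_means())
```

The per‑model figure is a macro‑average: each (category, question type) cell counts once. The table does not report how many questions fall in each cell, so a question‑weighted average cannot be recovered from it.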

Video Samples

Sample 1: Mechanics & Fluids
Sample 2: Optics (concave mirror)
Sample 3: E&M & Circuits
Sample 4: Quantum / other

References & Citation

Download BibTeX: PHET‑PHYSICS‑VIDEOQA.bib