Multimodal Reasoning

Advancing abnormality grounding via vision–language models.

Bridging Modalities

Scientific Thesis

"Detecting anomalies is not sufficient for clinical decision-making. Current medical AI systems either detect visual deviations or describe images, but rarely connect the two through reasoning. This work reframes anomaly detection as a multimodal reasoning problem, where visual deviations are treated as hypotheses that must be grounded, explained, and contextualized through language and prior knowledge. We couple generative models with vision–language reasoning to enable systems that can both condition generation on semantic intent and interpret previously unseen anomalies in clinically meaningful terms. This closes the loop between detection, explanation, and reasoning, supporting open-world clinical settings."

Open Challenges

Multimodal Anomaly Grounding

Anchor visually detected anomalies in structured clinical language, ensuring that explanations correspond to anatomically and pathophysiologically valid concepts rather than free-form descriptions.
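To make the grounding step concrete, the sketch below scores a detected anomalous region against a small structured vocabulary, so the explanation can only be drawn from anatomically anchored concepts rather than free text. Everything here is a hypothetical placeholder: the `CONCEPTS` vocabulary, the `embed_region` stub standing in for a real vision encoder, and the random embeddings. It illustrates the retrieval-style grounding idea, not a finished system.

```python
import numpy as np

# Hypothetical structured vocabulary: each concept is tied to an anatomical
# site, so explanations stay anchored to valid clinical entities rather than
# free-form descriptions. Concepts and embeddings are illustrative stubs.
CONCEPTS = {
    "consolidation":    {"site": "lung",   "embedding": np.random.randn(512)},
    "cardiomegaly":     {"site": "heart",  "embedding": np.random.randn(512)},
    "pleural effusion": {"site": "pleura", "embedding": np.random.randn(512)},
}

def embed_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Placeholder for a vision encoder applied to the anomalous region
    (e.g., a CLIP-style image encoder run on the cropped box)."""
    y0, y1, x0, x1 = box
    crop = image[y0:y1, x0:x1]
    rng = np.random.default_rng(int(crop.sum()) % 2**32)  # deterministic stub
    return rng.standard_normal(512)

def ground_anomaly(image: np.ndarray, box: tuple, top_k: int = 2):
    """Rank structured clinical concepts by cosine similarity to the region
    embedding; the anomaly is grounded only in vocabulary terms."""
    v = embed_region(image, box)
    v = v / np.linalg.norm(v)
    scored = []
    for name, meta in CONCEPTS.items():
        t = meta["embedding"] / np.linalg.norm(meta["embedding"])
        scored.append((float(v @ t), name, meta["site"]))
    return sorted(scored, reverse=True)[:top_k]

image = np.zeros((256, 256))
image[60:120, 40:100] = 1.0  # toy "anomalous" region
for score, concept, site in ground_anomaly(image, (60, 120, 40, 100)):
    print(f"{concept} (site: {site}): similarity {score:+.3f}")
```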

Language-Conditioned Generation and Repair

Develop generative models that can be steered by semantic constraints, enabling hypothesis-driven synthesis, counterfactual reasoning, and targeted normalization guided by clinical language.
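As one illustration of what "steered by semantic constraints" can look like in practice, the sketch below uses an off-the-shelf text-guided inpainting pipeline from the diffusers library as a stand-in for a dedicated medical model: the mask marks the suspected anomaly, and the prompt expresses the counterfactual ("what this region would look like if healthy"). The checkpoint name, the blank placeholder image, and the prompt are illustrative assumptions, not the models developed in this work.

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

# Text-guided inpainting as a stand-in for language-conditioned repair:
# the mask marks the suspected anomaly, and the prompt carries the semantic
# constraint. Checkpoint and prompt are illustrative placeholders.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.new("RGB", (512, 512), "gray")  # stand-in for a scan
mask = Image.new("L", (512, 512), 0)          # white = region to repair
ImageDraw.Draw(mask).rectangle([180, 180, 330, 330], fill=255)

# Counterfactual normalization: synthesize a "healthy" version of the
# masked region, conditioned on clinical language rather than pixels alone.
counterfactual = pipe(
    prompt="normal lung parenchyma, no consolidation",
    image=image,
    mask_image=mask,
).images[0]
counterfactual.save("counterfactual_normal.png")
```

Comparing the counterfactual against the original image then localizes and quantifies the deviation, which is the targeted-normalization step the paragraph above refers to.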

Open-World Reasoning

Enable vision–language systems to reason about unseen anomalies, supporting interpretation of rare, ambiguous, or previously uncharacterized findings.
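A minimal version of this open-world behavior is zero-shot scoring of an image against language descriptions, sketched below with a generic CLIP checkpoint standing in for a medically tuned vision–language model. Because the candidate findings are plain text supplied at inference time, the set can grow to include rare or previously uncharacterized descriptions without retraining; the checkpoint and prompts are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Generic CLIP as a stand-in for a medically tuned vision-language model:
# unseen findings are described in language at inference time, so the
# candidate set can be extended without retraining. Prompts are illustrative.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

candidate_findings = [
    "a focal opacity of uncertain etiology",
    "a well-circumscribed mass",
    "diffuse interstitial changes",
    "no acute abnormality",
]

image = Image.new("RGB", (224, 224), "gray")  # stand-in for a scan crop
inputs = processor(text=candidate_findings, images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity
probs = logits.softmax(dim=-1).squeeze()

for finding, p in sorted(zip(candidate_findings, probs.tolist()),
                         key=lambda x: -x[1]):
    print(f"{p:.2f}  {finding}")
```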

Key Publications