LLM Evaluation & LLM-as-Judge
Reliable, human-aligned ways to measure what models actually do.
Building and evaluating language models that work across the world's languages, cultures, and real-world constraints.
I am a postdoctoral researcher at Microsoft Research, working with Sunayana Sitaram and Kalika Bali on language models, with a focus on LLM evaluation, multilingual & cultural understanding, controllable generation, reasoning, and agentic systems.
I completed my Ph.D. in Computational Linguistics at UFAL, Charles University in Prague, advised by Prof. Ondřej Dušek. My dissertation, Text Style Transfer using Neural Models, develops methods for rewriting text under attribute constraints — formality, sentiment, politeness, toxicity — across high- and low-resource languages.
Before the Ph.D., I spent 6+ years building production ML and analytics systems as a software and machine-learning engineer, and I held research roles at UKP Lab (TU Darmstadt) with Prof. Iryna Gurevych, MBZUAI with Prof. Monojit Choudhury, Panlingua, and IISc. I like building systems that hold up under real-world constraints, with a focus on practicality and usability.
Language technology that is useful across the world's languages — not only the few well-resourced ones — and trustworthy enough to evaluate, control, and reason about.
Steering attributes — formality, sentiment, politeness, toxicity.
Rewriting text under attribute constraints across languages.
Models that respect linguistic and cultural diversity.
Probing and improving how models reason and stay consistent.
Tool-using agents and their cross-lingual robustness.
A complete, up-to-date list lives on Google Scholar. Below are selected recent works.
Postdoctoral Researcher — Microsoft Research
2025 — present
Multilingual NLP, LLM evaluation, and controllable generation for Language AI.
Visiting Research Scientist — MBZUAI
2024 — 2025
Cultural and cross-lingual dimensions of large language models, with Prof. Monojit Choudhury.
Ph.D. Researcher — UFAL, Charles University
2019 — 2025
Research on Text Style Transfer with neural language models. Advisor: Prof. Ondřej Dušek.
Research Intern — Panlingua Language Processing
2022
Low-resource machine translation for Indian languages.
Research Assistant — UKP Lab, TU Darmstadt
2018 — 2019
Context detection for scientific data-to-text generation.
Data Science Visiting Intern — Indian Institute of Science (IISc)
2018
Time-series forecasting and predictive analytics.
Lead ML Technical Specialist — Tricon Infotech
2017 — 2019
Product- and domain-specific recommendation engine at scale.
Senior Data Engineer — Avaya
2016 — 2017
Log-analysis pipeline with optimized storage and real-time monitoring.
Senior Analytics Engineer — o9 Solutions
2015 — 2016
Scalable enterprise planning recommendation framework.
Senior Software Engineer — Amdocs
2014 — 2015
Recommendation engine and search for e-commerce platforms.
For Low-Resource Text Style Transfer for Bangla.
Charles University Grant Agency research grant, led as Principal Investigator.
Recognised for outstanding contribution at two organisations.
National-level technical competition organised by IBM.
I'm always glad to talk about LLM evaluation, multilingual & cultural NLP, controllable generation, and applied AI — research collaborations or engineering roles alike.