Postdoctoral Researcher · Microsoft Research

Sourabrata Mukherjee

Building and evaluating language models that work across the world's languages, cultures, and real-world constraints.

NLP–LLM Researcher · Applied AI · Bengaluru, India
Charles University
6+ yrs ML engineering
11+ · Peer-reviewed papers
6+ · Years in industry ML
4 · Research labs worldwide
2 · Best Paper Awards
About

Bridging research and engineering

I am a postdoctoral researcher at Microsoft Research, working with Sunayana Sitaram and Kalika Bali on language models, with a focus on LLM evaluation, multilingual & cultural understanding, controllable generation, reasoning, and agentic systems.

I completed my Ph.D. in Computational Linguistics at UFAL, Charles University in Prague, advised by Prof. Ondřej Dušek. My dissertation, Text Style Transfer using Neural Models, develops methods for rewriting text under attribute constraints — formality, sentiment, politeness, toxicity — across high- and low-resource languages.

Before the Ph.D., I spent 6+ years building production ML and analytics systems as a software and machine-learning engineer. I have also held research roles at UKP Lab (TU Darmstadt) with Prof. Iryna Gurevych, MBZUAI with Prof. Monojit Choudhury, Panlingua, and IISc. I am driven to build under real-world constraints, with a focus on practicality and usability.

Research

What I work on

Language technology that is useful across the world's languages — not only the few well-resourced ones — and trustworthy enough to evaluate, control, and reason about.

LLM Evaluation & LLM-as-Judge

Reliable, human-aligned ways to measure what models actually do.

Controllable Text Generation

Steering attributes — formality, sentiment, politeness, toxicity.

Text Style Transfer

Rewriting text under attribute constraints across languages.

Multilingual & Cultural NLP

Models that respect linguistic and cultural diversity.

Reasoning

Probing and improving how models reason and stay consistent.

Agentic Development

Tool-using agents and their cross-lingual robustness.

News

Recent updates

  • 2026 · Building Benchmarks from the Ground Up received an Honorable Mention at CHI 2026.
  • 2025 · Joined Microsoft Research as a postdoctoral researcher, working on language AI, multilingual evaluation, and controllable generation.
  • 2025 · Successfully defended my Ph.D. thesis “Text Style Transfer using Neural Models” at Charles University, Prague.
  • 2025 · Paper on honorific usage in Wikipedia and LLMs for Bengali & Hindi accepted to EMNLP 2025 (Main).
  • 2024 · Visiting research scientist at MBZUAI (Abu Dhabi) on culture-aware LLMs with Prof. Monojit Choudhury.
  • 2023 · Best Paper Award at the BLP Workshop, EMNLP 2023, for low-resource Bangla style transfer.
Selected publications

Papers

A complete, up-to-date list lives on Google Scholar. Below are selected recent works.

01

Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings

Proceedings of CHI 2026 · ★ Honorable Mention · 2026
02

Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi

Proceedings of EMNLP 2025 (Main) · 2025
03

Actions Speak Louder than Words: On the Cross-Lingual Invariance of Agentic Tool Use

COLM 2026 · Under review
04

The Geometry of LLM-as-Judge: Why Inter-LLM Consensus Is Not Human Alignment

EMNLP 2026 · Under review
05

Are Large Language Models Actually Good at Text Style Transfer?

Sourabrata Mukherjee, Atul Kr. Ojha, Ondřej Dušek

Proceedings of INLG 2025 · 2025
06

Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics?

NAACL 2025 Student Research Workshop · 2025
07

Multilingual Text Style Transfer: Datasets & Models for Indian Languages

Sourabrata Mukherjee, Atul Kr. Ojha, Akanksha Bansal, Deepak Alok, John P. McCrae, Ondřej Dušek

Proceedings of INLG 2024 · 2024
08

A Survey of Text Style Transfer: Applications and Ethical Implications

Sourabrata Mukherjee, Mateusz Lango, Zdeněk Kasner, Ondřej Dušek

Northern European Journal of Language Technology (NEJLT) · 2024
09

Low-Resource Text Style Transfer for Bangla: Data & Models

Sourabrata Mukherjee, Akanksha Bansal, Pritha Majumdar, Atul Kr. Ojha, Ondřej Dušek

BLP Workshop, EMNLP 2023 · ★ Best Paper · 2023
10

Polite Chatbot: A Text Style Transfer Application

Sourabrata Mukherjee, Vojtěch Hudeček, Ondřej Dušek

EACL 2023 Student Research Workshop · 2023
11

Balancing the Style-Content Trade-off in Sentiment Transfer using Polarity-Aware Denoising

Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

Int. Conference on Text, Speech, and Dialogue (TSD) · 2022
Experience

Research & software development

  • Postdoctoral Researcher — Microsoft Research

    2025 — present

    Multilingual NLP, LLM evaluation, and controllable generation for Language AI.

  • Visiting Research Scientist — MBZUAI

    2024 — 2025

    Cultural and cross-lingual dimensions of large language models, with Prof. Monojit Choudhury.

  • Ph.D. Researcher — UFAL, Charles University

    2019 — 2025

    Research on Text Style Transfer with neural language models. Advisor: Prof. Ondřej Dušek.

  • Research Intern — Panlingua Language Processing

    2022

    Low-resource machine translation for Indian languages.

  • Research Assistant — UKP Lab, TU Darmstadt

    2018 — 2019

    Context detection for scientific data-to-text generation.

  • Data Science Visiting Intern — Indian Institute of Science (IISc)

    2018

    Time-series forecasting and predictive analytics.

  • Lead ML Technical Specialist — Tricon Infotech

    2017 — 2019

    Product- and domain-specific recommendation engine at scale.

  • Senior Data Engineer — Avaya

    2016 — 2017

    Log-analysis pipeline with optimized storage and real-time monitoring.

  • Senior Analytics Engineer — o9 Solutions

    2015 — 2016

    Scalable enterprise planning recommendation framework.

  • Senior Software Engineer — Amdocs

    2014 — 2015

    Recommendation engine and search for e-commerce platforms.

Education

Academic background

2019 — 2025 · Prague, Czechia

Ph.D. in Computational Linguistics — Charles University

  • Thesis: Text Style Transfer using Neural Models
  • Advisor: Prof. Ondřej Dušek
2011 — 2013 · Durgapur, India

M.Tech in Computer Science & Engineering — NIT Durgapur

  • GPA: 9.06 / 10
  • Thesis: Real-world multi-objective optimization using evolutionary algorithms
  • Among the top-ranked students of the class
2007 — 2011 · Kolkata, India

B.Tech in Computer Science & Engineering — WBUT

  • GPA: 8.79 / 10
  • Project: Random number generation using Monte Carlo methods
  • Graduated as college topper
Coursera courses: Generative AI with LLMs · Deep Neural Networks & Tuning · Machine Learning & Statistics
Honours

Awards & grants

2023 · EMNLP

Best Paper Award — BLP Workshop

For Low-Resource Text Style Transfer for Bangla.

Charles University

CU Grant Agency Award — as PI

Charles University Grant Agency research grant, led as Principal Investigator.

Amdocs · Tricon Infotech

Star Employee of the Year

Recognised for outstanding contribution at two organisations.

National · India

National Winner — IBM Tech Competition

National-level technical competition organised by IBM.

Let's build something across languages

I'm always glad to talk about LLM evaluation, multilingual & cultural NLP, controllable generation, and applied AI, whether for research collaborations or engineering roles.