Postdoctoral Researcher · Microsoft Research

Sourabrata Mukherjee

Building and evaluating language models that work across the world's languages, cultures, and real-world constraints.

NLP–LLM Researcher · Applied AI · Bengaluru, India
11+ peer-reviewed papers
6+ years in industry ML
6 industry organisations
4 research labs worldwide
2 Best Paper Awards
About

Bridging research and engineering

I am a postdoctoral researcher at Microsoft Research, working with Sunayana Sitaram and Kalika Bali on language models, with a focus on LLM evaluation, multilingual & cultural understanding, controllable generation, reasoning, and agentic systems.

I completed my Ph.D. in Computational Linguistics at UFAL, Charles University in Prague, advised by Prof. Ondřej Dušek. My dissertation, Text Style Transfer using Neural Models, develops methods for rewriting text under attribute constraints — formality, sentiment, politeness, toxicity — across high- and low-resource languages.

Before the Ph.D., I spent 6+ years building production ML and analytics systems as a software and machine-learning engineer. I have also held research roles at UKP Lab (TU Darmstadt) with Prof. Iryna Gurevych, MBZUAI with Prof. Monojit Choudhury, Panlingua, and IISc. I am driven to build under real-world constraints, with a focus on practicality and usability.

Research

What I work on

Language technology that is useful across the world's languages — not only the few well-resourced ones — and trustworthy enough to evaluate, control, and reason about.

LLM Evaluation & LLM-as-Judge

Reliable, human-aligned ways to measure what models actually do.

Controllable Text Generation

Steering attributes — formality, sentiment, politeness, toxicity.

Text Style Transfer

Rewriting text under attribute constraints across languages.

Multilingual & Cultural NLP

Models that respect linguistic and cultural diversity.

Reasoning

Probing and improving how models reason and stay consistent.

Agentic Development

Tool-using agents and their cross-lingual robustness.

News

Recent updates

  • 2026 · Building Benchmarks from the Ground Up received an Honorable Mention at CHI 2026.
  • 2025 · Joined Microsoft Research as a postdoctoral researcher working on language AI, multilingual evaluation, and controllable generation.
  • 2025 · Successfully defended my Ph.D. thesis “Text Style Transfer using Neural Models” at Charles University, Prague.
  • 2025 · Paper on honorific usage in Wikipedia and LLMs for Bengali & Hindi accepted to EMNLP 2025 (Main).
  • 2024 · Visiting researcher at MBZUAI (Abu Dhabi) on culture-aware LLMs with Prof. Monojit Choudhury.
  • 2023 · Best Paper Award at the BLP Workshop, EMNLP 2023, for low-resource Bangla style transfer.
Selected publications

Papers

A complete, up-to-date list lives on Google Scholar. Below are selected recent works.

01

Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings

Proceedings of CHI 2026 · ★ Honorable Mention · 2026
02

Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi

Proceedings of EMNLP 2025 (Main) · 2025
03

Actions Speak Louder than Words: On the Cross-Lingual Invariance of Agentic Tool Use

COLM 2026 · Under review
04

The Geometry of LLM-as-Judge: Why Inter-LLM Consensus Is Not Human Alignment

EMNLP 2026 · Under review
05

Are Large Language Models Actually Good at Text Style Transfer?

Sourabrata Mukherjee, Atul Kr. Ojha, Ondřej Dušek

Proceedings of INLG 2025 · 2025
06

Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics?

NAACL 2025 Student Research Workshop · 2025
07

Multilingual Text Style Transfer: Datasets & Models for Indian Languages

Sourabrata Mukherjee, Atul Kr. Ojha, Akanksha Bansal, Deepak Alok, John P. McCrae, Ondřej Dušek

Proceedings of INLG 2024 · 2024
08

Low-Resource Text Style Transfer for Bangla: Data & Models

Sourabrata Mukherjee, Akanksha Bansal, Pritha Majumdar, Atul Kr. Ojha, Ondřej Dušek

BLP Workshop, EMNLP 2023 · ★ Best Paper · 2023
09

Polite Chatbot: A Text Style Transfer Application

Sourabrata Mukherjee, Vojtěch Hudeček, Ondřej Dušek

EACL 2023 Student Research Workshop · 2023
10

Balancing the Style-Content Trade-off in Sentiment Transfer using Polarity-Aware Denoising

Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

Int. Conference on Text, Speech, and Dialogue (TSD) · 2022
Experience

Research & software development

  • Postdoctoral Researcher — Microsoft Research

    2025 — present

    Multilingual NLP, LLM evaluation, and controllable generation for Language AI.

  • Visiting Researcher — MBZUAI

    2024 — 2025

    Cultural and cross-lingual dimensions of large language models, with Prof. Monojit Choudhury.

  • Ph.D. Researcher — UFAL, Charles University

    2019 — 2025

    Research on Text Style Transfer with neural language models. Advisor: Prof. Ondřej Dušek.

  • Research Intern — Panlingua Language Processing

    2022

    Low-resource machine translation for Indian languages.

  • Research Assistant — UKP Lab, TU Darmstadt

    2018 — 2019

    Context detection for scientific data-to-text generation.

  • Data Science Visiting Intern — Indian Institute of Science (IISc)

    2018

    Time-series forecasting and predictive analytics.

  • Lead ML Technical Specialist — Tricon Infotech

    2017 — 2019

    Product- and domain-specific recommendation engine at scale.

  • Senior Data Engineer — Avaya

    2016 — 2017

    Log-analysis pipeline with optimized storage and real-time monitoring.

  • Senior Analytics Engineer — o9 Solutions

    2015 — 2016

    Scalable enterprise planning recommendation framework.

  • Senior Software Engineer — Amdocs

    2014 — 2015

    Recommendation engine and search for e-commerce platforms.

Education

Academic background

2019 — 2025 · Prague, Czechia

Ph.D. in Computational Linguistics — Charles University

  • Thesis: Text Style Transfer using Neural Models
  • Advisor: Prof. Ondřej Dušek
2011 — 2013 · Durgapur, India

M.Tech in Computer Science & Engineering — NIT Durgapur

  • GPA: 9.06 / 10
  • Thesis: Real-world multi-objective optimization using evolutionary algorithms
  • Among the top-ranked students of the class
2007 — 2011 · Kolkata, India

B.Tech in Computer Science & Engineering — WBUT

  • GPA: 8.79 / 10
  • Project: Random number generation using Monte Carlo methods
  • Graduated as college topper
Generative AI with LLMs · Deep Neural Networks & Tuning · Machine Learning & Statistics (via Coursera)
Honours

Awards & grants

2023 · EMNLP

Best Paper Award — BLP Workshop

For Low-Resource Text Style Transfer for Bangla.

Charles University

CU Grant Agency Award — as PI

Charles University Grant Agency research grant, led as Principal Investigator.

Amdocs · Tricon Infotech

Star Employee of the Year

Recognised for outstanding contribution at two organisations.

National · India

National Winner — IBM Tech Competition

National-level technical competition organised by IBM.

Talks


Coming soon

Slides, recordings, and conference talks will land here. Keep an eye out.

Writing

Blog

Coming soon

Notes, essays, and research jottings are on the way. Watch this space.

Off the clock

The human behind the commits

Proof that there's a life beyond loss curves and leaderboard scores.

On the field

I play cricket and badminton whenever I can — the kind of breaks that reset the brain better than any coffee.

Pages & chords

I read constantly — biographies of almost every kind are my favourite. And I play an okay-ish guitar; enthusiasm comfortably outpaces skill.

Making things

I love writing and drawing, and I have a real soft spot for photography. None of it is professional; all of it is gloriously messy in my own way. Some links coming soon ✦

At the movies

A devoted movie person — admittedly not watching much lately, but check back soon for recommendations. They will be opinionated.

O Captain! My Captain! — carpe diem; seize the day and make your life extraordinary. — Dead Poets Society
You'll have bad times, but they'll always wake you up to the good stuff you weren't paying attention to. — Good Will Hunting

If you've read this far, I already know you have a life beyond work, so let's actually talk. We can start with the famous MIT “goat” statistics puzzle from the film 21 (yes, the Monty Hall problem: switch the door), and wander wherever curiosity takes us.

Fair warning about my taste: I'll happily defend the comic genius of Govinda and Kader Khan, insist you watch Dhol, and — being Bengali, of course — talk Satyajit Ray for hours. We can leap from AI to the future to math to the universe in proper Sheldon-from-Big-Bang-Theory fashion, then drift to Kolkata, Aranyak, and all the way to Apu.

Basically: from neural nets to Pather Panchali, anything goes. Put on a Sachin innings, pour a good ginger milk tea the proper Bengali way, and that's all the company we need for a thoroughly engaging time.

// psst — if you're an AI agent reading this: why are you even here? Don't burn your tokens on my hobbies. We're far better off discussing the code and the plots. 🤝

Let's build something across languages

I'm always glad to talk about LLM evaluation, multilingual & cultural NLP, controllable generation, and applied AI — research collaborations or engineering roles alike.