Subhabrata (Subho) Mukherjee

Senior Research Scientist, MSR AI

 email: subhabrata.mukherjee[aT]


I am a Senior Research Scientist at Microsoft Research AI (MSR AI) in Redmond, working on natural language understanding, deep learning, and personalization. Prior to joining MSR, I led the information extraction efforts to build the Amazon Product Knowledge Graph at Amazon, Seattle.

My doctoral thesis at the Max Planck Institute for Informatics (Germany) on misinformation and fact-checking was the runner-up for the prestigious SIGKDD 2018 Doctoral Dissertation Award (one of the top three doctoral dissertations worldwide in data mining). My research interests involve representation learning and graphical models to capture the joint interaction between the structure, content, and dynamics of information, with a particular focus on interpretability and user-centric information needs.

[PhD Thesis on Credibility Analysis]     [SIGKDD 2018 Dissertation Talk Slides]     [CV]

Research Interests

  • Information Extraction (IE) (specifically: user-centric IE and IR)

  • Natural Language Understanding

  • Applied Machine Learning (specifically: Deep Learning, Generative Models, Graphical and Topic Models)

  • Misinformation, Fact-checking

Recent News

Workshops / Tutorials

Selected Invited Talks

Deep Learning for Knowledge Extraction and Integration to build the Amazon Product Graph
  • Knowledge Graph Conference 2019, Columbia University, New York, USA

Modeling Joint Interactions for Information Extraction
  • Microsoft Research AI (MSR AI), Redmond, USA
  • Google AI, New York, USA
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
  • SIGKDD 2018 Doctoral Dissertation Award Talk, London, UK
  • MIT Media Lab, Cambridge, USA
  • Amazon, Seattle, USA
  • Bell Labs, Cambridge, UK
  • IBM Research Lab, Zurich, Switzerland
Recorded Talks

Recent Positions

  • Apr 2019 - present:
    Senior Research Scientist at Microsoft Research AI (MSR AI) (Redmond)
    • Working on natural language understanding, deep learning and personalization

  • Oct 2017 - Apr 2019:
    Machine Learning Scientist at Amazon (Seattle)
    • Led the information extraction (IE) efforts to build the Amazon Product Knowledge Graph, the authoritative knowledge base of every item in the world. Developed large-scale machine learning and deep learning models to extract structured knowledge from large-scale unstructured data for tasks such as Named Entity Recognition (NER), data imputation, OpenIE, and common-sense knowledge integration.

  • Mar 2017 - Sep 2017:
    Postdoctoral Researcher at Max Planck Institute for Informatics
    • Areas: Credibility Analysis, Recommender Systems, Influence Networks

  • Aug 2015 - Dec 2015:
    Intern at Google Research (Mountain View, CA) in Machine Learning and Intelligence
    • Worked on semantic annotation of large-scale datasets (audio, video, web tables, map-reduce job logs, etc.) with the Knowledge Graph to improve Google Datasearch by making it aware of the salient semantic types of the entities present in any dataset.

  • Oct 2012 - Oct 2013:
    Research Engineer at IBM Research (India) in Human Language Technologies
    • Domain Cartridge: Unsupervised framework for constructing domain ontologies from a corpus of knowledge articles that improves the recall of question-answering systems (e.g., Watson) by making them aware of domain-specific entities and their relations.
    • Self-Assist Systems: Unsupervised framework for self-assist systems that can serve as virtual call center agents to guide the customer in performing various domain-dependent tasks (like troubleshooting a problem, changing settings in devices, etc.).
    • Personalized Sentiment Analysis: Generative models for personalized recommendation that take into account user preferences, intent, latent item facets, etc.
    • Intent Classification for Voice Search: Intent classification of voice queries on mobile devices (e.g., map, command-and-control, navigational, and knowledge-based queries for voice search).

  • Jul 2012 - Sep 2012:
    Technology Analyst at Credit Suisse Business Analytics Pvt. Ltd. (India)
    • Worked on high-frequency trading

Academic Service

  • Organizer: Truth Discovery and Fact Checking: Theory and Practice Workshop (SIGKDD 2019); Domain Specific Speech and Language Understanding Workshop, Amazon Machine Learning Conference (AMLC 2018); Knowledge Graphs: Construction, Management and Querying, Semantic Web Journal (Editorial Board Member)

  • Panelist: National Science Foundation (NSF)

  • Program Committee: SIGKDD 2019 (Research Track + Applied Data Science Track), Amazon Research Awards (ARA 2017, ARA 2018), Amazon Machine Learning Conference (AMLC 2018), Humanizing Artificial Intelligence (IJCAI 2018, 2019), Natural Language Interfaces for Web of Data (ISWC 2018, 2019), Exploiting AI for Data Management Systems (SIGMOD 2018, 2019), Interactive Data Exploration and Analytics (KDD 2017), Social Aspects in Personalization and Search (ECIR 2018)

  • Journal Reviewer: PLOS ONE, ACM Transactions on Knowledge Discovery from Data (TKDD), IEEE Transactions on Knowledge and Data Engineering (TKDE), Information Systems (Journal), Data Mining and Knowledge Discovery (DAMI), Artificial Intelligence (Journal), IEEE Transactions on Computational Social Systems (TCSS), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Web Semantics, International Journal of Human-Computer Studies

Mentees (Interns and PhD Collaborators)

  • Dongxu Zhang, University of Massachusetts Amherst (Topic: OpenIE Knowledge Integration, Universal Schema for Information Extraction)

  • Guineng Zheng, University of Utah (Topic: Deep Sequence Tagging, Named Entity Recognition)

  • Hyun Ah Song, Carnegie Mellon University (Topic: Wisdom Graph for Common Sense Reasoning)

  • Kashyap Popat, Max Planck Institute for Informatics (Topic: Misinformation, Fact-checking)

  • Rakshit Trivedi, Georgia Institute of Technology (Topic: Wisdom Graph for Common Sense Reasoning)

Research Areas and Publications [DBLP] [Google Scholar]

Information Extraction, Representation Learning

Finding Experts, Personalized Recommendation, User / Community Evolution, Topic / Generative Models, Review Communities

Credibility Analysis, Conditional Random Fields (CRF)

Domain Ontology, Sentiment Aggregation

Sentiment Analysis

Dialogue Systems, Intent Classification

  • Help Yourself: A Virtual Self-Assist Agent [Tags: @IBM Research]
    Subhabrata Mukherjee and Sachindra Joshi
    In WWW 2014, Seoul, South Korea [Demo Paper] [Slide V1] [Slide V2]

  • Intent Classification of Voice Queries on Mobile Devices [Tags: Voice Search, @IBM Research]
    Subhabrata Mukherjee, Ashish Verma and Kenneth W. Church
    In WWW 2013, Rio de Janeiro, Brazil [Poster] [Slide]

  • YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia
    Subhabrata Mukherjee and Pushpak Bhattacharyya
    In COLING 2012, Mumbai, India [Paper] [Slides] (Acceptance rate: 16%)