Databases and Information Systems


Subhabrata Mukherjee

Subhabrata (Subho) Mukherjee

Machine Learning Scientist

 email: subhabrata.mukherjee.ju[aT]


I am a Machine Learning Scientist at Amazon (Seattle) building the Amazon Product Knowledge Graph. I develop machine learning and deep learning models for open information extraction and representation learning. My doctoral thesis at the Max Planck Institute for Informatics (Germany) was the runner-up for the SIGKDD 2018 Doctoral Dissertation Award (one of the top three doctoral dissertations worldwide in data mining). My research interests involve representation learning and graphical models that capture the joint interaction between structure, content, and dynamics of information, with a particular focus on interpretability.

In my PhD dissertation, I worked on probabilistic graphical models to extract "credible", "trustworthy" and "expert" information from large-scale, non-expert, user-generated online content. I developed machine learning models that exploit the joint interaction between users, language, and their evolution in online communities for tasks such as credibility analysis, personalized content recommendation, (latent) experience-aware item recommendation, finding (latent) topic-specific experts in online communities, and spam and anomaly detection.

[PhD Thesis on Credibility Analysis]     [SIGKDD 2018 Dissertation Talk Slides]     [CV]

Research Interests

    I develop models and techniques in the following areas
    • Information Extraction
    • Machine Learning (specifically: Deep Learning, Generative Models, Graphical and Topic Models)
    • Text and Data Mining
    • Natural Language Processing

    for real-world problems and applications in
    • Recommender Systems, Personalized Recommendation, Expert Finding
    • Credibility Analysis, Truth Finding / Discovery
    • Sentiment Analysis and Opinion Mining, Spam and Anomaly Detection
    • Social Network and Review Analysis
    • Knowledge Graph Embedding

Selected Invited Talks

  • MIT Media Lab, Cambridge, USA, December 2016
  • Amazon, Seattle, USA, December 2016
  • Bell Labs, Cambridge, UK, November 2016
  • IBM Research Lab, Zurich, Switzerland, August 2016

Recent Positions

  • Oct 2017 - present:
    Machine Learning Scientist at Amazon (Seattle)
    • Building the Amazon Product Knowledge Graph, the authoritative knowledge base of every item in the world, from large-scale unstructured and structured data.

  • Mar 2017 - Sep 2017:
    Postdoctoral Researcher at Max Planck Institute for Informatics
    • Areas: Credibility Analysis, Recommender Systems, Influence Networks

  • Aug 2015 - Dec 2015:
    Intern at Google Research (Mountain View, CA) in Machine Learning and Intelligence
    • Worked on semantic annotation of large-scale datasets (audio, video, web tables, MapReduce job logs, etc.) with the Knowledge Graph to improve Google Datasearch (GOODS) by making it aware of the salient semantic types of the entities present in any dataset.

  • Oct 2012 - Oct 2013:
    Research Engineer at IBM Research (India) in Human Language Technologies
    • Domain Cartridge: Unsupervised framework for constructing domain ontologies from a corpus of knowledge articles that improves the recall of Question-Answering systems (e.g., Watson) by making them aware of domain-specific entities and their relations.
    • Self-Assist Systems: Unsupervised framework for self-assist systems that can serve as virtual call center agents to guide the customer in performing various domain-dependent tasks (like troubleshooting a problem, changing settings in devices, etc.).
    • Personalized Sentiment Analysis: Generative models for personalized recommendation that take into account user preferences, intent, and latent item facets.
    • Intent Classification for Voice Search: Intent classification of voice queries on mobile devices (e.g., map, command-and-control, navigational, and knowledge-based queries for voice search).

Academic Service

  • Organizer: Domain Specific Speech and Language Understanding Workshop, Amazon Machine Learning Conference (AMLC 2018); Knowledge Graphs: Construction, Management and Querying, Semantic Web Journal (Editorial Board Member)
  • Panelist: National Science Foundation (NSF)
  • Program Committee: Amazon Research Awards (ARA 2017), Amazon Machine Learning Conference (AMLC 2018), Humanizing Artificial Intelligence (IJCAI 2018), Natural Language Interfaces for Web of Data (ISWC 2018), Exploiting AI for Data Management Systems (SIGMOD 2018), Interactive Data Exploration and Analytics (KDD 2017), Social Aspects in Personalization and Search (ECIR 2018)
  • Journal Reviewer: ACM Transactions on Knowledge Discovery from Data (TKDD), IEEE Transactions on Knowledge and Data Engineering (TKDE), Information Systems (Journal), Data Mining and Knowledge Discovery (DAMI), Artificial Intelligence (Journal), IEEE Transactions on Computational Social Systems (TCSS), Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Journal of Web Semantics, Journal of Human-Computer Studies

Research Areas and Publications [DBLP] [Google Scholar]

Information Extraction, Representation Learning

Finding Experts, Personalized Recommendation, User / Community Evolution, Topic / Generative Models, Review Communities

Credibility Analysis, Conditional Random Fields (CRF)

Domain Ontology, Sentiment Aggregation

Sentiment Analysis

Dialogue Systems, Intent Classification

  • Help Yourself: A Virtual Self-Assist Agent [Tags: @IBM Research]
    Subhabrata Mukherjee and Sachindra Joshi
    In WWW 2014, Seoul, South Korea [Demo Paper] [Slide V1] [Slide V2]

  • Intent Classification of Voice Queries on Mobile Devices [Tags: Voice Search, @IBM Research]
    Subhabrata Mukherjee, Ashish Verma and Kenneth W. Church
    In WWW 2013, Rio de Janeiro, Brazil [Poster] [Slide]

  • YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia
    Subhabrata Mukherjee and Pushpak Bhattacharyya
    In COLING 2012, Mumbai, India [Paper] [Slides] (Acceptance rate: 16%)