Max-Planck-Institut für Informatik
Department 5: Databases and Information Systems
Campus E1 4, Room 408
Phone: +49 681 9325 5008
Fax: +49 681 9325 999
I am a final-year PhD candidate in Computer Science (dissertation submitted in March 2017) at the Max Planck Institute for Informatics (Saarbruecken, Germany), advised by Prof. Gerhard Weikum.
I am currently developing probabilistic graphical models that capture user interests, expressions, expertise, and their evolution over time in online communities by leveraging the users' social networks and the language of their interactions. I work on generative processes, stochastic models (both discrete and continuous time), and random fields, with a particular focus on interpretability. These models are applied to tasks including personalized recommendation, sentiment analysis and opinion mining, expert finding, and distilling "credible" and "trustworthy" information from user-generated online content.
Prior to my PhD, I worked at IBM Research Lab (Delhi, India), where I developed unsupervised frameworks for constructing domain ontologies from unannotated corpora for use by Question-Answering systems, Self-assist systems that act as virtual call center agents to guide customers, and intent classification of voice queries for Dialogue systems.
- Text and Data Mining
- Probabilistic Graphical Models
- Machine Learning
- Information Extraction
- Natural Language Processing
- Social Networks
- Paper accepted in SDM 2017, Texas, USA
- Paper accepted in WWW 2017 (Web-science track), Perth, Australia
- Paper accepted in KDD 2016, San Francisco, USA
- Paper accepted in ECML 2016, Riva del Garda, Italy
- Paper accepted in CIKM 2016, Indianapolis, USA
- Intern in Google Research (Mountain View, CA) in Machine Learning and Intelligence from Aug, 2015
- Aug 2015 - Dec 2015:
Intern in Google Research (Mountain View, CA) in Machine Learning and Intelligence
- Worked on semantic annotation of large-scale datasets (audio, video, web-tables, map-reduce job logs etc.) with Knowledge Graph to improve Google Datasearch (GOODS) by making it aware of the salient semantic types of the entities present in any dataset.
- Oct 2012 - Oct 2013:
Research Engineer in IBM Research (India) in Human Language Technologies
- Domain Cartridge: Unsupervised framework for constructing domain ontologies from a corpus of knowledge articles that improves the recall of Question-Answering systems (e.g., Watson) by making them aware of domain-specific entities and their relations.
- Self-Assist Systems: Unsupervised framework for self-assist systems that can serve as virtual call center agents to guide the customer in performing various domain-dependent tasks (like troubleshooting a problem, changing settings in devices, etc.).
- Personalized Sentiment Analysis: Generative models for personalized recommendation that take into account user preferences, intent, latent item facets etc.
- Intent Classification for Voice Search: Intent classification of voice queries on mobile devices (e.g., map, command-and-control, navigational, and knowledge-based queries for voice search).
- July 2012 - Sep 2012:
Technology Analyst in Credit Suisse Business Analytics Pvt. Ltd. (India)
- Worked in High Frequency Trading
Dialogue Systems, Intent Classification
Help Yourself: A Virtual Self-Assist Agent [Tags: @IBM Research]
Subhabrata Mukherjee and Sachindra Joshi
In WWW 2014 [Demo Paper] [Slide V1] [Slide V2]
Intent Classification of Voice Queries on Mobile Devices [Tags: Voice Search, @IBM Research]
Subhabrata Mukherjee, Ashish Verma and Kenneth W. Church
In WWW 2013 [Poster] [Slide]
YouCat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments using WordNet & Wikipedia
Subhabrata Mukherjee and Pushpak Bhattacharyya
In COLING 2012 [Paper] [Slides] (Acceptance rate: 16%)
ACM Transactions on Knowledge Discovery from Data (TKDD), IEEE Transactions on Knowledge and Data Engineering (TKDE), Information Systems (Journal), Journal of Web Semantics, Journal of Human-Computer Studies, NSF Data Science Workshop