Hi! I'm a fourth-year PhD student in the Computer Science and Engineering Department at The State University of New York at Buffalo, advised by Professor Rohini K. Srihari. My research focuses on conversational AI, specifically building personalized large language models (LLMs) that adapt to user-specific traits and interactions. I am passionate about advancing dialogue systems that feel natural, empathetic, and human-like.

In my recent work, I’ve been exploring how language models can understand and express human personality in a more continuous and natural way. Our COLING 2025 paper, “Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations”, introduces an approach in which personality traits are derived from long-form personal narratives rather than fixed labels. You can read more about it on the project page. Building on this foundation, my latest research expands personalization into cross-domain reasoning. In our IJCNLP-AACL 2025 paper, “Harmonious Minds: Benchmarking Intertwined Reasoning of Human Personality and Musical Preference”, we examine how LLMs connect distinct domains, such as personality and musical structure, to reason about them jointly and meaningfully.

My interest in personalization began in the summer of 2023 during a collaborative study with experts in the Department of Communicative Disorders and Sciences at Buffalo, led by Dr. Jeffry Higginbotham. We aimed to create conversational systems for individuals with non-standard speech, helping them communicate more effectively and authentically. This project focused on empowering differently abled individuals by developing AI systems that could speak on their behalf with greater nuance and expressiveness. A portion of this work was recently published at the EMNLP 2024 Workshop - CustomNLP4U.

Recently, I started working as a Member of Technical Staff Intern at Nutanix, where I work with large-scale system logs and semi-structured industrial data. Our goal is to build an agentic LogRAG system that assists QA engineering teams in triaging service failures and performing root-cause analysis (RCA). Through this experience, I've gained a deeper understanding of how retrieval-augmented generation (RAG) pipelines can be adapted to real-world production environments, where the challenges extend beyond clean datasets to noisy, high-volume, and constantly evolving log streams. Benchmarking the system's retrieval quality and efficiency has shown me how vital retrievers are in industrial AI pipelines: they form the bridge between massive unstructured data and actionable insights. This experience connects directly to the concepts I taught in my Information Retrieval (CSE 535) course, now applied at a much larger and more dynamic scale. It's been an incredible learning journey to see how theoretical IR principles translate into high-impact, production-grade systems.

Outside of research, I have a keen interest in music and in playing instruments!

I am currently looking for summer 2026 internships in areas related to the personalization of LLMs and conversational agents. If my research interests align with your team, please don't hesitate to reach out!

For more details, feel free to check out my CV or drop me an email!

Publications

❄️ Recent
Harmonious Minds: Benchmarking Intertwined Reasoning of Human Personality and Musical Preference
Sayantan Pal, Souvik Das, Rohini K. Srihari
IJCNLP-AACL '25 | International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics
Details coming soon

❄️ Recent
Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
Sayantan Pal, Souvik Das, Rohini K. Srihari
COLING '25 | International Conference on Computational Linguistics
web | pdf | slides | poster | cite | code | dataset | certificate

Empowering AAC Users: A Systematic Integration of Personal Narratives with Conversational AI
Sayantan Pal, Souvik Das, Rohini K. Srihari, Jeff Higginbotham, Jenna Bizovi
EMNLP '24 Workshop (CustomNLP4U) | Conference on Empirical Methods in Natural Language Processing
pdf | cite | poster | video | code | certificate

Mitigating Clickbait: An Approach to Spoiler Generation Using Multitask Learning
Sayantan Pal, Souvik Das, Rohini K. Srihari
ICON '23 | International Conference on Natural Language Processing
pdf | cite | poster | certificate

Summary Generation using NLP Techniques and Cosine Similarity
Sayantan Pal, Maiga Chang, Maria Fernandez Iriarte
ISDA '21 | Intelligent Systems Design and Applications
web | pdf | cite | slides | video | certificate

Professional Experience

Member of Technical Staff Intern, Nutanix
Jun '25 - Dec '25, San Jose, CA, United States
Manager: Geetha Srikantan
- Developing an Agentic LogRAG system for the NAI team to help QA engineering teams triage service failures and perform root-cause analysis (RCA); benchmarking system quality and efficiency, reducing manual investigation time from days to minutes.

Mitacs Globalink Research Internship
May '21 - Jul '21, Athabasca University, Edmonton, AB, Canada
Supervisor: Dr. Maiga Chang
- Developed a web-based Automatic Answering Service for COVID-19 queries, processing over 236,000 texts with NLP techniques.
web | pdf | cite | slides | video | certificate

Amazon Alexa Student Influencer
Jun '20 - Jun '21, Amazon, Bangalore, India
Mentor: Karthik Ragubathy
- Built Alexa skills using Node.js, organized developer events, and led a community of 250+ Alexa enthusiasts.
program | community | webinar day 1 | webinar day 2

Summer Research Internship, IIT KGP
Jun '20 - Aug '20, Indian Institute of Technology, Kharagpur, WB, India
Supervisor: Dr. Pabitra Mitra
- Researched land-cover classification in satellite images, reviewed semantic segmentation algorithms, and analyzed data from various sources.
certificate | notebook

Elahe Technologies Summer Internship
May '20 - Jul '20, Elahe Technologies, Kolkata, WB, India
Supervisor: Dr. Prosenjit Gupta
- Conducted sentiment analysis on 346,355 Amazon HPC reviews, vectorizing the data with TF-IDF and Word2Vec and achieving ~85% accuracy.
code | certificate

Teaching Experience

❄️ Instructor
CSE 535 - Information Retrieval (Fall '23)
This course delves deep into text-based information retrieval (IR) techniques, offering a comprehensive look at search engines.
course | feedback

❄️ Teaching Assistant
CSE 635, CSE 4/560, CSE 4/587
- Spring '24: Graduate Teaching Assistant for CSE 635 - NLP and Text Mining
- Spring '23: Graduate Teaching Assistant for CSE 4/560 - Data Models and Query Languages
- Fall '22: Graduate Teaching Assistant for CSE 4/587 - Data Intensive Computing

Invited Talks and Interviews

❄️ Guest Speaker
Project Converse 2023
Project Converse is aimed at improving in-person expressive communication for individuals with complex communication needs (CCN) who use augmentative and alternative communication (AAC).
home | participants | slides

Achievements and Leadership

❄️ Reviewer
EMNLP '23, '25
IUI '26

❄️ Grand Finalists
TVS Credit E.P.I.C Season 2 2021
Qualified for the Grand Finale of TVS Credit E.P.I.C Season 2 as one of the top 10 teams.
award | certificate

❄️ International Rank 21
International Informatics Olympiad 2014
Secured International Rank 21 in the International Informatics Olympiad (IIO) 2014, Level 2, and received a silver medal.
certificate