Research Interests

I have detailed my motivations for graduate studies and long-term academic goals in my Statement of Objectives. That document articulates my vision for AI research, namely what I hope to pursue in the near future, and my deeper motivations for joining academia: to conduct meaningful research that benefits diverse communities and to guide and uplift underprivileged aspiring researchers.

Current Research

I am interested in multimodal representation learning: in particular, probing the embedding spaces of large multimodal models (LMMs) and of unimodal and multimodal encoders to shed light on how different modalities come together to generate outputs. One aspect that particularly intrigues me is synergy [1,2], where two or more modalities combine to produce information that is not present in any single modality alone. Exploring and advancing this concept could be powerful for understanding social cues, which would in turn enable an LMM to have more meaningful interactions with humans.
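To make synergy concrete, here is a minimal, self-contained sketch (my own toy illustration, not an implementation from [1,2]) of the purest case: two binary "modalities" whose XOR defines the label, so each modality alone carries zero information about the label while the pair determines it completely.

```python
import itertools
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Toy "multimodal" dataset: two binary modalities, label = XOR of the two.
# All four input combinations are equally likely.
data = [(x1, x2, x1 ^ x2) for x1, x2 in itertools.product([0, 1], repeat=2)]

print(mutual_information([(x1, y) for x1, _, y in data]))         # 0.0 bits: modality 1 alone says nothing
print(mutual_information([(x2, y) for _, x2, y in data]))         # 0.0 bits: modality 2 alone says nothing
print(mutual_information([((x1, x2), y) for x1, x2, y in data]))  # 1.0 bit: together they determine the label
```

In the information-decomposition framing of [1,2], the single bit of task-relevant information here is entirely synergistic; real multimodal data mixes synergistic, redundant, and unique components, which is exactly what makes the decomposition interesting to probe.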

Along with synergy, individual modalities also carry redundant and unique information [1,2] that can inform the model about the task. I am also interested in how LMMs can reason over these kinds of information through agentic workflows [3,4]: decomposing the bigger picture into fine-grained details to capture subtler social cues (e.g., body language), paired with self-reflective thought processes that act as sanity checks.
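As a rough illustration of the kind of workflow I mean, the sketch below is purely hypothetical: `query_lmm` is a stand-in for any text-in, text-out model interface rather than a real API. It decomposes a scene into fine-grained cues, drafts an interpretation, and then runs a self-reflection pass as a sanity check.

```python
from typing import Callable

def interpret_social_cues(query_lmm: Callable[[str], str], scene: str) -> str:
    """Decompose a scene into fine-grained cues, then self-reflect before answering.

    `query_lmm` is a hypothetical placeholder: any function mapping a text
    prompt to a text response (e.g., a wrapper around an LMM of your choice).
    """
    # Step 1: decomposition -- surface fine-grained cues (body language,
    # facial expressions, tone) instead of judging the scene in one shot.
    cues = query_lmm(
        "List the fine-grained social cues (gestures, facial expressions, tone) "
        f"visible or implied in this scene:\n{scene}"
    )

    # Step 2: draft an interpretation grounded in the extracted cues.
    draft = query_lmm(
        f"Scene: {scene}\nCues: {cues}\n"
        "What social signal (e.g., humor, sarcasm, discomfort) do these cues convey, and why?"
    )

    # Step 3: self-reflection as a sanity check -- ask whether every claim in
    # the draft is supported by a listed cue, and revise if not.
    return query_lmm(
        f"Scene: {scene}\nCues: {cues}\nDraft interpretation: {draft}\n"
        "Critique the draft: flag any claim not supported by a listed cue, "
        "then return a corrected final interpretation."
    )
```

The value lies less in the specific prompts than in the structure: fine-grained decomposition makes subtle cues explicit, and the reflection step catches unsupported leaps.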

Currently, I contextualize my research using humor. Humor intrigues me not only because it is a complex and challenging concept for intelligent systems to comprehend [5], but also because it requires a deep understanding of social and cultural norms [6], making it an important marker of a distinctly human form of social intelligence. Yet humor can also be strategically misused by adversaries who target specific communities by hiding toxic and dangerous messages in it [7]. In this light, a socially intelligent multimodal AI would not only further collaborative AI, but also safeguard against the manipulation of social signals for malicious purposes.

  1. P. P. Liang et al., “Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework,” Advances in Neural Information Processing Systems, vol. 36, pp. 27351–27393, Dec. 2023.

  2. N. Bertschinger, J. Rauh, E. Olbrich, J. Jost, and N. Ay, “Quantifying unique information,” Entropy, vol. 16, no. 4, pp. 2161–2183, Apr. 2014, doi: 10.3390/e16042161.

  3. Anthropic, “Building Effective AI Agents.” Accessed: Apr. 20, 2025. [Online]. Available: https://www.anthropic.com/engineering/building-effective-agents

  4. IBM, “What Is Agentic Reasoning?” Accessed: Apr. 20, 2025. [Online]. Available: https://www.ibm.com/think/topics/agentic-reasoning

  5. Y. Chang et al., “A Survey on Evaluation of Large Language Models,” ACM Trans. Intell. Syst. Technol., vol. 15, no. 3, pp. 39:1–39:45, Mar. 2024, doi: 10.1145/3641289.

  6. T. Jiang, H. Li, and Y. Hou, “Cultural Differences in Humor Perception, Usage, and Implications,” Front. Psychol., vol. 10, Jan. 2019, doi: 10.3389/fpsyg.2019.00123.

  7. J. Ji, W. Ren, and U. Naseem, “Identifying Creative Harmful Memes via Prompt based Approach,” in Proceedings of the ACM Web Conference 2023, in WWW ’23. New York, NY, USA: Association for Computing Machinery, Apr. 2023, pp. 3868–3872. doi: 10.1145/3543507.3587427.

