Publications

My research spans graph machine learning, computational biology, and network science. Below is a curated selection of key publications. For the complete list, visit my Google Scholar profile.

Transfer Learning for Temporal Link Prediction

Published in 2025 International Joint Conference on Neural Networks (IJCNN), 2025

Temporal graphs, networks that evolve over time, present challenges for link prediction models, which must generalize from an observed time window to unseen future connections. This paper investigates transfer learning for temporal link prediction, leveraging models pre-trained on diverse source graphs to improve performance on new target graphs with limited training data. We show that cross-graph transfer significantly boosts predictive accuracy and robustness over conventional baselines, with particular gains when historical data are sparse.

Recommended citation: Chatterjee, A., Ikica, B., Ravandi, B., & Palowitch, J. (2025). Transfer Learning for Temporal Link Prediction. 2025 International Joint Conference on Neural Networks (IJCNN), 1-8.

Topology-Driven Negative Sampling Enhances Generalizability in Protein-Protein Interaction Prediction

Published in Journal / Oxford (Accepted), 2024

Predicting protein-protein interactions (PPIs) plays a central role in therapeutic drug development, and graph neural networks are increasingly popular for this task. In this work, we investigate negative sampling strategies. By systematically decoupling node attributes from graph structure, we show that models can exploit topological shortcuts, often at the cost of generalization. We propose a topology-driven sampling technique that pushes the model to learn biologically meaningful patterns rather than structural artifacts, improving generalizability on held-out interaction data.

Recommended citation: Chatterjee, A., et al. (2024). Topology-Driven Negative Sampling Enhances Generalizability in Protein-Protein Interaction Prediction.

SIENNA: Lightweight Generalizable Machine Learning Platform for Brain Tumor Diagnostics

Published in medRxiv preprint, 2024

SIENNA is a lightweight, generalizable machine learning platform designed for automated brain tumor diagnostics using MRI imaging. The platform integrates CNN-based classification with efficient preprocessing pipelines, enabling deployment in resource-constrained clinical environments. SIENNA demonstrates robust performance across diverse tumor types and imaging protocols, achieving high diagnostic accuracy with minimal computational overhead. This work contributes toward democratizing AI-assisted diagnostics in neurosurgical settings.

Recommended citation: Sunil, S., Rajeev, R. S., Chatterjee, A., Pilitsis, J., Mukherjee, A., & Paluh, J. L. (2024). SIENNA: Lightweight Generalizable Machine Learning Platform for Brain Tumor Diagnostics. medRxiv, 2024.04.03.24305210. https://www.medrxiv.org/content/10.1101/2024.04.03.24305210

Representation Learning of Human Disease Mechanisms for a Foundation Model in Rare and Common Diseases

Published in bioRxiv preprint, 2024

This work addresses a critical gap in biomedical AI: developing foundation models capable of representing human disease mechanisms at scale for both rare and common diseases. We leverage large-scale biological knowledge graphs and multi-modal data to learn unified disease representations, enabling zero-shot generalization to unseen conditions. Our approach integrates gene expression, protein interactions, and clinical phenotype data into a joint embedding space, demonstrating strong performance in disease classification, comorbidity prediction, and drug repurposing tasks — with implications for accelerating rare disease research.

Recommended citation: Ravandi, B., Mowrey, W. R., Chatterjee, A., Haddadi, P., Abdelmessih, M., et al. (2024). Representation Learning of Human Disease Mechanisms for a Foundation Model in Rare and Common Diseases. bioRxiv, 2024.11.19.624381. https://www.biorxiv.org/content/10.1101/2024.11.19.624381

Generating Human Understandable Explanations for Node Embeddings

Published in arXiv preprint arXiv:2406.07642, 2024

Graph neural networks and embedding methods have achieved strong performance in many tasks, yet their outputs lack human-interpretable explanations. This paper proposes a framework that generates natural-language explanations for node embeddings in graph machine learning models. By linking learned representations to subgraph structures and semantic node attributes, the framework helps practitioners understand why a model assigns certain embeddings — an important step toward explainable AI for graph-structured data. We evaluate our approach on citation, social, and biological networks, showing strong alignment between model rationale and human expert judgment.

Recommended citation: Shafi, Z., Chatterjee, A., & Eliassi-Rad, T. (2024). Generating Human Understandable Explanations for Node Embeddings. arXiv:2406.07642. https://arxiv.org/abs/2406.07642

Inductive Link Prediction in Static and Temporal Graphs for Isolated Nodes

Published in Temporal Graph Learning Workshop @ NeurIPS 2023, 2023

Link prediction for low-degree and isolated nodes remains an open challenge in graph machine learning, where most models overfit to topological neighborhoods. This paper proposes inductive link prediction methods that generalize to isolated nodes in both static and temporal graph settings. We leverage unsupervised pre-training on large corpora to generate rich node representations independent of graph structure, enabling accurate predictions even for newly introduced nodes with no observed edges. Experiments across multiple benchmark datasets confirm substantial improvements in inductive generalization over existing baselines.
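The inductive setting described above can be illustrated with a minimal sketch of a node-level hold-out split. This is not the paper's evaluation protocol; the function name `inductive_split` and its parameters are hypothetical and chosen only for illustration. The idea shown is that held-out nodes lose all of their edges during training, so at test time they behave like isolated, newly introduced nodes.

```python
import random


def inductive_split(edges, holdout_fraction=0.2, seed=0):
    """Split edges so that held-out nodes have no edges in the training set.

    Hypothetical helper for illustration only (not from the paper).
    """
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    n_holdout = max(1, int(len(nodes) * holdout_fraction))
    holdout = set(rng.sample(nodes, n_holdout))

    # Training edges connect only nodes that were not held out.
    train_edges = [(u, v) for u, v in edges if u not in holdout and v not in holdout]
    # Test edges touch at least one held-out node, which is unseen in training.
    test_edges = [(u, v) for u, v in edges if u in holdout or v in holdout]
    return train_edges, test_edges, holdout


if __name__ == "__main__":
    toy_edges = [(i, i + 1) for i in range(10)]  # a simple path graph
    train, test, held_out = inductive_split(toy_edges, holdout_fraction=0.2, seed=1)
    print("held-out nodes:", sorted(held_out))
    print("train edges:", train)
    print("test edges:", test)
```

Because held-out nodes contribute no topology during training, a model evaluated on the test edges must rely on node attributes or pre-trained representations rather than on neighborhood structure.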

Recommended citation: Chatterjee, A., Walters, R., Menichetti, G., & Eliassi-Rad, T. (2023). Inductive Link Prediction in Static and Temporal Graphs for Isolated Nodes. Temporal Graph Learning Workshop @ NeurIPS 2023.

Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction

Published in Conference on Complex Networks, 2023

A primary problem in link prediction is that models memorize training network configurations instead of learning meaningful node features. We study how to disentangle topological constraints (such as path length or degree biases) from innate node attributes. This paper presents experimental evidence that observational biases cause a lack of generalizability and introduces novel formulations for unbiased inductive link prediction in unseen environments.

Recommended citation: Chatterjee, A. (2023). Disentangling Node Attributes from Graph Topology for Improved Generalizability in Link Prediction. Conference on Complex Networks.

Improving the generalizability of protein-ligand binding predictions with AI-Bind

Published in Nature Communications, 2023

Developing accurate and robust predictors of protein-ligand interactions remains extremely challenging in early-stage drug discovery, especially for previously unseen targets and drug molecules. Existing machine learning models often fail to generalize because they rely on topological shortcuts. AI-Bind maximizes inductive test performance by combining network-based negative sampling with unsupervised pre-training for molecular embeddings. The approach provides interpretable predictions, identifies binding sites, and greatly accelerates the computational biology pipeline.
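One way to picture network-based negative sampling is the following minimal sketch, which labels protein-ligand pairs as negatives when they lie far apart in the known interaction network. It is an illustration under stated assumptions, not the AI-Bind implementation: the function names, the `min_distance` threshold, and the choice to treat disconnected pairs as negatives are all hypothetical, and protein and ligand identifiers are assumed not to collide.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def distant_negatives(edges, min_distance=5):
    """Return (protein, ligand) pairs at least min_distance hops apart.

    Assumption for this sketch: unreachable pairs also count as negatives.
    """
    adj = {}
    for protein, ligand in edges:
        adj.setdefault(protein, set()).add(ligand)
        adj.setdefault(ligand, set()).add(protein)

    proteins = sorted({p for p, _ in edges})
    ligands = sorted({l for _, l in edges})

    negatives = []
    for protein in proteins:
        dist = bfs_distances(adj, protein)
        for ligand in ligands:
            d = dist.get(ligand)  # None means the ligand is unreachable
            if d is None or d >= min_distance:
                negatives.append((protein, ligand))
    return negatives


if __name__ == "__main__":
    # Toy bipartite network: a chain P1-L1-P2-L2-P3-L3 plus a separate pair P4-L4.
    known = [("P1", "L1"), ("P2", "L1"), ("P2", "L2"),
             ("P3", "L2"), ("P3", "L3"), ("P4", "L4")]
    print(distant_negatives(known, min_distance=5))
    # prints [('P1', 'L3'), ('P1', 'L4'), ('P2', 'L4'), ('P3', 'L4'),
    #         ('P4', 'L1'), ('P4', 'L2'), ('P4', 'L3')]
```

Sampling negatives by network distance, rather than uniformly at random, is meant to keep the negative set from being trivially separable by node degree alone.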

Recommended citation: Chatterjee, A., et al. (2023). Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nature Communications.

Dr. Ayan Chatterjee
