Article

Graph-based deep learning approach for high-throughput protein-DNA interaction scoring

Yi-hao Zhao1, Ying Wang1, Chao Shen2, De-jun Jiang3, Shu-kai Gu1, Hui-feng Zhao1, Zi-yi You1, Ting-jun Hou1,4, Yu Kang1,4
1 College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
2 Department of Clinical Pharmacy, the First Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou 310003, China
3 Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, China
4 Zhejiang Provincial Key Laboratory for Intelligent Drug Discovery and Development, Jinhua 321016, China
Correspondence to: Ting-jun Hou: tingjunhou@zju.edu.cn, Yu Kang: yukang@zju.edu.cn
DOI: 10.1038/s41401-025-01688-3
Received: 14 July 2025
Accepted: 2 October 2025
Advance online: 1 December 2025

Abstract

Accurately quantifying protein-DNA interactions (PDIs) is critical for understanding biological processes and facilitating drug design. However, the inherent flexibility of nucleic acids limits the availability of experimentally determined structures of PDI complexes, which makes it difficult to train reliable scoring functions (SFs). To address this, we developed PDIScore, a novel deep learning-based SF for PDI prediction. PDIScore uses a comprehensive graph representation to capture nucleotide flexibility, employs a scalable GraphGPS architecture with BigBird linear global attention to handle large interaction interfaces, and leverages Mixture Density Networks (MDNs) to model residue-nucleotide distance distributions. PDIScore was trained on a self-collected dataset of approximately 7000 protein-nucleic acid complex structures and validated on three rigorous test sets that evaluate its screening, docking, and ranking capabilities. PDIScore significantly outperformed existing methods on all three: it achieved the best screening power on the screening set (e.g., EF1% = 14.13 and AUROC = 0.82 using AlphaFold3 structures), the highest docking success rate on the docking set (top-1 success rate of 48.94%), and superior ranking capability on the ranking set (PCC = 0.50). Case studies demonstrated PDIScore’s ability to elucidate biological mechanisms (e.g., adenovirus transcription, SOCS1 regulation) and its interpretability at the nucleotide level for identifying key interaction sites. PDIScore represents a robust, generalizable tool with significant potential for advancing PDI-related research and therapeutic design.
Keywords: protein-DNA interactions; machine learning; deep learning; molecular docking; virtual screening
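
For readers unfamiliar with the screening metric quoted in the abstract, the following is a minimal sketch of how an enrichment factor at 1% (EF1%) is commonly computed from a ranked list of scored candidates. It is not taken from the PDIScore code base: the function name `enrichment_factor`, its parameters, and the toy data are illustrative assumptions, and the convention shown (top fraction rounded up to at least one candidate, higher score meaning stronger predicted binding) may differ from the one used in the paper.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction (hypothetical helper).

    scores   -- predicted scores, higher means more likely to bind (assumption)
    labels   -- 1 for a true binder, 0 for a decoy
    fraction -- top fraction of the ranked list to inspect (0.01 for EF1%)
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one true binder is required")

    # Size of the inspected top slice, rounded up and never smaller than one.
    # The small tolerance guards against floating-point noise in the product.
    n_top = max(1, math.ceil(fraction * n_total - 1e-9))

    # Rank candidates from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Ratio of the hit rate in the top slice to the hit rate in the whole set.
    return (hits_top / n_top) / (n_actives / n_total)


# Toy data: 200 candidates, 10 true binders, all ranked at the top.
toy_labels = [1] * 10 + [0] * 190
toy_scores = [float(200 - i) for i in range(200)]
ef_perfect = enrichment_factor(toy_scores, toy_labels, fraction=0.01)

# Same labels with the ranking reversed, so no binder lands in the top 1%.
ef_worst = enrichment_factor(list(reversed(toy_scores)), toy_labels, fraction=0.01)
```

Under this convention a random ranking gives an EF near 1, and the maximum attainable value depends on the proportion of true binders in the set, so EF values are only comparable across methods evaluated on the same screening set.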
