Enhancing Splice Site Detection Through Deep Learning and Retrieval-Augmented Generation
Poster Number
25
Faculty Mentor Name
Dr. Julia Olivieri
Research or Creativity Area
Engineering & Computer Science
Abstract
Splice site recognition plays a critical role in understanding transcriptomic regulation, yet detecting these sites in single-cell RNA-seq data remains a major challenge due to sparsity and biological variability. This project focuses on building a deep learning framework that combines Convolutional Neural Networks (CNNs) for learning sequence patterns and Retrieval-Augmented Generation (RAG) for enhancing interpretability. CNNs are trained to distinguish donor and acceptor splice sites using k-mer-encoded DNA sequences, capturing both local motifs and broader sequence context. In parallel, RAG is used to retrieve relevant biological knowledge and annotations that contextualize model predictions, offering a more interpretable explanation of splicing signals.
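As a rough illustration of what k-mer encoding of a DNA sequence can look like, the sketch below maps each overlapping window of length k to an integer index that a CNN embedding layer could consume. The abstract does not state the value of k or the exact encoding used, so k = 3, the function names, and the choice to reserve index 0 for windows containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"

def build_kmer_vocab(k):
    """Map every possible k-mer over ACGT to an index starting at 1.

    Index 0 is reserved (an assumption of this sketch) for windows that
    contain a base outside ACGT, such as N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(ALPHABET, repeat=k))}

def kmer_encode(seq, k=3, vocab=None):
    """Slide a window of width k over seq and return one index per window."""
    vocab = vocab or build_kmer_vocab(k)
    seq = seq.upper()
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]

if __name__ == "__main__":
    # Windows are ACG, CGT, GTN; the last one contains N and maps to 0.
    print(kmer_encode("ACGTN", k=3))  # prints [7, 28, 0]
```

A sequence of length n yields n - k + 1 indices, so fixed-length input windows around each candidate site give the network inputs of equal length.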
Purpose
We aim to find and better understand where splicing happens in genes by looking at the DNA sequences around those spots. Using a deep learning model called a CNN, we look for patterns that show where splicing starts and ends. To make the results easier to understand, we also use a method called Retrieval-Augmented Generation (RAG), which helps explain why the model made certain predictions by bringing in related information from trusted sources. This project helps us not only find splicing locations but also understand what makes them important, which could support future research in genetics and disease.
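The retrieval step of a RAG pipeline can be sketched in a few lines: given a description of a prediction, rank stored notes by similarity and return the best match for use as context. The example below is a toy under stated assumptions. It uses word-overlap cosine similarity in place of the learned embeddings a real system would likely use, and the three notes are a hypothetical stand-in for a curated knowledge base, not the project's actual sources.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would index curated annotations.
NOTES = [
    "Donor splice sites mark the 5' end of an intron and usually begin with GT.",
    "Acceptor splice sites mark the 3' end of an intron and usually end with AG.",
    "A polypyrimidine tract typically lies just upstream of the acceptor site.",
]

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, notes=NOTES, top_k=1):
    """Return the top_k notes most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(notes, key=lambda n: cosine(q, bag_of_words(n)), reverse=True)
    return ranked[:top_k]

if __name__ == "__main__":
    print(retrieve("predicted donor site starting with GT")[0])
```

In a full pipeline the retrieved text would be passed to a language model together with the prediction, so that the generated explanation is grounded in the retrieved annotation.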
Results
Our CNN model achieved an accuracy of approximately 85% in predicting donor and acceptor splice sites from short DNA sequences. The use of k-mer features improved the model’s ability to detect important sequence patterns linked to splicing. Early tests with the RAG module showed it could retrieve useful biological information to help explain why certain predictions were made. These results suggest our approach is effective in both detecting and interpreting splicing regions in single-cell RNA-seq data.
Significance
By identifying and interpreting splice sites at the single-cell level, our work helps improve the understanding of how genes are regulated through splicing. The combination of deep learning and retrieval-based interpretation supports the development of more accurate and explainable tools for RNA analysis. This approach may also support future studies on gene function, disease-related splicing changes, and the design of new diagnostic tools based on transcriptomic data.
Location
University of the Pacific, DeRosa University Center
Start Date
26-4-2025 10:00 AM
End Date
26-4-2025 1:00 PM