Research & Creativity Showcase - All Projects

Deep Learning for Genomic Feature Prediction Using Semantic k-mer Embeddings and a CNN-LSTM Architecture

Abdullah Tariq Choudhry, University of the Pacific
Julia Eve Olivieri, University of the Pacific

Poster Number

Lead Author Affiliation

Computer Science

Lead Author Status

Undergraduate - Senior

Second Author Affiliation

Computer Science

Second Author Status

Faculty Mentor

Faculty Mentor Name

Julia Olivieri

Research or Creativity Area

Engineering & Computer Science

Abstract

This study presents a novel deep learning approach for predicting genomic features from DNA sequence. The identification of functional elements within genomes remains a fundamental challenge in computational biology. Traditional methods often rely on sequence conservation or specific motifs but may miss complex patterns and dependencies that characterize genomic features. We developed a hybrid convolutional neural network and long short-term memory (CNN-LSTM) architecture enhanced with k-mer-based Word2Vec embeddings to capture both local motifs and long-range dependencies in DNA sequences. Our approach transforms DNA sequences into overlapping k-mers of varying lengths (3-5 bp), which are then converted to dense vector representations using Word2Vec, allowing the model to learn semantic relationships between sequence patterns. The multi-scale CNN component employs convolutional layers with different kernel sizes to detect sequence motifs at various scales, while the bidirectional LSTM captures long-range interactions and positional context. Using human chromosome 1 data (hg38), we trained the model to distinguish exon regions from non-exon regions. The trained model was able to separate the two classes by learning complex sequence signatures, even in the absence of explicit splice-site cues. This approach offers several advantages over traditional methods: (1) it requires no prior knowledge of sequence motifs or conservation, (2) it automatically learns relevant features at multiple scales, and (3) it can potentially be adapted to identify diverse genomic elements.
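The k-mer step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmers` and `multi_scale_kmers`, the stride of 1, and the upper-casing of input are assumptions made for illustration, and the abstract does not say how ambiguous bases such as N are handled. The sketch only shows how a sequence might be split into overlapping k-mers for k = 3, 4, and 5.

```python
def kmers(sequence, k):
    """Return the overlapping k-mers of a DNA sequence (assumed stride of 1)."""
    seq = sequence.upper()  # assumption: treat soft-masked (lowercase) bases the same
    # A sequence of length n yields n - k + 1 overlapping k-mers (none if n < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_scale_kmers(sequence, k_values=(3, 4, 5)):
    """Tokenize one sequence at several k-mer lengths (hypothetical helper)."""
    return {k: kmers(sequence, k) for k in k_values}


# Toy example sequence, not taken from hg38.
tokens = multi_scale_kmers("ATGCGTAC")
for k, words in tokens.items():
    print(k, words)
```

Each token list plays the role of a "sentence" of k-mer "words". Word2Vec implementations generally take such lists of tokens as training input, which is presumably how the dense embeddings mentioned in the abstract would be learned.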

Location

University of the Pacific, DeRosa University Center

Start Date

April 26, 2025, 10:00 AM

End Date

April 26, 2025, 1:00 PM

