2D RNA Folding ML Class Competition

Built a DeepResUNet-Transformer hybrid from scratch, achieving the highest F1 score in a class of PhD and Master's students.

Oct 2025 — Dec 2025
Academic
completed
Machine Learning · Python · PyTorch · Transformers · U-Net · ResNet · Bioinformatics · Kaggle

Overview

Course project for COEN 432 (Evolutionary Algorithms and Machine Learning) that turned into one of my biggest academic wins. The task: predict RNA secondary structure from sequence, i.e., which nucleotides pair with which.

Result: Highest F1 score in the entire class.

The class was full of PhD and Master's students. I'm an undergraduate. I beat them all — and not by a small margin.

The Challenge

Given an RNA sequence (a string of A, U, G, C nucleotides), predict its secondary structure. Specifically: output a contact map showing which bases pair with each other.
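To make the target representation concrete, here is a minimal sketch of a contact map as a symmetric binary matrix. The sequence and pair list are made-up toy values, not from the competition data:

```python
import numpy as np

def make_contact_map(sequence: str, pairs: list[tuple[int, int]]) -> np.ndarray:
    """Build a symmetric binary contact map: entry (i, j) is 1
    exactly when nucleotides i and j are base-paired."""
    n = len(sequence)
    cmap = np.zeros((n, n), dtype=np.int8)
    for i, j in pairs:
        cmap[i, j] = cmap[j, i] = 1
    return cmap

# Toy hairpin: G-C pair at (0, 6) and A-U pair at (1, 5)
seq = "GACUUUC"
pairs = [(0, 6), (1, 5)]
cmap = make_contact_map(seq, pairs)
```

The model's job is then to predict this N×N matrix from the length-N sequence alone.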

This is a real bioinformatics problem. RNA structure determines function, and accurate prediction has implications for drug design, genetic research, and understanding disease mechanisms.

My Approach

Instead of using an off-the-shelf architecture, I built a hybrid model from scratch:

DeepResUNet-Transformer

Component            Purpose
ResNet backbone      Extract hierarchical features from sequence data
U-Net architecture   Encoder-decoder with skip connections for spatial precision
Custom Transformer   Capture long-range dependencies between nucleotides

I implemented the transformer attention mechanism myself — not imported from a library, but built from scratch to understand exactly how it works.
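For reference, the core of such a from-scratch attention mechanism is scaled dot-product attention. This NumPy sketch illustrates the mechanism only; the shapes and names are mine, not the project's actual code:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """q, k, v: (seq_len, d_model). Each output position is a weighted
    mix of all value vectors, so distant nucleotides can interact
    directly regardless of how far apart they sit in the sequence."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # pairwise similarities, scaled
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))             # 5 positions, 8-dim embeddings
out, w = scaled_dot_product_attention(x, x, x)
```

The division by sqrt(d_k) keeps the dot products from growing with embedding size, which would otherwise push the softmax into near-one-hot saturation.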

The Learning Process

This wasn't just about winning. I spent an immense amount of time learning:

  • How U-Net works (and why skip connections matter)
  • How ResNet's residual connections enable deep networks
  • How transformers capture relationships across long sequences
  • Training dynamics, hyperparameter sweeps, and debugging loss curves
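The residual idea from that list fits in a few lines. A toy sketch (not the project's actual layers) of y = x + F(x):

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = x + F(x): the skip path carries the input past the
    transformation F, so the block only has to learn a correction."""
    return x + relu(x @ w1) @ w2

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 16))
# With zero weights, F(x) = 0 and the block reduces to the identity;
# this is the property that lets very deep stacks stay trainable.
y = residual_block(x, np.zeros((16, 16)), np.zeros((16, 16)))
```

U-Net skip connections apply the same principle across the encoder-decoder gap: high-resolution encoder features are concatenated into the decoder so spatial detail survives the bottleneck.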

Competition Format

This was structured as a Kaggle competition with:

  • Public leaderboard: Feedback during development
  • Private leaderboard: Final evaluation (hidden test set)

I had the highest score on both.

When I asked the professor for feedback, he confirmed: "You have the highest score by far, by a big margin."

Why Solo?

The project allowed teams of two. I chose to work alone — not because I don't like collaboration, but because I wanted to learn everything deeply. Every architecture choice, every training run, every failure was mine to understand.

Technical Details

  • Framework: PyTorch
  • Training: Multiple architecture experiments, hyperparameter sweeps
  • Metrics: F1 score optimization for imbalanced base-pair prediction
  • Hardware: GPU training with mixed precision (FP16)
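On the metrics point: because true base pairs are a tiny fraction of all N×N positions, plain accuracy is misleading (predicting "no pairs" already scores near 100%), which is why F1 over predicted pairs is the right target. A minimal sketch with hypothetical pair sets:

```python
def f1_score(pred_pairs: set, true_pairs: set) -> float:
    """F1 over predicted vs. true base pairs: the harmonic mean of
    precision (how many predicted pairs are real) and recall
    (how many real pairs were found)."""
    if not pred_pairs or not true_pairs:
        return 0.0
    tp = len(pred_pairs & true_pairs)
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

true = {(0, 9), (1, 8), (2, 7)}       # toy ground-truth pairs
pred = {(0, 9), (1, 8), (3, 6)}       # two hits, one miss, one false pair
score = f1_score(pred, true)          # precision 2/3, recall 2/3 -> F1 = 2/3
```

In practice the same computation runs over the thresholded contact-map output, with the pair sets extracted from the upper triangle of the matrix.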

What This Proved

I can compete with graduate students in ML when I put in the work. The key wasn't being smarter — it was investing the time to truly understand what I was building instead of copy-pasting someone else's solution.

Gallery

2D RNA Folding ML Class Competition gallery 1
2D RNA Folding ML Class Competition gallery 2
2D RNA Folding ML Class Competition gallery 3
2D RNA Folding ML Class Competition gallery 4
2D RNA Folding ML Class Competition gallery 5
2D RNA Folding ML Class Competition gallery 6