grupoarrfug.com

Streamlining RNA-Seq Data Processing with MATLAB

Written on

Chapter 1: Introduction to RNA-Seq Data Processing

In this section, I will share my experience using MATLAB for the pre-processing of RNA-Seq data utilizing the bowtie program. As a beginner in coding, this narrative reflects my journey rather than serving as a definitive guide. I encourage any advice on improving data handling or MATLAB practices.

Previously, I discussed the foundational concepts behind RNA-Seq analysis in two earlier articles. The objective of my project was to identify the transcriptional start site, which indicates the precise location where RNA synthesis begins within a gene.

To achieve this, I first purified the unique start of the RNA and attached a designated RNA sequence, known as a 5' RNA adapter, at the RNA's beginning. Using reverse transcriptase, a viral enzyme, I converted all RNA into complementary DNA (cDNA). I amplified the cDNA specific to my gene of interest using a specific primer, and subsequently sequenced the amplified cDNA to analyze the resulting sequence reads.

RNA-Seq data processing steps

Chapter 2: Data Import and Initial Processing

The sequence data was provided to me as a FASTA file. I must admit that while the code I wrote initially produced the expected results, I did not instruct MATLAB to generate the correct output for the bowtie program, as discussed in my second article.

The code I present here processes the data in a manner compatible with bowtie.

Importing Files and Reverse Complementing Reads

My first task involved reading the fastq data files, which MATLAB simplified via a built-in function that imports the file into a structured format. Since I requested paired-end reads, I imported two files:

CopyFASTQStruct1 = fastqread('97–21-K_R1_001.fastq');

FASTQStruct2 = fastqread('97–21-K_R2_001.fastq');

To clarify paired-end reads, they allow for the sequencing of longer RNA or DNA fragments. The two imported files represent the two reads.

As read 2 is derived from the opposite end and direction of read 1, I needed to obtain the reverse complement of read 2 to align the sequences correctly. The following code accomplished this:

CopyFASTQStruct2RC = FASTQStruct2;

N2 = numel(FASTQStruct2);

for n = 1:N2;

FASTQStruct2RC(n).Sequence = seqrcomplement(FASTQStruct2RC(n).Sequence);

end;

Searching for Specific Sequences

Next, I aimed to locate my 5' adapter and gene-specific primer within these reads. To streamline the search, I merged the two sequences for alignment:

for n = 1:N2;

FASTQStruct2RC(n).MergeSeq = [FASTQStruct1(n).Sequence, FASTQStruct2RC(n).Sequence];

end;

Following that, I converted structure files into tables for easier management:

FASTQtable1 = struct2table(FASTQStruct1);

FASTQtable1.Properties.VariableNames = {'Header1', 'Sequence1', 'Quality1'};

FASTQtable2RC = struct2table(FASTQStruct2RC);

FASTQtable2RC.Properties.VariableNames = {'Header2', 'Sequence2', 'Quality2', 'MergeSeq'};

FASTQtableConcat = [FASTQtable1, FASTQtable2RC];

Now I will search for the specific sequences:

RNA5prAdapter = "GTTCAGAGTTCTACAGTCCGACGATC";

The MATLAB function contains searches each merged sequence for the sequences I was interested in and marks the results in a new column.

Orientation of Reads

To ensure uniformity, I wanted all my 5' adapters positioned to the left. If a reverse complement of the 5' adapter was detected, both reads were reversed:

FASTQOrientedStruct = table2struct(FASTQtableConcat);

N = height(FASTQtableConcat);

for n = 1:N;

if FASTQOrientedStruct(n).RNA5prAdapterRC == 1

L = FASTQOrientedStruct(n).Sequence1;

R = FASTQOrientedStruct(n).Sequence2;

FASTQOrientedStruct(n).Sequence1 = seqrcomplement(R);

FASTQOrientedStruct(n).Sequence2 = seqrcomplement(L);

end

end

FASTQOrientedTable = struct2table(FASTQOrientedStruct);

At each transformation phase, I performed checks to ensure the data was processed correctly.

Filtering Sequences

With the reads oriented, filtering became simpler. I retained sequences containing a 5' adapter on the left and a primer on the right:

FilteredSeqTable1 = FASTQOrientedTable(FASTQOrientedTable.RNA5prAdapter == 1,:);

SjFilteredSeqTable1 = FilteredSeqTable1(FilteredSeqTable1.SjNR226RC == 1,:);

SosFilteredSeqTable1 = FilteredSeqTable1(FilteredSeqTable1.SosMO484RC == 1,:);

Next, I removed any sequences not part of the RNA, editing out the 5' adapter using the extractAfter function.

Final Checks and Exporting Data

I performed final quality checks to ensure all RNA 5' adapters had been removed while confirming the presence of primers.

Subsequently, I converted the filtered data into FASTQ files, ensuring quality scores were aligned with the sequences:

fastqwrite('97_21_K_Sj.fq', SjStruct);

fastqwrite('97_21_K_Sos.fq', SosStruct);

These data files were then utilized in the bowtie alignment program, which I elaborated on in the second article.

Conclusion

In summary, we explored how I utilized MATLAB to preprocess RNA-Seq read data for analysis in a bowtie program. The steps included:

  • Reading Genewiz FASTQ files
  • Reverse complementing the second read pair
  • Filtering for sequences containing both the 5' RNA adapter and gene-specific sequences
  • Orienting the sequences for consistency
  • Trimming excess data and saving as new FASTQ files.

For further insights into the experiment and data files, refer to the previous articles, although the coding specifics will not be discussed there.

Chapter 3: Analyzing RNA-Seq Data

The first video titled "How to analyze RNA-Seq data? Find differentially expressed genes in your research." provides an overview of methods for analyzing RNA-Seq data, focusing on identifying differentially expressed genes.

The second video, "How to Perform RNA-seq Analysis - YouTube," offers a comprehensive guide on conducting RNA-Seq analysis, perfect for beginners and experienced researchers alike.

Share the page:

Twitter Facebook Reddit LinkIn

-----------------------

Recent Post:

Can Humans Truly Develop Affection for Robots? Insights Ahead

Explore the complexities of human-robot relationships and why true affection may not be as achievable as it seems.

Exciting Innovations in macOS Sonoma: What to Expect

Discover the exciting new features of macOS Sonoma, focusing on quality improvements and enhanced functionality for a better user experience.

Embracing Authenticity: An Unfiltered Journey of Self-Discovery

A candid exploration of struggles with mental health and the quest for purpose through writing.

Transform Your Sundays for a More Productive Week Ahead

Discover essential tips to conquer the Sunday scaries and prepare for a successful week ahead.

The Lasting Impact of Mathematical Thought on the Church

An exploration of how mathematical rigor influenced Jesuit education and the Catholic Church.

Maximize Productivity: Prioritizing Important Over Urgent Tasks

Discover how to prioritize important tasks over urgent ones for a more productive life.

How to Shed Winter Weight: Insights from a Yoga Instructor

Discover how a yoga teacher navigates winter weight gain and embraces healthier habits for spring.

Harnessing Light Energy to Enhance Longevity in Roundworms

A study on how light-activated proton pumps in roundworms may extend their lifespan through enhanced mitochondrial function.