Streamlining RNA-Seq Data Processing with MATLAB

Chapter 1: Introduction to RNA-Seq Data Processing

In this section, I will share my experience using MATLAB for the pre-processing of RNA-Seq data utilizing the bowtie program. As a beginner in coding, this narrative reflects my journey rather than serving as a definitive guide. I encourage any advice on improving data handling or MATLAB practices.

Previously, I discussed the foundational concepts behind RNA-Seq analysis in two earlier articles. The objective of my project was to identify the transcriptional start site, which indicates the precise location where RNA synthesis begins within a gene.

To achieve this, I first purified the unique start of the RNA and attached a designated RNA sequence, known as a 5' RNA adapter, at the RNA's beginning. Using reverse transcriptase, a viral enzyme, I converted all RNA into complementary DNA (cDNA). I amplified the cDNA specific to my gene of interest using a specific primer, and subsequently sequenced the amplified cDNA to analyze the resulting sequence reads.

Chapter 2: Data Import and Initial Processing

The sequence data was provided to me as a FASTA file. I must admit that while the code I wrote initially produced the expected results, I did not instruct MATLAB to generate the correct output for the bowtie program, as discussed in my second article.

The code I present here processes the data in a manner compatible with bowtie.

Importing Files and Reverse Complementing Reads

My first task involved reading the fastq data files, which MATLAB simplified via a built-in function that imports the file into a structured format. Since I requested paired-end reads, I imported two files:

CopyFASTQStruct1 = fastqread('97–21-K_R1_001.fastq');

FASTQStruct2 = fastqread('97–21-K_R2_001.fastq');

To clarify paired-end reads, they allow for the sequencing of longer RNA or DNA fragments. The two imported files represent the two reads.

As read 2 is derived from the opposite end and direction of read 1, I needed to obtain the reverse complement of read 2 to align the sequences correctly. The following code accomplished this:

CopyFASTQStruct2RC = FASTQStruct2;

N2 = numel(FASTQStruct2);

for n = 1:N2;

FASTQStruct2RC(n).Sequence = seqrcomplement(FASTQStruct2RC(n).Sequence);

end;

Searching for Specific Sequences

Next, I aimed to locate my 5' adapter and gene-specific primer within these reads. To streamline the search, I merged the two sequences for alignment:

for n = 1:N2;

FASTQStruct2RC(n).MergeSeq = [FASTQStruct1(n).Sequence, FASTQStruct2RC(n).Sequence];

end;

Following that, I converted structure files into tables for easier management:

FASTQtable1 = struct2table(FASTQStruct1);

FASTQtable1.Properties.VariableNames = {'Header1', 'Sequence1', 'Quality1'};

FASTQtable2RC = struct2table(FASTQStruct2RC);

FASTQtable2RC.Properties.VariableNames = {'Header2', 'Sequence2', 'Quality2', 'MergeSeq'};

FASTQtableConcat = [FASTQtable1, FASTQtable2RC];

Now I will search for the specific sequences:

RNA5prAdapter = "GTTCAGAGTTCTACAGTCCGACGATC";

The MATLAB function contains searches each merged sequence for the sequences I was interested in and marks the results in a new column.

Orientation of Reads

To ensure uniformity, I wanted all my 5' adapters positioned to the left. If a reverse complement of the 5' adapter was detected, both reads were reversed:

FASTQOrientedStruct = table2struct(FASTQtableConcat);

N = height(FASTQtableConcat);

for n = 1:N;

if FASTQOrientedStruct(n).RNA5prAdapterRC == 1

L = FASTQOrientedStruct(n).Sequence1;

R = FASTQOrientedStruct(n).Sequence2;

FASTQOrientedStruct(n).Sequence1 = seqrcomplement(R);

FASTQOrientedStruct(n).Sequence2 = seqrcomplement(L);

end

end

FASTQOrientedTable = struct2table(FASTQOrientedStruct);

At each transformation phase, I performed checks to ensure the data was processed correctly.

Filtering Sequences

With the reads oriented, filtering became simpler. I retained sequences containing a 5' adapter on the left and a primer on the right:

FilteredSeqTable1 = FASTQOrientedTable(FASTQOrientedTable.RNA5prAdapter == 1,:);

SjFilteredSeqTable1 = FilteredSeqTable1(FilteredSeqTable1.SjNR226RC == 1,:);

SosFilteredSeqTable1 = FilteredSeqTable1(FilteredSeqTable1.SosMO484RC == 1,:);

Next, I removed any sequences not part of the RNA, editing out the 5' adapter using the extractAfter function.

Final Checks and Exporting Data

I performed final quality checks to ensure all RNA 5' adapters had been removed while confirming the presence of primers.

Subsequently, I converted the filtered data into FASTQ files, ensuring quality scores were aligned with the sequences:

fastqwrite('97_21_K_Sj.fq', SjStruct);

fastqwrite('97_21_K_Sos.fq', SosStruct);

These data files were then utilized in the bowtie alignment program, which I elaborated on in the second article.

Conclusion

In summary, we explored how I utilized MATLAB to preprocess RNA-Seq read data for analysis in a bowtie program. The steps included:

Reading Genewiz FASTQ files
Reverse complementing the second read pair
Filtering for sequences containing both the 5' RNA adapter and gene-specific sequences
Orienting the sequences for consistency
Trimming excess data and saving as new FASTQ files.

For further insights into the experiment and data files, refer to the previous articles, although the coding specifics will not be discussed there.

Chapter 3: Analyzing RNA-Seq Data

The first video titled "How to analyze RNA-Seq data? Find differentially expressed genes in your research." provides an overview of methods for analyzing RNA-Seq data, focusing on identifying differentially expressed genes.

The second video, "How to Perform RNA-seq Analysis - YouTube," offers a comprehensive guide on conducting RNA-Seq analysis, perfect for beginners and experienced researchers alike.

grupoarrfug.com

Streamlining RNA-Seq Data Processing with MATLAB

Chapter 1: Introduction to RNA-Seq Data Processing

Chapter 2: Data Import and Initial Processing

Importing Files and Reverse Complementing Reads

Searching for Specific Sequences

Orientation of Reads

Filtering Sequences

Final Checks and Exporting Data

Conclusion

Chapter 3: Analyzing RNA-Seq Data

Share the page:

Recent Post:

Can Humans Truly Develop Affection for Robots? Insights Ahead

Exciting Innovations in macOS Sonoma: What to Expect

Embracing Authenticity: An Unfiltered Journey of Self-Discovery

Transform Your Sundays for a More Productive Week Ahead

The Lasting Impact of Mathematical Thought on the Church

Maximize Productivity: Prioritizing Important Over Urgent Tasks

How to Shed Winter Weight: Insights from a Yoga Instructor

Harnessing Light Energy to Enhance Longevity in Roundworms