High-Throughput Sequencing Data Pipeline, Part 1: Quality Control
Good job!
By now you should be able to download FASTA/FASTQ files with ease. The next series of write-ups will cover the high-throughput sequencing data analysis pipeline and the tools involved.
Before we start, let's take a look at the workflow of RNA sequencing.
Picture source: "Read Mapping." Biocorecrg.Github.Io, 2022.
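Once the reads (sequences of the bases A, T, C and G) have been generated by an Illumina sequencer, they are stored as raw data in FASTQ format, which keeps each read's bases together with a quality score for every base.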
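To make the FASTQ format a little more concrete, here is a minimal Python sketch of how a record could be read. Each record spans four lines: a header starting with `@`, the bases, a `+` separator, and a quality string of the same length as the bases. The names `FastqRecord`, `read_fastq` and the two example reads are made up for illustration, and the sketch assumes the simple four-lines-per-record layout with no wrapped sequences.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: its name, its bases, and one quality character per base."""
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file that uses exactly four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two invented reads, just to show the layout of the format.
EXAMPLE = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGTN\n+\nII#5I\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.name, record.sequence, record.quality)
```

For real data you would pass an open file instead of the in-memory string, for example `open("sample.fastq")`, where `sample.fastq` is a placeholder name.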
The next step is called Quality Control (QC). The purpose of QC is to check the quality of the reads, make sure there are no severe abnormalities in the samples, and compute summary statistics on them (number of filtered reads, % GC, number of aligned reads, etc.).
If you want to learn more about this concept, you can click on the image above to read more about it and here too.
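To give a feel for the kind of statistics QC involves, here is a rough Python sketch that computes % GC and the mean base quality of a single read. It is a simplified illustration, not how any particular QC tool does it internally. The function names and the example reads are invented, and the quality decoding assumes the Phred+33 encoding used by current Illumina FASTQ files.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C among the called bases (N and others are ignored)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a read, assuming Phred+33 encoded qualities."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


# Invented example reads: (bases, quality string).
reads = [
    ("GATTACA", "IIIIIII"),  # high quality throughout
    ("GGCCGGCC", "########"),  # very low quality throughout
]

for bases, quals in reads:
    print(f"{bases}: {gc_percent(bases):.1f}% GC, mean quality {mean_phred(quals):.1f}")
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so a read averaging around 40 is far more trustworthy than one averaging around 2.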
There are many tools you can use to perform QC: FastQC, FastQScreen, FASTX, etc. The tools may differ depending on what your lab uses or what is available, but the basic principles are the same.
In the next article, I will use FastQC to demonstrate how you can perform quality control, so stick around! :)