Sequence Data Quality Control, Part 1: FastQC

Assessing Sequence Data Quality with FastQC

Welcome back!

If you are new to my blog, please click here to start.
Now let's begin, shall we?

In this article, we will learn how to run quality control (QC) on our sequenced data.

We will use the same example data that we have been using in the previous articles (click here and here if you need to download the data).

To download the tool, go here and scroll down to locate FastQC.

For Windows/Linux Users:

  1. Hover your mouse cursor over FastQC v0.11.9 (Win/Linux zip file).

  2. Right-click and copy the link address.

  3. Open a new Terminal. Log in to your ssh server if necessary.

  4. Go to the directory where you want to download your FastQC tool using the cd command. If you want to download the tool in a new directory, use the mkdir command to do so.
    I will be downloading mine in the sra_data directory continuing from the previous demonstration.

  5. Once you are in the designated directory, paste the following command:

    wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip
    

    Note: Make sure you have the most recent version. If not, go here to find the latest version of the tool.

  6. Once the download has been completed, unzip the file using this command:

    unzip fastqc_v0.11.9.zip
    
  7. Type the command ls to check if the tool has been properly unzipped. As a result, you should see a FastQC directory.

  8. Go to the FastQC directory using the below command:

    cd FastQC
    
  9. Enter the command below in the Terminal to make the fastqc script executable:

    chmod +x fastqc
    

    Note: The fastqc launcher script is typically not marked as executable after unzipping, so we need the chmod command before we can run it directly. Since we are already inside the FastQC directory, the path is just fastqc.

  10. Check whether FastQC has been properly installed by running the following command:

    ./fastqc --version

    It should print FastQC v0.11.9 after running the above command.

  11. Add the FastQC directory to your PATH by editing your profile with the vi .profile command.
    If you don't remember how to set PATH, refer to the previous articles. Make sure you enter the correct path to the directory where your FastQC tool is located.

  12. To start QC, use this command:

    fastqc <name of the FASTQ file>

    Note: Your FASTQ file name includes the .gz part as well, so don't forget to include it.

Once the process is done, it should produce a .html and a .zip per FASTQ file. The .html files produced are the FastQC reports which you need to export/download to your PC from the server.
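If you want to know in advance what the report files will be called, here is a minimal sketch. It assumes FastQC drops a trailing .gz and .fastq (or .fq) from the input name and appends _fastqc, which matches what I see in practice. The function name fastqc_report_base is my own, and the file name is only an example.

```shell
# Print the report base name FastQC is expected to use for one input file.
# Assumption: a trailing .gz and .fastq/.fq are dropped, then _fastqc is appended.
fastqc_report_base() {
    name=$(basename "$1")   # strip the directory part
    name=${name%.gz}        # drop .gz if present
    name=${name%.fastq}     # drop .fastq if present
    name=${name%.fq}        # drop .fq if present
    printf '%s_fastqc\n' "$name"
}

# Example file name, for illustration only.
base=$(fastqc_report_base /sra_data/SRR8238941_1.fastq.gz)
echo "$base.html"
echo "$base.zip"
```

Knowing the names up front makes it easier to pick the right files when you copy the reports off the server.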

If you are trying to download the .html file(s) to a Windows PC, click here to learn how to download files to/from the ssh server.
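To recap the Windows/Linux route as a whole, the download, unzip, chmod, version check, and PATH steps can be sketched as a few small shell functions. This is only a sketch under assumptions: the function names are my own, the URL pattern is copied from the wget command above and may not hold for other versions, and $HOME/sra_data stands in for whatever directory you chose.

```shell
# Build the download URL for a given FastQC version (pattern taken from the wget step).
fastqc_zip_url() {
    printf 'https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v%s.zip\n' "$1"
}

# Download, unpack, and check FastQC inside the directory given as the second argument.
# Note: this changes the current directory of your shell to that directory.
install_fastqc() {
    version=$1
    dest=$2
    mkdir -p "$dest" && cd "$dest" || return 1
    wget "$(fastqc_zip_url "$version")" || return 1
    unzip "fastqc_v${version}.zip" || return 1
    chmod +x FastQC/fastqc      # make the launcher script executable
    FastQC/fastqc --version     # should print the version you downloaded
}

# Print the line to add to ~/.profile so that "fastqc" works from any directory.
fastqc_path_line() {
    printf 'export PATH="%s/FastQC:$PATH"\n' "$1"
}

# Example call, commented out so that pasting this block downloads nothing:
# install_fastqc 0.11.9 "$HOME/sra_data"
fastqc_path_line "$HOME/sra_data"
```

The last line only prints the export line. You still need to paste it into .profile yourself and open a new Terminal (or run source ~/.profile) for it to take effect.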

For Mac Users:

You don't need the Terminal to install FastQC on macOS.

  1. Click the FastQC v0.11.9 (Mac DMG image).

  2. Once the download has been completed, open the program and follow the software instructions to set it up.

  3. Since our FASTQ files are saved on the ssh server, you need to download them from the server to your local computer, e.g. the Desktop.

    Open a new Terminal window (or tab) and do not log in to your ssh server.

  4. Use the scp command format below and replace the sections in brackets with your information. Enter your scp command on the new Terminal:

    scp -P [port number] [username]@[server name or IP]:[path to file on server] [path to file on local PC]
    

    For example, the command for downloading the file onto the Desktop for me is:

    scp -P 1234 compbio@567.890.xx.yy:/sra_data/SRR8238941_1.fastq.gz /Users/compbio/Desktop

  5. Once the download is done, open the FastQC tool. Go to 'File' and load the files to produce QC reports.
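If you have several files to copy, typing the scp command by hand each time gets tedious. Below is a minimal sketch of a helper that only prints the command so you can check it before running it. The function name scp_download_cmd and the host server.example.org are made up for illustration, and the helper assumes none of the values contain spaces.

```shell
# Print (not run) an scp command that copies one file from the server.
# Arguments: port, username, server name or IP, path on server, local destination.
scp_download_cmd() {
    printf 'scp -P %s %s@%s:%s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values; replace them with your own port, user, server, and paths.
scp_download_cmd 1234 compbio server.example.org /sra_data/SRR8238941_1.fastq.gz "$HOME/Desktop"
```

Once the printed command looks right, copy and paste it into the Terminal that is not logged in to the server.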

The QC reports for both Mac and Windows/Linux users should look like the example picture below:

Based on your QC report, you will decide whether and how to trim poor-quality reads, or whether to drop poor-quality samples.
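Besides the .html report, the .zip file contains a summary.txt with one line per QC module: the status (PASS, WARN, or FAIL), the module name, and the file name, separated by tabs. As a rough sketch, the helper below lists only the modules that need attention. The function name flagged_modules is my own, and the sample lines are made up to show the layout, not real results.

```shell
# Print the modules that FastQC marked WARN or FAIL.
# Reads summary.txt content (status<TAB>module<TAB>file) on standard input.
flagged_modules() {
    awk -F '\t' '$1 == "WARN" || $1 == "FAIL" { print $1 ": " $2 }'
}

# Made-up sample input in the summary.txt layout, for illustration only.
printf 'PASS\tBasic Statistics\tsample.fastq.gz\nFAIL\tPer base sequence quality\tsample.fastq.gz\nWARN\tAdapter Content\tsample.fastq.gz\n' | flagged_modules
```

On the server you could feed it a real report with something like unzip -p SRR8238941_1_fastqc.zip '*/summary.txt' | flagged_modules, assuming your report zip has that name.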

But wait! We have a batch of FASTQ files that need to run through FastQC.
How can we process the FASTQ files in bulk?

For Mac, since FastQC has a GUI, you can open multiple tabs to load the files and run the program. For Linux/Windows there are a couple of options, which I will demonstrate in the following article on how to run sequence data through FastQC in bulk.

Until then, try practicing what you have learned so far to get used to handling the tools and commands. See you next time!
