Week 5 - Homework

Import Data

We need two datasets for this exercise:

the bulk RNA-seq gene expression data from the read-counts.csv file.
the differential expression (DE) analysis results from the toy_DEanalysis.csv file.

Reminder: the DE results were obtained by comparing SET1 samples to WT samples, using the data in read-counts.csv.

Import the DE analysis results (toy_DEanalysis.csv) and name the object de_res.

(You can either use the basic function read.csv() or the function read_csv() from the {readr} package.)
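
As a rough sketch of the import step, the example below writes a tiny stand-in CSV to a temporary file so that it runs without the course data. The column names (Feature, log2FoldChange, padj) are assumptions for illustration; check the header of the real toy_DEanalysis.csv.

```r
# Stand-in for toy_DEanalysis.csv, written to a temporary file.
# The column names below are assumptions, not the real file's header.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Feature,log2FoldChange,padj",
  "geneA,2.5,0.001",
  "geneB,-0.3,0.8",
  "geneC,-1.7,0.02"
), toy_path)

# Base R import; with the real file: read.csv("toy_DEanalysis.csv")
de_res <- read.csv(toy_path)

# The {readr} alternative returns a tibble instead of a data.frame:
# de_res <- readr::read_csv(toy_path)

# Inspect column names and types before going further
str(de_res)
```

Both functions give a table with one row per gene; the main practical difference is the returned class (data.frame versus tibble).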

Import the read-counts.csv file and name it counts.

You can either use the first column (Feature) to name the rows of your data frame or keep the first column as is. The choice only changes how you filter the data later.
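
To illustrate how the two options affect filtering, here is a small sketch on a made-up counts file. Only the Feature column name comes from the exercise; the sample names (WT_1, SET1_1, ...) and gene names are hypothetical.

```r
# Stand-in for read-counts.csv; sample and gene names are hypothetical.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Feature,WT_1,WT_2,SET1_1,SET1_2",
  "geneA,10,12,48,55",
  "geneB,30,28,31,27"
), toy_path)

# Option 1: keep Feature as a regular column and filter on its values
counts <- read.csv(toy_path)
counts[counts$Feature == "geneA", ]

# Option 2: use the first column as row names and filter by row name
counts_rn <- read.csv(toy_path, row.names = 1)
counts_rn["geneA", ]
```

With option 2 the data frame contains only numeric sample columns, which can be convenient when the values are passed on to plotting code.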

Find Genes of Interest

Find the genes which satisfy the following conditions:

log2 fold change < -1 or > 1 (i.e., the absolute log2 fold change is greater than 1)
adjusted p-value < 0.05

Store the results in a variable target_genes.
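
One possible way to express the two conditions is sketched below on a toy table. The column names log2FoldChange and padj are assumed; replace them with the names used in your de_res.

```r
# Toy DE results; column names are assumptions for illustration.
de_res <- data.frame(
  Feature        = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(2.5, -0.3, -1.7, 3.1, -2.2),
  padj           = c(0.001, 0.8, 0.02, 0.2, NA)
)

# abs() covers both directions (< -1 or > 1) in a single test.
# subset() also drops rows where the condition is NA (here geneE).
target_genes <- subset(de_res, abs(log2FoldChange) > 1 & padj < 0.05)

# {dplyr} equivalent:
# target_genes <- dplyr::filter(de_res, abs(log2FoldChange) > 1, padj < 0.05)

target_genes
```

Note that plain bracket indexing, de_res[condition, ], would keep an all-NA row for genes with a missing adjusted p-value, which is why subset() is used in this sketch.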

Draw Boxplots

Load the {ggplot2} package.
Draw a boxplot for the first gene in target_genes to compare its expression level between SET1 and WT samples.

Hint: you need to extract the expression data for that gene from counts and build a data frame for the boxplot.
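
The hint can be sketched as follows, on toy data. The sample naming scheme (columns starting with WT or SET1) is an assumption used here to derive the group of each sample; adapt it to the real column names.

```r
library(ggplot2)

# Toy inputs; gene names, sample names and values are hypothetical.
counts <- data.frame(
  Feature = c("geneA", "geneB"),
  WT_1 = c(10, 30), WT_2 = c(12, 28), WT_3 = c(9, 33),
  SET1_1 = c(48, 31), SET1_2 = c(55, 27), SET1_3 = c(51, 29)
)
target_genes <- data.frame(Feature = "geneA", log2FoldChange = 2.5, padj = 0.001)

# First gene of interest and the names of the sample columns
gene <- target_genes$Feature[1]
sample_cols <- setdiff(names(counts), "Feature")

# Long-format data frame: one row per sample
plot_df <- data.frame(
  sample     = sample_cols,
  expression = unlist(counts[counts$Feature == gene, sample_cols], use.names = FALSE),
  group      = ifelse(grepl("^WT", sample_cols), "WT", "SET1")
)

# Basic boxplot of expression by group
p <- ggplot(plot_df, aes(x = group, y = expression)) +
  geom_boxplot() +
  labs(title = gene, x = "Group", y = "Read count")
p
```

ggplot2 expects one observation per row, which is why the single wide row of counts is reshaped into plot_df before plotting.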

Refine the boxplot from the previous question to include the following customizations:

A subtitle showing the gene’s log2 fold change.
Fill the boxplot with different colors for the “WT” and “SET1” groups.
Apply the theme_minimal() theme.
Hide the legend.
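
A minimal sketch of these four customizations, again on hypothetical values (the gene name, expression values and fold change below are made up):

```r
library(ggplot2)

# Toy plotting data; in the homework this comes from counts and target_genes.
plot_df <- data.frame(
  group      = rep(c("WT", "SET1"), each = 3),
  expression = c(10, 12, 9, 48, 55, 51)
)
gene <- "geneA"
lfc  <- 2.5

# Subtitle text built from the gene's log2 fold change
subtitle_text <- paste("log2 fold change:", round(lfc, 2))

p <- ggplot(plot_df, aes(x = group, y = expression, fill = group)) +
  geom_boxplot() +                          # fill = group colors each box
  labs(title = gene, subtitle = subtitle_text,
       x = "Group", y = "Read count") +
  theme_minimal() +                         # complete theme first
  theme(legend.position = "none")           # then hide the legend
p
```

The order of the last two layers matters: theme_minimal() is a complete theme, so adding it after theme(legend.position = "none") would reset the legend setting.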

The homework correction is available here: link