Week 5 - Homework
Import Data
We need two datasets for this exercise:
- the bulk RNA-seq gene expression data from the read-counts.csv file;
- the differential expression (DE) analysis results in toy_DEanalysis.csv.
Reminder: the DE results were obtained by comparing SET1 samples to WT samples using the data in read-counts.csv.
- Import the DE analysis results (toy_DEanalysis.csv) and name the object de_res.
(You can use either the base R function read.csv() or the function read_csv() from the {readr} package.)
- Import the read-counts.csv file and name the object counts.
You can either use the first column (Feature) as the row names of your data frame or keep the first column as is. This choice only changes how you filter the data later.
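As a rough sketch of both import steps, the R code below writes two tiny hypothetical CSV files to temporary paths and reads them back; in the homework you would pass "toy_DEanalysis.csv" and "read-counts.csv" instead. The sample names (WT_1, SET1_1, ...), the values, and the DE column names log2FoldChange and padj are assumptions made for illustration, not the contents of the real files. Note that read_csv() from {readr} has no row.names argument, so with {readr} you would keep Feature as a regular column.

```r
# Hypothetical counts file: a "Feature" column plus one column per sample.
# Sample names and values are made up for illustration.
counts_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Feature,WT_1,WT_2,SET1_1,SET1_2",
  "geneA,10,12,40,38",
  "geneB,100,95,98,102"
), counts_path)

# Hypothetical DE file. Assumption: DESeq2-style column names
# "log2FoldChange" and "padj"; check the real names with colnames().
de_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Feature,log2FoldChange,padj",
  "geneA,1.9,0.001",
  "geneB,0.05,0.9"
), de_path)

# In the homework, replace de_path with "toy_DEanalysis.csv".
de_res <- read.csv(de_path)

# Option 1: keep "Feature" as an ordinary column.
counts <- read.csv(counts_path)

# Option 2: use the "Feature" column as row names instead.
counts_rn <- read.csv(counts_path, row.names = "Feature")

str(counts)
str(counts_rn)
```

With option 1 you later filter on counts$Feature; with option 2 you can index rows directly by gene ID, e.g. counts_rn["geneA", ].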
Find Genes of Interest
- Find the genes that satisfy both of the following conditions:
- log2 fold change < -1 or > 1 (i.e., the absolute log2 fold change is greater than 1)
- adjusted p-value < 0.05
Store the results in a variable named target_genes.
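A minimal base-R sketch of this filter, run on a hypothetical de_res with made-up values. It assumes DESeq2-style column names (log2FoldChange, padj) and a gene-ID column called Feature; check the real column names with colnames(de_res) first. Some DE tools report NA adjusted p-values, so the sketch wraps the condition in which(), which drops those rows instead of returning rows filled with NA.

```r
# Hypothetical stand-in for de_res; the column names are assumptions.
de_res <- data.frame(
  Feature        = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  log2FoldChange = c(1.9, -2.4, 0.3, 1.5, NA),
  padj           = c(0.001, 0.02, 0.0001, 0.2, NA)
)

# |log2 fold change| > 1 is the same as "< -1 or > 1".
# Both conditions must hold; which() discards rows where the test is NA.
keep <- which(abs(de_res$log2FoldChange) > 1 & de_res$padj < 0.05)
target_genes <- de_res[keep, ]

target_genes$Feature
```

Here target_genes keeps whole rows, so the log2 fold change needed for the later plot stays available; target_genes$Feature gives just the gene IDs.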
Draw Boxplots
- Load the {ggplot2} package.
- Draw a boxplot for the first gene in target_genes to compare its expression level between SET1 and WT samples.
Hint: you need to extract the expression data for this gene from counts and build a data frame for the boxplot.
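One way to follow the hint, sketched in R under two assumptions: counts keeps Feature as a column, and each sample name starts with its group (WT_ or SET1_), so the group can be recovered with sub(). The data frames below are hypothetical stand-ins with made-up values. If you used Feature as row names instead, unlist(counts[gene, ]) would replace the counts$Feature == gene filter, and no column would need to be dropped.

```r
library(ggplot2)

# Hypothetical stand-ins; sample names and values are made up.
counts <- data.frame(
  Feature = c("geneA", "geneB"),
  WT_1 = c(10, 100), WT_2 = c(12, 95), WT_3 = c(9, 110),
  SET1_1 = c(40, 98), SET1_2 = c(38, 102), SET1_3 = c(45, 99)
)
target_genes <- data.frame(Feature = "geneA", log2FoldChange = 1.9, padj = 0.001)

# First gene of target_genes.
gene <- target_genes$Feature[1]

# One row of counts, minus the Feature column, as a named numeric vector.
values <- unlist(counts[counts$Feature == gene, -1])

# Long-format data frame: one row per sample.
# Assumption: the group is the part of the sample name before "_".
plot_df <- data.frame(
  sample     = names(values),
  expression = as.numeric(values),
  group      = factor(sub("_.*$", "", names(values)), levels = c("WT", "SET1"))
)

p <- ggplot(plot_df, aes(x = group, y = expression)) +
  geom_boxplot() +
  labs(title = gene, x = NULL, y = "Read counts")

print(p)
```

Setting the factor levels explicitly puts WT on the left as the reference group; without it, R would order the groups alphabetically (SET1 first).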
- Refine the boxplot from question 5 to include the following customizations:
- A subtitle showing the gene’s log2 fold change.
- Fill the boxplot with different colors for the “WT” and “SET1” groups.
- Apply the theme_minimal() theme.
- Hide the legend.
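A sketch of the refined plot, again on a hypothetical plot_df with made-up values; gene and lfc are illustrative names, and in practice lfc would come from target_genes$log2FoldChange[1] (assuming that column name). The two fill colors are arbitrary choices. theme(legend.position = "none") is added after theme_minimal() because a complete theme such as theme_minimal() resets every theme setting added before it.

```r
library(ggplot2)

# Hypothetical inputs; replace with the objects from the previous question.
plot_df <- data.frame(
  expression = c(10, 12, 9, 40, 38, 45),
  group      = factor(rep(c("WT", "SET1"), each = 3), levels = c("WT", "SET1"))
)
gene <- "geneA"
lfc  <- 1.9  # in practice: target_genes$log2FoldChange[1]

p2 <- ggplot(plot_df, aes(x = group, y = expression, fill = group)) +
  geom_boxplot() +
  # One fill color per group; the colors themselves are arbitrary.
  scale_fill_manual(values = c(WT = "grey70", SET1 = "tomato")) +
  labs(
    title    = gene,
    subtitle = paste0("log2 fold change: ", round(lfc, 2)),
    x = NULL, y = "Read counts"
  ) +
  theme_minimal() +
  # Must come after theme_minimal(), which would otherwise override it.
  theme(legend.position = "none")

print(p2)
```

Since the x-axis already labels each group, the legend added by fill = group is redundant, which is why hiding it loses no information.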
The homework correction is available here: link