Week 7 - Homework
Write your code directly in a Quarto document.
To create a Quarto document: go to File -> New File -> Quarto Document, then click Create.
Ranking Genes
To target the genes of interest, we can rank genes based on different metrics:
- by p-values (raw or adjusted p-value)
- by log2 fold change
- by a combination of p-value and log2 fold change (weighted sum), e.g., \(0.7\times|\log_{2}(\text{FC})|+0.3\times(-\log_{10}(\text{p-value}))\)
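As a worked illustration of the weighted sum, here is a minimal sketch on three made-up genes. The values and the names lfc, pval and score are hypothetical and are not taken from toy_DEanalysis.csv.

```r
# Hypothetical values for three genes (NOT from toy_DEanalysis.csv)
lfc  <- c(geneA = 2, geneB = -3, geneC = 0.5)        # log2 fold changes
pval <- c(geneA = 0.001, geneB = 0.01, geneC = 0.1)  # p-values

# Weighted score: 0.7 * |log2FC| + 0.3 * (-log10(p-value))
score <- 0.7 * abs(lfc) + 0.3 * (-log10(pval))
score
```

With these numbers geneB scores highest: its large absolute fold change outweighs the smaller p-value of geneA, because the fold change carries the larger weight.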
- Import the differential gene expression analysis (SET1 vs. WT) results file toy_DEanalysis.csv into RStudio and name it de_res.
- With the help of the rank() function (?rank), rank genes based on the negative absolute (abs()) log2 fold change, so that larger absolute log2FC → higher rank (smaller rank number). Store the rank in a new column called rank_lfc.
- With the help of the rank() function, rank genes based on the log10 of adjusted p-values (padj), so that smaller adjusted p-value → higher rank. Store the rank in a new column called rank_padj.
- With the help of the rank() function, rank genes based on the weighted sum of absolute log2 fold change and -log10 adjusted p-value, with 0.7 the weight for log2 fold change and 0.3 the weight for -log10 adjusted p-value. Store the rank in a new column called rank_weighted.
- Identify the best and the worst rank for each gene.
- Compute the average rank for each gene. Which gene has the smallest average rank?
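The ranking steps above rely on how rank() orders values. A minimal sketch on a made-up vector, assuming hypothetical gene names and fold changes rather than the real de_res data:

```r
# Hypothetical log2 fold changes for four genes (NOT from de_res)
lfc <- c(geneA = 2, geneB = -3, geneC = 0.5, geneD = -2)

# rank() gives rank 1 to the smallest value, so negating |log2FC|
# gives rank 1 to the gene with the largest absolute fold change.
rank_lfc <- rank(-abs(lfc))
rank_lfc

# Tied values share the average of their ranks by default
# (ties.method = "average"); "min" gives them the same, best rank.
rank_min <- rank(-abs(lfc), ties.method = "min")
rank_min
```

Here geneA and geneD have the same absolute fold change, so they share a rank. Which ties.method suits the homework is a judgment call, and the default is a reasonable starting point.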
Add Significance Level on ggplot
During the exercise, we created a boxplot that shows the p-value, using a function from the {ggsignif} package. We can also annotate the level of significance (ns, *, **, ***) instead of showing the p-value.
Tasks:
- Import gene expression data counts.csv.
- From the differential expression results (toy_DEanalysis.csv):
  - create a new column called signif, which shows:
    - “***” if adjusted p-value < 0.001,
    - “**” if adjusted p-value < 0.01,
    - “*” if adjusted p-value < 0.05,
    - “ns” if adjusted p-value >= 0.05;
  - extract the 3 genes with the smallest adjusted p-value.
- Build a data frame for the boxplot of the PIR3 gene in WT and SET1 samples.
- Draw a boxplot with individual data points and significance level annotation (geom_signif()).
- Generalise the previous steps into a function that draws the boxplot for any gene, e.g.: “TOS6”.
- Apply your function to a list of genes from the “toy_DEanalysis” results.
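One possible way to turn adjusted p-values into the significance labels listed in the tasks is base R's cut(). This is only a sketch on made-up values; the vector padj and the result name signif_label are hypothetical, and other approaches (for example nested ifelse() calls) would work as well.

```r
# Hypothetical adjusted p-values (NOT from toy_DEanalysis.csv)
padj <- c(0.0002, 0.004, 0.03, 0.05, 0.7)

# right = FALSE makes each interval closed on the left: [lower, upper),
# so a value of exactly 0.05 falls in [0.05, Inf) and is labelled "ns".
signif_label <- cut(
  padj,
  breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
  labels = c("***", "**", "*", "ns"),
  right  = FALSE
)

# cut() returns a factor; convert to character for use as plot annotation
as.character(signif_label)
```

Missing adjusted p-values (NA) stay NA with this approach, so they may need separate handling before plotting.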
Click “Render” to generate your Quarto report.
The homework correction is available here: link