Week 7 - Homework

week07
homework

Write your code directly in a Quarto document.

To create a Quarto document: go to File -> New File -> Quarto Document, then click Create.

Ranking Genes

To target the genes of interest, we can rank genes based on different metrics:

  • by p-values (raw or adjusted p-value)
  • by log2 fold change
  • by combinaison of p-value and log2 fold change (weighted sum), e.g., \(0.7\times|log_{2}(\text{FC})|+0.3\times(-log_{10}(\text{p-value}))\)
  1. Import the differential gene expression analysis (SET1 vs. WT) results file toy_DEanalysis.csv into RStudio and name it as de_res.

  2. With the help of the rank() function (?rank), rank genes based on the negative absolute (abs()) log2 fold change, so that larger absolute log2FC → higher rank (smaller rank number). Store the rank in a new column called rank_lfc.

  3. With the help of the rank() function, rank genes based on the log10 of adjusted p-values (pajd), so that smaller adjusted p-value → higher rank. Store the rank in a new column called rank_padj.

  4. With the help of the rank() function, rank genes based on the weighted sum of absolute log2 fold change and -log10 adjusted p-value, with 0.7 the weight for log2 fold change and 0.3 the weight for -log10 adjusted p-value. Store the rank in a new column called rand_weighted.

  5. Identify the best and the worst rank for each gene.

  6. Compute the average rank for each gene. Which gene has the smallest average rank?

Add Significance Level on ggplot

During the exercise, we’ve created a boxplot with p-value shown using the {ggsignif} package’s function. We can also annotate the level of significance (ns, *, **, ***) instead of showing the p-value.

Tasks:

  • Import gene expression data counts.csv.
  • From the differential expression results (toy_DEanalysis.csv):
    • create a new column called signif, which shows:
      • “***” if adjusted p-value < 0.001,
      • “**” if adjusted p-value < 0.01,
      • “*” if adjusted p-value < 0.05,
      • “ns” if adjusted p-value >= 0.05;
    • extract the 3 genes with the smallest adjusted p-value.
  • Build data frame for boxplot for the PIR3 gene of WT and SET1 samples.
  • Draw boxplot with individual data points and significance level annotation (geom_signif())
  • Generalise the previous steps to draw boxplot for any genes, e.g.: “TOS6”
  • Apply your function for a list of any genes from the “toy_DEanalysis” results.

Click “Render” to generate your Quarto report.

The homework correction is available here: link