Week 4 - Homework

Import the read-counts.csv file.

Quick reminder: this data file contains gene expression values for samples from four groups. Sample names are prefixed with “WT”, “SET1”, “SET1.RRP6” and “RRP6”, and each group has 10 samples.
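Here is a minimal import sketch. It assumes, without confirmation from the assignment, that the first column of read-counts.csv holds gene IDs, so that column is used as row names. To keep the example self-contained, it first writes a tiny hypothetical file to a temporary path. With the real data, the call would simply be `read.csv("read-counts.csv", row.names = 1)`.

```r
# Hypothetical stand-in for read-counts.csv: 2 genes x 2 samples (toy values).
toy <- data.frame(gene = c("geneA", "geneB"), WT_1 = c(10, 20), SET1_1 = c(30, 40))
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Import; row.names = 1 assumes the first column contains gene IDs.
# With the real file: counts <- read.csv("read-counts.csv", row.names = 1)
counts <- read.csv(path, row.names = 1)
dim(counts)   # 2 2  (genes x samples)
head(counts)
```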

For each gene, calculate the average expression across the 10 samples of the “WT” group.
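In base R, one way to do this is to select the WT columns by name prefix and pass them to rowMeans(). The sketch below runs on a small hypothetical data frame. Its columns follow an assumed "<group>_<i>" naming scheme (WT_1 to WT_10, and so on), so adapt the prefix to the real column names.

```r
# Hypothetical toy data standing in for the imported counts:
# 3 genes x 40 samples, columns named "<group>_<i>" (assumed naming scheme).
groups <- c("WT", "SET1", "SET1.RRP6", "RRP6")
sample_names <- paste(rep(groups, each = 10), 1:10, sep = "_")
counts <- as.data.frame(matrix(1:120, nrow = 3,
                               dimnames = list(c("geneA", "geneB", "geneC"), sample_names)))

# Keep only the WT columns, then average each row (gene) across them.
wt_cols <- startsWith(colnames(counts), "WT_")
sum(wt_cols)                           # 10 samples
wt_avg <- rowMeans(counts[, wt_cols])
wt_avg                                 # geneA 14.5, geneB 15.5, geneC 16.5
```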

Now repeat the previous step to calculate the average expression for the remaining three groups: “SET1”, “SET1.RRP6” and “RRP6”.
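The same recipe applies to the other three groups, but one catch is worth checking. Every “SET1.RRP6” sample name also starts with “SET1”, so a bare "SET1" prefix would select 20 columns instead of 10. The sketch below uses the same kind of hypothetical data and the assumed "<group>_<i>" naming. It includes the separator in each prefix to keep the groups apart.

```r
# Hypothetical toy data: 3 genes x 40 samples, assumed "<group>_<i>" column names.
groups <- c("WT", "SET1", "SET1.RRP6", "RRP6")
sample_names <- paste(rep(groups, each = 10), 1:10, sep = "_")
counts <- as.data.frame(matrix(1:120, nrow = 3,
                               dimnames = list(c("geneA", "geneB", "geneC"), sample_names)))

# Pitfall: the bare prefix "SET1" also matches every "SET1.RRP6" sample.
sum(startsWith(colnames(counts), "SET1"))    # 20
sum(startsWith(colnames(counts), "SET1_"))   # 10

# Same recipe as for WT, with the separator included in each prefix.
set1_avg      <- rowMeans(counts[, startsWith(colnames(counts), "SET1_")])
set1_rrp6_avg <- rowMeans(counts[, startsWith(colnames(counts), "SET1.RRP6_")])
rrp6_avg      <- rowMeans(counts[, startsWith(colnames(counts), "RRP6_")])
set1_avg["geneA"]                            # 44.5
```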

Store the four vectors of average values in a list named avg_list, using the group names as the names of the list elements.
Display the first 5 average values for the “SET1.RRP6” group.
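A compact way to build avg_list is to loop over the group names with lapply() and then name the elements. This sketch again uses hypothetical toy counts with assumed "<group>_<i>" column names. It uses 6 genes so that head() has at least five values to show.

```r
# Hypothetical toy data: 6 genes x 40 samples, assumed "<group>_<i>" column names.
groups <- c("WT", "SET1", "SET1.RRP6", "RRP6")
sample_names <- paste(rep(groups, each = 10), 1:10, sep = "_")
counts <- as.data.frame(matrix(1:240, nrow = 6,
                               dimnames = list(paste0("gene", 1:6), sample_names)))

# One vector of per-gene averages per group; the "_" separator keeps
# "SET1" from also matching the "SET1.RRP6" samples.
avg_list <- lapply(groups, function(g) {
  rowMeans(counts[, startsWith(colnames(counts), paste0(g, "_"))])
})
names(avg_list) <- groups

# First 5 averages of the SET1.RRP6 group.
head(avg_list[["SET1.RRP6"]], 5)   # gene1 to gene5: 148 149 150 151 152
```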

Transform the list avg_list obtained in the previous question into a data frame using as.data.frame(). Show the first few rows of your data frame with head().

Tip

A data frame can be considered as a list of equal-length vectors.
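Because avg_list holds four equal-length named vectors, as.data.frame() can turn it into a data frame with one column per group. According to ?data.frame, row names are taken from the first suitable named component, so the gene IDs carry over. The sketch below uses a hypothetical avg_list with toy values.

```r
# Hypothetical avg_list: 6 genes per group, toy values for illustration only.
genes  <- paste0("gene", 1:6)
groups <- c("WT", "SET1", "SET1.RRP6", "RRP6")
avg_list <- lapply(seq_along(groups), function(k) setNames(k * 100 * (1:6), genes))
names(avg_list) <- groups

# Each list element becomes a column; the vectors' names become row names.
avg_df <- as.data.frame(avg_list)
head(avg_df)
dim(avg_df)   # 6 4
```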

Which genes have an average expression greater than 10000 in the WT group, and which in the SET1 group? Check whether the two sets have genes in common, using an operator you have already learned (for example %in%) or the intersect() function.
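The sketch below shows both approaches on hypothetical per-gene averages. These are toy values, not results from read-counts.csv. Note that `> 10000` is a strict comparison, so a gene averaging exactly 10000 is excluded.

```r
# Hypothetical per-gene averages for two groups (toy values for illustration only).
wt_avg   <- c(geneA = 25000, geneB = 800,   geneC = 12000, geneD = 10000)
set1_avg <- c(geneA = 18000, geneB = 15000, geneC = 500,   geneD = 11000)

# Names of the genes whose average is strictly above 10000 in each group.
wt_high   <- names(wt_avg)[wt_avg > 10000]       # "geneA" "geneC"
set1_high <- names(set1_avg)[set1_avg > 10000]   # "geneA" "geneB" "geneD"

# Genes in common: intersect() or, equivalently here, the %in% operator.
intersect(wt_high, set1_high)      # "geneA"
wt_high[wt_high %in% set1_high]    # "geneA"
```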

Check whether the average expression of the “RRP6” group is normally distributed (see ?shapiro.test), using a significance level of 5%. What is the p-value of the normality test? If the values are normally distributed, draw a histogram of them directly (see ?hist). Otherwise, draw a histogram of the log-transformed values.
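The Shapiro-Wilk test takes normality as its null hypothesis. A p-value above 0.05 therefore means normality is not rejected at the 5% level, while a p-value of 0.05 or below means it is rejected. The sketch below applies that rule to hypothetical, right-skewed toy values standing in for the RRP6 averages.

Two caveats apply. First, shapiro.test() accepts only 3 to 5000 non-missing values, so it stops with an error if the real data contain more genes than that. Second, the log2(x + 1) transform with a pseudocount of 1 is a choice made here to avoid log(0) = -Inf; the assignment does not prescribe it.

```r
# Hypothetical stand-in for the RRP6 averages: 200 right-skewed toy values.
set.seed(42)
rrp6_avg <- rlnorm(200, meanlog = 5, sdlog = 1.5)

# Shapiro-Wilk test; H0: the values come from a normal distribution.
sw <- shapiro.test(rrp6_avg)
sw$p.value

if (sw$p.value > 0.05) {
  # Normality not rejected at the 5% level: plot the values directly.
  hist(rrp6_avg, main = "RRP6 average expression", xlab = "Average expression")
} else {
  # Normality rejected: plot log-transformed values instead
  # (the pseudocount of 1 is an assumption that avoids log(0) = -Inf).
  hist(log2(rrp6_avg + 1), main = "RRP6 average expression (log2)",
       xlab = "log2(average + 1)")
}
```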

The homework correction is available here: link