apply Family
IOC-R Week 7
Syntax: sum(), mean(), log2()
To use a function from a package, load the package with library(package_name), or call the function with its prefix (package_name::function_name()).
## mock gene data
gene_data <- data.frame(
gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
log2FC = c(2.5, -1.8, 0.8, 1.6, -0.5),
p_value = c(0.0001, 0.03, 0.2, 0.0005, 0.05)
)
gene_data
gene log2FC p_value
1 GeneA 2.5 1e-04
2 GeneB -1.8 3e-02
3 GeneC 0.8 2e-01
4 GeneD 1.6 5e-04
5 GeneE -0.5 5e-02
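A minimal sketch of a few of these functions applied to the mock gene_data columns. The p < 0.05 cutoff is an illustrative assumption. stats::median() shows the package-prefix form; stats is attached by default, so the prefix is optional here.

```r
# Mock gene data (same values as above)
gene_data <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  log2FC = c(2.5, -1.8, 0.8, 1.6, -0.5),
  p_value = c(0.0001, 0.03, 0.2, 0.0005, 0.05)
)

avg_fc <- mean(gene_data$log2FC)           # average log2 fold change
n_sig <- sum(gene_data$p_value < 0.05)     # TRUE counts as 1, so this counts genes with p < 0.05
fold_change <- 2^gene_data$log2FC          # undo the log2 transform
back_to_log2 <- log2(fold_change)          # log2() recovers the original log2FC values
med_fc <- stats::median(gene_data$log2FC)  # package::function() prefix form
```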
apply Family Functions
apply Family: what happens when we need to apply a function multiple times?
E.g.: calculate the median of each row of the following matrix.
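A minimal sketch, assuming the matrix meant here is the 3 × 4 matrix printed later in this section. A plain for loop has to call median() once per row by hand; apply() with MARGIN = 1 does the same in a single call, and MARGIN = 2 switches to columns.

```r
mat <- matrix(c(1, 2, 3, 4,
                2, 2, 3, 5,
                3, 3, 3, 6), nrow = 3, byrow = TRUE)

# Without apply(): loop over the rows and fill a result vector
loop_medians <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  loop_medians[i] <- median(mat[i, ])
}

# With apply(): MARGIN = 1 means one call of FUN per row
row_medians <- apply(mat, 1, median)  # 2.5, 2.5, 3
# MARGIN = 2 means one call of FUN per column
col_medians <- apply(mat, 2, median)  # 2, 2, 3, 5
```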
apply() Function
Applies a function across the rows or columns of a matrix.
Syntax: apply(X, MARGIN, FUN)
X: a matrix (or data frame)
MARGIN = 1: apply the function to rows; MARGIN = 2: apply the function to columns
FUN: the function to apply
apply() Function: how do we apply a more complex/custom function?
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 2 3 5
[3,] 3 3 3 6
E.g.: how many unique values does each row contain?
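A minimal sketch: FUN can be a named function or an anonymous function(row) ..., and apply() passes each row to it as a vector. The second part applies a custom rule to the numeric columns of the mock gene_data; the thresholds |log2FC| > 1 and p < 0.05 and the name is_significant are hypothetical, chosen only for illustration.

```r
mat <- matrix(c(1, 2, 3, 4,
                2, 2, 3, 5,
                3, 3, 3, 6), nrow = 3, byrow = TRUE)

# Named custom function: number of distinct values in one row
count_unique <- function(row) length(unique(row))
n_unique <- apply(mat, 1, count_unique)                        # 4, 3, 2

# Same thing with an anonymous function
n_unique_anon <- apply(mat, 1, function(row) length(unique(row)))

# Custom rule on data frame rows (hypothetical thresholds).
# Only numeric columns are passed: apply() converts the data frame to a matrix,
# and including the character gene column would turn every value into a string.
gene_data <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  log2FC = c(2.5, -1.8, 0.8, 1.6, -0.5),
  p_value = c(0.0001, 0.03, 0.2, 0.0005, 0.05)
)
is_significant <- function(r) abs(r[["log2FC"]]) > 1 && r[["p_value"]] < 0.05
sig <- apply(gene_data[, c("log2FC", "p_value")], 1, is_significant)
```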
lapply() Function
Applies a function to each element of a list or vector and returns a list.
Syntax: lapply(X, FUN)
X: a list or vector
FUN: the function to apply
How many exons does each gene have?
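A minimal sketch with a hypothetical gene_exons list: the exon names are made up, and the per-gene counts (3, 2, 4, 1) are chosen to match the outputs shown in this section. lapply() returns a list with one element per gene, keeping the gene names.

```r
# Hypothetical input: one character vector of exon IDs per gene
gene_exons <- list(
  gene1 = c("exon1", "exon2", "exon3"),
  gene2 = c("exon1", "exon2"),
  gene3 = c("exon1", "exon2", "exon3", "exon4"),
  gene4 = c("exon1")
)

exon_counts <- lapply(gene_exons, length)  # list: gene1 = 3, gene2 = 2, gene3 = 4, gene4 = 1
exon_counts[["gene3"]]                     # extract one gene's count with [[ ]]
```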
sapply() – A Simpler lapply()
sapply() simplifies lapply()’s output: it tries to return a vector or matrix when possible.
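A minimal sketch comparing the two on the same hypothetical gene_exons list (made-up exon names). Because every call to length() returns a single value, sapply() can collapse the results into a named integer vector, which here equals unlist() of the lapply() result.

```r
gene_exons <- list(
  gene1 = c("exon1", "exon2", "exon3"),
  gene2 = c("exon1", "exon2"),
  gene3 = c("exon1", "exon2", "exon3", "exon4"),
  gene4 = c("exon1")
)

counts_list <- lapply(gene_exons, length)  # a list of four length-1 elements
counts_vec <- sapply(gene_exons, length)   # a named integer vector
```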
# Function to return exon count and whether it has more than 2 exons
exon_info <- function(exons) {
count <- length(exons)
more_than_2 <- ifelse(count > 2, "yes", "no")
# Returns a vector of length 2; c() coerces count to character, e.g. c("3", "yes")
return(c(count, more_than_2))
}
lapply(gene_exons, exon_info)
$gene1
[1] "3" "yes"
$gene2
[1] "2" "no"
$gene3
[1] "4" "yes"
$gene4
[1] "1" "no"
sapply(gene_exons, exon_info)
gene1 gene2 gene3 gene4
[1,] "3" "2" "4" "1"
[2,] "yes" "no" "yes" "no"
The results are simplified into a matrix (one column per gene).
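A minimal sketch of working with that matrix, using the same hypothetical gene_exons list (made-up exon names). Since c() turned the counts into strings, they must be converted back with as.integer(); t() transposes the result so each gene becomes a row. The column names exon_count and more_than_2 are hypothetical labels added for readability.

```r
gene_exons <- list(
  gene1 = c("exon1", "exon2", "exon3"),
  gene2 = c("exon1", "exon2"),
  gene3 = c("exon1", "exon2", "exon3", "exon4"),
  gene4 = c("exon1")
)

exon_info <- function(exons) {
  count <- length(exons)
  more_than_2 <- ifelse(count > 2, "yes", "no")
  c(count, more_than_2)  # character vector of length 2
}

info_mat <- sapply(gene_exons, exon_info)  # 2 x 4 character matrix
info_by_gene <- t(info_mat)                # 4 x 2: one row per gene
colnames(info_by_gene) <- c("exon_count", "more_than_2")  # hypothetical labels
exon_counts <- as.integer(info_by_gene[, "exon_count"])   # back to numbers (names dropped)
```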
# Function that returns exon count + some info for each gene
exon_info2 <- function(exons) {
count <- length(exons)
if (count > 2) {
return(c(count, "high exon number")) # Returns 2 elements
} else {
return(count) # Returns 1 element if <= 2
}
}
sapply(gene_exons, exon_info2)
$gene1
[1] "3" "high exon number"
$gene2
[1] 2
$gene3
[1] "4" "high exon number"
$gene4
[1] 1
The results cannot be simplified (the returned vectors have different lengths), so they are still stored in a list.
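A minimal sketch for checking why simplification failed, again on the hypothetical gene_exons list. lengths() shows the per-gene result lengths; a hypothetical variant exon_info3 that always returns two elements (the "low exon number" label is made up) lets sapply() build a matrix again.

```r
gene_exons <- list(
  gene1 = c("exon1", "exon2", "exon3"),
  gene2 = c("exon1", "exon2"),
  gene3 = c("exon1", "exon2", "exon3", "exon4"),
  gene4 = c("exon1")
)

exon_info2 <- function(exons) {
  count <- length(exons)
  if (count > 2) c(count, "high exon number") else count
}

res <- sapply(gene_exons, exon_info2)
is.list(res)   # TRUE: results were not simplified
lengths(res)   # 2, 1, 2, 1: inconsistent lengths block simplification

# Hypothetical variant: always return 2 elements
exon_info3 <- function(exons) {
  count <- length(exons)
  label <- if (count > 2) "high exon number" else "low exon number"
  c(count, label)
}
res3 <- sapply(gene_exons, exon_info3)  # now a 2 x 4 character matrix
```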
apply(): for column-wise and row-wise operations (e.g., calculating the variance of each row or column)
lapply(): for list-based computations (e.g., repeatedly generating a plot for each gene in a list)
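A minimal sketch of both use cases. Row and column variances use the 3 × 4 matrix from this section. The plotting part is a hypothetical illustration: one bar plot of log2FC per gene from the mock gene_data, drawn on a null PDF device (pdf(NULL)) so nothing is written to disk; swap in a real file name to keep the plots.

```r
mat <- matrix(c(1, 2, 3, 4,
                2, 2, 3, 5,
                3, 3, 3, 6), nrow = 3, byrow = TRUE)
row_var <- apply(mat, 1, var)  # variance of each row
col_var <- apply(mat, 2, var)  # variance of each column

gene_data <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  log2FC = c(2.5, -1.8, 0.8, 1.6, -0.5),
  p_value = c(0.0001, 0.03, 0.2, 0.0005, 0.05)
)

pdf(NULL)  # null device: no output file is created
plots <- lapply(gene_data$gene, function(g) {
  one_gene <- gene_data[gene_data$gene == g, ]
  barplot(one_gene$log2FC, names.arg = g,
          main = paste("log2FC of", g), ylab = "log2FC")
})
invisible(dev.off())
```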