IOC-R Week 6
Use R Projects to manage files and scripts in a structured way.
Keep data, scripts, and outputs in separate folders to avoid chaos.
Example:
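A minimal sketch of one such layout, assuming the folder names data, scripts, and outputs (the names and the project root are illustrative, not prescribed by the course). A temporary directory is used so the sketch can be run safely anywhere.

```r
# Hypothetical project root inside a temporary directory (assumption for this sketch)
project_root <- file.path(tempdir(), "my_project")

# One folder per kind of file: raw data, scripts, and generated outputs
folders <- c("data", "scripts", "outputs")

for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_root)
```

In a real R Project, the root would be the folder that contains the .Rproj file, so that relative paths such as "outputs/..." resolve from there.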
Data Type | Example in Biology |
---|---|
Numeric | Expression levels (25.3) |
Character | Gene names ("TP53", "BRCA1") |
Logical | Mutation status (TRUE for mutated, FALSE for WT) |
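As a quick check of the three types in the table, the following sketch stores one value of each and asks R for its class. The variable names are illustrative.

```r
# One value per basic data type (variable names are hypothetical)
expression_level <- 25.3              # numeric
gene_names <- c("TP53", "BRCA1")      # character
is_mutated <- TRUE                    # logical

class(expression_level)  # "numeric"
class(gene_names)        # "character"
class(is_mutated)        # "logical"
```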
Vectors: [] for indexing.
Matrices: [, ] for indexing.
Data Frames: [, ] for indexing; $ and [[ ]] to extract one column.
[1] "TP53" "BRCA1" "EGFR"
[1] 25.3 12.5 30.1
[1] 25.3 12.5 30.1
Gene Expression Mutation
1 TP53 25.3 TRUE
Gene Expression Mutation
3 EGFR 30.1 TRUE
Gene Expression Mutation
1 TP53 25.3 TRUE
3 EGFR 30.1 TRUE
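The code that produced the console output above is not shown. A sketch along the following lines would give comparable results, assuming a data frame called df with the columns Gene, Expression, and Mutation. The object names and BRCA1's mutation status (FALSE) are assumptions.

```r
# Hypothetical data frame matching the printed columns;
# the Mutation value for BRCA1 is assumed, since it does not appear in the output
df <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Expression = c(25.3, 12.5, 30.1),
  Mutation = c(TRUE, FALSE, TRUE)
)

# Vector: [] selects elements by position
gene_vector <- df$Gene
gene_vector[1]

# Matrix: [row, column]
expr_matrix <- matrix(c(20, 15, 30, 25), nrow = 2)
expr_matrix[1, 2]

# Data frame: $ and [[ ]] both extract one column as a vector
df$Expression
df[["Expression"]]

# Data frame: [row, column]; leaving the column empty keeps all columns
df[1, ]
df[3, ]
df[df$Mutation, ]  # rows where Mutation is TRUE
```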
Lists
[] to subset a list.
[[ ]] or $ to select a specific component of a list.
$genes
[1] "TP53" "BRCA1"
$expression
[,1] [,2]
[1,] 20 30
[2,] 15 25
$metadata
Sample Condition
1 A WT
2 B KO
$genes
[1] "TP53" "BRCA1"
[1] "TP53" "BRCA1"
[,1] [,2]
[1,] 20 30
[2,] 15 25
[1] "WT" "KO"
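The list behind this output is not shown. The sketch below builds a list with the same three components (the name my_list is an assumption) and contrasts [] with [[ ]] and $.

```r
# Hypothetical list with the three components seen in the printed output
my_list <- list(
  genes = c("TP53", "BRCA1"),
  expression = matrix(c(20, 15, 30, 25), nrow = 2),
  metadata = data.frame(Sample = c("A", "B"), Condition = c("WT", "KO"))
)

# Single bracket: the result is still a list (of length 1)
my_list["genes"]

# Double bracket or $: the result is the component itself
my_list[["genes"]]
my_list$expression

# Extractions can be chained: a column of the data frame stored in the list
my_list$metadata$Condition
```

The difference matters in practice: functions that expect a vector or a matrix need the component itself, so [[ ]] or $ is usually the right choice when passing one element on.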
Type | Operator | Example |
---|---|---|
Comparison | ==, !=, >, <, >=, <= | p_value < 0.05 |
Logical | & (AND), \| (OR), ! (NOT) | TRUE & FALSE |
genes <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expr = c(25.3, 12.5, 30.1),
  gene_family = c("A", "A", "B")
)
genes[genes$expr > 20, ]
gene expr gene_family
1 TP53 25.3 A
3 EGFR 30.1 B
# subset(genes, expr > 20) # same result as genes[genes$expr > 20, ]
subset(genes, expr > 20, select = c("gene", "expr")) # filter and select some columns
gene expr
1 TP53 25.3
3 EGFR 30.1
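Comparison and logical operators can be combined in the same filter. The sketch below reuses the genes data frame from this section; the thresholds and result names are chosen only for illustration.

```r
genes <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expr = c(25.3, 12.5, 30.1),
  gene_family = c("A", "A", "B")
)

# AND: both conditions must be TRUE
high_in_family_a <- genes[genes$expr > 20 & genes$gene_family == "A", ]

# OR: at least one condition must be TRUE
low_or_family_b <- genes[genes$expr < 20 | genes$gene_family == "B", ]

# NOT: invert a condition (here inside subset())
not_family_a <- subset(genes, !(gene_family == "A"))

high_in_family_a$gene
low_or_family_b$gene
not_family_a$gene
```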
Syntax: ggplot(data, aes(x, y)) + geom_*()
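A minimal sketch of that syntax, assuming the {ggplot2} package is installed and reusing the small genes data frame from this section. geom_col() is only one possible geom_*() layer, chosen here because there is one value per gene.

```r
library(ggplot2)

genes <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expr = c(25.3, 12.5, 30.1),
  gene_family = c("A", "A", "B")
)

# data + aesthetic mapping + one geometry layer
expr_plot <- ggplot(genes, aes(x = gene, y = expr)) +
  geom_col()

# Printing the object draws the plot
print(expr_plot)
```

Swapping geom_col() for another layer, such as geom_point(), changes the type of plot without touching the data or the mapping.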
# use {base}'s functions
## Import
text_file <- read.table("path/to/file.txt")
csv_file <- read.csv("path/to/file.csv")
## Export
write.table(
  x = df,
  file = "outputs/cleaned_gene_expression.txt"
)
write.csv(
  x = df,
  file = "outputs/cleaned_gene_expression.csv"
)
# use {readr}'s functions
## Import
readr::read_csv(
  file = "outputs/cleaned_gene_expression.csv"
)
## Export
readr::write_csv(
  x = df,
  file = "outputs/cleaned_gene_expression.csv"
)
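To see what a CSV round trip does to a data frame, the sketch below writes a small example to a temporary file with base R and reads it back. The data frame, the temporary path, and the use of row.names = FALSE are assumptions of this sketch; the behaviour shown for the factor column assumes R 4.0 or later, where read.csv() does not convert strings to factors by default.

```r
# Small example data frame with a factor column (illustrative only)
df <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expr = c(25.3, 12.5, 30.1),
  gene_family = factor(c("A", "A", "B"))
)

# Temporary file so the sketch does not touch a real project
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the row numbers as an extra first column
write.csv(x = df, file = csv_path, row.names = FALSE)

df_back <- read.csv(csv_path)

# The values survive, but the factor column comes back as plain text
class(df$gene_family)
class(df_back$gene_family)
```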
Both .RDS and .Rdata preserve data structures, such as column data types (numeric, character or factor).
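A minimal sketch of both formats, using base R's saveRDS()/readRDS() and save()/load() with temporary files. The example data frame and the file paths are assumptions for illustration.

```r
# Example data frame with a factor column (illustrative only)
df <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expr = c(25.3, 12.5, 30.1),
  gene_family = factor(c("A", "A", "B"))
)

# .RDS stores a single object; the name is chosen when reading it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, file = rds_path)
df_rds <- readRDS(rds_path)

# .RData stores one or more objects together with their names
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path)

# Load into a separate environment to avoid overwriting objects in the workspace
loaded_env <- new.env()
loaded_names <- load(rdata_path, envir = loaded_env)

identical(df_rds, df)          # the factor column is still a factor
identical(loaded_env$df, df)
loaded_names                   # names of the objects that were restored
```

Because load() restores objects under their original names, readRDS() is often easier to reason about when only one object needs to be saved.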
Different AI tools: ChatGPT, Gemini, Perplexity, Claude, Le Chat, DeepSeek, etc.
AI is a great assistant, but it can make mistakes: always verify outputs!
✅ Be specific: Instead of “Why is my code not working?”, ask:
“I’m trying to filter a data frame in R where Expression > 10, but I get an error. Here’s my code: df[df$Expression > 10]. How can I fix it?”
✅ Provide context: