4 Analysis
Let’s now look at the genes we obtained and their functions.
- Basal: “CDCA8” “CAVIN2” “HJURP”
- Her2: “RRM2” “HJURP” “KIF4A”
- LumA: “LYVE1” “VEGFD” “DMD”
- LumB: “PAMR1” “CAVIN2” “DMD”
Looking at what each of the genes does: - CDCA8: Cell Division Cycle Associated 8. It is a component of the chromosomal passenger complex (CPC), a complex that acts as a key regulator of mitosis. - CAVIN2: Caveolae Associated Protein 2, plays an important role in caveolar biogenesis and morphology. - HJURP: Holliday Junction Recognition Protein, participates in DNA damage and mitotic cell cycle pathways. Genecards mentions it is associated with Luminal A subtype. - RRM2: Ribonucleotide Reductase Regulatory Subunit M2, catalyzes the biosynthesis of deoxyribonucleotides from the corresponding ribonucleotides, necessary for DNA synthesis. - KIF4A: Kinesin Family Member 4A, ATP dependent microtubule-based motor protein that is involved in the intracellular transport of membranous organelles. - LYVE1: Lymphatic Vessel Endothelial Hyaluronan Receptor 1, plays a role in autocrine regulation of cell growth. - VEGFD: Vascular Endothelial Growth Factor D, growth factor active in angiogenesis, lymphangiogenesis and endothelial cell growth, stimulating their proliferation and migration and also has effects on the permeability of blood vessels. - DMD: Dystrophin, forms a component of the dystrophin-glycoprotein complex (DGC), which bridges the inner cytoskeleton and the extracellular matrix. - PAMR1: Peptidase Domain Containing Associated With Muscle Regeneration 1, may play a role in regeneration of skeletal muscle.
We can see that most of these genes are related to cell growth and mitotic cell division, two pathways that are commonly misregulated in cancer. One, HJURP, is associated with Luminal A breast cancer, but in our analysis turned out to be more significant for Basal and Her2 subtypes. Interestingly, of the 9 different genes considered here, only one, “RRM2” is among the 50 genes in the PAM50 list defining the subtypes. However, we only considered the three with lowest values for each subtype, expanding this analysis to more genes may result in more genes in the list arising.
We can visualize the difference in expression of these genes through a heatmap, using the pheatmap
library.
library("pheatmap")
# Col data
<- as.data.frame(rse_BRCA$subtype)
df_colcolnames(df_col) <- "Subtype"
rownames(df_col) <- colnames(rse_BRCA)
# PAM50 genes
<- c("UBE2T","BIRC5","NUF2","CDC6","CCNB1","TYMS","MYBL2","CEP55","MELK","NDC80","RRM2","UBE2C","CENPF","PTTG1","EXO1","ORC6","ANLN","CCNE1","CDC20","MKI67","KIF2C","ACTR3B" ,"MYC","EGFR","KRT5","PHGDH","CDH3","MIA","KRT17","FOXC1","SFRP1","KRT14","ESR1","SLC39A6","BAG1","MAPT","PGR","CXXC5","MLPH","BCL2","MDM2","NAT1","FOXA1","BLVRA","MMP11","GPR160","FGFR4","GRB7","TMEM45B","ERBB2")
PAM50
# Matching genes to ids
<- match(PAM50, rowData(rse_BRCA)$gene_name)
PAM50_ids
# Filtering table
<- vGene$E[PAM50_ids, ]
PAM50_exp
pheatmap(
PAM50_exp, cluster_cols = TRUE,
cluster_rows = TRUE,
show_rownames = FALSE,
show_colnames = FALSE,
annotation_col = df_col
)
We can see that the subtypes do not all cluster together, even with the genes that differentiate between them. The Basal subtype seems to be the most different from the rest, and it is also the most aggressive of all subtypes, with the worst clinical outcome.
4.1 Differential Expression by Chromosome
Lastly, we will look at which chromosomes contain the most differentially expressed genes by subtype, focusing on the 1K protein-coding genes with smallest adjusted p-values.
# Function to extract expression of protein coding genes with lowest p vals
<- function(de_results, n=1000) {
diff_exp_genes
# Filtering protein coding
<- de_results[de_results$gene_type == 'protein_coding', ]
protein_coding
# Selecting top n
<- protein_coding$gene_id[rank(protein_coding$adj.P.Val) <= n]
smallest_p_vals
# Getting these genes for heatmap
<- vGene$E[match(smallest_p_vals, rownames(vGene)), ]
exprs_heatmap
return(exprs_heatmap)
}
To visualize how many differentially expressed genes there are in each chromosome, we can use the chromosome data from seqnames
in rowRanges(rse_BRCA)
.
<- function(exprs) {
chr_df
# Get genes from row names
<- rownames(exprs)
genes
# Access chr name from rowRanges
<- seqnames(rowRanges(rse_BRCA)[genes])
chrs
# Drop unused levels, count occurrences
<- table(droplevels(chrs))
df <- as.data.frame(df)
df colnames(df) <- c("Chromosome", "Frequency")
return(df)
}
4.2 Basal tumors
<- diff_exp_genes(de_results_basal)
diff_basal
pheatmap(
diff_basal, cluster_cols = TRUE,
cluster_rows = TRUE,
show_rownames = FALSE,
show_colnames = FALSE,
annotation_col = df_col
)
<- chr_df(diff_basal)
df
ggplot(df, aes(x=Chromosome, y=Frequency)) + geom_bar(stat="identity", fill="palegreen4")
4.3 Her2 tumors
<- diff_exp_genes(de_results_her2)
diff_her2
pheatmap(
diff_her2, cluster_cols = TRUE,
cluster_rows = TRUE,
show_rownames = FALSE,
show_colnames = FALSE,
annotation_col = df_col
)
<- chr_df(diff_her2)
df
ggplot(df, aes(x=Chromosome, y=Frequency)) + geom_bar(stat="identity", fill="palegreen4")
4.4 LumA tumors
<- diff_exp_genes(de_results_luma)
diff_luma
pheatmap(
diff_luma, cluster_cols = TRUE,
cluster_rows = TRUE,
show_rownames = FALSE,
show_colnames = FALSE,
annotation_col = df_col
)
<- chr_df(diff_luma)
df
ggplot(df, aes(x=Chromosome, y=Frequency)) + geom_bar(stat="identity", fill="palegreen4")
4.5 LumB tumors
<- diff_exp_genes(de_results_lumb)
diff_lumb
pheatmap(
diff_lumb, cluster_cols = TRUE,
cluster_rows = TRUE,
show_rownames = FALSE,
show_colnames = FALSE,
annotation_col = df_col
)
<- chr_df(diff_lumb)
df
ggplot(df, aes(x=Chromosome, y=Frequency)) + geom_bar(stat="identity", fill="palegreen4")