Each function calls will make ER balance subset and subsequent PAM50 call.
TEST.ihc and TEST.matrix are provided → substitute these to make calls for your data.
To make conventional intrinsic call that is only using IHC, use the function “makeCalls.ihc”.
source("functions_PCA-PAM50.R")
Form Secondary ER-balanced set (refer to paper) leveraging PCA and subsequent intermediate intrinsic subtype
This is first step of the PCA-PAM50 approach.
1) Load the IHC subtype:
We have provided the Test.ihc and matrix file for testing purposes.
load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc" "Test.matrix"
2) Add the ER status for the purpose of sorting:
Benefit of sorting → function assumptions are easily met.
Assumption is majority of ER+ was on PC1 neg and ER- on the PC1 pos.
Test.ihc$ER_status = rep("NA",length(Test.ihc$PatientID)) Test.ihc$ER_status[which(Test.ihc$IHC %in% c("LA","LB1","LB2"))] = "pos" Test.ihc$ER_status[which(Test.ihc$IHC %in% c("TN","Her2+"))] = "neg" Test.ihc = Test.ihc[order(Test.ihc$ER_status, decreasing = T),] Test.matrix = Test.matrix[,Test.ihc$PatientID] # Order test matrix using Test.ihc
Note
Clinical subtype data.frame required for the function should have a column “PatientID” with which mat colnames are also named.
IHC subtype column should be named “IHC”
IHC subtype are labelled like this “LA”, “LB1”, “LB2”, “TN”, “Her2+”
df.cln = data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = F) # Even though Test.ihc has those two columns.
3) Setting the global variable for PAM50 function to use:
paramDir = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here. # Obtained from https://genome-publications.bioinf.unc.edu/PAM50/. # *For more information and detailed instructions, please read the Bioclassifier section. inputDir = "./Call_wth_PC1ihcMd/" ;dir.create(inputDir) # Warning if directory exists. # *It is safe to ignore the warning. predFiles = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions. # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep="\t", col.name=NA) short = "TEST.PC1ihc.Mdns" # Short name that will be used for output files -- ANY name is fine. calibrationParameters = "Mdns.PC1ihc" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p. hasClinical = FALSE # We do not use clinical info anyways. collapseMethod = "mean" # Can be mean or iqr (probe with max iqr is selected). # *Pertains only to micorarray data. Out.mdns.fl = "mdns_TEST_PC1IHC.txt" # Where new medians will be written -- Used for mdns.outFile calibrationFile = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this*
4) Calling function to make secondary ER-balanced set and subsequent intermediate intrinsic subtype:
LIST is returned and the elements are:
# Int.sbs = Intrinsic subtype # score.fl = PAM50 o/p score # mdns.fl = medians file # SBS.colr = PAM50 o/p subtypeColors # outList = PAM50 o/p outList # PC1cutoff = data frame of Percentage of missclassified cases # DF.PC1 = Provided IHC data frame with PC1 and PC2 sbs.PC1ihcMd = makeCalls.PC1ihc(df.cln, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.PC1ihcMd, file = "./Rdat/Intrinsic-TESTdata_with_PC1ihcMedians.Rdat") addmargins(table(sbs.PC1ihcMd$Int.sbs$IHC, sbs.PC1ihcMd$Int.sbs$Int.SBS.Mdns.PC1ihc)) # Just to check consistency.
Compute Tertiary ER-balanced Subset (refer to paper) using intermediate intrinisc made above
The final step of PCA-PAM50 approach → provides REFINED intrinisc subtype.
rm(list = ls()) # Remove everything from before. source("functions_PCA-PAM50.R") # If you run this block separately.
Load the following:
# Load the intermediate PAM50 subtype. load("./Rdat/Intrinsic-TESTdata_with_PC1ihcMedians.Rdat") # Loads sbs.PC1ihcMd # Load matrix pam50 norm.
Provided Test.ihc and matrix file for testing purposes:
load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc" "Test.matrix"
1) Call function – needs a data frame for PAM50 with –PatientID– and —PAM50—
df.pc1pam = data.frame(PatientID = sbs.PC1ihcMd$Int.sbs$PatientID, PAM50 = sbs.PC1ihcMd$Int.sbs$Int.SBS.Mdns.PC1ihc, IHC = sbs.PC1ihcMd$Int.sbs$IHC, stringsAsFactors = F)
Note
The IHC column is optional.
2) Setting the global variable for PAM50 function to use
paramDir = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here. # Obtained from https://genome-publications.bioinf.unc.edu/PAM50/. # *For more information and detailed instructions, please read the Bioclassifier section. inputDir = "./Call_wth_v1pamMd/" ;dir.create(inputDir) # Warning if directory exists. # *It is safe to ignore the warning. predFiles = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions. # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep = "\t", col.name = NA) short = "METd.v1pam.Mdns" # Short name that will be used for output files -- ANY name is fine. calibrationParameters = "Mdns.v1pam" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p. hasClinical = FALSE # We do not use clinical info anyways. collapseMethod = "mean" # Can be mean or iqr (probe with max iqr is selected). # *Pertains only to micorarray data. Out.mdns.fl = "mdns_TEST_v1pam.txt" # Where new medians will be written -- Used for mdns.outFile calibrationFile = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this*
3) Calling function to make tertiary ER-balanced set and subsequent refined intrinsic subtype
LIST is returned and the elements are:
# Int.sbs = Intrinsic subtype # score.fl = PAM50 o/p score # mdns.fl = medians file # SBS.colr = PAM50 o/p subtypeColors # outList = PAM50 o/p outList sbs.v1pamMd = makeCalls.v1PAM(df.pam = df.pc1pam, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.v1pamMd, file = "./Rdat/Intrinsic-TESTdata_with_v1pamMedians.Rdat") addmargins(table(sbs.v1pamMd$Int.sbs$IHC, sbs.v1pamMd$Int.sbs$Int.SBS.Mdns.v1pam)) # To check consistency if optional IHC provided.
To form Primary ER-balanced set (refer to paper) and subsequent conventional intrinsic subtype
rm(list = ls()) # Remove everything from before. source("functions_PCA-PAM50.R") # If you run this block separately.
Load the following:
# Load the IHC subtype.
Provided Test.ihc and matrix file for testing purposes:
load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc" "Test.matrix"
Note
Clinical subtype data.frame required for the function should have a column “PatientID” with which mat colnames are also named.
IHC subtype column should be named “IHC”
IHC subtype are labelled like this “LA”, “LB1”, “LB2”, “TN”, “Her2+”
df.cln = data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = F) # Even though Test.ihc has those two columns.
1) Setting the global variable for PAM50 function to use:
paramDir = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here. # Obtained from https://genome-publications.bioinf.unc.edu/PAM50/. # *For more information and detailed instructions, please read the Bioclassifier section. inputDir = "./Call_wth_IHCMd/" ;dir.create(inputDir) # Warning if directory exists. # *It is safe to ignore the warning. # The input data matrix as a tab delimited text file. predFiles = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions. # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep = "\t", col.name = NA) short = "IHC.Mdns" # Short name that will be used for output files -- ANY name is fine. calibrationParameters = "Mdns.ihc" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p. hasClinical = FALSE # We do not use clinical info anyways. collapseMethod = "mean" # Can be mean or iqr (probe with max iqr is selected). # *Pertains only to micorarray data. Out.mdns.fl = "mdns_TEST_IHC.txt" # Where new medians will be written -- Used for mdns.outFile calibrationFile = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this*
2) Calling function to make primary ER-balanced set and subsequent conventional intrinsic subtype
LIST is returned and the elements are:
# Int.sbs = Intrinsic subtype # score.fl = PAM50 o/p score # mdns.fl = medians file # SBS.colr = PAM50 o/p subtypeColors # outList = PAM50 o/p outList sbs.IHCmd = makeCalls.ihc(df.cln, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.IHCmd, file = "./Rdat/Intrinsic-TESTdata_with_ihcMedians.Rdat") addmargins(table(sbs.IHCmd$Int.sbs$Int.SBS.Mdns.ihc, sbs.IHCmd$Int.sbs$IHC)) # To check consistency. # 8% consistency improvement (b/n conventional and refined intrinsic) with this TEST data.
OPTIONAL code to make PCA PLOT
library(DESeq) library(gplots) pData = data.frame(condition = Test.ihc$IHC) rownames(pData) = Test.ihc$PatientID phenoData = new("AnnotatedDataFrame", data = pData)#, varMetadata = Metadata XSet = ExpressionSet(assayData = Test.matrix, phenoData = phenoData) my.plotPCA(XSet, intgroup = pData$condition, ablne = 2.4, colours = c("hotpink","darkblue","lightblue","lightblue3","red"))