PCA-PAM50 Instructions

PCA-PAM50 Instructions

Each function calls will make ER balance subset and subsequent PAM50 call.
TEST.ihc and TEST.matrix are provided → substitute these to make calls for your data.
To make conventional intrinsic call that is only using IHC, use the function “makeCalls.ihc”.

source("functions_PCA-PAM50.R")

Form Secondary ER-balanced set (refer to paper) leveraging PCA and subsequent intermediate intrinsic subtype
This is first step of the PCA-PAM50 approach.

1) Load the IHC subtype:


We have provided the Test.ihc and matrix file for testing purposes.

load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc"    "Test.matrix"

2) Add the ER status for the purpose of sorting:


Benefit of sorting → function assumptions are easily met.
Assumption is majority of ER+ was on PC1 neg and ER- on the PC1 pos.

Test.ihc$ER_status = rep("NA",length(Test.ihc$PatientID))
Test.ihc$ER_status[which(Test.ihc$IHC %in% c("LA","LB1","LB2"))] = "pos"
Test.ihc$ER_status[which(Test.ihc$IHC %in% c("TN","Her2+"))] = "neg"

Test.ihc      = Test.ihc[order(Test.ihc$ER_status, decreasing = T),]
Test.matrix   = Test.matrix[,Test.ihc$PatientID] # Order test matrix using Test.ihc

Note
Clinical subtype data.frame required for the function should have a column “PatientID” with which mat colnames are also named.
IHC subtype column should be named “IHC”
IHC subtype are labelled like this “LA”, “LB1”, “LB2”, “TN”, “Her2+”

df.cln = data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = F) # Even though Test.ihc has those two columns.

3) Setting the global variable for PAM50 function to use:


paramDir              = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here.
                        # Obtained from https://genome-publications.bioinf.unc.edu/PAM50/.
                        # *For more information and detailed instructions, please read the Bioclassifier section.
inputDir              = "./Call_wth_PC1ihcMd/" ;dir.create(inputDir) # Warning if directory exists.
                        # *It is safe to ignore the warning.
predFiles             = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions.
                        # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep="\t", col.name=NA)
short                 = "TEST.PC1ihc.Mdns" # Short name that will be used for output files -- ANY name is fine.
calibrationParameters = "Mdns.PC1ihc" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p.
hasClinical 	      = FALSE # We do not use clinical info anyways. 
collapseMethod	      = "mean" # Can be mean or iqr (probe with max iqr is selected).
                        # *Pertains only to micorarray data.
Out.mdns.fl           = "mdns_TEST_PC1IHC.txt" # Where new medians will be written -- Used for mdns.outFile
calibrationFile       = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this* 

4) Calling function to make secondary ER-balanced set and subsequent intermediate intrinsic subtype:


LIST is returned and the elements are:

# Int.sbs = Intrinsic subtype
# score.fl = PAM50 o/p score
# mdns.fl = medians file
# SBS.colr = PAM50 o/p subtypeColors
# outList = PAM50 o/p outList
# PC1cutoff = data frame of Percentage of missclassified cases
# DF.PC1 = Provided IHC data frame with PC1 and PC2

sbs.PC1ihcMd = makeCalls.PC1ihc(df.cln, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.PC1ihcMd, file = "./Rdat/Intrinsic-TESTdata_with_PC1ihcMedians.Rdat")
addmargins(table(sbs.PC1ihcMd$Int.sbs$IHC, sbs.PC1ihcMd$Int.sbs$Int.SBS.Mdns.PC1ihc)) # Just to check consistency.

Compute Tertiary ER-balanced Subset (refer to paper) using intermediate intrinisc made above
The final step of PCA-PAM50 approach → provides REFINED intrinisc subtype.

rm(list = ls()) # Remove everything from before.
source("functions_PCA-PAM50.R") # If you run this block separately.

Load the following:

# Load the intermediate PAM50 subtype.
load("./Rdat/Intrinsic-TESTdata_with_PC1ihcMedians.Rdat") # Loads sbs.PC1ihcMd
# Load matrix pam50 norm.

Provided Test.ihc and matrix file for testing purposes:

load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc"    "Test.matrix"

1) Call function – needs a data frame for PAM50 with –PatientID– and —PAM50—


df.pc1pam = data.frame(PatientID = sbs.PC1ihcMd$Int.sbs$PatientID, PAM50 = sbs.PC1ihcMd$Int.sbs$Int.SBS.Mdns.PC1ihc, IHC = sbs.PC1ihcMd$Int.sbs$IHC, stringsAsFactors = F)

Note
The IHC column is optional.

2) Setting the global variable for PAM50 function to use


paramDir              = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here.
                        # Obtained from  https://genome-publications.bioinf.unc.edu/PAM50/.
                        # *For more information and detailed instructions, please read the Bioclassifier section.
inputDir              = "./Call_wth_v1pamMd/" ;dir.create(inputDir) # Warning if directory exists.
                        # *It is safe to ignore the warning.
predFiles             = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions.
                        # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep = "\t", col.name = NA)
short                 = "METd.v1pam.Mdns" # Short name that will be used for output files -- ANY name is fine.
calibrationParameters = "Mdns.v1pam" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p.
hasClinical 	      = FALSE # We do not use clinical info anyways. 
collapseMethod	      = "mean" # Can be mean or iqr (probe with max iqr is selected).
                        # *Pertains only to micorarray data.
Out.mdns.fl           = "mdns_TEST_v1pam.txt" # Where new medians will be written -- Used for mdns.outFile
calibrationFile       = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this* 

3) Calling function to make tertiary ER-balanced set and subsequent refined intrinsic subtype


LIST is returned and the elements are:

# Int.sbs = Intrinsic subtype
# score.fl = PAM50 o/p score
# mdns.fl = medians file
# SBS.colr = PAM50 o/p subtypeColors
# outList = PAM50 o/p outList

sbs.v1pamMd = makeCalls.v1PAM(df.pam = df.pc1pam, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.v1pamMd, file = "./Rdat/Intrinsic-TESTdata_with_v1pamMedians.Rdat")
addmargins(table(sbs.v1pamMd$Int.sbs$IHC, sbs.v1pamMd$Int.sbs$Int.SBS.Mdns.v1pam)) # To check consistency if optional IHC provided.

To form Primary ER-balanced set (refer to paper) and subsequent conventional intrinsic subtype

rm(list = ls()) # Remove everything from before.
source("functions_PCA-PAM50.R") # If you run this block separately.

Load the following:

# Load the IHC subtype.

Provided Test.ihc and matrix file for testing purposes:

load("./Rdat/Sample_IHC_PAM-Mat.Rdat") # [1] "Test.ihc"    "Test.matrix"

Note
Clinical subtype data.frame required for the function should have a column “PatientID” with which mat colnames are also named.
IHC subtype column should be named “IHC”
IHC subtype are labelled like this “LA”, “LB1”, “LB2”, “TN”, “Her2+”

df.cln = data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = F) # Even though Test.ihc has those two columns. 

1) Setting the global variable for PAM50 function to use:


paramDir              = "./PAM50/bioclassifier_R/" # PAM50dir from below function param -- PAM50 is provided here.
                        # Obtained from  https://genome-publications.bioinf.unc.edu/PAM50/.
                        # *For more information and detailed instructions, please read the Bioclassifier section.
inputDir              = "./Call_wth_IHCMd/" ;dir.create(inputDir) # Warning if directory exists.
                        # *It is safe to ignore the warning.
                        # The input data matrix as a tab delimited text file.
predFiles             = "./Rdat/TEST_pam_mat.txt" # Provided inputFile for predictions.
                        # write.table(Test.matrix, "./Rdat/TEST_pam_mat.txt", sep = "\t", col.name = NA)
short                 = "IHC.Mdns" # Short name that will be used for output files -- ANY name is fine.
calibrationParameters = "Mdns.ihc" # Used for NM.mdns in function below -- Suffix for naming int-sbs column in o/p.
hasClinical 	      = FALSE # We do not use clinical info anyways. 
collapseMethod	      = "mean" # Can be mean or iqr (probe with max iqr is selected).
                        # *Pertains only to micorarray data.
Out.mdns.fl           = "mdns_TEST_IHC.txt" # Where new medians will be written -- Used for mdns.outFile
calibrationFile       = paste(paramDir, Out.mdns.fl, sep = "/") # mdns.outFile below -- *DO NOT change this* 

2) Calling function to make primary ER-balanced set and subsequent conventional intrinsic subtype


LIST is returned and the elements are:

# Int.sbs = Intrinsic subtype
# score.fl = PAM50 o/p score
# mdns.fl = medians file
# SBS.colr = PAM50 o/p subtypeColors
# outList = PAM50 o/p outList

sbs.IHCmd = makeCalls.ihc(df.cln, seed = 118, mat = Test.matrix, NM.mdns = calibrationParameters, PAM50dir = paramDir, mdns.outFile = Out.mdns.fl)
save(sbs.IHCmd, file = "./Rdat/Intrinsic-TESTdata_with_ihcMedians.Rdat")
addmargins(table(sbs.IHCmd$Int.sbs$Int.SBS.Mdns.ihc, sbs.IHCmd$Int.sbs$IHC)) # To check consistency.

# 8% consistency improvement (b/n conventional and refined intrinsic) with this TEST data.

OPTIONAL code to make PCA PLOT

library(DESeq)
library(gplots)

pData           = data.frame(condition = Test.ihc$IHC)
rownames(pData) = Test.ihc$PatientID
phenoData       = new("AnnotatedDataFrame", data = pData)#, varMetadata = Metadata
XSet            = ExpressionSet(assayData = Test.matrix, phenoData = phenoData)

my.plotPCA(XSet, intgroup = pData$condition, ablne = 2.4, colours = c("hotpink","darkblue","lightblue","lightblue3","red"))