FDA Validation of a PCR test: Pre-processing data (Part 2)

All of the data analyzed used the same format, which makes it easy to re-use code.

The data for each PCR experiment was in an Excel file or a comma-delimited (.csv) file with the following format.

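| Well | Fluor | Target | Content | Sample | Biological Set Name | Cq | Cq Mean | Cq Std. Dev |
|---|---|---|---|---|---|---|---|---|
| A01 | HEX | Internal Control | Unkn-1 | Fusion | | 30.82512469 | 30.8171773 | 0.193826672 |
| A01 | Texas Red | Gene A | Unkn-1 | Fusion | | NaN | 0 | 0 |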

To be able to complete the FDA sections above, data from the different runs need to be combined. So I manually added a column with a unique run name to every run, which lets me combine the runs and still distinguish between wells.

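| Run | Well | Fluor | Target | Content | Sample | Biological Set Name | Cq | Cq Mean | Cq Std. Dev |
|---|---|---|---|---|---|---|---|---|---|
| VP-xxxx-yyy_001 | A01 | HEX | Internal Control | Unkn-1 | Fusion | | 30.82512469 | 30.8171773 | 0.193826672 |
| VP-xxxx-yyy_001 | A01 | Texas Red | Gene A | Unkn-1 | Fusion | | NaN | 0 | 0 |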
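The run names above were added by hand. If each run is exported to its own CSV file, the same step could probably be scripted. The following is only a minimal sketch, not the code used for this validation: read_run and combine_runs are hypothetical helpers, and the idea that each file is named after its run (for example VP-xxxx-yyy_001.csv) is an assumption.

```r
# Hypothetical helper: read one run's CSV and tag every row with a run name.
# ASSUMPTION: the file name (minus ".csv") is the unique run name.
read_run <- function(path) {
  # check.names = FALSE keeps headers such as "Cq Mean" exactly as exported
  run_df <- read.csv(path, header = TRUE, check.names = FALSE,
                     stringsAsFactors = FALSE)
  run_name <- tools::file_path_sans_ext(basename(path))
  # put Run first, matching the layout of the combined table
  data.frame(Run = rep(run_name, nrow(run_df)), run_df,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical helper: stack every run CSV found in a folder into one data frame.
# All files are assumed to share the same columns.
combine_runs <- function(run_folder) {
  paths <- list.files(run_folder, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(paths, read_run))
}
```

Calling combine_runs on a folder of per-run exports would return one data frame with a Run column in front, which could then be saved with write.csv and used as the combined file.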

Sometimes we have to exclude a few wells due to operator or technical errors. In a separate file called “wells_to_exclude.csv”, I listed the Run and Well ID of each well to exclude, along with the reason.

| VP-xxxx-yyy_001 | A08 | Operator error |
| VP-xxxx-yyy_002 | A12 | Forgot to add template |

For all of my analyses, the first 10 lines of my R code are the same: the data from the PCR runs are read into a data frame, and then the bad wells are excluded.

Here’s the code:

# libraries that I use a lot
library(tolerance)
library(plyr)
library(EnvStats)

# folder to print all my output
folder <- "C:\\Users\\pauline\\Documents\\Fusion\\AnalyticalSpecificity\\"

# file containing all of the runs combined for this particular experiment.
# See format in Table 2 above.
filename <- "C:\\Users\\pauline\\Documents\\Fusion\\EDTA_EtOH_combined.csv"

# read in data
whole_df <- read.csv(filename, header = TRUE)

# make a unique id combining the Run ID and Well ID
whole_df[, "UniqueID"] <- paste(whole_df[, "Run"], whole_df[, "Well"])

# remove the problematic wells listed in wells_to_exclude.csv
df_problem <- read.csv("C:\\Users\\pauline\\Documents\\Fusion\\wells_to_exclude.csv", header = TRUE)
problem_samples <- paste(df_problem[, "Run"], df_problem[, "Well"])
whole_df <- whole_df[!(whole_df$UniqueID %in% problem_samples), ]

whole_df is a data frame containing all the data, excluding the problematic wells.
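One thing the exclusion step does not report is an entry in the exclusion list that matches no well at all, for example because of a typo in the run name. The sketch below shows one possible check on made-up toy data. The toy values, the unmatched variable, and the messages are illustrative assumptions, not part of my scripts.

```r
# Toy stand-ins for whole_df and the exclusion list (all values are made up)
whole_df <- data.frame(
  Run  = c("VP-xxxx-yyy_001", "VP-xxxx-yyy_001", "VP-xxxx-yyy_002"),
  Well = c("A01", "A08", "A12"),
  Cq   = c(30.8, 31.2, NaN),
  stringsAsFactors = FALSE
)
df_problem <- data.frame(
  Run  = c("VP-xxxx-yyy_001", "VP-xxxx-yyy_003"),
  Well = c("A08", "B02"),
  stringsAsFactors = FALSE
)

# same ID construction as in the main script
whole_df$UniqueID <- paste(whole_df$Run, whole_df$Well)
problem_samples   <- paste(df_problem$Run, df_problem$Well)

# exclusion entries that match no well in the data (possible typos)
unmatched <- setdiff(problem_samples, whole_df$UniqueID)
if (length(unmatched) > 0) {
  message("No matching well for: ", paste(unmatched, collapse = ", "))
}

# exclude the listed wells and report how many rows were dropped
n_before <- nrow(whole_df)
whole_df <- whole_df[!(whole_df$UniqueID %in% problem_samples), ]
message(n_before - nrow(whole_df), " row(s) excluded")
```

In this toy case, the entry for run VP-xxxx-yyy_003 is flagged because that run is not in the data, and one row (well A08 of run VP-xxxx-yyy_001) is removed.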

To call a fusion, we use ΔCt. For each well, we calculate ΔCt = FAM Ct – Texas Red Ct. (The data files report Ct in the Cq column.)

In all my scripts, I’ll create a new data frame called “channels”, where ΔCt is calculated by merging the FAM and Texas Red values into the same row (based on run and well), and then doing the subtraction.

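# get Texas Red values & other important stuff
# (Spikein_Level and Sample_no_number are experiment-specific columns
# that are not shown in the tables above)
df_TexasRed <- whole_df[whole_df$Fluor == "Texas Red", c("UniqueID", "Fluor", "Target", "Sample", "Spikein_Level", "Sample_no_number", "Content", "Cq")]

# get FAM values only
df_FAM <- whole_df[whole_df$Fluor == "FAM", c("UniqueID", "Fluor", "Cq")]

# merge into same row, based on UniqueID (run & well);
# shared columns get the suffix .x for Texas Red and .y for FAM
channels <- merge(df_TexasRed, df_FAM, by = "UniqueID")

# calculate deltaCq = FAM Cq - Texas Red Cq
channels[, "deltaCq"] <- channels[, "Cq.y"] - channels[, "Cq.x"]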

If I'm lazy and want to reuse the functions written for the FAM and Texas Red channels, which expect a "Ct" column, I might also create channels["Ct"] and assign the ΔCt values to it.
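To make the merge and subtraction concrete, here is a small worked example on made-up numbers. It is a sketch rather than my production code: the toy data frame, the explicit suffixes argument (used instead of the default .x and .y), and the no_call variable are assumptions added for illustration.

```r
# Toy data: two wells, each read in the FAM and Texas Red channels (values made up)
toy <- data.frame(
  UniqueID = c("run1 A01", "run1 A01", "run1 A02", "run1 A02"),
  Fluor    = c("FAM", "Texas Red", "FAM", "Texas Red"),
  Cq       = c(24.0, 30.5, 27.25, NaN),
  stringsAsFactors = FALSE
)

toy_TexasRed <- toy[toy$Fluor == "Texas Red", c("UniqueID", "Cq")]
toy_FAM      <- toy[toy$Fluor == "FAM", c("UniqueID", "Cq")]

# explicit suffixes instead of the default Cq.x / Cq.y
toy_channels <- merge(toy_TexasRed, toy_FAM, by = "UniqueID",
                      suffixes = c(".TexasRed", ".FAM"))

# deltaCq = FAM Cq - Texas Red Cq
toy_channels$deltaCq <- toy_channels$Cq.FAM - toy_channels$Cq.TexasRed

# optional alias so functions that expect a "Ct" column can be reused
toy_channels$Ct <- toy_channels$deltaCq

# wells where a channel has no Cq (NaN) end up with a NaN deltaCq
no_call <- toy_channels$UniqueID[is.nan(toy_channels$deltaCq)]
```

For well A01 this gives 24.0 - 30.5 = -6.5. Well A02 has no Texas Red Cq, so its deltaCq is NaN and it is listed in no_call. Note that merge keeps only wells present in both channels by default, so a well missing one channel entirely would be dropped rather than flagged.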
