FDA Validation of a PCR Test: Precision (Repeatability & Reproducibility)

Precision measures the consistency of results. Precision is not accuracy. While accuracy is getting the answer right, precision is about getting the same answer over and over again. You don’t want to get one answer today and a different answer tomorrow because it’s raining or the lab technician didn’t have their cup of coffee!

A test needs to be both accurate and precise. Accuracy was addressed in another post; this post assesses precision.

To assess precision, the lab runs the same sample under different conditions: different operators, machines, days, batches, etc. We want to see no significant differences between conditions; if there are any, we need to address them. For example, a difference between machines suggests the machines need to be re-calibrated.

ΔCt from two batches. Batch 1 and 2 look equivalent, as seen by the red boxplot (Batch 1) pairing nicely with a blue boxplot (Batch 2) for a given concentration.

The x-axis labels describe the different concentrations below, at, or near the limit of detection (LoD). FFPE indicates formalin-fixed, paraffin-embedded samples.

ΔCt between the two instruments. Instrument 2 gives slightly higher values than Instrument 1. Is this significant? The numbers will tell…

ΔCt between the two operators. The boxplots from the 2 different operators are similar  — nice job, guys!

ΔCt measures across the different runs. It’s a little noisy — ANOVA will tell us whether this is significant.

ΔCt measures across the different wells. For the most part, the boxplots at the different concentrations are grouped together.

A picture is worth a thousand words, but are these differences significant? To find out, we’ll use ANOVA.

ANOVA stands for “analysis of variance” and is used to analyze the differences among groups. We expect the sample concentrations to contribute to differences in ΔCt. We hope that other factors like instrument and batch do not. If they do, something is wrong.
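As a rough illustration of that expectation, here is a minimal sketch on simulated data (not the validation data), where ΔCt depends on sample concentration but not on operator. The column names, group means, and noise level are all made up for the example, and base R's aov() with its default sequential sums of squares stands in for the type III analysis used later in this post.

```r
# Simulated data only: the means, noise level, and names are assumptions.
set.seed(1)
sim <- expand.grid(Sample = c("Low", "Mid", "High"),
                   Operator = c("Op1", "Op2"),
                   Replicate = 1:10)
sample_means <- c(Low = 2, Mid = 5, High = 8)   # delta-Ct shifts with concentration
sim$deltaCq <- sample_means[as.character(sim$Sample)] + rnorm(nrow(sim), sd = 0.5)

# Two-way ANOVA with interaction: Sample, Operator, and Sample:Operator
fit <- aov(deltaCq ~ factor(Sample) * factor(Operator), data = sim)
anova_table <- summary(fit)[[1]]
print(anova_table)

# p-values in row order: Sample, Operator, interaction, residuals (NA)
p_values <- anova_table[["Pr(>F)"]]
```

With data like these, the Sample row should show a very small p-value, while the Operator and interaction rows should usually not be significant, since no operator effect was simulated.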

Here are some example ANOVA results.

                              Variance   Precision   Lower 95% CI   Upper 95% CI   p-value
Sample                        34.29      5.86        11             476.72         3 * 10^-141
Sample-Operator Interaction   0.

It’s expected that sample concentrations contribute a lot to the variance (p = 3 * 10^-141). The variance that the operators contribute is small (0.02) and not significant (p = 0.50). There is no interaction between the sample concentration and the operator.

                         Variance   Precision   Lower 95% CI   Upper 95% CI   p-value
Sample                   3.89       1.97        1.25           54.07          1.2 * 10^-43
Sample-Run Interaction   0.

The variance that the run contributes is small (0.09) but significant (p = 0.001).

Here is the code that generates the statistics:

First, label the data by the different effects.
In the data frame, we have a column labelled “Run” that takes the values “Run1” and “Run2”. Another column, labelled “Well”, contains the well values.

effects <- c("Run", "Well", "Batch", "Operator")
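To make the expected layout concrete, here is a hypothetical sketch of what such a data frame might look like. The rows and values are invented for illustration, and the column set is only inferred from the effect names and the model formula in this post. Four rows are far too few to run the ANOVA on; the point is only the shape of the data.

```r
# Hypothetical layout: every value below is made up for illustration.
cleaned_channels <- data.frame(
  Sample   = c("LoD", "LoD", "2xLoD", "2xLoD"),
  Run      = c("Run1", "Run2", "Run1", "Run2"),
  Well     = c("A1", "A2", "B1", "B2"),
  Batch    = c("Batch1", "Batch1", "Batch2", "Batch2"),
  Operator = c("Op1", "Op2", "Op1", "Op2"),
  deltaCq  = c(5.1, 5.3, 4.2, 4.0)
)
effects <- c("Run", "Well", "Batch", "Operator")

# Each effect name must match a column, because columns are looked up by name
effects_present <- all(effects %in% names(cleaned_channels))
print(effects_present)
```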

Loop over the effects and run ANOVA for each one.

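library(car)  # Anova() comes from the car package

for (effect in effects) {
  # Two-way ANOVA (type III): sample, the current effect, and their interaction
  anova_results <- Anova(aov(deltaCq ~ factor(Sample) * factor(eval(as.name(effect))),
    data = cleaned_channels), type = "III", singular.ok = TRUE)
  anova_stats <- anova_stat_function(anova_results)
}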


The anova_stat_function calculates the metrics seen in the tables:

anova_stat_function <- function(anova_results) {
   # In the Anova() table, column 1 is the sum of squares and column 2 the degrees of freedom
   sum_of_squares <- anova_results[[1]]
   degrees_of_freedom <- anova_results[[2]]

   # One row per term, named as in the ANOVA table
   anova_df <- data.frame(row.names = row.names(anova_results))
   anova_df["Variance"] <- sum_of_squares / degrees_of_freedom
   anova_df["Degrees of Freedom"] <- degrees_of_freedom
   anova_df["Precision"] <- sqrt(anova_df[["Variance"]])
   # confidence interval is sum of squares divided by the chi-square quantiles
   chisq97.5 <- qchisq(0.975, degrees_of_freedom)
   chisq2.5 <- qchisq(0.025, degrees_of_freedom)
   anova_df["Lower 95% CI"] <- sum_of_squares / chisq97.5
   anova_df["Upper 95% CI"] <- sum_of_squares / chisq2.5

   return(anova_df)
}
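The confidence interval in the function is the usual chi-square interval for a variance: the sum of squares divided by the 97.5% and 2.5% chi-square quantiles. As a quick sanity check, the sketch below applies the same formula to the Sample row of the run table (variance 3.89). The degrees of freedom are not shown in that table; the value 3 is an assumption, chosen because it roughly reproduces the tabulated limits.

```r
# Assumption: df = 3 is inferred from the tabulated limits, not stated in the post.
variance <- 3.89                    # "Variance" for Sample in the run table
df <- 3
sum_of_squares <- variance * df     # the function computes variance as SS / df

lower <- sum_of_squares / qchisq(0.975, df)
upper <- sum_of_squares / qchisq(0.025, df)
precision <- sqrt(variance)

print(round(c(precision = precision, lower = lower, upper = upper), 2))
```

Under that assumption the results land close to the tabulated precision (1.97) and 95% limits (1.25 and 54.07). The interval is very wide because only a few degrees of freedom go into the estimate.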

The complete code can be found on GitHub under Rscripts_repeatability.txt and Rscripts_reproducibility.txt.