Identifying the best-fitting model category

Umut Caglar, Claus O. Wilke

2021-05-07

In high-throughput scenarios, we often don’t know a priori whether a given dataset represents a sigmoidal curve, a double-sigmoidal curve, or neither. In these cases, we need to fit multiple different models to the data and then determine which model fits the best. This process is normally done automatically by the function sicegar::fitAndCategorize(). In this vignette, we describe how this process works and how it can be done manually.

We will demonstrate this process for an artificial dataset representing a double-sigmoidal function. We first generate the data:

time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)

We need to fit the sigmoidal and double-sigmoidal models to this dataset, using multipleFitFunction(). This requires first normalizing the data:

normalizedInput <- normalizeData(dataInput = dataInput, 
                                                dataInputName = "doubleSigmoidalSample")

# Fit sigmoidal model
sigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                          model = "sigmoidal",
                                          n_runs_min = 20,
                                          n_runs_max = 500,
                                          showDetails = FALSE)

# Fit double-sigmoidal model
doubleSigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                                model = "doublesigmoidal",
                                                n_runs_min = 20,
                                                n_runs_max = 500,
                                                showDetails = FALSE)

We also need to perform the additional parameter calculations, as these are required by the categorize() function we use below.

# Calculate additional parameters
sigmoidalModel <- parameterCalculation(sigmoidalModel)

# Calculate additional parameters
doubleSigmoidalModel <- parameterCalculation(doubleSigmoidalModel)

This is what the two fits look like:

Clearly the sigmoidal fit is not appropriate but the double-sigmoidal one is. Next we demonstrate how to arrive at this conclusion computationally, using the function categorize(). It takes as input the two fitted models as well as a number of parameters that are used in the decision process (explained below under “The decision process”).

# now we can categorize the fits
decisionProcess <- categorize(threshold_minimum_for_intensity_maximum = 0.3,
                                      threshold_intensity_range = 0.1,
                                      threshold_t0_max_int = 0.05,
                                      parameterVectorSigmoidal = sigmoidalModel,
                                      parameterVectorDoubleSigmoidal = doubleSigmoidalModel)

The object returned by categorize() contains extensive information about the decision process, but the key component is the decision variable. Here, it states that the data fits the double-sigmoidal model:

print(decisionProcess$decision)
## [1] "double_sigmoidal"

(The possible values here are “no_signal”, “sigmoidal”, “double_sigmoidal”, and “ambiguous”.)

The decision process

The decision process consists of two parts. First, the categorize() function checks whether all provided input data are valid. The steps of this verification are as follows:

After these steps, the primary decision process begins. It takes a list of four possible outcomes (“no_signal”, “sigmoidal”, “double_sigmoidal”, “ambiguous”) and systematically removes options until only one remains.

First, the algorithm checks if the provided data includes a signal or not.

Next the algorithm checks if the sigmoidal and double sigmoidal models make sense.

In step eight, the algorithm checks whether the data should be labelled as "ambiguous" or not.

In the last step; the algorithm checks whether the data should be labeled as "sigmoidal" or "double_sigmoidal".

The only option that is left at this point will be the label of the data and thus the final decision.