#' White and Pagano (WP)
#'
#' This function implements an R0 estimation method due to White and Pagano
#' (Statistics in Medicine, 2008). The method is based on maximum likelihood
#' estimation in a Poisson transmission model. See details for important
#' implementation notes.
#'
#' This method is based on a Poisson transmission model, and hence may be most
#' valid at the beginning of an epidemic. In their model, the serial
#' distribution is assumed to be discrete with a finite number of possible
#' values. In this implementation, if `mu` is not `NA`, the serial distribution
#' is taken to be a discretized version of a gamma distribution with shape
#' parameter `1` and scale parameter `mu` (and hence mean `mu`). When `mu` is
#' `NA`, the function implements a grid search algorithm to find the maximum
#' likelihood estimator over all possible gamma distributions with unknown shape
#' and scale, restricting these to a prespecified grid (see the parameters
#' `grid_length`, `max_shape` and `max_scale`). In both cases, the largest value
#' of the support is the smallest integer at which the cumulative distribution
#' function of the original (pre-discretized) gamma distribution is at least
#' 0.999.
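#'
#' The discretization can be sketched as follows. This is a minimal
#' illustration that mirrors the known-`mu` branch of the function body; the
#' value `mu = 2` and the variable names are chosen only for the example.
#'
#' ```r
#' mu <- 2
#' # Largest support value: the 0.999 quantile of the gamma, rounded up.
#' max_range <- ceiling(stats::qgamma(0.999, shape = 1, scale = mu))
#' supp <- seq_len(max_range)
#' # The mass on k is the gamma probability of the interval (k - 1, k],
#' # rescaled so that the truncated masses sum to one.
#' pmf <- diff(stats::pgamma(0:max_range, shape = 1, scale = mu))
#' pmf <- pmf / sum(pmf)
#' ```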
#'
#' When the serial distribution is known (i.e., `mu` is not `NA`), sensitivity
#' testing of `mu` is strongly recommended. If the serial distribution is
#' unknown (i.e., `mu` is `NA`), the likelihood function can be flat near the
#' maximum, so the estimate returned by the grid search may be numerically
#' unstable. When `mu` is `NA`, the implementation also takes considerably
#' longer to run. Users should be careful about units of time (e.g., are counts
#' observed daily or weekly?) when specifying `mu`.
#'
#' The model developed in White and Pagano (2008) is discrete, and hence the
#' serial distribution is finite discrete. In our implementation, the input
#' value `mu` is the mean of a continuous distribution. The algorithm
#' discretizes this distribution, and so the mean of the estimated serial
#' distribution returned (when `serial` is set to `TRUE`) will differ from `mu`
#' somewhat. This difference is expected and is caused by the discretization.
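#'
#' As a rough illustration of this effect (a sketch with an arbitrary value of
#' `mu` and illustrative variable names, not output from the package), the mean
#' of the discretized distribution can be compared with the input directly:
#'
#' ```r
#' mu <- 2
#' max_range <- ceiling(stats::qgamma(0.999, shape = 1, scale = mu))
#' pmf <- diff(stats::pgamma(0:max_range, shape = 1, scale = mu))
#' pmf <- pmf / sum(pmf)
#' # Each interval (k - 1, k] is assigned to its right endpoint k, so the
#' # discretized mean exceeds `mu` in this example.
#' discretized_mean <- sum(seq_len(max_range) * pmf)
#' discretized_mean > mu
#' ```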
#'
#' @param cases Vector of case counts. The vector must be of length at least two
#'   and only contain positive integers.
#' @param mu Mean of the serial distribution (defaults to `NA`). This must be a
#'   positive number or `NA`. If a number is specified, it must be expressed in
#'   the same time units as the case counts. For example, if case counts are
#'   weekly and the serial distribution has a mean of seven days, then `mu`
#'   should be set to `1`. If case counts are daily and the serial distribution
#'   has a mean of seven days, then `mu` should be set to `7`.
#' @param serial Whether to return the estimated serial distribution in addition
#'   to the estimate of R0 (defaults to `FALSE`). This must be either `TRUE` or
#'   `FALSE`.
#' @param grid_length The length of the grid in the grid search (defaults to
#'   100). This must be a positive integer. It will only be used if `mu` is set
#'   to `NA`. The grid search will go through all combinations of the shape and
#'   scale parameters for the gamma distribution, which are `grid_length` evenly
#'   spaced values from `0` (exclusive) to `max_shape` and `max_scale`
#'   (inclusive), respectively. Note that larger values will result in a longer
#'   search time.
#' @param max_shape The largest possible value of the shape parameter in the
#'   grid search (defaults to 10). This must be a positive number. It will only
#'   be used if `mu` is set to `NA`. Note that larger values will result in a
#'   longer search time, and may cause numerical instabilities.
#' @param max_scale The largest possible value of the scale parameter in the
#'   grid search (defaults to 10). This must be a positive number. It will only
#'   be used if `mu` is set to `NA`. Note that larger values will result in a
#'   longer search time, and may cause numerical instabilities.
#'
#' @return If `serial` is identical to `TRUE`, a list containing the following
#'   components is returned:
#'   * `r0` - the estimate of R0
#'   * `supp` - the support of the estimated serial distribution
#'   * `pmf` - the probability mass function of the estimated serial
#'     distribution
#'
#'   Otherwise, if `serial` is identical to `FALSE`, only the estimate of R0 is
#'   returned.
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso `vignette("wp_serial", package="Rnaught")` for examples of using the
#'   serial distribution.
#'
#' @importFrom stats pgamma qgamma
#'
#' @export
#'
#' @examples
#' # Weekly data.
#' cases <- c(1, 4, 10, 5, 3, 4, 19, 3, 3, 14, 4)
#'
#' # Obtain R0 when the serial distribution has a mean of five days.
#' wp(cases, mu = 5 / 7)
#'
#' # Obtain R0 when the serial distribution has a mean of three days.
#' wp(cases, mu = 3 / 7)
#'
#' # Obtain R0 when the serial distribution is unknown.
#' # Note that this will take longer to run than when `mu` is known.
#' wp(cases)
#'
#' # Same as above, but specify custom grid search parameters. The larger any of
#' # the parameters, the longer the search will take, but with potentially more
#' # accurate estimates.
#' wp(cases, grid_length = 40, max_shape = 4, max_scale = 4)
#'
#' # Return the estimated serial distribution in addition to the estimate of R0.
#' estimate <- wp(cases, serial = TRUE)
#'
#' # Display the estimate of R0, as well as the support and probability mass
#' # function of the estimated serial distribution returned by the grid search.
#' estimate$r0
#' estimate$supp
#' estimate$pmf
wp <- function(cases, mu = NA, serial = FALSE,
               grid_length = 100, max_shape = 10, max_scale = 10) {
  validate_cases(cases, min_length = 2, min_count = 1)
  if (!identical(serial, TRUE) && !identical(serial, FALSE)) {
    stop(
      paste("The serial distribution flag (`serial`) must be set to",
        "`TRUE` or `FALSE`."
      ), call. = FALSE
    )
  }

  if (identical(mu, NA)) {
    # Serial distribution unknown: validate the grid, then search over it.
    if (!is_integer(grid_length) || grid_length < 1) {
      stop("The grid length must be a positive integer.", call. = FALSE)
    }
    if (!is_real(max_shape) || max_shape <= 0) {
      stop(
        paste("The largest value of the shape parameter (`max_shape`)",
          "must be a positive number."
        ), call. = FALSE
      )
    }
    if (!is_real(max_scale) || max_scale <= 0) {
      stop(
        paste("The largest value of the scale parameter (`max_scale`)",
          "must be a positive number."
        ), call. = FALSE
      )
    }

    search <- wp_search(cases, grid_length, max_shape, max_scale)
    r0 <- search$r0
    serial_supp <- search$supp
    serial_pmf <- search$pmf
  } else {
    if (!is_real(mu) || mu <= 0) {
      stop("The serial interval (`mu`) must be a positive number or `NA`.",
        call. = FALSE
      )
    }

    # Serial distribution known: discretize a gamma with shape 1 and scale
    # `mu` onto 1, ..., max_range, then renormalize the truncated masses.
    max_range <- ceiling(qgamma(0.999, shape = 1, scale = mu))
    serial_supp <- seq_len(max_range)
    serial_pmf <- diff(pgamma(0:max_range, shape = 1, scale = mu))
    serial_pmf <- serial_pmf / sum(serial_pmf)
    # Closed-form maximum likelihood estimate of R0 for a fixed serial pmf.
    r0 <- sum(cases[-1]) / sum(wp_mu_t_sigma(cases, serial_pmf))
  }

  if (!serial) {
    return(r0)
  }
  list(r0 = r0, supp = serial_supp, pmf = serial_pmf)
}

#' White and Pagano (WP) Grid Search
#'
#' This is a background/internal function called by [wp()]. It computes the
#' maximum likelihood estimator of R0 assuming that the serial distribution is
#' unknown (i.e., [wp()] is called with `mu` set to `NA`) but comes from a
#' discretized gamma distribution. The function implements a simple grid search
#' to obtain the maximum likelihood estimator of R0 as well as the gamma
#' parameters.
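#'
#' For reference, the quantities evaluated at each grid point can be written as
#' below. This is a sketch read off from the code in this file rather than
#' quoted from the article, and the notation is an assumption made here: `N_t`
#' is the case count at time `t = 1, ..., T`, and `p_1, ..., p_k` is the
#' discretized serial probability mass function for one pair of gamma
#' parameters.
#'
#' ```latex
#' \mu_t = R_0 \sum_{j=1}^{\min(k,\, t-1)} p_j \, N_{t-j},
#'   \qquad t = 2, \dots, T
#'
#' \ell(R_0, p) = \sum_{t=2}^{T} \left( N_t \log \mu_t - \mu_t \right)
#'   + \text{const}
#'
#' \hat{R}_0(p) = \frac{\sum_{t=2}^{T} N_t}
#'   {\sum_{t=2}^{T} \sum_{j=1}^{\min(k,\, t-1)} p_j \, N_{t-j}}
#' ```
#'
#' For a fixed `p`, the last expression maximizes the log likelihood in `R_0`.
#' The search plugs it back into the log likelihood and keeps the grid point
#' with the largest value.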
#'
#' @param cases Vector of case counts.
#' @param grid_length The length of the grid in the grid search.
#' @param max_shape The largest possible value of the shape parameter in the
#'   grid search.
#' @param max_scale The largest possible value of the scale parameter in the
#'   grid search.
#'
#' @return A list containing the following components is returned:
#'   * `r0` - the estimate of R0
#'   * `supp` - the support of the estimated serial distribution
#'   * `pmf` - the probability mass function of the estimated serial
#'     distribution
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso [wp()] for the function in which this grid search is called.
#'
#' @importFrom stats pgamma qgamma
#'
#' @noRd
wp_search <- function(cases, grid_length, max_shape, max_scale) {
  # Evenly spaced grids on (0, max]; the leading zero is dropped.
  shapes <- seq(0, max_shape, length.out = grid_length + 1)[-1]
  scales <- seq(0, max_scale, length.out = grid_length + 1)[-1]

  best_log_like <- -Inf
  best_serial_pmf <- NA
  best_max_range <- NA
  r0 <- NA

  for (i in seq_len(grid_length)) {
    for (j in seq_len(grid_length)) {
      # Discretize the candidate gamma distribution onto 1, ..., max_range.
      max_range <- ceiling(qgamma(0.999, shape = shapes[i], scale = scales[j]))

      serial_pmf <- diff(
        pgamma(0:max_range, shape = shapes[i], scale = scales[j])
      )
      serial_pmf <- serial_pmf / sum(serial_pmf)

      # Maximum likelihood estimate of R0 for this serial distribution.
      mu_t_sigma <- wp_mu_t_sigma(cases, serial_pmf)
      mle <- sum(cases[-1]) / sum(mu_t_sigma)
      mu_t <- mle * mu_t_sigma

      # Poisson log likelihood (up to an additive constant) at that estimate.
      log_like <- sum(cases[-1] * log(mu_t)) - sum(mu_t)
      if (log_like > best_log_like) {
        best_log_like <- log_like
        best_serial_pmf <- serial_pmf
        best_max_range <- max_range
        r0 <- mle
      }
    }
  }

  list(r0 = r0, supp = seq_len(best_max_range), pmf = best_serial_pmf)
}

#' White and Pagano (WP) Mu Function Helper
#'
#' This is a background/internal function called by [wp()] and [wp_search()]. It
#' computes the sum inside the function `mu(t)`, which is present in the log
#' likelihood function. See the referenced article for more details.
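#'
#' A small worked example may help. The sketch below re-implements the sum
#' under a hypothetical name, `mu_t_sigma_sketch()`, so that it is
#' self-contained; the inputs are made up for illustration.
#'
#' ```r
#' mu_t_sigma_sketch <- function(cases, serial_pmf) {
#'   k <- length(serial_pmf)
#'   vapply(seq_len(length(cases) - 1), function(i) {
#'     lags <- seq_len(min(k, i))
#'     # Lag 1 is paired with `cases[i]`, lag 2 with `cases[i - 1]`, and so on.
#'     sum(serial_pmf[lags] * cases[i - lags + 1])
#'   }, numeric(1))
#' }
#'
#' # Entry 1 is 0.5 * 1; entry 2 is 0.5 * 2 + 0.5 * 1.
#' mu_t_sigma_sketch(c(1, 2, 3), c(0.5, 0.5))
#' ```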
#'
#' @param cases Vector of case counts.
#' @param serial_pmf The probability mass function of the serial distribution.
#'
#' @return The sum inside the function `mu(t)` of the log likelihood.
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso [wp()] and [wp_search()] for the functions which require this sum.
#'
#' @noRd
wp_mu_t_sigma <- function(cases, serial_pmf) {
  mu_t_sigma <- rep(0, length(cases) - 1)
  for (i in seq_len(length(cases) - 1)) {
    # Pair the serial probabilities with the most recent counts in reverse
    # time order, truncating to whichever of the two is shorter.
    mu_t_sigma[i] <- sum(
      serial_pmf[seq_len(min(length(serial_pmf), i))] *
        cases[i:max(1, i - length(serial_pmf) + 1)]
    )
  }
  mu_t_sigma
}