# Rnaught: R/wp.R
#' White and Pagano (WP)
#'
#' This function implements an R0 estimation method due to White and Pagano
#' (Statistics in Medicine, 2008). The method is based on maximum likelihood
#' estimation in a Poisson transmission model. See details for important
#' implementation notes.
#'
#' This method is based on a Poisson transmission model, and hence may be most
#' valid at the beginning of an epidemic. In their model, the serial
#' distribution is assumed to be discrete with a finite number of possible
#' values. In this implementation, if `mu` is not `NA`, the serial distribution
#' is taken to be a discretized version of a gamma distribution with shape
#' parameter `1` and scale parameter `mu` (and hence mean `mu`). When `mu` is
#' `NA`, the function implements a grid search algorithm to find the maximum
#' likelihood estimator over all possible gamma distributions with unknown shape
#' and scale, restricting these to a prespecified grid (see the parameters
#' `grid_length`, `max_shape` and `max_scale`). In both cases, the largest value
#' of the support is the smallest integer at which the cumulative distribution
#' function of the original (pre-discretized) gamma distribution is at least
#' 0.999.
#'
#' When the serial distribution is known (i.e., `mu` is not `NA`), sensitivity
#' testing of `mu` is strongly recommended. If the serial distribution is
#' unknown (i.e., `mu` is `NA`), the likelihood function can be flat near the
#' maximum, resulting in numerical instability of the optimizer. When `mu` is
#' `NA`, the implementation takes considerably longer to run. Users should be
#' careful about units of time (e.g., are counts observed daily or weekly?) when
#' implementing.
#'
#' The model developed in White and Pagano (2008) is discrete, and hence the
#' serial distribution is finite discrete. In our implementation, the input
#' value `mu` is that of a continuous distribution. The algorithm discretizes
#' this input, and so the mean of the estimated serial distribution returned
#' (when `serial` is set to `TRUE`) will differ from `mu` somewhat. That is to
#' say, if the user notices that the input `mu` and the mean of the estimated
#' serial distribution are different, this is to be expected, and is caused by
#' the discretization.
#'
#' @param cases Vector of case counts. The vector must be of length at least two
#'   and only contain positive integers.
#' @param mu Mean of the serial distribution (defaults to `NA`). This must be a
#'   positive number or `NA`. If a number is specified, the value should match
#'   the case counts in time units. For example, if case counts are weekly and
#'   the serial distribution has a mean of seven days, then `mu` should be set
#'   to `1`. If case counts are daily and the serial distribution has a mean of
#'   seven days, then `mu` should be set to `7`.
#' @param serial Whether to return the estimated serial distribution in addition
#'   to the estimate of R0 (defaults to `FALSE`). This must be a value identical
#'   to `TRUE` or `FALSE`.
#' @param grid_length The length of the grid in the grid search (defaults to
#'   100). This must be a positive integer. It will only be used if `mu` is set
#'   to `NA`. The grid search will go through all combinations of the shape and
#'   scale parameters for the gamma distribution, which are `grid_length` evenly
#'   spaced values from `0` (exclusive) to `max_shape` and `max_scale`
#'   (inclusive), respectively. Note that larger values will result in a longer
#'   search time.
#' @param max_shape The largest possible value of the shape parameter in the
#'   grid search (defaults to 10). This must be a positive number. It will only
#'   be used if `mu` is set to `NA`. Note that larger values will result in a
#'   longer search time, and may cause numerical instabilities.
#' @param max_scale The largest possible value of the scale parameter in the
#'   grid search (defaults to 10). This must be a positive number. It will only
#'   be used if `mu` is set to `NA`. Note that larger values will result in a
#'   longer search time, and may cause numerical instabilities.
#'
#' @return If `serial` is identical to `TRUE`, a list containing the following
#'   components is returned:
#'   * `r0` - the estimate of R0
#'   * `supp` - the support of the estimated serial distribution
#'   * `pmf` - the probability mass function of the estimated serial
#'     distribution
#'
#'   Otherwise, if `serial` is identical to `FALSE`, only the estimate of R0 is
#'   returned.
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso `vignette("wp_serial", package="Rnaught")` for examples of using the
#'   serial distribution.
#'
#' @importFrom stats pgamma qgamma
#'
#' @export
#'
#' @examples
#' # Weekly data.
#' cases <- c(1, 4, 10, 5, 3, 4, 19, 3, 3, 14, 4)
#'
#' # Obtain R0 when the serial distribution has a mean of five days.
#' wp(cases, mu = 5 / 7)
#'
#' # Obtain R0 when the serial distribution has a mean of three days.
#' wp(cases, mu = 3 / 7)
#'
#' # Obtain R0 when the serial distribution is unknown.
#' # Note that this will take longer to run than when `mu` is known.
#' wp(cases)
#'
#' # Same as above, but specify custom grid search parameters. The larger any of
#' # the parameters, the longer the search will take, but with potentially more
#' # accurate estimates.
#' wp(cases, grid_length = 40, max_shape = 4, max_scale = 4)
#'
#' # Return the estimated serial distribution in addition to the estimate of R0.
#' estimate <- wp(cases, serial = TRUE)
#'
#' # Display the estimate of R0, as well as the support and probability mass
#' # function of the estimated serial distribution returned by the grid search.
#' estimate$r0
#' estimate$supp
#' estimate$pmf
wp <- function(cases, mu = NA, serial = FALSE,
               grid_length = 100, max_shape = 10, max_scale = 10) {
  if (is.na(mu)) {
    # Serial distribution unknown: grid search over gamma shape and scale.
    search <- wp_search(cases, grid_length, max_shape, max_scale)
    r0 <- search$r0
    serial_supp <- search$supp
    serial_pmf <- search$pmf
  } else {
    # Serial distribution known: discretize a gamma with shape 1 and scale `mu`
    # on 1, ..., max_range, then renormalize so that the masses sum to one.
    max_range <- ceiling(qgamma(0.999, shape = 1, scale = mu))
    serial_supp <- seq_len(max_range)
    serial_pmf <- diff(pgamma(0:max_range, shape = 1, scale = mu))
    serial_pmf <- serial_pmf / sum(serial_pmf)
    # Maximum likelihood estimate of R0 for a fixed serial pmf.
    r0 <- sum(cases[-1]) / sum(wp_mu_t_sigma(cases, serial_pmf))
  }

  if (!serial) {
    return(r0)
  }
  list(r0 = r0, supp = serial_supp, pmf = serial_pmf)
}
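
# Illustrative sketch (not part of the package API): the discretization
# described in the details of `wp()`, written as a standalone helper so that
# the mean of the discretized distribution can be compared with the continuous
# mean. The name `discretize_gamma_sketch` and its `cutoff` argument are
# assumptions made for this example; the steps mirror the `mu` branch of `wp()`.
discretize_gamma_sketch <- function(shape, scale, cutoff = 0.999) {
  # Largest support value: smallest integer whose gamma CDF reaches `cutoff`.
  max_range <- ceiling(stats::qgamma(cutoff, shape = shape, scale = scale))
  supp <- seq_len(max_range)
  # Mass on k is P(k - 1 < X <= k), renormalized over the truncated support.
  pmf <- diff(stats::pgamma(0:max_range, shape = shape, scale = scale))
  pmf <- pmf / sum(pmf)
  list(supp = supp, pmf = pmf, mean = sum(supp * pmf))
}
# Possible usage: with shape 1 and scale 2 the continuous mean is 2, while
# `discretize_gamma_sketch(1, 2)$mean` is larger, because each interval's mass
# is assigned to its right endpoint.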

#' White and Pagano (WP) Grid Search
#'
#' This is a background/internal function called by [wp()]. It computes the
#' maximum likelihood estimator of R0 assuming that the serial distribution is
#' unknown (i.e., [wp()] is called with `mu` set to `NA`) but comes from a
#' discretized gamma distribution. The function implements a simple grid search
#' to obtain the maximum likelihood estimator of R0 as well as the gamma
#' parameters.
#'
#' @param cases Vector of case counts.
#' @param grid_length The length of the grid in the grid search.
#' @param max_shape The largest possible value of the shape parameter in the
#'   grid search.
#' @param max_scale The largest possible value of the scale parameter in the
#'   grid search.
#'
#' @return A list containing the following components is returned:
#'   * `r0` - the estimate of R0
#'   * `supp` - the support of the estimated serial distribution
#'   * `pmf` - the probability mass function of the estimated serial
#'     distribution
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso [wp()] for the function in which this grid search is called.
#'
#' @importFrom stats pgamma qgamma
#'
#' @noRd
wp_search <- function(cases, grid_length, max_shape, max_scale) {
  # Evenly spaced grids on (0, max_shape] and (0, max_scale], excluding zero.
  shapes <- seq(0, max_shape, length.out = grid_length + 1)[-1]
  scales <- seq(0, max_scale, length.out = grid_length + 1)[-1]

  best_log_like <- -Inf
  best_serial_pmf <- NA
  best_max_range <- NA
  r0 <- NA

  for (i in seq_len(grid_length)) {
    for (j in seq_len(grid_length)) {
      # Discretize the candidate gamma distribution and renormalize.
      max_range <- ceiling(qgamma(0.999, shape = shapes[i], scale = scales[j]))

      serial_pmf <- diff(
        pgamma(0:max_range, shape = shapes[i], scale = scales[j])
      )
      serial_pmf <- serial_pmf / sum(serial_pmf)

      # Maximum likelihood estimate of R0 for this fixed serial pmf.
      mu_t_sigma <- wp_mu_t_sigma(cases, serial_pmf)
      mle <- sum(cases[-1]) / sum(mu_t_sigma)
      mu_t <- mle * mu_t_sigma

      # Poisson log likelihood, up to an additive constant.
      log_like <- sum(cases[-1] * log(mu_t)) - sum(mu_t)
      if (log_like > best_log_like) {
        best_log_like <- log_like
        best_serial_pmf <- serial_pmf
        best_max_range <- max_range
        r0 <- mle
      }
    }
  }

  list(r0 = r0, supp = seq_len(best_max_range), pmf = best_serial_pmf)
}
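
# Illustrative sketch (not part of the package API): the parameter grid that
# the nested loops in `wp_search()` walk over, materialized as a data frame to
# make its size explicit. The helper name `wp_grid_sketch` is an assumption
# made for this example. The number of candidate (shape, scale) pairs is
# `grid_length^2`, which is why larger grids lengthen the search.
wp_grid_sketch <- function(grid_length, max_shape, max_scale) {
  # Same construction as in `wp_search()`: drop the zero endpoint.
  shapes <- seq(0, max_shape, length.out = grid_length + 1)[-1]
  scales <- seq(0, max_scale, length.out = grid_length + 1)[-1]
  expand.grid(shape = shapes, scale = scales)
}
# Possible usage: `nrow(wp_grid_sketch(100, 10, 10))` counts the candidate
# pairs visited under the default settings of `wp()`, i.e. 100 * 100 of them.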

#' White and Pagano (WP) Mu Function Helper
#'
#' This is a background/internal function called by [wp()] and [wp_search()]. It
#' computes the sum inside the function `mu(t)`, which is present in the log
#' likelihood function. See the referenced article for more details.
#'
#' @param cases Vector of case counts.
#' @param serial_pmf The probability mass function of the serial distribution.
#'
#' @return The sum inside the function `mu(t)` of the log likelihood.
#'
#' @references [White and Pagano (Statistics in Medicine, 2008)](
#'   https://doi.org/10.1002/sim.3136)
#'
#' @seealso [wp()] and [wp_search()] for the functions which require this sum.
#'
#' @noRd
wp_mu_t_sigma <- function(cases, serial_pmf) {
  mu_t_sigma <- rep(0, length(cases) - 1)
  for (i in seq_len(length(cases) - 1)) {
    # Weight the most recent counts by the serial pmf: the first
    # min(length(serial_pmf), i) masses multiply cases[i], cases[i - 1], ...
    mu_t_sigma[i] <- sum(
      serial_pmf[seq_len(min(length(serial_pmf), i))] *
        cases[i:max(1, i - length(serial_pmf) + 1)]
    )
  }
  mu_t_sigma
}
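
# Illustrative sketch (not part of the package API): a small worked example of
# the sum computed by `wp_mu_t_sigma()`. The helper name `wp_r0_sketch` and the
# toy inputs are assumptions made for this example. Writing p for the serial
# pmf and N for the case counts, entry i of the sum is
# p[1] * N[i] + p[2] * N[i - 1] + ..., truncated at the start of the series.
wp_r0_sketch <- function(cases, serial_pmf) {
  n <- length(cases)
  k <- length(serial_pmf)
  sigma <- numeric(n - 1)
  for (i in seq_len(n - 1)) {
    for (j in seq_len(min(k, i))) {
      sigma[i] <- sigma[i] + serial_pmf[j] * cases[i - j + 1]
    }
  }
  # Estimate of R0 for a fixed serial pmf, using the same ratio as `wp()`.
  list(sigma = sigma, r0 = sum(cases[-1]) / sum(sigma))
}
# Possible usage: `wp_r0_sketch(c(1, 2, 4), c(0.5, 0.5))` gives
# `sigma = c(0.5, 1.5)` and `r0 = 3`, since (2 + 4) / (0.5 + 1.5) = 3.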