Extrapolate a set of quantiles at new quantile levels: parametric in the tails, nonparametric in the middle.
quantile_extrapolate(
  tau,
  qvals,
  tau_out = c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99),
  sort = TRUE,
  iso = FALSE,
  nonneg = FALSE,
  round = FALSE,
  qfun_left = qnorm,
  qfun_right = qnorm,
  n_tau_left = 1,
  n_tau_right = 1,
  middle = c("cubic", "linear"),
  param0 = NULL,
  param1 = NULL,
  grid_size = 1000,
  tol = 0.01,
  max_iter = 10
)
Vector of quantile levels. Assumed to be distinct, and sorted in increasing order.
Vector or matrix of quantiles; if a matrix, each row is a separate
set of quantiles, at the same (common) quantile levels, given by tau.
Vector of quantile levels at which to perform extrapolation. Default is a sequence of 23 quantile levels from 0.01 to 0.99.
Should the returned quantile estimates be sorted? Default is TRUE.
Should the returned quantile estimates be passed through isotonic
regression? Default is FALSE; if TRUE, takes priority over sort.
Should the returned quantile estimates be truncated at 0? Natural for count data. Default is FALSE.
Should the returned quantile estimates be rounded? Natural for count data. Default is FALSE.
Quantile functions on which to base extrapolation
in the left and right tails, respectively; each must be a function whose
first two arguments are a quantile level and a distribution parameter (such
as a mean parameter); these are assumed to be vectorized in the first
argument when the second argument is fixed, and also vectorized in the
second argument when the first argument is fixed. Default is qnorm.
See details for further explanation.
Integers between 1 and the length of tau, indicating how many quantile
levels from the left and right ends, respectively, to use in defining the
tails. For example, if n_tau_left=1, the default, then only the leftmost
quantile is used for the left tail extrapolation; if n_tau_left=2, then
the two leftmost quantiles are used, and so on; and similarly for
n_tau_right. See details for further explanation.
One of "cubic" or "linear", indicating the interpolation method
to use in the middle (outside of the tails, as determined by
n_tau_left and n_tau_right). If "cubic", the default, then a
monotone cubic spline interpolant is fit to the given quantiles, and used
to estimate quantiles in the middle. If "linear", then linear interpolation
is used to estimate quantiles in the middle.
Arguments for the algorithm used for parameter-fitting for tail extrapolation. See details.
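To make the vectorization requirement on qfun_left and qfun_right concrete, here is a minimal sketch in base R. The wrappers qfun_pois and qfun_wide are hypothetical names introduced only for illustration; they are not part of this function's interface.

```r
# Vectorized in the first argument (quantile level) with the parameter fixed:
by_level <- qnorm(c(0.05, 0.5, 0.95), 2)

# Vectorized in the second argument (parameter) with the level fixed:
by_param <- qnorm(0.05, c(0, 1, 2))

# Hypothetical custom tail functions with the same two-argument shape.
# A Poisson tail, parametrized by its mean (a natural fit for count data):
qfun_pois <- function(p, lambda) qpois(p, lambda)

# A normal tail whose standard deviation is fixed at 2 instead of 1:
qfun_wide <- function(p, mu) qnorm(p, mean = mu, sd = 2)
```

Any function that behaves like these two calls to qnorm in its first two arguments should satisfy the stated requirement.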
A matrix of dimension (number of rows in qvals) x (length of tau_out),
where each row is the extrapolation of the set of quantiles in the
corresponding row of qvals, at the quantile levels specified in tau_out.
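The post-processing options sort, iso, nonneg, and round can be pictured with base R on a single row of estimates. This is only a sketch: the input values are made up, and the order in which the function applies these steps internally is an assumption, not something stated here.

```r
# Made-up raw estimates with one crossing (3.6 > 2.9) and one negative value.
q_raw <- c(-0.4, 1.2, 3.6, 2.9, 5.1)

# sort = TRUE: reorder the estimates so they are nondecreasing.
q_sorted <- sort(q_raw)

# iso = TRUE: isotonic regression pools violating neighbors instead of
# swapping them; stats::isoreg() returns the fitted values in $yf.
q_iso <- stats::isoreg(q_raw)$yf

# nonneg = TRUE: truncate at 0.
q_nonneg <- pmax(q_sorted, 0)

# round = TRUE: round to whole numbers (natural for count data).
q_count <- round(q_nonneg)
```

Sorting keeps the original values and only permutes them, whereas isotonic regression replaces a crossing pair with their average, which is why the two options can give different results.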
This function interpolates and extrapolates an initial, sparser set of quantiles, say \(q_1,\ldots,q_m\) at the levels \(\tau_1 < \ldots < \tau_m\), into a denser set of quantiles, say \(q_1^*,\ldots,q^*_n\) at the levels \(\tau^*_1 < \ldots < \tau^*_n\). At a high level, the strategy is to nonparametrically interpolate the quantiles whose levels fall in the interval \([\tau_1, \tau_m]\), and parametrically extrapolate the quantiles whose levels fall in \([0, \tau_1)\) or \((\tau_m, 1]\). Let us call these the "middle" and "tail" strategies, respectively.
To give more details on the middle strategy: a monotone spline
interpolant, either a cubic spline (if middle="cubic") or a linear
spline (if middle="linear"), is fit to the points
$$(\tau_i,q_i), \; i=1,\ldots,m.$$
Denoting this interpolant by \(f\), we then set
$$q^*_i = f(\tau^*_i), \;\; \tau^*_i \in [\tau_1, \tau_m].$$
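The middle strategy can be sketched with base R interpolators. The choice of stats::splinefun() with method = "hyman" as the monotone cubic spline is an assumption made for illustration; the text above does not say which monotone cubic spline is used. The quantile values are made up.

```r
# Made-up input quantiles at five levels.
tau <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q   <- c(2, 5, 9, 14, 22)

# New levels inside [tau_1, tau_m].
tau_mid <- c(0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9)

# "cubic": a monotone cubic spline through the points (tau_i, q_i).
# method = "hyman" requires the q values to be monotone.
f_cubic <- stats::splinefun(tau, q, method = "hyman")
q_cubic <- f_cubic(tau_mid)

# "linear": piecewise linear interpolation through the same points.
q_linear <- stats::approx(tau, q, xout = tau_mid)$y
```

Both interpolants pass through the given quantiles exactly and are nondecreasing between them, so the resulting estimates cannot cross within the middle range.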
To give more details on the tail strategy: in each tail, left and right,
the user specifies a tail function \(q(\tau; \theta)\) which depends on a
parameter \(\theta\). This is done via the functions qfun_left and
qfun_right; the default is qnorm for both, in which case
\(\theta\) represents the mean of the normal distribution (and the
standard deviation is fixed at 1, as per the default in qnorm).
Given this tail function, we then find the parameter value
\(\hat\theta\) that best matches the given quantile, and use this for
extrapolation. That is, for the left tail, we first fit \(\hat\theta\)
such that
$$q(\tau_1; \hat\theta) \approx q_1$$
and we then set
$$q^*_i = q(\tau^*_i; \hat\theta), \;\; \tau^*_i < \tau_1.$$
The right tail is similar.
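For the default qnorm tail the matching step has a closed form, because a normal quantile with standard deviation 1 is the mean plus the standard normal quantile: \(q(\tau; \theta) = \theta + \Phi^{-1}(\tau)\). The sketch below uses that identity directly rather than a search, with made-up quantile values.

```r
# Made-up input quantiles at five levels.
tau <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q   <- c(2, 5, 9, 14, 22)

# Left tail: solve q(tau_1; theta) = q_1 for theta.
theta_left <- q[1] - qnorm(tau[1])

# Extrapolate at new levels below tau_1.
tau_left <- c(0.01, 0.025, 0.05)
q_left <- qnorm(tau_left, theta_left)

# Right tail: the same idea, anchored at the rightmost quantile.
theta_right <- q[length(q)] - qnorm(tau[length(tau)])
tau_right <- c(0.95, 0.975, 0.99)
q_right <- qnorm(tau_right, theta_right)
```

For tail functions without such a closed form, the parameter has to be found numerically.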
The fitting algorithm used for determining \(\hat\theta\) in each tail is
a kind of iterative grid search that proceeds in "rounds". The arguments
param0 and param1 give the left and right endpoints of the initial
interval used in the first round of the search (this interval typically
contracts as the rounds proceed, but can also expand as needed); the
argument grid_size is the number of grid points to consider in each
round; the argument tol is the error tolerance for stopping; and the
argument max_iter is the maximum number of rounds to consider. This
fitting algorithm is robust to the case when the optimal parameter value
that matches the given quantile, as per the above display, is not unique;
in this case we take the mean of the range of optimal parameter values.
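The search is described above only in outline, so the following is a minimal sketch of how a round-based grid search of this kind could work, not the function's actual implementation. The helper name grid_fit and the specific rules for expanding and contracting the interval are assumptions made for illustration.

```r
# Hypothetical helper: fit one tail parameter by iterative grid search.
grid_fit <- function(qfun, tau, q, param0, param1,
                     grid_size = 1000, tol = 0.01, max_iter = 10) {
  lo <- param0
  hi <- param1
  theta <- NA_real_
  for (iter in seq_len(max_iter)) {
    grid <- seq(lo, hi, length.out = grid_size)
    # Relies on qfun being vectorized in its second argument.
    err <- abs(qfun(tau, grid) - q)
    best <- which(err == min(err))     # all minimizers, possibly several
    theta <- mean(range(grid[best]))   # midpoint of the optimal range
    if (min(err) <= tol) break         # close enough: stop
    width <- hi - lo
    if (min(best) == 1) {
      lo <- lo - width                 # minimizer on the left edge: expand
    } else if (max(best) == grid_size) {
      hi <- hi + width                 # minimizer on the right edge: expand
    } else {
      lo <- grid[min(best) - 1]        # interior minimizer: contract
      hi <- grid[max(best) + 1]
    }
  }
  theta
}

# Unique optimum: normal tail matched to a made-up 5% quantile of -3.
theta_norm <- grid_fit(qnorm, 0.05, -3, param0 = -10, param1 = 10)

# Non-unique optimum: a whole interval of Poisson means has median 3.
theta_pois <- grid_fit(qpois, 0.5, 3, param0 = 0, param1 = 10)
```

The Poisson call illustrates the non-unique case: every grid point in the optimal interval has zero error, and the sketch returns the midpoint of that interval.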
Finally, changing the arguments n_tau_left and n_tau_right from their
defaults changes the definition of the "middle" and the "tail" ranges,
but otherwise the analogous strategies are
employed. In fact, the middle strategy is unchanged, just applied to a
different range. The tail strategy is similar, but now in each tail, left
and right, we fit a separate parameter value \(\hat\theta\) for each
given quantile level in the tail range (for example, for each of the two
leftmost quantile levels if n_tau_left=2), and then take the mean of
these parameters as a single parameter value on which to base tail
extrapolation.
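The averaging step for a tail defined by more than one quantile can be sketched as follows, again using the closed form available for the default qnorm tail and made-up quantile values. The levels at which the fitted tail is evaluated are chosen only for illustration; the exact boundary between "middle" and "tail" in this setting is not spelled out above.

```r
# Made-up input quantiles at five levels.
tau <- c(0.1, 0.25, 0.5, 0.75, 0.9)
q   <- c(2, 5, 9, 14, 22)

# Use the two leftmost quantile levels to define the left tail.
n_tau_left <- 2
idx <- seq_len(n_tau_left)

# One parameter per tail quantile: theta_j solves q(tau_j; theta_j) = q_j.
thetas <- q[idx] - qnorm(tau[idx])

# A single parameter for the tail: the mean of the per-quantile fits.
theta_hat <- mean(thetas)

# Extrapolate with the averaged parameter at some low levels.
q_left <- qnorm(c(0.01, 0.025, 0.05), theta_hat)
```

Because the averaged parameter no longer matches any single input quantile exactly, the extrapolated tail need not pass through \((\tau_1, q_1)\).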