Extrapolate a set of quantiles at new quantile levels: parametric in the tails, nonparametric in the middle.

quantile_extrapolate(
  tau,
  qvals,
  tau_out = c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99),
  sort = TRUE,
  iso = FALSE,
  nonneg = FALSE,
  round = FALSE,
  qfun_left = qnorm,
  qfun_right = qnorm,
  n_tau_left = 1,
  n_tau_right = 1,
  middle = c("cubic", "linear"),
  param0 = NULL,
  param1 = NULL,
  grid_size = 1000,
  tol = 0.01,
  max_iter = 10
)

Arguments

tau

Vector of quantile levels. Assumed to be distinct, and sorted in increasing order.

qvals

Vector or matrix of quantiles; if a matrix, each row is a separate set of quantiles, at the same (common) quantile levels, given by tau.

tau_out

Vector of quantile levels at which to perform extrapolation. Default is a sequence of 23 quantile levels from 0.01 to 0.99.

sort

Should the returned quantile estimates be sorted? Default is TRUE.

iso

Should the returned quantile estimates be passed through isotonic regression? Default is FALSE; if TRUE, takes priority over sort.

nonneg

Should the returned quantile estimates be truncated at 0? Natural for count data. Default is FALSE.

round

Should the returned quantile estimates be rounded? Natural for count data. Default is FALSE.

qfun_left, qfun_right

Quantile functions on which to base extrapolation in the left and right tails, respectively; each must be a function whose first two arguments are a quantile level and a distribution parameter (such as a mean parameter); these are assumed to be vectorized in the first argument when the second argument is fixed, and also vectorized in the second argument when the first argument is fixed. Default is qnorm. See details for further explanation.

n_tau_left, n_tau_right

Integers between 1 and the length of tau, indicating how many quantile levels from the left and right ends, respectively, to use in defining the tails. For example, if n_tau_left=1, the default, then only the leftmost quantile is used for the left tail extrapolation; if n_tau_left=2, then the two leftmost quantiles are used, etc.; and similarly for n_tau_right. See details for further explanation.

middle

One of "cubic" or "linear", indicating the interpolation method to use in the middle (outside of the tails, as determined by n_tau_left and n_tau_right). If "cubic", the default, then a monotone cubic spline interpolant is fit to the given quantiles, and used to estimate quantiles in the middle. If "linear", then linear interpolation is used to estimate quantiles in the middle.

param0, param1, grid_size, tol, max_iter

Arguments for the algorithm used for parameter-fitting for tail extrapolation. See details.
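The post-processing flags sort, iso, nonneg, and round described above can be pictured with base R alone. The sketch below is illustrative only: the input values are made up, and the order in which the steps are applied here is an assumption, not something this page specifies.

```r
# Made-up raw estimates that are neither monotone nor nonnegative
q_raw <- c(1.2, -0.4, 0.9, 2.6, 2.1)

# sort = TRUE: reorder the estimates so they are nondecreasing
q_sorted <- sort(q_raw)

# iso = TRUE: isotonic regression pools adjacent violators instead of reordering
q_iso <- isoreg(q_raw)$yf

# nonneg = TRUE: truncate at 0 (applied to the sorted values here, by assumption)
q_nonneg <- pmax(q_sorted, 0)

# round = TRUE: round to the nearest integer, as is natural for counts
q_round <- round(q_nonneg)
```

Sorting keeps the original values and only permutes them, whereas isotonic regression replaces each run of violating values by its average, which is why the two options can give different results on the same input.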

Value

A matrix of dimension (number of rows in qvals) x (length of tau_out), where each row is the extrapolation of the set of quantiles in the corresponding row of qvals, at the quantile levels specified in tau_out.

Details

This function interpolates/extrapolates an initial sparser set of quantiles, say \(q_1,\ldots,q_m\) at the levels \(\tau_1 < \ldots < \tau_m\), into a denser set of quantiles, say \(q_1^*,\ldots,q^*_n\) at the levels \(\tau^*_1 < \ldots < \tau^*_n\). At a high level, the strategy is to nonparametrically interpolate the quantiles whose levels fall in the interval \([\tau_1, \tau_m]\), and parametrically extrapolate the quantiles whose levels fall in \([0, \tau_1)\) or \((\tau_m, 1]\). Let us call these the "middle" and "tail" strategies, respectively.

To give more details on the middle strategy: a monotone spline interpolant, either a cubic spline (if middle="cubic") or a linear spline (if middle="linear"), is fit to the points $$(\tau_i,q_i), \; i=1,\ldots,m.$$ Denoting this interpolant by \(f\), we then set $$q^*_i = f(\tau^*_i), \;\; \tau^*_i \in [\tau_1, \tau_m].$$
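The middle strategy can be sketched in base R as follows. The input quantiles are made-up values, and the use of splinefun with method = "hyman" as the monotone cubic interpolant is an assumption chosen for illustration; this page does not say which routine the function uses internally.

```r
# Sparse input quantiles (illustrative values only)
tau   <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qvals <- c(2.0, 3.5, 5.0, 7.5, 11.0)

# Output levels that fall inside [tau_1, tau_m]
tau_mid <- c(0.1, 0.2, 0.4, 0.6, 0.8, 0.9)

# middle = "cubic": monotone cubic spline through the points (tau_i, q_i)
f_cubic <- splinefun(tau, qvals, method = "hyman")
q_cubic <- f_cubic(tau_mid)

# middle = "linear": piecewise linear interpolation through the same points
q_linear <- approx(tau, qvals, xout = tau_mid)$y
```

Both interpolants pass through the given quantiles exactly and are nondecreasing whenever the inputs are, so the estimates in the middle range cannot cross.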

To give more details on the tail strategy: in each tail, left and right, the user specifies a tail function \(q(\tau; \theta)\) which depends on a parameter \(\theta\). This is done via the functions qfun_left and qfun_right; the default is qnorm for both, in which case \(\theta\) represents the mean of the normal distribution (and the standard deviation is fixed at 1, as per the default in qnorm). Given this tail function, we then find the parameter value \(\hat\theta\) that best matches the given quantile, and use this for extrapolation. That is, for the left tail, we first fit \(\hat\theta\) such that $$q(\tau_1; \hat\theta) \approx q_1$$ and we then set $$q^*_i = q(\tau^*_i; \hat\theta), \;\; \tau^*_i < \tau_1.$$ The right tail is similar.
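For the default tail function the fit can be written down directly, which makes for a small worked example. With qnorm and the standard deviation fixed at 1, \(q(\tau; \theta) = \theta + \Phi^{-1}(\tau)\), so the matching parameter is \(\hat\theta = q_1 - \Phi^{-1}(\tau_1)\). The sketch below uses made-up values for \(\tau_1\) and \(q_1\); the function itself fits \(\hat\theta\) numerically rather than by this closed form.

```r
# Leftmost given quantile (illustrative values only)
tau1 <- 0.1
q1   <- 2.0

# With sd fixed at 1, qnorm(tau, theta) = theta + qnorm(tau), so the match is exact
theta_hat <- q1 - qnorm(tau1)

# Extrapolate at levels below tau1 using the fitted mean
tau_left <- c(0.01, 0.025, 0.05)
q_left   <- qnorm(tau_left, theta_hat)
```

The extrapolated values lie below q1 and increase with the level, so they join the middle range without crossing it.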

The fitting algorithm used for determining \(\hat\theta\) in each tail is a kind of iterative grid search that proceeds in "rounds". The arguments param0 and param1 give the left and right endpoints of the initial interval used in the first round of the search (this interval typically contracts as the rounds proceed, but can also expand as needed); the argument grid_size is the number of grid points to consider in each round; the argument tol is the error tolerance for stopping; and the argument max_iter is the maximum number of rounds to consider. This fitting algorithm is robust to the case when the optimal parameter value that matches the given quantile, as per the above display, is not unique; in this case we take the mean of the range of optimal parameter values.
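One way to picture such a search is the sketch below. The helper fit_tail_param is hypothetical and is not the package's implementation; in particular, the rules used here for contracting and expanding the interval are assumptions made for illustration.

```r
# Hypothetical helper: iterative grid search for a single tail parameter
fit_tail_param <- function(qfun, tau1, q1, param0, param1,
                           grid_size = 1000, tol = 0.01, max_iter = 10) {
  lo <- param0
  hi <- param1
  theta <- (lo + hi) / 2
  for (iter in seq_len(max_iter)) {
    grid <- seq(lo, hi, length.out = grid_size)
    err  <- abs(qfun(tau1, grid) - q1)
    best <- which(err == min(err))
    # If several grid points tie, take the midpoint of their range
    theta <- mean(range(grid[best]))
    if (min(err) <= tol) break
    width <- hi - lo
    if (best[1] == 1 || best[length(best)] == grid_size) {
      # Best point sits on the boundary: expand the interval on both sides
      lo <- lo - width
      hi <- hi + width
    } else {
      # Otherwise contract to one grid step around the best points
      step <- width / (grid_size - 1)
      lo <- grid[best[1]] - step
      hi <- grid[best[length(best)]] + step
    }
  }
  theta
}

# Initial interval contains the solution
theta_hat <- fit_tail_param(qnorm, tau1 = 0.1, q1 = 2.0, param0 = -10, param1 = 10)

# Initial interval misses the solution, so the search has to expand first
theta_wide <- fit_tail_param(qnorm, tau1 = 0.1, q1 = 2.0, param0 = 0, param1 = 1)
```

A grid search needs only that the quantile function be vectorized in its parameter, which is why it also copes with discrete families where a whole interval of parameter values reproduces the given quantile.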

Finally, when the arguments n_tau_left and n_tau_right are changed from their defaults, then this changes the definition of the "middle" and the "tail" ranges, but otherwise the analogous strategies are employed. In fact, the middle strategy is unchanged, just applied to a different range. The tail strategy is similar, but now in each tail, left and right, we fit a separate parameter value \(\hat\theta\) for each given quantile level in the tail range (for example, for each of the two leftmost quantile levels if n_tau_left=2), and then take the mean of these parameters as a single parameter value on which to base tail extrapolation.
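The averaging step for n_tau_left=2 can be sketched with the default qnorm tail, where each per-level fit has the closed form \(q_i - \Phi^{-1}(\tau_i)\). The input values are made up, and the output levels are chosen below \(\tau_1\) so that they fall in the left tail under any definition of the tail range.

```r
# Sparse input quantiles (illustrative values only)
tau   <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qvals <- c(2.0, 3.5, 5.0, 7.5, 11.0)

n_tau_left <- 2
idx <- seq_len(n_tau_left)

# One fitted mean per tail level (exact for qnorm with sd fixed at 1)
theta_each <- qvals[idx] - qnorm(tau[idx])

# A single parameter for the left tail: the mean of the per-level fits
theta_hat <- mean(theta_each)

# Extrapolate at levels below tau_1 with the averaged parameter
tau_left <- c(0.01, 0.025, 0.05)
q_left   <- qnorm(tau_left, theta_hat)
```

Because the averaged parameter generally matches neither tail quantile exactly, the extrapolated curve need not pass through \(q_1\), which is one situation where the sort or iso options become relevant.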