Quantile ensemble — quantile_ensemble • quantgen

Fit ensemble weights, given a set of quantile predictions.

quantile_ensemble(
  qarr,
  y,
  tau,
  weights = NULL,
  tau_groups = rep(1, length(tau)),
  intercept = FALSE,
  nonneg = TRUE,
  unit_sum = TRUE,
  noncross = TRUE,
  q0 = NULL,
  lp_solver = c("glpk", "gurobi"),
  time_limit = NULL,
  params = list(),
  verbose = FALSE
)

Arguments

qarr: Array of predicted quantiles, of dimension (number of prediction points) x (number of ensemble components) x (number of quantile levels).
y: Vector of responses (whose quantiles are being predicted by qarr).
tau: Vector of quantile levels at which predictions are made. Assumed to be distinct, and sorted in increasing order.
weights: Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation.
tau_groups: Vector of group labels, having the same length as tau. Common labels indicate that the ensemble weights for the corresponding quantile levels should be tied together. Default is rep(1,length(tau)), which means that a common set of ensemble weights should be used across all levels. See details.
intercept: Should an intercept be included in the ensemble model? Default is FALSE.
nonneg: Should the ensemble weights be constrained to be nonnegative? Default is TRUE.
unit_sum: Should the ensemble weights be constrained to sum to 1? Default is TRUE.
noncross: Should noncrossing constraints be enforced? Default is TRUE. Note: this option only matters when there is more than one group of ensemble weights, as determined by tau_groups. See details.
q0: Array of points used to define the noncrossing constraints. Must have dimension (number of points) x (number of ensemble components) x (number of quantile levels). Default is NULL, which means that we consider noncrossing constraints at the training points qarr.
lp_solver: One of "glpk" or "gurobi", indicating which LP solver to use. If possible, "gurobi" should be used because it is much faster and more stable. The default is "glpk", however, because it is open-source.
time_limit: This sets the maximum amount of time (in seconds) to allow Gurobi or GLPK to solve any single quantile generalized lasso problem (for a single tau and lambda value). Default is NULL, which means unlimited time.
params: List of control parameters to pass to Gurobi or GLPK. Default is list(), which means no additional parameters are passed. For example, with Gurobi, we can use list(Threads=4) to specify that Gurobi should use 4 threads when available. (Note that if a time limit is specified through this params list, then its value will be overridden by the time_limit argument, assuming the latter is not NULL.)
verbose: Should progress be printed out to the console? Default is FALSE.
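
As a rough illustration of the argument shapes described above, the following sketch builds a small qarr from simulated Gaussian component forecasts. The simulated data, the component biases and spreads, and the example group labels are all assumptions made for illustration. The final call only runs if quantgen is installed.

```r
set.seed(1)
n <- 50                                # number of prediction points
p <- 3                                 # number of ensemble components
tau <- c(0.1, 0.25, 0.5, 0.75, 0.9)    # distinct, increasing quantile levels
r <- length(tau)

x <- rnorm(n)                          # per-point location (simulated)
y <- x + rnorm(n)                      # responses (simulated)

# Hypothetical component forecasts: Gaussians with different biases and spreads.
mu <- c(-0.5, 0, 0.5)
sigma <- c(1, 1.5, 0.8)

# qarr[i, j, k] = quantile at level tau[k] predicted by component j for point i.
qarr <- array(NA_real_, dim = c(n, p, r))
for (j in seq_len(p)) {
  for (k in seq_len(r)) {
    qarr[, j, k] <- x + qnorm(tau[k], mean = mu[j], sd = sigma[j])
  }
}

# One weight vector shared across all levels (the default) ...
groups_shared <- rep(1, r)
# ... or separate weights for the two tails and the center (illustrative choice).
groups_tails <- c(1, 2, 2, 2, 3)

if (requireNamespace("quantgen", quietly = TRUE)) {
  # Defaults: shared weights, nonnegative, summing to 1, solved with GLPK.
  fit <- quantgen::quantile_ensemble(qarr, y, tau)
  print(fit$alpha)
}
```

Passing tau_groups = groups_tails instead would request one set of weights per group, in which case the noncross argument becomes relevant.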

Value

A list with the following components:

alpha: Vector or matrix of ensemble weights. If tau_groups has only one unique label, then this is a vector of length (number of ensemble components); otherwise, it is a matrix of dimension (number of ensemble components) x (number of quantile levels).
tau: Vector of quantile levels used.
weights, tau_groups, ..., params: Values of these other arguments used in the function call.
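
To show how the two possible shapes of alpha would be applied to component quantiles, here is a minimal base R sketch. The helper combine_quantiles is hypothetical (it is not part of quantgen), and it ignores any intercept term.

```r
# Hypothetical helper (not part of quantgen): form ensemble quantiles from
# component quantiles and ensemble weights, ignoring any intercept.
combine_quantiles <- function(qarr, alpha) {
  n <- dim(qarr)[1]
  p <- dim(qarr)[2]
  r <- dim(qarr)[3]
  # A weight vector is shared across levels: expand it to r identical columns.
  if (is.null(dim(alpha))) alpha <- matrix(alpha, nrow = p, ncol = r)
  stopifnot(nrow(alpha) == p, ncol(alpha) == r)
  out <- matrix(NA_real_, nrow = n, ncol = r)
  for (k in seq_len(r)) {
    # Slice k is n x p; column k of alpha holds the weights for level k.
    out[, k] <- matrix(qarr[, , k], nrow = n, ncol = p) %*% alpha[, k]
  }
  out
}

# Toy input: 2 points, 2 components, 3 quantile levels (filled column-major).
qarr <- array(c(0, 1,  2, 3,     # level 1: component 1, then component 2
                4, 5,  6, 7,     # level 2
                8, 9, 10, 11),   # level 3
              dim = c(2, 2, 3))

# Vector case: one set of weights for every level.
alpha_vec <- c(0.25, 0.75)
ens_vec <- combine_quantiles(qarr, alpha_vec)

# Matrix case: one column of weights per quantile level.
alpha_mat <- cbind(c(1, 0), c(0, 1), c(0.5, 0.5))
ens_mat <- combine_quantiles(qarr, alpha_mat)
```

The result is a matrix with one row per prediction point and one column per quantile level.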

Details

This function solves the following quantile ensemble optimization problem, over quantile levels $\tau_k, k=1,\ldots,r$:

$$\mathop{\mathrm{minimize}}_{\alpha_j, j=1,\ldots,p} \; \sum_{k=1}^r \sum_{i=1}^n w_i \psi_{\tau_k} \bigg(y_i - \sum_{j=1}^p \alpha_j q_{ijk} \bigg)$$ $$\mathrm{subject \; to} \;\; \sum_{j=1}^p \alpha_j = 1, \; \alpha_j \geq 0, \; j=1,\ldots,p$$

for a response vector $y$ and quantile array $q$, where $q_{ijk}$ is an estimate of the quantile of $y_i$ at the level $\tau_k$, from ensemble component member $j$, and $w_i$ is the weight on observation $i$. Here $\psi_\tau(v) = \max\{\tau v, (\tau-1) v\}$ is the "pinball" or "tilted $\ell_1$" loss.

A more advanced version allows us to estimate a separate ensemble weight $\alpha_{jk}$ per component method $j$, per quantile level $k$:

$$\mathop{\mathrm{minimize}}_{\alpha_{jk}, j=1,\ldots,p, k=1,\ldots,r} \; \sum_{k=1}^r \sum_{i=1}^n w_i \psi_{\tau_k} \bigg(y_i - \sum_{j=1}^p \alpha_{jk} q_{ijk} \bigg)$$ $$\mathrm{subject \; to} \;\; \sum_{j=1}^p \alpha_{jk} = 1, \; k=1,\ldots,r, \; \alpha_{jk} \geq 0, \; j=1,\ldots,p, \; k=1,\ldots,r$$

As a form of regularization, we can additionally incorporate noncrossing constraints into the above optimization, which take the form:

$$\alpha_{\bullet,k}^T q \leq \alpha_{\bullet,k+1}^T q, \; k=1,\ldots,r-1, \; q \in \mathcal{Q}$$

where the quantile levels $\tau_k, k=1,\ldots,r$ are assumed to be in increasing order, and $\mathcal{Q}$ is a collection of points over which to enforce the noncrossing constraints.

Finally, a middle ground between these two extremes is to allow one ensemble weight per component member $j$, per quantile group $g$. This can be interpreted as a set of further constraints which enforce equality between $\alpha_{jk}$ and $\alpha_{j\ell}$, for all $k,\ell$ that are in the same group $g$.
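
The objective and the noncrossing condition above can be checked numerically with a few lines of base R. This is only a sketch for evaluating a given set of weights, not the LP formulation that the function solves. The helper names pinball, ensemble_predict, ensemble_objective, and is_noncrossing are hypothetical, and the noncrossing check assumes the constraint is evaluated on the ensemble predictions at consecutive quantile levels.

```r
# Pinball (tilted l1) loss: psi_tau(v) = max(tau * v, (tau - 1) * v).
pinball <- function(v, tau) pmax(tau * v, (tau - 1) * v)

# Ensemble predictions for a p x r weight matrix alpha (one column per level).
# Shared weights correspond to r identical columns.
ensemble_predict <- function(alpha, qarr) {
  n <- dim(qarr)[1]
  p <- dim(qarr)[2]
  r <- dim(qarr)[3]
  pred <- matrix(NA_real_, nrow = n, ncol = r)
  for (k in seq_len(r)) {
    pred[, k] <- matrix(qarr[, , k], nrow = n, ncol = p) %*% alpha[, k]
  }
  pred
}

# Weighted pinball loss summed over observations and quantile levels.
ensemble_objective <- function(alpha, qarr, y, tau, w = rep(1, length(y))) {
  pred <- ensemble_predict(alpha, qarr)
  total <- 0
  for (k in seq_along(tau)) {
    total <- total + sum(w * pinball(y - pred[, k], tau[k]))
  }
  total
}

# TRUE if predictions are nondecreasing across consecutive quantile levels.
is_noncrossing <- function(pred, tol = 1e-8) {
  lower <- pred[, -ncol(pred), drop = FALSE]
  upper <- pred[, -1, drop = FALSE]
  all(upper - lower >= -tol)
}

# Toy data: 2 points, 2 components, 2 quantile levels.
tau <- c(0.25, 0.75)
y <- c(2, 2)
qarr <- array(c(0, 1, 2, 3,    # level 1: component 1, then component 2
                1, 2, 3, 4),   # level 2
              dim = c(2, 2, 2))

alpha_equal <- matrix(0.5, nrow = 2, ncol = 2)   # equal weights at both levels
obj <- ensemble_objective(alpha_equal, qarr, y, tau)
ok_equal <- is_noncrossing(ensemble_predict(alpha_equal, qarr))

# Level-specific weights that put the lower level above the upper one.
alpha_cross <- cbind(c(0, 1), c(1, 0))
ok_cross <- is_noncrossing(ensemble_predict(alpha_cross, qarr))
```

In this toy example the equal-weight ensemble satisfies the noncrossing condition, while alpha_cross does not, which is the situation the noncrossing constraints are meant to rule out when weights vary by level.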