Compute quantile generalized lasso solutions.
quantile_genlasso(
x,
y,
d,
tau,
lambda,
weights = NULL,
intercept = TRUE,
standardize = TRUE,
lb = -Inf,
ub = Inf,
noncross = FALSE,
x0 = NULL,
lp_solver = c("glpk", "gurobi"),
time_limit = NULL,
warm_starts = TRUE,
params = list(),
transform = NULL,
inv_trans = NULL,
jitter = NULL,
verbose = FALSE
)
x: Matrix of predictors. If sparse, then passing it an appropriate sparse Matrix class can greatly help optimization.
y: Vector of responses.
d: Matrix defining the generalized lasso penalty; see details. If sparse, then passing it an appropriate sparse Matrix class can greatly help optimization. A convenience function get_diff_mat is provided for constructing trend filtering penalties.
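As a sketch of what such a penalty looks like, a dense k-th order difference matrix can be formed in base R (illustrative only; the packaged helper get_diff_mat builds a sparse version better suited for optimization):

```r
# Dense k-th order difference matrix; diff() acts row-wise on a matrix,
# so row i of the result is e_{i+1} - e_i (for k = 1).
diff_mat <- function(n, k) {
  diff(diag(n), differences = k)
}

D <- diff_mat(5, 1)
D %*% (1:5)  # first differences of 1:5 are all equal to 1
```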
tau, lambda: Vectors of quantile levels and tuning parameter values. If these are not of the same length, the shorter of the two is recycled so that they become the same length. Then, for each i, we solve a separate quantile generalized lasso problem at quantile level tau[i] and tuning parameter value lambda[i]. The most common use cases are: specifying one tau value and a sequence of lambda values; or specifying a sequence of tau values and one lambda value.
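The recycling rule just described can be sketched in base R (recycle_pairs is a hypothetical helper name, not part of the package):

```r
# Recycle tau and lambda to a common length, pairing tau[i] with lambda[i].
recycle_pairs <- function(tau, lambda) {
  n <- max(length(tau), length(lambda))
  cbind(tau = rep(tau, length.out = n),
        lambda = rep(lambda, length.out = n))
}

recycle_pairs(0.5, c(1, 2, 4))      # one tau, a sequence of lambdas
recycle_pairs(c(0.1, 0.5, 0.9), 1)  # a sequence of taus, one lambda
```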
weights: Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation.
intercept: Should an intercept be included in the regression model? Default is TRUE.
standardize: Should the predictors be standardized (to have unit variance) before fitting? Default is TRUE.
lb, ub: Lower and upper bounds, respectively, to place as constraints on the coefficients in the optimization problem. These can be constants (to place the same bound on each coefficient) or vectors of length equal to the number of predictors (to place a potentially different bound on each coefficient). Defaults are -Inf and Inf for lb and ub, respectively, which effectively means no constraints are used. Two important notes: when intercept is TRUE, the constraints are *not* placed on the intercept; and when standardize is TRUE, the constraints are placed on the *standardized scale* (they are used in the optimization problem whose predictors have been standardized).
noncross: Should noncrossing constraints be applied? These force the estimated quantiles to be properly ordered across all quantile levels being considered. Default is FALSE. If TRUE, then noncrossing constraints are applied to the estimated quantiles at all points specified by the next argument x0. Note: this option only makes sense if the values in the tau vector are distinct and sorted in increasing order.
x0: Matrix of points used to define the noncrossing constraints. Default is NULL, which means that we consider noncrossing constraints at the training points x.
lp_solver: One of "glpk" or "gurobi", indicating which LP solver to use. If possible, "gurobi" should be used, as it is much faster and more stable; the default is "glpk", however, because it is open-source.
time_limit: Maximum amount of time (in seconds) to allow Gurobi or GLPK to solve any single quantile generalized lasso problem (for a single tau and lambda value). Default is NULL, which means unlimited time.
warm_starts: Should warm starts be used in the LP solver (from one LP solve to the next)? Only supported for Gurobi. Default is TRUE.
params: List of control parameters to pass to Gurobi or GLPK. Default is list(), which means no additional parameters are passed. For example: with Gurobi, we can use list(Threads=4) to specify that Gurobi should use 4 threads when available. (Note that if a time limit is specified through this params list, then its value will be overridden by the time_limit argument, assuming the latter is not NULL.)
transform, inv_trans: The first is a function to transform y before solving the quantile generalized lasso; the second is the corresponding inverse transform. For example: for count data, we might want to model log(1+y) (which would be the transform, and the inverse transform would be exp(x)-1). Both transform and inv_trans should be vectorized. Convenience functions log_pad and exp_pad are provided (these are inverses), as well as logit_pad and sigmd_pad (these are inverses).
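For instance, the count-data transform mentioned above can be written directly as a vectorized pair (a minimal sketch with illustrative names; the packaged log_pad and exp_pad helpers play a similar role):

```r
# A transform/inverse pair for count data, per the example above.
count_trans <- function(y) log(1 + y)
count_inv   <- function(x) exp(x) - 1

y <- c(0, 3, 10)
all.equal(count_inv(count_trans(y)), y)  # TRUE: they are inverses
```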
jitter: Function for applying random jitter to y, which might help optimization. For example: for count data, there can be lots of ties (with or without transformation of y), which can make optimization more difficult. The function jitter should take an integer n and return n random draws. A convenience function unif_jitter is provided.
verbose: Should progress be printed out to the console? Default is FALSE.
A list with the following components:
Matrix of generalized lasso coefficients, of dimension (number of features + 1) x (number of quantile levels) if intercept=TRUE, else (number of features) x (number of quantile levels). Note that these coefficients are always on the scale of the original features, even if standardize=TRUE.
Vector of status flags returned by Gurobi's or GLPK's LP solver, of length = (number of quantile levels)
Vectors of tau and lambda values used
Values of these other arguments used in the function call
This function solves the quantile generalized lasso problem, for each pair of quantile level \(\tau\) and tuning parameter \(\lambda\): $$\mathop{\mathrm{minimize}}_{\beta_0,\beta} \; \sum_{i=1}^n w_i \psi_\tau(y_i-\beta_0-x_i^T\beta) + \lambda \|D\beta\|_1$$ for a response vector \(y\) with components \(y_i\), predictor matrix \(X\) with rows \(x_i\), and penalty matrix \(D\). Here \(\psi_\tau(v) = \max\{\tau v, (\tau-1) v\}\) is the "pinball" or "tilted \(\ell_1\)" loss. When noncrossing constraints are applied, we instead solve one big joint optimization, over all quantile levels and tuning parameter values: $$\mathop{\mathrm{minimize}}_{\beta_{0k}, \beta_k, k=1,\ldots,r} \; \sum_{k=1}^r \bigg(\sum_{i=1}^n w_i \psi_{\tau_k}(y_i-\beta_{0k}- x_i^T\beta_k) + \lambda_k \|D\beta_k\|_1\bigg)$$ $$\mathrm{subject \; to} \;\; \beta_{0k}+x^T\beta_k \leq \beta_{0,k+1}+x^T\beta_{k+1} \;\; k=1,\ldots,r-1, \; x \in \mathcal{X}$$ where the quantile levels \(\tau_k, k=1,\ldots,r\) are assumed to be in increasing order, and \(\mathcal{X}\) is a collection of points over which to enforce the noncrossing constraints.
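The pinball loss in the display above is straightforward to compute; a minimal vectorized sketch:

```r
# Pinball ("tilted l1") loss: psi_tau(v) = max(tau * v, (tau - 1) * v),
# vectorized over residuals v.
pinball <- function(v, tau) pmax(tau * v, (tau - 1) * v)

pinball(c(-1, 0, 2), tau = 0.9)  # 0.1 0.0 1.8
```

Note the asymmetry: at tau = 0.9, positive residuals (underprediction) are penalized nine times more heavily than negative ones, which is what pushes the fit toward the 0.9 quantile.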
Either problem is readily converted into a linear program (LP), and solved using either Gurobi (which is free for academic use, and generally fast) or GLPK (which is free for everyone, but slower).