Compute quantile generalized lasso solutions.
quantile_genlasso(
x,
y,
d,
tau,
lambda,
weights = NULL,
intercept = TRUE,
standardize = TRUE,
lb = -Inf,
ub = Inf,
noncross = FALSE,
x0 = NULL,
lp_solver = c("glpk", "gurobi"),
time_limit = NULL,
warm_starts = TRUE,
params = list(),
transform = NULL,
inv_trans = NULL,
jitter = NULL,
verbose = FALSE
)
x: Matrix of predictors. If sparse, then passing it an appropriate sparse Matrix class can greatly help optimization.
y: Vector of responses.
d: Matrix defining the generalized lasso penalty; see details. If sparse, then passing it an appropriate sparse Matrix class can greatly help optimization. A convenience function get_diff_mat is provided for constructing trend filtering penalties.
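As a sketch of what such a penalty looks like, a dense k-th order difference matrix can be formed in base R (illustrative only; the packaged helper get_diff_mat builds a sparse version better suited for optimization):

```r
# Dense k-th order difference matrix; diff() acts row-wise on a matrix,
# so row i of the result is e_{i+1} - e_i (for k = 1).
diff_mat <- function(n, k) {
  diff(diag(n), differences = k)
}

D <- diff_mat(5, 1)
D %*% (1:5)  # first differences of 1:5 are all equal to 1
```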
tau, lambda: Vectors of quantile levels and tuning parameter values. If these are not of the same length, the shorter of the two is recycled so that they become the same length. Then, for each i, we solve a separate quantile generalized lasso problem at quantile level tau[i] and tuning parameter value lambda[i]. The most common use cases are: specifying one tau value and a sequence of lambda values; or specifying a sequence of tau values and one lambda value.
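The recycling rule just described can be sketched in base R (recycle_pairs is a hypothetical helper name, not part of the package):

```r
# Recycle tau and lambda to a common length, pairing tau[i] with lambda[i].
recycle_pairs <- function(tau, lambda) {
  n <- max(length(tau), length(lambda))
  cbind(tau = rep(tau, length.out = n),
        lambda = rep(lambda, length.out = n))
}

recycle_pairs(0.5, c(1, 2, 4))      # one tau, a sequence of lambdas
recycle_pairs(c(0.1, 0.5, 0.9), 1)  # a sequence of taus, one lambda
```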
weights: Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation.
intercept: Should an intercept be included in the regression model? Default is TRUE.
standardize: Should the predictors be standardized (to have unit variance) before fitting? Default is TRUE.
lb, ub: Lower and upper bounds, respectively, to place as constraints on the coefficients in the optimization problem. These can be constants (to place the same bound on each coefficient) or vectors of length equal to the number of predictors (to place a potentially different bound on each coefficient). Defaults are -Inf and Inf for lb and ub, respectively, which effectively means no constraints are used. Two important notes: when intercept is TRUE, the constraints are *not* placed on the intercept; and when standardize is TRUE, the constraints are placed on the *standardized scale* (they are used in the optimization problem whose predictors have been standardized).
noncross: Should noncrossing constraints be applied? These force the estimated quantiles to be properly ordered across all quantile levels being considered. Default is FALSE. If TRUE, then noncrossing constraints are applied to the estimated quantiles at all points specified by the next argument x0. Note: this option only makes sense if the values in the tau vector are distinct and sorted in increasing order.
x0: Matrix of points used to define the noncrossing constraints. Default is NULL, which means that we consider noncrossing constraints at the training points x.
lp_solver: One of "glpk" or "gurobi", indicating which LP solver to use. If possible, "gurobi" should be used, as it is much faster and more stable; the default is "glpk", however, because it is open-source.
time_limit: Maximum amount of time (in seconds) to allow Gurobi or GLPK to solve any single quantile generalized lasso problem (for a single tau and lambda value). Default is NULL, which means unlimited time.
warm_starts: Should warm starts be used in the LP solver (from one LP solve to the next)? Only supported for Gurobi. Default is TRUE.
params: List of control parameters to pass to Gurobi or GLPK. Default is list(), which means no additional parameters are passed. For example: with Gurobi, we can use list(Threads=4) to specify that Gurobi should use 4 threads when available. (Note that if a time limit is specified through this params list, then its value will be overridden by the time_limit argument, assuming the latter is not NULL.)
transform, inv_trans: The first is a function to transform y before solving the quantile generalized lasso; the second is the corresponding inverse transform. For example: for count data, we might want to model log(1+y) (which would be the transform, and the inverse transform would be exp(x)-1). Both transform and inv_trans should be vectorized. Convenience functions log_pad and exp_pad are provided (these are inverses), as well as logit_pad and sigmd_pad (these are inverses).
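For instance, the count-data transform mentioned above can be written directly as a vectorized pair (a minimal sketch with illustrative names; the packaged log_pad and exp_pad helpers play a similar role):

```r
# A transform/inverse pair for count data, per the example above.
count_trans <- function(y) log(1 + y)
count_inv   <- function(x) exp(x) - 1

y <- c(0, 3, 10)
all.equal(count_inv(count_trans(y)), y)  # TRUE: they are inverses
```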
jitter: Function for applying random jitter to y, which might help optimization. For example: for count data, there can be lots of ties (with or without transformation of y), which can make optimization more difficult. The function jitter should take an integer n and return n random draws. A convenience function unif_jitter is provided.
verbose: Should progress be printed out to the console? Default is FALSE.
A list with the following components:
Matrix of generalized lasso coefficients, of dimension (number of features + 1) x (number of quantile levels) if intercept=TRUE, else (number of features) x (number of quantile levels). Note that these coefficients are always on the scale of the original features, even if standardize=TRUE.
Vector of status flags returned by Gurobi's or GLPK's LP solver, of length = (number of quantile levels)
Vectors of tau and lambda values used
Values of these other arguments used in the function call
This function solves the quantile generalized lasso problem, for each pair of quantile level \(\tau\) and tuning parameter \(\lambda\): $$\mathop{\mathrm{minimize}}_{\beta_0,\beta} \; \sum_{i=1}^n w_i \psi_\tau(y_i-\beta_0-x_i^T\beta) + \lambda \|D\beta\|_1$$ for a response vector \(y\) with components \(y_i\), predictor matrix \(X\) with rows \(x_i\), and penalty matrix \(D\). Here \(\psi_\tau(v) = \max\{\tau v, (\tau-1) v\}\) is the "pinball" or "tilted \(\ell_1\)" loss. When noncrossing constraints are applied, we instead solve one big joint optimization, over all quantile levels and tuning parameter values: $$\mathop{\mathrm{minimize}}_{\beta_{0k}, \beta_k, k=1,\ldots,r} \; \sum_{k=1}^r \bigg(\sum_{i=1}^n w_i \psi_{\tau_k}(y_i-\beta_{0k}- x_i^T\beta_k) + \lambda_k \|D\beta_k\|_1\bigg)$$ $$\mathrm{subject \; to} \;\; \beta_{0k}+x^T\beta_k \leq \beta_{0,k+1}+x^T\beta_{k+1} \;\; k=1,\ldots,r-1, \; x \in \mathcal{X}$$ where the quantile levels \(\tau_k, k=1,\ldots,r\) are assumed to be in increasing order, and \(\mathcal{X}\) is a collection of points over which to enforce the noncrossing constraints.
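The pinball loss in the display above is straightforward to compute; a minimal vectorized sketch:

```r
# Pinball ("tilted l1") loss: psi_tau(v) = max(tau * v, (tau - 1) * v),
# vectorized over residuals v.
pinball <- function(v, tau) pmax(tau * v, (tau - 1) * v)

pinball(c(-1, 0, 2), tau = 0.9)  # 0.1 0.0 1.8
```

Note the asymmetry: at tau = 0.9, positive residuals (underprediction) are penalized nine times more heavily than negative ones, which is what pushes the fit toward the 0.9 quantile.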
Either problem is readily converted into a linear program (LP), and solved using either Gurobi (which is free for academic use, and generally fast) or GLPK (which is free for everyone, but slower).