Compute quantile lasso solutions.
quantile_lasso(
x,
y,
tau,
lambda,
weights = NULL,
no_pen_vars = c(),
intercept = TRUE,
standardize = TRUE,
lb = -Inf,
ub = Inf,
noncross = FALSE,
x0 = NULL,
lp_solver = c("glpk", "gurobi"),
time_limit = NULL,
warm_starts = TRUE,
params = list(),
transform = NULL,
inv_trans = NULL,
jitter = NULL,
verbose = FALSE
)
x: Matrix of predictors. If sparse, then passing it as an appropriate sparse Matrix class can greatly help optimization.
y: Vector of responses.
tau, lambda: Vectors of quantile levels and tuning parameter values. If these are not of the same length, the shorter of the two is recycled so that they become the same length. Then, for each i, we solve a separate quantile lasso problem at quantile level tau[i] and tuning parameter value lambda[i]. The most common use cases are: specifying one tau value and a sequence of lambda values; or specifying a sequence of tau values and one lambda value (see the sketch after this argument list).
weights: Vector of observation weights (to be used in the loss function). Default is NULL, which is interpreted as a weight of 1 for each observation.
no_pen_vars: Indices of the variables that should be excluded from the lasso penalty. Default is c(), which means that no variables are excluded.
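As a quick illustration of the recycling behavior described for tau and lambda, here is a minimal sketch on simulated data (the data, lambda values, and seed are made up for illustration; the package defining quantile_lasso is assumed to be attached):

set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# One quantile level, a sequence of tuning parameter values
fit1 <- quantile_lasso(x, y, tau = 0.9, lambda = c(1, 2, 4, 8))

# A sequence of quantile levels, one tuning parameter value (recycled)
fit2 <- quantile_lasso(x, y, tau = c(0.1, 0.5, 0.9), lambda = 2)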
A list with the following components:

beta: Matrix of lasso coefficients, of dimension (number of features + 1) x (number of quantile levels) assuming intercept=TRUE, else (number of features) x (number of quantile levels). Note: these coefficients are always returned on the scale of the original features, even if standardize=TRUE.

status: Vector of status flags returned by Gurobi's or GLPK's LP solver, of length (number of quantile levels).

tau, lambda: Vectors of tau and lambda values used.

The remaining components record the values of the other arguments used in the function call.
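Continuing the hypothetical fit2 above, the returned components can be inspected as follows (component names as listed above):

dim(fit2$beta)   # (p + 1) x 3, since intercept = TRUE by default
fit2$status      # one solver status flag per quantile level
fit2$tau         # quantile levels used (here 0.1, 0.5, 0.9)
fit2$lambda      # lambda recycled to length 3 (here 2, 2, 2)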
This function solves the quantile lasso problem, for each pair of quantile level \(\tau\) and tuning parameter \(\lambda\): $$\mathop{\mathrm{minimize}}_{\beta_0,\beta} \; \sum_{i=1}^n w_i \psi_\tau(y_i-\beta_0-x_i^T\beta) + \lambda \|\beta\|_1$$ for a response vector \(y\) with components \(y_i\), and predictor matrix \(X\) with rows \(x_i\). Here \(\psi_\tau(v) = \max\{\tau v, (\tau-1) v\}\) is the "pinball" or "tilted \(\ell_1\)" loss.

When noncrossing constraints are applied, we instead solve one big joint optimization, over all quantile levels and tuning parameter values: $$\mathop{\mathrm{minimize}}_{\beta_{0k}, \beta_k, k=1,\ldots,r} \; \sum_{k=1}^r \bigg(\sum_{i=1}^n w_i \psi_{\tau_k}(y_i-\beta_{0k}- x_i^T\beta_k) + \lambda_k \|\beta_k\|_1\bigg)$$ $$\mathrm{subject \; to} \;\; \beta_{0k}+x^T\beta_k \leq \beta_{0,k+1}+x^T\beta_{k+1}, \;\; k=1,\ldots,r-1, \; x \in \mathcal{X}$$ where the quantile levels \(\tau_k, k=1,\ldots,r\) are assumed to be in increasing order, and \(\mathcal{X}\) is a collection of points over which to enforce the noncrossing constraints.
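The pinball loss and the (unconstrained) objective are simple to write down directly; the following is an illustrative sketch, not part of the package:

# Pinball ("tilted l1") loss: psi_tau(v) = max(tau * v, (tau - 1) * v)
psi <- function(v, tau) pmax(tau * v, (tau - 1) * v)

# Weighted quantile lasso objective for a given intercept beta0 and
# coefficient vector beta (illustration only)
quantile_lasso_objective <- function(x, y, tau, lambda, beta0, beta,
                                     weights = rep(1, length(y))) {
  res <- y - beta0 - as.numeric(x %*% beta)  # residuals
  sum(weights * psi(res, tau)) + lambda * sum(abs(beta))
}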
Either problem is readily converted into a linear program (LP), and solved using either Gurobi (which is free for academic use, and generally fast) or GLPK (which is free for everyone, but slower).
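The solver is selected via lp_solver; "glpk" is the default (the first element of the lp_solver argument). A hedged sketch, on the assumption that time_limit is passed through to the chosen solver (check the solver documentation for the exact units):

fit <- quantile_lasso(x, y, tau = 0.5, lambda = 1,
                      lp_solver = "gurobi", time_limit = 60)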
All arguments not described above are as in the quantile_genlasso function. The associated coef and predict functions are just those for the quantile_genlasso class.
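A minimal sketch of using those functions on the fits above; the newx argument is assumed to follow the quantile_genlasso interface:

beta_hat <- coef(fit2)            # coefficient matrix, one column per tau
pred <- predict(fit2, newx = x)   # predicted quantiles at the given points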