\[ \newcommand{\argmin}{\mathop{\mathrm{argmin}}} \newcommand{\argmax}{\mathop{\mathrm{argmax}}} \newcommand{\minimize}{\mathop{\mathrm{minimize}}} \newcommand{\st}{\mathop{\mathrm{subject\,\,to}}} \]
Consider the problem \[ \minimize_{\beta_0,\beta} \; \sum_{i=1}^n \psi_\tau(y_i - \beta_0 - x_i^T \beta) + \lambda \|\beta\|_1 \] where \[ \psi_\tau(v) = \max\{\tau v, (\tau-1) v\}, \] often called the “pinball” or “tilted \(\ell_1\)” loss, for a quantile level \(\tau \in (0,1)\). With multiple quantile levels \(\tau_k\), \(k=1,\ldots,r\), we can simply solve the quantile lasso problem separately for each \(\tau_k\).
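As a quick sanity check on the pinball loss: minimizing the sum of pinball losses over a constant recovers an empirical \(\tau\)-quantile of the data. A language-agnostic sketch (in Python for brevity; the vignette code itself is R, and the numbers here are toy data):

```python
def pinball(v, tau):
    # tilted absolute loss: tau*v for v >= 0, (tau-1)*v for v < 0
    return max(tau * v, (tau - 1) * v)

ys = list(range(1, 11))  # toy data: 1, 2, ..., 10
tau = 0.8
# best constant fit under the pinball loss, searched over the data points;
# the minimizer sits at an empirical 0.8-quantile of the data
best = min(ys, key=lambda b: sum(pinball(y - b, tau) for y in ys))
```

With \(\tau = 0.5\) the same search recovers a median, since the loss reduces to half the absolute error.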
Suppose we have multiple quantile levels \(\tau_k\), \(k=1,\ldots,r\) of interest, and we allow one tuning parameter per quantile level, \(\lambda_k\), \(k=1,\ldots,r\). We seek to minimize the cross-validation (CV) error over the choices of tuning parameters, \[ \sum_{k=1}^r \sum_{i=1}^n \psi_{\tau_k}\Big(y_i - \hat\beta_0(D_{-i}; \tau_k, \lambda_k) - x_i^T \hat\beta(D_{-i}; \tau_k, \lambda_k)\Big), \] where \(\hat\beta_0(D; \tau, \lambda)\), \(\hat\beta(D; \tau, \lambda)\) denote the quantile lasso solution fit on a data set \(D\), at quantile level \(\tau\) and tuning parameter value \(\lambda\). Above, for each \(i=1,\ldots,n\), we use \(D_{-i}\) to denote the CV training fold used for point \(i\). Since the \(k\)th summand depends only on \(\lambda_k\), we can do this simply by optimizing separately, for each \(k=1,\ldots,r\), the CV error \[ \sum_{i=1}^n \psi_{\tau_k}\Big(y_i - \hat\beta_0(D_{-i}; \tau_k, \lambda_k) - x_i^T \hat\beta(D_{-i}; \tau_k, \lambda_k)\Big), \] over \(\lambda_k\). This flexibility, allowing each quantile level its own tuning parameter value, is thus both statistically and computationally favorable.
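The separability claim is easy to verify directly: the total CV error is a sum over \(k\) of terms that each depend only on \(\lambda_k\), so the joint minimum over \((\lambda_1,\ldots,\lambda_r)\) equals the sum of the per-level minima. A small numerical check in Python, on a hypothetical grid of CV errors (any nonnegative matrix works):

```python
import random
from itertools import product

random.seed(0)
r, nlambda = 3, 5
# err[k][j]: CV error for quantile level k at the j-th tuning parameter value
err = [[random.random() for _ in range(nlambda)] for _ in range(r)]

# brute-force joint search over all r-tuples of tuning parameter indices
joint = min(sum(err[k][j] for k, j in enumerate(idx))
            for idx in product(range(nlambda), repeat=r))
# separate per-level minimization
separate = sum(min(row) for row in err)
```

The brute-force search costs `nlambda**r` evaluations while the separate searches cost `nlambda * r`, which is the computational half of the claim above.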
We show a simple example of CV with Gaussian regression data. We also show how to extrapolate to a new set of quantile levels, and how to refit at a new set of quantile levels.
library(quantgen)
set.seed(33)
n = 500
p = 50
x = matrix(rnorm(n*p), n, p)
mu = function(x) x[1] + x[2]
y = apply(x, 1, mu) + rnorm(n)
# Run CV, over just a few quantile levels
tau = c(0.1, 0.3, 0.5, 0.7, 0.9)
cv_obj = cv_quantile_lasso(x, y, tau=tau, nlambda=30, nfolds=5, lp_solver="gurobi", verbose=TRUE, sort=TRUE)
## CV fold 1 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 2 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 3 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 4 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 5 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## Computing CV errors and optimum lambdas ...
## Refitting on full training set with optimum lambdas ...
## Problems solved (of 5): 1 ... 2 ... 3 ... 4 ... 5 ...
plot(cv_obj)
# Refit at new quantile levels
tau_new = c(0.01, 0.025, seq(0.05, 0.95, by=0.05), 0.975, 0.99)
new_obj = refit_quantile_lasso(cv_obj, x, y, tau_new, lp_solver="gurobi", verbose=TRUE)
## Problems solved (of 23): 5 ... 10 ... 15 ... 20 ...
# Predicted and extrapolated quantiles at a few values of x
par(mfrow=c(1,3))
for (i in 1:9) {
x0 = matrix(rnorm(p), nrow=1)
qtrue = qnorm(tau_new, mu(x0))
qpred1 = predict(cv_obj, x0, sort=TRUE)
qextr1 = quantile_extrapolate(tau, qpred1, tau_new, qfun_left=qnorm, qfun_right=qnorm)
qpred2 = predict(new_obj, x0, sort=TRUE)
plot(tau_new, qtrue, type="o", ylim=range(qtrue, qextr1, qpred2, na.rm=TRUE), ylab="Quantile")
lines(tau_new, qextr1, col=2, pch=20, type="o")
points(tau, qpred1, col=4, cex=1.5, lwd=2)
lines(tau_new, qpred2, col=3, pch=20, type="o")
legend("topleft", legend=c("True", "Predicted", "Extrapolated", "Predicted (refit)"),
col=c(1,4,2,3), pch=c(21,21,20,20))
}
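The exact extrapolation scheme used by quantile_extrapolate is described in the package documentation; the rough idea behind passing qfun_left=qnorm is to fit a parametric (here Gaussian) tail through an outermost predicted quantile, then read more extreme quantiles off that fitted tail. A hedged Python sketch of one such scheme, with a unit scale assumed and hypothetical numbers:

```python
from statistics import NormalDist

nd = NormalDist()          # standard normal; unit scale assumed for simplicity
tau_lo, q_lo = 0.1, -0.8   # hypothetical: leftmost trained level and its predicted quantile
# choose the location m so that the N(m, 1) tau_lo-quantile matches q_lo
m = q_lo - nd.inv_cdf(tau_lo)
# read off a more extreme quantile from the fitted tail
q001 = m + nd.inv_cdf(0.01)
```

This is a sketch of the idea only, not the package's implementation, which may also fit a scale and handles the right tail via qfun_right analogously.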
We now show another simple example of CV, this time with Poisson regression data. Along the way, we demonstrate how to use the built-in transform (and inverse transform) functionality. And again we show how to extrapolate to a new set of quantile levels, and how to refit at a new set of quantile levels.
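The reason a transform pair like log_pad() / exp_pad() is sensible for quantile estimation is that quantiles are equivariant under increasing transforms: for increasing \(g\), the \(\tau\)-quantile of \(g(Y)\) is \(g\) applied to the \(\tau\)-quantile of \(Y\), so fitting on the transformed scale and inverting loses nothing in principle. A minimal Python check, assuming \(g(y) = \log(y+1)\) for illustration (the actual padding used by log_pad() may differ):

```python
import math

def emp_quantile(vals, tau):
    # lower empirical quantile: smallest value with empirical CDF >= tau
    s = sorted(vals)
    k = max(math.ceil(tau * len(s)) - 1, 0)
    return s[k]

g = lambda y: math.log(y + 1.0)      # assumed transform
ginv = lambda z: math.exp(z) - 1.0   # its inverse

ys = [0, 1, 2, 4, 9, 19]
tau = 0.5
# quantile computed on the transformed scale, mapped back, matches the raw quantile
lhs = ginv(emp_quantile([g(y) for y in ys], tau))
rhs = emp_quantile(ys, tau)
```

The same equivariance does not hold for the mean, which is why transforming is a more delicate business in least squares regression than in quantile regression.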
n = 500
p = 50
x = matrix(rnorm(n*p), n, p)
mu = function(x) x[1] + x[2]
y = rpois(n, exp(apply(x, 1, mu)))
# Run CV, over just a few quantile levels
tau = c(0.1, 0.3, 0.5, 0.7, 0.9)
cv_obj1 = cv_quantile_lasso(x, y, tau=tau, nlambda=30, nfolds=5, lp_solver="gurobi", verbose=TRUE, sort=TRUE)
## CV fold 1 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 2 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 3 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 4 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 5 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## Computing CV errors and optimum lambdas ...
## Refitting on full training set with optimum lambdas ...
## Problems solved (of 5): 1 ... 2 ... 3 ... 4 ... 5 ...
cv_obj2 = cv_quantile_lasso(x, y, tau=tau, nlambda=30, nfolds=5, lp_solver="gurobi", verbose=TRUE, sort=TRUE,
transform=log_pad(), inv_trans=exp_pad())
## CV fold 1 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 2 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 3 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 4 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## CV fold 5 ...
## Problems solved (of 150): 5 ... 10 ... 15 ... 20 ... 25 ... 30 ... 35 ... 40 ... 45 ... 50 ... 55 ... 60 ... 65 ... 70 ... 75 ... 80 ... 85 ... 90 ... 95 ... 100 ... 105 ... 110 ... 115 ... 120 ... 125 ... 130 ... 135 ... 140 ... 145 ... 150 ...
## Computing CV errors and optimum lambdas ...
## Refitting on full training set with optimum lambdas ...
## Problems solved (of 5): 1 ... 2 ... 3 ... 4 ... 5 ...
plot(cv_obj1)
plot(cv_obj2)
# Refit at new quantile levels
tau_new = c(0.01, 0.025, seq(0.05, 0.95, by=0.05), 0.975, 0.99)
new_obj1 = refit_quantile_lasso(cv_obj1, x, y, tau_new, lp_solver="gurobi", verbose=TRUE)
## Problems solved (of 23): 5 ... 10 ... 15 ... 20 ...
new_obj2 = refit_quantile_lasso(cv_obj2, x, y, tau_new, lp_solver="gurobi", verbose=TRUE)
## Problems solved (of 23): 5 ... 10 ... 15 ... 20 ...
# Predicted and extrapolated quantiles at a few values of x
par(mfrow=c(1,3))
for (i in 1:9) {
x0 = matrix(rnorm(p), nrow=1)
qtrue = qpois(tau_new, exp(mu(x0)))
qpred1 = predict(cv_obj1, x0, sort=TRUE, nonneg=TRUE, round=TRUE)
qextr1 = quantile_extrapolate(tau, qpred1, tau_new, qfun_left=qpois, qfun_right=qpois, nonneg=TRUE, round=TRUE)
qpred2 = predict(cv_obj2, x0, sort=TRUE, nonneg=TRUE, round=TRUE)
qextr2 = quantile_extrapolate(tau, qpred2, tau_new, qfun_left=qpois, qfun_right=qpois, nonneg=TRUE, round=TRUE)
plot(tau_new, qtrue, type="o", ylim=range(qtrue, qextr1, qextr2, na.rm=TRUE), ylab="Quantile")
lines(tau_new, qextr1, col=2, pch=20, type="o")
points(tau, qpred1, col=4, cex=1.5, lwd=2)
lines(tau_new, qextr2, col=3, pch=20, type="o")
points(tau, qpred2, col=5, cex=1.5, lwd=2)
legend("topleft", legend=c("True", "Predicted", "Extrapolated", "Predicted (log)", "Extrapolated (log)"),
col=c(1,4,2,5,3), pch=c(21,21,20,21,20))
}
# Refitted versions
par(mfrow=c(1,3))
for (i in 1:9) {
x0 = matrix(rnorm(p), nrow=1)
qtrue = qpois(tau_new, exp(mu(x0)))
qpred3 = predict(new_obj1, x0, sort=TRUE, nonneg=TRUE, round=TRUE)
qpred4 = predict(new_obj2, x0, sort=TRUE, nonneg=TRUE, round=TRUE)
plot(tau_new, qtrue, type="o", ylim=range(qtrue, qpred3, qpred4, na.rm=TRUE), ylab="Quantile")
lines(tau_new, qpred3, col=3, pch=20, type="o")
lines(tau_new, qpred4, col=6, pch=20, type="o")
legend("topleft", legend=c("True", "Predicted (refit)", "Predicted (log, refit)"), col=c(1,3,6), pch=c(21,20,20))
}
We refit the last CV object at new quantile levels, and additionally enforce noncrossing constraints. (Note that these noncrossing constraints can only be applied when the values in the tau vector being passed to quantile_genlasso() are distinct and sorted in increasing order.)
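For context on the comparison below: sort=TRUE repairs crossings after the fact by rearranging each row of predictions into increasing order, while noncross=TRUE builds the monotonicity requirement into the optimization itself, so that the fitted quantiles cannot cross at the supplied x0 points. The post-hoc rearrangement step is just a per-row sort, illustrated here in Python with hypothetical crossed predictions:

```python
# hypothetical predictions at increasing tau levels, with two crossings
qpred = [0.9, 1.2, 1.1, 1.8, 1.7]
# post-hoc rearrangement, as with sort=TRUE
qsorted = sorted(qpred)
# now monotone in tau
assert all(a <= b for a, b in zip(qsorted, qsorted[1:]))
```

Rearrangement is cheap and always available, but it only fixes the predictions at hand; the in-fit constraints shape the coefficients themselves, which is why the two can give different answers.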
n0 = 9
x0 = matrix(rnorm(n0*p), n0, p)
new_obj3 = refit_quantile_lasso(cv_obj2, x, y, tau_new, noncross=TRUE, x0=rbind(x,x0), lp_solver="gurobi", verbose=TRUE)
## Gurobi Optimizer version 9.0.3 build v9.0.3rc0 (mac64)
## Optimize a model with 36498 rows, 13823 columns and 2342796 nonzeros
## Model fingerprint: 0xb85df12d
## Coefficient statistics:
## Matrix range [2e-07, 5e+00]
## Objective range [1e+00, 1e+02]
## Bounds range [0e+00, 0e+00]
## RHS range [7e-03, 4e+00]
##
## Concurrent LP optimizer: dual simplex and barrier
## Showing barrier log only...
##
## Presolve time: 0.90s
## Presolved: 13823 rows, 36498 columns, 2342796 nonzeros
##
## Ordering time: 0.02s
##
## Barrier statistics:
## AA' NZ : 6.742e+05
## Factor NZ : 8.242e+05 (roughly 27 MBytes of memory)
## Factor Ops : 5.457e+07 (less than 1 second per iteration)
## Threads : 3
##
## Objective Residual
## Iter Primal Dual Primal Dual Compl Time
## 0 5.11386905e-11 1.78987380e-12 8.86e+03 7.55e-01 2.29e+01 1s
## 1 2.50619529e+02 2.65041033e+04 2.59e+03 1.45e-01 6.32e+00 1s
## 2 2.93521613e+02 2.49830558e+04 2.15e+02 3.70e-02 9.83e-01 2s
## 3 2.62550077e+02 6.35818607e+03 1.30e+01 5.97e-03 1.39e-01 2s
## 4 5.69897202e+02 4.08676364e+03 1.37e+00 1.91e-03 7.43e-02 2s
## 5 7.66511816e+02 3.20090821e+03 7.79e-01 9.78e-04 5.12e-02 2s
## 6 1.03374492e+03 2.80176317e+03 1.54e-01 6.93e-04 3.70e-02 2s
## 7 1.07745632e+03 2.35425452e+03 1.28e-01 4.87e-04 2.67e-02 2s
## 8 1.29462351e+03 1.89165357e+03 1.64e-02 2.11e-04 1.25e-02 2s
## 9 1.39295604e+03 1.69176980e+03 4.23e-03 1.70e-04 6.24e-03 2s
## 10 1.43846767e+03 1.57130530e+03 1.53e-03 4.60e-05 2.77e-03 2s
## 11 1.45740968e+03 1.53786442e+03 8.29e-04 2.80e-05 1.68e-03 2s
## 12 1.46998339e+03 1.51390834e+03 4.37e-04 1.40e-05 9.16e-04 2s
## 13 1.47920893e+03 1.50113629e+03 1.79e-04 6.73e-06 4.57e-04 2s
## 14 1.48460210e+03 1.49478326e+03 4.97e-05 3.49e-06 2.13e-04 2s
## 15 1.48602815e+03 1.49118745e+03 2.28e-05 1.75e-06 1.08e-04 2s
## 16 1.48650097e+03 1.48940933e+03 1.45e-05 8.94e-07 6.07e-05 2s
## 17 1.48691744e+03 1.48839232e+03 6.82e-06 4.11e-07 3.07e-05 2s
## 18 1.48727740e+03 1.48777890e+03 1.28e-06 1.46e-07 1.05e-05 2s
## 19 1.48735425e+03 1.48747691e+03 1.49e-07 3.45e-08 2.56e-06 2s
## 20 1.48736665e+03 1.48741483e+03 4.07e-08 1.30e-08 1.01e-06 2s
## 21 1.48736969e+03 1.48739075e+03 1.65e-08 5.15e-09 4.39e-07 2s
## 22 1.48737212e+03 1.48737748e+03 4.16e-09 1.12e-09 1.12e-07 2s
## 23 1.48737283e+03 1.48737461e+03 1.42e-09 3.16e-10 3.70e-08 2s
## 24 1.48737303e+03 1.48737331e+03 7.19e-10 1.87e-11 5.88e-09 2s
## 25 1.48737321e+03 1.48737324e+03 1.91e-09 4.09e-12 6.22e-10 2s
## 26 1.48737321e+03 1.48737321e+03 1.53e-08 8.79e-14 3.78e-11 2s
##
## Barrier solved model in 26 iterations and 2.40 seconds
## Optimal objective 1.48737321e+03
##
## Crossover log...
##
## 0 DPushes remaining with DInf 3.4983776e+02 3s
##
## 2826 PPushes remaining with PInf 4.1810695e-05 3s
## 0 PPushes remaining with PInf 0.0000000e+00 3s
##
## Push phase complete: Pinf 0.0000000e+00, Dinf 1.2676966e-12 3s
##
## Iteration Objective Primal Inf. Dual Inf. Time
## 2876 1.4873732e+03 0.000000e+00 0.000000e+00 3s
##
## Solved with barrier
## Solved in 2876 iterations and 4.76 seconds
## Optimal objective 1.487373212e+03
qpred1 = predict(new_obj2, x0, sort=FALSE, nonneg=TRUE, round=TRUE)
qpred2 = predict(new_obj2, x0, sort=TRUE, nonneg=TRUE, round=TRUE)
qpred3 = predict(new_obj3, x0, sort=FALSE, nonneg=TRUE, round=TRUE) # Should be sorted already
par(mfrow=c(1,3))
for (i in 1:9) {
qtrue = qpois(tau_new, exp(mu(x0[i,])))
plot(tau_new, qtrue, type="o", ylim=range(qtrue, qpred1[i,], qpred2[i,], qpred3[i,], na.rm=TRUE), ylab="Quantile")
lines(tau_new, qpred1[i,], col=2, pch=20, type="o")
lines(tau_new, qpred2[i,], col=3, pch=20, type="o")
lines(tau_new, qpred3[i,], col=4, pch=20, type="o")
legend("topleft", legend=c("True", "Predicted", "Predicted (sort)", "Predicted (noncross)"), col=1:4, pch=c(21,20,20,20))
}