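Proximal gradient methods for learning

Proximal gradient (forward-backward splitting) methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is $\ell _{1}$ regularization (also known as Lasso) of the form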

\min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},\quad {\text{ where }}x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .

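Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.^[1]^[2] Such customized penalties can help to induce certain structure in problem solutions, such as sparsity (in the case of lasso) or group structure (in the case of group lasso).

Lasso regularization

Consider the regularized empirical risk minimization problem with square loss and with the $\ell _{1}$ norm as the regularization penalty: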

\min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{1},

where $x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R} .$ The $\ell _{1}$ regularization problem is sometimes referred to as lasso.^[5] Such $\ell _{1}$ regularization problems are interesting because they induce sparse solutions, that is, solutions $w$ to the minimization problem have relatively few nonzero components. Lasso can be seen as a convex relaxation of the non-convex problem

\min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \|w\|_{0},

where $\|w\|_{0}$ denotes the $\ell _{0}$ "norm", which is the number of nonzero entries of the vector $w$. Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors.^[5]

Solving for the L₁ proximity operator

For simplicity we restrict our attention to the problem where $\lambda =1$. To solve the problem

\min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\|w\|_{1},

we consider our objective function in two parts: a convex, differentiable term $F(w)={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}$ and a convex function $R(w)=\|w\|_{1}$. Note that $R$ is not strictly convex.
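The two parts of the objective can be written down directly. The following is a minimal NumPy sketch, assuming the samples $x_{i}$ are stored as the rows of a matrix `X`; the function names and the synthetic data are illustrative and do not come from the cited sources. The gradient used is $\nabla F(w)=-{\frac {2}{n}}X^{\top }(y-Xw)$, which follows from differentiating the square loss.

```python
import numpy as np

def smooth_loss(w, X, y):
    """F(w) = (1/n) * sum_i (y_i - <w, x_i>)^2, where the rows of X are the x_i."""
    residual = y - X @ w
    return float(residual @ residual) / X.shape[0]

def smooth_loss_grad(w, X, y):
    """Gradient of F: -(2/n) * X^T (y - X w)."""
    return -2.0 / X.shape[0] * (X.T @ (y - X @ w))

def l1_penalty(w):
    """R(w) = ||w||_1, convex but not differentiable at zero entries."""
    return float(np.sum(np.abs(w)))

# Tiny synthetic example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
w = rng.standard_normal(5)
objective = smooth_loss(w, X, y) + l1_penalty(w)  # F(w) + R(w) with lambda = 1
```

Only `smooth_loss_grad` and a proximity operator for the penalty are needed by the iterations discussed in the remainder of the article.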

Let us compute the proximity operator for $R(w)$. First we find an alternative characterization of the proximity operator $\operatorname {prox} _{R}(x)$ as follows:

${\begin{aligned}u=\operatorname {prox} _{R}(x)\iff &0\in \partial \left(R(u)+{\frac {1}{2}}\|u-x\|_{2}^{2}\right)\\\iff &0\in \partial R(u)+u-x\\\iff &x-u\in \partial R(u).\end{aligned}}$

For $R(w)=\|w\|_{1}$ it is easy to compute $\partial R(w)$: the $i$th entry of $\partial R(w)$ is precisely

\partial |w_{i}|={\begin{cases}1,&w_{i}>0\\-1,&w_{i}<0\\\left[-1,1\right],&w_{i}=0.\end{cases}}

Using the recharacterization of the proximity operator given above, for the choice of $R(w)=\|w\|_{1}$ and $\gamma >0$ we have that $\operatorname {prox} _{\gamma R}(x)$ is defined entrywise by

\left(\operatorname {prox} _{\gamma R}(x)\right)_{i}={\begin{cases}x_{i}-\gamma ,&x_{i}>\gamma \\0,&|x_{i}|\leq \gamma \\x_{i}+\gamma ,&x_{i}<-\gamma ,\end{cases}}

which is known as the soft thresholding operator $S_{\gamma }(x)=\operatorname {prox} _{\gamma \|\cdot \|_{1}}(x)$.^[1]^[6]
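The three cases of the soft thresholding operator collapse into a single vectorized expression, $\operatorname {sign} (x_{i})\max(|x_{i}|-\gamma ,0)$. The snippet below is a small sketch of this in NumPy; the function name `soft_threshold` and the sample vector are chosen here for illustration.

```python
import numpy as np

def soft_threshold(x, gamma):
    """Entrywise S_gamma(x): shrink each entry toward zero by gamma,
    and set entries with |x_i| <= gamma exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

x = np.array([3.0, 0.5, -0.2, -2.0])
gamma = 1.0
u = soft_threshold(x, gamma)  # large entries move by gamma, small ones vanish

# Optimality check from the characterization x - u in gamma * dR(u):
# where u_i != 0 the difference equals gamma * sign(u_i),
# where u_i == 0 the difference lies in [-gamma, gamma].
diff = x - u
nonzero = u != 0
condition_holds = bool(
    np.allclose(diff[nonzero], gamma * np.sign(u[nonzero]))
    and np.all(np.abs(diff[~nonzero]) <= gamma)
)
```

The exact zeros produced for small entries are what makes the resulting iterates sparse.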

Fixed point iterative schemes

Finally, to solve the lasso problem we consider the fixed point equation satisfied by any minimizer $x^{*}$ of $F+R$:

x^{*}=\operatorname {prox} _{\gamma R}\left(x^{*}-\gamma \nabla F(x^{*})\right).

Given that we have computed the form of the proximity operator explicitly, we can define a standard fixed point iteration procedure. Namely, fix some initial $w^{0}\in \mathbb {R} ^{d}$, and for $k=0,1,2,\ldots$ define

w^{k+1}=S_{\gamma }\left(w^{k}-\gamma \nabla F\left(w^{k}\right)\right).

Note here the effective trade-off between the empirical error term $F(w)$ and the regularization penalty $R(w)$. This fixed point method has decoupled the effect of the two different convex functions which comprise the objective function into a gradient descent step ($w^{k}-\gamma \nabla F\left(w^{k}\right)$) and a soft thresholding step (via $S_{\gamma }$).
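The iteration above can be sketched in a few lines of NumPy. This is an illustrative implementation rather than a reference one: it assumes a constant step size $\gamma =1/L$, where $L={\frac {2}{n}}\sigma _{\max }(X)^{2}$ is the Lipschitz constant of $\nabla F$ for the square loss, and it allows a general $\lambda$ by thresholding at $\gamma \lambda$ (the text fixes $\lambda =1$). The function names and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def lasso_objective(w, X, y, lam):
    return float(np.sum((y - X @ w) ** 2)) / X.shape[0] + lam * float(np.sum(np.abs(w)))

def ista_lasso(X, y, lam=1.0, n_iter=500):
    """Fixed point iteration w <- S_{gamma*lam}(w - gamma * grad F(w))."""
    n, d = X.shape
    # Assumed step size: gamma = 1/L with L = (2/n) * sigma_max(X)^2.
    gamma = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = -2.0 / n * (X.T @ (y - X @ w))                # gradient descent direction
        w = soft_threshold(w - gamma * grad, gamma * lam)    # soft thresholding step
    return w

# Synthetic sparse regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 0.1
w_hat = ista_lasso(X, y, lam=lam)
```

With this step size each iteration does not increase the objective, so `lasso_objective` can be logged inside the loop as a simple sanity check.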

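Convergence of this fixed point scheme is well-studied in the literature^[1]^[6] and is guaranteed under appropriate choice of step size $\gamma$ and loss function (such as the square loss taken here). Accelerated methods were introduced by Nesterov which improve the rate of convergence under certain regularity assumptions on $F$.^[7] Such methods have been studied extensively in previous years.^[8] For more general learning problems where the proximity operator cannot be computed explicitly for some regularization term $R$, such fixed point schemes can still be carried out using approximations to both the gradient and the proximity operator.^[4]^[9]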
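As an illustration of acceleration, the sketch below adds a Nesterov-style extrapolation step to the soft thresholding iteration. This particular form is commonly known as FISTA; the article does not name a specific accelerated scheme, so the choice of variant, the momentum sequence `t`, the step size $\gamma =1/L$ and all names here are assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def fista_lasso(X, y, lam=1.0, n_iter=500):
    """Accelerated proximal gradient sketch for the lasso objective."""
    n, d = X.shape
    gamma = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # assumed step size 1/L
    w = np.zeros(d)
    z = w.copy()   # extrapolated point at which the gradient is evaluated
    t = 1.0        # momentum parameter
    for _ in range(n_iter):
        grad = -2.0 / n * (X.T @ (y - X @ z))
        w_next = soft_threshold(z - gamma * grad, gamma * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)  # extrapolation step
        w, t = w_next, t_next
    return w

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 0.1
w_fista = fista_lasso(X, y, lam=lam)
```

Unlike the plain iteration, the objective values of this scheme are not guaranteed to decrease monotonically from one iteration to the next.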

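Practical considerations

There have been numerous developments within the past decade in convex optimization techniques which have influenced the application of proximal gradient methods in statistical learning theory. Here we survey a few important topics which can greatly improve practical algorithmic performance of these methods.^[2]^[10]

Adaptive step size

In the fixed point iteration scheme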

w^{k+1}=\operatorname {prox} _{\gamma R}\left(w^{k}-\gamma \nabla F\left(w^{k}\right)\right),

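one can allow a variable step size $\gamma _{k}$ instead of a constant $\gamma$. Numerous adaptive step size schemes have been proposed throughout the literature.^[1]^[4]^[11]^[12] Applications of these schemes^[2]^[13] suggest that they can offer substantial improvement in the number of iterations required for fixed point convergence.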
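One simple way to obtain a variable step size $\gamma _{k}$ is a backtracking rule: start from a trial step and shrink it until a sufficient decrease condition on $F$ holds. The sketch below uses this rule for the lasso problem; it is only one possible adaptive scheme and is not necessarily the one used in the cited works. The parameters `gamma0` and `beta`, the small numerical slack and the function names are assumptions for this example.

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_grad_backtracking(X, y, lam=1.0, gamma0=1.0, beta=0.5, n_iter=500):
    """Proximal gradient for lasso with a backtracked step size gamma_k."""
    n, d = X.shape
    F = lambda w: float(np.sum((y - X @ w) ** 2)) / n
    w = np.zeros(d)
    gamma = gamma0
    steps = []
    for _ in range(n_iter):
        grad = -2.0 / n * (X.T @ (y - X @ w))
        Fw = F(w)
        while True:
            w_new = soft_threshold(w - gamma * grad, gamma * lam)
            diff = w_new - w
            # Accept gamma if F is bounded by its quadratic model at w
            # (1e-12 is slack for floating point rounding).
            if F(w_new) <= Fw + grad @ diff + diff @ diff / (2.0 * gamma) + 1e-12:
                break
            gamma *= beta  # otherwise shrink the step and try again
        w = w_new
        steps.append(gamma)
    return w, steps

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 0.1
w_bt, steps = prox_grad_backtracking(X, y, lam=lam)
```

This rule needs no prior knowledge of the Lipschitz constant of $\nabla F$; the step sizes it produces never increase, which other adaptive schemes relax.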

Elastic net (mixed norm regularization)

Elastic net regularization offers an alternative to pure $\ell _{1}$ regularization. The problem of lasso ($\ell _{1}$) regularization involves the penalty term $R(w)=\|w\|_{1}$, which is not strictly convex. Hence, solutions to $\min _{w}F(w)+R(w),$ where $F$ is some empirical loss function, need not be unique. This is often avoided by the inclusion of an additional strictly convex term, such as an $\ell _{2}$ norm regularization penalty. For example, one can consider the problem

\min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-\langle w,x_{i}\rangle )^{2}+\lambda \left((1-\mu )\|w\|_{1}+\mu \|w\|_{2}^{2}\right),

where $x_{i}\in \mathbb {R} ^{d}{\text{ and }}y_{i}\in \mathbb {R}$. For $0<\mu \leq 1$ the penalty term $\lambda \left((1-\mu )\|w\|_{1}+\mu \|w\|_{2}^{2}\right)$ is now strictly convex, and hence the minimization problem now admits a unique solution. It has been observed that for sufficiently small $\mu >0$, the additional penalty term $\mu \|w\|_{2}^{2}$ acts as a preconditioner and can substantially improve convergence while not adversely affecting the sparsity of the solution.^[2]^[14]
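The elastic net penalty also has a closed-form proximity operator. Writing $a=\gamma \lambda (1-\mu )$ and $b=\gamma \lambda \mu$, the entrywise optimality condition $0\in a\,\partial |u_{i}|+2bu_{i}+u_{i}-x_{i}$ gives $u=S_{a}(x)/(1+2b)$, that is, soft thresholding followed by a uniform rescaling. The following sketch implements this formula; the function names are illustrative, and treating the $\ell _{2}$ term inside the proximity operator (rather than as part of the smooth term $F$) is a design choice made for this example.

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_elastic_net(x, gamma, lam, mu):
    """prox of gamma * lam * ((1 - mu) * ||.||_1 + mu * ||.||_2^2)."""
    shrunk = soft_threshold(x, gamma * lam * (1.0 - mu))  # l1 part
    return shrunk / (1.0 + 2.0 * gamma * lam * mu)        # l2 part rescales

def prox_objective(u, x, gamma, lam, mu):
    """The function that the proximity operator minimizes over u."""
    penalty = lam * ((1.0 - mu) * np.sum(np.abs(u)) + mu * np.sum(u ** 2))
    return float(gamma * penalty + 0.5 * np.sum((u - x) ** 2))

x = np.array([3.0, 0.5, -0.2, -2.0])
p = prox_elastic_net(x, gamma=0.5, lam=1.0, mu=0.2)
```

Setting `mu=0` recovers plain soft thresholding, so the same iteration code can serve both lasso and elastic net by swapping the proximity operator.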

Exploiting group structure

Proximal gradient methods provide a general framework which is applicable to a wide variety of problems in statistical learning theory. Certain problems in learning can often involve data which has additional structure that is known a priori. In the past several years there have been new developments which incorporate information about group structure to provide methods which are tailored to different applications. Here we survey a few such methods.

Group lasso

Group lasso is a generalization of the lasso method when features are grouped into disjoint blocks.^[15] Suppose the features are grouped into blocks $\{w_{1},\ldots ,w_{G}\}$. Here we take as a regularization penalty

R(w)=\sum _{g=1}^{G}\|w_{g}\|_{2},

which is the sum of the $\ell _{2}$ norms of the corresponding feature vectors for the different groups. A similar proximity operator analysis as above can be used to compute the proximity operator for this penalty. Where the lasso penalty has a proximity operator which is soft thresholding on each individual component, the proximity operator for the group lasso is soft thresholding on each group. For the group $w_{g}$ the proximity operator of $\lambda \gamma \left(\sum _{g=1}^{G}\|w_{g}\|_{2}\right)$ is given by

{\widetilde {S}}_{\lambda \gamma }(w_{g})={\begin{cases}w_{g}-\lambda \gamma {\frac {w_{g}}{\|w_{g}\|_{2}}},&\|w_{g}\|_{2}>\lambda \gamma \\0,&\|w_{g}\|_{2}\leq \lambda \gamma \end{cases}}

where $w_{g}$ is the $g$th group.
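The group-wise operator ${\widetilde {S}}_{\lambda \gamma }$ can be sketched as a loop over the blocks. In the example below the groups are represented as lists of coordinate indices and the threshold $\lambda \gamma$ is passed as a single number `tau`; both conventions, as well as the function name, are assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(w, groups, tau):
    """Apply block soft thresholding with threshold tau (= lambda * gamma)
    to each disjoint group of coordinates."""
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        block = w[idx]
        norm = np.linalg.norm(block)
        if norm > tau:
            # Shrink the whole block toward zero, keeping its direction.
            out[idx] = block - tau * block / norm
        # Otherwise the whole block stays at zero.
    return out

w = np.array([3.0, 4.0, 0.1, -0.2, 1.0])
groups = [[0, 1], [2, 3], [4]]
result = group_soft_threshold(w, groups, tau=1.0)
# First block (norm 5) is scaled by 1 - 1/5; the other two blocks are zeroed.
```

Entire groups are kept or discarded together, which is the group-level analogue of the entrywise sparsity produced by the lasso.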

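In contrast to lasso, the derivation of the proximity operator for group lasso relies on the Moreau decomposition. Here the proximity operator of the conjugate of the group lasso penalty becomes a projection onto the ball of a dual norm.^[2]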
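For a single group this relationship can be checked numerically. The $\ell _{2}$ norm is its own dual norm, so the Moreau decomposition gives $\operatorname {prox} _{\tau \|\cdot \|_{2}}(x)=x-P_{B(\tau )}(x)$, where $P_{B(\tau )}$ denotes the projection onto the $\ell _{2}$ ball of radius $\tau$. The sketch below compares both sides; the helper names and the test vectors are hypothetical.

```python
import numpy as np

def project_l2_ball(x, radius):
    """Euclidean projection onto the l2 ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

def block_soft_threshold(x, tau):
    """Proximity operator of tau * ||.||_2 for a single group."""
    norm = np.linalg.norm(x)
    return np.zeros_like(x) if norm <= tau else x - tau * x / norm

tau = 1.0
x_large = np.array([3.0, 4.0])   # norm 5, outside the ball
x_small = np.array([0.3, -0.4])  # norm 0.5, inside the ball

# Moreau decomposition: prox of the norm = identity minus projection.
agree_large = bool(np.allclose(block_soft_threshold(x_large, tau),
                               x_large - project_l2_ball(x_large, tau)))
agree_small = bool(np.allclose(block_soft_threshold(x_small, tau),
                               x_small - project_l2_ball(x_small, tau)))
```

The same identity is what makes proximity operators of other norms computable whenever projection onto the corresponding dual ball is available.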

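Other group structures

In contrast to the group lasso problem, where features are grouped into disjoint blocks, it may be the case that grouped features are overlapping or have a nested structure. Such generalizations of group lasso have been considered in a variety of contexts.^[16]^[17]^[18]^[19] For overlapping groups one common approach is known as latent group lasso, which introduces latent variables to account for overlap.^[20]^[21] Nested group structures are studied in hierarchical structure prediction and with directed acyclic graphs.^[18]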

References

[1] Combettes, Patrick L.; Wajs, Valérie R. (2005). "Signal Recovering by Proximal Forward-Backward Splitting". Multiscale Model. Simul. 4 (4): 1168–1200. doi:10.1137/050626090. S2CID 15064954.
[2] Mosci, S.; Rosasco, L.; Matteo, S.; Verri, A.; Villa, S. (2010). "Solving Structured Sparsity Regularization with Proximal Methods". Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science. 6322: 418–433. doi:10.1007/978-3-642-15883-4_27. ISBN 978-3-642-15882-7.
[3] Moreau, J.-J. (1962). "Fonctions convexes duales et points proximaux dans un espace hilbertien". Comptes Rendus de l'Académie des Sciences, Série A. 255: 2897–2899. MR 0144188. Zbl 0118.10502.
[4] Bauschke, H.H.; Combettes, P.L. (2011). Convex analysis and monotone operator theory in Hilbert spaces. Springer.
[5] Tibshirani, R. (1996). "Regression shrinkage and selection via the lasso". J. R. Stat. Soc. Ser. B. 58 (1): 267–288.
[6] Daubechies, I.; Defrise, M.; De Mol, C. (2004). "An iterative thresholding algorithm for linear inverse problem with a sparsity constraint". Comm. Pure Appl. Math. 57 (11): 1413–1457. arXiv:math/0307152. doi:10.1002/cpa.20042. S2CID 1438417.
[7] Nesterov, Yurii (1983). "A method of solving a convex programming problem with convergence rate $O(1/k^{2})$". Soviet Mathematics - Doklady. 27 (2): 372–376.
[8] Nesterov, Yurii (2004). Introductory Lectures on Convex Optimization. Kluwer Academic Publisher.
[9] Villa, S.; Salzo, S.; Baldassarre, L.; Verri, A. (2013). "Accelerated and inexact forward-backward algorithms". SIAM J. Optim. 23 (3): 1607–1633. CiteSeerX 10.1.1.416.3633. doi:10.1137/110844805.
[10] Bach, F.; Jenatton, R.; Mairal, J.; Obozinski, Gl. (2011). "Optimization with sparsity-inducing penalties". Foundations and Trends in Machine Learning. 4 (1): 1–106. arXiv:1108.0775. Bibcode:2011arXiv1108.0775B. doi:10.1561/2200000015. S2CID 56356708.
[11] Loris, I.; Bertero, M.; De Mol, C.; Zanella, R.; Zanni, L. (2009). "Accelerating gradient projection methods for $\ell _{1}$-constrained signal recovery by steplength selection rules". Applied & Comp. Harmonic Analysis. 27 (2): 247–254. arXiv:0902.4424. doi:10.1016/j.acha.2009.02.003. S2CID 18093882.
[12] Wright, S.J.; Nowak, R.D.; Figueiredo, M.A.T. (2009). "Sparse reconstruction by separable approximation". IEEE Trans. Signal Process. 57 (7): 2479–2493. Bibcode:2009ITSP...57.2479W. CiteSeerX 10.1.1.115.9334. doi:10.1109/TSP.2009.2016892.
[13] Loris, Ignace (2009). "On the performance of algorithms for the minimization of $\ell _{1}$-penalized functionals". Inverse Problems. 25 (3): 035008. arXiv:0710.4082. Bibcode:2009InvPr..25c5008L. doi:10.1088/0266-5611/25/3/035008. S2CID 14213443.
[14] De Mol, C.; De Vito, E.; Rosasco, L. (2009). "Elastic-net regularization in learning theory". J. Complexity. 25 (2): 201–230. arXiv:0807.3423. doi:10.1016/j.jco.2009.01.002. S2CID 7167292.
[15] Yuan, M.; Lin, Y. (2006). "Model selection and estimation in regression with grouped variables". J. R. Stat. Soc. B. 68 (1): 49–67. doi:10.1111/j.1467-9868.2005.00532.x. S2CID 6162124.
[16] Chen, X.; Lin, Q.; Kim, S.; Carbonell, J.G.; Xing, E.P. (2012). "Smoothing proximal gradient method for general structured sparse regression". Ann. Appl. Stat. 6 (2): 719–752. arXiv:1005.4717. doi:10.1214/11-AOAS514. S2CID 870800.
[17] Mosci, S.; Villa, S.; Verri, A.; Rosasco, L. (2010). "A primal-dual algorithm for group sparse regularization with overlapping groups". NIPS. 23: 2604–2612.
[18] Jenatton, R.; Audibert, J.-Y.; Bach, F. (2011). "Structured variable selection with sparsity-inducing norms". J. Mach. Learn. Res. 12: 2777–2824. arXiv:0904.3523. Bibcode:2009arXiv0904.3523J.
[19] Zhao, P.; Rocha, G.; Yu, B. (2009). "The composite absolute penalties family for grouped and hierarchical variable selection". Ann. Stat. 37 (6A): 3468–3497. arXiv:0909.0411. Bibcode:2009arXiv0909.0411Z. doi:10.1214/07-AOS584. S2CID 9319285.
[20] Obozinski, Guillaume; Jacob, Laurent; Vert, Jean-Philippe (2011). "Group Lasso with Overlaps: The Latent Group Lasso approach". arXiv:1110.0413 [stat.ML].
[21] Villa, Silvia; Rosasco, Lorenzo; Mosci, Sofia; Verri, Alessandro (2012). "Proximal methods for the latent group lasso penalty". arXiv:1209.0368 [math.OC].