LogSumExp

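The LogSumExp (LSE) function (also called RealSoftMax^[1] or multivariable softplus) is a smooth maximum, that is, a smooth approximation to the maximum function, used mainly by machine learning algorithms.^[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

\mathrm{LSE}(x_{1},\dots,x_{n})=\log\left(\exp(x_{1})+\cdots+\exp(x_{n})\right).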

Properties

The domain of the LogSumExp function is $\mathbb{R}^{n}$, the real coordinate space, and its codomain is $\mathbb{R}$, the real line. It is an approximation to the maximum $\max_{i}x_{i}$, with the following bounds:

\max\{x_{1},\dots,x_{n}\}\leq \mathrm{LSE}(x_{1},\dots,x_{n})\leq \max\{x_{1},\dots,x_{n}\}+\log(n).

The first inequality is strict unless $n=1$. The second inequality is strict unless all arguments are equal. (Proof: Let $m=\max_{i}x_{i}$. Then $\exp(m)\leq \sum_{i=1}^{n}\exp(x_{i})\leq n\exp(m)$. Applying the logarithm to the inequality gives the result.)
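The bounds above can be checked numerically. The following is a minimal sketch that evaluates the definition directly; the helper name `lse` and the sample inputs are illustrative assumptions, not part of any library.

```python
import math

def lse(xs):
    """Evaluate log(exp(x_1) + ... + exp(x_n)) directly from the definition.

    Hypothetical helper for illustration; only suitable for moderate inputs.
    """
    return math.log(sum(math.exp(x) for x in xs))

xs = [1.0, 2.0, 3.0]                    # sample arguments (assumed)
lower = max(xs)                         # max{x_i}
upper = max(xs) + math.log(len(xs))     # max{x_i} + log(n)
value = lse(xs)
print(lower, value, upper)              # value lies between the two bounds

# When all arguments are equal, the upper bound is attained (up to rounding).
equal = [2.0, 2.0, 2.0]
print(lse(equal), max(equal) + math.log(len(equal)))
```

With unequal arguments and $n>1$ both inequalities are strict, so the printed value falls strictly between the two bounds.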

In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx)$ for $t>0$. Then

\max\{x_{1},\dots,x_{n}\}\leq \frac{1}{t}\mathrm{LSE}(tx)\leq \max\{x_{1},\dots,x_{n}\}+\frac{\log(n)}{t}.

(Proof: Replace each $x_{i}$ with $tx_{i}$ for some $t>0$ in the inequalities above, to give

\max\{tx_{1},\dots,tx_{n}\}\leq \mathrm{LSE}(tx_{1},\dots,tx_{n})\leq \max\{tx_{1},\dots,tx_{n}\}+\log(n)

and, since $t>0$,

t\max\{x_{1},\dots,x_{n}\}\leq \mathrm{LSE}(tx_{1},\dots,tx_{n})\leq t\max\{x_{1},\dots,x_{n}\}+\log(n).

Finally, dividing by $t$ gives the result.)

Also, if we multiply by a negative number instead, we obtain a comparison to the $\min$ function. For $t>0$:

\min\{x_{1},\dots,x_{n}\}-\frac{\log(n)}{t}\leq \frac{1}{-t}\mathrm{LSE}(-tx)\leq \min\{x_{1},\dots,x_{n}\}.
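To illustrate how the scaled bounds tighten as $t$ grows, here is a small sketch. The names `smooth_max` and `smooth_min` are hypothetical, and the values of `t` are kept small enough that the direct evaluation does not overflow.

```python
import math

def lse(xs):
    """Direct evaluation of the definition; illustrative helper only."""
    return math.log(sum(math.exp(x) for x in xs))

def smooth_max(xs, t):
    """(1/t) * LSE(t*x) for t > 0; lies within log(n)/t above max(xs)."""
    return lse([t * x for x in xs]) / t

def smooth_min(xs, t):
    """(1/(-t)) * LSE(-t*x) for t > 0; lies within log(n)/t below min(xs)."""
    return lse([-t * x for x in xs]) / (-t)

xs = [1.0, 2.0, 3.0]                  # sample arguments (assumed)
for t in (1.0, 10.0, 100.0):
    gap = math.log(len(xs)) / t       # width of the bound, log(n)/t
    print(t, smooth_max(xs, t), smooth_min(xs, t), gap)
```

As `t` increases, the printed approximations approach `max(xs)` and `min(xs)` respectively, and the bound width `log(n)/t` shrinks toward zero.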

The LogSumExp function is convex, and is strictly increasing everywhere in its domain^[3] (but it is not strictly convex everywhere^[4]).

Writing $\mathbf{x}=(x_{1},\dots,x_{n})$, the partial derivatives are:

\frac{\partial}{\partial x_{i}}\mathrm{LSE}(\mathbf{x})=\frac{\exp x_{i}}{\sum_{j}\exp x_{j}},

which means that the gradient of LogSumExp is the softmax function.
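The relationship between the gradient and softmax can be checked with central finite differences. This is a sketch only; `softmax` and `numerical_gradient` are illustrative helpers written for this example, and the step size `h` is an assumed choice.

```python
import math

def lse(xs):
    """Direct evaluation of the definition; illustrative helper only."""
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    """exp(x_i) / sum_j exp(x_j), the claimed gradient of LSE."""
    total = sum(math.exp(x) for x in xs)
    return [math.exp(x) / total for x in xs]

def numerical_gradient(f, xs, h=1e-6):
    """Central-difference estimate of each partial derivative of f at xs."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

xs = [0.5, -1.0, 2.0]                 # sample point (assumed)
analytic = softmax(xs)
numeric = numerical_gradient(lse, xs)
for a, b in zip(analytic, numeric):
    print(a, b)                       # the two columns agree closely
```

Every softmax component is positive, which is consistent with LSE being strictly increasing in each argument, and the components sum to 1.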

The convex conjugate of LogSumExp is the negative entropy.

Log-sum-exp trick for log-domain calculations

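The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.^[5]

Just as multiplication in the linear scale becomes simple addition in the log scale, addition in the linear scale becomes the LSE in the log scale:

\mathrm{LSE}(\log(x_{1}),\dots,\log(x_{n}))=\log(x_{1}+\dots+x_{n})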
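As a small worked example of this correspondence, the sketch below stores three probabilities as logarithms and then multiplies and adds them without leaving the log scale. The probability values and variable names are assumptions chosen for illustration.

```python
import math

def lse(xs):
    """Direct evaluation of the definition; illustrative helper only."""
    return math.log(sum(math.exp(x) for x in xs))

probs = [0.2, 0.3, 0.5]                      # linear-scale values (assumed)
log_probs = [math.log(p) for p in probs]     # the same values on the log scale

# Multiplication in the linear scale is addition in the log scale.
log_product = sum(log_probs)                 # log(0.2 * 0.3 * 0.5)

# Addition in the linear scale is LSE in the log scale.
log_total = lse(log_probs)                   # log(0.2 + 0.3 + 0.5)

print(log_product, math.log(0.2 * 0.3 * 0.5))
print(log_total, math.log(sum(probs)))
```

Each printed pair agrees up to floating-point rounding.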

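A common purpose of using log-domain computations is to increase accuracy and to avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating-point numbers.^[6]

Unfortunately, using LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent form must be used instead (especially when the accuracy of the 'max' approximation above is not sufficient). Many math libraries, such as IT++, provide a default LSE routine and use this formula internally.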

\mathrm{LSE}(x_{1},\dots,x_{n})=x^{*}+\log\left(\exp(x_{1}-x^{*})+\cdots+\exp(x_{n}-x^{*})\right)

where $x^{*}=\max\{x_{1},\dots,x_{n}\}$.
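A minimal sketch of the shifted formula, compared against direct evaluation, might look as follows. The function names `lse_naive` and `lse_stable` are hypothetical, and the handling of infinite inputs is an added assumption rather than something the formula itself specifies.

```python
import math

def lse_naive(xs):
    """Direct evaluation; fails for large-magnitude inputs."""
    return math.log(sum(math.exp(x) for x in xs))

def lse_stable(xs):
    """x* + log(sum_i exp(x_i - x*)) with x* = max(xs).

    Every exponent is <= 0, so exp cannot overflow, and the term for the
    maximum is exp(0) = 1, so the sum is at least 1 and log is safe.
    """
    x_star = max(xs)
    if math.isinf(x_star):
        # Assumed convention: all inputs -inf gives -inf, any +inf gives +inf.
        return x_star
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

for xs in ([1000.0, 1000.5, 999.0], [-1000.0, -1001.0]):
    try:
        print("naive: ", lse_naive(xs))
    except (OverflowError, ValueError) as err:
        # math.exp(1000.0) raises OverflowError; for the second input the
        # exponentials underflow to 0.0 and math.log(0.0) raises ValueError.
        print("naive failed:", type(err).__name__)
    print("stable:", lse_stable(xs))
```

For moderate inputs the two functions agree; the shifted form only changes how the value is computed, not the value itself.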

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function^[7] by adding an extra argument set to zero:

\mathrm{LSE}_{0}^{+}(x_{1},\dots,x_{n})=\mathrm{LSE}(0,x_{1},\dots,x_{n})

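This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.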
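The difference between the two functions can be seen along the diagonal direction $(1,\dots,1)$, where $\mathrm{LSE}(x+c\mathbf{1})=\mathrm{LSE}(x)+c$ makes LSE affine. The sketch below compares midpoint values on such a segment; `lse0_plus` is a hypothetical name for $\mathrm{LSE}_{0}^{+}$, and the sample points are assumptions.

```python
import math

def lse(xs):
    """Max-shifted evaluation of log(sum_i exp(x_i)); illustrative helper."""
    x_star = max(xs)
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

def lse0_plus(xs):
    """LSE with an extra argument fixed at zero: LSE(0, x_1, ..., x_n)."""
    return lse([0.0] + list(xs))

a, b = [0.0, 0.0], [2.0, 2.0]         # endpoints on the diagonal (assumed)
mid = [(u + v) / 2 for u, v in zip(a, b)]

# LSE is affine along the diagonal: the midpoint value equals the average.
print(lse(mid), (lse(a) + lse(b)) / 2)

# LSE_0^+ is strictly convex: the midpoint value is strictly below the average.
print(lse0_plus(mid), (lse0_plus(a) + lse0_plus(b)) / 2)
```

For a single argument, `lse0_plus([x])` equals $\log(1+\exp(x))$, the softplus function.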

In tropical analysis, this is the sum in the log semiring.

References

[1] Zhang, Aston; Lipton, Zack; Li, Mu; Smola, Alex. "Dive into Deep Learning, Chapter 3 Exercises". www.d2l.ai. Retrieved 27 June 2020.
[2] Nielsen, Frank; Sun, Ke (2016). "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities". Entropy. 18: 442. arXiv:1606.05850. Bibcode:2016Entrp..18..442N. doi:10.3390/e18120442. S2CID 17259055.
[3] El Ghaoui, Laurent (2017). Optimization Models and Applications.
[4] "convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange". stackexchange.com.
[5] McElreath, Richard. Statistical Rethinking. OCLC 1107423386.
[6] "Practical issues: Numeric stability". CS231n Convolutional Neural Networks for Visual Recognition.
[7] Nielsen, Frank; Hadjeres, Gaetan (2018). "Monte Carlo Information Geometry: The dually flat case". arXiv:1803.07225. Bibcode:2018arXiv180307225N.
