Levenberg–Marquardt algorithm

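In mathematics and computing, the Levenberg–Marquardt algorithm (LMA or just LM), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far from the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. The LMA can also be viewed as Gauss–Newton using a trust region approach.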

The algorithm was first published in 1944 by Kenneth Levenberg,^[1] while working at the Frankford Army Arsenal. It was rediscovered in 1963 by Donald Marquardt,^[2] who worked as a statistician at DuPont, and independently by Girard,^[3] Wynne^[4] and Morrison.^[5]

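The LMA is used in many software applications for solving generic curve-fitting problems. Because it builds on the Gauss–Newton algorithm, it often converges faster than first-order methods.^[6] However, like other iterative optimization algorithms, the LMA finds only a local minimum, which is not necessarily the global minimum.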

The problem

The primary application of the Levenberg–Marquardt algorithm is in the least-squares curve fitting problem: given a set of $m$ empirical pairs $\left(x_{i},y_{i}\right)$ of independent and dependent variables, find the parameters ${\boldsymbol {\beta }}$ of the model curve $f\left(x,{\boldsymbol {\beta }}\right)$ so that the sum of the squares of the deviations $S\left({\boldsymbol {\beta }}\right)$ is minimized:

$${\hat {\boldsymbol {\beta }}}\in \operatorname {argmin} \limits _{\boldsymbol {\beta }}S\left({\boldsymbol {\beta }}\right)\equiv \operatorname {argmin} \limits _{\boldsymbol {\beta }}\sum _{i=1}^{m}\left[y_{i}-f\left(x_{i},{\boldsymbol {\beta }}\right)\right]^{2},$$

where the set of minimizers is assumed to be non-empty.
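The objective $S\left({\boldsymbol {\beta }}\right)$ can be written down directly. The following is a minimal sketch in plain Python; the exponential model `exp_model`, the sample points, and the parameter values are hypothetical choices made only for illustration and do not come from the article.

```python
import math

def sum_of_squares(model, xs, ys, beta):
    """Return S(beta) = sum_i (y_i - f(x_i, beta))**2."""
    return sum((y - model(x, beta)) ** 2 for x, y in zip(xs, ys))

# Hypothetical model for illustration: f(x, beta) = beta[0] * exp(beta[1] * x)
def exp_model(x, beta):
    return beta[0] * math.exp(beta[1] * x)

xs = [0.0, 1.0, 2.0, 3.0]
ys = [exp_model(x, (2.0, -0.5)) for x in xs]   # noise-free synthetic data

s_true = sum_of_squares(exp_model, xs, ys, (2.0, -0.5))   # generating parameters
s_off = sum_of_squares(exp_model, xs, ys, (1.0, -0.5))    # a worse parameter guess
```

With noise-free data the generating parameters give a zero sum of squares, and any other parameter vector gives a larger value; the fitting problem is to find the minimizer without knowing it in advance.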

The solution

Like other numeric minimization algorithms, the Levenberg–Marquardt algorithm is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector ${\boldsymbol {\beta }}$. In cases with only one minimum, an uninformed standard guess like ${\boldsymbol {\beta }}^{\text{T}}={\begin{pmatrix}1,\ 1,\ \dots ,\ 1\end{pmatrix}}$ will work fine; in cases with multiple minima, the algorithm converges to the global minimum only if the initial guess is already somewhat close to the final solution.

In each iteration step, the parameter vector ${\boldsymbol {\beta }}$ is replaced by a new estimate ${\boldsymbol {\beta }}+{\boldsymbol {\delta }}$. To determine ${\boldsymbol {\delta }}$, the function $f\left(x_{i},{\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)$ is approximated by its linearization:

$$f\left(x_{i},{\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)\approx f\left(x_{i},{\boldsymbol {\beta }}\right)+\mathbf {J} _{i}{\boldsymbol {\delta }},$$

where

$$\mathbf {J} _{i}={\frac {\partial f\left(x_{i},{\boldsymbol {\beta }}\right)}{\partial {\boldsymbol {\beta }}}}$$

is the gradient (a row vector in this case) of $f$ with respect to ${\boldsymbol {\beta }}$.

The sum $S\left({\boldsymbol {\beta }}\right)$ of square deviations has its minimum at a zero gradient with respect to ${\boldsymbol {\beta }}$. The above first-order approximation of $f\left(x_{i},{\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)$ gives

$$S\left({\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)\approx \sum _{i=1}^{m}\left[y_{i}-f\left(x_{i},{\boldsymbol {\beta }}\right)-\mathbf {J} _{i}{\boldsymbol {\delta }}\right]^{2},$$

or in vector notation,

$$\begin{aligned}S\left({\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)&\approx \left\|\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)-\mathbf {J} {\boldsymbol {\delta }}\right\|^{2}\\&=\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)-\mathbf {J} {\boldsymbol {\delta }}\right]^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)-\mathbf {J} {\boldsymbol {\delta }}\right]\\&=\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]-\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]^{\mathrm {T} }\mathbf {J} {\boldsymbol {\delta }}-\left(\mathbf {J} {\boldsymbol {\delta }}\right)^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]+{\boldsymbol {\delta }}^{\mathrm {T} }\mathbf {J} ^{\mathrm {T} }\mathbf {J} {\boldsymbol {\delta }}\\&=\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]-2\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]^{\mathrm {T} }\mathbf {J} {\boldsymbol {\delta }}+{\boldsymbol {\delta }}^{\mathrm {T} }\mathbf {J} ^{\mathrm {T} }\mathbf {J} {\boldsymbol {\delta }}.\end{aligned}$$

Taking the derivative of $S\left({\boldsymbol {\beta }}+{\boldsymbol {\delta }}\right)$ with respect to ${\boldsymbol {\delta }}$ and setting the result to zero gives

$$\left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} \right){\boldsymbol {\delta }}=\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right],$$

where $\mathbf {J}$ is the Jacobian matrix, whose $i$-th row equals $\mathbf {J} _{i}$, and where $\mathbf {f} \left({\boldsymbol {\beta }}\right)$ and $\mathbf {y}$ are vectors with $i$-th component $f\left(x_{i},{\boldsymbol {\beta }}\right)$ and $y_{i}$ respectively. The above expression obtained for ${\boldsymbol {\beta }}$ is the Gauss–Newton method. The Jacobian matrix as defined above is not (in general) a square matrix, but a rectangular matrix of size $m\times n$, where $n$ is the number of parameters (the size of the vector ${\boldsymbol {\beta }}$). The matrix product $\left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} \right)$ yields the required $n\times n$ square matrix, and the matrix-vector product on the right-hand side yields a vector of size $n$. The result is a set of $n$ linear equations, which can be solved for ${\boldsymbol {\delta }}$.
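The normal equations can be assembled and solved in a few lines. The sketch below is an illustration in plain Python, not a reference implementation: `solve_linear`, `gauss_newton_step`, the straight-line model `line`, and the data are hypothetical names and values introduced for the example, and the solver assumes $\mathbf {J} ^{\mathrm {T} }\mathbf {J}$ is non-singular.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def gauss_newton_step(model, jac_row, xs, ys, beta):
    """Return delta solving (J^T J) delta = J^T (y - f(beta))."""
    n = len(beta)
    rows = [jac_row(x, beta) for x in xs]                    # m rows J_i, each of length n
    resid = [y - model(x, beta) for x, y in zip(xs, ys)]     # y - f(beta)
    jtj = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    jtr = [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(n)]
    return solve_linear(jtj, jtr)

# Hypothetical straight-line model f(x, beta) = beta[0] + beta[1] * x
def line(x, beta):
    return beta[0] + beta[1] * x

def line_jac(x, beta):
    return [1.0, x]          # derivatives with respect to beta[0] and beta[1]

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]
delta = gauss_newton_step(line, line_jac, xs, ys, [0.0, 0.0])
```

Because this example model is linear in its parameters, the linearization is exact and a single step from the zero vector already lands on the generating parameters; for a non-linear model the step has to be iterated.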

Levenberg's contribution is to replace this equation by a "damped version":

$$\left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} +\lambda \mathbf {I} \right){\boldsymbol {\delta }}=\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right],$$

where $\mathbf {I}$ is the identity matrix, giving ${\boldsymbol {\delta }}$ as the increment to the estimated parameter vector ${\boldsymbol {\beta }}$.

The (non-negative) damping factor $\lambda$ is adjusted at each iteration. If the reduction of $S$ is rapid, a smaller value can be used, bringing the algorithm closer to the Gauss–Newton algorithm, whereas if an iteration gives insufficient reduction in the residual, $\lambda$ can be increased, giving a step closer to the gradient-descent direction. Note that the gradient of $S$ with respect to ${\boldsymbol {\beta }}$ equals $-2\left(\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]\right)^{\mathrm {T} }$. Therefore, for large values of $\lambda$, the step is taken approximately in the direction opposite to the gradient. If either the length of the calculated step ${\boldsymbol {\delta }}$ or the reduction of the sum of squares from the latest parameter vector ${\boldsymbol {\beta }}+{\boldsymbol {\delta }}$ falls below predefined limits, iteration stops, and the last parameter vector ${\boldsymbol {\beta }}$ is considered to be the solution.
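A minimal sketch of the damped iteration, in plain Python and restricted to two-parameter models so that the linear solve fits in a few lines. All names (`solve2`, `levenberg_marquardt`, `exp_model`, `exp_jac`), the factor of 10 used to raise and lower $\lambda$, the tolerances, and the test data are assumptions made for illustration; they are not prescribed by the description above.

```python
import math

def solve2(a, b):
    """Solve the 2x2 system a @ d = b by Cramer's rule (two-parameter models only)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def levenberg_marquardt(model, jac_row, xs, ys, beta, lam=1e-2,
                        factor=10.0, tol=1e-12, max_iter=200):
    """Minimise S(beta) for a two-parameter model using Levenberg damping."""
    def cost(b):
        return sum((y - model(x, b)) ** 2 for x, y in zip(xs, ys))

    s = cost(beta)
    for _ in range(max_iter):
        rows = [jac_row(x, beta) for x in xs]                  # rows J_i of the Jacobian
        resid = [y - model(x, beta) for x, y in zip(xs, ys)]   # y - f(beta)
        jtj = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
        jtr = [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(2)]
        # Damped normal equations: (J^T J + lam * I) delta = J^T (y - f(beta))
        damped = [[jtj[0][0] + lam, jtj[0][1]],
                  [jtj[1][0], jtj[1][1] + lam]]
        delta = solve2(damped, jtr)
        if math.hypot(delta[0], delta[1]) < tol:               # step-length limit reached
            break
        trial = [beta[0] + delta[0], beta[1] + delta[1]]
        s_trial = cost(trial)
        if s_trial < s:
            small_gain = s - s_trial < tol                     # reduction limit reached
            beta, s = trial, s_trial
            lam /= factor      # good step: move toward Gauss-Newton
            if small_gain:
                break
        else:
            lam *= factor      # poor step: move toward gradient descent
    return beta, s

# Hypothetical model f(x, beta) = beta[0] * exp(beta[1] * x) and its gradient row
def exp_model(x, beta):
    return beta[0] * math.exp(beta[1] * x)

def exp_jac(x, beta):
    e = math.exp(beta[1] * x)
    return [e, beta[0] * x * e]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [exp_model(x, (2.0, -0.5)) for x in xs]        # noise-free synthetic data
beta_fit, s_fit = levenberg_marquardt(exp_model, exp_jac, xs, ys, [1.0, 0.0])
```

A step is accepted only when it lowers the sum of squares, so the damping factor grows after a rejected step and shrinks after an accepted one; practical implementations usually use more refined acceptance tests and update rules.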

When the damping factor $\lambda$ is large relative to $\|\mathbf {J} ^{\mathrm {T} }\mathbf {J} \|$, inverting $\mathbf {J} ^{\mathrm {T} }\mathbf {J} +\lambda \mathbf {I}$ is not necessary, as the update is well approximated by the small gradient step $\lambda ^{-1}\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]$.
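The large-$\lambda$ limit can be made explicit by factoring out $\lambda$. The following is a short sketch, assuming $\lambda >\|\mathbf {J} ^{\mathrm {T} }\mathbf {J} \|$ so that the series expansion of the inverse converges:

$$\begin{aligned}\left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} +\lambda \mathbf {I} \right)^{-1}&=\lambda ^{-1}\left(\mathbf {I} +\lambda ^{-1}\mathbf {J} ^{\mathrm {T} }\mathbf {J} \right)^{-1}\\&=\lambda ^{-1}\left(\mathbf {I} -\lambda ^{-1}\mathbf {J} ^{\mathrm {T} }\mathbf {J} +\lambda ^{-2}\left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} \right)^{2}-\cdots \right),\end{aligned}$$

so that ${\boldsymbol {\delta }}=\lambda ^{-1}\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right]+O\left(\lambda ^{-2}\right)$, and the leading term is the scaled gradient step.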

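To make the solution scale invariant, Marquardt's algorithm solved a modified problem with each component of the gradient scaled according to the curvature. This provides larger movement along the directions where the gradient is smaller, which avoids slow convergence in the direction of small gradient. Fletcher, in his 1971 paper A modified Marquardt subroutine for non-linear least squares, simplified the form by replacing the identity matrix $\mathbf {I}$ with the diagonal matrix consisting of the diagonal elements of $\mathbf {J} ^{\text{T}}\mathbf {J}$:

$$\left[\mathbf {J} ^{\mathrm {T} }\mathbf {J} +\lambda \operatorname {diag} \left(\mathbf {J} ^{\mathrm {T} }\mathbf {J} \right)\right]{\boldsymbol {\delta }}=\mathbf {J} ^{\mathrm {T} }\left[\mathbf {y} -\mathbf {f} \left({\boldsymbol {\beta }}\right)\right].$$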
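The two damping variants differ only in what is added to the diagonal of $\mathbf {J} ^{\mathrm {T} }\mathbf {J}$. A small sketch in plain Python, where the function name `damped_normal_matrix` and the example matrix are hypothetical and chosen only to make the difference visible:

```python
def damped_normal_matrix(jtj, lam, scale_by_diagonal=True):
    """Return J^T J + lam * diag(J^T J), or J^T J + lam * I if scaling is off."""
    n = len(jtj)
    out = [row[:] for row in jtj]          # copy so the input is left untouched
    for i in range(n):
        out[i][i] += lam * (jtj[i][i] if scale_by_diagonal else 1.0)
    return out

# Example J^T J with very different curvature along the two parameters
jtj = [[4.0, 1.0], [1.0, 100.0]]
levenberg = damped_normal_matrix(jtj, 0.5, scale_by_diagonal=False)
fletcher = damped_normal_matrix(jtj, 0.5)
```

With identity damping both diagonal entries grow by the same 0.5, whereas with diagonal scaling each entry grows in proportion to its own size (here by 2 and by 50), so the damping acts relative to each parameter's curvature.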

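A similar damping factor appears in Tikhonov regularization, which is used to solve linear ill-posed problems, as well as in ridge regression, an estimation technique in statistics.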

Choice of damping parameter

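Various more or less heuristic arguments have been put forward for the best choice of the damping parameter $\lambda$. Theoretical arguments exist showing why some of these choices guarantee local convergence of the algorithm; however, these choices can make the global convergence of the algorithm suffer from the undesirable properties of steepest descent, in particular very slow convergence close to the optimum.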

The absolute values of any choice depend on how well-scaled the initial problem is. Marquardt recommended starting with a value $\lambda _{0}$ and a factor $\nu >1$: initially set $\lambda =\lambda _{0}$ and compute the residual sum of squares $S\left({\boldsymbol {\beta }}\right)$ after one step from the starting point, first with the damping factor $\lambda =\lambda _{0}$ and secondly with $\lambda _{0}/\nu$. If both of these are worse than the initial point, the damping is increased by successive multiplication by $\nu$ until a better point is found with a new damping factor of $\lambda _{0}\nu ^{k}$ for some $k$.

If use of the damping factor $\lambda /\nu$ results in a reduction in the squared residual, then this is taken as the new value of $\lambda$ (and the new optimum location is taken as that obtained with this damping factor) and the process continues. If using $\lambda /\nu$ resulted in a worse residual but using $\lambda$ resulted in a better residual, then $\lambda$ is left unchanged and the new optimum is taken as the value obtained with $\lambda$ as the damping factor.
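Marquardt's rule amounts to trying up to three kinds of damping value per iteration. The sketch below is one possible reading of it in plain Python; `choose_damping`, the callback `cost_with_damping` (assumed to return the sum of squares after one step taken with the given damping), the default $\nu =10$, the cap `max_tries`, and the toy cost are hypothetical.

```python
def choose_damping(cost_with_damping, s_current, lam, nu=10.0, max_tries=50):
    """Pick the damping for one iteration; return (damping_used, new_sum_of_squares)."""
    s_small = cost_with_damping(lam / nu)
    if s_small < s_current:            # lam / nu improves: adopt it as the new lam
        return lam / nu, s_small
    s_same = cost_with_damping(lam)
    if s_same < s_current:             # only lam improves: leave lam unchanged
        return lam, s_same
    for _ in range(max_tries):         # both worse: multiply by nu until a better point
        lam *= nu
        s_big = cost_with_damping(lam)
        if s_big < s_current:
            return lam, s_big
    raise RuntimeError("no improving damping factor found")

# Toy stand-in for "S after one step": only heavily damped steps improve on S = 2.0
def toy_cost(lam):
    return 1.0 if lam >= 100.0 else 5.0

lam_used, s_new = choose_damping(toy_cost, s_current=2.0, lam=1.0)
```

In this toy run neither $\lambda /\nu$ nor $\lambda$ helps, so the damping is multiplied by $\nu$ twice before an improving step is found.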

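An effective strategy for the control of the damping parameter, called delayed gratification, consists of increasing the parameter by a small amount for each uphill step and decreasing it by a large amount for each downhill step. The idea behind this strategy is to avoid moving downhill too fast at the beginning of the optimization, which would restrict the steps available in later iterations and thereby slow down convergence.^[7] An increase by a factor of 2 and a decrease by a factor of 3 has been shown to be effective in most cases, while for large problems more extreme values can work better, with an increase by a factor of 1.5 and a decrease by a factor of 5.^[8]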
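The delayed-gratification update is a one-line rule. A minimal sketch in Python, using the factors 2 and 3 quoted above as defaults; the function name and the flag `step_accepted` (true for a downhill step) are hypothetical.

```python
def delayed_gratification(lam, step_accepted, increase=2.0, decrease=3.0):
    """Raise lam slightly after an uphill step, lower it strongly after a downhill step."""
    return lam / decrease if step_accepted else lam * increase

lam_after_downhill = delayed_gratification(6.0, step_accepted=True)
lam_after_uphill = delayed_gratification(6.0, step_accepted=False)
```

For the large-problem setting mentioned above, the same function would be called with `increase=1.5` and `decrease=5.0`.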

Geodesic acceleration

When interpreting the Levenberg–Marquardt step as the velocity ${\boldsymbol {v}}_{k}$ along a geodesic path in the parameter space, it is possible to improve the method by adding a second-order term that accounts for the acceleration ${\boldsymbol {a}}_{k}$ along the geodesic:

$${\boldsymbol {v}}_{k}+{\frac {1}{2}}{\boldsymbol {a}}_{k},$$

where ${\boldsymbol {a}}_{k}$ is the solution of

$${\boldsymbol {J}}_{k}{\boldsymbol {a}}_{k}=-f_{vv}.$$

Since this geodesic acceleration term depends only on the directional derivative $f_{vv}=\sum _{\mu \nu }v_{\mu }v_{\nu }\partial _{\mu }\partial _{\nu }f({\boldsymbol {x}})$ along the direction of the velocity ${\boldsymbol {v}}$, it does not require computing the full second-order derivative matrix and adds only a small overhead in computing cost.^[9] Since the second-order derivative can be a fairly complex expression, it can be convenient to replace it with a finite difference approximation

$$\begin{aligned}f_{vv}^{i}&\approx {\frac {f_{i}({\boldsymbol {x}}+h{\boldsymbol {\delta }})-2f_{i}({\boldsymbol {x}})+f_{i}({\boldsymbol {x}}-h{\boldsymbol {\delta }})}{h^{2}}}\\&\approx {\frac {2}{h}}\left({\frac {f_{i}({\boldsymbol {x}}+h{\boldsymbol {\delta }})-f_{i}({\boldsymbol {x}})}{h}}-{\boldsymbol {J}}_{i}{\boldsymbol {\delta }}\right),\end{aligned}$$

where $f({\boldsymbol {x}})$ and ${\boldsymbol {J}}$ have already been computed by the algorithm, so only one additional function evaluation is required to compute $f({\boldsymbol {x}}+h{\boldsymbol {\delta }})$. The choice of the finite difference step $h$ can affect the stability of the algorithm, and a value of around 0.1 is usually reasonable in general.^[8]
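The one-extra-evaluation form of the finite difference can be sketched as follows in plain Python. The function name, its argument names, and the quadratic test function `quad` are hypothetical; `f_x` and `j_delta` stand for the values $f({\boldsymbol {x}})$ and ${\boldsymbol {J}}{\boldsymbol {\delta }}$ that the algorithm is assumed to have available already.

```python
def directional_second_derivative(func, x, delta, f_x, j_delta, h=0.1):
    """Approximate f_vv by (2/h) * ((f(x + h*delta) - f(x)) / h - J delta), per component."""
    shifted = func([xi + h * di for xi, di in zip(x, delta)])   # the one new evaluation
    return [2.0 / h * ((fs - f0) / h - jd)
            for fs, f0, jd in zip(shifted, f_x, j_delta)]

# Hypothetical one-component function f(x) = x0**2 + x0*x1, with Hessian [[2, 1], [1, 0]]
def quad(x):
    return [x[0] ** 2 + x[0] * x[1]]

x = [1.0, 2.0]
delta = [1.0, 1.0]
f_x = quad(x)                                             # [3.0]
j_delta = [(2 * x[0] + x[1]) * delta[0] + x[0] * delta[1]]   # J delta = [5.0]
f_vv = directional_second_derivative(quad, x, delta, f_x, j_delta)
```

For a quadratic function the approximation reproduces the exact directional second derivative up to rounding (here ${\boldsymbol {\delta }}^{\mathrm {T} }\mathbf {H} {\boldsymbol {\delta }}=4$); for general functions the error depends on $h$.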

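Since the acceleration may point in the opposite direction to the velocity, an additional criterion on the acceleration is added in order to accept a step, which prevents the method from stalling when the damping is too small. The criterion requires that

$${\frac {2\left\|{\boldsymbol {a}}_{k}\right\|}{\left\|{\boldsymbol {v}}_{k}\right\|}}\leq \alpha ,$$

where $\alpha$ is usually fixed to a value less than 1, with smaller values for harder problems.^[8]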
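The acceptance test compares two vector norms. A minimal Python sketch, in which the function name and the default $\alpha =0.75$ are illustrative assumptions (the text above only requires a value below 1):

```python
import math

def acceleration_acceptable(velocity, acceleration, alpha=0.75):
    """Return True when 2 * |a| / |v| <= alpha, written without the division."""
    v_norm = math.sqrt(sum(c * c for c in velocity))
    a_norm = math.sqrt(sum(c * c for c in acceleration))
    return 2.0 * a_norm <= alpha * v_norm

small_correction = acceleration_acceptable([3.0, 4.0], [0.6, 0.8])   # |v| = 5, |a| = 1
large_correction = acceleration_acceptable([3.0, 4.0], [3.0, 4.0])   # |v| = |a| = 5
```

Multiplying through by the velocity norm avoids a division by zero when the velocity vanishes.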

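The addition of a geodesic acceleration term can allow a significant increase in convergence speed. It is especially useful when the algorithm is moving through narrow canyons in the landscape of the objective function, where the allowed steps are smaller and the higher accuracy due to the second-order term gives significant improvements.^[8]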

Example

[Figure: three panels captioned "Poor fit", "Better fit", and "Best fit".]

In this example we try to fit the function $y=a\cos \left(bX\right)+b\sin \left(aX\right)$ using the Levenberg–Marquardt algorithm implemented as the leasqr function. The graphs show progressively better fitting for the parameters $a=100$, $b=102$ used in the initial curve. Only when the parameters in the last graph are chosen closest to the original are the curves fitting exactly. This equation is an example of very sensitive initial conditions for the Levenberg–Marquardt algorithm. One reason for this sensitivity is the existence of multiple minima: the function $\cos \left(\beta x\right)$ has minima at parameter value ${\hat {\beta }}$ and ${\hat {\beta }}+2n\pi$.
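The multiple minima can be checked numerically. The sketch below is an illustration in plain Python that assumes integer sample points, for which $\cos \left(\left({\hat {\beta }}+2n\pi \right)x\right)=\cos \left({\hat {\beta }}x\right)$ holds exactly; the value ${\hat {\beta }}=0.7$ and the sample points are hypothetical.

```python
import math

xs = [0, 1, 2, 3, 4, 5]                 # integer sample points (assumption)
beta_hat = 0.7                          # hypothetical "true" parameter
ys = [math.cos(beta_hat * x) for x in xs]

def cost(beta):
    """Sum of squared deviations for the one-parameter model cos(beta * x)."""
    return sum((y - math.cos(beta * x)) ** 2 for x, y in zip(xs, ys))

cost_at_truth = cost(beta_hat)
cost_at_alias = cost(beta_hat + 2 * math.pi)    # a second, equally good minimum
cost_between = cost(beta_hat + 1.0)             # a clearly worse parameter value
```

Both ${\hat {\beta }}$ and ${\hat {\beta }}+2\pi$ give a sum of squares of zero up to rounding, separated by worse values in between, so a local method such as Levenberg–Marquardt ends up in whichever minimum is closest to its starting guess.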

See also

Trust region
Nelder–Mead method
Variants of the Levenberg–Marquardt algorithm have also been used for solving nonlinear systems of equations.^[10]

References

[1] Levenberg, Kenneth (1944). "A Method for the Solution of Certain Non-Linear Problems in Least Squares". Quarterly of Applied Mathematics. 2 (2): 164–168. doi:10.1090/qam/10666.
[2] Marquardt, Donald (1963). "An Algorithm for Least-Squares Estimation of Nonlinear Parameters". SIAM Journal on Applied Mathematics. 11 (2): 431–441. doi:10.1137/0111030. hdl:10338.dmlcz/104299.
[3] Girard, André (1958). "Excerpt from Revue d'optique théorique et instrumentale". Rev. Opt. 37: 225–241, 397–424.
[4] Wynne, C. G. (1959). "Lens Designing by Electronic Digital Computer: I". Proc. Phys. Soc. Lond. 73 (5): 777–787. Bibcode:1959PPS....73..777W. doi:10.1088/0370-1328/73/5/310.
[5] Morrison, David D. (1960). "Methods for nonlinear least squares problems and convergence proofs". Proceedings of the Jet Propulsion Laboratory Seminar on Tracking Programs and Orbit Determination: 1–9.
[6] Wiliamowski, Bogdan; Yu, Hao (June 2010). "Improved Computation for Levenberg–Marquardt Training" (PDF). IEEE Transactions on Neural Networks and Learning Systems. 21 (6).
[7] Transtrum, Mark K; Machta, Benjamin B; Sethna, James P (2011). "Geometry of nonlinear least squares with applications to sloppy models and optimization". Physical Review E. APS. 83 (3): 036701. arXiv:1010.1449. Bibcode:2011PhRvE..83c6701T. doi:10.1103/PhysRevE.83.036701. PMID 21517619. S2CID 15361707.
[8] Transtrum, Mark K; Sethna, James P (2012). "Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization". arXiv:1201.5885 [physics.data-an].
[9] "Nonlinear Least-Squares Fitting". GNU Scientific Library. Archived from the original on 2020-04-14.
[10] Kanzow, Christian; Yamashita, Nobuo; Fukushima, Masao (2004). "Levenberg–Marquardt methods with strong local convergence properties for solving nonlinear equations with convex constraints". Journal of Computational and Applied Mathematics. 172 (2): 375–397. Bibcode:2004JCoAM.172..375K. doi:10.1016/j.cam.2004.02.013.

Further reading

Moré, Jorge J.; Sorensen, Daniel C. (1983). "Computing a Trust-Region Step" (PDF). SIAM J. Sci. Stat. Comput. 4 (3): 553–572. doi:10.1137/0904038.
Gill, Philip E.; Murray, Walter (1978). "Algorithms for the solution of the nonlinear least-squares problem". SIAM Journal on Numerical Analysis. 15 (5): 977–992. Bibcode:1978SJNA...15..977G. doi:10.1137/0715063.
Pujol, Jose (2007). "The solution of nonlinear inverse problems and the Levenberg-Marquardt method". Geophysics. SEG. 72 (4): W1–W16. Bibcode:2007Geop...72W...1P. doi:10.1190/1.2732552.
Nocedal, Jorge; Wright, Stephen J. (2006). Numerical Optimization (2nd ed.). Springer. ISBN 978-0-387-30303-1.

External links

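A detailed description of the algorithm can be found in Numerical Recipes in C, Chapter 15.5: Nonlinear models
C. T. Kelley, Iterative Methods for Optimization, SIAM Frontiers in Applied Mathematics, no. 18, 1999, ISBN 0-89871-433-8. Online copy
History of the algorithm in SIAM News
A tutorial by Ananth Ranganathan
K. Madsen, H. B. Nielsen, O. Tingleff, Methods for Non-Linear Least Squares Problems (nonlinear least-squares tutorial; L-M code: analytic Jacobian secant)
T. Strutz: Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond). 2nd edition, Springer Vieweg, 2016, ISBN 978-3-658-11455-8.
H. P. Gavin, The Levenberg-Marquardt method for nonlinear least-squares curve-fitting problems (MATLAB implementation included)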
