Forward-backward algorithm

The forward-backward algorithm is an inference algorithm for hidden Markov models that computes the posterior marginals of all hidden state variables given a sequence of observations/emissions $o_{1:T}:=o_{1},\dots ,o_{T}$, i.e. it computes, for all hidden state variables $X_{t}\in \{X_{1},\dots ,X_{T}\}$, the distribution $P(X_{t}\ |\ o_{1:T})$. This inference task is usually called smoothing. The algorithm uses the principle of dynamic programming to efficiently compute, in two passes, the values required to obtain the posterior marginal distributions. The first pass goes forward in time while the second goes backward in time, hence the name forward-backward algorithm.

The term forward-backward algorithm is also used to refer to any algorithm belonging to the general class of algorithms that operate on sequence models in a forward-backward manner. In this sense, the descriptions in the remainder of this article refer to only one specific instance of this class.

Overview

In the first pass, the forward-backward algorithm computes a set of forward probabilities which provide, for all $t\in \{1,\dots ,T\}$, the probability of ending up in any particular state given the first $t$ observations in the sequence, i.e. $P(X_{t}\ |\ o_{1:t})$. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point $t$, i.e. $P(o_{t+1:T}\ |\ X_{t})$. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence:

P(X_{t}\ |\ o_{1:T})=P(X_{t}\ |\ o_{1:t},o_{t+1:T})\propto P(o_{t+1:T}\ |\ X_{t})P(X_{t}\ |\ o_{1:t})

The last step follows from an application of Bayes' rule and the conditional independence of $o_{t+1:T}$ and $o_{1:t}$ given $X_{t}$.

As outlined above, the algorithm involves three steps:

computing forward probabilities
computing backward probabilities
computing smoothed values

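The forward and backward steps may also be called "forward message pass" and "backward message pass". These terms come from the message passing used in general belief propagation approaches. At each single observation in the sequence, the probabilities to be used for the calculations at the next observation are computed. The smoothing step can be calculated simultaneously during the backward pass. This step allows the algorithm to take into account any past observations of output when computing more accurate results.

The forward-backward algorithm can be used to find the most likely state for any point in time. It cannot, however, be used to find the most likely sequence of states (see Viterbi algorithm).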

Forward probabilities

The following description uses matrices of probability values rather than probability distributions, although in general the forward-backward algorithm can be applied to continuous as well as discrete probability models.

We transform the probability distributions related to a given hidden Markov model into matrix notation as follows. The transition probabilities $\mathbf {P} (X_{t}\mid X_{t-1})$ of a given random variable $X_{t}$ representing all possible states in the hidden Markov model are represented by the matrix $\mathbf {T}$, where the column index $j$ represents the target state and the row index $i$ represents the start state. A transition from row-vector state $\mathbf {\pi _{t}}$ to the incremental row-vector state $\mathbf {\pi _{t+1}}$ is written as $\mathbf {\pi _{t+1}} =\mathbf {\pi _{t}} \mathbf {T}$. The example below represents a system where the probability of staying in the same state after each step is 70% and the probability of transitioning to the other state is 30%. The transition matrix is then:

\mathbf {T} ={\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}
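
As a minimal sketch of the update $\mathbf {\pi _{t+1}} =\mathbf {\pi _{t}} \mathbf {T}$, the following plain-Python snippet propagates a row vector through the example transition matrix. The helper name `step` is an assumption made for illustration only.

```python
# Transition matrix from the example: rows are start states, columns are target states.
T = [[0.7, 0.3],
     [0.3, 0.7]]

def step(pi, T):
    """Return the row vector pi @ T (one step of the Markov chain)."""
    n = len(T)
    return [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]

pi_t = [1.0, 0.0]          # certainly in state 1
pi_next = step(pi_t, T)    # distribution over states one step later
print(pi_next)             # [0.7, 0.3]
```

Starting from certainty in state 1, one step leaves a 70% chance of remaining there, which is simply the first row of $\mathbf {T}$.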

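In a typical Markov model, we would multiply a state vector by this matrix to obtain the probabilities for the subsequent state. In a hidden Markov model the state is unknown, and we instead observe events associated with the possible states. An event matrix of the form: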

\mathbf {B} ={\begin{pmatrix}0.9&0.1\\0.2&0.8\end{pmatrix}}

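provides the probabilities of observing each event given a particular state. In the above example, event 1 is observed 90% of the time when we are in state 1, while event 2 has a 10% probability of occurring in this state. In contrast, event 1 is observed only 20% of the time when we are in state 2, and event 2 has an 80% chance of occurring. Given an arbitrary row vector describing the state of the system ($\mathbf {\pi }$), the probability of observing event j is then:

\mathbf {P} (O=j)=\sum _{i}\pi _{i}B_{i,j}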

The probability of a given state leading to the observed event j can be represented in matrix form by multiplying the state row-vector ($\mathbf {\pi }$) with an observation matrix ($\mathbf {O_{j}} =\mathrm {diag} (B_{*,o_{j}})$) containing only diagonal entries. Continuing the above example, the observation matrix for event 1 would be:

\mathbf {O_{1}} ={\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}

This allows us to calculate the new unnormalized probability state vector $\mathbf {\pi '}$ through Bayes' rule, weighting each element of $\mathbf {\pi }$ by the likelihood that it generated event 1:

\mathbf {\pi '} =\mathbf {\pi } \mathbf {O_{1}}
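
The weighting step can be sketched in a few lines of plain Python. Multiplying by the diagonal matrix $\mathbf {O_{1}}$ is the same as scaling each entry of $\mathbf {\pi }$ by the matching entry of the first column of $\mathbf {B}$. The helper name `observe` and the uniform prior are assumptions chosen for illustration.

```python
# Emission matrix from the example: B[i][j] = P(event j+1 | state i+1).
B = [[0.9, 0.1],
     [0.2, 0.8]]

def observe(pi, B, event):
    """Weight each state probability by the likelihood of `event` (pi @ O_event)."""
    return [pi[i] * B[i][event] for i in range(len(pi))]

pi = [0.5, 0.5]                     # assumed prior over the two states
unnormalized = observe(pi, B, 0)    # event index 0 corresponds to event 1
p_event = sum(unnormalized)         # P(O = 1) = sum_i pi_i * B[i][0]
posterior = [u / p_event for u in unnormalized]
print(unnormalized, p_event)
print(posterior)
```

Dividing by the sum of the unnormalized vector is the normalization step of Bayes' rule; the sum itself is the probability of the observed event.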

We can now make this general procedure specific to our series of observations. Assuming an initial state vector $\mathbf {\pi } _{0}$, we begin with $\mathbf {f_{0:0}} =\mathbf {\pi } _{0}$, then update the state distribution and weight by the likelihood of the first observation:

\mathbf {f_{0:1}} =\mathbf {\pi } _{0}\mathbf {T} \mathbf {O_{o(1)}}

This process can be carried forward with additional observations using:

\mathbf {f_{0:t}} =\mathbf {f_{0:t-1}} \mathbf {T} \mathbf {O_{o(t)}}

This value is the forward unnormalized probability vector. The i-th entry of this vector provides:

\mathbf {f_{0:t}} (i)=\mathbf {P} (o_{1},o_{2},\dots ,o_{t},X_{t}=x_{i}\ |\ \mathbf {\pi } _{0})
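
The recursion can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the function name `forward_unnormalized` and the three-element observation sequence are hypothetical, and events are encoded as 0-based column indices of $\mathbf {B}$.

```python
T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]

def forward_unnormalized(pi0, T, B, observations):
    """Return [f_{0:0}, f_{0:1}, ..., f_{0:t}] using f_{0:t} = f_{0:t-1} T O_{o(t)}."""
    n = len(pi0)
    f = list(pi0)
    history = [f]
    for o in observations:
        # Multiply by T (predict), then by the diagonal observation matrix (weight).
        predicted = [sum(f[i] * T[i][j] for i in range(n)) for j in range(n)]
        f = [predicted[j] * B[j][o] for j in range(n)]
        history.append(f)
    return history

# Hypothetical observation sequence: event 1, event 1, event 2 (0-based indices).
fs = forward_unnormalized([0.5, 0.5], T, B, [0, 0, 1])
likelihood = sum(fs[-1])   # P(o_1, o_2, o_3 | pi_0)
print(fs[1], likelihood)
```

Because each entry of the final vector is a joint probability of the observations and one final state, summing the entries gives the probability of the whole observation sequence.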

Typically, we normalize the probability vector at each step so that its entries sum to 1. A scaling factor is thus introduced at each step such that:

\mathbf {{\hat {f}}_{0:t}} =c_{t}^{-1}\ \mathbf {{\hat {f}}_{0:t-1}} \mathbf {T} \mathbf {O_{o(t)}}

where $\mathbf {{\hat {f}}_{0:t-1}}$ represents the scaled vector from the previous step and $c_{t}$ represents the scaling factor that causes the resulting vector's entries to sum to 1. The product of the scaling factors is the total probability of observing the given events irrespective of the final state:

\mathbf {P} (o_{1},o_{2},\dots ,o_{t}\ |\ \mathbf {\pi } _{0})=\prod _{s=1}^{t}c_{s}

This allows us to interpret the scaled probability vector as:

\mathbf {{\hat {f}}_{0:t}} (i)={\frac {\mathbf {f_{0:t}} (i)}{\prod _{s=1}^{t}c_{s}}}={\frac {\mathbf {P} (o_{1},o_{2},\dots ,o_{t},X_{t}=x_{i}\ |\ \mathbf {\pi } _{0})}{\mathbf {P} (o_{1},o_{2},\dots ,o_{t}\ |\ \mathbf {\pi } _{0})}}=\mathbf {P} (X_{t}=x_{i}\ |\ o_{1},o_{2},\dots ,o_{t},\mathbf {\pi } _{0})

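We thus find that the product of the scaling factors provides the total probability of observing the given sequence up to time t, and that the scaled probability vector provides the probability of being in each state at this time.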
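
A hedged sketch of the scaled recursion is given below. It keeps the scaling factors $c_{t}$ so that their product can be compared with the sequence probability. The function name `forward_scaled` and the short observation sequence are assumptions for illustration.

```python
T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]

def forward_scaled(pi0, T, B, observations):
    """Return (f_hats, cs): normalized forward vectors and scaling factors c_t."""
    n = len(pi0)
    f_hat = list(pi0)
    f_hats, cs = [f_hat], []
    for o in observations:
        predicted = [sum(f_hat[i] * T[i][j] for i in range(n)) for j in range(n)]
        unscaled = [predicted[j] * B[j][o] for j in range(n)]
        c = sum(unscaled)                  # scaling factor c_t
        f_hat = [u / c for u in unscaled]  # entries now sum to 1
        f_hats.append(f_hat)
        cs.append(c)
    return f_hats, cs

f_hats, cs = forward_scaled([0.5, 0.5], T, B, [0, 0, 1])
likelihood = 1.0
for c in cs:
    likelihood *= c        # product of c_s = P(o_1, ..., o_t | pi_0)
print(f_hats[-1], likelihood)
```

Normalizing at every step also keeps the numbers in a comfortable floating-point range, which matters for long sequences where the unnormalized probabilities would underflow.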

Backward probabilities

A similar procedure can be constructed to find backward probabilities. These are intended to provide the probabilities:

\mathbf {b_{t:T}} (i)=\mathbf {P} (o_{t+1},o_{t+2},\dots ,o_{T}\ |\ X_{t}=x_{i})

That is, we now assume that we start in a particular state ($X_{t}=x_{i}$), and we are interested in the probability of observing all future events from this state. Since the initial state is assumed as given (i.e. the prior probability of this state = 100%), we begin with:

\mathbf {b_{T:T}} =[1\ 1\ 1\ \dots ]^{T}

Notice that we are now using a column vector, while the forward probabilities used row vectors. We can then work backwards using:

\mathbf {b_{t-1:T}} =\mathbf {T} \mathbf {O_{t}} \mathbf {b_{t:T}}
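
The backward recursion can be sketched in the same style as the forward one. The function name `backward_unnormalized` and the observation sequence are illustrative assumptions; the final check uses the fact that combining $\mathbf {b_{0:T}}$ with the initial state vector yields the probability of the whole sequence.

```python
T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]

def backward_unnormalized(T, B, observations):
    """Return [b_{0:T}, ..., b_{T:T}] using b_{t-1:T} = T O_t b_{t:T}."""
    n = len(T)
    b = [1.0] * n                      # b_{T:T}: column vector of ones
    history = [b]
    for o in reversed(observations):
        # Weight by the observation likelihood, then sum over target states.
        weighted = [B[j][o] * b[j] for j in range(n)]
        b = [sum(T[i][j] * weighted[j] for j in range(n)) for i in range(n)]
        history.insert(0, b)
    return history

bs = backward_unnormalized(T, B, [0, 0, 1])
pi0 = [0.5, 0.5]
likelihood = sum(pi0[i] * bs[0][i] for i in range(2))
print(bs[0], likelihood)
```

For this sequence the value of `likelihood` agrees with what the forward pass gives for the same observations, which is a useful consistency check on an implementation.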

While we could normalize this vector as well so that its entries sum to one, this is not usually done. Noting that each entry contains the probability of the future event sequence given a particular initial state, normalizing this vector would be equivalent to applying Bayes' theorem to find the likelihood of each initial state given the future events (assuming uniform priors for the final state vector). However, it is more common to scale this vector using the same $c_{t}$ constants used in the forward probability calculations. $\mathbf {b_{T:T}}$ is not scaled, but subsequent operations use:

\mathbf {{\hat{b}_{t-1:T}}} =c_{t}^{-1}\mathbf {T} \mathbf {O_{t}} \mathbf {{\hat{b}}_{t:T}}

where $\mathbf {{\hat {b}}_{t:T}}$ represents the previous, scaled vector. The result is that the scaled probability vector is related to the backward probabilities by:

\mathbf {{\hat {b}}_{t:T}} (i)={\frac {\mathbf {b_{t:T}} (i)}{\prod _{s=t+1}^{T}c_{s}}}

This is useful because it allows us to find the total probability of being in each state at a given time t by multiplying these values:

\mathbf {\gamma _{t}} (i)=\mathbf {P} (X_{t}=x_{i}\ |\ o_{1},o_{2},\dots ,o_{T},\mathbf {\pi } _{0})={\frac {\mathbf {P} (o_{1},o_{2},\dots ,o_{T},X_{t}=x_{i}\ |\ \mathbf {\pi } _{0})}{\mathbf {P} (o_{1},o_{2},\dots ,o_{T}\ |\ \mathbf {\pi } _{0})}}={\frac {\mathbf {f_{0:t}} (i)\cdot \mathbf {b_{t:T}} (i)}{\prod _{s=1}^{T}c_{s}}}=\mathbf {{\hat {f}}_{0:t}} (i)\cdot \mathbf {{\hat {b}}_{t:T}} (i)

To understand this, note that $\mathbf {f_{0:t}} (i)\cdot \mathbf {b_{t:T}} (i)$ provides the probability of observing the given events in a way that passes through state $x_{i}$ at time t. This probability includes the forward probabilities covering all events up to time t as well as the backward probabilities, which include all future events. This is the numerator we are looking for in our equation, and we divide by the total probability of the observation sequence to normalize this value and extract only the probability that $X_{t}=x_{i}$. These values are sometimes called the "smoothed values" because they combine the forward and backward probabilities to compute a final probability.
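
The relationship $\mathbf {\gamma _{t}} (i)=\mathbf {{\hat {f}}_{0:t}} (i)\cdot \mathbf {{\hat {b}}_{t:T}} (i)$ can be checked with a small sketch in which the backward pass reuses the forward scaling factors. The function name `smooth` and the observation sequence are hypothetical choices for illustration.

```python
T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]

def smooth(pi0, T, B, observations):
    """Return gamma_0..gamma_T using forward and backward passes scaled by the same c_t."""
    n = len(pi0)
    # Forward pass: keep the scaled vectors and the scaling factors c_1..c_T.
    f_hat, f_hats, cs = list(pi0), [list(pi0)], []
    for o in observations:
        u = [sum(f_hat[i] * T[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(u)
        f_hat = [x / c for x in u]
        f_hats.append(f_hat)
        cs.append(c)
    # Backward pass: b_hat_{T:T} is all ones; each step divides by the matching c_t.
    b_hat, b_hats = [1.0] * n, [[1.0] * n]
    for o, c in zip(reversed(observations), reversed(cs)):
        b_hat = [sum(T[i][j] * B[j][o] * b_hat[j] for j in range(n)) / c for i in range(n)]
        b_hats.insert(0, b_hat)
    # Element-wise product; no further normalization is needed.
    return [[f[i] * b[i] for i in range(n)] for f, b in zip(f_hats, b_hats)]

gammas = smooth([0.5, 0.5], T, B, [0, 0, 1])
for t, g in enumerate(gammas):
    print(t, g, sum(g))
```

Each printed vector sums to 1 (up to rounding) without an extra normalization, because the product of all $c_{s}$ already equals the total probability of the observation sequence.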

The values $\mathbf {\gamma _{t}} (i)$ thus provide the probability of being in each state at time t. As such, they are useful for determining the most probable state at any time. The term "most probable state" is somewhat ambiguous. While the most probable state is the most likely to be correct at a given point, the sequence of individually probable states is not likely to be the most probable sequence. This is because the probabilities for each point are calculated independently of each other. They do not take into account the transition probabilities between states, and it is thus possible to get states at two moments (t and t+1) that are both most probable at those time points but which have very little probability of occurring together, i.e. $\mathbf {P} (X_{t}=x_{i},X_{t+1}=x_{j})\neq \mathbf {P} (X_{t}=x_{i})\mathbf {P} (X_{t+1}=x_{j})$. The most probable sequence of states that produced an observation sequence can be found using the Viterbi algorithm.

Example

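This example takes as its basis the umbrella world in Russell & Norvig 2010, Chapter 15, p. 567, in which we would like to infer the weather given observations of another person either carrying or not carrying an umbrella. We assume two possible states for the weather: state 1 = rain, state 2 = no rain. We assume that the weather has a 70% chance of staying the same each day and a 30% chance of changing. The transition probabilities are then:

\mathbf {T} ={\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}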

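We also assume each state generates one of two possible events: event 1 = umbrella, event 2 = no umbrella. The conditional probabilities of these occurring in each state are given by the probability matrix: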

\mathbf {B} ={\begin{pmatrix}0.9&0.1\\0.2&0.8\end{pmatrix}}

We then observe the following sequence of events: {umbrella, umbrella, no umbrella, umbrella, umbrella}, which we represent in our calculations as:

\mathbf {O_{1}} ={\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}~~\mathbf {O_{2}} ={\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}~~\mathbf {O_{3}} ={\begin{pmatrix}0.1&0.0\\0.0&0.8\end{pmatrix}}~~\mathbf {O_{4}} ={\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}~~\mathbf {O_{5}} ={\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}

Note that $\mathbf {O_{3}}$ differs from the others because of the "no umbrella" observation.

In computing the forward probabilities we begin with:

\mathbf {f_{0:0}} ={\begin{pmatrix}0.5&0.5\end{pmatrix}}

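which is our prior state vector, indicating that we do not know which state the weather is in before our observations. While a state vector should be given as a row vector, we use the transpose of the matrix so that the calculations below are easier to read. Our calculations are then written in the form:

(\mathbf {{\hat {f}}_{0:t}} )^{T}=c_{t}^{-1}\mathbf {O_{t}} (\mathbf {T} )^{T}(\mathbf {{\hat {f}}_{0:t-1}} )^{T}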

instead of:

\mathbf {{\hat {f}}_{0:t}} =c_{t}^{-1}\mathbf {{\hat {f}}_{0:t-1}} \mathbf {T} \mathbf {O_{t}}

Notice that the transition matrix is also transposed, but in our example the transpose is equal to the original matrix. Performing these calculations and normalizing the results provides:

(\mathbf {{\hat {f}}_{0:1}} )^{T}=c_{1}^{-1}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.5000\\0.5000\end{pmatrix}}=c_{1}^{-1}{\begin{pmatrix}0.4500\\0.1000\end{pmatrix}}={\begin{pmatrix}0.8182\\0.1818\end{pmatrix}}

(\mathbf {{\hat {f}}_{0:2}} )^{T}=c_{2}^{-1}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.8182\\0.1818\end{pmatrix}}=c_{2}^{-1}{\begin{pmatrix}0.5645\\0.0745\end{pmatrix}}={\begin{pmatrix}0.8834\\0.1166\end{pmatrix}}

(\mathbf {{\hat {f}}_{0:3}} )^{T}=c_{3}^{-1}{\begin{pmatrix}0.1&0.0\\0.0&0.8\end{pmatrix}}{\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.8834\\0.1166\end{pmatrix}}=c_{3}^{-1}{\begin{pmatrix}0.0653\\0.2772\end{pmatrix}}={\begin{pmatrix}0.1907\\0.8093\end{pmatrix}}

(\mathbf {{\hat {f}}_{0:4}} )^{T}=c_{4}^{-1}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.1907\\0.8093\end{pmatrix}}=c_{4}^{-1}{\begin{pmatrix}0.3386\\0.1247\end{pmatrix}}={\begin{pmatrix}0.7308\\0.2692\end{pmatrix}}

(\mathbf {{\hat {f}}_{0:5}} )^{T}=c_{5}^{-1}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.7308\\0.2692\end{pmatrix}}=c_{5}^{-1}{\begin{pmatrix}0.5331\\0.0815\end{pmatrix}}={\begin{pmatrix}0.8673\\0.1327\end{pmatrix}}

For the backward probabilities, we start with:

\mathbf {b_{5:5}} ={\begin{pmatrix}1.0\\1.0\end{pmatrix}}

We are then able to compute (using the observations in reverse order and normalizing with different constants):

\mathbf {{\hat {b}}_{4:5}} =\alpha {\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}1.0000\\1.0000\end{pmatrix}}=\alpha {\begin{pmatrix}0.6900\\0.4100\end{pmatrix}}={\begin{pmatrix}0.6273\\0.3727\end{pmatrix}}

\mathbf {{\hat {b}}_{3:5}} =\alpha {\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.6273\\0.3727\end{pmatrix}}=\alpha {\begin{pmatrix}0.4175\\0.2215\end{pmatrix}}={\begin{pmatrix}0.6533\\0.3467\end{pmatrix}}

\mathbf {{\hat {b}}_{2:5}} =\alpha {\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.1&0.0\\0.0&0.8\end{pmatrix}}{\begin{pmatrix}0.6533\\0.3467\end{pmatrix}}=\alpha {\begin{pmatrix}0.1289\\0.2138\end{pmatrix}}={\begin{pmatrix}0.3763\\0.6237\end{pmatrix}}

\mathbf {{\hat {b}}_{1:5}} =\alpha {\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.3763\\0.6237\end{pmatrix}}=\alpha {\begin{pmatrix}0.2745\\0.1889\end{pmatrix}}={\begin{pmatrix}0.5923\\0.4077\end{pmatrix}}

\mathbf {{\hat {b}}_{0:5}} =\alpha {\begin{pmatrix}0.7&0.3\\0.3&0.7\end{pmatrix}}{\begin{pmatrix}0.9&0.0\\0.0&0.2\end{pmatrix}}{\begin{pmatrix}0.5923\\0.4077\end{pmatrix}}=\alpha {\begin{pmatrix}0.3976\\0.2170\end{pmatrix}}={\begin{pmatrix}0.6469\\0.3531\end{pmatrix}}

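Finally, we compute the smoothed probability values. These results must also be scaled so that their entries sum to 1, because we did not scale the backward probabilities with the $c_{t}$'s found earlier. The backward probability vectors above thus actually represent the likelihood of each state at time t given the future observations. Because these vectors are proportional to the actual backward probabilities, the result has to be scaled an additional time.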

(\mathbf {\gamma _{0}} )^{T}=\alpha {\begin{pmatrix}0.5000\\0.5000\end{pmatrix}}\circ {\begin{pmatrix}0.6469\\0.3531\end{pmatrix}}=\alpha {\begin{pmatrix}0.3235\\0.1765\end{pmatrix}}={\begin{pmatrix}0.6469\\0.3531\end{pmatrix}}

(\mathbf {\gamma _{1}} )^{T}=\alpha {\begin{pmatrix}0.8182\\0.1818\end{pmatrix}}\circ {\begin{pmatrix}0.5923\\0.4077\end{pmatrix}}=\alpha {\begin{pmatrix}0.4846\\0.0741\end{pmatrix}}={\begin{pmatrix}0.8673\\0.1327\end{pmatrix}}

(\mathbf {\gamma _{2}} )^{T}=\alpha {\begin{pmatrix}0.8834\\0.1166\end{pmatrix}}\circ {\begin{pmatrix}0.3763\\0.6237\end{pmatrix}}=\alpha {\begin{pmatrix}0.3324\\0.0728\end{pmatrix}}={\begin{pmatrix}0.8204\\0.1796\end{pmatrix}}

(\mathbf {\gamma _{3}} )^{T}=\alpha {\begin{pmatrix}0.1907\\0.8093\end{pmatrix}}\circ {\begin{pmatrix}0.6533\\0.3467\end{pmatrix}}=\alpha {\begin{pmatrix}0.1246\\0.2806\end{pmatrix}}={\begin{pmatrix}0.3075\\0.6925\end{pmatrix}}

(\mathbf {\gamma _{4}} )^{T}=\alpha {\begin{pmatrix}0.7308\\0.2692\end{pmatrix}}\circ {\begin{pmatrix}0.6273\\0.3727\end{pmatrix}}=\alpha {\begin{pmatrix}0.4584\\0.1003\end{pmatrix}}={\begin{pmatrix}0.8204\\0.1796\end{pmatrix}}

(\mathbf {\gamma _{5}} )^{T}=\alpha {\begin{pmatrix}0.8673\\0.1327\end{pmatrix}}\circ {\begin{pmatrix}1.0000\\1.0000\end{pmatrix}}=\alpha {\begin{pmatrix}0.8673\\0.1327\end{pmatrix}}={\begin{pmatrix}0.8673\\0.1327\end{pmatrix}}
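
The worked example can be reproduced with a short plain-Python sketch. It follows the same scheme as the calculations above: the forward vectors are normalized at each step, the backward vectors are normalized with their own constants, and the element-wise product is rescaled once more. The helper `normalize` and the 0-based event encoding are assumptions made for this illustration.

```python
T = [[0.7, 0.3], [0.3, 0.7]]          # state 1 = rain, state 2 = no rain
B = [[0.9, 0.1], [0.2, 0.8]]          # event 1 = umbrella, event 2 = no umbrella
obs = [0, 0, 1, 0, 0]                 # umbrella, umbrella, no umbrella, umbrella, umbrella

def normalize(v):
    """Rescale a vector so that its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]

# Forward pass with per-step normalization.
f = [[0.5, 0.5]]
for o in obs:
    prev = f[-1]
    f.append(normalize([(prev[0] * T[0][j] + prev[1] * T[1][j]) * B[j][o] for j in range(2)]))

# Backward pass, normalized with its own constants as in the worked example.
b = [[1.0, 1.0]]
for o in reversed(obs):
    nxt = b[0]
    b.insert(0, normalize([T[i][0] * B[0][o] * nxt[0] + T[i][1] * B[1][o] * nxt[1] for i in range(2)]))

# Smoothed values: element-wise product, rescaled to sum to 1.
gamma = [normalize([ft[i] * bt[i] for i in range(2)]) for ft, bt in zip(f, b)]
for t, g in enumerate(gamma):
    print(t, [round(x, 4) for x in g])
```

The printed vectors should agree with the smoothed values listed above to about four decimal places; small differences in the last digit can arise because the worked example rounds intermediate results.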

Notice that the value of $\mathbf {\gamma _{0}}$ is equal to $\mathbf {{\hat {b}}_{0:5}}$ and that $\mathbf {\gamma _{5}}$ is equal to $\mathbf {{\hat {f}}_{0:5}}$. This follows naturally because both $\mathbf {{\hat {f}}_{0:5}}$ and $\mathbf {{\hat {b}}_{0:5}}$ begin with uniform priors over the initial and final state vectors (respectively) and take into account all of the observations. However, $\mathbf {\gamma _{0}}$ is equal to $\mathbf {{\hat {b}}_{0:5}}$ only when the initial state vector represents a uniform prior (i.e. all entries are equal). When this is not the case, $\mathbf {{\hat {b}}_{0:5}}$ needs to be combined with the initial state vector to find the most likely initial state. We thus find that the forward probabilities by themselves are sufficient to calculate the most likely final state. Similarly, the backward probabilities can be combined with the initial state vector to provide the most probable initial state given the observations. The forward and backward probabilities need only be combined to infer the most probable states between the initial and final points.

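The calculations above reveal that the most probable weather state on every day except the third one was "rain". They tell us more than this, however, as they now provide a way to quantify the probabilities of each state at different times. Perhaps most importantly, our value at $\mathbf {\gamma _{5}}$ quantifies our knowledge of the state vector at the end of the observation sequence. We can then use this to predict the probability of the various weather states tomorrow as well as the probability of observing an umbrella.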
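
As a brief illustration of that prediction step, the sketch below propagates the rounded value of $\mathbf {\gamma _{5}}$ from the example through one transition and then through the emission matrix. The variable names are assumptions for illustration, and the inputs are the rounded figures quoted above, so the results are approximate.

```python
T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
gamma_5 = [0.8673, 0.1327]   # rounded smoothed estimate for day 5 from the worked example

# One-step prediction of tomorrow's weather: gamma_5 @ T.
tomorrow = [sum(gamma_5[i] * T[i][j] for i in range(2)) for j in range(2)]
# Probability of seeing an umbrella tomorrow: sum_j tomorrow[j] * B[j][0].
p_umbrella = sum(tomorrow[j] * B[j][0] for j in range(2))
print(tomorrow, p_umbrella)
```

With these inputs the predicted probability of rain tomorrow is roughly 0.65, and the probability of observing an umbrella is also roughly 0.65.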

Performance

The forward-backward algorithm runs with time complexity $O(S^{2}T)$ in space $O(ST)$, where $T$ is the length of the time sequence and $S$ is the number of symbols in the state alphabet.[1] The algorithm can also run in constant space with time complexity $O(S^{2}T^{2})$ by recomputing values at each step.[2] For comparison, a brute-force procedure would generate all possible $S^{T}$ state sequences and calculate the joint probability of each state sequence with the observed series of events, which would have time complexity $O(T\cdot S^{T})$. Brute force is intractable for realistic problems, as the number of possible hidden node sequences is typically extremely high.
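
To make the comparison concrete, the following sketch computes the probability of an observation sequence in two ways: by enumerating every hidden state sequence and by the forward recursion. The function names are hypothetical, and the model reuses the umbrella example. Because this formulation has an initial state before the first observation, the enumeration also loops over that state.

```python
from itertools import product

T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi0 = [0.5, 0.5]
obs = [0, 0, 1, 0, 0]

def brute_force_likelihood(pi0, T, B, obs):
    """Sum the joint probability over all hidden state sequences (exponential in T)."""
    n, total = len(pi0), 0.0
    for path in product(range(n), repeat=len(obs)):
        for x0 in range(n):                 # state before the first observation
            p, prev = pi0[x0], x0
            for state, o in zip(path, obs):
                p *= T[prev][state] * B[state][o]
                prev = state
            total += p
    return total

def forward_likelihood(pi0, T, B, obs):
    """Same quantity via the forward recursion in O(S^2 T) time."""
    n, f = len(pi0), list(pi0)
    for o in obs:
        f = [sum(f[i] * T[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(f)

print(brute_force_likelihood(pi0, T, B, obs), forward_likelihood(pi0, T, B, obs))
```

Both functions return the same value up to floating-point rounding, but the enumeration visits a number of paths that grows exponentially with the sequence length, while the recursion does a fixed amount of work per observation.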

An enhancement to the general forward-backward algorithm, called the Island algorithm, trades smaller memory usage for longer running time, taking $O(S^{2}T\log T)$ time and $O(S^{2}\log T)$ memory. Furthermore, it is possible to invert the process model to obtain an $O(S)$ space, $O(S^{2}T)$ time algorithm, although the inverted process may not exist or may be ill-conditioned.[3]

In addition, algorithms have been developed to compute $\mathbf {f_{0:t+1}}$ efficiently through online smoothing, such as the fixed-lag smoothing (FLS) algorithm.[4]

Pseudocode

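    algorithm forward_backward is
        input: guessState
               int sequenceIndex
        output: result

        if sequenceIndex is past the end of the sequence then
            return 1
        if (guessState, sequenceIndex) has been seen before then
            return saved result

        result := 0

        for each neighboring state n:
            result := result + (transition probability from guessState to
                                n given observation element at sequenceIndex)
                                × Backward(n, sequenceIndex + 1)

        save result for (guessState, sequenceIndex)

        return result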
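
One possible reading of this pseudocode is a memoized recursion for the backward probabilities. The sketch below is an interpretation under stated assumptions, not a definitive translation: it takes the "transition probability given the observation element" to mean the product of a transition probability and the emission probability of the observation at that position, uses `functools.lru_cache` for the saved results, and borrows the umbrella model with a hypothetical observation sequence.

```python
from functools import lru_cache

T = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = (0, 0, 1)   # hypothetical observation sequence (0-based event indices)

@lru_cache(maxsize=None)
def backward(guess_state, sequence_index):
    """Probability of obs[sequence_index:] given the state just before that position."""
    if sequence_index >= len(obs):
        return 1.0
    o = obs[sequence_index]
    return sum(
        T[guess_state][n] * B[n][o] * backward(n, sequence_index + 1)
        for n in range(len(T))
    )

print([backward(s, 0) for s in range(2)])
```

The cache guarantees that each (state, index) pair is evaluated only once, which is what keeps the recursion at $O(S^{2}T)$ work instead of exploring every path.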

Python example

Given an HMM (the same one as in the Viterbi algorithm article) represented in the Python programming language:

    states = ('Healthy', 'Fever')
    end_state = 'E'

    observations = ('normal', 'cold', 'dizzy')

    start_probability = {'Healthy': 0.6, 'Fever': 0.4}

    transition_probability = {
        'Healthy': {'Healthy': 0.69, 'Fever': 0.3, 'E': 0.01},
        'Fever': {'Healthy': 0.4, 'Fever': 0.59, 'E': 0.01},
    }

    emission_probability = {
        'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
        'Fever': {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
    }

We can write the implementation of the forward-backward algorithm like this:

    import math

    def fwd_bkw(observations, states, start_prob, trans_prob, emm_prob, end_st):
        """Forward-backward algorithm."""
        # Forward part of the algorithm
        fwd = []
        for i, observation_i in enumerate(observations):
            f_curr = {}
            for st in states:
                if i == 0:
                    # base case for the forward part
                    prev_f_sum = start_prob[st]
                else:
                    prev_f_sum = sum(f_prev[k] * trans_prob[k][st] for k in states)

                f_curr[st] = emm_prob[st][observation_i] * prev_f_sum

            fwd.append(f_curr)
            f_prev = f_curr

        # total probability of the observations, ending in the end state
        p_fwd = sum(f_curr[k] * trans_prob[k][end_st] for k in states)

        # Backward part of the algorithm
        bkw = []
        for i, observation_i_plus in enumerate(reversed(observations[1:] + (None,))):
            b_curr = {}
            for st in states:
                if i == 0:
                    # base case for the backward part
                    b_curr[st] = trans_prob[st][end_st]
                else:
                    b_curr[st] = sum(trans_prob[st][l] * emm_prob[l][observation_i_plus] * b_prev[l] for l in states)

            bkw.insert(0, b_curr)
            b_prev = b_curr

        # the same total probability, computed from the backward values
        p_bkw = sum(start_prob[l] * emm_prob[l][observations[0]] * b_curr[l] for l in states)

        # Merging the two parts
        posterior = []
        for i in range(len(observations)):
            posterior.append({st: fwd[i][st] * bkw[i][st] / p_fwd for st in states})

        # compare with a tolerance, since exact float equality is fragile
        assert math.isclose(p_fwd, p_bkw)
        return fwd, bkw, posterior

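The function fwd_bkw takes the following arguments: observations is the sequence of observations, e.g. ('normal', 'cold', 'dizzy'); states is the set of hidden states; start_prob is the start probability; trans_prob are the transition probabilities; emm_prob are the emission probabilities; and end_st is the end state.

For simplicity of code, we assume that the observation sequence observations is a non-empty tuple and that trans_prob[i][j] and emm_prob[i][o] are defined for all states i, j and all observations o.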

In the running example, the forward-backward algorithm is used as follows:

    def example():
        return fwd_bkw(observations,
                       states,
                       start_probability,
                       transition_probability,
                       emission_probability,
                       end_state)

    >>> for line in example():
    ...     print(*line)
    ...
    {'Healthy': 0.3, 'Fever': 0.04000000000000001} {'Healthy': 0.0892, 'Fever': 0.03408} {'Healthy': 0.007518, 'Fever': 0.028120319999999997}
    {'Healthy': 0.001041839999999998, 'Fever': 0.00109578} {'Healthy': 0.00249, 'Fever': 0.00394} {'Healthy': 0.01, 'Fever': 0.01}
    {'Healthy': 0.8770110375573259, 'Fever': 0.1229889624426741} {'Healthy': 0.623228030950954, 'Fever': 0.3767719690490461} {'Healthy': 0.2109527048413057, 'Fever': 0.7890472951586943}

References

[1] Russell & Norvig 2010, p. 579
[2] Russell & Norvig 2010, p. 575
[3] Binder, John; Murphy, Kevin; Russell, Stuart (1997). "Space-efficient inference in dynamic probabilistic networks" (PDF). Int'l Joint Conf. on Artificial Intelligence. Retrieved 8 July 2020.
[4] Russell & Norvig 2010, Figure 15.6, p. 580

Lawrence R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77 (2), pp. 257–286, February 1989. doi:10.1109/5.18626
Lawrence R. Rabiner, B. H. Juang (January 1986). "An introduction to hidden Markov models". IEEE ASSP Magazine: 4–15.
Eugene Charniak (1993). Statistical Language Learning. Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-53141-2.
Stuart Russell and Peter Norvig (2010). Artificial Intelligence A Modern Approach 3rd Edition. Upper Saddle River, New Jersey: Pearson Education/Prentice-Hall. ISBN 978-0-13-604259-4.

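External links

An interactive spreadsheet for teaching the forward-backward algorithm (spreadsheet and article with step-by-step walk-through)
Tutorial on hidden Markov models including the forward-backward algorithm
Collection of AI algorithms implemented in Java (including HMM and the forward-backward algorithm)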
