Value function

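The value function of an optimization problem gives the value attained by the objective function at a solution, while depending only on the parameters of the problem.^[1]^[2] In a controlled dynamical system, the value function represents the optimal payoff of the system over the interval $[t, t_{1}]$ when started at the time-$t$ state variable $x(t)=x$.^[3] If the objective function represents some cost that is to be minimized, the value function can be interpreted as the cost to finish the optimal program, and is thus referred to as the "cost-to-go function".^[4]^[5] In an economic context, where the objective function usually represents utility, the value function is conceptually equivalent to the indirect utility function.^[6]^[7]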
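As a minimal illustration of "depending only on the parameters", consider a hypothetical static problem that is not taken from the text: maximize $f(x;a)=ax-x^{2}$ over $x$. The solution $x^{\ast}=a/2$ moves with the parameter $a$, but the value function $V(a)=\max_{x}f(x;a)=a^{2}/4$ is a function of $a$ alone. The sketch below approximates $V(a)$ by a brute-force grid search; the function names and grid bounds are assumptions chosen only for illustration.

```python
def objective(x: float, a: float) -> float:
    # Hypothetical objective f(x; a) = a*x - x^2, maximized at x* = a/2
    return a * x - x * x


def value_function(a: float, lo: float = -10.0, hi: float = 10.0, n: int = 20001) -> float:
    # V(a) = max_x f(x; a), approximated by searching a uniform grid on [lo, hi].
    # The maximizer changes with a, but V itself depends on a alone.
    step = (hi - lo) / (n - 1)
    return max(objective(lo + k * step, a) for k in range(n))


if __name__ == "__main__":
    for a in (0.0, 1.0, 3.0):
        # closed form for comparison: V(a) = a^2 / 4
        print(a, value_function(a), a * a / 4)
```

Brute-force search is used only to keep the sketch dependency-free; any reliable maximizer would do.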

In a problem of optimal control, the value function is defined as the supremum of the objective function taken over the set of admissible controls. Given $(t_{0},x_{0})\in [0,t_{1}]\times \mathbb{R}^{d}$, a typical optimal control problem is to

$\text{maximize}\quad J(t_{0},x_{0};u)=\int_{t_{0}}^{t_{1}}I(t,x(t),u(t))\,\mathrm{d}t+\phi(x(t_{1}))$

subject to

$\frac{\mathrm{d}x(t)}{\mathrm{d}t}=f(t,x(t),u(t))$

with initial state $x(t_{0})=x_{0}$.^[8] The objective function $J(t_{0},x_{0};u)$ is to be maximized over all admissible controls $u\in U[t_{0},t_{1}]$, where $u$ is a Lebesgue measurable function from $[t_{0},t_{1}]$ to some prescribed arbitrary set in $\mathbb{R}^{m}$. The value function is then defined as

$V(t,x(t))=\max_{u\in U}\int_{t}^{t_{1}}I(\tau,x(\tau),u(\tau))\,\mathrm{d}\tau+\phi(x(t_{1}))$

with $V(t_{1},x(t_{1}))=\phi(x(t_{1}))$, where $\phi(x(t_{1}))$ is the "scrap value". If the optimal pair of control and state trajectories is $(x^{\ast},u^{\ast})$, then $V(t_{0},x_{0})=J(t_{0},x_{0};u^{\ast})$. The function $h$ that gives the optimal control $u^{\ast}$ based on the current state $x$ is called a feedback control policy,^[4] or simply the policy function.^[9]
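A discrete-time, finite-state analogue of this definition (an assumption made for illustration, not the continuous-time problem itself) can be solved by backward induction: start from the scrap value $V(t_{1},x)=\phi(x)$ and step back one stage at a time, taking the best control at each state. The state grid, control set, payoff `reward` and scrap value `scrap` below are hypothetical. The sketch records the maximizing control at every $(t,x)$, which plays the role of the feedback policy $h$, and checks that following it from $x_{0}$ earns exactly $V(t_{0},x_{0})$.

```python
from typing import Dict, List, Tuple

STATES = range(-5, 6)   # hypothetical finite state grid {-5, ..., 5}
CONTROLS = (-1, 0, 1)   # hypothetical admissible controls
T = 6                   # number of stages between t0 and t1


def step(x: int, u: int) -> int:
    # discrete stand-in for dx/dt = f(t, x, u): move by u, clipped to the grid
    return max(-5, min(5, x + u))


def reward(t: int, x: int, u: int) -> float:
    # running payoff I(t, x, u): penalize distance from 0 and control effort
    return -float(x * x) - 0.5 * u * u


def scrap(x: int) -> float:
    # terminal scrap value phi(x)
    return -10.0 * x * x


def solve() -> Tuple[List[Dict[int, float]], List[Dict[int, int]]]:
    # backward induction: V[t][x] is the best payoff from stage t onward starting at x
    V: List[Dict[int, float]] = [{} for _ in range(T + 1)]
    policy: List[Dict[int, int]] = [{} for _ in range(T)]
    for x in STATES:
        V[T][x] = scrap(x)
    for t in reversed(range(T)):
        for x in STATES:
            best_u = max(CONTROLS, key=lambda u: reward(t, x, u) + V[t + 1][step(x, u)])
            policy[t][x] = best_u   # feedback policy h(t, x)
            V[t][x] = reward(t, x, best_u) + V[t + 1][step(x, best_u)]
    return V, policy


def payoff(x0: int, policy: List[Dict[int, int]]) -> float:
    # J(t0, x0; u) obtained by simulating the feedback policy forward
    x, total = x0, 0.0
    for t in range(T):
        u = policy[t][x]
        total += reward(t, x, u)
        x = step(x, u)
    return total + scrap(x)


if __name__ == "__main__":
    V, policy = solve()
    print(V[0][3], policy[0][3], payoff(3, policy))
```

Because the maximization at stage $t$ uses $V$ at stage $t+1$, every tail of the computed policy is itself optimal for the remaining stages, which is the discrete counterpart of Bellman's principle of optimality.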

Bellman's principle of optimality roughly states that any optimal policy at time $t$, $t_{0}\leq t\leq t_{1}$, taking the current state $x(t)$ as the "new" initial condition must be optimal for the remaining problem. If the value function happens to be continuously differentiable,^[10] this gives rise to an important partial differential equation known as the Hamilton–Jacobi–Bellman (HJB) equation,

$-\frac{\partial V(t,x)}{\partial t}=\max_{u}\left\{I(t,x,u)+\frac{\partial V(t,x)}{\partial x}f(t,x,u)\right\}$

where the maximand on the right-hand side can also be rewritten as the Hamiltonian, $H(t,x,u,\lambda)=I(t,x,u)+\lambda f(t,x,u)$, as

$-\frac{\partial V(t,x)}{\partial t}=\max_{u}H(t,x,u,\lambda)$

with $\partial V(t,x)/\partial x=\lambda(t)$ playing the role of the costate variables. Given this definition, and writing $f(x)$ for $f(t,x,u)$, we further have $\mathrm{d}\lambda(t)/\mathrm{d}t=\partial^{2}V(t,x)/\partial x\partial t+\partial^{2}V(t,x)/\partial x^{2}\cdot f(x)$, and after differentiating both sides of the HJB equation with respect to $x$ (with $u$ held at its maximizing value, by the envelope theorem),

$-\frac{\partial^{2}V(t,x)}{\partial t\,\partial x}=\frac{\partial I}{\partial x}+\frac{\partial^{2}V(t,x)}{\partial x^{2}}f(x)+\frac{\partial V(t,x)}{\partial x}\frac{\partial f(x)}{\partial x}$

which, after replacing the appropriate terms, recovers the costate equation

$-\dot{\lambda}(t)=\frac{\partial I}{\partial x}+\lambda(t)\frac{\partial f(x)}{\partial x}=\frac{\partial H}{\partial x}$

where $\dot{\lambda}(t)$ is Newton's notation for the derivative with respect to time.^[12]
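To make the HJB and costate relations concrete, here is a minimal numerical check on a hypothetical scalar linear-quadratic problem that is not part of the text: $I(t,x,u)=-(qx^{2}+ru^{2})$, $f(t,x,u)=ax+bu$ and scrap value $\phi(x)=-sx^{2}$. Substituting the guess $V(t,x)=-p(t)x^{2}$ into the HJB equation gives the Riccati equation $\dot p=-q-2ap+(b^{2}/r)p^{2}$ with $p(t_{1})=s$, and the maximizing control $u^{\ast}=-(bp/r)x$. The sketch integrates $p$ backward, checks $-\partial V/\partial t=\max_{u}H(t,x,u,\lambda)$ at a few points by maximizing $H$ numerically, and checks $-\dot\lambda=\partial H/\partial x$ along the closed-loop trajectory with $\lambda=\partial V/\partial x=-2px$. The constants `A`, `B`, `Q`, `R`, `S` and all function names are assumptions for illustration; the residuals vanish only up to discretization error.

```python
# Hypothetical scalar LQ problem (illustrative numbers, not from the text):
#   maximize  J = ∫_{t0}^{t1} -(q x^2 + r u^2) dt - s x(t1)^2
#   subject to dx/dt = a x + b u
A, B, Q, R, S = 0.5, 1.0, 1.0, 1.0, 2.0
T0, T1, N = 0.0, 1.0, 10_000
DT = (T1 - T0) / N


def running_payoff(x, u):
    # I(t, x, u)
    return -(Q * x * x + R * u * u)


def dynamics(x, u):
    # f(t, x, u)
    return A * x + B * u


def hamiltonian(x, u, lam):
    # H(t, x, u, lambda) = I + lambda * f
    return running_payoff(x, u) + lam * dynamics(x, u)


def riccati_rhs(p):
    # dp/dt from substituting V(t, x) = -p(t) x^2 into the HJB equation
    return -Q - 2 * A * p + (B * B / R) * p * p


def solve_riccati():
    # integrate backward from p(t1) = s with classical RK4; p[k] ~ p(T0 + k * DT)
    p = [0.0] * (N + 1)
    p[N] = S
    h = -DT
    for k in range(N, 0, -1):
        y = p[k]
        k1 = riccati_rhs(y)
        k2 = riccati_rhs(y + h * k1 / 2)
        k3 = riccati_rhs(y + h * k2 / 2)
        k4 = riccati_rhs(y + h * k3)
        p[k - 1] = y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return p


def argmax_concave(g, lo=-100.0, hi=100.0, iters=200):
    # ternary search, valid here because H is strictly concave in u (R > 0)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def hjb_residual(p, k, x):
    # -dV/dt (central difference of V = -p x^2) minus max_u H(t, x, u, dV/dx)
    minus_v_t = (p[k + 1] - p[k - 1]) / (2 * DT) * x * x
    lam = -2 * p[k] * x
    u_star = argmax_concave(lambda u: hamiltonian(x, u, lam))
    return minus_v_t - hamiltonian(x, u_star, lam)


def costate_residuals(p, x0=1.0):
    # follow u* = -(B p / R) x with explicit Euler, then compare -dlambda/dt
    # (central difference) with dH/dx = -2 Q x + A lambda along the path
    xs = [x0]
    for k in range(N):
        xs.append(xs[k] + DT * dynamics(xs[k], -B * p[k] * xs[k] / R))
    lam = [-2 * p[k] * xs[k] for k in range(N + 1)]
    return [
        -(lam[k + 1] - lam[k - 1]) / (2 * DT) - (-2 * Q * xs[k] + A * lam[k])
        for k in range(1, N)
    ]


if __name__ == "__main__":
    p = solve_riccati()
    print(hjb_residual(p, N // 2, 1.0))
    print(max(abs(r) for r in costate_residuals(p)))
```

RK4 is used for $p$ and explicit Euler for the trajectory only to keep the sketch short and dependency-free; a library ODE integrator would serve equally well.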

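The value function is the unique viscosity solution to the Hamilton–Jacobi–Bellman equation.^[13] In online closed-loop approximate optimal control, the value function is also a Lyapunov function that establishes global asymptotic stability of the closed-loop system.^[14]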
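For a cost-minimization problem the value function is a cost-to-go, and its Lyapunov property can be seen in the simplest case. The sketch below is an assumption and does not reproduce the online approximate scheme of the cited reference: it takes a hypothetical scalar, infinite-horizon linear-quadratic problem whose value function is $V(x)=px^{2}$, with $p>0$ the positive root of the algebraic Riccati equation $2ap-(b^{2}/r)p^{2}+q=0$. Under the optimal feedback $u^{\ast}=-(bp/r)x$, one gets $\dot V=-(qx^{2}+ru^{\ast 2})<0$ for $x\neq 0$, so $V$ is positive definite and strictly decreasing along closed-loop trajectories. The constants and function names are illustrative.

```python
import math

# Hypothetical infinite-horizon LQ cost-minimization problem (illustrative numbers):
#   minimize  ∫_0^∞ (q x^2 + r u^2) dt   subject to   dx/dt = a x + b u
A, B, Q, R = 0.5, 1.0, 1.0, 1.0   # A > 0, so the uncontrolled system is unstable

# Positive root of the algebraic Riccati equation 2 a p - (b^2 / r) p^2 + q = 0
P = R * (A + math.sqrt(A * A + B * B * Q / R)) / (B * B)


def value(x: float) -> float:
    # cost-to-go V(x) = p x^2, positive for every x != 0
    return P * x * x


def policy(x: float) -> float:
    # optimal feedback u* = h(x) = -(b p / r) x
    return -B * P * x / R


def value_rate(x: float) -> float:
    # dV/dt along the closed loop: V'(x) * f(x, h(x))
    return 2 * P * x * (A * x + B * policy(x))


def simulate(x0: float, dt: float = 1e-3, steps: int = 5000) -> list:
    # explicit-Euler simulation of the closed loop; returns V(x(t)) at each step
    x, values = x0, [value(x0)]
    for _ in range(steps):
        x += dt * (A * x + B * policy(x))
        values.append(value(x))
    return values


if __name__ == "__main__":
    vs = simulate(2.0)
    print(vs[0], vs[-1])
    # dV/dt equals minus the running cost, so V decreases whenever x != 0
    print(value_rate(1.5), -(Q * 1.5**2 + R * policy(1.5) ** 2))
```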

References

[1] Fleming, Wendell H.; Rishel, Raymond W. (1975). Deterministic and Stochastic Optimal Control. New York: Springer. pp. 81–83. ISBN 0-387-90155-8.
[2] Caputo, Michael R. (2005). Foundations of Dynamic Economic Analysis : Optimal Control Theory and Applications. New York: Cambridge University Press. p. 185. ISBN 0-521-60368-4.
[3] Weber, Thomas A. (2011). Optimal Control Theory : with Applications in Economics. Cambridge: The MIT Press. p. 82. ISBN 978-0-262-01573-8.
[4] Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). Neuro-Dynamic Programming. Belmont: Athena Scientific. p. 2. ISBN 1-886529-10-8.
[5] "EE365: Dynamic Programming" (PDF).
[6] Mas-Colell, Andreu; Whinston, Michael D.; Green, Jerry R. (1995). Microeconomic Theory. New York: Oxford University Press. p. 964. ISBN 0-19-507340-1.
[7] Corbae, Dean; Stinchcombe, Maxwell B.; Zeman, Juraj (2009). An Introduction to Mathematical Analysis for Economic Theory and Econometrics. Princeton University Press. p. 145. ISBN 978-0-691-11867-3.
[8] Kamien, Morton I.; Schwartz, Nancy L. (1991). Dynamic Optimization : The Calculus of Variations and Optimal Control in Economics and Management (2nd ed.). Amsterdam: North-Holland. p. 259. ISBN 0-444-01609-0.
[9] Ljungqvist, Lars; Sargent, Thomas J. (2018). Recursive Macroeconomic Theory (Fourth ed.). Cambridge: MIT Press. p. 106. ISBN 978-0-262-03866-9.
[10] Benveniste and Scheinkman established sufficient conditions for the differentiability of the value function, which in turn allows the application of the envelope theorem, see Benveniste, L. M.; Scheinkman, J. A. (1979). "On the Differentiability of the Value Function in Dynamic Models of Economics". Econometrica. 47 (3): 727–732. doi:10.2307/1910417. JSTOR 1910417. Also see Seierstad, Atle (1982). "Differentiability Properties of the Optimal Value Function in Control Theory". Journal of Economic Dynamics and Control. 4: 303–310. doi:10.1016/0165-1889(82)90019-7.
[11] Kirk, Donald E. (1970). Optimal Control Theory. Englewood Cliffs, NJ: Prentice-Hall. p. 88. ISBN 0-13-638098-0.
[12] Zhou, X. Y. (1990). "Maximum Principle, Dynamic Programming, and their Connection in Deterministic Control". Journal of Optimization Theory and Applications. 65 (2): 363–373. doi:10.1007/BF01102352. S2CID 122333807.
[13] Theorem 10.1 in
[14] Kamalapurkar, Rushikesh; Walters, Patrick; Rosenfeld, Joel; Dixon, Warren (2018). "Optimal Control and Lyapunov Stability". Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach. Berlin: Springer. pp. 26–27. ISBN 978-3-319-78383-3.

Further reading

Caputo, Michael R. (2005). "Necessary and Sufficient Conditions for Isoperimetric Problems". Foundations of Dynamic Economic Analysis : Optimal Control Theory and Applications. New York: Cambridge University Press. pp. 174–210. ISBN 0-521-60368-4.
Clarke, Frank H.; Loewen, Philip D. (1986). "The Value Function in Optimal Control: Sensitivity, Controllability, and Time-Optimality". SIAM Journal on Control and Optimization. 24 (2): 243–263. doi:10.1137/0324014.
LaFrance, Jeffrey T.; Barney, L. Dwayne (1991). "The Envelope Theorem in Dynamic Optimization" (PDF). Journal of Economic Dynamics and Control. 15 (2): 355–385. doi:10.1016/0165-1889(91)90018-V.
Stengel, Robert F. (1994). "Conditions for Optimality". Optimal Control and Estimation. New York: Dover. pp. 201–222. ISBN 0-486-68200-5.

