Scene text

Scene text is text that appears in an image captured by a camera in an outdoor environment.

The image displays the category of the coach in text form. We can observe that the coach belongs to the sleeper class.

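Detecting and recognizing scene text in camera-captured images is a computer vision task that became important once smartphones with good cameras became ubiquitous. Text in scene images varies in shape, font, color, and position. Recognition of scene text is sometimes further complicated by non-uniform illumination and focus.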

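To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) holds a robust reading competition once every two years. The competition was held in 2003 and 2005,^[1]^[2]^[3] and at every ICDAR conference.^[4]^[5]^[6] The International Association for Pattern Recognition (IAPR) has created a list of datasets under the name Reading Systems.^[7]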

Text detection

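Text detection is the process of detecting the text present in an image and then enclosing it in a rectangular bounding box. It can be carried out with image-based techniques or frequency-based techniques.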

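In image-based techniques, the image is partitioned into multiple segments. Each segment is a connected component of pixels with similar characteristics. Statistical features of the connected components are used to group them and form the text. Machine learning approaches such as support vector machines and convolutional neural networks are used to classify the components as text or non-text.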
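The connected-component step can be illustrated with a small sketch. It assumes the image has already been binarized into a grid of 0s and 1s, and the rule in `looks_like_text` is a hypothetical hand-set threshold standing in for a trained support vector machine or convolutional neural network; the function names and thresholds are illustrative and do not come from any cited method.

```python
from collections import deque


def connected_components(binary):
    """Collect 4-connected foreground regions of a binary image (rows of 0/1)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill gathers the pixels of one component.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def component_features(pixels):
    """Simple statistical features of one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)
    height, width = bottom - top + 1, right - left + 1
    return {
        "bbox": (top, left, bottom, right),
        "area": len(pixels),
        "aspect_ratio": width / height,
        "fill_ratio": len(pixels) / (width * height),
    }


def looks_like_text(features, min_area=3, max_aspect=3.0):
    # Hypothetical rule; a real system would use a trained classifier here.
    return features["area"] >= min_area and features["aspect_ratio"] <= max_aspect


# Two small glyph-like blobs and one long horizontal line (non-text).
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
features = [component_features(p) for p in connected_components(image)]
text_boxes = [f["bbox"] for f in features if looks_like_text(f)]
```

Here the long line at the bottom is rejected because of its extreme aspect ratio, while the two compact blobs are kept and reported as bounding boxes in `(top, left, bottom, right)` form.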

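In frequency-based techniques, the discrete Fourier transform (DFT) or the discrete wavelet transform (DWT) is used to extract the high-frequency coefficients. Text in an image is assumed to have high-frequency components, so selecting only the high-frequency coefficients filters the text from the non-text regions of the image.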
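As a rough illustration of the frequency-based idea, the sketch below applies a single-level 2D Haar transform, the simplest wavelet, to each 2x2 block of a grayscale image and keeps only the energy of the three detail (high-frequency) coefficients. The choice of the Haar wavelet, the function name, and the threshold value are assumptions made for this example only.

```python
def haar_detail_energy(image):
    """Per 2x2 block, return the summed squared Haar detail coefficients."""
    rows, cols = len(image), len(image[0])
    energy = []
    for r in range(0, rows - 1, 2):
        row_energy = []
        for c in range(0, cols - 1, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            col_diff = (a - b + d - e) / 2  # change between left and right columns
            row_diff = (a + b - d - e) / 2  # change between top and bottom rows
            diag = (a - b - d + e) / 2      # diagonal change
            row_energy.append(col_diff ** 2 + row_diff ** 2 + diag ** 2)
        energy.append(row_energy)
    return energy


# Left half: flat background. Right half: sharp stroke-like transitions.
image = [
    [100, 100, 100, 100, 0, 255, 0, 255],
    [100, 100, 100, 100, 0, 255, 0, 255],
    [100, 100, 100, 100, 255, 0, 255, 0],
    [100, 100, 100, 100, 255, 0, 255, 0],
]
energy = haar_detail_energy(image)
threshold = 1000.0  # hypothetical cut-off chosen for this toy image
mask = [[value > threshold for value in row] for row in energy]
```

Blocks over the flat background have zero detail energy, whereas blocks over the sharp transitions exceed the threshold, so `mask` marks only the right half as a text candidate.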

Word recognition

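In word recognition, the text is assumed to have already been detected, so the rectangular bounding box containing it is available. The task is to recognize the word inside the bounding box. Methods for word recognition can be broadly divided into top-down and bottom-up approaches.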

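In top-down approaches, a set of words from a dictionary is used to identify which word best fits the given image.^[8]^[9]^[10] Most of these methods do not segment the image, so the top-down approach is sometimes called segmentation-free recognition.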
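The role of the dictionary can be sketched in a simplified form. In this example the image evidence is replaced by a noisy character string, which is an assumption of the sketch; the cited methods may score dictionary words against image features instead. The word with the smallest edit distance to the noisy reading is selected, and the lexicon contents and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_lexicon_word(noisy_reading, lexicon):
    """Pick the lexicon word closest to the noisy reading."""
    return min(lexicon, key=lambda word: edit_distance(noisy_reading, word))


lexicon = ["SLEEPER", "SLIPPER", "SEATER", "STATION"]  # hypothetical dictionary
best = best_lexicon_word("5LEEPFR", lexicon)
```

Restricting the answer to dictionary words lets the misread characters `5` and `F` be corrected, which is the main benefit a lexicon brings to recognition.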

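In bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine.^[11]^[12]^[13] An optical character recognition (OCR) engine or a custom-trained engine is used to recognize the text.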
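A toy version of the bottom-up pipeline is sketched below. It assumes a clean binary word image in which characters are separated by empty columns, and it uses nearest-template matching over hypothetical 3x3 glyphs as a stand-in for an OCR engine; neither the segmentation rule nor the templates come from the cited work.

```python
def split_columns(word_image):
    """Split a binary word image into character segments at empty columns."""
    cols = len(word_image[0])
    ink = [any(row[c] for row in word_image) for c in range(cols)]
    spans, start = [], None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c                      # a character begins
        elif not has_ink and start is not None:
            spans.append((start, c))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, cols))
    return [[row[s:e] for row in word_image] for s, e in spans]


# Hypothetical 3x3 glyph templates standing in for a trained recognizer.
TEMPLATES = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}


def recognize(segment):
    """Return the template label with the fewest mismatching pixels."""
    def mismatches(template):
        if len(template[0]) != len(segment[0]):
            return float("inf")  # widths differ, so the template cannot match
        return sum(t != s
                   for t_row, s_row in zip(template, segment)
                   for t, s in zip(t_row, s_row))
    return min(TEMPLATES, key=lambda label: mismatches(TEMPLATES[label]))


word_image = [
    [1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0],
]
segments = split_columns(word_image)
text = "".join(recognize(segment) for segment in segments)
```

Unlike the top-down approach, no dictionary is consulted here: each segment is recognized independently, so a segmentation error propagates directly into the recognized string.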

References

[1] Lucas, S. M. (2005). "ICDAR 2005 text locating competition results". Proc. 8th ICDAR. pp. 80–84 Vol. 1. doi:10.1109/ICDAR.2005.231. ISBN 978-0-7695-2420-7.
[2] ICDAR 2005 competition. http://www.iapr-tc11.org/mediawiki/index.php/ICDAR_2005_Robust_Reading_Competitions
[3] Lucas, Simon M.; Panaretos, Alex; Sosa, Luis; Tang, Anthony; Wong, Shirley; Young, Robert; Ashida, Kazuki; Nagai, Hiroki; Okamoto, Masayuki; Yamamoto, Hiroaki; Miyao, Hidetoshi; Zhu, Junmin; Ou, Wuwen; Wolf, Christian; Jolion, Jean-Michel; Todoran, Leon; Worring, Marcel; Lin, Xiaofan (2005). "ICDAR 2003 Robust Reading Competitions: Entries, Results, and Future Directions". International Journal of Document Analysis and Recognition (IJDAR). 7 (2–3): 105–122. CiteSeerX 10.1.1.104.1667. doi:10.1007/s10032-004-0134-3.
[4] ICDAR 2013. http://www.icdar2013.org
[5] ICDAR 2017. http://u-pat.org/ICDAR2017/
[6] ICDAR 2011 Robust Reading Competition. http://www.cvc.uab.es/icdar2011competition/
[7] IAPR TC11 Reading Systems-Datasets List. http://www.iapr-tc11.org/mediawiki/index.php?title=Datasets
[8] Weinman, J. J.; Learned-Miller, E.; Hanson, A. R. (2009). "Scene text recognition using similarity and a lexicon with sparse belief propagation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (10): 1733–1746. doi:10.1109/TPAMI.2009.38. PMC 3021989. PMID 19696446.
[9] Mishra, A.; Alahari, K.; Jawahar, C. V. (2012). "Scene Text Recognition using Higher Order Language Priors" (PDF). Proc. BMVC.
[10] Novikova, Tatiana; Barinova, Olga; Kohli, Pushmeet; Lempitsky, Victor (2012). "Large-Lexicon Attribute-Consistent Text Recognition in Natural Images". Computer Vision – ECCV 2012. Lecture Notes in Computer Science. Vol. 7577. pp. 752–765. CiteSeerX 10.1.1.296.4807. doi:10.1007/978-3-642-33783-3_54. ISBN 978-3-642-33782-6.
[11] Kumar, Deepak; Ramakrishnan, A. G. (2012). "Power-law transformation for enhanced recognition of born-digital word images". Proc. 9th SPCOM. pp. 1–5. doi:10.1109/SPCOM.2012.6290009. ISBN 978-1-4673-2014-6.
[12] Kumar, D.; Anil Prasad, M. N.; Ramakrishnan, A. G. (2012). "MAPS: Midline analysis and propagation of segmentation". Proc. 8th ICVGIP. doi:10.1145/2425333.2425348.
[13] Kumar, D.; Anil Prasad, M. N.; Ramakrishnan, A. G. (2013). "NESP: Nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images" (PDF). Proc. 20th DRR.
[14] ABBYY FineReader. http://www.abbyy.com/
[15] Nuance OmniPage. http://www.nuance.com/
[16] Tesseract OCR engine. http://code.google.com/p/tesseract-ocr/

