Language model

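A language model is a probabilistic model of natural language that can generate probabilities of a series of words, based on the text corpora in one or multiple languages it was trained on.^[1] Large language models, the most advanced form, are a combination of feedforward neural networks and transformers. They have superseded recurrent neural network-based models, which had previously superseded purely statistical models such as the word n-gram language model.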

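Language models are useful for a variety of tasks, including speech recognition^[2] (helping to avoid predictions of low-probability, e.g. nonsense, sequences), machine translation,^[3] natural language generation (generating more human-like text), optical character recognition, handwriting recognition,^[4] grammar induction,^[5] and information retrieval.^[6]^[7]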

Pure statistical models

Models based on word n-grams

A word n-gram language model is a purely statistical model of language. It has been superseded by recurrent neural network-based models, which have in turn been superseded by large language models.^[8] It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if n − 1 words, an n-gram model.^[9] Special tokens, $\langle s\rangle$ and $\langle /s\rangle$, were introduced to denote the start and end of a sentence.
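
The fixed-window assumption can be written out as a worked equation. This is a sketch in standard notation rather than a formula taken from the cited sources: the chain rule factorizes the probability of a sentence exactly, and the n-gram model truncates each history to the previous n − 1 words. The bigram estimate on the last line uses raw corpus counts, written here as $C(\cdot)$, a symbol introduced only for this illustration.

```latex
\[
  P(w_1,\ldots,w_m)
    = \prod_{i=1}^{m} P(w_i \mid w_1,\ldots,w_{i-1})
    \approx \prod_{i=1}^{m} P(w_i \mid w_{i-(n-1)},\ldots,w_{i-1})
\]
\[
  \text{bigram case } (n = 2):\qquad
  P(w_i \mid w_{i-1}) \approx \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
\]
```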

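To prevent a zero probability being assigned to unseen words, each word's probability is made slightly lower than its relative frequency in the corpus. Various methods were used to calculate it, from simple "add-one" smoothing (assigning a count of 1 to unseen n-grams, as an uninformative prior) to more sophisticated models such as Good–Turing discounting or back-off models.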
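
As a minimal sketch of add-one smoothing for a bigram model, the following Python adds 1 to every bigram count and adds the vocabulary size to the denominator so that the probabilities still sum to one. The toy corpus and the function names (`train_bigram_counts`, `add_one_prob`) are assumptions made for illustration, not part of any cited source.

```python
from collections import Counter

def train_bigram_counts(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()  # tokens that can be predicted (everything except <s>)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab

def add_one_prob(prev, cur, history_counts, bigram_counts, vocab):
    """P(cur | prev) with add-one smoothing: (count + 1) / (history count + |V|)."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))

corpus = ["the rain falls", "the rain stops"]  # toy corpus, illustrative only
history_counts, bigram_counts, vocab = train_bigram_counts(corpus)

# A seen bigram loses some probability mass compared with its raw frequency (1/2) ...
p_seen = add_one_prob("rain", "falls", history_counts, bigram_counts, vocab)
# ... and an unseen bigram receives a small non-zero probability instead of 0.
p_unseen = add_one_prob("rain", "the", history_counts, bigram_counts, vocab)
```

With this toy corpus the seen bigram "rain falls" drops from a raw relative frequency of 1/2 to 2/7, while the unseen bigram "rain the" rises from 0 to 1/7.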

Exponential

Maximum entropy language models encode the relationship between a word and its n-gram history using feature functions. The equation is

$$P(w_{m}\mid w_{1},\ldots ,w_{m-1})={\frac {1}{Z(w_{1},\ldots ,w_{m-1})}}\exp \left(a^{T}f(w_{1},\ldots ,w_{m})\right)$$

where $Z(w_{1},\ldots ,w_{m-1})$ is the partition function, $a$ is the parameter vector, and $f(w_{1},\ldots ,w_{m})$ is the feature function. In the simplest case, the feature function is just an indicator of the presence of a certain n-gram. It is helpful to use a prior on $a$ or some other form of regularization.
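
The equation can be evaluated directly for a small vocabulary. The sketch below assumes two hypothetical indicator features and hand-picked weights; in practice the weights would be learned from data. The partition function is computed by summing the exponentiated scores over every candidate word for the given history.

```python
import math

def maxent_prob(word, history, vocab, features, weights):
    """P(word | history) = exp(a . f(history, word)) / Z(history)."""
    def score(w):
        # Dot product a^T f between the parameter vector and the feature values.
        return sum(a * f(history, w) for a, f in zip(weights, features))
    z = sum(math.exp(score(w)) for w in vocab)  # partition function Z(history)
    return math.exp(score(word)) / z

# Hypothetical indicator features: each fires (returns 1.0) when a pattern is present.
features = [
    lambda h, w: 1.0 if (h[-1], w) == ("the", "rain") else 0.0,  # bigram "the rain"
    lambda h, w: 1.0 if w == "plain" else 0.0,                   # unigram "plain"
]
weights = [1.5, 0.5]  # parameter vector a (illustrative values, not learned)
vocab = ["rain", "plain", "spain"]
history = ("in", "the")

probs = {w: maxent_prob(w, history, vocab, features, weights) for w in vocab}
```

Because every score passes through the same normalizer, the values in `probs` sum to one, and a word whose features carry more weight receives a higher probability.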

The log-bilinear model is another example of an exponential language model.

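Skip-gram model

The skip-gram language model is an attempt to overcome the data sparsity problem that its predecessor (i.e. the word n-gram language model) faced. Words represented in an embedding vector are no longer necessarily consecutive, but can leave gaps that are skipped over.^[10]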

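Formally, a k-skip-n-gram is a length-n subsequence whose components occur at a distance of at most k from each other.

For example, in the input text:

the rain in Spain falls mainly on the plain

the set of 1-skip-2-grams includes all the bigrams (2-grams), and in addition the subsequences

the in, rain Spain, in falls, Spain mainly, falls on, mainly the, and on plain.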
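
The example can be reproduced with a short enumeration. The sketch below adopts one reading of the definition, namely that adjacent components of a gram may skip at most k tokens; for 2-grams this matches the example above, while other formulations bound the total number of skipped tokens instead. The function name `k_skip_n_grams` is an assumption made for illustration.

```python
from itertools import combinations

def k_skip_n_grams(tokens, n, k):
    """Return length-n subsequences whose adjacent components skip at most k tokens."""
    grams = []
    for idx in combinations(range(len(tokens)), n):
        # j - i - 1 is the number of tokens skipped between two chosen positions.
        if all(j - i - 1 <= k for i, j in zip(idx, idx[1:])):
            grams.append(tuple(tokens[i] for i in idx))
    return grams

tokens = "the rain in Spain falls mainly on the plain".split()
bigrams = set(k_skip_n_grams(tokens, 2, 0))   # ordinary bigrams (no skips)
one_skip = set(k_skip_n_grams(tokens, 2, 1))  # 1-skip-2-grams
extra = one_skip - bigrams                    # grams that skip exactly one token
```

Here `extra` contains exactly the seven additional subsequences listed above, and `one_skip` is their union with the eight ordinary bigrams.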

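In the skip-gram model, semantic relations between words are represented by linear combinations, capturing a form of compositionality. For example, in some such models, if $v$ is the function that maps a word w to its n-dimensional vector representation, then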

$$v(\mathrm {king})-v(\mathrm {male})+v(\mathrm {female})\approx v(\mathrm {queen})$$

where ≈ is made precise by stipulating that its right-hand side must be the nearest neighbor of the value of the left-hand side.^[11]^[12]
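
The nearest-neighbor reading of the relation can be sketched as follows. The two-dimensional vectors are hand-made toy values chosen so that the analogy holds, not trained embeddings, and cosine similarity is assumed as the closeness measure because the text does not name one.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (hypothetical): first axis ~ "royalty", second axis ~ "gender".
embeddings = {
    "king": (0.9, 0.8),
    "queen": (0.9, -0.7),
    "male": (0.1, 0.9),
    "female": (0.1, -0.8),
    "plain": (-0.6, 0.1),
}

def analogy(a, b, c, embeddings):
    """Return the word whose vector is nearest to v(a) - v(b) + v(c)."""
    target = tuple(x - y + z for x, y, z in
                   zip(embeddings[a], embeddings[b], embeddings[c]))
    return max(embeddings, key=lambda w: cosine(target, embeddings[w]))

result = analogy("king", "male", "female", embeddings)
```

With these toy vectors the combination lands at roughly (0.9, −0.9), whose nearest neighbor is the vector for "queen".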

Neural models

Recurrent neural network

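Continuous representations, or embeddings, of words are produced in recurrent neural network-based language models (also known as continuous space language models).^[13] Such continuous space embeddings help to alleviate the curse of dimensionality, which is the consequence of the number of possible sequences of words increasing exponentially with the size of the vocabulary, further causing a data sparsity problem. Neural networks avoid this problem by representing words as non-linear combinations of weights in a neural net.^[14]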
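
A quick count shows how fast the space of word sequences grows. The vocabulary size and sequence length below are illustrative assumptions, and $V$ and $n$ are symbols introduced only for this example: with a vocabulary $V$, there are $|V|^{n}$ distinct sequences of length $n$, so almost all of them never appear in any training corpus.

```latex
\[
  \#\{\text{word sequences of length } n\} = |V|^{n},
  \qquad
  |V| = 10^{5},\; n = 10
  \;\Longrightarrow\;
  |V|^{n} = \left(10^{5}\right)^{10} = 10^{50}.
\]
```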

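Large language models

A large language model (LLM) is a language model characterized by its large size. That size is made possible by AI accelerators, which can process vast amounts of text data, mostly scraped from the Internet.^[15] The artificial neural networks that are built can contain from tens of millions up to billions of weights and are (pre-)trained using self-supervised learning and semi-supervised learning. The transformer architecture contributed to faster training.^[16] Alternative architectures include the mixture of experts (MoE), proposed by Google, starting with the sparsely-gated variant in 2017,^[17] followed by GShard in 2021^[18] and GLaM in 2022.^[19]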

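As language models, they work by taking an input text and repeatedly predicting the next token or word.^[20] Up to 2020, fine-tuning was the only way a model could be adapted to accomplish specific tasks. Larger models such as GPT-3, however, can be prompt-engineered to achieve similar results.^[21] They are thought to acquire embodied knowledge about the syntax, semantics and "ontology" inherent in human language corpora, but also the inaccuracies and biases present in those corpora.^[22]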
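
The "repeatedly predicting the next token" loop can be sketched in a few lines. Everything here is an illustrative assumption: `toy_model` is a hypothetical stand-in that looks only at the last token in a small hand-written table, whereas a real LLM conditions on the whole input, and the loop always picks the most probable token (greedy decoding), whereas deployed systems often sample instead.

```python
def generate(prompt_tokens, next_token_probs, max_new_tokens=10, end_token="</s>"):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)   # dict: candidate token -> probability
        best = max(probs, key=probs.get)   # pick the single most likely token
        if best == end_token:              # stop once the model predicts the end
            break
        tokens.append(best)
    return tokens

# Hypothetical stand-in for a trained model: a hand-written lookup table.
TABLE = {
    "the": {"rain": 0.7, "plain": 0.3},
    "rain": {"falls": 0.6, "</s>": 0.4},
    "falls": {"</s>": 0.9, "the": 0.1},
}

def toy_model(tokens):
    """Return next-token probabilities given the tokens so far (uses only the last one)."""
    return TABLE.get(tokens[-1], {"</s>": 1.0})

output = generate(["the"], toy_model)
```

Starting from the prompt "the", the loop appends "rain", then "falls", and stops when the end token becomes the most probable continuation.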

Notable examples include OpenAI's GPT models (e.g., GPT-3.5 and GPT-4, used in ChatGPT), Google's PaLM (used in Bard), and Meta's LLaMA, as well as BLOOM, Ernie 3.0 Titan, and Claude.

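Although they sometimes match human performance, it is not clear that they are plausible cognitive models. At least for recurrent neural networks, it has been shown that they sometimes learn patterns that humans do not learn, but fail to learn patterns that humans typically do learn.^[23]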

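Evaluation and benchmarks

The quality of language models is mostly evaluated by comparison with human-created sample benchmarks built from typical language-oriented tasks. Other, less established, quality tests examine the intrinsic character of a language model or compare two such models. Since language models are typically intended to be dynamic and to learn from the data they see, some proposed models investigate the rate of learning, e.g. through inspection of learning curves.

Various data sets have been developed for use in evaluating language processing systems.^[25] These include: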

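Corpus of Linguistic Acceptability^[26]
GLUE benchmark^[27]
Microsoft Research Paraphrase Corpus^[28]
Multi-Genre Natural Language Inference
Question Natural Language Inference
Quora Question Pairs^[29]
Recognizing Textual Entailment^[30]
Semantic Textual Similarity Benchmark
SQuAD question answering test^[31]
Stanford Sentiment Treebank^[32]
Winograd NLI
BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU (Massive Multitask Language Understanding), BIG-bench Hard, GSM8k, RealToxicityPrompts, WinoGender, CrowS-Pairs^[33] (LLaMA benchmark)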

References

[1] Jurafsky, Dan; Martin, James H. (2021). "N-gram Language Models". Speech and Language Processing (3rd ed.). Archived from the original on 22 May 2022. Retrieved 24 May 2022.
[2] Kuhn, Roland; De Mori, Renato (1990). "A cache-based natural language model for speech recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (6): 570–583.
[3] Andreas, Jacob; Vlachos, Andreas; Clark, Stephen (2013). "Semantic parsing as machine translation". Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Archived 15 August 2020 at the Wayback Machine.
[4] Pham, Vu; et al. (2014). "Dropout improves recurrent neural networks for handwriting recognition". 14th International Conference on Frontiers in Handwriting Recognition. IEEE. Archived 11 November 2020 at the Wayback Machine.
[5] Htut, Phu Mon; Cho, Kyunghyun; Bowman, Samuel R. (2018). "Grammar induction with neural language models: An unusual replication". arXiv:1808.10000. Archived 14 August 2022 at the Wayback Machine.
[6] Ponte, Jay M.; Croft, W. Bruce (1998). A language modeling approach to information retrieval. Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.
[7] Hiemstra, Djoerd (1998). A linguistically motivated probabilistically model of information retrieval. Proceedings of the 2nd European conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.
[8] Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal; Janvin, Christian (1 March 2003). "A neural probabilistic language model". The Journal of Machine Learning Research. 3: 1137–1155 – via ACM Digital Library.
[9] Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing (PDF) (3rd edition draft ed.). Retrieved 24 May 2022.
[10] David Guthrie; et al. (2006). "A Closer Look at Skip-gram Modelling" (PDF). Archived from the original (PDF) on 17 May 2017. Retrieved 27 April 2014.
[11] Cite error: the named reference "mikolov" was invoked but never defined.
[12] Cite error: the named reference "compositionality" was invoked but never defined.
[13] Karpathy, Andrej. "The Unreasonable Effectiveness of Recurrent Neural Networks". Archived from the original on 1 November 2020. Retrieved 27 January 2019.
[14] Bengio, Yoshua (2008). "Neural net language models". Scholarpedia. Vol. 3. p. 3881. Bibcode:2008SchpJ...3.3881B. doi:10.4249/scholarpedia.3881. Archived from the original on 26 October 2020. Retrieved 28 August 2015.
[15] "Better Language Models and Their Implications". OpenAI. 14 February 2019. Archived from the original on 19 December 2020. Retrieved 25 August 2019.
[16] Merritt, Rick (25 March 2022). "What Is a Transformer Model?". NVIDIA Blog. Retrieved 25 July 2023.
[17] Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff (1 January 2017). "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer". arXiv:1701.06538 [cs.LG].
[18] Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (12 January 2021). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding". arXiv:2006.16668 [cs.CL].
[19] Dai, Andrew M; Du, Nan (9 December 2021). "More Efficient In-Context Learning with GLaM". ai.googleblog.com. Retrieved 9 March 2023.
[20] Bowman, Samuel R. (2023). "Eight Things to Know about Large Language Models". arXiv:2304.00612 [cs.CL].
[21] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (December 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). "Language Models are Few-Shot Learners" (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 1877–1901.
[22] Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870.
[23] Hornstein, Norbert; Lasnik, Howard; Patel-Grosz, Pritty; Yang, Charles (9 January 2018). Syntactic Structures after 60 Years: The Impact of the Chomskyan Revolution in Linguistics. Walter de Gruyter GmbH & Co KG. ISBN 978-1-5015-0692-5. Archived from the original on 16 April 2023. Retrieved 11 December 2021.
[24] Karlgren, Jussi; Schutze, Hinrich (2015), "Evaluating Learning Language Representations", International Conference of the Cross-Language Evaluation Forum, Lecture Notes in Computer Science, Springer International Publishing, pp. 254–260, doi:10.1007/978-3-319-64206-2_8, ISBN 9783319642055
[25] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (10 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805 [cs.CL].
[26] "The Corpus of Linguistic Acceptability (CoLA)". nyu-mll.github.io. Archived from the original on 7 December 2020. Retrieved 25 February 2019.
[27] "GLUE Benchmark". gluebenchmark.com. Archived from the original on 4 November 2020. Retrieved 25 February 2019.
[28] "Microsoft Research Paraphrase Corpus". Microsoft Download Center. Archived from the original on 25 October 2020. Retrieved 25 February 2019.
[29] Aghaebrahimian, Ahmad (2017), "Quora Question Answer Dataset", Text, Speech, and Dialogue, Lecture Notes in Computer Science, vol. 10415, Springer International Publishing, pp. 66–73, doi:10.1007/978-3-319-64206-2_8, ISBN 9783319642055
[30] Sammons, Mark; Vydiswaran, V.G. Vinod; Roth, Dan. "Recognizing Textual Entailment" (PDF). Archived from the original (PDF) on 9 August 2017. Retrieved 24 February 2019.
[31] "The Stanford Question Answering Dataset". rajpurkar.github.io. Archived from the original on 30 October 2020. Retrieved 25 February 2019.
[32] "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank". nlp.stanford.edu. Archived from the original on 27 October 2020. Retrieved 25 February 2019.
[33] Hendrycks, Dan (14 March 2023), Measuring Massive Multitask Language Understanding, archived from the original on 15 March 2023, retrieved 15 March 2023

Further reading

J M Ponte; W B Croft (1998). "A Language Modeling Approach to Information Retrieval". Research and Development in Information Retrieval. pp. 275–281. CiteSeerX 10.1.1.117.4237.
F Song; W B Croft (1999). "A General Language Model for Information Retrieval". Research and Development in Information Retrieval. pp. 279–280. CiteSeerX 10.1.1.21.6467.
Chen, Stanley; Joshua Goodman (1998). An Empirical Study of Smoothing Techniques for Language Modeling (Technical report). Harvard University. CiteSeerX 10.1.1.131.5458.
