In-context learning (natural language processing)

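In natural language processing, in-context learning, few-shot learning, or few-shot prompting is a prompting technique that allows a model to process examples before attempting a task.^[1]^[2] The method was popularized after the advent of GPT-3^[3] and is considered to be an emergent property of large language models.^[4]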

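A few-shot prompt normally includes n examples of (problem, solution) pairs known as "shots", and the overall use of such a prompt is known as n-shot prompting.^[5]^[6] For instance, the following is a one-shot prompt for review sentiment classification:

Review: This movie sucks. Sentiment: negative. Review: I love this movie. Sentiment:

If the model outputs "positive", it has solved the task correctly.^[4]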
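The prompt layout described above can be assembled programmatically. The following is a minimal sketch, not a standard API: the function name `build_n_shot_prompt` and the `Review:` / `Sentiment:` template are assumptions taken from the example in the text. It only builds the prompt string; sending it to a model is left out.

```python
from typing import Sequence, Tuple


def build_n_shot_prompt(shots: Sequence[Tuple[str, str]], query: str) -> str:
    """Format (review, sentiment) shots followed by an unanswered query.

    Hypothetical helper for illustration; the template mirrors the
    one-shot example in the text.
    """
    parts = [f"Review: {review} Sentiment: {label}." for review, label in shots]
    # The final entry leaves the label blank so the model has to complete it.
    parts.append(f"Review: {query} Sentiment:")
    return " ".join(parts)


# One shot (n = 1), as in the example above.
shots = [("This movie sucks.", "negative")]
prompt = build_n_shot_prompt(shots, "I love this movie.")
print(prompt)
```

Passing an empty list of shots yields a prompt with the query alone, which corresponds to the n = 0 case.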

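The term zero-shot prompting is often used to signify that no examples are provided.^[7]^[8]^[9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".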

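In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.^[3] The term is somewhat misleading: no parameters of the model are changed, so no learning in the usual sense takes place. Instead, the prompt primes the model for subsequent inference. The main advantages of in-context learning over fine-tuning are a reduction in the amount of task-specific data needed and a reduced risk of overfitting, which can occur when a model learns an overly narrow distribution from a large but narrow fine-tuning dataset.^[3] The few-shot performance of large language models has been shown to be competitive on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.^[3]^[10] Examples of such NLP tasks are translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of such few-shot prompts is part of the now active field of study of prompt engineering.^[11]^[12]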

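While few-shot prompting has performed competitively when compared to fine-tuned models, it has its own drawbacks. For example, it has been shown that the order in which the shots are listed can make the difference between state-of-the-art and random-guess performance. A set of few-shot examples that works well in a specific order with one model may not work at all when used with a different model.^[13] Despite these drawbacks, the commonly used Transformer model can encode principled learning algorithms based on gradient descent inside its weights and enable mesa-optimization,^[14] that is, learning small models from the data given in context when making predictions.^[15]^[16]^[17]^[18]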
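Because performance can depend on shot order, one simple way to probe the effect is to enumerate every ordering of a small shot set and score each resulting prompt. The sketch below illustrates only that enumeration; it is not the method of the cited work. The names `format_prompt`, `rank_orderings`, and `toy_score` are hypothetical, and `toy_score` is an arbitrary placeholder with no relation to real model behaviour. In practice it would be replaced by something like a model's accuracy on held-out examples.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Shot = Tuple[str, str]  # (review text, sentiment label)


def format_prompt(shots: Sequence[Shot], query: str) -> str:
    """Join the shots in the given order, then append the unanswered query."""
    parts = [f"Review: {text} Sentiment: {label}." for text, label in shots]
    parts.append(f"Review: {query} Sentiment:")
    return " ".join(parts)


def rank_orderings(
    shots: Sequence[Shot],
    query: str,
    score: Callable[[str], float],
) -> List[Tuple[float, Tuple[Shot, ...]]]:
    """Score the prompt for every permutation of the shots, best first."""
    scored = [
        (score(format_prompt(order, query)), order)
        for order in permutations(shots)
    ]
    # Sort on the score only, so tied orderings are never compared directly.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(prompt: str) -> float:
    # ASSUMPTION: placeholder metric for demonstration only. It returns the
    # character offset of the first "negative" label, nothing more.
    return float(prompt.index("negative"))


shots = [
    ("This movie sucks.", "negative"),
    ("I love this movie.", "positive"),
    ("A wonderful film.", "positive"),
]
ranked = rank_orderings(shots, "The plot was dull.", toy_score)
for value, order in ranked:
    print(value, [label for _, label in order])
```

The number of orderings grows as n!, so exhaustive enumeration is only practical for a handful of shots.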

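A common example of in-context learning is chain-of-thought prompting, in which few-shot examples are given to teach the model to output a chain of reasoning before attempting to answer a question.^[19] This technique has been shown to improve the performance of models on tasks that require logical thinking and reasoning.^[20]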
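A chain-of-thought prompt differs from a plain few-shot prompt in that each shot spells out intermediate reasoning before the answer. The following is a minimal sketch under assumed conventions: the `Q:` / `A:` template, the `CoTShot` type, the `build_cot_prompt` helper, and the arithmetic example are all illustrative choices, not taken from the cited sources.

```python
from typing import NamedTuple, Sequence


class CoTShot(NamedTuple):
    """One worked example: a question, its reasoning steps, and the answer."""
    question: str
    reasoning: str
    answer: str


def build_cot_prompt(shots: Sequence[CoTShot], question: str) -> str:
    """Format worked examples with reasoning, then the new question."""
    blocks = [
        f"Q: {s.question}\nA: {s.reasoning} The answer is {s.answer}."
        for s in shots
    ]
    # The last block ends at "A:" so the model writes its own reasoning first.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Illustrative shot written for this sketch.
cot_shots = [
    CoTShot(
        question="A shelf holds 3 boxes with 4 books each. "
                 "2 books are removed. How many books remain?",
        reasoning="3 boxes of 4 books is 12 books. 12 - 2 = 10.",
        answer="10",
    )
]
cot_prompt = build_cot_prompt(
    cot_shots,
    "A crate holds 5 bags with 6 apples each. 4 apples are eaten. "
    "How many apples remain?",
)
print(cot_prompt)
```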

References

[1] Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222. S2CID 235652287.

[2] Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].

[3] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].

[4] Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].

[5] Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6. S2CID 248779924.

[6] Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].

[7] Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch.

[8] Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].

[9] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (Dec 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). "Language Models are Few-Shot Learners" (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 1877–1901.

[10] Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185. S2CID 221703107.

[11] Mok, Aaron. "'Prompt engineering' is one of the hottest jobs in generative AI. Here's how it works". Business Insider. Retrieved 14 March 2023.

[12] Harwell, Drew (25 February 2023). "Tech's hottest new job: AI whisperer. No coding required". Washington Post. Retrieved 14 March 2023.

[13] Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556. S2CID 233296494.

[14] "Mesa-Optimization". Retrieved 17 May 2023.

[15] von Oswald, Johannes; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].

[16] Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].

[17] Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].

[18] Musser, George. "How AI Knows Things No One Told It". Scientific American. Retrieved 17 May 2023.

[19] Wei, Jason; Zhou. "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved 10 March 2023.

[20] Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903 [cs.CL].
