Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Summary

This paper introduces a new prompting strategy called Plan-and-Solve (PS) prompting to improve the performance of large language models (LLMs) in multi-step reasoning tasks. The authors propose two components of PS prompting: devising a plan to divide the task into smaller subtasks, and carrying out the subtasks according to the plan. They also extend PS prompting with more detailed instructions to address calculation errors and improve the quality of generated reasoning steps, resulting in PS+ prompting.

The proposed prompting strategies are evaluated on ten datasets spanning three categories of reasoning problems: arithmetic reasoning, commonsense reasoning, and symbolic reasoning. The experimental results show that zero-shot PS prompting consistently outperforms Zero-shot-CoT prompting across all datasets, is comparable to or exceeds Zero-shot-Program-of-Thought (PoT) prompting, and performs comparably to 8-shot CoT prompting on arithmetic reasoning problems.

Key Takeaways

Introduction

  • Large language models (LLMs) have proven effective in various NLP tasks.
  • Fine-tuning LLMs for downstream tasks is challenging due to limited access to model parameters.
  • Zero-shot-CoT prompting has been successful in solving multi-step reasoning tasks but suffers from calculation errors, missing-step errors, and semantic misunderstanding errors.

Plan-and-Solve Prompting

  • Plan-and-Solve (PS) prompting consists of two components: devising a plan to divide the task into smaller subtasks, and carrying out the subtasks according to the plan.
  • PS prompting addresses missing-step errors by explicitly generating reasoning steps.
  • PS+ prompting extends PS prompting with more detailed instructions to improve the quality of generated reasoning steps (both trigger sentences are sketched after this list).
  • PS+ prompting can be customized to solve a variety of problems other than math reasoning.
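
For concreteness, here is a minimal sketch contrasting the PS trigger sentence with a PS+ variant assembled from the detailed instructions described in the Methods section below; the PS+ wording is a paraphrase for illustration, not the paper's verbatim prompt.

```python
# Plan-and-Solve (PS) trigger: first devise a plan, then carry it out.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

# PS+ trigger: PS plus detailed instructions targeting calculation errors and
# missing reasoning steps (paraphrased; not the paper's exact wording).
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate results (pay attention to calculation), and solve "
    "the problem step by step."
)
```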

Experimental Results

  • The proposed prompting strategies are evaluated on ten benchmark datasets.
  • Zero-shot PS prompting consistently outperforms Zero-shot-CoT prompting across all reasoning problems and datasets.
  • Zero-shot PS prompting is comparable to or exceeds Zero-shot-PoT prompting.
  • PS+ prompting achieves performance comparable to 8-shot CoT prompting on arithmetic reasoning.

Methods

Plan-and-Solve Prompting

  • Step 1: Prompting for Reasoning Generation
    • Construct templates to elicit LLMs to determine subtasks and accomplish them.
    • Use a prompt with a simple template "Q: [X]. A: [T]" where [X] contains the input problem statement and [T] is a hand-crafted instruction to trigger LLMs to generate a reasoning process.
    • Replace "Let's think step by step" with "Let's first understand the problem and devise a plan to solve the problem. Then, let's carry out the plan and solve the problem step by step."
    • Add more detailed instructions to the trigger sentence, such as "pay attention to calculation", "extract relevant variables and their corresponding numerals", and "calculate intermediate results".
  • Step 2: Prompting for Answer Extraction
    • Devise another prompt to extract the final numerical answer from the reasoning text generated in Step 1 (a sketch of both steps follows this list).
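
A minimal sketch of this two-stage flow is shown below, assuming a hypothetical `complete()` helper that wraps whatever LLM completion client is used; the answer-extraction phrasing is illustrative and may differ from the paper's exact prompt.

```python
# Sketch of the two-stage Plan-and-Solve prompting flow.
# `complete` is a hypothetical wrapper around an LLM completion API
# (greedy decoding); it is not part of the paper.

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def complete(prompt: str) -> str:
    """Send `prompt` to the backbone LLM and return its text completion."""
    raise NotImplementedError("plug in your LLM client here")

def plan_and_solve(question: str, trigger: str = PS_TRIGGER) -> str:
    # Step 1: reasoning generation with the "Q: [X]. A: [T]" template.
    reasoning_prompt = f"Q: {question}. A: {trigger}"
    reasoning = complete(reasoning_prompt)

    # Step 2: answer extraction -- append the generated reasoning and ask
    # only for the final numerical answer.
    extraction_prompt = (
        f"{reasoning_prompt}\n{reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return complete(extraction_prompt).strip()
```

In practice the reasoning generated in Step 1 is kept in the Step 2 prompt, so the model only has to read off the final number rather than re-derive it.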

Experimental Setup

  • Evaluate the proposed prompting strategies on ten benchmark datasets from three categories of reasoning problems: arithmetic reasoning, commonsense reasoning, and symbolic reasoning.
  • Compare the performance of zero-shot PS and PS+ prompting with three types of prompting baselines: zero-shot-CoT, zero-shot-PoT, and few-shot with manual or automatic demonstrations.
  • Use GPT-3 (175B) as the backbone language model and set the temperature to 0 for greedy decoding (an illustrative evaluation sketch follows this list).
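
As an illustration only, the sketch below shows the kind of accuracy evaluation this setup implies, reusing the hypothetical `plan_and_solve()` from the Methods sketch; the dataset format and numeric answer matching are assumptions about arithmetic benchmarks, not the paper's actual evaluation harness.

```python
import re

def extract_number(text: str) -> str:
    """Grab the first number in the model's extracted answer."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return match.group(0) if match else text.strip()

def evaluate(dataset) -> float:
    """dataset: a list of (question, gold_answer) pairs."""
    correct = 0
    for question, gold in dataset:
        prediction = extract_number(plan_and_solve(question))  # greedy decoding
        try:
            correct += float(prediction) == float(gold)
        except ValueError:
            pass  # non-numeric prediction counts as incorrect
    return correct / len(dataset)
```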

Conclusion

  • Zero-shot PS prompting outperforms Zero-shot-CoT prompting and is comparable to or exceeds Zero-shot-PoT prompting.
  • PS+ prompting achieves performance comparable to 8-shot CoT prompting on arithmetic reasoning.
  • The results suggest that PS prompting can generate a higher-quality reasoning process and has the potential to outperform manual few-shot CoT prompting.