
Why Do Large Language Models "Talk Nonsense"? How Can the LLM "Hallucination" Problem Be Solved?

Tags: AI, LLM, DL, ML
Archive source: Zhihu (category XWJ); archived 2024-11-23 04:33; original created and last updated 2024-07-01 10:47

This post focuses on how to strengthen the textual logical reasoning abilities of large language models (LLMs). It is organized into seven parts:

Part 1 collects highly cited survey papers;

Part 2 covers typical prompting methods, starting from CoT;

Part 3 lists papers on neuro-symbolic methods;

Part 4 covers the various problems that arise in LLM reasoning;

Part 5 lists datasets for testing the logical reasoning ability of LLMs;

Part 6 describes using external tools to assist LLM reasoning, which overlaps with Part 3;

Part 7 gathers other related papers.

1. Survey Papers

| Name | Link | Notes |
| --- | --- | --- |
| A Survey of Reasoning with Foundation Models | https://osf.io/preprints/osf/ac4sp | |
| Reasoning with Language Model Prompting: A Survey | https://arxiv.org/abs/2212.09597 | |
| Efficient Prompting Methods for Large Language Models: A Survey | https://arxiv.org/abs/2404.01077 | |

2. Prompting Methods for LLM Reasoning

| Name | Link | Notes |
| --- | --- | --- |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | https://arxiv.org/abs/2201.11903 | CoT; chain-of-thought prompting helps reasoning |
| Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601 | ToT |
| Graph of Thoughts: Solving Elaborate Problems with Large Language Models | https://arxiv.org/pdf/2308.09687v2.pdf | GoT |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | https://arxiv.org/pdf/2203.11171.pdf | CoT-SC |
| ReAct: Synergizing Reasoning and Acting in Language Models | https://arxiv.org/abs/2210.03629 | ReAct |
| Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | https://arxiv.org/abs/2205.10625 | Decomposes a large problem into smaller subproblems first |
| Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving | https://arxiv.org/abs/2402.05359 | DAC; guides the LLM to solve problems by divide and conquer |
| LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | https://arxiv.org/abs/2212.13894 | Backward chaining to avoid combinatorial explosion: start from the goal and recursively decompose it into subgoals until each subgoal can be proved or refuted from the facts |
| Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts | https://arxiv.org/pdf/2401.14295v1 | Survey of CoT, ToT, and GoT |
| Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | https://arxiv.org/pdf/2205.09712 | A CoT-style version of CR? |
| Cumulative Reasoning with Large Language Models | https://arxiv.org/abs/2308.04371 | CR |
| From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models | https://arxiv.org/abs/2310.18659 | DetermLR |
| Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | https://arxiv.org/pdf/2405.12939 | AoR; hierarchical aggregation of reasoning paths to raise the ceiling on complex LLM reasoning |
| The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | https://arxiv.org/abs/2404.13208 | Trains the model to execute higher-privilege instructions first |
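
Of the prompting methods above, self-consistency (CoT-SC) is easy to make concrete: sample several chains of thought at nonzero temperature and majority-vote over the final answers. Below is a minimal, runnable Python sketch; `sample_chain` and its canned chains are an invented stand-in for a real LLM call, not taken from the paper:

```python
import re
from collections import Counter

def sample_chain(question, seed):
    """Stand-in for a sampled LLM call. A real implementation would
    prompt the model with the question plus "Let's think step by step"
    at nonzero temperature; here three canned chains (one faulty) make
    the sketch runnable."""
    chains = [
        "3 apples + 2 apples = 5 apples. Answer: 5",
        "Start with 3, add 2 more: 3 + 2 = 5. Answer: 5",
        "3 * 2 = 6. Answer: 6",  # a faulty chain, to show the voting
    ]
    return chains[seed % len(chains)]

def self_consistency(question, n_samples=3):
    """CoT-SC: sample several reasoning chains, extract each final
    answer, and return the majority answer."""
    answers = []
    for seed in range(n_samples):
        match = re.search(r"Answer:\s*(\S+)", sample_chain(question, seed))
        if match:
            answers.append(match.group(1))
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("Tom has 3 apples and buys 2 more. How many?"))  # -> 5
```

The voting step is what buys robustness here: the single faulty chain (`3 * 2 = 6`) is outvoted by the two correct ones.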

3. Neuro-Symbolic Methods

| Name | Link | Notes |
| --- | --- | --- |
| Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text | https://arxiv.org/abs/2105.03659 | LReasoner |
| Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation | arxiv.dosf.top/abs/2305 | Builds on LReasoner with finer-grained semantic parsing |
| A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models | https://arxiv.org/abs/2401.00757 | LogicAsker |
| LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers | https://arxiv.org/abs/2310.15164v2 | LINC; the LLM parses text into symbols, then the Prover9 theorem prover solves |
| Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | https://arxiv.org/abs/2305.12295 | Combines five symbolic solvers |
| SATLM: Satisfiability-Aided Language Models Using Declarative Prompting | https://arxiv.org/pdf/2305.09656.pdf | Similar to LINC, but uses the Z3 solver |
| Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models | https://www.semanticscholar.org/reader/8568d7bd9dfb5ba0b91940b938b44a88fafdf95b | "Meta-reasoning"; argues that parsing natural language into semantic symbols helps the model grasp the core problem |
| The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning | https://openreview.net/pdf?id=qLgQpeQX3x1 | |
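
The neuro-symbolic entries above (LINC, Logic-LM, SATLM) share one pipeline shape: the LLM translates natural language into a symbolic form, and a deterministic solver does the actual inference. A toy Python sketch of the solver half, with a forward-chaining prover over Horn clauses standing in for Prover9 or Z3; the facts and rules shown are hand-written stand-ins for an LLM parser's output:

```python
def forward_chain(facts, rules, goal, max_iters=100):
    """Tiny propositional Horn-clause prover: apply rules to a closed
    fact set until nothing new is derivable, then check the goal.
    facts: iterable of atoms; rules: list of (premises, conclusion)."""
    known = set(facts)
    for _ in range(max_iters):
        new = {concl for premises, concl in rules
               if set(premises) <= known and concl not in known}
        if not new:
            break
        known |= new
    return goal in known

# Hand-written stand-in for what an LLM parser might emit for:
# "All birds fly. Tweety is a bird. Does Tweety fly?"
facts = {"bird(tweety)"}
rules = [(["bird(tweety)"], "fly(tweety)")]
print(forward_chain(facts, rules, "fly(tweety)"))  # -> True
```

Because the solver half is deterministic, any wrong answer is attributable to the parsing step, which is the faithfulness argument these papers make.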

4. Problems in LLM Reasoning

| Name | Link | Notes |
| --- | --- | --- |
| LLMs with Chain-of-Thought Are Non-Causal Reasoners | https://arxiv.org/abs/2402.16048 | Finds that wrong chains of thought often lead to correct answers, and correct chains can lead to wrong answers |
| Premise Order Matters in Reasoning with Large Language Models | https://arxiv.org/abs/2402.08939 | "Reversal curse": reordering statements affects reasoning |
| Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse | https://arxiv.org/abs/2311.07468 | Mitigations for the "reversal curse"; follows on from the previous entry |
| Measuring Faithfulness in Chain-of-Thought Reasoning | https://arxiv.org/abs/2307.13702 | |
| Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 | https://arxiv.org/abs/2304.03439 | |
| Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | https://arxiv.org/abs/2210.01240 | Generates the PrOntoQA dataset from first-order logic |
| Knowledge Conflicts for LLMs: A Survey | https://arxiv.org/abs/2403.08319 | Survey of knowledge conflicts, split into context-memory, inter-context, and intra-memory conflicts |
| Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | https://arxiv.org/abs/2312.11720 | Dataset descriptions and links at the end of the paper: https://drive.google.com/drive/folders/1YpRoveEJJZIOUyAMeeo5LF6kt8eAFkya |
| Conditional and Modal Reasoning in Large Language Models | https://arxiv.org/abs/2401.17169 | |
| Large Language Models Cannot Self-Correct Reasoning Yet | https://arxiv.org/abs/2310.01798 | |
| Survey of Hallucination in Natural Language Generation | https://arxiv.org/abs/2202.03629v6 | Survey of LLM hallucination |
| A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | https://arxiv.org/abs/2311.05232 | Survey of LLM hallucination |
| Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners | https://arxiv.org/pdf/2305.14825.pdf | LLMs grasp semantics more readily than symbols |
| Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual datasets |
| The Impact of Reasoning Step Length on Large Language Models | https://arxiv.org/abs/2401.04925 | Effect of the number of reasoning steps on accuracy |
| Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | https://arxiv.org/abs/2402.14848 | Effect of input length on answer accuracy |

5. Datasets

| Name | Link | Notes |
| --- | --- | --- |
| ReClor | https://openreview.net/pdf?id=HJgJtT4tvB | Split into easy and hard; the easy split can be answered with common sense alone |
| FOLIO: Natural Language Reasoning with First-Order Logic | https://arxiv.org/pdf/2209.00840.pdf | |
| Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | https://aclanthology.org/2021.emnlp-main.303.pdf | LogicNLI dataset |
| LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning | https://www.ijcai.org/proceedings/2020/0501.pdf | LogiQA dataset, sourced from the Chinese civil service examination |
| CLUTRR | https://arxiv.org/pdf/1908.06177v2.pdf | CLUTRR; family relations |
| Transformers as Soft Reasoners over Language | https://www.ijcai.org/proceedings/2020/0537.pdf | RuleTaker/SoftReasoner; a quantifier-free dataset whose propositions are built from basic predicates (is, like, etc.), with conditionals written as if-then |
| On the Paradox of Learning to Reason from Data | https://arxiv.org/pdf/2205.11502.pdf | SimpleLogic dataset |
| Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual dataset |
| Counterfactual reasoning: Testing language models’ understanding of hypothetical scenarios | https://arxiv.org/pdf/2305.16572.pdf | Counterfactual dataset |
| ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language | https://arxiv.org/pdf/2012.13048.pdf | |
| AR-LSAT: Investigating Analytical Reasoning of Text | https://arxiv.org/abs/2104.06598 | |
| Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples | https://arxiv.org/abs/2305.15269 | Builds a dataset by extending PrOntoQA with more rules and a wider data range |
| Counterfactual Story Reasoning and Generation | https://arxiv.org/pdf/1909.04076.pdf | Generates a counterfactual dataset: collects counterfactual data and trains a language model to continue the stories |
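
RuleTaker-style data of the kind described above (quantifier-free facts plus if-then rules, labeled by closed-world inference) can be generated programmatically. A toy Python sketch; the entity and predicate vocabulary is invented for illustration, not taken from the paper:

```python
import random

# Toy RuleTaker-style example builder; entity/predicate vocabulary is
# invented for illustration, not taken from the paper.
ENTITIES = ["Anne", "Bob", "Carol"]
PREDICATES = ["kind", "quiet", "smart", "green"]

def make_example(rng):
    """Build one quantifier-free theory (facts + if-then rules) plus a
    question labeled by closed-world inference: derivable -> True,
    not derivable -> False."""
    entity = rng.choice(ENTITIES)
    p, q, r = rng.sample(PREDICATES, 3)
    facts = [f"{entity} is {p}"]
    rules = [(f"{entity} is {p}", f"{entity} is {q}")]
    theory = facts + [f"If {cond} then {eff}." for cond, eff in rules]
    # One round of forward chaining suffices for this single-rule theory.
    derived = set(facts)
    for cond, eff in rules:
        if cond in derived:
            derived.add(eff)
    # Ask either a derivable statement (True) or an unstated one (False).
    question = rng.choice([f"{entity} is {q}", f"{entity} is {r}"])
    return {"theory": theory, "question": question,
            "label": question in derived}

print(make_example(random.Random(0)))
```

Scaling the number of entities, predicates, and chained rules is what controls the reasoning depth of the generated examples.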

6. External Tool Use

| Name | Link | Notes |
| --- | --- | --- |
| Efficient Tool Use with Chain-of-Abstraction Reasoning | https://arxiv.org/abs/2401.17464 | Abstracts the problem into symbols, then calls external solvers; also summarizes much related work on LLM tool use |
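
Chain-of-abstraction decouples reasoning from tool calls: the model first writes an abstract chain with placeholders, and tools fill them in afterwards. A minimal Python sketch, assuming a hypothetical `[CALC(...)]` placeholder syntax and a toy arithmetic tool (the paper's actual format differs):

```python
import ast
import operator
import re

def safe_eval(expr):
    """Toy calculator tool: evaluate a simple arithmetic expression by
    walking the ast, so arbitrary code is never executed."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def fill_abstract_chain(chain):
    """Replace every [CALC(...)] placeholder with the tool's result."""
    return re.sub(r"\[CALC\((.+?)\)\]",
                  lambda m: str(safe_eval(m.group(1))), chain)

# An abstract chain a model might emit before any tool is called:
chain = "Each box holds 12 eggs, so 7 boxes hold [CALC(12 * 7)] eggs."
print(fill_abstract_chain(chain))  # -> "... 7 boxes hold 84 eggs."
```

Keeping the placeholders abstract lets the model plan the whole chain once, while the arithmetic is delegated to a tool that cannot hallucinate.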

7. Miscellaneous

| Name | Link | Notes |
| --- | --- | --- |
| The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code | https://aclanthology.org/2023.findings-acl.574.pdf | Parses logical relations into code form to improve causal reasoning |
| Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming | https://aclanthology.org/2023.findings-acl.191.pdf | A differentiable symbolic reasoning framework |
| LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning | https://aclanthology.org/2022.findings-emnlp.1/ | Solves algebra word problems with a tree-structured solver that invokes computation formulas |
| Reasoning with Language Model is Planning with World Model | https://arxiv.org/abs/2305.14992 | Tree-structured reasoning with feedback |
| ThinkSum: Probabilistic reasoning over sets using large language models | https://aclanthology.org/2023.acl-long.68.pdf | Multi-object reasoning via a probabilistic reasoning paradigm |
| Reasoning in Large Language Models Through Symbolic Math Word Problems | https://aclanthology.org/2023.findings-acl.364.pdf | Symbolic math word problems |
| Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | https://aclanthology.org/2022.naacl-main.341/ | Symbolic knowledge distillation; a large model trains a small one |
| Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey | https://arxiv.org/abs/2305.18703 | |
| Large Language Models are Better Reasoners with Self-Verification | https://aclanthology.org/2023.findings-emnlp.167.pdf | |
| LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | https://arxiv.org/abs/2306.15626 | Core idea: given the current proof state, retrieve a few potentially useful premises, then generate a tactic from the concatenation of the state and the retrieved premises. Lean is a programming language in which one can write both ordinary programs and theorems with proofs |
| One Embedder, Any Task: Instruction-Finetuned Text Embeddings | https://arxiv.org/abs/2212.09741 | Each input is embedded together with an instruction describing the use case (e.g., task and domain) |
| Resolving Knowledge Conflicts in Large Language Models | https://arxiv.org/pdf/2310.00935.pdf | A framework for mitigating knowledge conflicts in LLMs |
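
The retrieval step described for LeanDojo above can be caricatured in a few lines: score candidate premises against the current proof state and keep the top-k. A toy Python sketch using token overlap in place of the learned embedding retriever; the lemma strings are illustrative, not real Lean library entries:

```python
def retrieve_premises(state, premises, k=2):
    """Rank candidate premises by crude token overlap with the proof
    state (a stand-in for the learned retriever in retrieval-augmented
    theorem provers) and return the k best."""
    state_tokens = set(state.lower().split())
    scored = sorted(premises,
                    key=lambda p: len(state_tokens & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

# Illustrative proof state and candidate premises:
state = "n + 0 = n"
premises = ["add_zero : n + 0 = n",
            "mul_comm : a * b = b * a",
            "zero_add : 0 + n = n"]
print(retrieve_premises(state, premises))
```

The retrieved premises would then be concatenated with the state and fed to the tactic generator, as the paper describes.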
