Why Do Large Models "Talk Nonsense"? How Can the LLM "Hallucination" Problem Be Addressed?
Tags: AI, LLM, DL, ML
Word count: 2983
Reading time: 16 minutes
Archived from | Archived on | Category | Original author | Original URL | Originally created | Originally updated |
---|---|---|---|---|---|---|
Zhihu | 2024-11-23 04:33 | Category | XWJ | Link | 2024-07-01 10:47 | 2024-07-01 10:47 |
This post collects work on strengthening the text-based logical reasoning abilities of large language models (LLMs). It is organized into seven parts:
Part 1 covers highly cited surveys;
Part 2 introduces typical prompting methods, starting from Chain-of-Thought (CoT);
Part 3 lists neuro-symbolic approaches;
Part 4 covers known problems in LLM reasoning;
Part 5 lists datasets for testing the logical reasoning ability of LLMs;
Part 6 covers using external tools to assist LLM reasoning, which relates to Part 3;
Part 7 collects other related papers.
1. Surveys
Title | Link | Notes |
---|---|---|
A Survey of Reasoning with Foundation Models | https://osf.io/preprints/osf/ac4sp | |
Reasoning with Language Model Prompting: A Survey | https://arxiv.org/abs/2212.09597 | |
Efficient Prompting Methods for Large Language Models: A Survey | https://arxiv.org/abs/2404.01077 | |
2. Prompting Methods for LLM Reasoning
Title | Link | Notes |
---|---|---|
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | https://arxiv.org/pdf/2201.11903.pdf | CoT |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601 | ToT |
Graph of Thoughts: Solving Elaborate Problems with Large Language Models | https://arxiv.org/pdf/2308.09687v2.pdf | GoT |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | https://arxiv.org/pdf/2203.11171.pdf | CoT-SC |
ReAct: Synergizing Reasoning and Acting in Language Models | https://arxiv.org/abs/2210.03629 | ReAct |
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | https://arxiv.org/abs/2205.10625 | Decomposes a large problem into smaller subproblems first |
Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving | https://arxiv.org/abs/2402.05359 | DaC: guides the LLM to solve problems by divide and conquer |
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | https://arxiv.org/abs/2212.13894 | Backward chaining to tame combinatorial explosion: start from the goal and recursively decompose it into subgoals until each subgoal can be proved or refuted from the facts |
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts | https://arxiv.org/pdf/2401.14295v1 | Survey of CoT, ToT, and GoT |
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | https://arxiv.org/pdf/2205.09712 | A CoT-style version of CR? |
Cumulative Reasoning with Large Language Models | https://arxiv.org/abs/2308.04371 | CR |
From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models | https://arxiv.org/abs/2310.18659 | DetermLR |
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | https://arxiv.org/pdf/2405.12939 | AoR: hierarchical aggregation of reasoning paths to raise the ceiling of complex LLM reasoning |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | https://arxiv.org/abs/2404.13208 | Trains the model to execute higher-privilege instructions first |
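The CoT-SC idea above is simple enough to sketch: sample several independent chains of thought at nonzero temperature and majority-vote over their final answers. Below is a minimal sketch in Python; `fake_chain` and its hard-coded answers are a hypothetical stand-in for a sampled LLM call, not any real API.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_chain, question, n=5):
    """Sample n chain-of-thought completions for the same question and
    majority-vote over the final answers (the CoT-SC decoding strategy)."""
    answers = [sample_chain(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stub standing in for temperature-sampled LLM calls:
# each call returns (reasoning_chain, final_answer).
_samples = cycle([("...", "18"), ("...", "18"), ("...", "17"),
                  ("...", "18"), ("...", "20")])
def fake_chain(question):
    return next(_samples)

best = self_consistency(fake_chain, "Roger starts with 5 tennis balls ...", n=5)
print(best)  # "18" — the majority answer across the five sampled chains
```

Ties are broken by insertion order here; a production version would also normalize answer strings (casing, units) before voting.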
3. Neuro-Symbolic Methods
Title | Link | Notes |
---|---|---|
Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text | https://arxiv.org/abs/2105.03659 | LReasoner |
Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation | https://arxiv.dosf.top/abs/2305.12599 | Builds on LReasoner with finer-grained semantic parsing |
A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models | https://arxiv.org/abs/2401.00757 | LogicAsker |
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers | https://arxiv.org/abs/2310.15164v2 | LINC: the LLM parses text into symbols, then the Prover9 theorem prover solves them |
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | https://arxiv.org/abs/2305.12295 | Aggregates five symbolic solvers |
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting | https://arxiv.org/pdf/2305.09656.pdf | Similar to LINC, but uses the Z3 solver |
Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models | https://www.semanticscholar.org/reader/8568d7bd9dfb5ba0b91940b938b44a88fafdf95b | "Meta-reasoning": argues that parsing natural language into semantic symbols helps the model grasp the core problem |
The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning | https://openreview.net/pdf?id=qLgQpeQX3x1 | |
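Pipelines like LINC, Logic-LM, and SatLM share one skeleton: the LLM translates text into a symbolic form, and a deterministic engine does the actual deduction. The sketch below substitutes a toy forward-chaining prover for the real back ends (Prover9, Z3); the facts and rules shown are a hypothetical LLM parse, not output from any of those systems.

```python
def forward_chain(facts, rules):
    """Toy fixed-point prover: each rule is (premises, conclusion) over
    ground atoms; fire rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical symbolic parse of:
# "Socrates is a man. All men are mortal. Mortals fear death."
facts = {"man(socrates)"}
rules = [({"man(socrates)"}, "mortal(socrates)"),
         ({"mortal(socrates)"}, "fears_death(socrates)")]

derived = forward_chain(facts, rules)
print("fears_death(socrates)" in derived)  # True
```

The division of labor is the point: the LLM only has to get the parse right, and the symbolic step is then faithful by construction — which is exactly the "faithfulness" claim these papers make.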
4. Known Problems in LLM Reasoning
Title | Link | Notes |
---|---|---|
LLMs with Chain-of-Thought Are Non-Causal Reasoners | https://arxiv.org/abs/2402.16048 | Finds that wrong chains of thought often yield correct answers, and correct chains can yield wrong answers |
Premise Order Matters in Reasoning with Large Language Models | https://arxiv.org/abs/2402.08939 | Reordering premises changes reasoning outcomes (the "reversal curse") |
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse | https://arxiv.org/abs/2311.07468 | A mitigation for the "reversal curse"; follows on from the previous entry |
Measuring Faithfulness in Chain-of-Thought Reasoning | https://arxiv.org/abs/2307.13702 | |
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 | https://arxiv.org/abs/2304.03439 | |
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | https://arxiv.org/abs/2210.01240 | Generates the PrOntoQA dataset from first-order logic |
Knowledge Conflicts for LLMs: A Survey | https://arxiv.org/abs/2403.08319 | Survey of knowledge conflicts, split into context-memory, inter-context, and intra-memory conflicts |
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | https://arxiv.org/abs/2312.11720 | Dataset descriptions and links at the end of the paper: https://drive.google.com/drive/folders/1YpRoveEJJZIOUyAMeeo5LF6kt8eAFkya |
Conditional and Modal Reasoning in Large Language Models | https://arxiv.org/abs/2401.17169 | |
Large Language Models Cannot Self-Correct Reasoning Yet | https://arxiv.org/abs/2310.01798 | |
Survey of Hallucination in Natural Language Generation | https://arxiv.org/abs/2202.03629v6 | Survey of LLM hallucination |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | https://arxiv.org/abs/2311.05232 | Survey of LLM hallucination |
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners | https://arxiv.org/pdf/2305.14825.pdf | LLMs grasp semantics more readily than symbols |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual evaluation tasks |
The Impact of Reasoning Step Length on Large Language Models | https://arxiv.org/abs/2401.04925 | Effect of the number of reasoning steps on accuracy |
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | https://arxiv.org/abs/2402.14848 | Effect of input length on answer accuracy |
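The premise-order sensitivity reported above is cheap to probe yourself: keep the facts fixed and shuffle only their order across prompt variants, so any accuracy gap is attributable to ordering alone. A minimal sketch (the premises and question below are made up for illustration):

```python
import random

def premise_order_variants(premises, question, n=3, seed=0):
    """Build n prompts containing the same premises in different orders.
    Scoring a model on all variants isolates the effect of premise order."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        shuffled = premises[:]          # copy so the original order survives
        rng.shuffle(shuffled)
        variants.append(" ".join(shuffled) + " " + question)
    return variants

premises = ["If A then B.", "If B then C.", "A is true."]
variants = premise_order_variants(premises, "Is C true?")
for prompt in variants:
    print(prompt)
```

Each variant is logically identical, so a model whose answers differ across them is exhibiting exactly the order sensitivity these papers measure.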
5. Datasets
Title | Link | Notes |
---|---|---|
ReClor | https://openreview.net/pdf?id=HJgJtT4tvB | Split into easy and hard; the easy split can be answered with common sense alone |
FOLIO: Natural Language Reasoning with First-Order Logic | https://arxiv.org/pdf/2209.00840.pdf | |
Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | https://aclanthology.org/2021.emnlp-main.303.pdf | The LogicNLI dataset |
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning | https://www.ijcai.org/proceedings/2020/0501.pdf | LogiQA, collected from Chinese civil service exams |
CLUTRR | https://arxiv.org/pdf/1908.06177v2.pdf | CLUTRR: family-relationship reasoning |
Transformers as Soft Reasoners over Language | https://www.ijcai.org/proceedings/2020/0537.pdf | RuleTaker/SoftReasoner: a quantifier-free dataset whose propositions are built from basic predicates (is, like, etc.), with conditionals expressed as if-then |
On the Paradox of Learning to Reason from Data | https://arxiv.org/pdf/2205.11502.pdf | The SimpleLogic dataset |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual datasets |
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | https://arxiv.org/pdf/2305.16572.pdf | Counterfactual dataset |
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language | https://arxiv.org/pdf/2012.13048.pdf | |
AR-LSAT: Investigating Analytical Reasoning of Text | https://arxiv.org/abs/2104.06598 | |
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples | https://arxiv.org/abs/2305.15269 | Builds a dataset by extending PrOntoQA with more rules and a broader data range |
Counterfactual Story Reasoning and Generation | https://arxiv.org/pdf/1909.04076.pdf | Builds a counterfactual dataset by collecting counterfactual data and training a language model to continue the stories |
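Several of the datasets above (RuleTaker, SimpleLogic, PrOntoQA) are generated synthetically: sample facts and if-then rules, then label each query by exhaustive forward chaining, so the gold answer is correct by construction. A toy generator in that spirit (the attribute vocabulary is invented, not taken from any of these datasets):

```python
import random

def make_example(rng):
    """Sample facts and propositional if-then rules, then compute the gold
    label for a random query by forward chaining to a fixed point."""
    atoms = ["alice is kind", "alice is quiet", "alice is smart", "alice is happy"]
    facts = set(rng.sample(atoms, 2))
    rules = [(a, b) for a in atoms for b in atoms
             if a != b and rng.random() < 0.3]          # "if a then b"
    closure = set(facts)
    changed = True
    while changed:                                       # fixed-point closure
        changed = False
        for a, b in rules:
            if a in closure and b not in closure:
                closure.add(b)
                changed = True
    query = rng.choice(atoms)
    return facts, rules, query, query in closure

facts, rules, query, label = make_example(random.Random(0))
print(query, "->", label)
```

Because the label comes from the closure rather than from an annotator, such datasets scale to arbitrary size and proof depth, which is what makes them useful for probing reasoning rather than memorization.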
6. External Tool Use
Title | Link | Notes |
---|---|---|
Efficient Tool Use with Chain-of-Abstraction Reasoning | https://arxiv.org/abs/2401.17464 | Abstracts the problem into symbols, then calls an external solver; also surveys much related work on LLM tool use |
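The chain-of-abstraction pattern can be sketched in a few lines: the model emits a plan over placeholders (e.g. `y1 = x1 * x2`) and never does arithmetic itself; a deterministic tool then binds the placeholders and evaluates. Everything below — the plan format, the variable names, using `eval` as the "calculator" — is an illustrative assumption, not the paper's implementation.

```python
import re

def run_plan(plan, bindings):
    """Execute an abstract plan line by line: substitute known values into
    each right-hand side, then let the 'tool' (here, eval) compute it."""
    env = dict(bindings)
    for line in plan:
        target, expr = (s.strip() for s in line.split("="))
        for name, value in env.items():
            expr = re.sub(rf"\b{name}\b", str(value), expr)
        env[target] = eval(expr)  # stands in for a calculator tool call
    return env

# "3 bags of 2 apples, plus 4 loose apples" as an abstract plan:
env = run_plan(["y1 = x1 * x2", "y2 = y1 + x3"], {"x1": 3, "x2": 2, "x3": 4})
print(env["y2"])  # 10
```

Separating the plan from the arithmetic means the same abstract chain can be reused with different concrete numbers, and the tool's answers are exact where a model's mental arithmetic often is not.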
7. Other
Title | Link | Notes |
---|---|---|
The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code | https://aclanthology.org/2023.findings-acl.574.pdf | Parses logical relations into code form to improve causal reasoning |
Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming | https://aclanthology.org/2023.findings-acl.191.pdf | A differentiable symbolic reasoning framework |
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning | https://aclanthology.org/2022.findings-emnlp.1/ | Solves algebra word problems with a tree-structured solver that invokes computational formulas |
Reasoning with Language Model is Planning with World Model | https://arxiv.org/abs/2305.14992 | Tree-structured reasoning with feedback |
ThinkSum: Probabilistic reasoning over sets using large language models | https://aclanthology.org/2023.acl-long.68.pdf | Multi-object reasoning via a probabilistic inference paradigm |
Reasoning in Large Language Models Through Symbolic Math Word Problems | https://aclanthology.org/2023.findings-acl.364.pdf | Symbolic math word problems |
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | https://aclanthology.org/2022.naacl-main.341/ | Symbolic knowledge distillation: a large model trains a small one |
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey | https://arxiv.org/abs/2305.18703 | |
Large Language Models are Better Reasoners with Self-Verification | https://aclanthology.org/2023.findings-emnlp.167.pdf | |
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | https://arxiv.org/abs/2306.15626 | Core idea: given the current proof state, retrieve a small number of potentially useful premises, then generate a tactic conditioned on the state and the retrieved premises. Lean is a programming language in which one can write both ordinary programs and theorems with proofs. |
One Embedder, Any Task: Instruction-Finetuned Text Embeddings | https://arxiv.org/abs/2212.09741 | Each input is embedded together with an instruction describing the use case (e.g., task and domain description) |
Resolving Knowledge Conflicts in Large Language Models | https://arxiv.org/pdf/2310.00935.pdf | A framework for mitigating knowledge conflicts in LLMs |