Why Do Large Models "Talk Nonsense"? How Can the LLM "Hallucination" Problem Be Addressed?
Tags: AI, LLM, DL, ML
Word count: 2983
Reading time: 16 minutes
Archived from | Archived on | Category | Original author | Original URL | Originally created | Originally updated |
---|---|---|---|---|---|---|
Zhihu | 2024-11-23 04:33 | Category | XWJ | Link | 2024-07-01 10:47 | 2024-07-01 10:47 |
This post collects work on strengthening the text-based logical reasoning abilities of large language models (LLMs). It is organized into seven parts:
Part 1 covers highly cited surveys;
Part 2 introduces typical prompting methods, starting from Chain-of-Thought (CoT);
Part 3 lists neuro-symbolic approaches;
Part 4 covers known problems in LLM reasoning;
Part 5 lists datasets for testing the logical reasoning ability of LLMs;
Part 6 covers using external tools to assist LLM reasoning, which relates to Part 3;
Part 7 collects other related papers.
1. Surveys
Title | Link | Notes |
---|---|---|
A Survey of Reasoning with Foundation Models | https://osf.io/preprints/osf/ac4sp | |
Reasoning with Language Model Prompting: A Survey | https://arxiv.org/abs/2212.09597 | |
Efficient Prompting Methods for Large Language Models: A Survey | https://arxiv.org/abs/2404.01077 | |
2. Prompting Methods for LLM Reasoning
Title | Link | Notes |
---|---|---|
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | https://arxiv.org/pdf/2201.11903.pdf | CoT |
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601 | ToT |
Graph of Thoughts: Solving Elaborate Problems with Large Language Models | https://arxiv.org/pdf/2308.09687v2.pdf | GoT |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | https://arxiv.org/pdf/2203.11171.pdf | CoT-SC |
ReAct: Synergizing Reasoning and Acting in Language Models | https://arxiv.org/abs/2210.03629 | ReAct |
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models | https://arxiv.org/abs/2205.10625 | Decomposes a large problem into smaller subproblems first |
Guiding Large Language Models with Divide-and-Conquer Program for Discerning Problem Solving | https://arxiv.org/abs/2402.05359 | DaC: guides the LLM to solve problems by divide and conquer |
LAMBADA: Backward Chaining for Automated Reasoning in Natural Language | https://arxiv.org/abs/2212.13894 | Backward chaining to tame combinatorial explosion: start from the goal and recursively decompose it into subgoals until each subgoal can be proved or refuted from the facts |
Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts | https://arxiv.org/pdf/2401.14295v1 | Survey of CoT, ToT, and GoT |
Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | https://arxiv.org/pdf/2205.09712 | A CoT-style version of CR? |
Cumulative Reasoning with Large Language Models | https://arxiv.org/abs/2308.04371 | CR |
From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models | https://arxiv.org/abs/2310.18659 | DetermLR |
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | https://arxiv.org/pdf/2405.12939 | AoR: hierarchical aggregation of reasoning paths to raise the ceiling of complex LLM reasoning |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | https://arxiv.org/abs/2404.13208 | Trains the model to execute higher-privilege instructions first |
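The CoT-SC idea above is simple enough to sketch: sample several independent chains of thought at nonzero temperature and majority-vote over their final answers. Below is a minimal sketch in Python; `fake_chain` and its hard-coded answers are a hypothetical stand-in for a sampled LLM call, not any real API.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_chain, question, n=5):
    """Sample n chain-of-thought completions for the same question and
    majority-vote over the final answers (the CoT-SC decoding strategy)."""
    answers = [sample_chain(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stub standing in for temperature-sampled LLM calls:
# each call returns (reasoning_chain, final_answer).
_samples = cycle([("...", "18"), ("...", "18"), ("...", "17"),
                  ("...", "18"), ("...", "20")])
def fake_chain(question):
    return next(_samples)

best = self_consistency(fake_chain, "Roger starts with 5 tennis balls ...", n=5)
print(best)  # "18" — the majority answer across the five sampled chains
```

Ties are broken by insertion order here; a production version would also normalize answer strings (casing, units) before voting.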
3. Neuro-Symbolic Methods
Title | Link | Notes |
---|---|---|
Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text | https://arxiv.org/abs/2105.03659 | LReasoner |
Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation | https://arxiv.dosf.top/abs/2305.12599 | Builds on LReasoner with finer-grained semantic parsing |
A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models | https://arxiv.org/abs/2401.00757 | LogicAsker |
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers | https://arxiv.org/abs/2310.15164v2 | LINC: the LLM parses text into symbols, then the Prover9 theorem prover solves them |
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | https://arxiv.org/abs/2305.12295 | Aggregates five symbolic solvers |
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting | https://arxiv.org/pdf/2305.09656.pdf | Similar to LINC, but uses the Z3 solver |
Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models | https://www.semanticscholar.org/reader/8568d7bd9dfb5ba0b91940b938b44a88fafdf95b | "Meta-reasoning": argues that parsing natural language into semantic symbols helps the model grasp the core problem |
The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning | https://openreview.net/pdf?id=qLgQpeQX3x1 | |
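Pipelines like LINC, Logic-LM, and SatLM share one skeleton: the LLM translates text into a symbolic form, and a deterministic engine does the actual deduction. The sketch below substitutes a toy forward-chaining prover for the real back ends (Prover9, Z3); the facts and rules shown are a hypothetical LLM parse, not output from any of those systems.

```python
def forward_chain(facts, rules):
    """Toy fixed-point prover: each rule is (premises, conclusion) over
    ground atoms; fire rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical symbolic parse of:
# "Socrates is a man. All men are mortal. Mortals fear death."
facts = {"man(socrates)"}
rules = [({"man(socrates)"}, "mortal(socrates)"),
         ({"mortal(socrates)"}, "fears_death(socrates)")]

derived = forward_chain(facts, rules)
print("fears_death(socrates)" in derived)  # True
```

The division of labor is the point: the LLM only has to get the parse right, and the symbolic step is then faithful by construction — which is exactly the "faithfulness" claim these papers make.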
4. Known Problems in LLM Reasoning
Title | Link | Notes |
---|---|---|
LLMs with Chain-of-Thought Are Non-Causal Reasoners | https://arxiv.org/abs/2402.16048 | Finds that wrong chains of thought often yield correct answers, and correct chains can yield wrong answers |
Premise Order Matters in Reasoning with Large Language Models | https://arxiv.org/abs/2402.08939 | Reordering premises changes reasoning outcomes (the "reversal curse") |
Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse | https://arxiv.org/abs/2311.07468 | A mitigation for the "reversal curse"; follows on from the previous entry |
Measuring Faithfulness in Chain-of-Thought Reasoning | https://arxiv.org/abs/2307.13702 | |
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4 | https://arxiv.org/abs/2304.03439 | |
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | https://arxiv.org/abs/2210.01240 | Generates the PrOntoQA dataset from first-order logic |
Knowledge Conflicts for LLMs: A Survey | https://arxiv.org/abs/2403.08319 | Survey of knowledge conflicts, split into context-memory, inter-context, and intra-memory conflicts |
Assessing Logical Reasoning Capabilities of Encoder-Only Transformer Models | https://arxiv.org/abs/2312.11720 | Dataset descriptions and links at the end of the paper: https://drive.google.com/drive/folders/1YpRoveEJJZIOUyAMeeo5LF6kt8eAFkya |
Conditional and Modal Reasoning in Large Language Models | https://arxiv.org/abs/2401.17169 | |
Large Language Models Cannot Self-Correct Reasoning Yet | https://arxiv.org/abs/2310.01798 | |
Survey of Hallucination in Natural Language Generation | https://arxiv.org/abs/2202.03629v6 | Survey of LLM hallucination |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | https://arxiv.org/abs/2311.05232 | Survey of LLM hallucination |
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners | https://arxiv.org/pdf/2305.14825.pdf | LLMs grasp semantics more readily than symbols |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual evaluation tasks |
The Impact of Reasoning Step Length on Large Language Models | https://arxiv.org/abs/2401.04925 | Effect of the number of reasoning steps on accuracy |
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | https://arxiv.org/abs/2402.14848 | Effect of input length on answer accuracy |
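The premise-order sensitivity reported above is cheap to probe yourself: keep the facts fixed and shuffle only their order across prompt variants, so any accuracy gap is attributable to ordering alone. A minimal sketch (the premises and question below are made up for illustration):

```python
import random

def premise_order_variants(premises, question, n=3, seed=0):
    """Build n prompts containing the same premises in different orders.
    Scoring a model on all variants isolates the effect of premise order."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        shuffled = premises[:]          # copy so the original order survives
        rng.shuffle(shuffled)
        variants.append(" ".join(shuffled) + " " + question)
    return variants

premises = ["If A then B.", "If B then C.", "A is true."]
variants = premise_order_variants(premises, "Is C true?")
for prompt in variants:
    print(prompt)
```

Each variant is logically identical, so a model whose answers differ across them is exhibiting exactly the order sensitivity these papers measure.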
5. Datasets
Title | Link | Notes |
---|---|---|
ReClor | https://openreview.net/pdf?id=HJgJtT4tvB | Split into easy and hard; the easy split can be answered with common sense alone |
FOLIO: Natural Language Reasoning with First-Order Logic | https://arxiv.org/pdf/2209.00840.pdf | |
Diagnosing the First-Order Logical Reasoning Ability Through LogicNLI | https://aclanthology.org/2021.emnlp-main.303.pdf | The LogicNLI dataset |
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning | https://www.ijcai.org/proceedings/2020/0501.pdf | LogiQA, collected from Chinese civil service exams |
CLUTRR | https://arxiv.org/pdf/1908.06177v2.pdf | CLUTRR: family-relationship reasoning |
Transformers as Soft Reasoners over Language | https://www.ijcai.org/proceedings/2020/0537.pdf | RuleTaker/SoftReasoner: a quantifier-free dataset whose propositions are built from basic predicates (is, like, etc.), with conditionals expressed as if-then |
On the Paradox of Learning to Reason from Data | https://arxiv.org/pdf/2205.11502.pdf | The SimpleLogic dataset |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 | Counterfactual datasets |
Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | https://arxiv.org/pdf/2305.16572.pdf | Counterfactual dataset |
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language | https://arxiv.org/pdf/2012.13048.pdf | |
AR-LSAT: Investigating Analytical Reasoning of Text | https://arxiv.org/abs/2104.06598 | |
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples | https://arxiv.org/abs/2305.15269 | Builds a dataset by extending PrOntoQA with more rules and a broader data range |
Counterfactual Story Reasoning and Generation | https://arxiv.org/pdf/1909.04076.pdf | Builds a counterfactual dataset by collecting counterfactual data and training a language model to continue the stories |
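Several of the datasets above (RuleTaker, SimpleLogic, PrOntoQA) are generated synthetically: sample facts and if-then rules, then label each query by exhaustive forward chaining, so the gold answer is correct by construction. A toy generator in that spirit (the attribute vocabulary is invented, not taken from any of these datasets):

```python
import random

def make_example(rng):
    """Sample facts and propositional if-then rules, then compute the gold
    label for a random query by forward chaining to a fixed point."""
    atoms = ["alice is kind", "alice is quiet", "alice is smart", "alice is happy"]
    facts = set(rng.sample(atoms, 2))
    rules = [(a, b) for a in atoms for b in atoms
             if a != b and rng.random() < 0.3]          # "if a then b"
    closure = set(facts)
    changed = True
    while changed:                                       # fixed-point closure
        changed = False
        for a, b in rules:
            if a in closure and b not in closure:
                closure.add(b)
                changed = True
    query = rng.choice(atoms)
    return facts, rules, query, query in closure

facts, rules, query, label = make_example(random.Random(0))
print(query, "->", label)
```

Because the label comes from the closure rather than from an annotator, such datasets scale to arbitrary size and proof depth, which is what makes them useful for probing reasoning rather than memorization.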
6. External Tool Use
Title | Link | Notes |
---|---|---|
Efficient Tool Use with Chain-of-Abstraction Reasoning | https://arxiv.org/abs/2401.17464 | Abstracts the problem into symbols, then calls an external solver; also surveys much related work on LLM tool use |
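The chain-of-abstraction pattern can be sketched in a few lines: the model emits a plan over placeholders (e.g. `y1 = x1 * x2`) and never does arithmetic itself; a deterministic tool then binds the placeholders and evaluates. Everything below — the plan format, the variable names, using `eval` as the "calculator" — is an illustrative assumption, not the paper's implementation.

```python
import re

def run_plan(plan, bindings):
    """Execute an abstract plan line by line: substitute known values into
    each right-hand side, then let the 'tool' (here, eval) compute it."""
    env = dict(bindings)
    for line in plan:
        target, expr = (s.strip() for s in line.split("="))
        for name, value in env.items():
            expr = re.sub(rf"\b{name}\b", str(value), expr)
        env[target] = eval(expr)  # stands in for a calculator tool call
    return env

# "3 bags of 2 apples, plus 4 loose apples" as an abstract plan:
env = run_plan(["y1 = x1 * x2", "y2 = y1 + x3"], {"x1": 3, "x2": 2, "x3": 4})
print(env["y2"])  # 10
```

Separating the plan from the arithmetic means the same abstract chain can be reused with different concrete numbers, and the tool's answers are exact where a model's mental arithmetic often is not.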
7. Other
Title | Link | Notes |
---|---|---|
The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code | https://aclanthology.org/2023.findings-acl.574.pdf | Parses logical relations into code form to improve causal reasoning |
Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming | https://aclanthology.org/2023.findings-acl.191.pdf | A differentiable symbolic reasoning framework |
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning | https://aclanthology.org/2022.findings-emnlp.1/ | Solves algebra word problems with a tree-structured solver that invokes computational formulas |
Reasoning with Language Model is Planning with World Model | https://arxiv.org/abs/2305.14992 | Tree-structured reasoning with feedback |
ThinkSum: Probabilistic reasoning over sets using large language models | https://aclanthology.org/2023.acl-long.68.pdf | Multi-object reasoning via a probabilistic inference paradigm |
Reasoning in Large Language Models Through Symbolic Math Word Problems | https://aclanthology.org/2023.findings-acl.364.pdf | Symbolic math word problems |
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | https://aclanthology.org/2022.naacl-main.341/ | Symbolic knowledge distillation: a large model trains a small one |
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey | https://arxiv.org/abs/2305.18703 | |
Large Language Models are Better Reasoners with Self-Verification | https://aclanthology.org/2023.findings-emnlp.167.pdf | |
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | https://arxiv.org/abs/2306.15626 | Core idea: given the current proof state, retrieve a small number of potentially useful premises, then generate a tactic conditioned on the state and the retrieved premises. Lean is a programming language in which one can write both ordinary programs and theorems with proofs. |
One Embedder, Any Task: Instruction-Finetuned Text Embeddings | https://arxiv.org/abs/2212.09741 | Each input is embedded together with an instruction describing the use case (e.g., task and domain description) |
Resolving Knowledge Conflicts in Large Language Models | https://arxiv.org/pdf/2310.00935.pdf | A framework for mitigating knowledge conflicts in LLMs |