Tool Learning with Large Models (Chinese and English).pdf

Resource description

Tool Learning
Qin Yujia (秦禹嘉), THUNLP

Background
- Tools are extensions of human capabilities designed to enhance productivity, efficiency, and problem-solving.
- Throughout history, humans have been the primary agents in the invention and manipulation of tools.
- Question: can artificial intelligence be as capable as humans in tool use?

Tools and Intelligence
- With foundation models, the answer is yes. They offer:
  - Strong semantic understanding
  - Extensive world knowledge
  - Powerful reasoning and planning capabilities
- Tool learning [1]: foundation models can follow human instructions and manipulate tools for task solving.

[1] Qin, Yujia, et al. "Tool Learning with Foundation Models." arXiv preprint arXiv:2304.08354 (2023).

Categorization of Tool Learning
- Tool-augmented learning
  - Augment foundation models with the execution results from tools.
  - Tools are viewed as complementary resources that aid in the generation of high-quality outputs.

- Tool-oriented learning
  - Utilize models to govern tools and make sequential decisions in place of humans.
  - Exploit foundation models' vast world knowledge and reasoning ability for complex reasoning and planning.

Framework
- Tool set: a collection of tools with different functionalities.
- Environment: provides the platform where tools operate.
- Perceiver: summarizes feedback for the controller.
- Controller: provides feasible plans to fulfill user requests.
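The four framework components can be pictured as a simple loop. The following is a minimal sketch, not the authors' implementation: the `calculator` tool, the rule-based `controller`, and all function names are assumptions made for illustration. In a real system a foundation model would play the controller role.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]


def calculator(expression: str) -> str:
    """Hypothetical tool: adds two integers written as 'a + b'."""
    left, operator, right = expression.split()
    if operator != "+":
        raise ValueError("this toy tool only supports '+'")
    return str(int(left) + int(right))


def environment_execute(tool_set: Dict[str, Tool], name: str, argument: str) -> str:
    """Environment: the platform where the selected tool actually runs."""
    return tool_set[name](argument)


def perceive(raw_output: str) -> str:
    """Perceiver: condenses raw tool output into feedback for the controller."""
    return raw_output.strip()


def controller(request: str, feedback: List[str]) -> Optional[Tuple[str, str]]:
    """Controller: returns the next (tool, argument) step, or None when done.

    A foundation model would sit here; this rule-based stand-in is an assumption.
    """
    if not feedback:
        return ("calculator", request)
    return None


def solve(request: str, tool_set: Dict[str, Tool], max_steps: int = 5) -> List[str]:
    """Run the controller / environment / perceiver loop until the plan ends."""
    feedback: List[str] = []
    for _ in range(max_steps):
        step = controller(request, feedback)
        if step is None:
            break
        name, argument = step
        feedback.append(perceive(environment_execute(tool_set, name, argument)))
    return feedback


print(solve("3 + 4", {"calculator": calculator}))  # prints ['7']
```

Keeping the perceiver separate from the environment means the controller only ever sees summarized feedback, which mirrors the division of roles listed above.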

Intent Understanding
- Goal: comprehend the underlying purpose of an instruction.
- Learn a mapping from the instruction space to the model's cognition space.
- Instruction tuning
  - Wrap tasks with diverse instructions.
  - Apply supervised fine-tuning.
  - The result is extraordinary generalization capability [1, 2, 3].
  - Scaling up the model size and the diversity of instruction-tuning datasets enhances generalization capability.
- Challenges
  - Understanding vague instructions: vagueness and ambiguity in the user query.
  - Theoretically infinite instruction space: infinite expressions and personalized instructions.

[1] Finetuned Language Models Are Zero-Shot Learners
[2] Multitask Prompted Training Enables Zero-Shot Task Generalization
[3] OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
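"Wrapping tasks with diverse instructions" can be illustrated with a small data-preparation step. This is only a sketch under assumptions: the sentiment task, the template wording, and the `wrap_with_instructions` helper are invented here and do not come from the cited papers.

```python
from typing import Dict, List

# Hypothetical instruction templates for one sentiment task; the wording is invented.
TEMPLATES = [
    "Is the sentiment of this review positive or negative? {text}",
    "Review: {text}\nClassify the sentiment as positive or negative.",
    "Decide whether the following review is positive or negative: {text}",
]


def wrap_with_instructions(text: str, label: str, templates: List[str]) -> List[Dict[str, str]]:
    """Turn one labeled example into several instruction-formatted training pairs."""
    return [{"input": template.format(text=text), "target": label} for template in templates]


examples = wrap_with_instructions("The battery lasts all day.", "positive", TEMPLATES)
for example in examples:
    print(example["input"], "->", example["target"])
```

Each pair keeps the same target while varying the phrasing, so supervised fine-tuning sees many surface forms of the same underlying intent.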

Tool Understanding
- Eliciting tool understanding with prompting
  - Zero-shot prompting: describe API functionalities, their input/output formats, possible parameters, etc. This allows the model to understand the tasks that each API can tackle.
  - Few-shot prompting: provide concrete tool-use demonstrations to the model. By mimicking human behaviors from these demonstrations, the model can learn how to utilize these tools.
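The difference between the two prompting styles is easiest to see in the prompt text itself. The sketch below assumes a made-up `get_weather` tool and a made-up prompt layout. Neither is taken from the slides or from any specific library.

```python
from typing import Dict, List

# Hypothetical tool specification; the name and fields are invented for illustration.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {"city": "string, name of the city"},
    "returns": "string, a short weather summary",
}


def describe_tool(tool: Dict) -> str:
    """Render functionality, parameters, and output format as one line."""
    params = "; ".join(f"{name} ({spec})" for name, spec in tool["parameters"].items())
    return f"{tool['name']}: {tool['description']} Parameters: {params}. Returns: {tool['returns']}."


def zero_shot_prompt(tools: List[Dict], query: str) -> str:
    """Zero-shot: only API descriptions, no usage examples."""
    lines = ["You can call the following tools:"]
    lines += [f"- {describe_tool(tool)}" for tool in tools]
    lines += [f"Task: {query}", "Tool call:"]
    return "\n".join(lines)


def few_shot_prompt(tools: List[Dict], demos: List[Dict[str, str]], query: str) -> str:
    """Few-shot: API descriptions plus concrete tool-use demonstrations."""
    lines = ["You can call the following tools:"]
    lines += [f"- {describe_tool(tool)}" for tool in tools]
    for demo in demos:
        lines += [f"Task: {demo['task']}", f"Tool call: {demo['call']}"]
    lines += [f"Task: {query}", "Tool call:"]
    return "\n".join(lines)


demos = [{"task": "What is the weather in Paris?", "call": 'get_weather(city="Paris")'}]
print(zero_shot_prompt([WEATHER_TOOL], "What is the weather in Beijing?"))
print(few_shot_prompt([WEATHER_TOOL], demos, "What is the weather in Beijing?"))
```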

Planning and Reasoning
- Introspective reasoning: generate a static plan without interacting with the environment.
- Extrospective reasoning: generate a dynamic plan that accounts for changes in the environment and for feedback.

Introspective reasoning
- If prompted appropriately, PLMs can effectively decompose high-level tasks into mid-level plans without any further training.
- Source: "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents."

Extrospective reasoning
- Challenge: foundation models are not embodied or grounded in the physical world.
- Solution: constrain the model to propose natural language actions that are both feasible and contextually appropriate ("Do as I can, not as I say!") [1].
- Inner Monologue [2]: inject information from various sources of feedback into model planning.

[1] Ahn, Michael, et al. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." arXiv preprint arXiv:2204.01691 (2022).
[2] Huang, Wenlong, et al. "Inner Monologue: Embodied Reasoning through Planning with Language Models." arXiv preprint arXiv:2207.05608 (2022).
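One way to make "feasible and contextually appropriate" concrete is to weight each candidate action's language-model score by a feasibility score and pick the best product, in the spirit of Ahn et al. The sketch below is a simplified illustration, not the paper's system: the candidate actions and all numbers are invented, and both score tables would come from learned models in practice.

```python
from typing import Dict, Tuple


def select_action(lm_scores: Dict[str, float], affordances: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the action with the highest (usefulness x feasibility) product.

    lm_scores: how relevant the language model finds each action for the instruction.
    affordances: how likely each action is to succeed in the current state.
    """
    combined = {
        action: lm_scores[action] * affordances.get(action, 0.0)
        for action in lm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical scores for the instruction "clean up the spilled drink".
lm_scores = {"pick up the sponge": 0.5, "go to the sink": 0.3, "pick up the apple": 0.2}
# Hypothetical feasibility: the sponge is not within reach yet.
affordances = {"pick up the sponge": 0.1, "go to the sink": 0.9, "pick up the apple": 0.8}

best, combined = select_action(lm_scores, affordances)
print(best)
```

Here the language model alone would prefer picking up the sponge, but the low feasibility score pushes the choice toward an action the agent can actually perform.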

Multi-step Multi-tool Scenarios
- Humans won't stick to one scenario and one tool.
- Understanding the interplay among different tools: models should not only understand individual tools but also learn how to combine them and order them logically.
- From sequential execution to parallel execution: tools do not have to be executed sequentially; running them in parallel can produce superimposed effects.
- From single-agent problem-solving to multi-agent collaboration: complex tasks often necessitate collaboration among multiple agents, each with its own expertise.
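The step from sequential to parallel tool execution can be sketched with Python's standard thread pool. The two tools below are hypothetical stand-ins that just return strings. The point is only that independent tool calls can be dispatched together and collected in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ToolCall = Tuple[Callable[[str], str], str]


def lookup_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"weather({city})"


def lookup_news(city: str) -> str:
    """Hypothetical tool; a real one would query a news service."""
    return f"news({city})"


def run_sequential(calls: List[ToolCall]) -> List[str]:
    """Execute the tool calls one after another."""
    return [tool(argument) for tool, argument in calls]


def run_parallel(calls: List[ToolCall]) -> List[str]:
    """Execute independent tool calls concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(tool, argument) for tool, argument in calls]
        return [future.result() for future in futures]


calls = [(lookup_weather, "Beijing"), (lookup_news, "Beijing")]
print(run_parallel(calls))
```

Parallel dispatch is only safe when the calls do not depend on each other's results. Dependent steps still need the logical ordering described above.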

Training Strategies
- Learning from demonstrations: often involves (human) annotations.
- Learning from feedback: often involves reinforcement learning.

WebGPT
- Supervised learning: clone human behavior in using search engines.
- Supervised fine-tuning + reinforcement learning.
- Needs only 6,000 annotated examples.

Nakano, Reiichiro, et al. "WebGPT: Browser-assisted question-answering with human feedback." arXiv preprint arXiv:2112.09332 (2021).

WebCPM
- Motivation: WebGPT is not public, and its inner workings remain opaque.
- Our efforts (WebCPM):
  - An open-source interactive web search interface.
  - The first public QA dataset that involves interactive web search, and also the first Chinese LFQA dataset.
  - Framework and model implementation.
- Interface (search mode) and pre-defined actions.
- The framework consists of two models:
  1. Search model, consisting of:
     - Action prediction module
     - Search query generation module
     - Supporting fact extraction module
  2. Information synthesis model
- For an action sequence of T steps, the search model executes actions to collect supporting facts, which are sent to the synthesis model for answer generation.
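The two-model pipeline described above can be sketched as a loop over actions followed by a synthesis step. Everything in this block is a toy stand-in chosen for illustration: the action names (`search`, `extract`, `finish`), the one-entry corpus, and the rule-based modules are assumptions, whereas the real modules are trained models.

```python
from typing import List, Tuple

# Toy corpus standing in for the web; its content is invented for illustration.
CORPUS = {"tool learning": "Tool learning lets foundation models use external tools."}


def predict_action(question: str, page: str, facts: List[str]) -> str:
    """Action prediction module (rule-based stand-in)."""
    if not page:
        return "search"
    if not facts:
        return "extract"
    return "finish"


def generate_query(question: str) -> str:
    """Search query generation module (stand-in)."""
    return question.lower().rstrip("?").replace("what is ", "")


def search(query: str) -> str:
    """Stand-in for the interactive search interface."""
    return CORPUS.get(query, "")


def extract_fact(page: str) -> str:
    """Supporting fact extraction module (stand-in): keep the first sentence."""
    return page.split(".")[0] + "."


def synthesize(question: str, facts: List[str]) -> str:
    """Information synthesis model (stand-in): join the collected facts."""
    return " ".join(facts) if facts else "No supporting facts found."


def answer(question: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    """Run up to max_steps search actions, then synthesize an answer."""
    page, facts = "", []
    for _ in range(max_steps):
        action = predict_action(question, page, facts)
        if action == "search":
            page = search(generate_query(question))
        elif action == "extract":
            facts.append(extract_fact(page))
        else:  # "finish"
            break
    return synthesize(question, facts), facts


print(answer("What is tool learning?")[0])
```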

WebCPM: Holistic Pipeline Evaluation
- Based on human preference: model-generated answers vs. human annotations.
- Supporting facts from three sources are sent to the synthesis model: (1) pipeline-collected, (2) human-collected, (3) non-interactive search (TF-IDF).

WebShop
- Learning to perform online shopping.

Toolformer
- Self-supervised tool learning.
- Pre-defined tool APIs.
- Encourage models to call and execute tool APIs.
- Design a self-supervised loss to see whether the tool execution can help language modeling.
- If the tool execution reduces the LM loss, save the instances as training data.
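The filtering rule in the last bullet can be written as a short comparison of losses. This sketch follows the slide's simplified statement, so the `margin` parameter, the `Candidate` fields, and the loss values are assumptions for illustration rather than the published criterion in full.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One text span annotated with a candidate tool call (hypothetical record)."""
    text: str
    loss_without_tool: float  # LM loss on the continuation with no tool result
    loss_with_tool: float     # LM loss when the tool result is inserted


def filter_candidates(candidates: List[Candidate], margin: float = 0.0) -> List[Candidate]:
    """Keep only candidates whose tool call lowers the LM loss by more than margin."""
    return [
        candidate
        for candidate in candidates
        if candidate.loss_without_tool - candidate.loss_with_tool > margin
    ]


# Invented loss values for three candidate tool calls.
candidates = [
    Candidate("population lookup", loss_without_tool=2.5, loss_with_tool=1.0),
    Candidate("unhelpful calendar call", loss_without_tool=1.25, loss_with_tool=1.5),
    Candidate("no-effect calculator call", loss_without_tool=2.0, loss_with_tool=2.0),
]

kept = filter_candidates(candidates)
print([candidate.text for candidate in kept])
```

Only the kept instances would be added to the training data, so the model is fine-tuned on tool calls that measurably helped prediction.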

Tool Creation
- From tool user to tool creator
  - Humans have been the primary agents creating and using tools from the Stone Age to the 21st century.
  - Most tools are created for humans, not for AI.
- Tools made for models
  - Modularized: compose tools into smaller units.
  - New input and output formats: more computable and suitable for AI.
- Limitations of existing work
  - Most existing work tends to concentrate on a limited number of tools.
  - The reasoning process that models use to determine the optimal utilization of tools is inherently complex.
  - Current pipelines lack an error-handling mechanism after retrieving execution results.
- Instead of letting LLMs act as the users of tools, we enable them to be the creators [1].
- Four procedures: creation, decision, execution, rectification.
- Experiments
  - Datasets: MATH, TabMWP.
  - Significant improvements over PoT and pure CoT.

[1] Qian, Cheng, et al. "CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation."
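The four procedures can be read as a loop in which a failed execution feeds an error message back for rectification. The sketch below is an assumption-laden illustration, not the CREATOR code: the `create`, `decide`, and `rectify` stubs replace what would be LLM calls, and running generated code with `exec` is only acceptable here because the strings are hard-coded.

```python
from typing import Callable, Tuple


def create() -> str:
    """Creation (stub): return tool source code. This draft contains a bug on purpose."""
    return "def average(values):\n    return sum(values) / len(value)\n"


def decide(tool_source: str) -> str:
    """Decision (stub): decide how to call the created tool for the current problem."""
    return "average([2, 4, 6])"


def rectify(tool_source: str, call_source: str, error: str) -> Tuple[str, str]:
    """Rectification (stub): repair the tool using the error message."""
    return tool_source.replace("len(value)", "len(values)"), call_source


def run_creator(
    create_fn: Callable[[], str],
    decide_fn: Callable[[str], str],
    rectify_fn: Callable[[str, str, str], Tuple[str, str]],
    max_rounds: int = 3,
):
    """Creation -> decision -> execution, with rectification on failure."""
    tool_source = create_fn()
    call_source = decide_fn(tool_source)
    for _ in range(max_rounds):
        namespace = {}
        try:
            # Execution: a real system should sandbox model-generated code.
            exec(tool_source, namespace)
            return eval(call_source, namespace)
        except Exception as error:
            tool_source, call_source = rectify_fn(tool_source, call_source, repr(error))
    raise RuntimeError("no working tool after rectification")


print(run_creator(create, decide, rectify))  # prints 4.0
```

The first execution fails with a `NameError`, the rectification step fixes the variable name, and the second execution returns the result.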

Application

ChatGPT Plugins
- OpenAI's official tool library.
- Empowers ChatGPT with broader applications.
- By simply providing APIs with descriptions, ChatGPT is enabled to call applications and complete more complex tasks.

Open-source Solutions: BMTools
- An open-source repository that extends language models to use tools and serves as a platform for the community to build and share tools.
- Features:
  - Users can easily build a new plugin by writing Python functions, and can use external ChatGPT plugins.
  - Users can host their local models (e.g., LLaMA, CPM) to use tools.
  - 30+ tools supported, including database, weather API, PPT, Google Scholar, Hugging Face models, and image generation. Contributions are welcome!
  - Supports BabyAGI and AutoGPT.
  - 100k+ tool-use SFT data on the way!

https:/
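The idea of exposing a Python function to a model as a described tool can be sketched with a small registry. This is a generic illustration and deliberately not the BMTools or ChatGPT plugin API: the `tool` decorator, the `TOOLS` registry, and the helper names are all hypothetical.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to their description and function.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(description: str) -> Callable:
    """Register a plain Python function as a tool, together with a description."""
    def decorator(func: Callable) -> Callable:
        TOOLS[func.__name__] = {
            "description": description,
            "signature": str(inspect.signature(func)),
            "func": func,
        }
        return func
    return decorator


@tool("Return the sum of two integers.")
def add(a: int, b: int) -> int:
    return a + b


def describe_tools() -> str:
    """Build the text a model would read in order to choose a tool."""
    return "\n".join(
        f"{name}{spec['signature']}: {spec['description']}" for name, spec in TOOLS.items()
    )


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a call that the model requested by name."""
    return TOOLS[name]["func"](**kwargs)


print(describe_tools())
print(call_tool("add", a=2, b=3))  # prints 5
```

The description and signature are what the model sees, while the function body stays on the host side, which matches the "APIs with descriptions" pattern noted above.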

Open-source Solutions: ToolBench
- An open-source, large-scale, high-quality instruction-tuning SFT dataset to facilitate general tool-use capability.
- We provide the dataset, the corresponding training and evaluation scripts, and a capable model, ToolLLaMA, fine-tuned on ToolBench.
- Features:
  - Both single-tool and multi-tool scenarios are supported.
  - ToolBench provides responses that include not only the final answer but also the model's chain-of-thought process, tool execution, and tool execution results.
  - Multi-step decision making and tool execution.
  - Another notable advantage is the diversity of our APIs, which are designed for real-world scenarios.
  - 98k instances, 312k API calls.
- Construction process: all the data is automatically generated by the OpenAI API and then filtered; the whole data creation process is easy to scale up.
- Evaluation: ToolLLaMA matches ChatGPT's capabilities in tool use, as auto-evaluated by ChatGPT (higher is better).

https:/

Summary
- Traditional language tasks (syntactic parsing, entity recognition, sentiment analysis) are (almost) well solved.
- We are facing more challenging tasks!
- Foundation models can be leveraged in complex scenarios by using language, and the performance may largely rely on the LLM's effectiveness.
- Theoretical issues still exist.
- Practical issues still exist.
- Explore leveraging tool learning in complex scenarios.

Tool Learning Paper List
https:/
