arXiv

LangChain implements the latest research in the field of Natural Language Processing. This page contains arXiv papers referenced in the LangChain Documentation, API Reference, Templates, and Cookbooks.

Conversely, researchers use LangChain in their work and reference it in their papers.

arXiv papers with references to: LangChain | LangGraph | LangSmith

Summary

arXiv id / Title	Authors	Published date 🔻	LangChain Documentation
`2403.14403v2` Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity	Soyeong Jeong, Jinheon Baek, Sukmin Cho, et al.	2024‑03‑21	`Docs:` docs/concepts
`2402.03620v1` Self-Discover: Large Language Models Self-Compose Reasoning Structures	Pei Zhou, Jay Pujara, Xiang Ren, et al.	2024‑02‑06	`Cookbook:` Self-Discover
`2402.03367v2` RAG-Fusion: a New Take on Retrieval-Augmented Generation	Zackary Rackauckas	2024‑01‑31	`Docs:` docs/concepts
`2401.18059v1` RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval	Parth Sarthi, Salman Abdullah, Aditi Tuli, et al.	2024‑01‑31	`Cookbook:` Raptor
`2401.15884v2` Corrective Retrieval Augmented Generation	Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, et al.	2024‑01‑29	`Docs:` docs/concepts, `Cookbook:` Langgraph Crag
`2401.08500v1` Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering	Tal Ridnik, Dedy Kredo, Itamar Friedman	2024‑01‑16	`Docs:` docs/concepts
`2401.04088v1` Mixtral of Experts	Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, et al.	2024‑01‑08	`Cookbook:` Together Ai
`2312.06648v2` Dense X Retrieval: What Retrieval Granularity Should We Use?	Tong Chen, Hongwei Wang, Sihao Chen, et al.	2023‑12‑11	`Template:` propositional-retrieval
`2311.09210v1` Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models	Wenhao Yu, Hongming Zhang, Xiaoman Pan, et al.	2023‑11‑15	`Template:` chain-of-note-wiki
`2310.11511v1` Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection	Akari Asai, Zeqiu Wu, Yizhong Wang, et al.	2023‑10‑17	`Docs:` docs/concepts, `Cookbook:` Langgraph Self Rag
`2310.06117v2` Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models	Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, et al.	2023‑10‑09	`Docs:` docs/concepts, `Template:` stepback-qa-prompting, `Cookbook:` Stepback-Qa
`2307.15337v3` Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation	Xuefei Ning, Zinan Lin, Zixuan Zhou, et al.	2023‑07‑28	`Template:` skeleton-of-thought
`2307.09288v2` Llama 2: Open Foundation and Fine-Tuned Chat Models	Hugo Touvron, Louis Martin, Kevin Stone, et al.	2023‑07‑18	`Cookbook:` Semi Structured Rag
`2307.03172v3` Lost in the Middle: How Language Models Use Long Contexts	Nelson F. Liu, Kevin Lin, John Hewitt, et al.	2023‑07‑06	`Docs:` docs/how_to/long_context_reorder
`2305.14283v3` Query Rewriting for Retrieval-Augmented Large Language Models	Xinbei Ma, Yeyun Gong, Pengcheng He, et al.	2023‑05‑23	`Template:` rewrite-retrieve-read, `Cookbook:` Rewrite
`2305.08291v1` Large Language Model Guided Tree-of-Thought	Jieyi Long	2023‑05‑15	`API:` langchain_experimental.tot, `Cookbook:` Tree Of Thought
`2305.04091v3` Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models	Lei Wang, Wanyu Xu, Yihuai Lan, et al.	2023‑05‑06	`Cookbook:` Plan And Execute Agent
`2305.02156v1` Zero-Shot Listwise Document Reranking with a Large Language Model	Xueguang Ma, Xinyu Zhang, Ronak Pradeep, et al.	2023‑05‑03	`Docs:` docs/how_to/contextual_compression, `API:` langchain...LLMListwiseRerank
`2304.08485v2` Visual Instruction Tuning	Haotian Liu, Chunyuan Li, Qingyang Wu, et al.	2023‑04‑17	`Cookbook:` Semi Structured Multi Modal Rag Llama2, Semi Structured And Multi Modal Rag
`2304.03442v2` Generative Agents: Interactive Simulacra of Human Behavior	Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, et al.	2023‑04‑07	`Cookbook:` Generative Agents Interactive Simulacra Of Human Behavior, Multiagent Bidding
`2303.17760v2` CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society	Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, et al.	2023‑03‑31	`Cookbook:` Camel Role Playing
`2303.17580v4` HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face	Yongliang Shen, Kaitao Song, Xu Tan, et al.	2023‑03‑30	`API:` langchain_experimental.autonomous_agents, `Cookbook:` Hugginggpt
`2301.10226v4` A Watermark for Large Language Models	John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al.	2023‑01‑24	`API:` langchain_community...OCIModelDeploymentTGI, langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint
`2212.10496v1` Precise Zero-Shot Dense Retrieval without Relevance Labels	Luyu Gao, Xueguang Ma, Jimmy Lin, et al.	2022‑12‑20	`Docs:` docs/concepts, `API:` langchain...HypotheticalDocumentEmbedder, `Template:` hyde, `Cookbook:` Hypothetical Document Embeddings
`2212.08073v1` Constitutional AI: Harmlessness from AI Feedback	Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al.	2022‑12‑15	`Docs:` docs/versions/migrating_chains/constitutional_chain
`2212.07425v3` Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments	Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al.	2022‑12‑12	`API:` langchain_experimental.fallacy_removal
`2211.13892v2` Complementary Explanations for Effective In-Context Learning	Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al.	2022‑11‑25	`API:` langchain_core...MaxMarginalRelevanceExampleSelector
`2211.10435v2` PAL: Program-aided Language Models	Luyu Gao, Aman Madaan, Shuyan Zhou, et al.	2022‑11‑18	`API:` langchain_experimental.pal_chain, langchain_experimental...PALChain, `Cookbook:` Program Aided Language Model
`2210.11934v2` An Analysis of Fusion Functions for Hybrid Retrieval	Sebastian Bruch, Siyu Gai, Amir Ingber	2022‑10‑21	`Docs:` docs/concepts
`2210.03629v3` ReAct: Synergizing Reasoning and Acting in Language Models	Shunyu Yao, Jeffrey Zhao, Dian Yu, et al.	2022‑10‑06	`Docs:` docs/integrations/tools/ionic_shopping, docs/integrations/providers/cohere, docs/concepts, `API:` langchain...create_react_agent, langchain...TrajectoryEvalChain
`2209.10785v2` Deep Lake: a Lakehouse for Deep Learning	Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al.	2022‑09‑22	`Docs:` docs/integrations/providers/activeloop_deeplake
`2205.13147v4` Matryoshka Representation Learning	Aditya Kusupati, Gantavya Bhatt, Aniket Rege, et al.	2022‑05‑26	`Docs:` docs/integrations/providers/snowflake
`2205.12654v1` Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages	Kevin Heffernan, Onur Çelebi, Holger Schwenk	2022‑05‑25	`API:` langchain_community...LaserEmbeddings
`2204.00498v1` Evaluating the Text-to-SQL Capabilities of Large Language Models	Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau	2022‑03‑15	`Docs:` docs/tutorials/sql_qa, `API:` langchain_community...SQLDatabase, langchain_community...SparkSQL
`2202.00666v5` Locally Typical Sampling	Clara Meister, Tiago Pimentel, Gian Wiher, et al.	2022‑02‑01	`API:` langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint
`2112.01488v3` ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction	Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, et al.	2021‑12‑02	`Docs:` docs/integrations/retrievers/ragatouille, docs/integrations/providers/ragatouille, docs/concepts
`2103.00020v1` Learning Transferable Visual Models From Natural Language Supervision	Alec Radford, Jong Wook Kim, Chris Hallacy, et al.	2021‑02‑26	`API:` langchain_experimental.open_clip
`2005.14165v4` Language Models are Few-Shot Learners	Tom B. Brown, Benjamin Mann, Nick Ryder, et al.	2020‑05‑28	`Docs:` docs/concepts
`2005.11401v4` Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al.	2020‑05‑22	`Docs:` docs/concepts
`1909.05858v2` CTRL: A Conditional Transformer Language Model for Controllable Generation	Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al.	2019‑09‑11	`API:` langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint

Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, et al.
arXiv id: 2403.14403v2 Published Date: 2024-03-21
LangChain:
- Documentation: docs/concepts

Abstract: Retrieval-Augmented Large Language Models (LLMs), which incorporate the non-parametric knowledge from external knowledge bases into LLMs, have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). However, even though there are various approaches dealing with queries of different complexities, they either handle simple queries with unnecessary computational overhead or fail to adequately address complex multi-step queries; yet, not all user requests fall into only one of the simple or complex categories. In this work, we propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs from the simplest to the most sophisticated ones based on the query complexity. Also, this selection process is operationalized with a classifier, which is a smaller LM trained to predict the complexity level of incoming queries with automatically collected labels, obtained from actual predicted outcomes of models and inherent inductive biases in datasets. This approach offers a balanced strategy, seamlessly adapting between the iterative and single-step retrieval-augmented LLMs, as well as the no-retrieval methods, in response to a range of query complexities. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems, compared to relevant baselines including the adaptive retrieval approaches. Code is available at: https://github.com/starsuzi/Adaptive-RAG.
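
The routing idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `classify_complexity` is a hypothetical keyword heuristic standing in for the trained classifier, and the three strategy functions are placeholder stubs that only tag the query.

```python
from typing import Callable, Dict


def classify_complexity(query: str) -> str:
    """Hypothetical stand-in for the paper's trained complexity classifier."""
    words = query.lower().replace("?", "").split()
    if len(words) <= 4:
        return "simple"      # answer directly, no retrieval
    if "and" in words or "compare" in words:
        return "complex"     # iterative, multi-step retrieval
    return "moderate"        # single-step retrieval


# Placeholder strategies: a real system would call an LLM and a retriever here.
def no_retrieval(query: str) -> str:
    return f"[no-retrieval] {query}"


def single_step(query: str) -> str:
    return f"[single-step] {query}"


def multi_step(query: str) -> str:
    return f"[multi-step] {query}"


STRATEGIES: Dict[str, Callable[[str], str]] = {
    "simple": no_retrieval,
    "moderate": single_step,
    "complex": multi_step,
}


def adaptive_answer(query: str, classifier: Callable[[str], str] = classify_complexity) -> str:
    """Route the query to the strategy chosen by the classifier."""
    return STRATEGIES[classifier(query)](query)


if __name__ == "__main__":
    print(adaptive_answer("Capital of France?"))
    print(adaptive_answer("Compare the founders of Apple and Microsoft"))
```

Passing the classifier as an argument keeps the router independent of how complexity is predicted, so the heuristic can be swapped for a trained model without touching the dispatch logic.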

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Authors: Pei Zhou, Jay Pujara, Xiang Ren, et al.
arXiv id: 2402.03620v1 Published Date: 2024-02-06
LangChain:
- Cookbook: self-discover

Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

RAG-Fusion: a New Take on Retrieval-Augmented Generation

Authors: Zackary Rackauckas
arXiv id: 2402.03367v2 Published Date: 2024-01-31
LangChain:
- Documentation: docs/concepts

Abstract: Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query is insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context.
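
The fusion step mentioned in the abstract, reciprocal rank fusion (RRF), can be sketched in a few lines. Each document receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. The function name, the document ids, and the smoothing constant `k = 60` (a commonly used default) are assumptions for illustration; the query generation and retrieval steps are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids into one list sorted by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # One ranked list per generated query (hypothetical ids).
    per_query_results = [
        ["a", "b", "c"],
        ["b", "a", "d"],
        ["b", "c", "a"],
    ]
    for doc_id, score in reciprocal_rank_fusion(per_query_results):
        print(doc_id, round(score, 4))
```

Because only ranks are used, the lists can come from retrievers whose raw scores are not comparable.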

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Authors: Parth Sarthi, Salman Abdullah, Aditi Tuli, et al.
arXiv id: 2401.18059v1 Published Date: 2024-01-31
LangChain:
- Cookbook: RAPTOR

Abstract: Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
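
The bottom-up tree construction can be sketched as below. Two simplifications are assumed and should not be read as the paper's method: adjacent chunks are grouped in fixed-size batches instead of being embedded and clustered, and `summarize` is a caller-supplied stub where a real system would call an LLM.

```python
from typing import Callable, List


def build_summary_tree(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    group_size: int = 2,
) -> List[List[str]]:
    """Return tree levels: level 0 holds the chunks, the last level holds the root."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Fixed-size grouping stands in for the embedding-based clustering step.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def all_nodes(levels: List[List[str]]) -> List[str]:
    """Flatten every level so retrieval can draw on any level of abstraction."""
    return [node for level in levels for node in level]


if __name__ == "__main__":
    tree = build_summary_tree(["a", "b", "c", "d", "e"], summarize="+".join)
    for depth, level in enumerate(tree):
        print(depth, level)
```

Retrieval would then score the flattened nodes against the query, so an answer can draw on a leaf chunk and a high-level summary at the same time.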

Corrective Retrieval Augmented Generation

Authors: Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, et al.
arXiv id: 2401.15884v2 Published Date: 2024-01-29
LangChain:
- Documentation: docs/concepts
- Cookbook: langgraph_crag

Abstract: Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
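
A minimal sketch of confidence-triggered retrieval actions follows. The abstract does not spell out the actions or thresholds, so everything concrete here is an assumption: the three action names, the cutoffs `0.7` and `0.3`, and the `evaluate` and `web_search` callables, which are stubs supplied by the caller. The decompose-then-recompose filtering step is omitted.

```python
from typing import Callable, List


def choose_action(confidence: float, upper: float = 0.7, lower: float = 0.3) -> str:
    """Map the evaluator's confidence to an action (hypothetical thresholds)."""
    if confidence >= upper:
        return "use_retrieved"
    if confidence <= lower:
        return "use_web_search"
    return "use_both"


def corrective_retrieve(
    query: str,
    retrieved: List[str],
    evaluate: Callable[[str, List[str]], float],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Return the documents to pass on to generation."""
    action = choose_action(evaluate(query, retrieved))
    if action == "use_retrieved":
        return retrieved
    if action == "use_web_search":
        return web_search(query)
    # Ambiguous confidence: keep the retrieved documents and add web results.
    return retrieved + web_search(query)


if __name__ == "__main__":
    docs = corrective_retrieve(
        "who founded Infineon?",
        retrieved=["doc about something else"],
        evaluate=lambda q, d: 0.1,            # stub evaluator: low confidence
        web_search=lambda q: [f"web result for: {q}"],
    )
    print(docs)
```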

Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering

Authors: Tal Ridnik, Dedy Kredo, Itamar Friedman
arXiv id: 2401.08500v1 Published Date: 2024-01-16
LangChain:
- Documentation: docs/concepts

Abstract: Code generation problems differ from common natural language problems - they require matching the exact syntax of the target language, identifying happy paths and edge cases, paying attention to numerous small details in the problem spec, and addressing other code-specific issues and requirements. Hence, many of the optimizations and tricks that have been successful in natural language generation may not be effective for code tasks. In this work, we propose a new approach to code generation by LLMs, which we call AlphaCodium - a test-based, multi-stage, code-oriented iterative flow, that improves the performances of LLMs on code problems. We tested AlphaCodium on a challenging code generation dataset called CodeContests, which includes competitive programming problems from platforms such as Codeforces. The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks. Full implementation is available at: https://github.com/Codium-ai/AlphaCodium

Mixtral of Experts

Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, et al.
arXiv id: 2401.04088v1 Published Date: 2024-01-08
LangChain:
- Cookbook: together_ai

Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B - chat model on human benchmarks. Both the base and instruct models are released under the Apache 2.0 license.
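
The per-token routing described above (a router picks two of eight experts and combines their outputs) can be illustrated with a toy, pure-Python layer. The scalar input, the toy experts, and the choice to weight the two outputs by a softmax over the two selected router logits are assumptions made for illustration, not details taken from the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top2_moe(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
) -> Tuple[float, List[int]]:
    """Run only the two highest-scoring experts and mix their outputs."""
    # Indices of the two largest router logits (ties keep the lower index first).
    top2 = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:2]
    # Numerically stable softmax restricted to the two selected logits.
    peak = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    weights = [e / sum(exps) for e in exps]
    output = sum(w * experts[i](x) for i, w in zip(top2, weights))
    return output, top2


if __name__ == "__main__":
    # Eight toy experts: expert i multiplies its input by (i + 1).
    experts = [lambda v, scale=i + 1: v * scale for i in range(8)]
    logits = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0]
    print(top2_moe(2.0, logits, experts))
```

The other six experts are never evaluated for this token, which is why the number of active parameters per token is much smaller than the total parameter count.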

Dense X Retrieval: What Retrieval Granularity Should We Use?

Authors: Tong Chen, Hongwei Wang, Sihao Chen, et al.
arXiv id: 2312.06648v2 Published Date: 2023-12-11
LangChain:
- Template: propositional-retrieval

Abstract: Dense retrieval has become a prominent method to obtain relevant context or world knowledge in open-domain NLP tasks. When we use a learned dense retriever on a retrieval corpus at inference time, an often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks. Distinct from the typical approach of using passages or sentences, we introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format. We conduct an empirical comparison of different retrieval granularity. Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval. Moreover, retrieval by proposition also enhances the performance of downstream QA tasks, since the retrieved texts are more condensed with question-relevant information, reducing the need for lengthy input tokens and minimizing the inclusion of extraneous, irrelevant information.

Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

Authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, et al.
arXiv id: 2311.09210v1 Published Date: 2023-11-15
LangChain:
- Template: chain-of-note-wiki

Abstract: Retrieval-augmented language models (RALMs) represent a substantial advancement in the capabilities of large language models, notably in reducing factual hallucination by leveraging external knowledge sources. However, the reliability of the retrieved information is not always guaranteed. The retrieval of irrelevant data can lead to misguided responses, potentially causing the model to overlook its inherent knowledge, even when it possesses adequate information to address the query. Moreover, standard RALMs often struggle to assess whether they possess adequate knowledge, both intrinsic and retrieved, to provide an accurate answer. In situations where knowledge is lacking, these systems should ideally respond with "unknown" when the answer is unattainable. In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios. The core idea of CoN is to generate sequential reading notes for retrieved documents, enabling a thorough evaluation of their relevance to the given question and integrating this information to formulate the final answer. We employed ChatGPT to create training data for CoN, which was subsequently trained on an LLaMa-2 7B model. Our experiments across four open-domain QA benchmarks show that RALMs equipped with CoN significantly outperform standard RALMs. Notably, CoN achieves an average improvement of +7.9 in EM score given entirely noisy retrieved documents and +10.5 in rejection rates for real-time questions that fall outside the pre-training knowledge scope.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Authors: Akari Asai, Zeqiu Wu, Yizhong Wang, et al.
arXiv id: 2310.11511v1 Published Date: 2023-10-17
LangChain:
- Documentation: docs/concepts
- Cookbook: langgraph_self_rag

Abstract: Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

Authors: Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, et al.
arXiv id: 2310.06117v2 Published Date: 2023-10-09
LangChain:
- Documentation: docs/concepts
- Template: stepback-qa-prompting
- Cookbook: stepback-qa

Abstract: We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution. We conduct experiments of Step-Back Prompting with PaLM-2L, GPT-4 and Llama2-70B models, and observe substantial performance gains on various challenging reasoning-intensive tasks including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back Prompting improves PaLM-2L performance on MMLU (Physics and Chemistry) by 7% and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.

Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation

Authors: Xuefei Ning, Zinan Lin, Zixuan Zhou, et al.
arXiv id: 2307.15337v3 Published Date: 2023-07-28
LangChain:
- Template: skeleton-of-thought

Abstract: This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose Skeleton-of-Thought (SoT), which first guides LLMs to generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-ups across 12 LLMs, but it can also potentially improve the answer quality on several question categories. SoT is an initial attempt at data-centric optimization for inference efficiency, and showcases the potential of eliciting high-quality answers by explicitly planning the answer structure in language.
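
The two-stage flow (skeleton first, then parallel expansion of each point) can be sketched with a thread pool. `make_skeleton` and `expand_point` are hypothetical callables standing in for LLM API calls; the numbering format of the final answer is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def skeleton_of_thought(
    question: str,
    make_skeleton: Callable[[str], List[str]],
    expand_point: Callable[[str, str], str],
    max_workers: int = 4,
) -> str:
    """Generate a skeleton, expand every point concurrently, then stitch the answer."""
    points = make_skeleton(question)  # stage 1: one short sequential call
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # stage 2: one call per point, issued in parallel; map() preserves order
        expansions = list(pool.map(lambda point: expand_point(question, point), points))
    return "\n".join(
        f"{i}. {point}: {text}"
        for i, (point, text) in enumerate(zip(points, expansions), start=1)
    )


if __name__ == "__main__":
    answer = skeleton_of_thought(
        "How do I learn to cook?",
        make_skeleton=lambda q: ["basics", "practice", "feedback"],   # stub LLM
        expand_point=lambda q, p: f"details about {p}",               # stub LLM
    )
    print(answer)
```

Threads suit this case because each expansion mostly waits on a network call; with a local model, batched decoding would play the same role.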

Llama 2: Open Foundation and Fine-Tuned Chat Models

Authors: Hugo Touvron, Louis Martin, Kevin Stone, et al.
arXiv id: 2307.09288v2 Published Date: 2023-07-18
LangChain:
- Cookbook: Semi_Structured_RAG

Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

Lost in the Middle: How Language Models Use Long Contexts

Authors: Nelson F. Liu, Kevin Lin, John Hewitt, et al.
arXiv id: 2307.03172v3 Published Date: 2023-07-06
LangChain:
- Documentation: docs/how_to/long_context_reorder

Abstract: While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
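
One practical response to this finding is to reorder retrieved documents so that the most relevant ones sit at the beginning and end of the prompt and the least relevant ones fall in the middle. The function below is an independent sketch of that idea, not LangChain's implementation; its name and the exact interleaving scheme are assumptions.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs_by_relevance: List[T]) -> List[T]:
    """Take docs sorted most-relevant-first and push the weakest ones to the middle."""
    front: List[T] = []
    back: List[T] = []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate between the two ends: ranks 1, 3, 5... go to the front,
        # ranks 2, 4, 6... go to the back.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]


if __name__ == "__main__":
    # Items are labelled by relevance rank: 1 is the most relevant.
    print(reorder_for_long_context([1, 2, 3, 4, 5]))
```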

Query Rewriting for Retrieval-Augmented Large Language Models

Authors: Xinbei Ma, Yeyun Gong, Pengcheng He, et al.
arXiv id: 2305.14283v3 Published Date: 2023-05-23
LangChain:
- Template: rewrite-retrieve-read
- Cookbook: rewrite

Abstract: Large Language Models (LLMs) play powerful, black-box readers in the retrieve-then-read pipeline, making remarkable progress in knowledge-intensive tasks. This work introduces a new framework, Rewrite-Retrieve-Read instead of the previous retrieve-then-read for the retrieval-augmented LLMs from the perspective of the query rewriting. Unlike prior studies focusing on adapting either the retriever or the reader, our approach pays attention to the adaptation of the search query itself, for there is inevitably a gap between the input text and the needed knowledge in retrieval. We first prompt an LLM to generate the query, then use a web search engine to retrieve contexts. Furthermore, to better align the query to the frozen modules, we propose a trainable scheme for our pipeline. A small language model is adopted as a trainable rewriter to cater to the black-box LLM reader. The rewriter is trained using the feedback of the LLM reader by reinforcement learning. Evaluation is conducted on downstream tasks, open-domain QA and multiple-choice QA. Experimental results show consistent performance improvement, indicating that our framework is proven effective and scalable, and brings a new framework for retrieval-augmented LLM.
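
The three stages can be wired together as a small pipeline. This is a sketch only: `rewrite`, `retrieve`, and `read` are hypothetical callables standing in for the rewriter model, the web search engine, and the LLM reader, and the reinforcement-learning training of the rewriter is not shown.

```python
from typing import Callable, List


def rewrite_retrieve_read(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
) -> str:
    """Rewrite the question into a search query, retrieve with it, then read."""
    query = rewrite(question)        # adapt the input text to a search-friendly query
    contexts = retrieve(query)       # search with the rewritten query
    return read(question, contexts)  # the reader still answers the original question


if __name__ == "__main__":
    answer = rewrite_retrieve_read(
        "hey so what's the tallest mountain, like on earth??",
        rewrite=lambda q: "tallest mountain on Earth",                 # stub rewriter
        retrieve=lambda query: [f"search hit for '{query}'"],          # stub search
        read=lambda q, ctx: f"answer based on {len(ctx)} context(s)",  # stub reader
    )
    print(answer)
```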

Large Language Model Guided Tree-of-Thought

Authors: Jieyi Long
arXiv id: 2305.08291v1 Published Date: 2023-05-15
LangChain:
- API Reference: langchain_experimental.tot
- Cookbook: tree_of_thought

Abstract: In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel approach aimed at improving the problem-solving capabilities of auto-regressive large language models (LLMs). The ToT technique is inspired by the human mind's approach for solving complex reasoning tasks through trial and error. In this process, the human mind explores the solution space through a tree-like thought process, allowing for backtracking when necessary. To implement ToT as a software system, we augment an LLM with additional modules including a prompter agent, a checker module, a memory module, and a ToT controller. In order to solve a given problem, these modules engage in a multi-round conversation with the LLM. The memory module records the conversation and state history of the problem solving process, which allows the system to backtrack to the previous steps of the thought-process and explore other directions from there. To verify the effectiveness of the proposed technique, we implemented a ToT-based solver for the Sudoku Puzzle. Experimental results show that the ToT framework can significantly increase the success rate of Sudoku puzzle solving. Our implementation of the ToT-based Sudoku solver is available on GitHub.
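
The control loop (propose a step, check it, remember earlier states, backtrack on failure) can be sketched as a depth-first search. Several things here are assumptions for illustration: the `propose` function is a deterministic stand-in for the LLM, an explicit stack plays the role of the memory module, and a 4-queens puzzle replaces the paper's Sudoku task to keep the example short.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # state[i] is the column of the queen placed in row i


def tree_of_thought_search(
    initial: State,
    propose: Callable[[State], List[State]],
    is_valid: Callable[[State], bool],
    is_solved: Callable[[State], bool],
    max_steps: int = 1000,
) -> Optional[State]:
    """Depth-first exploration with backtracking over proposed partial solutions."""
    stack = [initial]  # remembered states the search can return to
    for _ in range(max_steps):
        if not stack:
            return None
        state = stack.pop()
        if not is_valid(state):  # the checker rejects this branch: backtrack
            continue
        if is_solved(state):
            return state
        stack.extend(reversed(propose(state)))  # try the first proposal next
    return None


N = 4


def propose(state: State) -> List[State]:
    return [state + (col,) for col in range(N)] if len(state) < N else []


def is_valid(state: State) -> bool:
    return all(
        c1 != c2 and abs(c1 - c2) != r2 - r1
        for r1, c1 in enumerate(state)
        for r2, c2 in enumerate(state)
        if r1 < r2
    )


def is_solved(state: State) -> bool:
    return len(state) == N


if __name__ == "__main__":
    print(tree_of_thought_search((), propose, is_valid, is_solved))
```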

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Authors: Lei Wang, Wanyu Xu, Yihuai Lan, et al.
arXiv id: 2305.04091v3 Published Date: 2023-05-06
LangChain:
- Cookbook: plan_and_execute_agent

Abstract: Large language models (LLMs) have recently been shown to deliver impressive performance in various NLP tasks. To tackle multi-step reasoning tasks, few-shot chain-of-thought (CoT) prompting includes a few manually crafted step-by-step reasoning demonstrations which enable LLMs to explicitly generate reasoning steps and improve their reasoning task accuracy. To eliminate the manual effort, Zero-shot-CoT concatenates the target problem statement with "Let's think step by step" as an input prompt to LLMs. Despite the success of Zero-shot-CoT, it still suffers from three pitfalls: calculation errors, missing-step errors, and semantic misunderstanding errors. To address the missing-step errors, we propose Plan-and-Solve (PS) Prompting. It consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. To address the calculation errors and improve the quality of generated reasoning steps, we extend PS prompting with more detailed instructions and derive PS+ prompting. We evaluate our proposed prompting strategy on ten datasets across three reasoning problems. The experimental results over GPT-3 show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin, is comparable to or exceeds Zero-shot-Program-of-Thought Prompting, and has comparable performance with 8-shot CoT prompting on the math reasoning problem. The code can be found at https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
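
The difference between the two zero-shot prompts comes down to the trigger sentence appended to the problem. In the sketch below, the Zero-shot-CoT trigger is the phrase quoted in the abstract, while `PLAN_AND_SOLVE_TRIGGER` is a paraphrase of the plan-then-execute idea written for this example and should not be taken as the paper's exact wording; the `Q:`/`A:` layout is also an assumption.

```python
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

# Paraphrase of the plan-then-solve instruction (an assumption, not the paper's text).
PLAN_AND_SOLVE_TRIGGER = (
    "Let's first devise a plan that divides the problem into smaller subtasks, "
    "then carry out the subtasks one by one according to the plan."
)


def build_prompt(problem: str, trigger: str) -> str:
    """Concatenate the problem statement with a zero-shot trigger sentence."""
    return f"Q: {problem.strip()}\nA: {trigger}"


if __name__ == "__main__":
    problem = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
    print(build_prompt(problem, ZERO_SHOT_COT_TRIGGER))
    print()
    print(build_prompt(problem, PLAN_AND_SOLVE_TRIGGER))
```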

Zero-Shot Listwise Document Reranking with a Large Language Model

Authors: Xueguang Ma, Xinyu Zhang, Ronak Pradeep, et al.
arXiv id: 2305.02156v1 Published Date: 2023-05-03
LangChain:
- Documentation: docs/how_to/contextual_compression
- API Reference: langchain...LLMListwiseRerank

Abstract: Supervised ranking methods based on bi-encoder or cross-encoder architectures have shown success in multi-stage text ranking tasks, but they require large amounts of relevance judgments as training data. In this work, we propose Listwise Reranker with a Large Language Model (LRL), which achieves strong reranking effectiveness without using any task-specific training data. Different from the existing pointwise ranking methods, where documents are scored independently and ranked according to the scores, LRL directly generates a reordered list of document identifiers given the candidate documents. Experiments on three TREC web search datasets demonstrate that LRL not only outperforms zero-shot pointwise methods when reranking first-stage retrieval results, but can also act as a final-stage reranker to improve the top-ranked results of a pointwise method for improved efficiency. Additionally, we apply our approach to subsets of MIRACL, a recent multilingual retrieval dataset, with results showing its potential to generalize across different languages.

Visual Instruction Tuning

Authors: Haotian Liu, Chunyuan Li, Qingyang Wu, et al.
arXiv id: 2304.08485v2 Published Date: 2023-04-17
LangChain:
- Cookbook: Semi_structured_multi_modal_RAG_LLaMA2, Semi_structured_and_multi_modal_RAG

Abstract: Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. Our early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

Generative Agents: Interactive Simulacra of Human Behavior

Authors: Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, et al.
arXiv id: 2304.03442v2 Published Date: 2023-04-07
LangChain:
- Cookbook: generative_agents_interactive_simulacra_of_human_behavior, multiagent_bidding

Abstract: Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

Authors: Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, et al.
arXiv id: 2303.17760v2 Published Date: 2023-03-31
LangChain:
- Cookbook: camel_role_playing

Abstract: The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Authors: Yongliang Shen, Kaitao Song, Xu Tan, et al.
arXiv id: 2303.17580v4 Published Date: 2023-03-30
LangChain:
- API Reference: langchain_experimental.autonomous_agents
- Cookbook: hugginggpt

Abstract: Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are numerous AI models available for various domains and modalities, they cannot handle complicated AI tasks autonomously. Considering large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, an LLM-powered agent that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT can tackle a wide range of sophisticated AI tasks spanning different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards the realization of artificial general intelligence.

A Watermark for Large Language Models

Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al.
arXiv id: 2301.10226v4 Published Date: 2023-01-24
LangChain:
- API Reference: langchain_community...OCIModelDeploymentTGI, langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint

Abstract: Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
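
The green-token mechanism and the statistical test mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the SHA-256 seeding, the string tokens, and the names `green_list` and `watermark_z_score` are assumptions made for illustration. The detector counts how many tokens fall in the green list seeded by their predecessor and compares that count with what unwatermarked text would produce by chance.

```python
import hashlib
import math
import random


def green_list(prev_token, vocab, gamma=0.5):
    """Pseudo-randomly pick a fraction `gamma` of the vocabulary as "green".

    The partition is seeded by the previous token, so a detector that knows
    the hashing scheme can recompute it without access to the model.
    """
    digest = hashlib.sha256(prev_token.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    shuffled = sorted(vocab)  # fixed starting order keeps the shuffle reproducible
    random.Random(seed).shuffle(shuffled)
    return set(shuffled[: int(gamma * len(shuffled))])


def watermark_z_score(tokens, vocab, gamma=0.5):
    """One-proportion z-statistic for the number of green tokens.

    Under the null hypothesis (no watermark) each scored token is green with
    probability `gamma`; a large positive z suggests watermarked text.
    """
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored <= 0:
        raise ValueError("need at least two tokens")
    hits = sum(
        tok in green_list(prev, vocab, gamma)
        for prev, tok in zip(tokens, tokens[1:])
    )
    return (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

A generator would apply the same `green_list` at sampling time and add a small bias to the logits of green tokens, which is the "soft promotion" the abstract refers to; that step is omitted here.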

Precise Zero-Shot Dense Retrieval without Relevance Labels

Authors: Luyu Gao, Xueguang Ma, Jimmy Lin, et al.
arXiv id: 2212.10496v1 Published Date: 2022-12-20
LangChain:
- Documentation: docs/concepts
- API Reference: langchain...HypotheticalDocumentEmbedder
- Template: hyde
- Cookbook: hypothetical_document_embeddings

Abstract: While dense retrieval has been shown effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance label is available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance. Instead, we propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is unreal and may contain false details. Then, an unsupervised contrastively learned encoder (e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity. This second step grounds the generated document to the actual corpus, with the encoder's dense bottleneck filtering out the incorrect details. Our experiments show that HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever and shows strong performance comparable to fine-tuned retrievers, across various tasks (e.g. web search, QA, fact verification) and languages (e.g. sw, ko, ja).
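
The two HyDE steps described in this abstract (generate a hypothetical document, then retrieve real documents by embedding similarity) can be sketched as follows. This is a toy illustration, not the paper's code and not LangChain's `HypotheticalDocumentEmbedder`: `embed` is a bag-of-words stand-in for a dense encoder such as Contriever, and `generate` is a hypothetical callable standing in for the instruction-following model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words "embedding"; a real system would use a dense encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(query, corpus, generate, k=1):
    """Retrieve with the embedding of a generated document, not of the query."""
    hypothetical = generate(query)      # step 1: may contain false details
    query_vec = embed(hypothetical)     # step 2: encode the hypothetical document
    ranked = sorted(corpus, key=lambda doc: -cosine(query_vec, embed(doc)))
    return ranked[:k]                   # only real corpus documents are returned
```

Because only documents from `corpus` are returned, any invented details in the hypothetical document influence the ranking but never reach the caller directly.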

Constitutional AI: Harmlessness from AI Feedback

Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al.
arXiv id: 2212.08073v1 Published Date: 2022-12-15
LangChain:
- Documentation: docs/versions/migrating_chains/constitutional_chain

Abstract: As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.

Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments

Authors: Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al.
arXiv id: 2212.07425v3 Published Date: 2022-12-12
LangChain:
- API Reference: langchain_experimental.fallacy_removal

Abstract: The spread of misinformation, propaganda, and flawed argumentation has been amplified in the Internet era. Given the volume of data and the subtlety of identifying violations of argumentation norms, supporting information analytics tasks, like content moderation, with trustworthy methods that can identify logical fallacies is essential. In this paper, we formalize prior theoretical work on logical fallacies into a comprehensive three-stage evaluation framework of detection, coarse-grained, and fine-grained classification. We adapt existing evaluation datasets for each stage of the evaluation. We employ three families of robust and explainable methods based on prototype reasoning, instance-based reasoning, and knowledge injection. The methods combine language models with background knowledge and explainable mechanisms. Moreover, we address data sparsity with strategies for data augmentation and curriculum learning. Our three-stage framework natively consolidates prior datasets and methods from existing tasks, like propaganda detection, serving as an overarching evaluation testbed. We extensively evaluate these methods on our datasets, focusing on their robustness and explainability. Our results provide insight into the strengths and weaknesses of the methods on different components and fallacy classes, indicating that fallacy identification is a challenging task that may require specialized forms of reasoning to capture various classes. We share our open-source code and data on GitHub to support further work on logical fallacy identification.

Complementary Explanations for Effective In-Context Learning

Authors: Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al.
arXiv id: 2211.13892v2 Published Date: 2022-11-25
LangChain:
- API Reference: langchain_core...MaxMarginalRelevanceExampleSelector

Abstract: Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts, but there has been limited understanding of exactly how these explanations function or why they are effective. This work aims to better understand the mechanisms by which explanations are used for in-context learning. We first study the impact of two different factors on the performance of prompts with explanations: the computation trace (the way the solution is decomposed) and the natural language used to express the prompt. By perturbing explanations on three controlled tasks, we show that both factors contribute to the effectiveness of explanations. We further study how to form maximally effective sets of explanations for solving a given test query. We find that LLMs can benefit from the complementarity of the explanation set: diverse reasoning skills shown by different exemplars can lead to better performance. Therefore, we propose a maximal marginal relevance-based exemplar selection approach for constructing exemplar sets that are both relevant as well as complementary, which successfully improves the in-context learning performance across three real-world tasks on multiple LLMs.
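
Maximal marginal relevance (MMR) selection, which this abstract proposes for choosing exemplars, can be sketched in a few lines. The sketch below is a generic greedy MMR over embedding vectors, not the authors' code and not the `MaxMarginalRelevanceExampleSelector` API; the trade-off weight `lam` and the use of cosine similarity are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(query, candidates, k, lam=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy.

    Each step picks the candidate maximizing
    lam * sim(query, c) - (1 - lam) * max(sim(c, s) for s already selected).
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the procedure reduces to plain top-k by relevance; lower values favor exemplars that differ from those already chosen, which is the complementarity the abstract argues for.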

PAL: Program-aided Language Models

Authors: Luyu Gao, Aman Madaan, Shuyan Zhou, et al.
arXiv id: 2211.10435v2 Published Date: 2022-11-18
LangChain:
- API Reference: langchain_experimental.pal_chain, langchain_experimental...PALChain
- Cookbook: program_aided_language_model

Abstract: Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B which uses chain-of-thought by absolute 15% top-1. Our code and data are publicly available at http://reasonwithpal.com/.
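
The division of labor described in this abstract (the model writes a program, the interpreter computes the answer) can be illustrated with a minimal sketch. Nothing here is the `PALChain` API: `run_generated_program`, the `answer` variable convention, and the hard-coded `GENERATED` string (which stands in for model output) are all hypothetical.

```python
def run_generated_program(source, result_name="answer"):
    """Execute model-written code and read back one named variable.

    Emptying `__builtins__` only limits accidental misuse. It is NOT a
    security sandbox; untrusted code should run in an isolated process.
    """
    namespace = {}
    exec(source, {"__builtins__": {}}, namespace)
    return namespace[result_name]


# Hypothetical program an LLM might emit for a word problem about counting toys.
GENERATED = """
toys_initial = 5
toys_from_mom = 2
toys_from_dad = 2
answer = toys_initial + toys_from_mom + toys_from_dad
"""

result = run_generated_program(GENERATED)
```

The arithmetic is carried out by the interpreter, so a correct decomposition cannot be spoiled by a calculation slip in generated text.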

An Analysis of Fusion Functions for Hybrid Retrieval

Authors: Sebastian Bruch, Siyu Gai, Amir Ingber
arXiv id: 2210.11934v2 Published Date: 2022-10-21
LangChain:
- Documentation: docs/concepts

Abstract: We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
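
The two fusion functions compared in this abstract can be written down compactly. The sketch below is illustrative only: the RRF constant `k=60` is a commonly used default, and min-max normalization is one of several possible choices for the convex combination; neither is taken from the abstract itself.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}


def convex_combination(lexical, semantic, alpha=0.5):
    """Fuse raw scores: alpha * semantic + (1 - alpha) * lexical."""
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    fused = {
        d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0)
        for d in docs
    }
    return sorted(fused, key=lambda d: (-fused[d], d))
```

RRF uses only rank positions and discards score magnitudes, whereas the convex combination keeps them and exposes a single tunable weight, `alpha`.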

ReAct: Synergizing Reasoning and Acting in Language Models

Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, et al.
arXiv id: 2210.03629v3 Published Date: 2022-10-06
LangChain:
- Documentation: docs/integrations/tools/ionic_shopping, docs/integrations/providers/cohere, docs/concepts
- API Reference: langchain...create_react_agent, langchain...TrajectoryEvalChain

Abstract: While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. Project site with code: https://react-lm.github.io

Deep Lake: a Lakehouse for Deep Learning

Authors: Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al.
arXiv id: 2209.10785v2 Published Date: 2022-09-22
LangChain:
- Documentation: docs/integrations/providers/activeloop_deeplake

Abstract: Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well-designed for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, JAX, and integrate with numerous MLOps tools.

Matryoshka Representation Learning

Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, et al.
arXiv id: 2205.13147v4 Published Date: 2022-05-26
LangChain:
- Documentation: docs/integrations/providers/snowflake

Abstract: Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context rigid, fixed capacity representations can be either over or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL) which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.
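
On the consumer side, using a Matryoshka-style embedding at a smaller size usually amounts to keeping a prefix of the vector. The sketch below assumes the embedding was trained so that its leading dimensions are usable on their own, and it re-normalizes after truncation, which is a common convention for cosine similarity rather than something stated in the abstract. The function name is hypothetical.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else list(head)
```

Truncating an embedding that was not trained with a nested objective generally loses much more accuracy, so this shortcut should not be applied to arbitrary models.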

Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

Authors: Kevin Heffernan, Onur Çelebi, Holger Schwenk
arXiv id: 2205.12654v1 Published Date: 2022-05-25
LangChain:
- API Reference: langchain_community...LaserEmbeddings

Abstract: Scaling multilingual representation learning beyond the hundred most frequent languages is challenging, in particular to cover the long tail of low-resource languages. A promising approach has been to train one-for-all multilingual models capable of cross-lingual transfer, but these models often suffer from insufficient capacity and interference between unrelated languages. Instead, we move away from this approach and focus on training multiple language (family) specific representations, but most prominently enable all languages to still be encoded in the same representational space. To achieve this, we focus on teacher-student training, allowing all encoders to be mutually compatible for bitext mining, and enabling fast learning of new languages. We introduce a new teacher-student training scheme which combines supervised and self-supervised training, allowing encoders to take advantage of monolingual training data, which is valuable in the low-resource setting. Our approach significantly outperforms the original LASER encoder. We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model. For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.

Evaluating the Text-to-SQL Capabilities of Large Language Models

Authors: Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau
arXiv id: 2204.00498v1 Published Date: 2022-03-15
LangChain:
- Documentation: docs/tutorials/sql_qa
- API Reference: langchain_community...SQLDatabase, langchain_community...SparkSQL

Abstract: We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.

Locally Typical Sampling

Authors: Clara Meister, Tiago Pimentel, Gian Wiher, et al.
arXiv id: 2202.00666v5 Published Date: 2022-02-01
LangChain:
- API Reference: langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint

Abstract: Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language generation as a discrete stochastic process--which allows for an information-theoretic analysis--can provide new insights into the behavior of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, aiming to do so in a simultaneously efficient and error-minimizing manner; in fact, psycholinguistics research suggests humans choose each word in a string with this subconscious goal in mind. We formally define the set of strings that meet this criterion: those for which each word has an information content close to the expected information content, i.e., the conditional entropy of our model. We then propose a simple and efficient procedure for enforcing this criterion when generating from probabilistic models, which we call locally typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.
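
The criterion in this abstract (keep words whose information content is close to the conditional entropy) can be turned into a filter over a next-token distribution. The sketch below is an illustration under assumptions: the cumulative-mass cutoff `mass` and the function name are choices made here, and the abstract itself only states the closeness criterion.

```python
import math


def locally_typical_filter(probs, mass=0.9):
    """Restrict a next-token distribution to its locally typical tokens.

    Tokens are ranked by |(-log p) - H|, where H is the entropy of `probs`.
    The closest tokens are kept until their total probability reaches `mass`,
    and the kept probabilities are renormalized to sum to 1.
    """
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    ranked = sorted(
        (t for t, p in probs.items() if p > 0),
        key=lambda t: abs(-math.log(probs[t]) - entropy),
    )
    kept, total = [], 0.0
    for token in ranked:
        kept.append(token)
        total += probs[token]
        if total >= mass:
            break
    return {t: probs[t] / total for t in kept}
```

Unlike nucleus or top-k truncation, the most probable token is not guaranteed to survive: in a distribution such as {0.5, 0.25, 0.125, 0.125} the token with probability 0.25 is the closest to the entropy and is kept first.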

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Authors: Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, et al.
arXiv id: 2112.01488v3 Published Date: 2021-12-02
LangChain:
- Documentation: docs/integrations/retrievers/ragatouille, docs/integrations/providers/ragatouille, docs/concepts

Abstract: Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been shown to make late interaction more effective, but it inflates the space footprint of these models by an order of magnitude. In this work, we introduce ColBERTv2, a retriever that couples an aggressive residual compression mechanism with a denoised supervision strategy to simultaneously improve the quality and space footprint of late interaction. We evaluate ColBERTv2 across a wide range of benchmarks, establishing state-of-the-art quality within and outside the training domain while reducing the space footprint of late interaction models by 6-10×.
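
The token-level relevance computation that late interaction refers to is commonly the MaxSim operator from the ColBERT line of work: each query token vector is matched with its most similar document token vector, and the maxima are summed. The abstract does not spell this out, so the sketch below should be read as background, using plain lists of already-normalized vectors and omitting the residual compression that ColBERTv2 contributes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs, docs):
    """Return document ids from `docs` (id -> token vectors), best score first."""
    return sorted(docs, key=lambda doc_id: -maxsim_score(query_vecs, docs[doc_id]))
```

Storing one vector per token is what inflates the index size, which is the cost the compression mechanism in the abstract is meant to reduce.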

Learning Transferable Visual Models From Natural Language Supervision

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, et al.
arXiv id: 2103.00020v1 Published Date: 2021-02-26
LangChain:
- API Reference: langchain_experimental.open_clip

Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.

Language Models are Few-Shot Learners

Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, et al.
arXiv id: 2005.14165v4 Published Date: 2020-05-28
LangChain:
- Documentation: docs/concepts

Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al.
arXiv id: 2005.11401v4 Published Date: 2020-05-22
LangChain:
- Documentation: docs/concepts

Abstract: Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, the other can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
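
The two formulations the abstract compares differ in where the retrieved passages are marginalized out: once per sequence, or once per token. The sketch below shows that difference on toy numbers. It is a simplified reading of the abstract, not the authors' implementation: `rag_sequence`, `rag_token`, and all probabilities are hypothetical, and per-token probabilities are supplied directly instead of coming from a seq2seq model.

```python
from math import prod


def rag_sequence(doc_probs, token_probs):
    """Same passage for the whole sequence.

    doc_probs[z]      : retriever probability of passage z given the query.
    token_probs[z][i] : generator probability of target token i given passage z.
    Returns sum_z p(z|x) * prod_i p(y_i | x, z, y_<i).
    """
    return sum(p_z * prod(tokens) for p_z, tokens in zip(doc_probs, token_probs))


def rag_token(doc_probs, token_probs):
    """A different passage may support each token.

    Returns prod_i sum_z p(z|x) * p(y_i | x, z, y_<i).
    """
    n_tokens = len(token_probs[0])
    return prod(
        sum(p_z * tokens[i] for p_z, tokens in zip(doc_probs, token_probs))
        for i in range(n_tokens)
    )


# Toy case (assumed values): passage 0 supports only the first token,
# passage 1 supports only the second.
doc_probs = [0.5, 0.5]
token_probs = [
    [1.0, 0.0],
    [0.0, 1.0],
]

print(rag_sequence(doc_probs, token_probs))  # prints: 0.0
print(rag_token(doc_probs, token_probs))     # prints: 0.25
```

In this toy case no single passage explains the whole output, so the per-sequence score is zero, while the per-token score stays positive because each token can draw on a different passage.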

CTRL: A Conditional Transformer Language Model for Controllable Generation

Authors: Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al.
arXiv id: 1909.05858v2 Published Date: 2019-09-11
LangChain:
- API Reference: langchain_huggingface...HuggingFaceEndpoint, langchain_community...HuggingFaceTextGenInference, langchain_community...HuggingFaceEndpoint

Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
