rl_chain

The RL (Reinforcement Learning) Chain uses Vowpal Wabbit (VW) models for reinforcement learning with a context, with the goal of modifying the prompt before the LLM call.

[Vowpal Wabbit](https://vowpalwabbit.org/) provides fast, efficient, and flexible online machine learning techniques for reinforcement learning, supervised learning, and more.
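To make the idea concrete, here is a minimal sketch of the loop such a chain runs: pick one candidate given a context, substitute it into the prompt, then learn from a score. It is a plain-Python epsilon-greedy stand-in, not Vowpal Wabbit and not this module's API. `TinyPolicy`, `pick`, `learn`, `build_prompt`, and the meal data are all hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


class TinyPolicy:
    """Epsilon-greedy stand-in for a learned policy (hypothetical, not VW)."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and observation count per (context, action) pair.
        self.mean = defaultdict(float)
        self.count = defaultdict(int)

    def pick(self, context, actions):
        # Explore with probability epsilon, otherwise exploit the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.mean[(context, a)])

    def learn(self, context, action, score):
        # Incremental mean update for the chosen (context, action) pair.
        key = (context, action)
        self.count[key] += 1
        self.mean[key] += (score - self.mean[key]) / self.count[key]


def build_prompt(template, context, policy, actions):
    """Select an action for the context and substitute it into the prompt."""
    chosen = policy.pick(context, actions)
    return chosen, template.format(user=context, meal=chosen)


policy = TinyPolicy(epsilon=0.2)
meals = ["pizza", "salad", "sushi"]
template = "Recommend {meal} to {user} in one sentence."

for _ in range(200):
    meal, prompt = build_prompt(template, "Tom", policy, meals)
    # Hypothetical feedback: pretend Tom only likes salad.
    policy.learn("Tom", meal, 1.0 if meal == "salad" else 0.0)
```

In a real chain the prompt would go to an LLM and the score would come from a scorer or from user feedback; here both are faked so the sketch runs on its own.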

Classes

rl_chain.base.AutoSelectionScorer

Auto selection scorer.

rl_chain.base.Embedder(*args, **kwargs)

Abstract class to represent an embedder.

rl_chain.base.Event(inputs[, selected])

Abstract class to represent an event.

rl_chain.base.Policy(**kwargs)

Abstract class to represent a policy.

rl_chain.base.RLChain

Chain that leverages the Vowpal Wabbit (VW) model as a learned policy for reinforcement learning.

rl_chain.base.Selected()

Abstract class to represent the selected item.

rl_chain.base.SelectionScorer

Abstract class to grade the chosen selection or the response of the LLM.

rl_chain.base.VwPolicy(model_repo, vw_cmd, ...)

Vowpal Wabbit policy.

rl_chain.metrics.MetricsTrackerAverage(step)

Metrics Tracker Average.

rl_chain.metrics.MetricsTrackerRollingWindow(...)

Metrics Tracker Rolling Window.

rl_chain.model_repository.ModelRepository(folder)

Model Repository.

rl_chain.pick_best_chain.PickBest

Chain that leverages the Vowpal Wabbit (VW) model for reinforcement learning with a context, with the goal of modifying the prompt before the LLM call.

rl_chain.pick_best_chain.PickBestEvent(...)

Event class for PickBest chain.

rl_chain.pick_best_chain.PickBestFeatureEmbedder(...)

Embed the BasedOn and ToSelectFrom inputs into a format that can be used by the learning policy.

rl_chain.pick_best_chain.PickBestRandomPolicy(...)

Random policy for PickBest chain.

rl_chain.pick_best_chain.PickBestSelected([...])

Selected class for PickBest chain.

rl_chain.vw_logger.VwLogger(path)

Vowpal Wabbit custom logger.
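Going by their names, `MetricsTrackerAverage` and `MetricsTrackerRollingWindow` presumably report a running mean and a windowed mean of feedback scores. The sketch below shows one way a windowed mean could be tracked. `RollingAverage`, `on_feedback`, and `score` are hypothetical names and do not reproduce the library's classes or signatures.

```python
from collections import deque


class RollingAverage:
    """Hypothetical tracker: mean of the most recent `window_size` scores."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)
        self.total = 0.0

    def on_feedback(self, score):
        # When the window is full, the oldest score is about to be evicted,
        # so remove it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(score)
        self.total += score

    @property
    def score(self):
        # Report 0.0 until at least one score has been recorded.
        return self.total / len(self.window) if self.window else 0.0


tracker = RollingAverage(window_size=3)
for s in [1.0, 0.0, 1.0, 1.0]:
    tracker.on_feedback(s)
print(tracker.score)  # mean of the last three scores: (0.0 + 1.0 + 1.0) / 3
```

A windowed mean reacts faster than a lifetime average when the policy's behavior changes, at the cost of more noise for small windows.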

Functions

rl_chain.base.BasedOn(anything)

Wrap a value to indicate that the selection should be based on it.

rl_chain.base.Embed(anything[, keep])

Wrap a value to indicate that it should be embedded.

rl_chain.base.EmbedAndKeep(anything)

Wrap a value to indicate that it should be embedded and kept.

rl_chain.base.ToSelectFrom(anything)

Wrap a value to indicate that the selection should be made from it.

rl_chain.base.get_based_on_and_to_select_from(inputs)

Get the BasedOn and ToSelectFrom from the inputs.

rl_chain.base.parse_lines(parser, input_str)

Parse the input string into a list of examples.

rl_chain.base.prepare_inputs_for_autoembed(inputs)

Prepare the inputs for auto embedding.

rl_chain.helpers.embed(to_embed, model[, ...])

Embed the actions or context using the SentenceTransformer model (or a model that has an encode function).

rl_chain.helpers.embed_dict_type(item, model)

Embed a dictionary item.

rl_chain.helpers.embed_list_type(item, model)

Embed a list item.

rl_chain.helpers.embed_string_type(item, model)

Embed a string or an _Embed object.

rl_chain.helpers.is_stringtype_instance(item)

Check if an item is a string.

rl_chain.helpers.stringify_embedding(embedding)

Convert an embedding to a string.
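The embedding helpers above accept any model that has an `encode` function and end by turning an embedding into a string. A minimal sketch of that duck-typed flow follows. `ToyModel`, `to_feature_string`, and `embed_text` are hypothetical, and the space-separated `index:value` output is an assumption modeled on Vowpal Wabbit's text feature syntax, not a statement of what `stringify_embedding` returns.

```python
class ToyModel:
    """Hypothetical stand-in for anything exposing encode(text) -> list of floats."""

    def encode(self, text):
        # Toy 3-dimensional "embedding": length, vowel count, word count.
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels), float(len(text.split()))]


def to_feature_string(embedding):
    """Render a vector as space-separated index:value pairs (assumed format)."""
    return " ".join(f"{i}:{v}" for i, v in enumerate(embedding))


def embed_text(text, model):
    # Relies only on the model having an encode method (duck typing).
    return to_feature_string(model.encode(text))


print(embed_text("pick best", ToyModel()))  # 0:9.0 1:2.0 2:2.0
```

Because only `encode` is required, a SentenceTransformer instance or a lightweight test double like `ToyModel` could be passed interchangeably in a sketch like this.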