Runnable Interface
From the LangChain documentation, there are a few abstractions that provide a consistent API for applications. For example, a ChatModel takes a single string, a list of chat messages, or a PromptValue and generates a ChatMessage:
| Component | Input Type | Output Type |
|---|---|---|
| Prompt | Dictionary | PromptValue |
| ChatModel | Single string, list of chat messages or a PromptValue | ChatMessage |
| LLM | Single string, list of chat messages or a PromptValue | String |
| OutputParser | The output of an LLM or ChatModel | Depends on the parser |
| Retriever | Single string | List of Documents |
| Tool | Single string or dictionary, depending on the tool | Depends on the tool |
The Runnable interface is the LangChain abstraction that lets you create these objects and chain them together. From the LangChain documentation on the Runnable interface:

> To make it as easy as possible to create custom chains, we’ve implemented a “Runnable” protocol. Many LangChain components implement the Runnable protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about in this section.
This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way. The standard interface includes:
- stream: stream back chunks of the response
- invoke: call the chain on an input
- batch: call the chain on a list of inputs
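These three methods are available on anything that implements Runnable. Here is a minimal sketch using the Ollama LLM that the rest of this post digs into (assuming a local Ollama server with the llama2 model pulled):

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# invoke: one input in, one complete response out
print(llm.invoke("Tell me a joke."))

# batch: a list of inputs in, a list of responses out
print(llm.batch(["Tell me a joke.", "Tell me another one."]))

# stream: chunks of the response are yielded as they are generated
for chunk in llm.stream("Tell me a joke."):
    print(chunk, end="", flush=True)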
Ollama Deep Dive
The smallest example of running LangChain can be put together with Ollama as follows:
from langchain_community.llms import Ollama

question = input("What is your question? ")
llm = Ollama(model="llama2")  # assumes a local Ollama server with llama2 pulled
res = llm.invoke(question)
print(res)
Let’s start with the Ollama LLM object used in the example above, defined in libs/community/langchain_community/llms/ollama.py:
class Ollama(BaseLLM, _OllamaCommon):
    """Ollama locally runs large language models.

    To use, follow the instructions at https://ollama.ai/.

    Example:

        .. code-block:: python

            from langchain_community.llms import Ollama
            ollama = Ollama(model="llama2")
    """
Tracing the Ollama class hierarchy all the way up to the Pydantic BaseModel class (I need to do more Pydantic, cough cough):
class BaseLLM(BaseLanguageModel[str], ABC):
    """Base LLM abstract interface."""

class BaseLanguageModel(
    RunnableSerializable[LanguageModelInput, LanguageModelOutputVar], ABC
):

class RunnableSerializable(Serializable, Runnable[Input, Output]):
    """Runnable that can be serialized to JSON."""

class Serializable(BaseModel, ABC):
    """Serializable base class."""
Back in libs/community/langchain_community/llms/ollama.py, _OllamaCommon defines some Ollama-specific attributes:
class _OllamaCommon(BaseLanguageModel):
    base_url: str = "http://localhost:11434"
    """Base URL the model is hosted under."""
So, let’s look at invoke. The Ollama LLM object inherits it from BaseLLM in libs/core/langchain_core/language_models/llms.py. Here is the stack of function calls, starting from invoke:
def invoke(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    *,
    stop: Optional[List[str]] = None,
    **kwargs: Any,
) -> str:
    config = ensure_config(config)
    return (
        self.generate_prompt(
            [self._convert_input(input)],
            stop=stop,
            callbacks=config.get("callbacks"),
            tags=config.get("tags"),
            metadata=config.get("metadata"),
            run_name=config.get("run_name"),
            run_id=config.pop("run_id", None),
            **kwargs,
        )
        .generations[0][0]
        .text
    )
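Notice how invoke pulls callbacks, tags, metadata, run_name, and run_id out of the RunnableConfig before delegating to generate_prompt. A minimal sketch of passing such a config from user code (the values are arbitrary; the keys come from RunnableConfig):

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# These entries are read back out via config.get(...) in invoke above.
res = llm.invoke(
    "Tell me a joke.",
    config={"tags": ["demo"], "metadata": {"source": "blog"}, "run_name": "joke-run"},
)
print(res)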

def generate_prompt(
    ...
) -> LLMResult:
    prompt_strings = [p.to_string() for p in prompts]
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)

def generate(
    ...
)

def _generate_helper(
    self,
    prompts: List[str],
    stop: Optional[List[str]],
    run_managers: List[CallbackManagerForLLMRun],
    new_arg_supported: bool,
    **kwargs: Any,
) -> LLMResult:
    try:
        output = (
            self._generate(
                prompts,
                stop=stop,
                # TODO: support multiple run managers
                run_manager=run_managers[0] if run_managers else None,
                **kwargs,
            )
            if new_arg_supported
            else self._generate(prompts, stop=stop)
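The run_managers handed to _generate_helper are created from whatever callbacks were configured. A hedged sketch of plugging into that machinery with a custom handler (the class name is mine, not LangChain's); since the Ollama LLM aggregates a token stream internally, the token hook should fire even for a plain invoke:

from langchain_community.llms import Ollama
from langchain_core.callbacks import BaseCallbackHandler

class TokenPrinter(BaseCallbackHandler):
    """Toy handler that prints each token as the run manager reports it."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

llm = Ollama(model="llama2")
llm.invoke("Tell me a joke.", config={"callbacks": [TokenPrinter()]})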
Eventually, Ollama's own _generate is called, which in turn calls _stream_with_aggregation:
def _generate(  # type: ignore[override]
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    images: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> LLMResult:
    """Call out to Ollama's generate endpoint.

    Args:
        prompt: The prompt to pass into the model.
        stop: Optional list of stop words to use when generating.

    Returns:
        The string generated by the model.

    Example:

        .. code-block:: python

            response = ollama("Tell me a joke.")
    """
    # TODO: add caching here.
    generations = []
    for prompt in prompts:
        final_chunk = super()._stream_with_aggregation(...
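The images parameter hints at multimodal support: extra keyword arguments passed to invoke flow down this call stack into _generate. A hedged sketch, assuming a vision-capable model such as llava is available locally and the image is already base64-encoded:

from langchain_community.llms import Ollama

llava = Ollama(model="llava")

# bind() pre-fills keyword arguments on a Runnable; here the (placeholder)
# base64 image ends up as the images argument of _generate.
llm_with_image = llava.bind(images=["<base64-encoded image>"])
print(llm_with_image.invoke("What is shown in this picture?"))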
From _stream_with_aggregation, a few more helper functions are called until the actual POST against the Ollama REST API goes out:
def _stream_with_aggregation(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    verbose: bool = False,
    **kwargs: Any,
) -> GenerationChunk:
    final_chunk: Optional[GenerationChunk] = None
    for stream_resp in self._create_generate_stream(prompt, stop, **kwargs):

def _create_generate_stream(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    images: Optional[List[str]] = None,
    **kwargs: Any,
) -> Iterator[str]:
    payload = {"prompt": prompt, "images": images}
    yield from self._create_stream(
        payload=payload,
        stop=stop,
        api_url=f"{self.base_url}/api/generate",
        **kwargs,
    )

def _create_stream(
    self,
    api_url: str,
    payload: Any,
    stop: Optional[List[str]] = None,
    **kwargs: Any,
) -> Iterator[str]:
    if self.stop is not None and stop is not None:
        raise ValueError("`stop` found in both the input and default params.")
    elif self.stop is not None:
        stop = self.stop
    ...
    response = requests.post(
        url=api_url,
        headers={
            "Content-Type": "application/json",
            **(self.headers if isinstance(self.headers, dict) else {}),
        },
        json=request_payload,
        stream=True,
        timeout=self.timeout,
    )
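To see what this boils down to on the wire, here is a rough standalone equivalent of that request using requests directly against the Ollama REST API (assuming the default local server; /api/generate streams one JSON object per line):

import json
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    headers={"Content-Type": "application/json"},
    json={"model": "llama2", "prompt": "Tell me a joke."},
    stream=True,
    timeout=60,
)
for line in response.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Each streamed chunk carries a piece of the answer in "response";
    # the final chunk has "done": true.
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break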