Runnable Interface
From the LangChain documentation, there are a few abstractions that provide a consistent API for applications. For example, a ChatModel takes a single string, a list of chat messages, or a PromptValue and generates a ChatMessage:
| Component | Input Type | Output Type |
|---|---|---|
| Prompt | Dictionary | PromptValue |
| ChatModel | Single string, list of chat messages or a PromptValue | ChatMessage |
| LLM | Single string, list of chat messages or a PromptValue | String |
| OutputParser | The output of an LLM or ChatModel | Depends on the parser |
| Retriever | Single string | List of Documents |
| Tool | Single string or dictionary, depending on the tool | Depends on the tool |
The Runnable interface is the LangChain abstraction that lets you create these objects and chain them together. From the LangChain documentation on the Runnable interface:

> To make it as easy as possible to create custom chains, we’ve implemented a “Runnable” protocol. Many LangChain components implement the Runnable protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about in this section.
This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way. The standard interface includes:
- stream: stream back chunks of the response
- invoke: call the chain on an input
- batch: call the chain on a list of inputs
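These three methods are available on anything that implements Runnable. Here is a minimal sketch using the Ollama LLM that the rest of this post digs into (assuming a local Ollama server with the llama2 model pulled):

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# invoke: one input in, one complete response out
print(llm.invoke("Tell me a joke."))

# batch: a list of inputs in, a list of responses out
print(llm.batch(["Tell me a joke.", "Tell me another one."]))

# stream: chunks of the response are yielded as they are generated
for chunk in llm.stream("Tell me a joke."):
    print(chunk, end="", flush=True)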
Ollama Deep Dive
The smallest example of running LangChain can be put together with Ollama as follows:
from langchain_community.llms import Ollama

question = input("What is your question? ")
llm = Ollama(model="llama2")  # assumes a local Ollama server with llama2 pulled
res = llm.invoke(question)
print(res)
Let’s start with the Ollama LLM object used in the example above, defined in libs/community/langchain_community/llms/ollama.py:
class Ollama(BaseLLM, _OllamaCommon):
    """Ollama locally runs large language models.

    To use, follow the instructions at https://ollama.ai/.

    Example:

        .. code-block:: python

            from langchain_community.llms import Ollama
            ollama = Ollama(model="llama2")
    """
Tracing the Ollama class hierarchy all the way up to the Pydantic BaseModel class (I need to do more Pydantic, cough cough):
class BaseLLM(BaseLanguageModel[str], ABC):
    """Base LLM abstract interface."""

class BaseLanguageModel(
    RunnableSerializable[LanguageModelInput, LanguageModelOutputVar], ABC
):

class RunnableSerializable(Serializable, Runnable[Input, Output]):
    """Runnable that can be serialized to JSON."""

class Serializable(BaseModel, ABC):
    """Serializable base class."""
Back in libs/community/langchain_community/llms/ollama.py, _OllamaCommon defines some Ollama-specific attributes:
class _OllamaCommon(BaseLanguageModel):
    base_url: str = "http://localhost:11434"
    """Base URL the model is hosted under."""
So, let’s look at invoke. The Ollama LLM object inherits it from BaseLLM in libs/core/langchain_core/language_models/llms.py. Here is the stack of function calls, starting from invoke:
def invoke(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    *,
    stop: Optional[List[str]] = None,
    **kwargs: Any,
) -> str:
    config = ensure_config(config)
    return (
        self.generate_prompt(
            [self._convert_input(input)],
            stop=stop,
            callbacks=config.get("callbacks"),
            tags=config.get("tags"),
            metadata=config.get("metadata"),
            run_name=config.get("run_name"),
            run_id=config.pop("run_id", None),
            **kwargs,
        )
        .generations[0][0]
        .text
    )
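Notice how invoke pulls callbacks, tags, metadata, run_name, and run_id out of the RunnableConfig before delegating to generate_prompt. A minimal sketch of passing such a config from user code (the values are arbitrary; the keys come from RunnableConfig):

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# These entries are read back out via config.get(...) in invoke above.
res = llm.invoke(
    "Tell me a joke.",
    config={"tags": ["demo"], "metadata": {"source": "blog"}, "run_name": "joke-run"},
)
print(res)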

def generate_prompt(
    ...
) -> LLMResult:
    prompt_strings = [p.to_string() for p in prompts]
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)

def generate(
    ...
)

def _generate_helper(
    self,
    prompts: List[str],
    stop: Optional[List[str]],
    run_managers: List[CallbackManagerForLLMRun],
    new_arg_supported: bool,
    **kwargs: Any,
) -> LLMResult:
    try:
        output = (
            self._generate(
                prompts,
                stop=stop,
                # TODO: support multiple run managers
                run_manager=run_managers[0] if run_managers else None,
                **kwargs,
            )
            if new_arg_supported
            else self._generate(prompts, stop=stop)
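The run_managers handed to _generate_helper are created from whatever callbacks were configured. A hedged sketch of plugging into that machinery with a custom handler (the class name is mine, not LangChain's); since the Ollama LLM aggregates a token stream internally, the token hook should fire even for a plain invoke:

from langchain_community.llms import Ollama
from langchain_core.callbacks import BaseCallbackHandler

class TokenPrinter(BaseCallbackHandler):
    """Toy handler that prints each token as the run manager reports it."""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

llm = Ollama(model="llama2")
llm.invoke("Tell me a joke.", config={"callbacks": [TokenPrinter()]})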
Eventually, Ollama's own _generate is called, which in turn calls _stream_with_aggregation:
def _generate(  # type: ignore[override]
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    images: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> LLMResult:
    """Call out to Ollama's generate endpoint.

    Args:
        prompt: The prompt to pass into the model.
        stop: Optional list of stop words to use when generating.

    Returns:
        The string generated by the model.

    Example:

        .. code-block:: python

            response = ollama("Tell me a joke.")
    """
    # TODO: add caching here.
    generations = []
    for prompt in prompts:
        final_chunk = super()._stream_with_aggregation(...
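The images parameter hints at multimodal support: extra keyword arguments passed to invoke flow down this call stack into _generate. A hedged sketch, assuming a vision-capable model such as llava is available locally and the image is already base64-encoded:

from langchain_community.llms import Ollama

llava = Ollama(model="llava")

# bind() pre-fills keyword arguments on a Runnable; here the (placeholder)
# base64 image ends up as the images argument of _generate.
llm_with_image = llava.bind(images=["<base64-encoded image>"])
print(llm_with_image.invoke("What is shown in this picture?"))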
From _stream_with_aggregation, a few more helper functions are called until the actual POST against the Ollama REST API goes out:
def _stream_with_aggregation(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    verbose: bool = False,
    **kwargs: Any,
) -> GenerationChunk:
    final_chunk: Optional[GenerationChunk] = None
    for stream_resp in self._create_generate_stream(prompt, stop, **kwargs):

def _create_generate_stream(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    images: Optional[List[str]] = None,
    **kwargs: Any,
) -> Iterator[str]:
    payload = {"prompt": prompt, "images": images}
    yield from self._create_stream(
        payload=payload,
        stop=stop,
        api_url=f"{self.base_url}/api/generate",
        **kwargs,
    )

def _create_stream(
    self,
    api_url: str,
    payload: Any,
    stop: Optional[List[str]] = None,
    **kwargs: Any,
) -> Iterator[str]:
    if self.stop is not None and stop is not None:
        raise ValueError("`stop` found in both the input and default params.")
    elif self.stop is not None:
        stop = self.stop
    ...
    response = requests.post(
        url=api_url,
        headers={
            "Content-Type": "application/json",
            **(self.headers if isinstance(self.headers, dict) else {}),
        },
        json=request_payload,
        stream=True,
        timeout=self.timeout,
    )
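To see what this boils down to on the wire, here is a rough standalone equivalent of that request using requests directly against the Ollama REST API (assuming the default local server; /api/generate streams one JSON object per line):

import json
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    headers={"Content-Type": "application/json"},
    json={"model": "llama2", "prompt": "Tell me a joke."},
    stream=True,
    timeout=60,
)
for line in response.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)
    # Each streamed chunk carries a piece of the answer in "response";
    # the final chunk has "done": true.
    print(chunk.get("response", ""), end="", flush=True)
    if chunk.get("done"):
        break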