This is a quick hello world for running a local model with Ollama.

Ollama docker

The simplest way is to run the Ollama Docker image. To create the container and start a model, we just need to fire up two commands.

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama2
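Once the container is up, you can sanity-check the API from the host in another terminal. Here is a minimal sketch using Python's requests package (assuming it is installed), against the default port 11434 mapped above; the /api/tags endpoint lists the models that have been pulled:

import requests

# List the models available on the local Ollama server
resp = requests.get("http://localhost:11434/api/tags")
print(resp.json())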

You can also open a shell in that container with

docker exec -it ollama bash

Side note: sometimes old containers can linger around, so we may need to clean them up before starting a new one.

docker ps
docker container prune

Ollama build

Another option is building Ollama from source. This needs a recent version of Go (1.21 or later, I think).

cd ollama
cmake -B build
cmake --build build

Then you can either run the server straight from source:

go run . serve

or build and install the binary:

go build .
go install .

In one terminal, let's start the server.

ollama serve

In another terminal, let's run the ollama shell with llama3.2. This will take a while the first time, as it needs to download the model.

ollama run llama3.2
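To check that the model answers without going through the interactive shell, you can also hit the REST API directly. A rough sketch with requests, assuming llama3.2 has finished downloading:

import requests

# One-off, non-streaming generation against the local server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])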

Ollama with python ollama

An easy way to access the Ollama endpoints is to use the ollama Python package.

import ollama

# Chat with the locally running model (the server must be up and llama2 pulled)
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
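The package can also stream the reply chunk by chunk, which is nicer for longer answers. A small sketch, assuming the same llama2 model is available locally:

import ollama

# Stream the answer as it is generated instead of waiting for the full reply
stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)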

Ollama with LlamaIndex

Or, if you are a LlamaIndex fan, we can use the Ollama integration with these two packages.

!pip install llama-index
!pip install llama-index-embeddings-ollama llama-index-llms-ollama
from llama_index.llms.ollama import Ollama

# request_timeout is in seconds
llm = Ollama(model="llama2", request_timeout=30000.0)

resp = llm.complete("Who is Paul Graham?")
print(resp)
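The llama-index-embeddings-ollama package installed above also lets you use the local server for embeddings. A minimal sketch, assuming llama2 is still the model being served:

from llama_index.embeddings.ollama import OllamaEmbedding

# Embed a piece of text with the local Ollama server
embed_model = OllamaEmbedding(model_name="llama2", base_url="http://localhost:11434")
vector = embed_model.get_text_embedding("Why is the sky blue?")
print(len(vector))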

That’s it.