The tagline for LiteLLM is simple and awesome:
Call 100+ LLMs using the same Input/Output Format
Hello world
This is a small example from the LiteLLM docs using Ollama. I already have Ollama running locally, so it was easy to try.
from litellm import completion
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434"
)
print(response)
ModelResponse(id='chatcmpl-d1c86df4-5feb-419d-8e8a-fd876ad46085', choices=[Choices(finish_reason='stop', index=0, message=Message(content="I'm just an AI assistant, here to help!", role='assistant'))], created=1716627590, model='ollama/llama2', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=31, completion_tokens=14, total_tokens=45))
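The ModelResponse mirrors OpenAI's chat completion object, so (as far as I can tell) you can pull the generated text and token counts out of it the same way. A small sketch, continuing from the response above:

# `response` is the ModelResponse returned by completion() above.
# The generated text sits under choices[0].message.content, just like OpenAI.
print(response.choices[0].message.content)
# I'm just an AI assistant, here to help!

# Token accounting is attached as well.
print(response.usage.total_tokens)
# 45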
Python SDK
Well, this is a fancy way of saying that they put an abstraction layer on top of different LLM providers' interfaces. Still, it's super helpful, because OpenAI and Hugging Face use different names and formats for their APIs.
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)
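Because the call shape stays the same, switching providers is mostly a matter of changing the model string and the API key. A sketch assuming you have an Anthropic key and that the model name below is still current:

import os
from litellm import completion

# Same input/output format, different provider: only the model string
# and the API key environment variable change.
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)
print(response.choices[0].message.content)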
I tried it with local Ollama and it works!
from litellm import completion
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434"
)
print(response)
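Streaming works through the same interface too; passing stream=True should give OpenAI-style chunks. A minimal sketch with the local Ollama model:

from litellm import completion

# stream=True returns an iterator of chunks; each chunk carries a delta
# with the next piece of generated text (it may be None on the last chunk).
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434",
    stream=True
)

for chunk in response:
    piece = chunk.choices[0].delta.content
    if piece:
        print(piece, end="", flush=True)
print()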
Proxy
This is where things get interesting. LiteLLM can start a local server that works as a proxy. It provides these features:
- Hooks for auth
- Hooks for logging
- Cost tracking (see the sketch right after this list)
- Rate limiting
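The cost tracking bullet also has a counterpart in the Python SDK: litellm ships a completion_cost helper that estimates the price of a call from its token usage. A minimal sketch, assuming a hosted model with a known price table (the exact API may vary by version):

import os
from litellm import completion, completion_cost

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)

# completion_cost estimates the USD cost from the response's token usage
# and litellm's built-in price table for the model.
print(completion_cost(completion_response=response))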
The most interesting part is that it can mimic the API of different providers. For example, you can put a local model behind the proxy and call it with the OpenAI library exactly as if you had OpenAI access. Cool!
pip install 'litellm[proxy]'
litellm --model huggingface/bigcode/starcoder
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000") # point base_url at the LiteLLM proxy
# request sent to model set on LiteLLM proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])
print(response)
Bonus: They have a cool dashboard to track things for your app. Again, COOL!
It can be accessed at localhost:4000/ui, but it needs a PostgreSQL database.