The tagline for LiteLLM is simple and awesome:
Call 100+ LLMs using the same Input/Output Format
Hello world
This is a small example from the LiteLLM docs using Ollama. I already have Ollama running locally, so it was easy to try.
from litellm import completion
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434"
)
print(response)
ModelResponse(id='chatcmpl-d1c86df4-5feb-419d-8e8a-fd876ad46085', choices=[Choices(finish_reason='stop', index=0, message=Message(content="I'm just an AI assistant, here to help!", role='assistant'))], created=1716627590, model='ollama/llama2', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=31, completion_tokens=14, total_tokens=45))
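The ModelResponse mirrors OpenAI's chat completion object, so (as far as I can tell) you can pull the generated text and token counts out of it the same way. A small sketch, continuing from the response above:

# `response` is the ModelResponse returned by completion() above.
# The generated text sits under choices[0].message.content, just like OpenAI.
print(response.choices[0].message.content)
# I'm just an AI assistant, here to help!

# Token accounting is attached as well.
print(response.usage.total_tokens)
# 45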
Python SDK
Well, this is a fancy way of saying that they put an abstraction layer on top of different LLM providers' interfaces. Still, it's super helpful, because OpenAI and Hugging Face use different names and formats for their APIs.
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"
response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)
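Because the call shape stays the same, switching providers is mostly a matter of changing the model string and the API key. A sketch assuming you have an Anthropic key and that the model name below is still current:

import os
from litellm import completion

# Same input/output format, different provider: only the model string
# and the API key environment variable change.
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)
print(response.choices[0].message.content)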
I tried it with local Ollama and it works!
from litellm import completion
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434"
)
print(response)
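Streaming works through the same interface too; passing stream=True should give OpenAI-style chunks. A minimal sketch with the local Ollama model:

from litellm import completion

# stream=True returns an iterator of chunks; each chunk carries a delta
# with the next piece of generated text (it may be None on the last chunk).
response = completion(
    model="ollama/llama2",
    messages=[{ "content": "respond in 20 words. who are you?", "role": "user" }],
    api_base="http://localhost:11434",
    stream=True
)

for chunk in response:
    piece = chunk.choices[0].delta.content
    if piece:
        print(piece, end="", flush=True)
print()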
Proxy
This is where things get interesting. LiteLLM can start a local server that works as a proxy. It provides these features:
- Hooks for auth
- Hooks for logging
- Cost tracking (see the sketch right after this list)
- Rate limiting
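The cost tracking bullet also has a counterpart in the Python SDK: litellm ships a completion_cost helper that estimates the price of a call from its token usage. A minimal sketch, assuming a hosted model with a known price table (the exact API may vary by version):

import os
from litellm import completion, completion_cost

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?", "role": "user" }]
)

# completion_cost estimates the USD cost from the response's token usage
# and litellm's built-in price table for the model.
print(completion_cost(completion_response=response))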
The most interesting part is that it can mimic the API of different providers. For example, you can put a local model behind the proxy and call it with the OpenAI library exactly as if you had OpenAI access. Cool!
pip install 'litellm[proxy]'
litellm --model huggingface/bigcode/starcoder
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000") # point base_url at the LiteLLM proxy
# request sent to model set on LiteLLM proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])
print(response)
Bonus: They have a cool dashboard to track things for your app. Again, COOL!
It can be accessed at localhost:4000/ui, but it needs a PostgreSQL database.