Using PDF Input
How to send / receive pdf's (other document types) to a /chat/completions endpoint
Works for:
- Vertex AI models (Gemini + Anthropic)
- Bedrock Models
- Anthropic API Models
- OpenAI API Models
- Mistral (Only using file ID of already uploaded file, similar to OpenAI file_id input)
Quick Startโ
urlโ
- SDK
- PROXY
from litellm.utils import supports_pdf_input, completion
# set aws credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# pdf url
file_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
# model
model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
file_content = [
    {"type": "text", "text": "What's this file about?"},
    {
        "type": "file",
        "file": {
            "file_id": file_url,
        }
    },
]
if not supports_pdf_input(model, None):
    print("Model does not support image input")
response = completion(
    model=model,
    messages=[{"role": "user", "content": file_content}],
)
assert response is not None
- Setup config.yaml
model_list:
  - model_name: bedrock-model
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "bedrock-model",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What's this file about?"},
            {
                "type": "file",
                "file": {
                    "file_id": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
                }
            }
        ]},
    ]
}'
base64โ
- SDK
- PROXY
from litellm.utils import supports_pdf_input, completion
# set aws credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# pdf url
image_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
response = requests.get(url)
file_data = response.content
encoded_file = base64.b64encode(file_data).decode("utf-8")
base64_url = f"data:application/pdf;base64,{encoded_file}"
# model
model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
file_content = [
    {"type": "text", "text": "What's this file about?"},
    {
        "type": "file",
        "file": {
            "file_data": base64_url,
        }
    },
]
if not supports_pdf_input(model, None):
    print("Model does not support image input")
response = completion(
    model=model,
    messages=[{"role": "user", "content": file_content}],
)
assert response is not None
- Setup config.yaml
model_list:
  - model_name: bedrock-model
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "bedrock-model",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What's this file about?"},
            {
                "type": "file",
                "file": {
                    "file_data": "data:application/pdf;base64...",
                }
            }
        ]},
    ]
}'
Specifying formatโ
To specify the format of the document, you can use the format parameter.
- SDK
- PROXY
from litellm.utils import supports_pdf_input, completion
# set aws credentials
os.environ["AWS_ACCESS_KEY_ID"] = ""
os.environ["AWS_SECRET_ACCESS_KEY"] = ""
os.environ["AWS_REGION_NAME"] = ""
# pdf url
file_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
# model
model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
file_content = [
    {"type": "text", "text": "What's this file about?"},
    {
        "type": "file",
        "file": {
            "file_id": file_url,
            "format": "application/pdf",
        }
    },
]
if not supports_pdf_input(model, None):
    print("Model does not support image input")
response = completion(
    model=model,
    messages=[{"role": "user", "content": file_content}],
)
assert response is not None
- Setup config.yaml
model_list:
  - model_name: bedrock-model
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
- Start the proxy
litellm --config /path/to/config.yaml
- Test it!
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "bedrock-model",
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What's this file about?"},
            {
                "type": "file",
                "file": {
                    "file_id": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
                    "format": "application/pdf",
                }
            }
        ]},
    ]
}'
Mistral Exampleโ
Here is a sample payload for using the Mistral model for document understanding:
- SDK
- PROXY
from litellm.utils import completion
# pdf file_id received from files endpoint
file_id = "fa778e5e-46ec-4562-8418-36623fe25a71"
# model
model = "mistral/mistral-large-latest"
file_content = [
    {"type": "text", "text": "What's this file about?"},
    {
        "type": "file",
        "file": {
            "file_id": file_id,
        }
    },
]
response = completion(
    model=model,
    messages=[{"role": "user", "content": file_content}],
)
assert response is not None
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "model": "mistral/mistral-large-latest",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is the content of the file?"
                },
                {
                    "type": "file",
                    "file": {
                        "file_id": "fa778e5e-46ec-4562-8418-36623fe25a71"
                    }
                }
            ]
        }
    ]
}
Checking if a model supports pdf inputโ
- SDK
- PROXY
Use litellm.supports_pdf_input(model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0") -> returns True if model can accept pdf input
assert litellm.supports_pdf_input(model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0") == True
- Define bedrock models on config.yaml
model_list:
  - model_name: bedrock-model # model group name
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: os.environ/AWS_REGION_NAME
    model_info: # OPTIONAL - set manually
      supports_pdf_input: True
- Run proxy server
litellm --config config.yaml
- Call /model_group/infoto check if a model supportspdfinput
curl -X 'GET' \
  'http://localhost:4000/model_group/info' \
  -H 'accept: application/json' \
  -H 'x-api-key: sk-1234'
Expected Response
{
  "data": [
    {
      "model_group": "bedrock-model",
      "providers": ["bedrock"],
      "max_input_tokens": 128000,
      "max_output_tokens": 16384,
      "mode": "chat",
      ...,
      "supports_pdf_input": true, # ๐ supports_pdf_input is true
    }
  ]
}