# DAY 0 Support: Gemini 3 Flash on LiteLLM
LiteLLM now supports `gemini-3-flash-preview` and all of the new API changes that ship with it.

If you only need cost tracking, no change to your current LiteLLM version is required. If you want support for the new features introduced alongside it, such as thinking levels, you will need v1.80.8-stable.1 or above.
## Deploy this version

**Docker**

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.80.8-stable.1
```

**Pip**

```shell
pip install litellm==1.80.8.post1
```
## What's New

### 1. New Thinking Levels: `thinkingLevel` with MINIMAL & MEDIUM
Gemini 3 Flash introduces granular thinking control with `thinkingLevel`, which replaces `thinkingBudget`.

- **MINIMAL**: Ultra-lightweight thinking for fast responses
- **LOW**: Light thinking for simple instruction following
- **MEDIUM**: Balanced thinking for complex reasoning
- **HIGH**: Maximum reasoning depth

LiteLLM automatically maps the OpenAI `reasoning_effort` parameter to Gemini's `thinkingLevel`, so you can use the familiar `reasoning_effort` values (`minimal`, `low`, `medium`, `high`) without changing your code (see the Quick Start below).
### 2. Thought Signatures

Like gemini-3-pro, this model also includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. Learn more about thought signatures.

**Edge Case Handling**: If thought signatures are missing in the request, LiteLLM adds a dummy signature, ensuring the API call doesn't break.
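In practice this means a standard OpenAI-style tool-calling loop needs no special handling: append the assistant's tool-call message back into the conversation as-is, and LiteLLM round-trips the signature for you. A minimal sketch (the `get_weather` tool and its result are hypothetical, for illustration only):

```python
from litellm import completion

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="medium",
)

# Append the assistant message unchanged; the thought signature attached to
# the tool call is preserved and re-embedded by LiteLLM on the next request.
# (Assumes the model actually issued a tool call here.)
assistant_message = response.choices[0].message
messages.append(assistant_message)
messages.append({
    "role": "tool",
    "tool_call_id": assistant_message.tool_calls[0].id,
    "content": '{"temp_c": 18, "condition": "cloudy"}',  # stubbed tool result
})

final = completion(
    model="gemini/gemini-3-flash-preview",
    messages=messages,
    tools=tools,
    reasoning_effort="medium",
)
print(final.choices[0].message.content)
```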
## Supported Endpoints

LiteLLM provides full end-to-end support for Gemini 3 Flash on:

- ✅ `/v1/chat/completions` - OpenAI-compatible chat completions endpoint
- ✅ `/v1/responses` - OpenAI Responses API endpoint (streaming and non-streaming; see the example below)
- ✅ `/v1/messages` - Anthropic-compatible messages endpoint
- ✅ `/v1/generateContent` - Google Gemini API compatible endpoint

All endpoints support:

- Streaming and non-streaming responses
- Function calling with thought signatures
- Multi-turn conversations
- All Gemini 3-specific features
- Conversion of provider-specific thinking params to `thinkingLevel`
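For example, here is a sketch of calling the Responses API route on the proxy (assumes a proxy configured with this model as in the Quick Start below, and a valid LiteLLM key):

```shell
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3-flash",
    "input": "Summarize the plot of Hamlet in two sentences.",
    "reasoning": {"effort": "medium"}
  }'
```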
## Quick Start

**SDK**

Basic usage with MEDIUM thinking (NEW):
```python
from litellm import completion

# No code changes needed; LiteLLM maps the OpenAI reasoning_effort
# param to Gemini's thinkingLevel.
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Solve this complex math problem: 25 * 4 + 10"}],
    reasoning_effort="medium",  # NEW: MEDIUM thinking level
)
print(response.choices[0].message.content)
```
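Streaming works the same way; a minimal sketch:

```python
from litellm import completion

# Streamed response with MEDIUM thinking; chunks follow the
# OpenAI streaming format.
stream = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Explain quicksort step by step"}],
    reasoning_effort="medium",
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
```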
**PROXY**

1. Setup config.yaml

```yaml
model_list:
  - model_name: gemini-3-flash
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
```

2. Start proxy

```shell
litellm --config /path/to/config.yaml
```

3. Call with MEDIUM thinking

```shell
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Complex reasoning task"}],
    "reasoning_effort": "medium"
  }'
```
---
## All `reasoning_effort` Levels
**MINIMAL** - Ultra-fast, minimal reasoning

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    reasoning_effort="minimal",
)
```

**LOW** - Simple instruction following

```python
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    reasoning_effort="low",
)
```

**MEDIUM** - Balanced reasoning for complex tasks ✨

```python
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Analyze this dataset and find patterns"}],
    reasoning_effort="medium",  # NEW!
)
```

**HIGH** - Maximum reasoning depth

```python
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Prove this mathematical theorem"}],
    reasoning_effort="high",
)
```
## Key Features

- ✅ **Thinking Levels**: MINIMAL, LOW, MEDIUM, HIGH
- ✅ **Thought Signatures**: Track reasoning with unique identifiers
- ✅ **Seamless Integration**: Works with existing OpenAI-compatible clients
- ✅ **Backward Compatible**: Gemini 2.5 models continue using `thinkingBudget` (see the sketch below)
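As an illustration of the backward-compatible path (a sketch, assuming access to `gemini/gemini-2.5-flash`):

```python
from litellm import completion

# Same OpenAI-style parameter; on Gemini 2.5 models LiteLLM translates
# reasoning_effort to a thinkingBudget instead of a thinkingLevel.
response = completion(
    model="gemini/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Plan a 3-day trip to Kyoto"}],
    reasoning_effort="medium",
)
print(response.choices[0].message.content)
```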
## Installation

```shell
pip install litellm --upgrade
```

```python
from litellm import completion

response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",  # Use MEDIUM thinking
)
print(response)
```
If using this model via `vertex_ai/`, keep the location set to `global`, as this is the only supported location as of now.
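For example (a sketch; assumes Vertex AI credentials and project are already configured in your environment):

```python
from litellm import completion

# Vertex AI route: "global" is currently the only supported location
# for gemini-3-flash-preview.
response = completion(
    model="vertex_ai/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Your question here"}],
    reasoning_effort="medium",
    vertex_location="global",
)
print(response.choices[0].message.content)
```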
## `reasoning_effort` Mapping for Gemini 3+

| `reasoning_effort` | `thinking_level` |
|---|---|
| minimal | minimal |
| low | low |
| medium | medium |
| high | high |
| disable | minimal |
| none | minimal |
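Note that on Gemini 3, `disable` and `none` do not turn thinking off entirely; they map to the lowest level. For example (a sketch):

```python
from litellm import completion

# On Gemini 3 models, reasoning_effort="disable" (or "none") is mapped
# to thinkingLevel "minimal" rather than disabling thinking outright.
response = completion(
    model="gemini/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Quick check: what's the capital of Australia?"}],
    reasoning_effort="disable",
)
```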


