Gemini 1.5 Flash API payload exceeds 32k window silently

Dec 16, 2024
gemini flash token-limit
1 min read

Pushed a long system prompt plus conversation history to the beta Flash model last night. The response came back with an empty choices array and no error.

The curl payload was about 38 KB, roughly 34k tokens. Google’s docs cap the window at 32k tokens, but the endpoint still returns 200 OK when the limit is breached.

Trimmed the prompt:

removed markdown formatting
kept the last three turns only
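kept the total under 31k tokens, counted locally with tiktoken (an approximation, since tiktoken is OpenAI's tokenizer rather than Gemini's, hence the 1k margin)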
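
The trimming steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script I ran: `trim_history`, `rough_token_count`, and the 4-characters-per-token heuristic are hypothetical stand-ins, and each turn is assumed to be a plain string.

```python
from typing import Callable, List, Tuple

TOKEN_BUDGET = 31_000  # margin under the 32k cap described in the post
KEEP_TURNS = 3         # "kept the last three turns only"


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(
    system_prompt: str,
    turns: List[str],
    count_tokens: Callable[[str], int] = rough_token_count,
    budget: int = TOKEN_BUDGET,
    keep_turns: int = KEEP_TURNS,
) -> Tuple[List[str], int]:
    """Keep the newest turns, then drop the oldest kept turn until under budget."""
    kept = list(turns[-keep_turns:]) if keep_turns > 0 else []
    total = count_tokens(system_prompt) + sum(count_tokens(t) for t in kept)
    while kept and total > budget:
        total -= count_tokens(kept.pop(0))  # discard the oldest remaining turn
    if total > budget:
        raise ValueError("system prompt alone exceeds the token budget")
    return kept, total
```

To count with tiktoken instead of the heuristic, pass something like `count_tokens=lambda s: len(enc.encode(s))` with `enc = tiktoken.get_encoding("cl100k_base")`. Either way the count is only an estimate for Gemini, so leave some headroom.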

Resent the request; completion returned normally.

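If the response is blank, check the total token count in the JSON usage block (usage.total_tokens on the OpenAI-compatible endpoint, usageMetadata.totalTokenCount on the native API). If it equals the max window, your prompt was truncated. Flash discards the overflow without warning.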
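
A small check along those lines, as a sketch only. `looks_truncated` is a hypothetical helper; it assumes the response has already been parsed into a dict, and the 32,000 default is just my reading of the "32k" cap, so pass the real limit if it differs.

```python
def looks_truncated(response: dict, max_window: int = 32_000) -> bool:
    """Return True when the reply is empty and token usage has hit the window.

    Accepts either response shape: choices / usage.total_tokens
    (OpenAI-compatible) or candidates / usageMetadata.totalTokenCount (native).
    """
    outputs = response.get("choices") or response.get("candidates") or []
    usage = response.get("usage") or response.get("usageMetadata") or {}
    total = usage.get("total_tokens", usage.get("totalTokenCount"))
    # Without a usage figure we cannot say anything, so report False.
    return bool(not outputs and total is not None and total >= max_window)
```

Running this on every 200 OK response before using the completion turns the silent failure into something you can log or retry on.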