Gemini 1.5 Flash API payload exceeds 32k window silently
Pushed a long system prompt and history to the beta Flash model last night. Response came back with an empty choices array and no error.
The curl payload was about 38 KB, roughly 34k tokens. Google’s docs cap the window at 32k, but the endpoint returns 200 OK even when the limit is exceeded.
Trimmed the prompt:
- removed markdown formatting
- kept the last three turns only
- ensured total tokens stayed under 31k by running tiktoken locally (tiktoken is an OpenAI tokenizer, so the count is only an estimate for Gemini)
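The trimming steps above can be sketched roughly as follows. This is a minimal illustration, not the exact script used: `strip_markdown`, `rough_token_count`, and `trim_request` are hypothetical helper names, each history entry is assumed to be one turn stored as a plain string, and the default counter is a crude 4-characters-per-token guess that should be replaced with a real tokenizer.

```python
import re
from typing import Callable, List, Tuple


def strip_markdown(text: str) -> str:
    """Remove heading markers, asterisks, and backticks (a deliberately small subset)."""
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)  # leading '#' headings
    return re.sub(r"[*`]+", "", text)  # emphasis and code ticks


def rough_token_count(text: str) -> int:
    """Assumption: about 4 characters per token. Swap in a real tokenizer for accuracy."""
    return len(text) // 4 + 1


def trim_request(
    system_prompt: str,
    history: List[str],
    max_tokens: int = 31_000,
    keep_turns: int = 3,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> Tuple[str, List[str], int]:
    """Strip markdown, keep only the most recent turns, and enforce a token budget."""
    system_prompt = strip_markdown(system_prompt)
    recent = [strip_markdown(turn) for turn in history[-keep_turns:]]

    def total() -> int:
        return count_tokens(system_prompt) + sum(count_tokens(t) for t in recent)

    # Drop the oldest kept turn until the request fits the budget.
    while recent and total() > max_tokens:
        recent.pop(0)

    if total() > max_tokens:
        raise ValueError("system prompt alone exceeds the token budget")
    return system_prompt, recent, total()
```

Passing a different `count_tokens` callable lets the same function work with whichever tokenizer is closest to the target model.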
Resent the request; completion returned normally.
If the response is blank, check usage.totalTokens in the JSON; if it equals the max window, your prompt was truncated. Flash discards overflow without warning.
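A small guard for that check might look like the sketch below. The field names `choices` and `usage.totalTokens` are taken from this post and are parameters here, because the real response shape depends on the endpoint (the native Gemini REST response, for instance, reports `candidates` and `usageMetadata.totalTokenCount`). `looks_truncated` is a hypothetical helper, and the 32,000 default is an assumption about what "32k" means.

```python
from typing import Any, Sequence


def looks_truncated(
    response: dict,
    max_window: int = 32_000,  # assumption: adjust to the documented limit
    choices_key: str = "choices",
    usage_path: Sequence[str] = ("usage", "totalTokens"),
) -> bool:
    """Return True when the reply is empty and the token total has hit the window."""
    choices = response.get(choices_key) or []

    # Walk the nested usage path, tolerating missing keys.
    node: Any = response
    for key in usage_path:
        node = node.get(key) if isinstance(node, dict) else None

    return not choices and isinstance(node, int) and node >= max_window
```

Called on the parsed JSON body right after a 200 OK, this turns the silent failure into something the caller can log or retry on with a shorter prompt.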