Gemini 1.5 Flash API payload exceeds 32k window silently

Gemini 1.5 Flash API payload exceeds 32k window silently

Pushed a long system prompt and history to the beta Flash model last night. Response came back with the choices array empty and no error.

Curl payload size about 38 KB, token count around 34 k. Google’s docs cap at 32 k but the endpoint returns 200 OK even when the limit is breached.

Trimmed the prompt:

  • removed markdown formatting
  • kept the last three turns only
  • ensured total tokens under 31 k by running tiktoken locally

Resent the request; completion returned normally.

If the response is blank check usage.totalTokens in the JSON; if it equals the max window your prompt was truncated. Flash discards overflow without warning.

comments powered by Disqus