The token limit is shared between the prompt and the completion: the prompt tokens plus the completion tokens (the latter capped by the max_tokens parameter) must together not exceed the token limit of the particular OpenAI model.
As stated in the official OpenAI article:
> Depending on the model used, requests can use up to 4,097 tokens shared between prompt and completion. If your prompt is 4,000 tokens, your completion can be 97 tokens at most.
>
> The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
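The "breaking the text into smaller pieces" suggestion above can be sketched as follows. This is a minimal, illustrative chunker: the `chunk_text` helper is hypothetical, and it splits on words as a crude stand-in for tokens; a real implementation would measure chunks with the model's actual tokenizer (e.g. tiktoken).

```python
# Minimal sketch: split a long input into pieces that each stay under a
# size budget. Words are used here as a rough proxy for tokens; swap in a
# real token count (e.g. from tiktoken) for production use.

def chunk_text(text: str, max_units: int) -> list[str]:
    """Split text into chunks of at most max_units words each."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_units):
        chunks.append(" ".join(words[i : i + max_units]))
    return chunks

print(chunk_text("one two three four five", 2))
# → ['one two', 'three four', 'five']
```

Each chunk can then be sent as its own request, keeping every prompt comfortably within the model's limit.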
Note: For counting tokens before(!) sending an API request, see this answer.
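Once you have a prompt token count (e.g. via tiktoken, as described in the linked answer), the largest safe max_tokens value is simple arithmetic. This is a sketch under the 4,097-token limit quoted above; the `completion_budget` helper name is made up for illustration.

```python
# Sketch: derive the largest max_tokens value that still fits in the
# model's context window, given the prompt's token count. The token
# count itself is assumed to come from the model's tokenizer.

CONTEXT_WINDOW = 4097  # the 4,097-token limit from the quote above

def completion_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return the largest max_tokens that still fits alongside the prompt."""
    budget = context_window - prompt_tokens
    if budget <= 0:
        raise ValueError("Prompt alone exceeds the model's context window")
    return budget

print(completion_budget(4000))
# → 97, matching the example in the quote
```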
GPT-4 and GPT-4 Turbo models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| gpt-4-1106-preview | GPT-4 Turbo. The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
| gpt-4-vision-preview | GPT-4 Turbo with vision. Ability to understand images, in addition to all other GPT-4 Turbo capabilities. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
| gpt-4 | Currently points to gpt-4-0613. See continuous model upgrades. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-0613 | Snapshot of gpt-4 from June 13th 2023 with improved function calling support. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-32k | Currently points to gpt-4-32k-0613. See continuous model upgrades. | 32,768 tokens | Up to Sep 2021 |
| gpt-4-32k-0613 | Snapshot of gpt-4-32k from June 13th 2023 with improved function calling support. | 32,768 tokens | Up to Sep 2021 |
| gpt-4-0314 (Legacy) | Snapshot of gpt-4 from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-32k-0314 (Legacy) | Snapshot of gpt-4-32k from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 32,768 tokens | Up to Sep 2021 |
GPT-3.5 models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| gpt-3.5-turbo-1106 | Updated GPT-3.5 Turbo. The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo | Currently points to gpt-3.5-turbo-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-16k | Currently points to gpt-3.5-turbo-16k-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-instruct | Similar capabilities as text-davinci-003, but compatible with the legacy Completions endpoint and not Chat Completions. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-0613 (Legacy) | Snapshot of gpt-3.5-turbo from June 13th 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-16k-0613 (Legacy) | Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Will be deprecated on June 13th 2024. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-0301 (Legacy) | Snapshot of gpt-3.5-turbo from March 1st 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
GPT-3 models (Legacy):
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| text-curie-001 | Very capable, faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
| text-babbage-001 | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
| text-ada-001 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
| davinci | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 2,049 tokens | Up to Oct 2019 |
| curie | Very capable, but faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
| babbage | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
| ada | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
GPT base models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| babbage-002 | Replacement for the GPT-3 ada and babbage base models. | 16,384 tokens | Up to Sep 2021 |
| davinci-002 | Replacement for the GPT-3 curie and davinci base models. | 16,384 tokens | Up to Sep 2021 |