The token limit is shared between the prompt and the completion: the prompt tokens plus the completion tokens (the latter capped by the max_tokens parameter) must together not exceed the token limit of the particular OpenAI model.
As stated in the official OpenAI article:
> Depending on the model used, requests can use up to 4,097 tokens shared between prompt and completion. If your prompt is 4,000 tokens, your completion can be 97 tokens at most.
>
> The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
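The "breaking the text into smaller pieces" suggestion above can be sketched as follows. This is a minimal, illustrative chunker: the `chunk_text` helper is hypothetical, and it splits on words as a crude stand-in for tokens; a real implementation would measure chunks with the model's actual tokenizer (e.g. tiktoken).

```python
# Minimal sketch: split a long input into pieces that each stay under a
# size budget. Words are used here as a rough proxy for tokens; swap in a
# real token count (e.g. from tiktoken) for production use.

def chunk_text(text: str, max_units: int) -> list[str]:
    """Split text into chunks of at most max_units words each."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_units):
        chunks.append(" ".join(words[i : i + max_units]))
    return chunks

print(chunk_text("one two three four five", 2))
# → ['one two', 'three four', 'five']
```

Each chunk can then be sent as its own request, keeping every prompt comfortably within the model's limit.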
Note: For counting tokens before(!) sending an API request, see this answer.
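Once you have a prompt token count (e.g. via tiktoken, as described in the linked answer), the largest safe max_tokens value is simple arithmetic. This is a sketch under the 4,097-token limit quoted above; the `completion_budget` helper name is made up for illustration.

```python
# Sketch: derive the largest max_tokens value that still fits in the
# model's context window, given the prompt's token count. The token
# count itself is assumed to come from the model's tokenizer.

CONTEXT_WINDOW = 4097  # the 4,097-token limit from the quote above

def completion_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return the largest max_tokens that still fits alongside the prompt."""
    budget = context_window - prompt_tokens
    if budget <= 0:
        raise ValueError("Prompt alone exceeds the model's context window")
    return budget

print(completion_budget(4000))
# → 97, matching the example in the quote
```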
GPT-4 and GPT-4 Turbo models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| gpt-4-1106-preview | GPT-4 Turbo. The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
| gpt-4-vision-preview | GPT-4 Turbo with vision. Ability to understand images, in addition to all other GPT-4 Turbo capabilities. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128,000 tokens | Up to Apr 2023 |
| gpt-4 | Currently points to gpt-4-0613. See continuous model upgrades. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-0613 | Snapshot of gpt-4 from June 13th 2023 with improved function calling support. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-32k | Currently points to gpt-4-32k-0613. See continuous model upgrades. | 32,768 tokens | Up to Sep 2021 |
| gpt-4-32k-0613 | Snapshot of gpt-4-32k from June 13th 2023 with improved function calling support. | 32,768 tokens | Up to Sep 2021 |
| gpt-4-0314 (Legacy) | Snapshot of gpt-4 from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 8,192 tokens | Up to Sep 2021 |
| gpt-4-32k-0314 (Legacy) | Snapshot of gpt-4-32k from March 14th 2023 with function calling support. This model version will be deprecated on June 13th 2024. | 32,768 tokens | Up to Sep 2021 |
GPT-3.5 models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| gpt-3.5-turbo-1106 | Updated GPT-3.5 Turbo. The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo | Currently points to gpt-3.5-turbo-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-16k | Currently points to gpt-3.5-turbo-16k-0613. Will point to gpt-3.5-turbo-1106 starting Dec 11, 2023. See continuous model upgrades. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-instruct | Similar capabilities as text-davinci-003, but compatible with the legacy Completions endpoint and not Chat Completions. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-0613 (Legacy) | Snapshot of gpt-3.5-turbo from June 13th 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-16k-0613 (Legacy) | Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Will be deprecated on June 13th 2024. | 16,385 tokens | Up to Sep 2021 |
| gpt-3.5-turbo-0301 (Legacy) | Snapshot of gpt-3.5-turbo from March 1st 2023. Will be deprecated on June 13th 2024. | 4,096 tokens | Up to Sep 2021 |
GPT-3 models (Legacy):
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| text-curie-001 | Very capable, faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
| text-babbage-001 | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
| text-ada-001 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
| davinci | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 2,049 tokens | Up to Oct 2019 |
| curie | Very capable, but faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019 |
| babbage | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019 |
| ada | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019 |
GPT base models:
| LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
|---|---|---|---|
| babbage-002 | Replacement for the GPT-3 ada and babbage base models. | 16,384 tokens | Up to Sep 2021 |
| davinci-002 | Replacement for the GPT-3 curie and davinci base models. | 16,384 tokens | Up to Sep 2021 |