LLM API

Documentation on our cost-friendly state-of-the-art Large Language Model API (ALPHA TEST)

GoAPI now offers Large Language Model inference, referred to as LLM Inference. The service gives you API access to endpoints for the models listed below. Our service and pricing model are best suited to high-throughput scenarios.

Available models:

  1. gpt-3.5-turbo
  2. gpt-3.5-turbo-0301
  3. gpt-3.5-turbo-0613
  4. gpt-3.5-turbo-16k
  5. gpt-3.5-turbo-16k-0613
  6. gpt-3.5-turbo-1106
  7. gpt-4*
  8. gpt-4-0613*
  9. gpt-4-1106-preview*
  10. gpt-4-vision-preview*

*Note: All GPT-4 models are available only on the Developer Plan and above. See the Subscription Plans for details.


Pricing

A GPT-3.5 call costs 1/5 of the price listed on OpenAI's official website. Details: LLM API | PPU Quota | Endpoint Usage

Special Note

Due to Cloudflare's settings, we recommend using the streaming method for OpenAI's completions API whenever possible.
2023/11/28 update: If you need to use the non-streaming method, you can change your domain to https://proxy.goapi.xyz
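
The host selection described in this note can be sketched in Python as follows. This is a minimal sketch, not an official client: the helper name chat_completions_url is hypothetical, and it assumes the proxy domain serves the same /v1/chat/completions path as the main domain.

```python
# Hosts taken from this page; the helper itself is a hypothetical illustration.
STREAM_BASE_URL = "https://api.goapi.xyz"
NON_STREAM_BASE_URL = "https://proxy.goapi.xyz"


def chat_completions_url(stream: bool) -> str:
    """Return the endpoint URL suggested by the note above.

    Assumption: the proxy domain exposes the same /v1/chat/completions path.
    """
    base = STREAM_BASE_URL if stream else NON_STREAM_BASE_URL
    return f"{base}/v1/chat/completions"
```

Calling chat_completions_url(False) returns the proxy URL, so a client can switch hosts based on the same flag it sends in the stream parameter.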


POST

https://api.goapi.xyz/v1/chat/completions

Creates a model response for the given chat conversation.

Parameters:

Header
Name | Type | Required | Description
Authorization | string | ✔️ | Your GoAPI key, used for request authorization
Body
Name | Type | Required | Description
model | string | ✔️ | ID of the model to use.
messages | array | ✔️ | A list of messages comprising the conversation so far.
functions | array | | A list of functions the model may generate JSON inputs for.
function_call | string/object | | Controls how the model calls functions. "none" means the model will not call a function and instead generates a message. "auto" means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function. "none" is the default when no functions are present. "auto" is the default if functions are present.
temperature | number | | Defaults to 1. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p | number | | Defaults to 1. An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
n | integer | | Defaults to 1. How many chat completion choices to generate for each input message.
stream | boolean | | Defaults to false. If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stop | string/array | | Defaults to null. Up to 4 sequences where the API will stop generating further tokens.
max_tokens | number | | Defaults to inf. The maximum number of tokens to generate in the chat completion.
presence_penalty | number | | Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
frequency_penalty | number | | Defaults to 0. Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
logit_bias | map | | Defaults to null. Modify the likelihood of specified tokens appearing in the completion.
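
To show how the body parameters fit together, here is a minimal Python sketch that assembles a request body and checks the ranges stated in the table before anything is sent. The function name build_chat_body is hypothetical, and the client-side checks are an illustration only; the server may validate differently.

```python
from typing import Any, Dict, List

# Ranges copied from the parameter table above.
_RANGES = {
    "temperature": (0.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def build_chat_body(
    model: str, messages: List[Dict[str, Any]], **options: Any
) -> Dict[str, Any]:
    """Build a chat completions body (hypothetical helper, not part of GoAPI)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    for name, (low, high) in _RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    body: Dict[str, Any] = {"model": model, "messages": messages}
    # Omit unset options so the documented defaults apply on the server side.
    body.update({key: value for key, value in options.items() if value is not None})
    return body
```

Leaving optional fields out of the body, rather than sending null for each, keeps the request small and relies on the defaults listed in the table.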

Response Codes:

200: OK
Successful Response
400: Bad Request
The request format does not meet the requirements.
401: Unauthorized
The API key is incorrect.
500: Internal Server Error
The service is experiencing an error.
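
A client can turn these codes into readable messages with a small lookup. The sketch below is hypothetical: the names describe_status and is_retryable are not part of the API, and treating only 500 as worth retrying is an assumption, not documented behavior.

```python
# Status codes and meanings as listed above.
STATUS_MEANINGS = {
    200: "OK: successful response",
    400: "Bad Request: the request format does not meet the requirements",
    401: "Unauthorized: the API key is incorrect",
    500: "Internal Server Error: the service is experiencing an error",
}


def describe_status(code: int) -> str:
    """Return the documented meaning of a status code, if it is listed."""
    return STATUS_MEANINGS.get(code, f"Undocumented status code: {code}")


def is_retryable(code: int) -> bool:
    """Assumption: only a server-side error (500) is worth retrying."""
    return code == 500
```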

NO STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Response Example

{
  "id": "chatcmpl-83jZ61GDHtdlsFUzXDbpGeoU193Mj",
  "object": "chat.completion",
  "created": 1695900828,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 9,
    "total_tokens": 28
  }
}
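
The same non-streaming call can be sketched in Python using only the standard library. This is a minimal sketch that mirrors the curl example above; the function names chat and extract_reply are hypothetical, and error handling is left out for brevity.

```python
import json
import urllib.request

API_URL = "https://api.goapi.xyz/v1/chat/completions"


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a non-streaming response."""
    return response["choices"][0]["message"]["content"]


def chat(api_key: str, messages: list, model: str = "gpt-3.5-turbo") -> str:
    """Send one non-streaming request and return the assistant reply."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as resp:
        return extract_reply(json.load(resp))
```

With the response example above, extract_reply returns the string stored under choices[0].message.content.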

STREAMING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GOAPI_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true
  }'

Response Example

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" How"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" can"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" assist"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" you"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":" today"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}

data: {"id":"chatcmpl-83jctesyk8nEkPytXDNLz1oV5dIQK","object":"chat.completion.chunk","created":1695901063,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
 
data: [DONE]
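
To reassemble the full reply from a stream like the one above, a client can concatenate the content of each delta until it sees data: [DONE]. The sketch below is a minimal illustration, assuming each event arrives as one complete data: line; the function name collect_stream_text is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the delta contents of a chat completion stream into one string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        # The first chunk carries the role and the last one an empty delta.
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Any iterable of text lines works as input, for example the decoded lines of an HTTP response body read incrementally.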
 

FUNCTION CALLING

Request Example

curl https://api.goapi.xyz/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer GOAPI_KEY" \
-d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston?"
    }
  ],
  "functions": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ],
  "function_call": "auto"
}'

Response Example

{
  "id": "chatcmpl-83jfAmPmT0LwOgyD8iVDNR4aFIC04",
  "object": "chat.completion",
  "created": 1695901204,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Boston, MA\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 18,
    "total_tokens": 100
  }
}
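
Note that arguments in the response is a JSON-encoded string, not an object, so it has to be decoded before use. The sketch below shows one way to dispatch the call to local code and build a follow-up message. It is a sketch under stated assumptions: get_current_weather here is a placeholder, run_function_call and LOCAL_FUNCTIONS are hypothetical names, and the role "function" reply format follows OpenAI's function calling convention rather than anything stated on this page.

```python
import json


def get_current_weather(location: str, unit: str = "fahrenheit") -> dict:
    """Placeholder; a real implementation would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Hypothetical registry mapping function names to local callables.
LOCAL_FUNCTIONS = {"get_current_weather": get_current_weather}


def run_function_call(message: dict) -> dict:
    """Execute the function named in an assistant message.

    Returns a message that can be appended to the conversation so the
    model can use the result in its next reply.
    """
    call = message["function_call"]
    arguments = json.loads(call["arguments"])  # arguments is a JSON string
    result = LOCAL_FUNCTIONS[call["name"]](**arguments)
    return {"role": "function", "name": call["name"], "content": json.dumps(result)}
```

The returned message would be appended to messages, after the assistant message, in a second request to the same endpoint.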

After the vision model was introduced

The content field of each message can be an array rather than a string. See OpenAI's GPT-4 vision guide for details: https://platform.openai.com/docs/guides/vision
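
As an illustration, a message whose content is an array might be built as follows. This is a minimal sketch based on the part format in OpenAI's vision guide (text parts and image_url parts); the helper name vision_message is hypothetical, and this page does not state which parts GoAPI accepts.

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message whose content is an array of parts.

    Assumption: the part shapes follow OpenAI's vision guide.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

Such a message goes in the messages list like any other, with model set to gpt-4-vision-preview.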

Batch your requests to avoid RPM and RPD limits

OpenAI supports batching: a batch is treated as one request. For details, see the end of https://platform.openai.com/docs/guides/rate-limits?context=tier-five