The Nebula chat endpoint takes a list of messages that make up a conversation between a human and the assistant, and returns an updated list of messages in which the last message contains the model's response.
There are two API endpoints supported:

/v1/model/chat
: Returns the generated output in a single HTTP response.

/v1/model/chat/streaming
: Returns the generated response using Server-sent events (SSE), allowing the client to start receiving output progressively as it is generated.
Request body for /v1/model/chat
curl --location 'https://api-nebula.symbl.ai/v1/model/chat' \
--header 'ApiKey: <your_api_key>' \
--header 'Content-Type: application/json' \
--data '{
    "max_new_tokens": 1024,
    "top_p": 0.95,
    "top_k": 1,
    "system_prompt": "You are a sales coaching assistant. You help user to get better at selling. You are respectful, professional and you always respond politely.",
    "messages": [
        {
            "role": "human",
            "text": "Hi"
        }
    ]
}'
import requests
import json

url = "https://api-nebula.symbl.ai/v1/model/chat"

payload = json.dumps({
    "max_new_tokens": 1024,
    "top_p": 0.95,
    "top_k": 1,
    "system_prompt": "You are a sales coaching assistant. You help user to get better at selling. You are respectful, professional and you always respond politely.",
    "messages": [
        {
            "role": "human",
            "text": "Hi"
        }
    ]
})
headers = {
    'ApiKey': '<your_api_key>',
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
Request body for /v1/model/chat/streaming
curl --location 'https://api-nebula.symbl.ai/v1/model/chat/streaming' \
--header 'ApiKey: <your_api_key>' \
--header 'Content-Type: application/json' \
--data '{
    "max_new_tokens": 1024,
    "top_p": 0.95,
    "top_k": 1,
    "system_prompt": "You are a sales coaching assistant. You help user to get better at selling. You are respectful, professional and you always respond politely.",
    "messages": [
        {
            "role": "human",
            "text": "Hi"
        }
    ]
}' --no-buffer
import requests
import json

url = "https://api-nebula.symbl.ai/v1/model/chat/streaming"

payload = json.dumps({
    "max_new_tokens": 1024,
    "top_p": 0.95,
    "top_k": 1,
    "system_prompt": "You are a sales coaching assistant. You help user to get better at selling. You are respectful, professional and you always respond politely.",
    "messages": [
        {
            "role": "human",
            "text": "Hi"
        }
    ]
})
headers = {
    'ApiKey': '<your_api_key>',
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload, stream=True)

# Read the SSE stream line by line as events arrive instead of waiting for the full body.
for line in response.iter_lines(decode_unicode=True):
    if line:
        print(line)
Request
Method: POST
URL
Non-streaming endpoint: /v1/model/chat
Streaming endpoint: /v1/model/chat/streaming

Headers

- Content-Type: must be set to application/json (Content-Type: application/json)
- ApiKey: must be set to a valid API key (ApiKey: <api_key>)
Body
{
    "model": "<model name>", // Optional
    "system_prompt": "<system prompt here>", // Optional
    "messages": [
        { // First message must have role = "human"
            "role": "human",
            "text": "<human message or prompt>"
        },
        { // Following message must have role = "assistant"
            "role": "assistant",
            "text": "<assistant message or response>"
        }
    ],
    "max_new_tokens": 1024,
    "top_p": 1.0,
    "top_k": 1,
    "temperature": 0.0,
    "repetition_penalty": 1.0,
    "return_scores": false
}
model (optional, string): Name of the model. Defaults to nebula-chat-large. If using a custom or fine-tuned model, the respective name of the model should be used.
system_prompt (optional, string): The system prompt sets the behavior for the model.
messages (required, List[Message]): List of messages. A minimum of 1 message must be present, and multiple messages with human and assistant roles can be passed, where the first and last message in the list must have the role human.
Structure of Message object
{
    "role": "human" | "assistant", // can be either "human" or "assistant"
    "text": "<text containing message from human or assistant>"
}
- role (string): Role associated with the message. Must be either human or assistant.
- text (string): The text of the message.
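For illustration, a minimal multi-turn messages list might look like the following sketch (the conversation text is invented; only the role and text fields and the human/assistant ordering come from the description above):

# Hypothetical conversation: roles alternate, and the first and last messages
# both use role "human", as required by the endpoint.
messages = [
    {"role": "human", "text": "How should I open a discovery call?"},
    {"role": "assistant", "text": "Confirm the agenda first, then ask about the prospect's goals."},
    {"role": "human", "text": "Can you suggest a one-line opener?"}
]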
max_new_tokens (optional, int, defaults to 128): The maximum number of tokens to generate, excluding the tokens in the input messages.
stop_sequences (optional, List[string], defaults to []): List of text sequences to stop generation at the earliest occurrence of one of the stop sequences. The maximum number of strings that can be passed is 5 and the maximum length of each passed string can be 50 characters.
temperature (optional, float, defaults to 0.0): Modulates the next token probabilities. Values must be between 0.0 and 1.0. Higher values increase the diversity and creativity of the output by increasing the likelihood of selecting lower-probability tokens; lower values make the output more focused and deterministic by mostly selecting higher-probability tokens.
top_p (optional, float, defaults to 0): If set to a value less than 1, only the most probable tokens whose probabilities add up to top_p or higher are kept for sampling.
top_k (optional, integer, defaults to 0): The number of highest probability vocabulary tokens to keep for top-k filtering.
repetition_penalty (optional, float, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. The maximum value allowed for repetition_penalty is 1.2.
return_scores (optional, boolean, defaults to False): Indicates whether to include token-level scores, which indicate the model's confidence in the response.
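As a sketch of how these generation parameters fit into a request (reusing the endpoint and headers shown earlier; the specific values are illustrative, not recommendations):

import requests
import json

url = "https://api-nebula.symbl.ai/v1/model/chat"
headers = {
    'ApiKey': '<your_api_key>',
    'Content-Type': 'application/json'
}

# Illustrative settings: mild sampling, a light repetition penalty, an early-stop
# sequence, and token-level scores requested in the response.
payload = json.dumps({
    "messages": [
        {"role": "human", "text": "Give me three tips for handling pricing objections."}
    ],
    "max_new_tokens": 256,
    "temperature": 0.3,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "stop_sequences": ["\n\n"],
    "return_scores": True
})

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)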
Response
{
    "model": "<model_name>", // model used for generating response
    "messages": [
        {
            "role": "human",
            "text": "<message or prompt by human>"
        },
        { // New message appended to the message list from the request, with the model's response
            "role": "assistant",
            "text": "<model's response>"
        }
    ],
    "stats": {
        "input_tokens": <number of input tokens>,
        "output_tokens": <number of output tokens generated>,
        "total_tokens": <total tokens input + output>
    },
    "scores": [
        {
            "token": "<token text>",
            "score": "<logprob score>"
        }
    ]
}
model (string): The model used for generation.
messages (List[Message]): List of messages with one additional message with role assistant containing the content generated by the model.
stats (dictionary): Represents statistics about the generation process.
- input_tokens (integer): The number of tokens in the input.
- output_tokens (integer): The total number of tokens in the output.
- total_tokens (integer): The total number of tokens processed (input + output tokens).
scores (optional, List[dictionary]): List of token-level scores associated with the generated output tokens.
- token (string): The generated output token.
- score (float): The log-prob associated with the generated output token.
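Assuming the response shape documented above, a client could pull out the generated reply and the token counts like this (response is a requests.Response from one of the earlier examples; the field names come from the schema above):

data = response.json()

# The model's reply is the last message in the returned list (role "assistant").
reply = data["messages"][-1]["text"]
print(reply)

# Token accounting for the request.
stats = data["stats"]
print(stats["input_tokens"], stats["output_tokens"], stats["total_tokens"])

# Token-level scores are included only when return_scores was true in the request.
for item in data.get("scores", []):
    print(item["token"], item["score"])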
Streaming Response
When a request is made to the /v1/model/chat/streaming endpoint, the response is delivered as SSE events so that output can be sent progressively.
{
    "text": "<model's response so far>",
    "delta": "<delta between last event and current event's content>",
    "is_final": <true|false>,
    "stats": {
        "input_tokens": <number of input tokens>,
        "output_tokens": <number of output tokens generated>,
        "total_tokens": <total tokens input + output>
    }
}
text (string): Text content of the response generated so far.
delta (string): The additional text generated since the last event. This makes it straightforward to render progressive output by concatenating the deltas from consecutive events.
is_final (boolean): Indicates whether the event is the final event in the response: true for the final event, else false.
stats (dictionary): Represents statistics about the generation process.
- input_tokens (integer): The number of tokens in the input.
- output_tokens (integer): The total number of tokens in the output.
- total_tokens (integer): The total number of tokens processed (input + output tokens).
Stats are available only in the final streaming response, when is_final is true.
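A minimal sketch of consuming the stream and reassembling the output from the delta fields, assuming each SSE event's data payload is the JSON object described above (the exact SSE framing, such as a data: prefix, is an assumption here):

import requests
import json

url = "https://api-nebula.symbl.ai/v1/model/chat/streaming"
headers = {
    'ApiKey': '<your_api_key>',
    'Content-Type': 'application/json'
}
payload = json.dumps({
    "messages": [{"role": "human", "text": "Hi"}]
})

full_text = ""
with requests.request("POST", url, headers=headers, data=payload, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        # Assumed framing: strip an optional "data: " prefix before parsing the event JSON.
        event = json.loads(line[len("data: "):] if line.startswith("data: ") else line)
        full_text += event["delta"]  # concatenate deltas to rebuild the full response
        if event["is_final"]:
            print(full_text)
            print(event["stats"])  # stats are present only on the final event
            break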
Errors
An HTTP error code is returned in the response when the request fails, with the detail field containing details of the failure.
detail (string): Details about the error.
{
    "detail": "<error message>"
}
| Error Code | Error Description |
|---|---|
| 400 - Bad Request | The request body is incorrect. |
| 401 - Unauthorized | Invalid authentication details were provided. The API key is either missing or incorrect. |
| 429 - Rate limit reached | Too many requests have been sent, exceeding the rate limits. |
| 500 - Server error | The servers had an error while processing the request. |
| 503 - Service Unavailable | The servers are overloaded due to high traffic. |
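A sketch of client-side error handling based on the detail field and status codes above (the retry suggestion is an illustrative choice, not part of the API):

import requests
import json

url = "https://api-nebula.symbl.ai/v1/model/chat"
headers = {
    'ApiKey': '<your_api_key>',
    'Content-Type': 'application/json'
}
payload = json.dumps({
    "messages": [{"role": "human", "text": "Hi"}]
})

response = requests.request("POST", url, headers=headers, data=payload)
if response.ok:
    print(response.json()["messages"][-1]["text"])
else:
    # Failed requests carry a "detail" field describing the error.
    try:
        detail = response.json().get("detail", "")
    except ValueError:
        detail = response.text
    print(f"Request failed with {response.status_code}: {detail}")
    # Illustrative note: 429 and 503 are typically transient, so retrying with backoff may help.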