Nebula is currently in private beta.
To request access, see the sign up form.
Note: The Model API is in private beta.
The Model API lets you call the model to perform a variety of tasks: you send a prompt (an instruction and, optionally, a conversation transcript) together with generation parameters, and the API returns the generated output.
There are two API endpoints supported:
/v1/model/generate: Returns the generated output in a single HTTP response.
/v1/model/generate/streaming: Returns the generated output using Server-sent events (SSE), allowing the client to start receiving output progressively as it is generated.
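As a rough sketch of how the two endpoints are called (the base URL `https://api.nebula.example` is a placeholder — substitute your actual endpoint — and the `ApiKey` header is the one described in the error table below):

```python
import json
import urllib.request

BASE_URL = "https://api.nebula.example"  # placeholder; use your actual Model API host


def build_request(path: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for either Model API endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "ApiKey": api_key},
        method="POST",
    )


# Single-response generation: one POST, one JSON response body.
req = build_request("/v1/model/generate", {"prompt": "Summarize the call."}, "MY_KEY")

# Streaming generation takes the same request body against the streaming path;
# the response is read incrementally as SSE events rather than as one JSON document.
stream_req = build_request(
    "/v1/model/generate/streaming", {"prompt": "Summarize the call."}, "MY_KEY"
)
```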
Both endpoints accept the same request body.
prompt(string or dictionary): Represents the prompt for the model.
- If the prompt is a string, it is sent to the model as is.
- Alternatively, the prompt can be set to an object represented as a dictionary of key-value pairs using the following structure:
instruction(string): The instruction for the generation prompt.
conversation(dictionary): Represents the conversation parameters for the generation prompt.
text(string): The conversation text used as part of the generation prompt. This should be a minimum of 50 words.
max_new_tokens(int, defaults to 128): The maximum number of tokens to generate, excluding the tokens in the prompt.
stop_sequences(optional, List[string], defaults to an empty list or array): List of text sequences to stop generation at the earliest occurrence of one of the stop sequences. The maximum number of strings that can be passed is 5 and the maximum length of each passed string can be 50 characters.
temperature(optional, float, defaults to 0.0): Modulates the next token probabilities. Values must be between 0.0 and 1.0. Higher values increase the diversity and creativity of the output by increasing the likelihood of selecting lower-probability tokens. Lower values make the output more focused and deterministic by mostly selecting tokens with higher probabilities.
top_p(optional, float, defaults to 0): If set to a value less than 1, keep only the most probable tokens whose probabilities add up to top_p or higher.
top_k(optional, integer, defaults to 0): The number of highest probability vocabulary tokens to keep for top-k filtering.
repetition_penalty(optional, float, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. The maximum value allowed for repetition_penalty is 1.5.
return_scores(optional, boolean, defaults to False): Indicates whether to include token-level scores, which indicate the model's confidence in the response.
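Putting the request parameters together, here is a minimal sketch of a request body that uses the structured prompt form, plus a small helper that checks the documented limits before sending (the helper and its name are illustrative, not part of the API):

```python
def validate_request(body: dict) -> dict:
    """Check the documented Model API limits; return the body if valid."""
    stops = body.get("stop_sequences", [])
    if len(stops) > 5 or any(len(s) > 50 for s in stops):
        raise ValueError("at most 5 stop sequences of up to 50 characters each")
    if not 0.0 <= body.get("temperature", 0.0) <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if body.get("repetition_penalty", 1.0) > 1.5:
        raise ValueError("repetition_penalty must not exceed 1.5")
    return body


request_body = validate_request({
    "prompt": {
        "instruction": "Summarize the conversation below in two sentences.",
        # The conversation text should be at least 50 words; truncated here.
        "conversation": {"text": "Agent: Hello, how can I help you today? ..."},
    },
    "max_new_tokens": 256,
    "stop_sequences": ["\n\n"],
    "temperature": 0.2,
    "top_p": 0.9,
    "return_scores": True,
})
```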
The response body contains the following fields:
model(string): The model used for generation.
output(dictionary): Represents the generated output text and associated token-level scores.
text(string): The generated output text.
scores(optional, List[dictionary]): List of token-level scores associated with the generated output tokens.
token(string): The generated output token.
score(float): The score associated with the generated output token.
stats(optional, dictionary): Represents statistics about the generation process.
input_tokens(integer): The number of tokens in the input (instruction + conversation text).
output_tokens(integer): The total number of tokens in the generated output.
total_tokens(integer): The total number of tokens processed (input + output tokens).
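A response shaped like the fields above can be consumed as plain JSON. In this sketch the model name and all values are illustrative, not real API output:

```python
# Illustrative response matching the documented fields (values are made up):
response = {
    "model": "nebula-chat",  # hypothetical model name
    "output": {
        "text": "The customer asked about billing.",
        "scores": [
            {"token": "The", "score": 0.98},
            {"token": " customer", "score": 0.91},
        ],
    },
    "stats": {"input_tokens": 120, "output_tokens": 8, "total_tokens": 128},
}

text = response["output"]["text"]

# scores is only present when return_scores was set in the request:
scores = response["output"].get("scores") or []
avg_confidence = sum(s["score"] for s in scores) / len(scores) if scores else None

# total_tokens is the sum of input and output tokens:
stats = response["stats"]
assert stats["total_tokens"] == stats["input_tokens"] + stats["output_tokens"]
```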
| Error Type | Error Code | Response Body |
| --- | --- | --- |
| Bad Request | 400 | Input parameters received in the request are invalid. |
| Unauthorized | 401 | Error validating the access token. Please provide a valid API key in the `ApiKey` header. |
| Too Many Requests | 429 | The concurrency limit for this API has been reached. The server is unable to accept the request at the moment. Please wait and try again later. |
| Internal Server Error | 500 | Internal Server Error. |
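Of these errors, 400 and 401 indicate a problem with the request itself, while 429 and 500 are transient and can be retried after a delay. A minimal retry policy (the helper names and backoff values are illustrative, not part of the API) might look like:

```python
# Status codes worth retrying: rate limiting and transient server errors.
RETRYABLE = {429, 500}


def should_retry(status_code: int) -> bool:
    """400/401 mean the request itself is bad; retrying will not help."""
    return status_code in RETRYABLE


def backoff_delays(max_retries: int = 3, base: float = 1.0) -> list[float]:
    """Exponential backoff delays (in seconds) between retry attempts."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]
```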