Prompt design

This page describes how to design prompts for the Nebula Chat model. Prompt quality can significantly affect output quality, especially for complex tasks.

Understanding Prompts

There are two prompts that you can control for Nebula Chat:

  • System Prompt: A system prompt lets you define model behavior, specify context, and set rules if necessary. Without a system prompt, the model acts as a free-form conversational agent.
  • User Message: To start a conversation, enter the first user message in the “Send a message” field; chat-style communication then follows.
For example, the following request payload combines a system prompt with a first user message:

import json

payload = json.dumps({
  "max_new_tokens": 1024,
  "system_prompt": """You will be given a conversation between two or more individuals. Do these:
      1. Give a brief paragraph summary of the conversation.
      2. Extract three key points made by each speaker in the conversation and group them by the speaker.
      3. Generate a sentiment analysis for each speaker.""",
  "messages": [
    {
      "role": "human",
      "text": "– insert transcript –"
    }
  ],
  "top_k": 1,
  "top_p": 1,
  "temperature": 0,
  "repetition_penalty": 1
})

Creating Prompts

Nebula Chat is designed to be a conversational agent, and you should keep this in mind when creating prompts. In the context of conversational intelligence, a prompt can consist of two components: an instruction and a transcript.

Instructions

Instructions in the system prompt or user message should clearly describe the task you'd like Nebula to perform and express your intent.

Here are some best practices and examples for designing instructions.

Provide sufficient details

Bad instruction (too abstract)

Summarize.

Although this instruction works, it omits task details. It may be good enough for simple requests, but stating the details clearly and concisely improves the output.

For example, this instruction provides precise, concise details of the same task.

Good instruction

Generate a short paragraph summary of this conversation.
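As a sketch, the concise instruction above can be dropped into a request payload like the one shown earlier (the field names mirror that example, and the message text is a placeholder standing in for a real transcript):

```python
import json

# Use the concise instruction as the system prompt.
# The message text below is an illustrative placeholder, not a real transcript.
payload = json.dumps({
    "max_new_tokens": 1024,
    "system_prompt": "Generate a short paragraph summary of this conversation.",
    "messages": [
        {"role": "human", "text": "Agent: Hello, how can I help you today?"}
    ],
    "temperature": 0,
})
```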

Specify relevant tasks directly on the conversation

Make sure the instruction contains tasks relevant to the conversation. Don't include generic tasks that are unrelated to the conversation, such as asking for code or general information about the world. Instead, include tasks you'd like Nebula to perform on the conversation, such as analyzing it, summarizing its content, or drafting an email.
In the instruction, ensure that the task is performed directly on the conversation. For example:

Bad instruction (no explicit focus on the conversation)

Generate a short summary.

Good instruction (explicit focus on the conversation)

Generate a short summary of this conversation.

Good instruction (explicit focus on the conversation)

Based on this conversation, what are the key concerns expressed regarding the launch plan?

Specify the conversation type

Whenever possible, instead of using the generic term "conversation," use a more specific term in your instruction. For example, you could indicate that it's a sales call, customer support call, or interview.

Good instruction (specific type of conversation term used)

What are the next steps for a support representative based on this customer support call?

Avoid instructions that are out of context

Avoid asking general questions such as the following example. The model will likely not return correct output in such cases because it is not trained to answer general-purpose questions; instead, it will attempt to find the information in the conversation.

Bad instruction (not a conversation relevant question)

Who is Barack Obama?

The following question works well because Anne Marie is involved in the conversation.

Good instruction (question is relevant for conversation)

Who is Anne Marie in this conversation?

In this example, the instruction works well if the conversation is a customer support call.

Good instruction (task is sensible for conversation)

Based on this conversation, make some recommendations for the support agent to help improve their performance in upcoming calls.

Give structure to longer instructions

For longer, complex instructions, structure them under appropriate headings. This helps Nebula better understand your intent.

Bad instruction

You are a real time assistant to a sales agent who sells telecommunications products and services. You will be given a live call transcript. Suggest how the sales agent should respond to the given prospect. Keep the response under 80 words. Present factual response. Don't add pleasantries. Do not go too deep into resolving one objection, try to move the call forward

Good instruction

You are a real time assistant to a sales agent who sells telecommunications products and services. You will be given a live call transcript. Suggest how the sales agent should respond to the given prospect.

# Rules #
- keep the response under 80 words
- present factual response
- don't add pleasantries
- do not go too deep into resolving one objection, try to move the call forward
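When sending a structured instruction through the API, preserve the line breaks in the system prompt string. A minimal sketch in Python, using a triple-quoted string so the headings and bullets survive intact:

```python
# A triple-quoted string keeps the heading and bullet structure
# that helps the model separate the task from the rules.
system_prompt = """You are a real time assistant to a sales agent who sells telecommunications products and services. You will be given a live call transcript. Suggest how the sales agent should respond to the given prospect.

# Rules #
- Keep the response under 80 words.
- Present factual response.
- Don't add pleasantries.
- Do not go too deep into resolving one objection; try to move the call forward.
"""
```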

Transcript Format

The conversation transcript is a textual representation of the actual conversation. For accurate, high-quality output, use transcripts with clear separation between speakers and a low word error rate (WER). The model will still attempt to correct a few errors where possible, but reducing errors and providing clear, accurate speaker annotations increases the quality and accuracy of the results.

We recommend this format for transcripts:

<Speaker 1 Name>: <Sentences spoken by Speaker 1 continuously>
<Speaker 2 Name>: <Sentences spoken by Speaker 2 continuously>
.
.
.

Each line consists of the speaker's name and the sentences spoken by that speaker, separated by a colon and one space, and terminated by a newline (\n) character. This format uses a minimal number of characters for separators and new lines, and it avoids repeating the speaker's name more often than necessary.
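As a sketch, a small helper (the function name format_transcript is illustrative, not part of any API) can render (speaker, sentence) pairs in this recommended format:

```python
def format_transcript(turns):
    """Join (speaker, text) pairs as '<Speaker Name>: <text>' lines,
    one line per continuous speaker turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

turns = [
    ("Anne Marie", "Thanks for calling support. How can I help you today?"),
    ("Sam", "Hi, I can't log into my account."),
]
print(format_transcript(turns))
# Anne Marie: Thanks for calling support. How can I help you today?
# Sam: Hi, I can't log into my account.
```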

Although this format is not strictly required, we recommend it to optimize token usage and achieve higher accuracy in the model's output.