One of the biggest challenges in prompt engineering is understanding whether Prompt A performs better than Prompt B. PromptLayer helps you answer this.

The best way to understand your prompts is by analyzing them in production. Below are some ways you can use PromptLayer to do exactly that.

Scoring

Every request in PromptLayer has a “Score”. This score is an integer from 0 to 100.

Ranking in PromptLayer revolves around this Score value. You can update it through the UI or programmatically (more details here).
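As a sketch of the programmatic route, the helper below clamps a score to the 0–100 integer range and posts it for a given request. The endpoint URL and payload shape here are assumptions based on PromptLayer's REST-style track-score API, so confirm the exact contract against the API reference before using it.

```python
import requests

PROMPTLAYER_API_KEY = "pl_..."  # your PromptLayer API key

def build_score_payload(request_id: int, score: int) -> dict:
    """Clamp the score into the 0-100 integer range PromptLayer expects."""
    clamped = max(0, min(100, int(score)))
    return {
        "request_id": request_id,
        "score": clamped,
        "api_key": PROMPTLAYER_API_KEY,
    }

def track_score(request_id: int, score: int) -> None:
    # Endpoint path is an assumption -- check the PromptLayer API docs.
    requests.post(
        "https://api.promptlayer.com/rest/track-score",
        json=build_score_payload(request_id, score),
    )
```

Clamping at the boundary keeps out-of-range values from upstream evaluators (e.g. a judge model replying "110") from producing invalid scores.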

The three most common ways to use Score to rank your prompts are:

  1. User feedback: Present a 👍 and 👎 to your users after the completion. Pressing 👍 records a score of 100; pressing 👎 records a score of 0.

  2. RLHF: Use our visual dashboard to fill in scores by hand. You can then use this data to decide between prompt templates or to fine-tune.

  3. Synthetic Evaluation: Use LLMs to score LLMs. After getting a completion, run an evaluation prompt on it and translate the result into a score from 0 to 100.

    For example, your prompt could be:

    The following is an AI chat message given to a user:
    
    {completion}
    
    --
    
    We are worried that the chatbot is being rude. How rude is the chat on a scale of 0 to 100?
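The synthetic-evaluation step above can be sketched in a few lines: format the judge prompt, call a model, and parse the reply into a valid score. Here `call_llm` is a hypothetical stand-in for whatever LLM client you use, and the regex parsing assumes the judge replies with a number somewhere in its text.

```python
import re

# The rudeness-judge prompt from the example above.
EVAL_TEMPLATE = """The following is an AI chat message given to a user:

{completion}

--

We are worried that the chatbot is being rude. How rude is the chat on a scale of 0 to 100?"""

def parse_score(raw_reply: str) -> int:
    """Pull the first integer out of the judge's reply and clamp it to 0-100."""
    match = re.search(r"\d+", raw_reply)
    if match is None:
        raise ValueError(f"No numeric score in judge reply: {raw_reply!r}")
    return max(0, min(100, int(match.group())))

def politeness_score(completion: str, call_llm) -> int:
    # call_llm is a hypothetical stand-in: takes a prompt string, returns a reply string.
    reply = call_llm(EVAL_TEMPLATE.format(completion=completion))
    # The judge rates rudeness (high = bad), so invert it before storing:
    # a perfectly polite reply scores 100.
    return 100 - parse_score(reply)
```

Note the inversion at the end: since PromptLayer's Score treats higher as better, a rudeness rating needs to be flipped before it is attached to the request.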