InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

Finding the optimal instruction is extremely important for achieving the "charm", and this holds true for Large Language Models as well. ("Wingardium Leviosa" is a charm used to make objects fly, or levitate. If you are interested in "leviOsa", please check the video.)

An application of InstructZero

Given a few input-output exemplars for the task, InstructZero optimizes the instruction applied to a black-box LLM step by step (three optimization iterations are shown in the plot), with progressively improving accuracy.


Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. On each iteration of the proposed method, which we call InstructZero, a soft prompt is converted into an instruction using the open-source LLM. The instruction is then submitted to the black-box LLM for zero-shot evaluation, and the resulting performance is sent to Bayesian optimization, which produces new soft prompts that improve the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs, including Vicuna and ChatGPT. Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks.

TL;DR: The first framework to conduct instruction optimization for black-box LLMs like ChatGPT, where the black-box API can only provide textual output.


Pipeline of InstructZero. On each iteration, a soft prompt and a few exemplars of the target task are sent to the open-source LLM to generate an instruction, which then prompts the black-box LLM to produce answers to target-task queries. The soft prompt and the score (e.g., accuracy) of the answers are added as new training data for BO, which updates its posterior about the objective and produces a new soft prompt to explore in the next iteration. Both LLMs are frozen.
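The loop described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's code: the names `optimize_instruction`, `accuracy`, `toy_open_source_llm`, `toy_black_box_llm`, and `make_random_proposer` are all hypothetical, the two "LLMs" are toy string functions, and the proposal step is plain random sampling standing in for Bayesian optimization.

```python
import random
from typing import Callable, List, Sequence, Tuple

SoftPrompt = Tuple[float, ...]
Exemplar = Tuple[str, str]  # (input, expected output)
History = List[Tuple[SoftPrompt, float]]


def accuracy(instruction: str,
             black_box_llm: Callable[[str, str], str],
             validation: Sequence[Exemplar]) -> float:
    """Zero-shot accuracy of the black-box LLM when prompted with `instruction`."""
    correct = sum(black_box_llm(instruction, x) == y for x, y in validation)
    return correct / len(validation)


def optimize_instruction(open_source_llm: Callable[[SoftPrompt, Sequence[Exemplar]], str],
                         black_box_llm: Callable[[str, str], str],
                         propose: Callable[[History], SoftPrompt],
                         exemplars: Sequence[Exemplar],
                         validation: Sequence[Exemplar],
                         iterations: int) -> Tuple[str, float, History]:
    """Run the propose -> generate -> evaluate -> record loop; both LLMs stay frozen."""
    history: History = []  # (soft prompt, score) pairs: the optimizer's training data
    best_instruction, best_score = "", -1.0
    for _ in range(iterations):
        soft_prompt = propose(history)                          # optimizer picks a soft prompt
        instruction = open_source_llm(soft_prompt, exemplars)   # soft prompt -> instruction
        score = accuracy(instruction, black_box_llm, validation)  # zero-shot evaluation
        history.append((soft_prompt, score))
        if score > best_score:
            best_instruction, best_score = instruction, score
    return best_instruction, best_score, history


# --- Toy stand-ins (assumptions, for illustration only) ---
CANDIDATES = ["Repeat the word.", "Write the word in upper case.", "Reverse the word."]


def toy_open_source_llm(soft_prompt: SoftPrompt, exemplars: Sequence[Exemplar]) -> str:
    # Map the first coordinate (in [0, 1)) to one of the candidate instructions.
    index = min(int(soft_prompt[0] * len(CANDIDATES)), len(CANDIDATES) - 1)
    return CANDIDATES[index]


def toy_black_box_llm(instruction: str, query: str) -> str:
    if "upper" in instruction:
        return query.upper()
    if "Reverse" in instruction:
        return query[::-1]
    return query


def make_random_proposer(dim: int, seed: int) -> Callable[[History], SoftPrompt]:
    rng = random.Random(seed)

    def propose(history: History) -> SoftPrompt:
        # Placeholder: ignores the history. BO would fit a posterior to it instead.
        return tuple(rng.random() for _ in range(dim))

    return propose


if __name__ == "__main__":
    result = optimize_instruction(
        toy_open_source_llm, toy_black_box_llm, make_random_proposer(dim=2, seed=0),
        exemplars=[("cat", "CAT")],
        validation=[("dog", "DOG"), ("bird", "BIRD")],
        iterations=20,
    )
    print(result[0], result[1])
```

The only component that sees the score history is `propose`, which is where a Bayesian optimizer would plug in; everything else is a fixed, gradient-free forward pass.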

The pipeline of Bayesian optimization (BO) in InstructZero.

We apply BO with an instruction-coupled kernel to optimize the soft prompt. For details, please check out our paper.
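For readers unfamiliar with BO, one proposal step can be sketched as follows. This is a generic illustration under loudly stated assumptions, not the method from the paper: it uses a plain RBF kernel on soft prompts in place of the instruction-coupled kernel, a zero-mean Gaussian process, expected improvement as the acquisition function, and random candidate sampling in the unit cube. All function names (`rbf_kernel`, `solve`, `gp_posterior`, `expected_improvement`, `propose_next`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, length_scale: float = 0.5) -> float:
    # Generic RBF kernel; a stand-in for the paper's instruction-coupled kernel.
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def gp_posterior(xs: Sequence[Vector], ys: Sequence[float], query: Vector,
                 noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean GP at `query`."""
    gram = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    cross = [rbf_kernel(a, query) for a in xs]
    weights = solve(gram, cross)  # (K + noise * I)^{-1} k(X, query)
    mean = sum(w * y for w, y in zip(weights, ys))
    variance = rbf_kernel(query, query) - sum(w * c for w, c in zip(weights, cross))
    return mean, math.sqrt(max(variance, 0.0))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` for a Gaussian prediction (maximization)."""
    if std == 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def propose_next(xs: Sequence[Vector], ys: Sequence[float], dim: int,
                 num_candidates: int = 256, seed: int = 0) -> Tuple[float, ...]:
    """Pick the random candidate in [0, 1)^dim with the highest expected improvement."""
    rng = random.Random(seed)
    candidates = [tuple(rng.random() for _ in range(dim)) for _ in range(num_candidates)]
    best = max(ys)

    def acquisition(candidate: Tuple[float, ...]) -> float:
        mean, std = gp_posterior(xs, ys, candidate)
        return expected_improvement(mean, std, best)

    return max(candidates, key=acquisition)


if __name__ == "__main__":
    observed_prompts = [(0.1, 0.2), (0.8, 0.7), (0.5, 0.5)]
    observed_scores = [0.2, 0.6, 0.4]  # e.g., zero-shot accuracies
    print(propose_next(observed_prompts, observed_scores, dim=2))
```

Each call takes the soft prompts and scores observed so far and returns the next soft prompt to try; swapping in a different kernel only requires replacing `rbf_kernel`.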


We evaluate InstructZero as a tool to find an instruction that steers a black-box LLM towards a desired downstream behavior on a target task. The evaluation also shows the potential of InstructZero to optimize an initially bad prompt and finally obtain a good prompt!

Zero-shot test accuracy on 32 tasks from BIG-Bench. InstructZero achieves the best performance on all 32 tasks among the three evaluated approaches.

Future Work

We will evaluate the effectiveness of our algorithm on other black-box LLMs like Claude (Anthropic) and Bard (Google). Since our applications for access to these APIs are still in progress, we will update our webpage and project repo when they are approved! Stay tuned!