From Few-Shot to Guidelines: A Smarter Way to Prompt AI

A framework that replaces few-shot prompts with task-specific guidelines, improving AI performance. It uses feedback on model outputs to build organized, hierarchical guidelines, leading to better results across diverse tasks.

#llm #prompt

Sep 23, 2024
leeron


In the world of large language models (LLMs), one of the biggest challenges is how to get the best possible response from the AI.

Traditionally, "shot" methods have been the go-to, where a model is given example questions and answers, like in few-shot learning. This approach prompts the AI to mimic the reasoning steps in the examples. However, this method isn’t without its downsides—it’s hard to choose the right examples, and sometimes, even the best examples can miss crucial task-specific knowledge.

Enter the "Guideline" method, a promising alternative that uses structured guidance rather than examples. This method explicitly instructs the model on how to think through a problem by providing it with a set of clear, task-specific rules.
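To make the contrast concrete, here is a hypothetical sketch of the two prompt styles. The task, examples, and guidelines below are invented for illustration and are not taken from the paper:

```python
def few_shot_prompt(question: str) -> str:
    """Classic few-shot: prepend solved examples and let the model imitate them."""
    examples = (
        "Q: Is 17 a prime number? A: Yes.\n"
        "Q: Is 21 a prime number? A: No, 21 = 3 x 7.\n"
    )
    return examples + f"Q: {question} A:"


def guideline_prompt(question: str) -> str:
    """Guideline-style: give explicit task-specific rules instead of examples."""
    guidelines = (
        "Guidelines for primality questions:\n"
        "1. A prime has exactly two divisors: 1 and itself.\n"
        "2. Check divisibility only up to the square root of the number.\n"
        "3. If the number is composite, state its smallest factor.\n"
    )
    return guidelines + f"\nQ: {question}\nA (follow the guidelines step by step):"


print(guideline_prompt("Is 91 a prime number?"))
```

Notice that the guideline prompt encodes task knowledge (rule 2's square-root bound) that a handful of examples might never surface, which is exactly the gap the Guideline method aims to close.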

But the real breakthrough comes from a new framework called FGT (Feedback, Guideline, and Tree-gather), which automates the creation of these guidelines directly from the data.

The FGT Framework: How It Works

The framework consists of three key agents:

Overview of the FGT framework
  • Feedback Agent: It analyzes the outcomes of AI-generated answers, determining what worked and what didn’t. It provides feedback on both correct and incorrect responses, which is then used to improve the AI's reasoning process.
  • Guideline Agent: It takes the feedback and distills it into clear, concise guidelines that help the model perform better on similar tasks in the future.
  • Tree-gather Agent: As the final step, this agent organizes all the guidelines in a hierarchical structure, ensuring that no valuable information is left out and that the model can see the bigger picture.

The framework also encourages the model to show its work, literally: by prompting the AI to provide a thought process instead of just the answer, the system ensures that the reasoning aligns with the guidelines, leading to more accurate and reliable results.
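The three-agent loop can be sketched in a few lines of Python. The agent prompts and the `call_llm` stub below are assumptions for illustration; the paper's actual prompts and model calls differ:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would query an LLM API.
    return f"[model response to: {prompt[:40]}...]"


def feedback_agent(question: str, model_answer: str, gold_answer: str) -> str:
    """Analyze one outcome: was the answer right, and why or why not?"""
    verdict = "correct" if model_answer == gold_answer else "incorrect"
    return call_llm(
        f"The answer was {verdict}. Question: {question} "
        f"Model answer: {model_answer}. Explain what reasoning worked or failed."
    )


def guideline_agent(feedback: str) -> str:
    """Distill a piece of feedback into one clear, reusable guideline."""
    return call_llm(f"Turn this feedback into one task-specific guideline: {feedback}")


def tree_gather(guidelines_by_topic: dict[str, list[str]]) -> dict[str, list[str]]:
    """Organize guidelines hierarchically by topic, deduplicating so no
    information is lost when merging guidelines from many examples."""
    return {topic: sorted(set(gs)) for topic, gs in guidelines_by_topic.items()}
```

Each training example flows through `feedback_agent` and then `guideline_agent`; `tree_gather` runs once at the end to merge everything into a single structured prompt.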

Performance Gains Across Tasks

To test this approach, the researchers evaluated FGT on the Big-Bench Hard (BBH) dataset, which includes tasks such as math calculations, logical reasoning, and context understanding. The results were impressive.

The FGT framework not only outperformed traditional few-shot methods but also did better than more advanced techniques such as Chain-of-Thought (CoT) reasoning.

For logical reasoning tasks, FGT achieved a significant improvement, with accuracy jumping to 93.9%, far outstripping both few-shot and many-shot methods. Even in tasks requiring deep context understanding, where many-shot usually excels, FGT was competitive, showing that well-constructed guidelines can be just as effective as providing a large number of examples.

What makes guidelines so powerful is their ability to generalize: rather than tying the model to specific examples, they capture transferable principles, allowing it to adapt more flexibly to new tasks without needing countless examples.

Chen, J., Wang, S., Li, Z., Xiong, W., Qu, L., Xu, Z., & Qi, Y. (2024). Can we only use guideline instead of shot in prompt? arXiv:2409.12979. https://arxiv.org/abs/2409.12979v1