The Problem
Large Language Models often generate excessively verbose responses, even when concise, informative answers would be more valuable. This experiment explores a simple yet effective approach to guide models toward brevity without sacrificing information quality.
The Approach
Rather than abstract instructions like "be concise," this framework uses single-shot training: one concrete example of the desired format, embedded in the system prompt (in-context guidance, with no fine-tuning involved).
Two-Phase Methodology
Phase 1: Baseline Evaluation
Tested 14 models using a standardized product recommendation prompt (power bank selection) without any brevity instructions to establish natural response lengths.
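A minimal sketch of what such a baseline loop can look like, assuming an OpenAI-compatible router that serves all evaluated models behind one endpoint; the base URL, model identifiers, and prompt wording below are placeholders rather than the experiment's exact values:

```python
from openai import OpenAI

# Assumption: a single OpenAI-compatible endpoint fronts all 14 models.
client = OpenAI(base_url="https://example-router/v1", api_key="YOUR_KEY")

# Placeholder identifiers; the experiment covered 14 models in total.
MODELS = ["ai21/jamba-large", "mistralai/mistral-large", "openai/gpt-oss-120b"]

PROMPT = "Recommend a power bank for international travel and explain your choice."

baseline_lengths = {}
for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],  # no brevity instructions
    )
    text = resp.choices[0].message.content
    baseline_lengths[model] = len(text.split())  # natural response length in words
```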
Phase 2: Single-Shot Training
Selected models received system prompts containing one optimized response example to guide future outputs toward similar brevity.
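A sketch of how such a system prompt can be assembled; the question/answer pair below is a hypothetical stand-in for the repository's optimized, model-specific examples:

```python
# Hypothetical optimized example -- the repository ships real, model-specific ones.
EXAMPLE_Q = "Which power bank should I buy for a week of travel?"
EXAMPLE_A = (
    "A 10,000 mAh model from a major brand: roughly three phone charges, "
    "under 200 g, airline-safe, and typically $20-30. Step up to 20,000 mAh "
    "only if you also charge a tablet."
)

SYSTEM_PROMPT = (
    "Answer product questions with the same length, structure, and level of "
    "detail as this example.\n\n"
    f"Q: {EXAMPLE_Q}\nA: {EXAMPLE_A}"
)

# The single-shot example rides along with every subsequent user request.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Recommend a power bank for daily commuting."},
]
```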
Key Findings
- 5.5x: difference between longest and shortest baseline responses
- 794 words: mean baseline response length
- 60-75%: word reduction in the optimized examples
Model Response Length Comparison
[Chart: response lengths across the 14 evaluated models]
Comprehensive Verbosity Analysis
Summary statistics and notable outliers from the baseline responses
Response Length Variation
- Longest: 1,632 words (OpenAI GPT-OSS-120B)
- Shortest: 295 words (AI21 Jamba Large)
- Standard deviation: 456 words
Most Concise Performers
- AI21 Jamba Large - 295 words
- Mistral Large - 352 words
- Meta Llama 4 Maverick - 397 words
Most Verbose Performers
- OpenAI GPT-OSS-120B - 1,632 words
- Google Gemini 2.5 Flash - 1,607 words
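These summary numbers are straightforward to recompute from the raw word counts. A short sketch, using only the five counts quoted above (the full 14-model dataset in the repository is needed to reproduce the 794-word mean and 456-word standard deviation):

```python
import statistics

# Word counts quoted above; the repository holds all 14.
word_counts = {
    "OpenAI GPT-OSS-120B": 1632,
    "Google Gemini 2.5 Flash": 1607,
    "Meta Llama 4 Maverick": 397,
    "Mistral Large": 352,
    "AI21 Jamba Large": 295,
}

lengths = list(word_counts.values())
print(f"mean: {statistics.mean(lengths):.0f} words")
print(f"std dev (sample): {statistics.stdev(lengths):.0f} words")
print(f"longest/shortest: {max(lengths) / min(lengths):.1f}x")  # 1632 / 295 ≈ 5.5x
```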
Repository Contents
- Raw Response Data: Complete baseline outputs from all tested models
- Optimized Examples: Demonstrating ideal brevity (60-75% word reduction)
- Model-Specific System Prompts: Implementing single-shot training for practical application
- Statistical Analysis: Comprehensive comparison of response lengths and patterns
Practical Applications
This approach offers several benefits for LLM deployment:
- Cost Reduction: Shorter responses mean fewer output tokens and lower API costs (see the back-of-envelope sketch after this list)
- User Experience: Concise responses are faster to read and process
- Efficiency: One example is simpler than complex prompt engineering
- Reusability: The framework can be adapted to different use cases and domains
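As a back-of-envelope illustration of the cost point above (the tokens-per-word ratio and the per-token price are assumptions for illustration, not measured values):

```python
# All constants below are assumptions; substitute your provider's real numbers.
TOKENS_PER_WORD = 1.33                 # rough English average; varies by tokenizer
PRICE_USD_PER_1M_OUTPUT_TOKENS = 10.0  # assumed price; check your provider

baseline_words = 794                   # mean baseline length from this experiment
reduction = 0.70                       # midpoint of the observed 60-75% cut

tokens_saved = baseline_words * TOKENS_PER_WORD * reduction
# tokens/response * (USD per 1M tokens) * 1M responses = USD per 1M responses
usd_saved_per_1m_responses = tokens_saved * PRICE_USD_PER_1M_OUTPUT_TOKENS
print(f"~{tokens_saved:.0f} output tokens saved per response")
print(f"~${usd_saved_per_1m_responses:,.0f} saved per million responses")
```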
Get Involved
This is an open experiment exploring effective LLM prompting techniques. The repository includes all data, prompts, and analysis for transparency and reproducibility.