arize-prompt-optimization

Overview

# Arize Prompt Optimization Skill

## Concepts

### Where Prompts Live in Trace Data

LLM applications emit spans following OpenInference semantic conventions. Prompts are stored in different span attributes depending on the span kind and instrumentation:

| Column | What it contains | When to use |
|--------|-----------------|-------------|
| `attributes.llm.input_messages` | Structured chat messages (system, user, assistant, tool) in role-based format | **Primary source** for chat-based LLM prompts |
| `attributes.llm.input_messages.roles` | Array of roles: `system`, `user`, `assistant`, `tool` | Extract individual message roles |
| `attributes.llm.input_messages.contents` | Array of message content strings | Extract message text |
| `attributes.input.value` | Serialized prompt or user question (generic, all span kinds) | Fallback when structured messages are not available |
| `attributes.llm.prompt_template.template` | Template with `{variable}` placeholders (e.g., `"Answer {question} using {context}"`) | When the app uses prompt templates |
| `attributes.llm.prompt_template.variables` | Template variable values (JSON object) | See what values were substituted into the template |
| `attributes.output.value` | Model response text | See what the LLM produced |
| `attributes.llm.output_messages` | Structured model output (including tool calls) | Inspect tool-calling responses |
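
The `roles` and `contents` columns are parallel arrays, so they can be paired back into chat messages. The following is a minimal sketch, assuming a span has already been loaded as a flat Python dict keyed by the column names in the table; that dict shape and the helper name `rebuild_messages` are illustrative assumptions, not part of any Arize API.

```python
from __future__ import annotations

from typing import Any


def rebuild_messages(span: dict[str, Any]) -> list[dict[str, str]]:
    """Pair the parallel roles/contents arrays into chat messages.

    Assumes `span` is a flat dict keyed by the column names from the
    table above (a hypothetical export shape, not a documented one).
    """
    roles = span.get("attributes.llm.input_messages.roles") or []
    contents = span.get("attributes.llm.input_messages.contents") or []
    # zip stops at the shorter array, so a length mismatch drops the extras
    return [
        {"role": role, "content": content or ""}
        for role, content in zip(roles, contents)
    ]


if __name__ == "__main__":
    example_span = {
        "attributes.llm.input_messages.roles": ["system", "user"],
        "attributes.llm.input_messages.contents": [
            "You are a helpful assistant.",
            "What is OpenInference?",
        ],
    }
    for message in rebuild_messages(example_span):
        print(message["role"], "->", message["content"])
```

A message whose content is missing (for example, an assistant turn that only carries tool calls) comes back with an empty string here, which keeps the roles aligned with their positions.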

### Finding Prompts by Span Kind

**LLM span** (`attributes.openinference.span.kind = 'LLM'`): Check `attributes.llm.input_messages` first for structured chat messages. If it is empty, fall back to `attributes.input.value`, which holds a serialized prompt. If the app uses prompt templates, `attributes.llm.prompt_template.template` holds the template itself.
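
The lookup order for an LLM span can be sketched as a small helper. This assumes the span is a flat dict keyed by the column names used in this document; the function name `find_prompt` and the returned dict layout are hypothetical and only illustrate the fallback logic.

```python
from __future__ import annotations

from typing import Any, Optional

# Checked in order: structured messages first, serialized input as fallback.
PROMPT_COLUMNS = (
    "attributes.llm.input_messages",
    "attributes.input.value",
)
TEMPLATE_COLUMN = "attributes.llm.prompt_template.template"


def find_prompt(span: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the prompt found on an LLM span, or None if there is none.

    `span` is assumed to be a flat dict of column name -> value; this is
    an illustrative shape, not a documented export format.
    """
    if span.get("attributes.openinference.span.kind") != "LLM":
        return None
    for column in PROMPT_COLUMNS:
        value = span.get(column)
        if value:  # skip missing, None, and empty values
            return {
                "source": column,
                "prompt": value,
                "template": span.get(TEMPLATE_COLUMN),
            }
    return None


if __name__ == "__main__":
    span = {
        "attributes.openinference.span.kind": "LLM",
        "attributes.input.value": "Answer the question using the context.",
    }
    result = find_prompt(span)
    print(result["source"] if result else "no prompt found")
```

Returning the source column alongside the prompt makes it clear whether the structured messages or the serialized fallback was used, which matters when reconstructing the prompt later.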
