Evaluation Metrics
Live stats across all pipeline runs — success rate, latency, retries, and failure breakdown.
Evaluation Dataset
20 prompts: 10 real-world and 10 edge cases (vague, conflicting, underspecified)
Real prompts: 10
Edge cases: 10
Total prompts: 20
Total Runs: 4 (all time)
Successful: 1 (3 failed)
Avg Latency: 66.6s per pipeline run
Avg Retries: 0.25 per run
Success Rate: 25%
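The summary figures can be cross-checked against the four runs listed under Recent Runs. The sketch below shows that arithmetic, assuming each run is a record with a latency, a success flag, and a retry count. `Run` and `summarize` are hypothetical names, not the dashboard's actual code. The dashboard shows only the average retry count, so attributing the single retry to one particular run is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Run:
    app_type: str
    latency_s: float
    succeeded: bool
    retries: int = 0


def summarize(runs):
    """Aggregate per-run records into the headline dashboard figures."""
    total = len(runs)
    if total == 0:
        # Avoid division by zero when no runs have been logged yet.
        return {"total_runs": 0, "successful": 0, "failed": 0,
                "avg_latency_s": 0.0, "avg_retries": 0.0,
                "success_rate_pct": 0.0}
    successes = sum(1 for r in runs if r.succeeded)
    return {
        "total_runs": total,
        "successful": successes,
        "failed": total - successes,
        "avg_latency_s": sum(r.latency_s for r in runs) / total,
        "avg_retries": sum(r.retries for r in runs) / total,
        "success_rate_pct": 100 * successes / total,
    }


# Latencies and outcomes come from the Recent Runs list.
# Assumption: which run carried the one retry is not shown on the page.
runs = [
    Run("—", 11.8, False),
    Run("marketplace", 164.5, False, retries=1),
    Run("saas", 83.2, True),
    Run("crm", 6.8, False),
]

stats = summarize(runs)
print(stats)
```

With these inputs the average latency rounds to 66.6s and the success rate is 25%, matching the cards above.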
Failure Types
Stage 5 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (count: 1)
Stage 2 failed: 1 validation error for DesignSchema
  pages.4.primary_entity
    Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit … (count: 1)
Stage 1 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (count: 1)
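The three failures fall into two causes: an HTTP 402 credit limit (Stages 1 and 5) and a schema validation error (Stage 2). The sketch below shows one way such messages could be grouped for a failure breakdown. The `classify` function and the category labels are hypothetical. The sample messages are abbreviated versions of the ones listed above.

```python
import re
from collections import Counter

# Matches the "Stage N failed: <detail>" prefix used in the failure log.
STAGE_RE = re.compile(r"^Stage (\d+) failed: (.*)$", re.DOTALL)


def classify(message):
    """Return (stage_number, category) for one failure message."""
    match = STAGE_RE.match(message)
    if match is None:
        return (None, "unknown")
    stage = int(match.group(1))
    detail = match.group(2)
    if "Error code: 402" in detail:
        category = "insufficient_credits"   # hypothetical label
    elif "validation error" in detail:
        category = "schema_validation"      # hypothetical label
    else:
        category = "other"
    return (stage, category)


# Abbreviated copies of the messages shown under Failure Types.
messages = [
    "Stage 5 failed: All LLM attempts failed. Last error: Error code: 402 - ...",
    "Stage 2 failed: 1 validation error for DesignSchema",
    "Stage 1 failed: All LLM attempts failed. Last error: Error code: 402 - ...",
]

counts = Counter(classify(m)[1] for m in messages)
print(counts)
```

Grouping by cause rather than by raw message text would collapse the two 402 entries into one row, which makes it clearer that the fix (a lower max_tokens or more credits) is the same for both stages.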
Recent Runs
Prompt   App Type      Latency   Stage   Status
         —             11.8s     0/5     ✕
         marketplace   164.5s    4/5     ✕
         saas          83.2s     5/5     ✓
         crm           6.8s      1/5     ✕
Page refreshes every 30 seconds · Powered by SQLite eval logger
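For readers curious how a SQLite-backed eval logger of this kind might work, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, column names, and the `open_log`, `log_run`, and `recent_runs` helpers are all assumptions for illustration. The page does not show the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per pipeline run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    app_type    TEXT,
    latency_s   REAL    NOT NULL,
    stages_done INTEGER NOT NULL,
    success     INTEGER NOT NULL,
    error       TEXT
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_run(conn, app_type, latency_s, stages_done, success, error=None):
    """Insert one run; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO runs (app_type, latency_s, stages_done, success, error) "
            "VALUES (?, ?, ?, ?, ?)",
            (app_type, latency_s, stages_done, int(success), error),
        )


def recent_runs(conn, limit=10):
    """Return the most recently logged runs, newest first."""
    cursor = conn.execute(
        "SELECT app_type, latency_s, stages_done, success "
        "FROM runs ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Usage with an in-memory database and two of the runs shown above.
conn = open_log()
log_run(conn, "crm", 6.8, 1, False, "Stage 2 failed")
log_run(conn, "saas", 83.2, 5, True)
rows = recent_runs(conn)
print(rows)
```

A dashboard that refreshes on a timer could call something like `recent_runs` on each reload. Parameterized queries keep error text containing quotes, such as the 402 message, from breaking the insert.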