Evaluation Metrics

Live stats across all pipeline runs: success rate, latency, retries, and failure breakdown.

Evaluation Dataset

20 prompts: 10 real-world and 10 edge cases (vague, conflicting, or underspecified)

Real prompts: 10
Edge cases: 10
Total prompts: 20

Total Runs: 4 (all time)
Successful: 1 (3 failed)
Avg Latency: 66.6s per pipeline run
Avg Retries: 0.25 per run

Success Rate: 25%
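The headline numbers can be reproduced from the four runs listed under Recent Runs. The following is a minimal sketch, not the dashboard's actual code: it assumes a run counts as successful only when all 5 stages complete, and the `summarize` helper and its field names are hypothetical.

```python
from statistics import mean

# (latency_seconds, stages_completed) for the four runs under Recent Runs.
runs = [(11.8, 0), (164.5, 4), (83.2, 5), (6.8, 1)]
TOTAL_STAGES = 5  # assumption: success means every stage completed


def summarize(runs, total_stages=TOTAL_STAGES):
    """Aggregate run tuples into the dashboard's headline numbers."""
    successes = sum(1 for _, stages in runs if stages == total_stages)
    return {
        "total_runs": len(runs),
        "successful": successes,
        "failed": len(runs) - successes,
        "avg_latency_s": round(mean(lat for lat, _ in runs), 1),
        "success_rate_pct": round(100 * successes / len(runs)),
    }


summary = summarize(runs)
print(summary)
```

Under that assumption the sketch agrees with the figures shown here: 4 runs, 1 successful, 66.6s average latency, 25% success rate.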

Failure Types

Stage 5 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (1 run)
Stage 2 failed: 1 validation error for DesignSchema pages.4.primary_entity Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] For further information visit … (1 run)
Stage 1 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (1 run)
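The three messages above fall into two underlying causes: two credit-limit (402) errors and one schema validation error. A rough sketch of how such messages could be bucketed is shown below. The `classify` function and the category names are hypothetical, and the sample messages are abbreviated with "..." rather than reproduced in full.

```python
import re
from collections import Counter

# Matches the "Stage N failed:" prefix used by the logged messages.
STAGE_RE = re.compile(r"^Stage (\d+) failed:")


def classify(message):
    """Return (stage, kind) for one logged failure message."""
    match = STAGE_RE.match(message)
    stage = int(match.group(1)) if match else None
    if "Error code: 402" in message:
        kind = "insufficient_credits"  # hypothetical category name
    elif "validation error" in message:
        kind = "schema_validation"  # hypothetical category name
    else:
        kind = "other"
    return stage, kind


# Abbreviated copies of the messages listed under Failure Types.
messages = [
    "Stage 5 failed: All LLM attempts failed. Last error: Error code: 402 - ...",
    "Stage 2 failed: 1 validation error for DesignSchema pages.4.primary_entity ...",
    "Stage 1 failed: All LLM attempts failed. Last error: Error code: 402 - ...",
]

by_kind = Counter(classify(m)[1] for m in messages)
print(by_kind)
```

Grouping by cause rather than by full message text keeps the breakdown readable when the same error recurs at different stages.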

Recent Runs

Prompt | App Type | Latency | Stage | Status
Build a job board where companies post jobs and candidates apply. Both get dashboards to manage their activity. | | 11.8s | 0/5 |
Build a job board where companies post jobs and candidates apply. Both get dashboards to manage their activity. | marketplace | 164.5s | 4/5 |
Build an LMS where instructors create courses with lessons and quizzes. Students enroll, track progress, and earn certificates. | saas | 83.2s | 5/5 |
Build a CRM with login, contacts, deals pipeline, analytics dashboard, role-based access for admin and sales reps, and Stripe payments. | crm | 6.8s | 1/5 |

Page refreshes every 30 seconds · Powered by SQLite eval logger
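
A SQLite-backed eval logger of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `runs` table, its columns, and the `log_run` and `dashboard_stats` helpers are hypothetical, and the two sample rows are placeholders rather than data from this page.

```python
import sqlite3

# In-memory database for the sketch; a real logger would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        prompt TEXT NOT NULL,
        app_type TEXT,
        latency_s REAL NOT NULL,
        stages_done INTEGER NOT NULL,
        retries INTEGER NOT NULL DEFAULT 0,
        success INTEGER NOT NULL
    )"""
)


def log_run(conn, prompt, app_type, latency_s, stages_done, retries, success):
    """Insert one pipeline run into the hypothetical runs table."""
    conn.execute(
        "INSERT INTO runs (prompt, app_type, latency_s, stages_done, retries, success) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (prompt, app_type, latency_s, stages_done, retries, int(success)),
    )
    conn.commit()


def dashboard_stats(conn):
    """Compute the aggregate figures a metrics page would display."""
    total, ok, avg_latency, avg_retries = conn.execute(
        "SELECT COUNT(*), SUM(success), AVG(latency_s), AVG(retries) FROM runs"
    ).fetchone()
    if total == 0:  # SUM and AVG return NULL on an empty table
        return {"total": 0, "successful": 0, "avg_latency_s": None,
                "avg_retries": None, "success_rate_pct": None}
    return {
        "total": total,
        "successful": ok,
        "avg_latency_s": round(avg_latency, 1),
        "avg_retries": round(avg_retries, 2),
        "success_rate_pct": round(100 * ok / total, 1),
    }


# Placeholder rows, not taken from the dashboard.
log_run(conn, "example prompt A", "saas", 10.0, 5, 0, True)
log_run(conn, "example prompt B", None, 20.0, 2, 1, False)
stats = dashboard_stats(conn)
print(stats)
```

Computing the aggregates in a single query keeps each page refresh cheap, since the page only needs one pass over the table every 30 seconds.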