Evaluation Metrics

Live stats across all pipeline runs: success rate, latency, retries, and failure breakdown.


Evaluation Dataset

20 prompts: 10 real-world and 10 edge cases (vague, conflicting, or underspecified)

Real prompts

Edge cases

Total prompts

Total Runs

all time

Successful

3 failed

Avg Latency

66.6s

per pipeline run

Avg Retries

0.25

per run

Success Rate

25%


Failure Types

Stage 5 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (1 occurrence)

Stage 2 failed: 1 validation error for DesignSchema pages.4.primary_entity Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] For further information visit … (1 occurrence)

Stage 1 failed: All LLM attempts failed. Last error: Error code: 402 - {'error': {'message': 'This request requires more credits, or fewer max_tokens. You requested up to 16000 tokens, but can only af… (1 occurrence)
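
The failure messages above share a "Stage N failed: ..." prefix, so they can be grouped by stage and by rough cause. The following is a minimal sketch of such a grouping, not the dashboard's actual logic: the messages are shortened copies of the ones shown, and the cause labels (`insufficient_credits`, `schema_validation`) and the `classify` helper are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Shortened copies of the failure messages shown on the page.
FAILURES = [
    "Stage 5 failed: All LLM attempts failed. Last error: Error code: 402",
    "Stage 2 failed: 1 validation error for DesignSchema",
    "Stage 1 failed: All LLM attempts failed. Last error: Error code: 402",
]

STAGE_RE = re.compile(r"^Stage (\d+) failed: (.*)$")


def classify(message: str) -> tuple[int, str]:
    """Return (stage, cause) for a message. Cause labels are hypothetical."""
    m = STAGE_RE.match(message)
    if m is None:
        return (0, "unknown")
    stage, detail = int(m.group(1)), m.group(2)
    if "Error code: 402" in detail:
        cause = "insufficient_credits"  # the 402 message asks for more credits
    elif "validation error" in detail:
        cause = "schema_validation"  # the DesignSchema field was None
    else:
        cause = "other"
    return (stage, cause)


by_stage = Counter(classify(msg)[0] for msg in FAILURES)
by_cause = Counter(classify(msg)[1] for msg in FAILURES)
print(dict(by_cause))
```

Grouped this way, two of the three failures trace back to the same credit limit rather than to the pipeline itself, which a per-message list does not make obvious.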

Recent Runs

Prompt | App Type | Latency | Stage | Status
Build a job board where companies post jobs and candidates apply. Both get dashboards to manage their activity. | - | 11.8s | 0/5 | ✕
Build a job board where companies post jobs and candidates apply. Both get dashboards to manage their activity. | marketplace | 164.5s | 4/5 | ✕
Build an LMS where instructors create courses with lessons and quizzes. Students enroll, track progress, and earn certificates. | saas | 83.2s | 5/5 | ✓
Build a CRM with login, contacts, deals pipeline, analytics dashboard, role-based access for admin and sales reps, and Stripe payments. | crm | 6.8s | 1/5 | ✕

Page refreshes every 30 seconds · Powered by SQLite eval logger
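
The page says its figures come from a SQLite eval logger but does not show the schema. As a rough sketch of how such a logger might store runs and derive the headline numbers, the example below uses Python's built-in `sqlite3` module with a hypothetical `runs` table. The table and column names are assumptions, and it treats the four runs listed under Recent Runs as the full history, which the page does not confirm. Retries are left out because no per-run retry counts are shown.

```python
import sqlite3

# In-memory database for illustration; a real logger would use a file path.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per pipeline run.
conn.execute(
    """
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        app_type TEXT,                 -- NULL when no app type was detected
        latency_s REAL NOT NULL,       -- wall-clock seconds for the run
        stages_done INTEGER NOT NULL,  -- stages completed, out of 5
        success INTEGER NOT NULL       -- 1 for success, 0 for failure
    )
    """
)

# The four runs listed under Recent Runs.
rows = [
    (None, 11.8, 0, 0),
    ("marketplace", 164.5, 4, 0),
    ("saas", 83.2, 5, 1),
    ("crm", 6.8, 1, 0),
]
conn.executemany(
    "INSERT INTO runs (app_type, latency_s, stages_done, success) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Aggregate the headline metrics in a single query.
total, ok, avg_latency = conn.execute(
    "SELECT COUNT(*), SUM(success), AVG(latency_s) FROM runs"
).fetchone()
success_rate = 100.0 * ok / total

print(f"{total} runs, {success_rate:.0f}% success, {avg_latency:.1f}s avg latency")
# prints "4 runs, 25% success, 66.6s avg latency"
```

Under these assumptions the query reproduces the 25% success rate and 66.6s average latency shown in the stat cards.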