How to Evaluate AI Output Like a Senior Designer

A six-lens rubric for catching polished mediocrity, surface-level UI, and the gap between 'looks good' and 'actually works' in AI-generated design.

[Figure: Six-lens critique wheel (Hierarchy, Consistency, Accessibility, Originality, Feasibility, Product fit), with a companion chart, “What usually changes after senior review,” comparing raw AI output with reviewed output.]

The polished mediocrity problem

AI-generated UI has a specific failure mode. It looks professional. The spacing is clean. The typography is reasonable. The colors are not ugly. A junior designer might look at it and say “that is fine.”

A senior designer looks at the same screen and sees:

  • No clear hierarchy. Everything is the same visual weight.
  • Generic structure. Hero, three cards, testimonials, footer. Same as every other site.
  • No brand point of view. This could belong to any company.
  • Surface polish hiding structural emptiness. The page looks designed but says nothing.
  • Accessibility theater. It looks accessible without actually being tested.

The problem is not that the output is bad. The problem is that it is not good enough to ship, and it takes experience to see why.

This guide gives you a framework for seeing it.

The six-lens rubric

Use these six lenses to evaluate any AI-generated screen, component, or page. Each lens catches a different category of problem.

1. Hierarchy

What to check: Is the most important thing the most prominent thing?

Red flags:

  • Every element has similar visual weight
  • The CTA is the same size as secondary links
  • Headlines do not stand out from body text
  • Multiple competing focal points

What good looks like: One clear primary action. One clear headline. Everything else is visually subordinate. A user can tell in 3 seconds what the page is about and what to do.

Ask yourself: If I squint until the screen blurs, does the important thing still stand out?

2. Consistency

What to check: Does every element follow the same rules?

Red flags:

  • Two different button styles on the same page
  • Inconsistent spacing (32px here, 24px there, 40px somewhere else)
  • Mixed border-radius values
  • Font sizes that do not follow a scale
  • Colors that are close but not identical

What good looks like: Every button is the same. Every card follows the same spacing. Every heading uses the same scale. It looks like one person made it, not a committee.

Ask yourself: Could I write the rules this page follows? If not, there are no rules.
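One way to answer that question literally is to write the rules down as data and check the page against them. The sketch below assumes a spacing scale, a type scale, and a color tolerance chosen purely for illustration (`SPACING_SCALE`, `TYPE_SCALE`, and `tolerance` are hypothetical values, not a standard). It flags measured values that are off the scale and colors that are close but not identical.

```python
# Minimal consistency check: are the values on the page drawn from declared rules?
# SPACING_SCALE, TYPE_SCALE and the color tolerance are illustrative assumptions.

SPACING_SCALE = {4, 8, 12, 16, 24, 32, 48, 64}  # hypothetical spacing tokens (px)
TYPE_SCALE = {12, 14, 16, 20, 24, 32, 40}        # hypothetical font-size tokens (px)


def off_scale(values, scale):
    """Return the measured values that are not part of the declared scale, sorted."""
    return sorted(set(values) - set(scale))


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def near_duplicate_colors(colors, tolerance=12):
    """Return pairs of colors that differ, but by less than `tolerance` on every channel."""
    unique = sorted(set(c.lower() for c in colors))
    pairs = []
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            diff = max(abs(x - y) for x, y in zip(hex_to_rgb(a), hex_to_rgb(b)))
            if 0 < diff < tolerance:
                pairs.append((a, b))
    return pairs


# Example values you might read off an AI-generated mockup.
spacing_used = [32, 24, 40, 16, 24, 18]
font_sizes_used = [16, 18, 24, 32, 15]
colors_used = ["#3B82F6", "#3b83f5", "#111827", "#FFFFFF"]

print(off_scale(spacing_used, SPACING_SCALE))   # [18, 40]
print(off_scale(font_sizes_used, TYPE_SCALE))   # [15, 18]
print(near_duplicate_colors(colors_used))       # [('#3b82f6', '#3b83f5')]
```

If the scale sets are hard to fill in for a given page, that is itself the answer to the question above.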

3. Accessibility

What to check: Can people with different abilities use this?

Red flags:

  • Text on backgrounds with insufficient contrast
  • Buttons or links smaller than 44x44px
  • No visible focus states
  • Color as the only indicator of state (red for error, green for success, nothing else)
  • Images without alt text
  • Form inputs without labels

What good looks like: Contrast ratios pass WCAG AA (at least 4.5:1 for body text, 3:1 for large text). Interactive elements are at least 44x44px. States are communicated through more than just color. Focus states are visible. The structure makes sense to a screen reader.

Ask yourself: Could someone use this with a keyboard only? With a screen reader? In direct sunlight?
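Two of these checks are pure arithmetic, so you can run them instead of eyeballing them. A minimal sketch using the WCAG 2.x relative-luminance and contrast-ratio formulas, plus the 44x44px target-size rule this guide uses; the function names are illustrative. Keyboard and screen-reader behavior still need a real test, which is exactly where accessibility theater hides.

```python
# WCAG 2.x contrast ratio and a simple target-size check.

def _linearize(channel_8bit):
    """Convert one sRGB channel (0-255) to linear light, per the WCAG 2.x formula."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color, from 0 (black) to 1 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (_linearize(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1:1 up to 21:1 (order does not matter)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text=False):
    """WCAG AA text contrast: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


def meets_target_size(width_px, height_px, minimum=44):
    """Check a tap target against the 44x44px guideline used in this guide."""
    return width_px >= minimum and height_px >= minimum


print(round(contrast_ratio("#000000", "#ffffff"), 1))  # 21.0
print(passes_aa("#9ca3af", "#ffffff"))                 # False: muted gray on white
print(meets_target_size(32, 32))                       # False
```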

4. Originality

What to check: Does this have a point of view, or could it belong to anyone?

Red flags:

  • Default SaaS layout (hero, features, testimonials, pricing, footer)
  • Inter or system font with no typographic personality
  • Blue or purple accent with no brand rationale
  • Illustrations that look like every other AI illustration
  • Copy that uses words like “streamline,” “leverage,” or “next-generation”

What good looks like: The page has visual decisions that reflect the brand, the audience, and the product. It could not be mistaken for a different company. The style choices are deliberate, not default.

Ask yourself: If I removed the logo, would I still know whose page this is?
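A word list cannot judge whether a page has a point of view, but it can catch the most mechanical tell before you spend review time on it. A minimal sketch; the starter list comes from the red flags above, and `flag_generic_copy` is a hypothetical helper name.

```python
import re

# Starter list taken from the red flags above; add your own team's offenders.
GENERIC_WORDS = ["streamline", "leverage", "next-generation"]


def flag_generic_copy(copy, words=GENERIC_WORDS):
    """Return each listed word that appears in `copy` as a whole word, ignoring case."""
    return [
        word
        for word in words
        if re.search(rf"\b{re.escape(word)}\b", copy, flags=re.IGNORECASE)
    ]


hero = "Leverage our next-generation platform to streamline your workflow."
print(flag_generic_copy(hero))  # ['streamline', 'leverage', 'next-generation']
```

Matching is whole-word only, so “streamlined” is not flagged unless you add it to the list. That keeps false positives down at the cost of needing explicit variants.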

5. Feasibility

What to check: Can this actually be built and maintained?

Red flags:

  • Layouts that work at one breakpoint but will break on mobile
  • Components that look unique but share no patterns with the rest of the system
  • Animations that require custom code for marginal visual benefit
  • Data that is hardcoded in the mockup but dynamic in production
  • States that are not accounted for (loading, empty, error, overflow)

What good looks like: Components follow patterns that scale. Responsive behavior is considered. Edge cases are handled. A developer could build this without guessing.

Ask yourself: What happens when the headline is twice as long? When there are zero items? When the data is loading?
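Those three questions can be turned into content fixtures that you render before sign-off, so the edge cases are looked at instead of imagined. A minimal sketch, assuming a hypothetical `CardListContent` model for a list-of-cards section; swap in the fields your component actually has. The required state names come from the red-flag list above.

```python
from dataclasses import dataclass, replace
from typing import Optional

# States every data-driven component should have a design for (from the red flags above).
REQUIRED_STATES = {"default", "loading", "empty", "error", "overflow"}


@dataclass(frozen=True)
class CardListContent:
    """Hypothetical content model for a 'list of cards' section."""
    headline: str
    items: tuple
    loading: bool = False
    error: Optional[str] = None


def edge_case_variants(sample):
    """Derive one stress-test version of the sample content per required state."""
    return {
        "default": sample,
        # Headline roughly twice as long, to see where it wraps or truncates.
        "overflow": replace(sample, headline=" ".join([sample.headline] * 2)),
        "empty": replace(sample, items=()),
        "loading": replace(sample, items=(), loading=True),
        "error": replace(sample, items=(), error="Could not load items."),
    }


def missing_states(designed_states):
    """Return the required states the mockup does not show, sorted."""
    return sorted(REQUIRED_STATES - set(designed_states))


sample = CardListContent(headline="Track every invoice", items=("Draft", "Sent", "Paid"))
variants = edge_case_variants(sample)
print(variants["overflow"].headline)   # Track every invoice Track every invoice
print(missing_states({"default"}))     # ['empty', 'error', 'loading', 'overflow']
```

A typical AI mockup shows only the default state, so `missing_states` is usually a long list on the first pass.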

6. Product fit

What to check: Does this actually solve the right problem for the right user?

Red flags:

  • Pretty UI with no clear user goal
  • Features displayed without context for why they matter
  • No indication of what happens after the CTA
  • Copy that describes the product instead of the user’s outcome
  • A page that looks like a brochure instead of a tool

What good looks like: The page is organized around what the user needs to understand and do. Every section has a job. The copy speaks to the user’s problem, not the product’s features. The design serves the conversion goal.

Ask yourself: If I were the target user, would I know exactly what this does and why I should care?

Surface critique vs structural critique

Surface critique catches visual issues. Structural critique catches design issues. Both matter, but they are different skills.

Surface critique (what most people do)

  • The spacing is off
  • The colors clash
  • The font is too small
  • The button is hard to see

These are real problems. But fixing them does not fix a bad design. It just makes a bad design look cleaner.

Structural critique (what senior designers do)

  • The page has no clear hierarchy
  • The information architecture does not match the user’s mental model
  • The CTA asks for commitment before building trust
  • The page solves a problem the user does not have
  • The component is not reusable across other contexts

Structural problems require rethinking, not polishing.

When evaluating AI output, check surface issues first (they are fast to fix), but always do a structural pass. AI is very good at producing clean surfaces with broken structures underneath.

Giving feedback that improves the next iteration

“Make it better” is not feedback. Here is what works:

Be specific about what is wrong

Bad: “The hero does not feel right.”

Better: “The hero has no hierarchy. The headline, subhead, and CTA are all the same visual weight. Make the headline 2x the size of the subhead and move the CTA below a clear value statement.”

Name the lens

“This fails the accessibility lens. The contrast ratio on the muted text is too low and the button is smaller than 44x44px.”

When you name the lens, you tell the AI which category of improvement to make.

Provide a reference

“Look at the quiet intelligence style from the Style Explorer. The current output is too loud. Reduce to 2 colors, increase whitespace, and drop the gradient.”

Say what to keep

“The section structure is good. Keep the order. Change the visual treatment.”

AI tends to regenerate everything unless you tell it what to preserve.
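If you give feedback often, a fixed template keeps all four habits in every message: name the lens, state the specific problem, give the specific fix, and say what to keep. A minimal sketch; the `Feedback` class and the prompt wording are hypothetical, not a feature of any particular AI tool.

```python
from dataclasses import dataclass, field

LENSES = ("hierarchy", "consistency", "accessibility", "originality", "feasibility", "product fit")


@dataclass
class Feedback:
    """One round of feedback: which lens failed, what is wrong, what to change, what to keep."""
    lens: str
    problem: str
    fix: str
    keep: list = field(default_factory=list)

    def to_prompt(self):
        """Render the feedback as a prompt, refusing lenses outside the rubric."""
        if self.lens not in LENSES:
            raise ValueError(f"Unknown lens: {self.lens!r}")
        lines = [
            f"This fails the {self.lens} lens. {self.problem}",
            f"Fix: {self.fix}",
        ]
        if self.keep:
            # Without this line, the tool tends to regenerate everything.
            lines.append("Keep unchanged: " + "; ".join(self.keep))
        return "\n".join(lines)


fb = Feedback(
    lens="hierarchy",
    problem="The headline, subhead, and CTA are all the same visual weight.",
    fix="Make the headline 2x the size of the subhead and move the CTA below a clear value statement.",
    keep=["the section structure and order"],
)
print(fb.to_prompt())
```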

The “earning its place” heuristic

In every review, senior designers ask one question that AI does not ask on its own: does this element earn its place?

When you scan an AI-generated screen, you’ll see filler. A divider that has no information value. A “trusted by” row of fake logos because the section felt empty. A second CTA that competes with the first. An info icon that links nowhere. Three feature cards because the layout grid was three columns wide.

Filler is a design problem, not a content problem. If a section feels empty, the right move is to fix the layout, not to invent content to fill it. One thousand no’s for every yes. This is the hardest critique skill to teach because the missing element is invisible until you name it.

Walk through the screen element by element. For each one, ask: if I deleted this, would the page lose something the user actually needs? If the answer is no, delete it. The screens that read as “designed” are the ones where every element passed this test.

This is the lens AI lacks by default. It will fill space because it learned that pages have content. Your job is to be the editor that removes.

The 2-minute checklist

Run this on any AI-generated screen before accepting it.

  • One clear action. Can I tell in 3 seconds what to do?
  • Hierarchy exists. Headline is dominant, body is subordinate, CTA is obvious.
  • Consistency holds. Buttons, spacing, fonts, and colors follow visible rules.
  • Contrast passes. No text on backgrounds I cannot read easily.
  • It has a point of view. This could not be mistaken for a generic template.
  • Edge cases considered. What happens when content is longer, shorter, empty, or loading?
  • Every element earns its place. Nothing is there just because the layout had room.

If three or more fail, redo. Do not polish.
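The pass/fail rule is simple enough to write down, which helps when several reviewers score the same screen and need to agree on the call. A minimal sketch; the item names are shortened from the checklist above and `verdict` is an illustrative name.

```python
CHECKLIST = (
    "One clear action",
    "Hierarchy exists",
    "Consistency holds",
    "Contrast passes",
    "It has a point of view",
    "Edge cases considered",
    "Every element earns its place",
)


def verdict(results, redo_threshold=3):
    """Apply the rule above: three or more failed checks means redo, otherwise refine.

    `results` maps each checklist item to True (pass) or False (fail).
    Returns the decision and the failed items in checklist order.
    """
    missing = [item for item in CHECKLIST if item not in results]
    if missing:
        raise ValueError(f"Unscored items: {missing}")
    failures = [item for item in CHECKLIST if not results[item]]
    return ("redo" if len(failures) >= redo_threshold else "refine"), failures


results = {item: True for item in CHECKLIST}
results["Hierarchy exists"] = False
results["It has a point of view"] = False
print(verdict(results))  # ('refine', ['Hierarchy exists', 'It has a point of view'])
```

Requiring every item to be scored stops a reviewer from skipping the checks that would push the total over the threshold.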

When to accept AI output vs when to redo

Accept and refine when:

  • The structure is right but the surface needs polish
  • The hierarchy is clear but the style needs adjustment
  • The content is good but the copy tone is off

Redo from scratch when:

  • The structure does not match the user’s goal
  • The page has no clear hierarchy at all
  • The design is generic with no brand point of view
  • The component is not reusable or scalable
  • You would be embarrassed to show it to a senior colleague

Polishing a bad structure is wasted effort. Redoing with a better prompt is faster.

Exercise

Run the six-lens rubric on the last AI output you accepted too fast

20 min
  1. Score the output against all six lenses

    Pick one AI-generated artifact you used recently: a Figma Make frame, a v0 component, a Lovable page, a Claude-generated hero. Open a blank note. For each of the six lenses (Hierarchy, Consistency, Accessibility, Originality, Feasibility, Product Fit), write pass or fail and one sentence explaining why.

    • Six entries, one per lens, each with a verdict and a one-sentence reason
    • Reasons cite specific elements (“The primary CTA and the secondary link are the same size”), not vague impressions
    • At least one lens scored a clear fail, even if you originally thought the output was good enough
  2. Decide refine or redo, then give feedback that names the lens

    Count your fails. Two or fewer: refine. Three or more: redo. Either way, write the feedback in the format from the “Name the lens” section: lens name, specific problem, specific fix. Paste it into your AI tool and regenerate.

    • Your feedback prompt names at least one lens explicitly (“This fails the hierarchy lens…”)
    • The feedback is specific enough that a junior designer could apply it without asking follow-up questions
    • The regenerated output fixes at least one of the failures you identified, not a different problem the AI picked on its own
