How to Evaluate AI Output Like a Senior Designer
A six-lens rubric for catching polished mediocrity, surface-level UI, and the gap between 'looks good' and 'actually works' in AI-generated design.
- Hierarchy: Is the important thing prominent?
- Consistency: Does every element follow rules?
- Accessibility: Can everyone use this?
- Originality: Does it have a point of view?
- Feasibility: Can it be built and maintained?
- Product fit: Does it solve the right problem?
The polished mediocrity problem
AI-generated UI has a specific failure mode. It looks professional. The spacing is clean. The typography is reasonable. The colors are not ugly. A junior designer might look at it and say "that is fine."
A senior designer looks at the same screen and sees:
- No clear hierarchy. Everything is the same visual weight.
- Generic structure. Hero, three cards, testimonials, footer. Same as every other site.
- No brand point of view. This could belong to any company.
- Surface polish hiding structural emptiness. The page looks designed but says nothing.
- Accessibility theater. It looks accessible without actually being tested.
The problem is not that the output is bad. The problem is that it is not good enough to ship, and it takes experience to see why.
This guide gives you a framework for seeing it.
The six-lens rubric
Use these six lenses to evaluate any AI-generated screen, component, or page. Each lens catches a different category of problem.
1. Hierarchy
What to check: Is the most important thing the most prominent thing?
Red flags:
- Every element has similar visual weight
- The CTA is the same size as secondary links
- Headlines do not stand out from body text
- Multiple competing focal points
What good looks like: One clear primary action. One clear headline. Everything else is visually subordinate. A user can tell in 3 seconds what the page is about and what to do.
Ask yourself: If I squint until the page blurs, does the important thing still stand out?
2. Consistency
What to check: Does every element follow the same rules?
Red flags:
- Two different button styles on the same page
- Inconsistent spacing (32px here, 24px there, 40px somewhere else)
- Mixed border-radius values
- Font sizes that do not follow a scale
- Colors that are close but not identical
What good looks like: Every button is the same. Every card follows the same spacing. Every heading uses the same scale. It looks like one person made it, not a committee.
Ask yourself: Could I write the rules this page follows? If not, there are no rules.
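One way to test whether the rules can be written down is to write them down and check the page against them. The sketch below assumes a hypothetical spacing scale and a hand-collected list of spacing values; neither comes from a real design system, so substitute your own tokens.

```python
# Hypothetical spacing scale in px; replace with your design system's tokens.
SPACING_SCALE = (4, 8, 16, 24, 32, 48, 64)


def off_scale_values(used_px, scale=SPACING_SCALE):
    """Return the sorted, de-duplicated values that are not on the scale."""
    allowed = set(scale)
    return sorted({value for value in used_px if value not in allowed})


# Spacing values collected by hand from an AI-generated screen (illustrative).
used = [32, 24, 40, 24, 18, 32]
print(off_scale_values(used))  # the values with no rule behind them
```

The same check works for border-radius values or font sizes: declare the allowed set, then list whatever falls outside it.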
3. Accessibility
What to check: Can people with different abilities use this?
Red flags:
- Text on backgrounds with insufficient contrast
- Buttons or links smaller than 44x44px
- No visible focus states
- Color as the only indicator of state (red for error, green for success, nothing else)
- Images without alt text
- Form inputs without labels
What good looks like: Contrast ratios pass WCAG AA (at least 4.5:1 for normal text, 3:1 for large text). Interactive elements are large enough. States are communicated through more than just color. Focus states are visible. Screen readers would make sense of the structure.
Ask yourself: Could someone use this with a keyboard only? With a screen reader? In direct sunlight?
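Contrast is the one accessibility check in this lens that reduces to arithmetic. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions and the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names and sample colors are illustrative, and it assumes opaque colors written as `#rrggbb`.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#rrggbb' color (0 for black, about 1 for white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors; the order of arguments does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """WCAG AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Illustrative check: two similar grays on white land on opposite sides of 4.5:1.
for gray in ("#767676", "#777777"):
    print(gray, round(contrast_ratio(gray, "#ffffff"), 2), passes_aa(gray, "#ffffff"))
```

A passing ratio covers only the contrast red flag. Focus states, labels, target sizes, and screen reader structure still need to be checked by hand.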
4. Originality
What to check: Does this have a point of view, or could it belong to anyone?
Red flags:
- Default SaaS layout (hero, features, testimonials, pricing, footer)
- Inter or system font with no typographic personality
- Blue or purple accent with no brand rationale
- Illustrations that look like every other AI illustration
- Copy that uses words like "streamline," "leverage," or "next-generation"
What good looks like: The page has visual decisions that reflect the brand, the audience, and the product. It could not be mistaken for a different company. The style choices are deliberate, not default.
Ask yourself: If I removed the logo, would I still know whose page this is?
5. Feasibility
What to check: Can this actually be built and maintained?
Red flags:
- Layouts that work at one breakpoint but will break on mobile
- Components that look unique but share no patterns with the rest of the system
- Animations that require custom code for marginal visual benefit
- Data that is hardcoded in the mockup but dynamic in production
- States that are not accounted for (loading, empty, error, overflow)
What good looks like: Components follow patterns that scale. Responsive behavior is considered. Edge cases are handled. A developer could build this without guessing.
Ask yourself: What happens when the headline is twice as long? When there are zero items? When the data is loading?
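The unaccounted-for states become easier to spot once they are listed as code. This is a minimal sketch, assuming a hypothetical list component driven by a loading flag, an optional error, and a list of items; the names and sample data are illustrative, not taken from any framework.

```python
def resolve_state(is_loading, error, items):
    """Pick which view a list component should render for the given data."""
    if is_loading:
        return "loading"
    if error is not None:
        return "error"
    if not items:
        return "empty"
    return "populated"


# Stress cases based on the questions above; the data is illustrative.
cases = {
    "still fetching": (True, None, []),
    "request failed": (False, "timeout", []),
    "zero items": (False, None, []),
    "normal": (False, None, ["Invoice 1", "Invoice 2"]),
}
for label, args in cases.items():
    print(f"{label}: {resolve_state(*args)}")
```

Every value this function can return needs its own design. If the mockup only shows the populated case, the other three are guesses a developer will have to make. Overflow is not modeled here; it still needs a visual check with longer content.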
6. Product fit
What to check: Does this actually solve the right problem for the right user?
Red flags:
- Pretty UI with no clear user goal
- Features displayed without context for why they matter
- No indication of what happens after the CTA
- Copy that describes the product instead of the user's outcome
- A page that looks like a brochure instead of a tool
What good looks like: The page is organized around what the user needs to understand and do. Every section has a job. The copy speaks to the user's problem, not the product's features. The design serves the conversion goal.
Ask yourself: If I were the target user, would I know exactly what this does and why I should care?
Surface critique vs structural critique
Surface critique catches visual issues. Structural critique catches design issues. Both matter, but they are different skills.
Surface critique (what most people do)
- The spacing is off
- The colors clash
- The font is too small
- The button is hard to see
These are real problems. But fixing them does not fix a bad design. It just makes a bad design look cleaner.
Structural critique (what senior designers do)
- The page has no clear hierarchy
- The information architecture does not match the user's mental model
- The CTA asks for commitment before building trust
- The page solves a problem the user does not have
- The component is not reusable across other contexts
Structural problems require rethinking, not polishing.
When evaluating AI output, check surface issues first (they are fast to fix), but always do a structural pass. AI is very good at producing clean surfaces with broken structures underneath.
Giving feedback that improves the next iteration
"Make it better" is not feedback. Here is what works:
Be specific about what is wrong
Bad: "The hero does not feel right."
Better: "The hero has no hierarchy. The headline, subhead, and CTA are all the same visual weight. Make the headline 2x the size of the subhead and move the CTA below a clear value statement."
Name the lens
"This fails the accessibility lens. The contrast ratio on the muted text is too low and the button is smaller than 44px."
When you name the lens, AI knows what category of improvement to make.
Provide a reference
"Look at the quiet intelligence style from the Style Explorer. The current output is too loud. Reduce to 2 colors, increase whitespace, and drop the gradient."
Say what to keep
"The section structure is good. Keep the order. Change the visual treatment."
AI tends to regenerate everything unless you tell it what to preserve.
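The feedback patterns above (name the lens, state the specific problem, state the specific fix, say what to keep) can be combined into one repeatable prompt. The helper below is a hypothetical sketch; the function name and the line labels are assumptions, not a format any AI tool requires.

```python
def build_feedback(lens, problem, fix, keep=None):
    """Assemble a feedback prompt: lens name, specific problem, specific fix."""
    lines = [
        f"This fails the {lens} lens.",
        f"Problem: {problem}",
        f"Fix: {fix}",
    ]
    if keep:
        # Naming what to preserve discourages a full regeneration.
        lines.append(f"Keep: {keep}")
    return "\n".join(lines)


print(build_feedback(
    lens="hierarchy",
    problem="The headline, subhead, and CTA are all the same visual weight.",
    fix="Make the headline 2x the size of the subhead.",
    keep="The section structure and order.",
))
```

Because problem and fix are required arguments, the helper cannot produce a prompt that only says the output feels wrong.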
The "earning its place" heuristic
Senior designers carry one question into every review that AI does not: does this element earn its place?
When you scan an AI-generated screen, you'll see filler. A divider that has no information value. A "trusted by" row of fake logos because the section felt empty. A second CTA that competes with the first. An info icon that links nowhere. Three feature cards because the layout grid was three columns wide.
Filler is a design problem, not a content problem. If a section feels empty, the right move is to fix the layout, not to invent content to fill it. One thousand no's for every yes. This is the hardest critique skill to teach because the missing element is invisible until you name it.
Walk through the screen element by element. For each one, ask: if I deleted this, would the page lose something the user actually needs? If the answer is no, delete it. The screens that read as "designed" are the ones where every element passed this test.
This is the lens AI lacks by default. It will fill space because it learned that pages have content. Your job is to be the editor that removes.
The 2-minute checklist
Run this on any AI-generated screen before accepting it.
- One clear action. Can I tell in 3 seconds what to do?
- Hierarchy exists. Headline is dominant, body is subordinate, CTA is obvious.
- Consistency holds. Buttons, spacing, fonts, and colors follow visible rules.
- Contrast passes. No text on backgrounds I cannot read easily.
- It has a point of view. This could not be mistaken for a generic template.
- Edge cases considered. What happens when content is longer, shorter, empty, or loading?
- Every element earns its place. Nothing is there just because the layout had room.
If three or more fail, redo. Do not polish.
When to accept AI output vs when to redo
Accept and refine when:
- The structure is right but the surface needs polish
- The hierarchy is clear but the style needs adjustment
- The content is good but the copy tone is off
Redo from scratch when:
- The structure does not match the user's goal
- The page has no clear hierarchy at all
- The design is generic with no brand point of view
- The component is not reusable or scalable
- You would be embarrassed to show it to a senior colleague
Polishing a bad structure is wasted effort. Redoing with a better prompt is faster.
Run the six-lens rubric on the last AI output you accepted too fast
Score the output against all six lenses
Pick one AI-generated artifact you used recently: a Figma Make frame, a v0 component, a Lovable page, a Claude-generated hero. Open a blank note. For each of the six lenses (Hierarchy, Consistency, Accessibility, Originality, Feasibility, Product fit), write pass or fail and one sentence explaining why.
- Six entries, one per lens, each with a verdict and a one-sentence reason
- Reasons cite specific elements ("The primary CTA and the secondary link are the same size"), not vague impressions
- At least one lens scored a clear fail, even if you originally thought the output was good enough
Decide refine or redo, then give feedback that names the lens
Count your fails. Three or fewer: refine. Four or more: redo. Either way, write the feedback in the format from the "Name the lens" section: lens name, specific problem, specific fix. Paste it into your AI tool and regenerate.
- Your feedback prompt names at least one lens explicitly ("This fails the hierarchy lens…")
- The feedback is specific enough that a junior designer could apply it without asking follow-up questions
- The regenerated output fixes at least one of the failures you identified, not a different problem the AI picked on its own
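If you would rather keep the scorecard somewhere more structured than a blank note, the exercise can be sketched as data plus one decision rule. The lens names and the four-or-more threshold come from the steps above. The function name and the sample verdicts are illustrative assumptions.

```python
LENSES = ("Hierarchy", "Consistency", "Accessibility",
          "Originality", "Feasibility", "Product fit")


def decide(scorecard):
    """Return (verdict, failed_lenses) for a {lens: (passed, reason)} scorecard."""
    missing = [lens for lens in LENSES if lens not in scorecard]
    if missing:
        raise ValueError(f"Missing lenses: {', '.join(missing)}")
    fails = [lens for lens in LENSES if not scorecard[lens][0]]
    # Three or fewer fails: refine. Four or more: redo.
    verdict = "redo" if len(fails) >= 4 else "refine"
    return verdict, fails


# Illustrative scorecard for a generated hero section.
scorecard = {
    "Hierarchy": (False, "The primary CTA and the secondary link are the same size."),
    "Consistency": (True, "Buttons and spacing follow one visible set of rules."),
    "Accessibility": (False, "Muted text on the light background is hard to read."),
    "Originality": (False, "Hero, three cards, testimonials, footer."),
    "Feasibility": (True, "Layout holds with a longer headline."),
    "Product fit": (True, "Copy speaks to the user's outcome."),
}
verdict, fails = decide(scorecard)
print(verdict, fails)
```

Requiring all six lenses keeps you from skipping the ones that are tedious to check, and the stored reasons feed directly into the feedback prompt.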