The ARTEKNE Creative Benchmark (ACB) is an 8-axis evaluation framework designed to measure whether AI-generated creative output meets the standard of luxury brand campaigns. We're open-sourcing the methodology.
## The Problem with Existing Benchmarks
Current AI image quality benchmarks were designed for research, not commerce. FID (Fréchet Inception Distance) measures statistical similarity to a reference dataset. CLIP Score measures text-image alignment. ImageReward measures human aesthetic preference.
None of these capture what a creative director actually evaluates when reviewing campaign imagery: Does this image build the brand? Does it maintain visual consistency with the rest of the campaign? Does it tell a story that commands premium pricing?
## The 8-Axis Framework
ACB evaluates AI creative output across eight dimensions, each scored on a 1–10 scale:
| AXIS | MEASURES | WHY IT MATTERS |
|---|---|---|
| 1. Color Science | Palette sophistication, harmony, tonal range | First thing the eye registers |
| 2. Composition | Visual weight, negative space, focal point | Separates "editorial" from "snapshot" |
| 3. Lighting | Directionality, mood, shadow quality | $500 shot vs. $50K campaign |
| 4. Model Realism | Anatomy, skin texture, expression | Uncanny valley destroys credibility |
| 5. Styling | Garment fit, accessory coordination | Signals "catalog" vs. "campaign" |
| 6. Environmental Integration | Model-environment relationship, scale | Compositing artifacts are instant tells |
| 7. Campaign Consistency | Visual language across a series | One image ≠ a campaign |
| 8. Brand Narrative | Does it tell a story? Build value? | The highest-order evaluation |
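The eight axes and the 1–10 scale can be captured in a small scorecard structure. The sketch below is illustrative only: the names `AXES` and `validate_scorecard` are our own, not part of any published ACB reference implementation.

```python
# Hypothetical ACB scorecard: the eight axis names and a 1-10 range check.
AXES = [
    "color_science",
    "composition",
    "lighting",
    "model_realism",
    "styling",
    "environmental_integration",
    "campaign_consistency",
    "brand_narrative",
]

def validate_scorecard(scores: dict[str, float]) -> float:
    """Check that all eight axes are present and in [1, 10]; return the mean."""
    missing = set(AXES) - scores.keys()
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    for axis in AXES:
        if not 1.0 <= scores[axis] <= 10.0:
            raise ValueError(f"{axis} score out of range: {scores[axis]}")
    return sum(scores[axis] for axis in AXES) / len(AXES)
```

A submission missing an axis, or with a score outside 1–10, is rejected before averaging.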
## Baseline Results (v1.0)
We scored outputs from Midjourney v7, DALL-E 4, Stable Diffusion 3, Runway Gen-4, and ARTEKNE/Hephaestus across all 8 axes:
| SYSTEM | CS | CO | LI | MR | ST | EI | CC | BN | AVG |
|---|---|---|---|---|---|---|---|---|---|
| Midjourney v7 | 8.0 | 7.5 | 7.5 | 6.5 | 5.0 | 7.0 | 3.0 | 2.0 | 5.8 |
| DALL-E 4 | 7.0 | 7.0 | 6.5 | 6.0 | 4.5 | 6.5 | 2.5 | 2.0 | 5.3 |
| Stable Diffusion 3 | 6.5 | 6.5 | 6.0 | 5.5 | 4.0 | 6.0 | 2.0 | 1.5 | 4.8 |
| Runway Gen-4 | 7.5 | 7.0 | 7.0 | 6.0 | 4.5 | 7.0 | 3.0 | 2.5 | 5.6 |
| ARTEKNE | 8.2 | 7.5 | 7.8 | 7.0 | 7.5 | 8.0 | 8.5 | 8.0 | 7.8 |
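The AVG column is the unweighted mean of the eight axis scores, rounded to one decimal place. A minimal sketch that recomputes it from the table above (scores transcribed in axis order CS, CO, LI, MR, ST, EI, CC, BN):

```python
# Per-axis scores from the v1.0 baseline table; AVG is the plain mean.
BASELINE = {
    "Midjourney v7":      [8.0, 7.5, 7.5, 6.5, 5.0, 7.0, 3.0, 2.0],
    "DALL-E 4":           [7.0, 7.0, 6.5, 6.0, 4.5, 6.5, 2.5, 2.0],
    "Stable Diffusion 3": [6.5, 6.5, 6.0, 5.5, 4.0, 6.0, 2.0, 1.5],
    "Runway Gen-4":       [7.5, 7.0, 7.0, 6.0, 4.5, 7.0, 3.0, 2.5],
    "ARTEKNE":            [8.2, 7.5, 7.8, 7.0, 7.5, 8.0, 8.5, 8.0],
}

def acb_average(scores: list[float]) -> float:
    """Unweighted mean across the eight axes."""
    return sum(scores) / len(scores)
```

All five published AVG values match this recomputation to within rounding.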
### Key Finding
Single-image generators (Midjourney, DALL-E, SD) score competitively on individual image quality axes (CS, CO, LI) but collapse on system-level axes (CC, BN). This is because they have no concept of "campaign" or "brand" — each image is generated independently.
ARTEKNE's multi-agent architecture specifically addresses this gap. The 209-agent system maintains brand DNA, visual language, and narrative consistency across every generated image — which is why it scores 8.0+ on Campaign Consistency and Brand Narrative where competitors score 2.0–3.0.
The gap between AI creative tools is not in individual image quality. It's in system-level creative thinking. That's the gap Autonomous Creative is designed to close.
## How to Contribute
ACB is open-source. We welcome contributions in three forms:
- New system evaluations — Run the benchmark on additional AI systems and submit results
- Methodology improvements — Propose new axes, refined scoring criteria, or alternative protocols
- Evaluator participation — Join the evaluator pool (especially creative directors with luxury brand experience)
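For new system evaluations, results need to travel in a machine-readable form. The shape below is purely hypothetical; the actual submission schema, if one is defined, lives in the repository, and `ExampleGen v1` is an invented system name for illustration.

```python
# Hypothetical shape for a new-system evaluation submission.
import json

submission = {
    "system": "ExampleGen v1",   # illustrative system name, not a real product
    "acb_version": "1.0",
    "scores": {                  # one 1-10 score per axis
        "color_science": 6.5,
        "composition": 6.0,
        "lighting": 6.0,
        "model_realism": 5.5,
        "styling": 4.5,
        "environmental_integration": 6.0,
        "campaign_consistency": 2.5,
        "brand_narrative": 2.0,
    },
    "evaluators": 3,             # number of raters whose scores were averaged
}

print(json.dumps(submission, indent=2))
```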
Repository: github.com/artekne/creative-benchmark
The ACB methodology is released under the MIT License; results data are released under CC BY 4.0.