How to Make Your App AI-Testable

We built a complete AI test generator for DarJS in a single session. The AI never saw the HTML. It never read the Nunjucks templates, never inspected a DOM, never ran the app. It generated valid, executable E2E tests from plain English because it only needed to know the model’s field names and the action it was testing. Everything else was already specified by contract.

This didn’t happen because the AI was clever. It happened because DarJS was designed in a way that made it possible. Most apps are not. This is what that design looks like, and why it matters.

What AI-Testability Actually Requires

When you ask an AI to write a test, you’re asking it to produce a series of instructions for navigating a browser. Those instructions must survive contact with a real running application — they must find the right elements, fill the right fields, click the right buttons, and verify the right state.

For AI to produce instructions that work reliably, three things must be true:

First, routes must be derivable from domain concepts. If the URL for “create a new order” is /ui/order/new, and that pattern holds for every model in the system, the AI can construct any URL from a model name. If the URL is /pages/orders/create for orders but /admin/invoices/add for invoices, the AI has to be told every URL individually. It will eventually get one wrong.

Second, form fields must match model keys. If a form input has name="numero_ordre" and that matches the field key in collectFields(), then feeding the AI the model’s field list gives it everything it needs to fill any form. If the input name was chosen by a template designer who thought order-number sounded better, the correspondence breaks. The AI has no way to know that order-number and numero_ordre refer to the same thing.

Third, actions must be named, not implied. A “delete” action should be reachable by a stable contract — a button whose selector can be derived from the action name and the model. Not by its position in a dropdown, not by a CSS class that a designer might rename, not by a hardcoded pixel offset.

DarJS satisfies all three. Most apps satisfy none.

The Anti-Pattern

The standard way web apps grow is that developers pick names for things as they build them. A route gets created, it makes sense at the time, it stays. A field gets named by the person who wrote the template. A button gets a class that made sense in 2022 and hasn’t been changed because the tests don’t cover it anyway.

The result is a codebase where the relationship between your domain model and your DOM is implicit, historical, and only fully understood by the people who built it.

AI cannot learn this. It can be told, piece by piece, what each element is called. It can scrape a running app and infer structure. But both approaches break as soon as the app changes. The AI’s knowledge of the DOM is a snapshot. The DOM is not.

This is why most “AI test generation” products feel unreliable. The bottleneck isn’t the AI’s ability to produce test syntax. The bottleneck is the gap between domain intent and DOM reality, and AI cannot bridge that gap by cleverness alone. It can only bridge it if the gap was closed at design time.

The DarJS Contract Chain

DarJS closes the gap at three layers, and they all connect.

Layer 1: Model fields → form inputs. Every DarJS form is rendered by PageDefRenderer.formContext(). It reads ModelClass.collectFields() and produces one field per model key, with name: col in the output. The form renderer doesn’t choose names — it derives them from the model. So the field list you get from collectFields() is identical to the field list the browser will render. Every time. No divergence possible.
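A minimal sketch of that derivation, assuming collectFields() returns an object keyed by field name. The field definitions (including date_delivrance) and this formContext function are illustrative stand-ins, not the DarJS source.

```javascript
// Hypothetical model: the real collectFields() return shape may differ.
class Ordonnancier {
  static collectFields() {
    return {
      numero_ordre: { type: 'string', required: true },
      date_delivrance: { type: 'date', required: false },
    };
  }
}

// Illustrative stand-in for PageDefRenderer.formContext(): one form field
// per model key, with the input name copied from the key, never chosen.
function formContext(ModelClass) {
  const fields = ModelClass.collectFields();
  return Object.keys(fields).map((col) => ({
    name: col,
    type: fields[col].type,
    required: Boolean(fields[col].required),
  }));
}

const inputNames = formContext(Ordonnancier).map((f) => f.name);
console.log(inputNames);
```

Because the names are computed rather than typed into a template, the list handed to the AI and the list rendered in the browser come from the same call.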

Layer 2: Model name → route. Every DarJS model gets a route via toId(ModelClass.modelName), which converts PascalCase to kebab-case deterministically. Ordonnancier → ordonnancier. PharmacyOrder → pharmacy-order. The router registers /ui/:modelId for every model. The test runner uses the same toId() function. So when the AI writes "model": "Ordonnancier", the runner knows the route is /ui/ordonnancier without being told.
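One way such a converter could look. This is a sketch, not the actual toId() from DarJS, which may treat acronyms or digits differently; routeFor is a hypothetical helper added for illustration.

```javascript
// Sketch of a PascalCase -> kebab-case converter for simple model names.
// A hyphen is inserted wherever a lowercase letter or digit meets a capital.
function toId(modelName) {
  return modelName
    .replace(/([a-z0-9])([A-Z])/g, '$1-$2')
    .toLowerCase();
}

// Hypothetical helper: router and runner both derive the route this way,
// so neither needs to be told the URL.
function routeFor(modelName) {
  return `/ui/${toId(modelName)}`;
}

console.log(routeFor('Ordonnancier'));  // /ui/ordonnancier
console.log(routeFor('PharmacyOrder')); // /ui/pharmacy-order
```

The point of sharing one function is that the mapping cannot drift: a change to the naming rule changes the router and the runner together.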

Layer 3: PageDef → actions. PageDefGenerator.fromModel() produces an actions array: ['create', 'view', 'update', 'delete'], plus 'transition' if the model has a state machine. These are the only actions that exist. The runner maps each test step to a specific selector derived from the route contract. For example, the clickNew step, which starts a create, becomes a[href="/ui/${modelId}/new"]. There is no other interpretation.
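The closed vocabulary can be sketched as a lookup table. Only the clickNew selector is taken from the description above. The clickEdit entry, the stateMachine property used to detect a state machine, and the function names are assumptions for illustration.

```javascript
// Sketch of the action list PageDefGenerator.fromModel() is described as
// producing. Detecting a state machine via a static property is assumed.
function actionsFor(ModelClass) {
  const actions = ['create', 'view', 'update', 'delete'];
  if (ModelClass.stateMachine) actions.push('transition');
  return actions;
}

// Step name -> selector builder. clickNew follows the article; clickEdit is
// a hypothetical entry that follows the same route pattern.
const SELECTORS = {
  clickNew: ({ modelId }) => `a[href="/ui/${modelId}/new"]`,
  clickEdit: ({ modelId, id }) => `a[href="/ui/${modelId}/${id}/edit"]`,
};

// Unknown steps are rejected instead of guessed at.
function selectorFor(step, ctx) {
  const build = SELECTORS[step];
  if (!build) throw new Error(`Unknown step "${step}"`);
  return build(ctx);
}

class Ordonnancier {}
class PharmacyOrder {
  static stateMachine = { draft: ['submitted'] }; // hypothetical states
}

console.log(actionsFor(PharmacyOrder));
console.log(selectorFor('clickNew', { modelId: 'ordonnancier' }));
```

A table like this keeps the vocabulary finite: a step the table does not know is an error, not an invitation to improvise a selector.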

Each layer derives from the previous. The model is the source of truth. The DOM is a rendering of it.

What This Design Enables

When the runner loads a test scenario, the first thing it does is call ModelClass.collectFields() and validate every key in every fillForm step against the real field list. Unknown keys throw UnknownFieldError, which carries the valid field list, before Playwright ever launches. This check is possible only because the runner has access to the same metadata the renderer uses.
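A sketch of that pre-flight check. The error name comes from the description above, but its constructor, the steps array in the scenario, and the model's field list are assumptions made for the example.

```javascript
// Hypothetical model with an assumed collectFields() shape.
class Ordonnancier {
  static collectFields() {
    return { numero_ordre: { type: 'string' }, date_delivrance: { type: 'date' } };
  }
}

// The constructor signature is an assumption, not the DarJS definition.
class UnknownFieldError extends Error {
  constructor(key, validKeys) {
    super(`Unknown field "${key}". Valid fields: ${validKeys.join(', ')}`);
    this.name = 'UnknownFieldError';
    this.key = key;
    this.validKeys = validKeys;
  }
}

// Check every fillForm key against the model's field list. No browser is
// involved, so a typo is reported before any page is opened.
function validateScenario(scenario, ModelClass) {
  const validKeys = Object.keys(ModelClass.collectFields());
  for (const step of scenario.steps) {
    if (step.action !== 'fillForm') continue;
    for (const key of Object.keys(step.data)) {
      if (!validKeys.includes(key)) throw new UnknownFieldError(key, validKeys);
    }
  }
}

// A scenario that uses a template-style name instead of the model key.
const badScenario = {
  model: 'Ordonnancier',
  steps: [{ action: 'fillForm', data: { 'order-number': 'REG-001' } }],
};

let caught = null;
try {
  validateScenario(badScenario, Ordonnancier);
} catch (err) {
  caught = err;
}
console.log(caught.message);
```

Listing the valid keys in the message matters for the AI workflow: the error can be pasted straight back into the prompt as a correction.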

The AI prompt template doesn’t contain HTML. It contains the action vocabulary and a space to paste the output of collectFields(). The developer gives the AI domain context; the runner handles the browser. The AI never needs to know what the DOM looks like.
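To make the shape of such a template concrete, here is a hedged sketch of a prompt builder. The wording of the prompt, the buildPrompt name, and the submit step are invented for illustration. Only clickNew, fillForm, and the use of collectFields() output come from the description above.

```javascript
// Step vocabulary offered to the AI. "submit" is a hypothetical addition.
const STEP_VOCABULARY = ['clickNew', 'fillForm', 'submit'];

// Assemble a prompt from domain context only: no HTML, no selectors.
function buildPrompt(modelName, fields, workflow) {
  return [
    'You write E2E test scenarios as JSON. Do not reference HTML or selectors.',
    `Allowed step actions: ${STEP_VOCABULARY.join(', ')}`,
    `Model: ${modelName}`,
    'Fields (output of collectFields()):',
    JSON.stringify(fields, null, 2),
    'Workflow to test:',
    workflow,
  ].join('\n');
}

const prompt = buildPrompt(
  'Ordonnancier',
  { numero_ordre: { type: 'string', required: true } }, // assumed field shape
  'Create a new entry with order number REG-001.'
);
console.log(prompt);
```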

When DarJS gets a new model, the tests for it can be generated from scratch by an AI that has never seen the app run. The developer writes the model, runs collectFields(), pastes the output into the prompt, and describes the workflows in plain English. The AI produces a JSON test file. The runner executes it. No template reading required on either side.

The Practical Checklist

If you want to apply this to a non-DarJS app, the questions to ask are:

Can you derive every route from the model name algorithmically? If not, the AI will need a route map, which you’ll maintain manually.

Do your form field name attributes match your model field keys? If the template designer chose different names, you need an explicit mapping layer.

Are your action elements findable by a stable semantic attribute rather than CSS classes or position? data-action="create" survives a redesign. .btn-primary:first-child does not.

Can you query your model metadata at runtime and use it to validate test inputs? If yes, you can fail fast before the browser opens. If no, a mistyped field name surfaces only as a selector timeout in the middle of a run.

Four questions. A yes to all four means the AI can write reliable tests for your app today.
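For an app that answers no to the second or third question, the gap can be narrowed with a small adapter. The sketch below assumes a legacy template that named the input order-number, and assumes action elements carry data-action plus a data-model attribute. The map, the data-model attribute, and both helper names are hypothetical.

```javascript
// Explicit mapping layer: model key -> name attribute used by the template.
// Only keys whose DOM name diverges need an entry.
const FIELD_NAME_MAP = {
  numero_ordre: 'order-number',
};

// Resolve a model field key to an input selector, falling back to the key
// itself when the template already matches the model.
function inputSelector(fieldKey) {
  const domName = FIELD_NAME_MAP[fieldKey] || fieldKey;
  return `input[name="${domName}"]`;
}

// Stable semantic selector for an action element. data-action comes from
// the checklist above; data-model is an assumed companion attribute.
function actionSelector(action, modelId) {
  return `[data-action="${action}"][data-model="${modelId}"]`;
}

console.log(inputSelector('numero_ordre'));
console.log(actionSelector('create', 'order'));
```

The map is the manual maintenance cost the checklist warns about: every entry in it is a place where the template and the model disagree.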

What You Get

The test files generated by this system describe intent, not mechanics. "action": "fillForm", "data": { "numero_ordre": "REG-001" } reads like a specification. It’s also exactly what runs. There’s no translation layer where intent gets lost in selector soup.
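A sketch of how an intent-level scenario could expand into browser-level operations. The model, action, and data keys appear above. The steps wrapper, the op record format, the compile function, and this simplified toId are assumptions for the example.

```javascript
// A scenario as the AI might emit it (parsed from the JSON test file).
const scenario = {
  model: 'Ordonnancier',
  steps: [
    { action: 'clickNew' },
    { action: 'fillForm', data: { numero_ordre: 'REG-001' } },
  ],
};

// Simplified PascalCase -> kebab-case conversion, standing in for toId().
function toId(modelName) {
  return modelName.replace(/([a-z0-9])([A-Z])/g, '$1-$2').toLowerCase();
}

// Expand each step into plain operation records. Every selector is derived
// from the model name or a field key; none is read from a template.
function compile(input) {
  const modelId = toId(input.model);
  const ops = [{ op: 'goto', url: `/ui/${modelId}` }];
  for (const step of input.steps) {
    if (step.action === 'clickNew') {
      ops.push({ op: 'click', selector: `a[href="/ui/${modelId}/new"]` });
    } else if (step.action === 'fillForm') {
      for (const [key, value] of Object.entries(step.data)) {
        ops.push({ op: 'fill', selector: `input[name="${key}"]`, value });
      }
    } else {
      throw new Error(`Unsupported action "${step.action}"`);
    }
  }
  return ops;
}

const ops = compile(scenario);
console.log(ops);
```

In a Playwright-based runner, each record would correspond to a call such as page.goto, page.click, or page.fill, which keeps the scenario file free of selectors.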

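When a model field is renamed, you update the model. The renderer and the runner both pick up the new name from collectFields() automatically, so the form and the validator never disagree. A test that still uses the old key fails immediately with UnknownFieldError and the list of valid fields, so the fix is a one-word change to the test data, not a hunt through selectors.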

When the layout changes — new template, redesigned form — the tests don’t break. The form still has input[name="${fieldKey}"] because that’s a DarJS renderer contract, not a template choice.

The AI isn’t clever here. The app is honest. That’s the thing to build.
