The DSL Layer Between AI and Your App

The first version of the DarJS test architecture had the AI writing raw Playwright.

The calls looked like page.click('a[href*="/new"]'), page.fill('input[name="cin_patient"]', 'AB123456'), and page.waitForSelector('.record-id:has-text("REG-001")'). It worked until a template changed. Then it broke silently: tests passed while the wrong elements were clicked and the wrong state was verified.

The second version has the AI writing { "action": "clickNew" }. That cannot break unless DarJS stops having a create button. And if DarJS stops having a create button, the test should break, because the workflow being tested no longer exists.

The difference between these two versions is a DSL: a thin layer of structured JSON that sits between what the AI generates and what the browser executes. This post covers the pattern, why it works, and where it applies.

The Wrong Target

Ask an AI to write Playwright and it will write Playwright. The code will look plausible. It will even run the first time. The problem is what happens after.

Playwright code contains selectors. Selectors are references to the current state of your HTML — specific elements, in a specific place, with specific attributes chosen by whoever wrote the template. They encode assumptions that may be true now and wrong next sprint. The template designer adds a wrapper div. The redesign changes a class name. The HTMX upgrade changes which element triggers the action. None of these changes break the feature. They all break the selector.

AI cannot know when a selector is stale. It generated the selector from a description of the app, not from live HTML. When you ask it to fix a broken test, it generates a new selector from the same description — which may already be out of date. You’re in a loop where the tests never quite keep up.

The failure mode is particularly bad for confidence. The test runs. It succeeds. The thing it was testing has changed, but the AI couldn’t know, so the test didn’t change with it. You have green tests and a broken feature. This is worse than no tests.

Three Layers, Not One

The right architecture separates intent, structure, and mechanics.

Intent is natural language. “Create a new ordonnancier with numero_ordre REG-001 and verify it appears as pending.” This is what the developer writes or describes.

Structure is the semantic JSON the AI generates. { "action": "fillForm", "data": { "numero_ordre": "REG-001" } }. This encodes the intent in a form that a program can interpret, using domain vocabulary, not browser vocabulary.

Mechanics is what the runner does. It reads the structure, looks up ModelClass.collectFields() to validate the field name, derives the selector input[name="numero_ordre"] from the field key, and executes the Playwright call. The runner knows your app. The AI does not.
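As a rough sketch of that split, the runner can be read as a function from a structured step to a list of browser calls. The names here (Step, BrowserCall, translate, and passing the field list in as an argument) are hypothetical illustrations, not the actual DarJS runner; only the two selector shapes are taken from this article. A real runner would hand each resulting call to Playwright.

```typescript
// Structure layer: what the AI emits. Domain vocabulary only.
type Step =
  | { action: "clickNew" }
  | { action: "fillForm"; data: Record<string, string> };

// Mechanics layer: browser-level calls the runner will make.
type BrowserCall =
  | { op: "click"; selector: string }
  | { op: "fill"; selector: string; value: string };

// Hypothetical translator. `modelId` and `fields` stand in for what a real
// runner would read from the model (for example via ModelClass.collectFields()).
function translate(step: Step, modelId: string, fields: string[]): BrowserCall[] {
  if (step.action === "clickNew") {
    return [{ op: "click", selector: `a[href="/ui/${modelId}/new"]` }];
  }
  const calls: BrowserCall[] = [];
  for (const field of Object.keys(step.data)) {
    // Reject field names the model does not declare.
    if (fields.indexOf(field) === -1) {
      throw new Error(`Unknown field "${field}" on model ${modelId}`);
    }
    calls.push({
      op: "fill",
      selector: `input[name="${field}"]`,
      value: step.data[field],
    });
  }
  return calls;
}

const fillCalls = translate(
  { action: "fillForm", data: { numero_ordre: "REG-001" } },
  "ordonnancier",
  ["numero_ordre"],
);
// fillCalls: [{ op: "fill", selector: 'input[name="numero_ordre"]', value: "REG-001" }]
```

The step carries no selector at all; the selector only comes into existence inside translate.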

The key insight: the AI is responsible for the structure layer, not the mechanics layer. The runner owns mechanics entirely. The AI never touches the DOM, never learns a selector, and is never wrong about one.

What Makes a DSL Good for AI

A DSL that AI will reliably generate correctly has specific properties.

Narrow vocabulary. If there are 50 possible action types, the AI is likely to invent new ones, use wrong ones, and combine them incorrectly. If there are about a dozen, it rarely does. The DarJS test schema has 14 actions, few enough for an AI to keep fully in context and use consistently. Each action maps to one concept; no two actions mean the same thing.
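One way to make a narrow vocabulary enforceable is to keep it as a closed list and reject anything outside it. The sketch below is an assumption about how that could look, not the DarJS schema itself. It lists only the three action names that appear in this article, and the helper names (KNOWN_ACTIONS, isKnownAction, findInventedActions) are made up for illustration.

```typescript
// Only three of the 14 action names appear in this article; the rest of the
// list is deliberately left out here rather than guessed.
const KNOWN_ACTIONS = ["clickNew", "fillForm", "assertRecord"] as const;
type ActionName = (typeof KNOWN_ACTIONS)[number];

// Type guard: narrows an arbitrary string to the closed action vocabulary.
function isKnownAction(candidate: string): candidate is ActionName {
  return (KNOWN_ACTIONS as readonly string[]).indexOf(candidate) !== -1;
}

// Returns the action names in AI output that fall outside the vocabulary.
function findInventedActions(steps: { action: string }[]): string[] {
  return steps
    .map((step) => step.action)
    .filter((action) => !isKnownAction(action));
}
```

With a list this short, an invented action such as clickButton is caught by a single membership check before anything runs.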

Domain nouns, not browser nouns. clickNew instead of click button. fillForm instead of type into input. assertRecord instead of check text exists. Browser nouns change with your HTML. Domain nouns change with your product. Your product changes much more slowly than your HTML.

Data as domain data. "data": { "numero_ordre": "REG-001" } — a model field name and a value. Not "selector": "#order-number-input", "value": "REG-001". The AI knows your model fields because you gave them to the prompt. It does not know your input IDs, which you never gave it and which may have changed since you wrote the prompt.

Assertions at the semantic level. assertRecord says: this field should have this value on the current page. It doesn’t say where the value appears or how it’s formatted. The runner derives that. If detail pages change their layout, the assertion still works. If you assert against a specific CSS selector, it works until the layout changes.
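To make the idea concrete, here is a minimal sketch of a semantic assertion check. It assumes assertRecord carries a data object of field names and expected values, which this article does not spell out. It also assumes the runner supplies a function that knows how to read a field from the current page. AssertRecord, FieldReader, and checkRecord are hypothetical names.

```typescript
// A semantic assertion: field name and expected value, nothing about layout.
// The `data` shape is an assumption modelled on fillForm.
interface AssertRecord {
  action: "assertRecord";
  data: Record<string, string>;
}

// Supplied by the runner, which alone knows how a field is located on a page.
type FieldReader = (field: string) => string | undefined;

// Compares expected values with what the runner reads; returns mismatch messages.
function checkRecord(step: AssertRecord, readField: FieldReader): string[] {
  const problems: string[] = [];
  for (const field of Object.keys(step.data)) {
    const expected = step.data[field];
    const actual = readField(field);
    if (actual === undefined) {
      problems.push(`Field "${field}" is not shown on the current page`);
    } else if (actual.trim() !== expected) {
      problems.push(`Field "${field}": expected "${expected}", got "${actual.trim()}"`);
    }
  }
  return problems;
}
```

If the detail page layout changes, only the FieldReader implementation inside the runner needs to follow; the assertion in the scenario stays as it was.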

The Runner as a Translation Layer

The runner is the load-bearing part of this system. It knows your domain, your routes, your field contracts. The AI does not need to.

The runner has three responsibilities.

Validate before running. Before Playwright opens a browser, the runner calls ModelClass.collectFields() for every model referenced in the scenario and checks every fillForm.data key against the field list. Unknown keys fail immediately with the valid field list in the error message. This catches AI hallucinations — invented field names, misspelled keys, fields that existed in an older version of the model — before they become confusing browser errors.
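A pre-flight check along these lines might look like the following sketch. The Scenario shape and the validateScenario function are assumptions for illustration, and the collectFields parameter is a stand-in for ModelClass.collectFields(), whose real signature is not shown in this article.

```typescript
// Hypothetical scenario shape: one model name plus its steps.
interface Scenario {
  model: string;
  steps: { action: string; data?: Record<string, string> }[];
}

// Stand-in for ModelClass.collectFields(); a real runner would ask the model.
type FieldLookup = (model: string) => string[];

// Checks every fillForm.data key before any browser is launched.
function validateScenario(scenario: Scenario, collectFields: FieldLookup): string[] {
  const valid = collectFields(scenario.model);
  const errors: string[] = [];
  scenario.steps.forEach((step, index) => {
    if (step.action !== "fillForm" || !step.data) return;
    for (const key of Object.keys(step.data)) {
      if (valid.indexOf(key) === -1) {
        // Include the valid field list so the fix is obvious from the error.
        errors.push(
          `Step ${index + 1}: unknown field "${key}" on ${scenario.model}. ` +
            `Valid fields: ${valid.join(", ")}`,
        );
      }
    }
  });
  return errors;
}
```

Returning every error at once, rather than stopping at the first, lets a whole AI-generated scenario be corrected in one pass.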

Derive, don’t hardcode. Every selector in the runner is computed from domain data, not written by hand. The route for “Ordonnancier list” is always /ui/${toId('Ordonnancier')}. The input selector for numero_ordre is always input[name="numero_ordre"]. The create button is always a[href="/ui/${modelId}/new"]. These derivations encode the contract between your domain model and your DOM. When the contract holds, the runner works. When the contract breaks, that’s a framework bug, not a test maintenance task.
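The derivations above are small enough to write down as pure functions. In this sketch the body of toId is an assumption: the article's example maps the model Ordonnancier to /ui/ordonnancier/new, so plain lowercasing is used, but the real toId may do more. The helper names listRoute, createButton, and fieldInput are hypothetical.

```typescript
// Assumed behaviour: lowercasing only, inferred from the article's
// "Ordonnancier" -> "/ui/ordonnancier/new" example.
function toId(modelName: string): string {
  return modelName.toLowerCase();
}

// Route of the list page for a model.
const listRoute = (model: string): string => `/ui/${toId(model)}`;

// Selector for the create button on that list page.
const createButton = (model: string): string => `a[href="/ui/${toId(model)}/new"]`;

// Selector for the form input bound to a model field.
const fieldInput = (field: string): string => `input[name="${field}"]`;
```

Because every selector flows through a handful of functions like these, a change to the DOM contract is a change in one place rather than in every test.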

Fail loudly at the boundary. When a step fails, the runner reports which action failed, what selector it tried, and what contract it derived that selector from. Not “Element not found”, but “Selector a[href="/ui/ordonnancier/new"] not found: expected create button for model Ordonnancier”. The runner knows enough about the domain to give the developer a useful error.
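One possible way to get errors of that shape is to attach the domain context before each step runs and wrap any low-level failure with it. This is a sketch under assumptions: StepContext, describeFailure, and withContext are hypothetical names, and it is synchronous for brevity where a real Playwright runner would be async.

```typescript
// Domain context recorded for a step before it touches the browser.
interface StepContext {
  action: string;   // which DSL action is running
  selector: string; // the selector the runner derived
  contract: string; // what that selector is expected to be
}

// Formats a failure message that names the selector and its contract.
function describeFailure(context: StepContext, cause: unknown): string {
  const detail = cause instanceof Error ? cause.message : String(cause);
  return (
    `Selector ${context.selector} not found: expected ${context.contract} ` +
    `(action ${context.action}; underlying error: ${detail})`
  );
}

// Runs a step and rethrows any low-level error with the domain context added.
function withContext<T>(context: StepContext, run: () => T): T {
  try {
    return run();
  } catch (cause) {
    throw new Error(describeFailure(context, cause));
  }
}
```

The underlying browser error is kept in the message, so nothing is lost by adding the domain-level explanation on top.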

Where This Pattern Transfers

The pattern isn’t DarJS-specific. It applies whenever:

You have repetitive workflows that share a common structure. CRUD apps, admin panels, CMS interfaces, order management systems, booking systems — anywhere the same operations appear across many entities. The DSL captures the shared shape; the runner parameterizes it.

Your selectors are fragile relative to your domain stability. If your HTML changes every sprint but your domain model hasn’t changed in months, the selector is the wrong thing to put in the test.

You want non-engineers to write tests, or to describe tests. A JSON schema with 14 clearly named actions is something a product manager can read and verify. Raw Playwright is not. The DSL is your shared language between the person who knows what the test should do and the system that executes it.

What AI Is For in This Architecture

AI’s role is translation, not generation. It translates a workflow description into a structured JSON document using a fixed vocabulary. This is something AI does reliably, because the output space is constrained. There are 14 valid actions. The field names are given in the prompt. The model names are given in the prompt. The AI’s job is to put the right things in the right places.
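Constraining the output space mostly comes down to what goes into the prompt. The following is a hypothetical sketch of a prompt builder, not the prompt DarJS actually uses: ModelSpec, buildPrompt, and the wording of the instructions are all assumptions. It only illustrates that the action list, model names, and field names are handed to the AI rather than left for it to guess.

```typescript
// A model as the prompt needs to see it: a name and its field names.
interface ModelSpec {
  name: string;
  fields: string[];
}

// Builds a prompt that states the full vocabulary the AI may use.
function buildPrompt(workflow: string, actions: string[], models: ModelSpec[]): string {
  const modelLines = models.map((m) => `- ${m.name}: ${m.fields.join(", ")}`);
  const header = [
    "Translate the workflow into a JSON array of steps.",
    `Allowed actions (use no others): ${actions.join(", ")}`,
    "Models and their fields (use no other field names):",
  ];
  return header.concat(modelLines, [`Workflow: ${workflow}`]).join("\n");
}
```

Anything the AI returns can then be checked against the same lists that were used to build the prompt.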

The creative work — designing the DSL, writing the runner, defining which fields exist on which models — is done by the developer. The mechanical work — filling in the JSON for the 40 scenarios that cover the pharmacy’s ordonnancier workflow — is done by the AI.

That’s the right division. The developer provides domain knowledge and system contracts. The AI provides speed on repetitive translation tasks. Neither tries to do the other’s job.

The DSL is the boundary between them. Design it well and both sides of that boundary become much simpler.
