Accessibility Engineering · May 09, 2026

Making AccelaStudy AI Accessible to All

Accessibility is the kind of work that gets pushed to the next sprint until something forces the issue. The forcing function for AccelaStudy AI was the launch window. We had fifty-six user-facing repositories across consumer apps, enterprise tools, internal back-office surfaces, and marketing sites, and we had a brand promise that adaptive learning works for every learner. Every learner means every learner: a screen-reader user, a switch-device user, a keyboard-only user, a person magnifying the screen to 200%. None of those people care that we built a beautiful adaptive engine if the buttons aren't reachable, the inputs aren't labeled, or the focus ring is invisible.

This article is the autopsy of one day's accessibility work. The numbers are real. The patterns are reusable. If you have a JavaScript codebase north of a thousand files and you've been treating WCAG as a checklist instead of as engineering, this is the playbook.

What "fully accessible" actually means

Plenty of apps claim to be accessible. Far fewer pass an actual audit. Almost none have an audit that runs in CI and refuses to ship regressions. The difference matters. WCAG 2.1 Level AA is a moving target made up of about fifty success criteria; "we tested with VoiceOver once" is not a strategy for staying compliant as a codebase grows.

I draw a hard line between three states a codebase can be in:

| State | What it means | What it requires |
| --- | --- | --- |
| Aspirationally accessible | Some components are good; some aren't; nobody knows which | Hope and a styleguide |
| Audited accessible | A scan happened once and findings got fixed | Discipline for a sprint |
| Continuously accessible | Every commit is gated by a deterministic check; regressions can't merge | Engineering, scripts, codemods |

We were aspirationally accessible. We're now continuously accessible. That transition required three things: a script that catches regressions deterministically, a codebase pattern library that makes the right thing easy, and a set of codemods that did one-time mass fixes without humans touching three thousand files by hand.

The shape of the problem

AccelaStudy AI is one product in a fleet. The fleet shares a design system, a console-simulator library, an activities library, an authentication shell, and a layer of UI primitives. Around that core sit the consumer apps (avian-app-web, avian-app-electron), the recruiter and enterprise surfaces, the internal tools (twenty-one of them, from a Kanban board to a calendar to a unified observability dashboard), and the marketing websites. Fifty-six repositories, all in one monorepo, all needing to pass the same accessibility bar.

After running the first deterministic audit pass against fresh main, the script reported:

| Severity | Findings |
| --- | --- |
| CRITICAL | 0 |
| HIGH | 414 |
| MEDIUM | 312 |
| LOW | 0 |
| Total | 726 |

Seven hundred and twenty-six issues clustered into a small number of patterns. Seventy-one percent of the HIGH findings were in two repositories: the cloud-console simulator (avian-console-sim-react, used by labs across forty-five certifications) and a static-site dist/ folder that hadn't been rebuilt in a month. Once I clustered the findings by rule and repository, the path to zero became obvious.

The audit script

The first thing I built was the auditor itself. The premise: anything mechanical should be deterministic. Color contrast and focus-trap correctness need eyes; missing alt attributes do not.

The script is avian-audits/scripts/accessibility_audit.py, stdlib-only Python, about seven hundred lines. It implements sixteen rules covering the mechanical phases of WCAG 2.1 AA:

| Rule | Severity | WCAG | What it catches |
| --- | --- | --- | --- |
| img-missing-alt | HIGH | 1.1.1 | <img> without alt= |
| svg-info-no-aria | HIGH | 1.1.1 | Standalone informational <svg> lacking role="img" + aria-label (and not inside a labelled control) |
| canvas-no-alt | HIGH | 1.1.1 | <canvas> without aria-label/aria-labelledby/aria-hidden |
| input-no-accessible-name | HIGH | 1.3.1 | Form input with no label / aria-label / aria-labelledby / wrapping label |
| input-adjacent-label-no-htmlfor | MEDIUM | 1.3.1 | Sibling <label> exists but no htmlFor/id link |
| multiple-h1 | MEDIUM | 1.3.1 | More than one <h1> per page-tree component |
| missing-main-landmark | HIGH | 1.3.1 | Repo with no <main> anywhere |
| missing-skip-link | HIGH | 2.4.1 | Repo with a layout but no skip-to-main-content link |
| html-no-lang | HIGH | 3.1.1 | HTML file with no lang on <html> |
| generic-link-text | MEDIUM | 2.4.4 | "click here" / "learn more" inside <a> / <Link> |
| clickable-non-interactive | HIGH | 2.1.1 | <div>/<span>/<li> with onClick lacking role + tabIndex + onKeyDown |
| aria-hidden-focusable | HIGH | 4.1.2 | Focusable element with aria-hidden="true" and no tabIndex={-1} |
| positive-tabindex | MEDIUM | 2.4.3 | tabIndex={N>0} |
| outline-none-no-replacement | HIGH | 2.4.7 | CSS outline:none on :focus with no replacement; Tailwind focus(-visible):outline-none without a ring/outline/border replacement |
| no-reduced-motion-guard | MEDIUM | 2.3.3 | Repo defines @keyframes but no CSS file references prefers-reduced-motion |
| missing-sr-only-utility | MEDIUM | 1.3.1 | Repo never references a sr-only / visually-hidden utility |

The script auto-discovers UI-bearing repositories by walking the conventional monorepo parents (clients/, libs/, tools/, websites/, services/, automations/) and identifying any directory that contains TSX, JSX, or HTML files. It honors a small exemption list for repos with no UI surface (server-only services, layout-engine libraries, design archives). Test files, node_modules/, dist/, build/, coverage/, htmlcov/, and per-repo docs/ and examples/ paths are skipped.
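
For concreteness, here's a minimal sketch of that discovery walk. The parent directories, skip list, and UI file suffixes are the ones named above; the function name and the contents of the EXEMPT set are illustrative, and the test-file filtering is omitted.

from pathlib import Path

PARENTS = ["clients", "libs", "tools", "websites", "services", "automations"]
SKIP = {"node_modules", "dist", "build", "coverage", "htmlcov", "docs", "examples"}
UI_SUFFIXES = {".tsx", ".jsx", ".html"}
EXEMPT = {"avian-layout-engine"}  # hypothetical entry: repos with no UI surface

def ui_repos(root: Path):
    # Yield every child of the conventional monorepo parents that
    # contains at least one UI-bearing file outside the skip dirs.
    for parent in PARENTS:
        base = root / parent
        if not base.is_dir():
            continue
        for repo in sorted(p for p in base.iterdir() if p.is_dir()):
            if repo.name in EXEMPT:
                continue
            has_ui = any(
                f.suffix in UI_SUFFIXES and not SKIP.intersection(f.parts)
                for f in repo.rglob("*")
            )
            if has_ui:
                yield repo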

The clever part is the JSX tokenizer. Every JSX-aware tool I tried (regex, tree-sitter, Babel) had a tradeoff. Regex is fast but breaks on onClick={() => x > 0} because the inner > is read as a tag close. Babel is correct but slow and brings a parser dependency I didn't want in an audit script. So I wrote a small character-by-character walker that tracks brace depth and string state: when it sees <, it walks forward, ignoring > inside {...} and string literals, until it finds the real closing > of the opening tag. About eighty lines. Every rule checks attributes via this walker, which means the script reads JSX correctly the first time on every run.
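
A condensed version of that walker, to make the idea concrete. The names are mine; the real implementation is about eighty lines and also handles JSX comments and other edge cases.

def opening_tag(src: str, lt: int) -> str:
    """Return the full opening tag starting at src[lt] == '<'."""
    depth = 0      # {...} expression nesting depth
    quote = None   # set to ', ", or ` while inside a string literal
    i = lt + 1
    while i < len(src):
        ch = src[i]
        if quote:
            if ch == "\\":
                i += 1             # skip the escaped character
            elif ch == quote:
                quote = None       # string closed
        elif ch in "'\"`":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return src[lt : i + 1]   # the real tag close
        i += 1
    return src[lt:]                  # unterminated tag; the caller flags it

On <div onClick={() => x > 0}>, both > characters inside the braces sit at depth 1 and are skipped; the walker returns the whole opening tag.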

The script writes two artifacts:

  • avian-audits/reports/accessibility-audit-report-YYYY-MM-DD.md — Markdown summary with by-repo and by-rule tables
  • avian-audits/reports/accessibility-audit-findings-YYYY-MM-DD.json — machine-readable findings for CI, dashboards, diffing across runs

In Mode 1 (the default), it exits with code 2 if any HIGH or CRITICAL findings remain. That's the gate.
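
Sketched, the gate is nothing more than this (assuming the findings JSON is a list of objects with a severity field, which the article doesn't spell out):

import json, sys
from pathlib import Path

# Load the most recent findings artifact and fail CI on blocking severities.
latest = sorted(Path("avian-audits/reports").glob("accessibility-audit-findings-*.json"))[-1]
findings = json.loads(latest.read_text())
blocking = [f for f in findings if f["severity"] in ("HIGH", "CRITICAL")]
if blocking:
    print(f"{len(blocking)} blocking findings -- see {latest}")
    sys.exit(2)   # Mode 1's gate: any non-zero exit blocks the merge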

Heuristics worth their weight

Determinism is great until your script flags fifty true positives and twelve false ones, and the false-positive review burns more time than the fixes. Three heuristic refinements made the audit usable in practice (a sketch of the suppression logic follows the list):

  1. Skip aria-hidden inputs and divs. An input with aria-hidden="true" is by definition not in the accessibility tree. Honeypot inputs, autofill catchers, hidden file pickers triggered by a styled button — all of these legitimately omit aria-label. Same logic for <div onClick aria-hidden="true"> modal backdrops.
  2. Skip spread-prop forwarders. A generic component like <Input ref={ref} {...props} /> delegates aria-label to the consumer. The wrapper itself can't statically declare a label. The script checks for {...spread} syntax and exempts the element.
  3. Recognize the conditional-attribute pattern. A common React idiom is <div role={cond ? 'button' : undefined} tabIndex={cond ? 0 : undefined} onKeyDown={cond ? handler : undefined}>. Statically, we can't verify the conditions match, but the developer is clearly aware of the requirement. When all three attributes appear as JSX expressions, the script treats the element as authored-correctly. (The original implementation had a bug: role was already brace-stripped before this check ran. Took me a few minutes to notice the heuristic was a no-op.)
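
A sketch of those three suppressions, assuming the tag has already been parsed into an attribute map whose values keep their original quotes and braces (the helper name and the dict shape are mine):

def suppressed(attrs: dict[str, str], raw_tag: str) -> bool:
    # 1. aria-hidden="true": the element is out of the accessibility
    #    tree entirely, so labelling rules don't apply.
    if "true" in attrs.get("aria-hidden", ""):
        return True
    # 2. {...props} spread: the wrapper delegates aria-label to its consumer.
    if "{..." in raw_tag:
        return True
    # 3. Conditional-attribute idiom: role, tabIndex, and onKeyDown all
    #    authored as JSX expressions. Test the raw, brace-wrapped values;
    #    stripping the braces before this check is the no-op bug noted above.
    if all(attrs.get(k, "").startswith("{") for k in ("role", "tabIndex", "onKeyDown")):
        return True
    return False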

These three suppressions cleared dozens of confirmed false positives without weakening real-defect detection. Every suppression is documented inline in the script alongside the rule it modifies, so a future engineer reading accessibility_audit.py understands not just what's checked but what's deliberately not checked.

The fix waves

I ran the audit, looked at the clustering, and built a wave plan. Each wave targeted a class of fix, not a repository. That ordering matters: fixing the design-system primitives first means downstream consumers inherit the fixes, and a good codemod beats a hundred manual edits.

| Stage | Result |
| --- | --- |
| Initial audit | 414 HIGH, 312 MEDIUM |
| Wave 1: activities-react | 5 fixes |
| Wave 2: console-sim codemod | 2,235 input/label pairings + 8 primitives |
| Wave 3: client apps subagent | 59 fixes |
| Wave 4: tools fleet subagent | 136 fixes across 21 tools |
| Wave 5: heuristic refinements | 17 manual cleanup fixes |
| Wave 6: cloudops dist sed | 71 stale dist HTML fixes |
| Wave 7: console-sim views subagent | 118 view-level fixes |
| Wave 8: final stragglers | 2 fixes |
| Wave 9: MEDIUM cleanup | 676 fixes |
| Final state | 0 findings at every severity |
The nine-wave fix sequence: 414 HIGH findings to zero across all severities

The biggest single intervention was a Python codemod. The console-sim repository had a stereotypical pattern across 257 dashboard files: <label>Field name</label> <input ... />, with no htmlFor/id linkage. Visually, the labels lined up with the inputs. Programmatically, no screen reader knew the input's name. A codemod that walks the JSX, finds adjacent <label> and input pairs, generates a stable id from the input's data-testid (or a slug of the label text if no testid), and rewrites both the label and the input to carry the linkage took about two hundred lines and fixed 2,235 inputs in eight seconds. That's three orders of magnitude faster than the equivalent human pass and one order of magnitude more reliable.
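
A compressed sketch of that codemod's core. The real version walks tags with the brace-aware tokenizer and preserves indentation; the bare [^>]* regex below is exactly the kind of shortcut that breaks on arrow-function attribute values, so treat it as the shape, not the implementation.

import re

PAIR = re.compile(r"<label>(?P<text>[^<{]+)</label>(?P<ws>\s*)<input(?P<attrs>[^>]*)/>")

def slug(text: str) -> str:
    # "Device name" -> "device-name"
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def link(m: re.Match) -> str:
    # Prefer a stable id from data-testid; fall back to a label-text slug.
    testid = re.search(r'data-testid="([^"]+)"', m["attrs"])
    fid = testid.group(1) if testid else slug(m["text"])
    return (f'<label htmlFor="{fid}">{m["text"]}</label>{m["ws"]}'
            f'<input id="{fid}"{m["attrs"]}/>')

def fix(source: str) -> str:
    return PAIR.sub(link, source)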

A second codemod handled <h1> proliferation. Pages in the simulator render multiple panels, each with its own page-level heading. At the source level, that means multiple <h1> per file; at runtime, only one panel renders at a time. The codemod kept the first <h1> per file and demoted the rest to <h2>: 594 demotions across 241 files.
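
The demotion pass is small enough to sketch in full; the one subtlety is keeping the first <h1>'s closing tag intact before rewriting the rest. Matching on the bare <h1 prefix is a simplification of the real tag-aware walk.

def demote_extra_h1(source: str) -> str:
    close = source.find("</h1>")
    if close == -1:
        return source                  # zero or unclosed <h1>: leave alone
    cut = close + len("</h1>")         # keep the first <h1>...</h1> intact
    head, tail = source[:cut], source[cut:]
    # Rewrite every later heading, attributes preserved:
    # <h1 className=...> becomes <h2 className=...>
    tail = tail.replace("<h1", "<h2").replace("</h1>", "</h2>")
    return head + tail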

A third codemod added aria-label to inputs that had a data-testid but no preceding sibling label, deriving the label from the testid (e.g., net-add-device-name becomes "Device name"). I had to rewrite this one twice; the first version's regex broke on JSX attribute values containing arrow functions. Brace-aware tokenizers earn their keep.
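
The derivation itself is a few lines. The prefix list below is my guess generalized from the article's single example; the real script's stripping rules may be richer.

PREFIXES = ("net-add-", "net-")   # hypothetical namespace/action prefixes

def label_from_testid(testid: str) -> str:
    for p in PREFIXES:
        if testid.startswith(p):
            testid = testid[len(p):]
            break
    # "device-name" -> "Device name"
    return testid.replace("-", " ").capitalize()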

The waves outside the codemods went to subagents. A subagent is just an LLM session with a specific task, scoped to a list of files, with the audit findings JSON as input. I dispatched two in parallel (one for client apps, one for the tools fleet) and they came back with 195 fixes between them. The agents apply the fixes; I review the patterns and the typecheck output. The pattern review caught three confirmed false positives in client apps that the script's heuristics didn't yet suppress, which became the basis for Wave 5's heuristic refinements.

What's actually in the codebase now

A snapshot of the AccelaStudy AI surface area, post-audit:

| Metric | Count |
| --- | --- |
| UI-bearing repositories | 56 |
| TSX/JSX source files (production) | 2,185 |
| Native <button> elements | 2,527 |
| Form inputs (input / select / textarea) | 3,486 |
| <a href> links | thousands (uncounted; per-repo) |
| Total ARIA attribute uses | 3,019 |
| Explicit aria-label uses | 1,530 |
| aria-hidden uses (decorative) | 752 |
| Total role= uses | 1,001 |
| role="button" (custom clickable elements) | 101 |
| role="img" (canvas/SVG with text alternative) | 112 |
| role="dialog" (modal containers) | 109 |
| tabIndex uses | 150 |
| onKeyDown handlers (custom keyboard support) | 208 |
| Tailwind focus-visible: ring classes | 102 |
| Skip-to-main-content references | 200+ |
| Total fixes shipped in one day | 3,317 |
| HIGH findings before / after | 414 / 0 |
| Findings at every severity, after | 0 |

Three thousand nineteen ARIA attributes is not a vanity number. It's the count of places where we've explicitly chosen to extend or refine the accessibility tree beyond what native HTML provides. Every one of those is a design decision, and the audit catches it if it regresses.

Two thousand five hundred twenty-seven native <button> elements is the more important metric, because it's the count of things we didn't have to make accessible by hand. Native semantics are the foundation; ARIA is the extension. The codebase leans heavily on native semantics: buttons, anchors, fieldsets, labels, headings. The ARIA layer covers the visualizations (the Knowledge Map canvas, the Behavioral Rings SVG, the Ring Forge), the custom widgets (the lab console toolbar, the segmented billing-cadence selector, the keyboard-driven drag-and-drop in activities), and the live regions (toasts, exam timers, chat output, narration logs).

Patterns that did the heavy lifting

A few patterns recur across the codebase. They're worth naming because they encode the "shape" of an accessible component once, and every consumer inherits the shape.

The interactive-checkbox pattern

<div
  role="checkbox"
  tabIndex={0}
  aria-checked={isComplete}
  onClick={() => toggle(id)}
  onKeyDown={(e) => {
    if (e.key === ' ' || e.key === 'Enter') {
      e.preventDefault();
      toggle(id);
    }
  }}
>
  {label}
</div>

Used wherever a styled checkbox replaces the native control. The four ingredients (role, tabIndex, click handler, key handler) are non-negotiable; the audit script enforces all four together.

The dialog backdrop

<div className={styles.overlay} role="presentation" onClick={onClose}>
  <div role="dialog" aria-modal="true" aria-labelledby="title">
    <h2 id="title">Confirm</h2>
    {/* content */}
  </div>
</div>

role="presentation" removes the backdrop from the accessibility tree. The inner <div role="dialog"> carries the focus trap, the labelled-by reference, and the Escape-key handler. The audit catches backdrops that pretend to be buttons (and would pollute the keyboard tab order) and silences the rule for this pattern.

The decorative SVG inside a labelled control

<button aria-label="Close">
  <svg aria-hidden="true" focusable="false"></svg>
</button>

Every icon button. Every Lucide / Phosphor / Heroicons reference. The button carries the name; the SVG is a glyph, not content. Marking the SVG aria-hidden plus focusable="false" keeps it out of the accessibility tree and out of the tab order on legacy browsers.

The progress bar wrapper

<div
  role="progressbar"
  aria-valuenow={Math.round(percent)}
  aria-valuemin={0}
  aria-valuemax={100}
  aria-label={`Progress: ${Math.round(percent)}%`}
>
  <div className={styles.fill} style={{ width: `${percent}%` }} aria-hidden="true" />
</div>

Used for the radial progress on the certifications page, the per-domain bars on the exam score report, and the mastery bar on the activity sidebar. The wrapper carries the role and the values; the inner fill is decorative.

The radio-group with arrow keys

<div
  role="radiogroup"
  aria-label="Billing cadence"
  onKeyDown={(e) => {
    if (['ArrowRight', 'ArrowDown', 'ArrowLeft', 'ArrowUp'].includes(e.key)) {
      e.preventDefault();
      onChange(billing === 'monthly' ? 'annual' : 'monthly');
    }
  }}
>
  <IntervalButton role="radio" aria-checked={isMonthly} tabIndex={isMonthly ? 0 : -1}  />
  <IntervalButton role="radio" aria-checked={isAnnual}  tabIndex={isAnnual  ? 0 : -1}  />
</div>

This is the WAI-ARIA radio pattern: only the selected option is in the tab order; arrow keys cycle through the rest. The subscribe flow's monthly/annual toggle uses it. Without the arrow-key handler, keyboard users couldn't discover the second option; the audit catches that omission.

What the script does not check

The script catches the mechanical eighty percent. It does not catch:

  • Color contrast. Computed colors per theme, against per-component backgrounds, with consideration for state (hover, disabled, focus). This needs axe-core's color-contrast rule running against rendered DOM in a real browser.
  • Touch target size. A <button> that's correctly labelled but only 24 pixels tall fails WCAG 2.5.5 on mobile. Computing the box model needs a layout engine.
  • Modal focus-trap correctness. Catching whether Tab loops within the modal and Escape closes it requires actual interaction, not static analysis.
  • Custom ARIA widget pattern correctness. A <tablist> + <tab> + <tabpanel> triple needs aria-controls on each tab pointing to a panel id and aria-labelledby on each panel pointing back. The script catches missing roles; it doesn't catch wiring errors between the three.
  • Screen-reader narrative quality. "Knowledge map showing 400 concepts: 60% mastered, 25% in progress, 15% not started" is a meaningful text alternative for a canvas. "Knowledge map" is not. The script verifies the attribute exists; it doesn't verify the words inside it convey the data.
  • Activity-format keyboard semantics. A drag-and-drop activity needs Space-to-grab, arrow-keys-to-move, Enter-to-drop, Escape-to-cancel, and live-region announcements of position. The script verifies "an onKeyDown exists"; it doesn't verify the full pattern.

These are the LLM-driven phases of the spec. They run after the script passes, on a slower cadence, and they need a human to confirm the result. We have an axe-core Playwright sweep across thirty-four routes for color contrast and a manual screen-reader pass that goes into a release readiness checklist.
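
For reference, a minimal version of such a sweep using Playwright's Python bindings. The route list, local URL, and CDN pin are placeholders; the article doesn't specify how its own sweep is wired.

from playwright.sync_api import sync_playwright

ROUTES = ["/", "/certifications", "/labs"]   # hypothetical sample of the 34

with sync_playwright() as pw:
    browser = pw.chromium.launch()
    page = browser.new_page()
    for route in ROUTES:
        page.goto(f"http://localhost:3000{route}")
        # Inject axe-core, then run only the color-contrast rule.
        page.add_script_tag(url="https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js")
        results = page.evaluate(
            "() => axe.run(document, { runOnly: { type: 'rule', values: ['color-contrast'] } })"
        )
        for v in results["violations"]:
            print(route, v["id"], len(v["nodes"]), "nodes")
    browser.close()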

What it actually takes

After three thousand-plus fixes in a day, here's what I think is non-negotiable for a fully accessible product, and what's nice-to-have:

| Area | Non-negotiable | Nice-to-have |
| --- | --- | --- |
| Audit | Deterministic script in CI; exit non-zero on HIGH/CRITICAL | Coverage dashboard; per-rule trend charts |
| Spec | Single source-of-truth doc; modes formalized | Rendered as a website page |
| Codemods | Reusable for the most-common bulk fixes | Plugged into a pre-commit hook |
| Patterns | Documented and exemplified in the design system | Storybook stories for each pattern |
| Native HTML | Buttons, anchors, fieldsets, labels — used over div+role wherever possible | |
| ARIA | Used to extend, never replace, native semantics | |
| Focus | Visible indicator on every focusable element; :focus-visible not :focus | High-contrast mode tested |
| Keyboard | Every interactive control reachable; arrow-key patterns where applicable | Tab-order Playwright tests |
| Skip-link | Present at the top of every shell layout | Multiple targets (e.g., to nav, to main) |
| lang | Set on <html> for every page | Per-section overrides for non-English content |
| Reduced motion | @media (prefers-reduced-motion: reduce) guard wherever animations exist | Per-component opt-outs |
| Screen-reader testing | Manual pass with at least one of NVDA / VoiceOver / JAWS before each release | Recorded passes for regression comparison |
| Color contrast | Verified per theme | Computed in CI |

The first row is what gates a release. The rest gets there over time. The script catches the "broken at all" cases. Manual review catches the "could be better" cases. Both have a place; neither is sufficient on its own.

Continuous, not episodic

What I care most about is what happens in three months, when we ship a new activity format, a new lab dashboard, a new tool. The work this week was finite; the discipline is continuous.

The discipline lives in three places:

  1. avian-audits/accessibility-audit.md is the spec. It defines the target standard (WCAG 2.1 AA), the rules, the severities, the audit modes, and the fixes-section patterns. It updates in the same commit as the script. Treat it like an ADR.
  2. avian-audits/scripts/accessibility_audit.py is the executable. CI runs it. Mode 1 exits non-zero on HIGH/CRITICAL. Pull requests that introduce regressions get blocked at the review gate.
  3. The fix sections of the spec are the codemod inventory. When a new mechanical pattern surfaces, the rule and the codemod ship together.

Every six weeks, someone runs Mode 2 (which enforces the MEDIUM gate too). MEDIUMs accumulate slowly in a healthy codebase; the slower cadence is appropriate. The really judgment-heavy phases — color contrast, touch targets, modal focus traps, screen-reader quality — run on release boundaries, not on every commit.

If you're starting from where we were a week ago, my advice is: write the script first. Don't write the report; don't make the slide deck; don't even fix anything. Write the script. The script gives you a baseline number, the baseline tells you the size of the problem, and the size of the problem tells you whether to fix by hand, by codemod, or by subagent. Once the script is in place, every fix is cheap and every regression is impossible. That's the difference between aspirationally accessible and continuously accessible, and it's a one-week investment for a permanent payoff.

Numbers I want you to take away

  • Fifty-six UI-bearing repositories audited. None excluded.
  • Two thousand one hundred eighty-five TSX and JSX source files scanned in production code paths.
  • Three thousand nineteen explicit ARIA attribute uses across the codebase. Every one of them is a deliberate design decision the audit catches if regressed.
  • Three thousand four hundred eighty-six form inputs, every one with an accessible name (label, aria-label, or wrapping label).
  • Three thousand three hundred seventeen fixes shipped in a single day across nine fix waves and seven codemods.
  • Two thousand two hundred thirty-five of those fixes came from a single 200-line Python codemod.
  • Four hundred fourteen HIGH-severity findings became zero. Zero CRITICAL throughout. Zero MEDIUM and zero LOW after the cleanup wave.

A learner using a screen reader, a keyboard, a switch device, voice control, magnification, or reduced-motion settings can now use AccelaStudy AI without hitting a barrier the rest of us would never even notice. That's not a finishing line; that's a starting point. It's also the bar every product in our fleet, and every team I work with, should be willing to clear.
