Accessibility is the kind of work that gets pushed to the next sprint until something forces the issue. The forcing function for AccelaStudy AI was the launch window. We had fifty-six user-facing repositories across consumer apps, enterprise tools, internal back-office surfaces, and marketing sites, and we had a brand promise that adaptive learning works for every learner. Every learner means every learner: a screen-reader user, a switch-device user, a keyboard-only user, a person magnifying the screen to 200%. None of those people care that we built a beautiful adaptive engine if the buttons aren't reachable, the inputs aren't labeled, or the focus ring is invisible.
This article is the autopsy of one day's accessibility work. The numbers are real. The patterns are reusable. If you have a JavaScript codebase north of a thousand files and you've been treating WCAG as a checklist instead of as engineering, this is the playbook.
What "fully accessible" actually means
Plenty of apps claim to be accessible. Far fewer pass an actual audit. Almost none have an audit that runs in CI and refuses to ship regressions. The difference matters. WCAG 2.1 Level AA is a moving target made up of about fifty success criteria; "we tested with VoiceOver once" is not a strategy for staying compliant as a codebase grows.
I draw a hard line between three states a codebase can be in:
| State | What it means | What it requires |
|---|---|---|
| Aspirationally accessible | Some components are good; some aren't; nobody knows which | Hope and a styleguide |
| Audited accessible | A scan happened once and findings got fixed | Discipline for a sprint |
| Continuously accessible | Every commit is gated by a deterministic check; regressions can't merge | Engineering, scripts, codemods |
We were aspirationally accessible. We're now continuously accessible. That transition required three things: a script that catches regressions deterministically, a codebase pattern library that makes the right thing easy, and a set of codemods that did one-time mass fixes without humans touching three thousand files by hand.
The shape of the problem
AccelaStudy AI is one product in a fleet. The fleet shares a design system, a console-simulator library, an activities library, an authentication shell, and a layer of UI primitives. Around that core sit the consumer apps (avian-app-web, avian-app-electron), the recruiter and enterprise surfaces, the internal tools (twenty-one of them, from a Kanban board to a calendar to a unified observability dashboard), and the marketing websites. Fifty-six repositories, all in one monorepo, all needing to pass the same accessibility bar.
After running the first deterministic audit pass against fresh main, the script reported:
| Severity | Findings |
|---|---|
| CRITICAL | 0 |
| HIGH | 414 |
| MEDIUM | 312 |
| LOW | 0 |
| Total | 726 |
Seven hundred twenty-six issues clustered into a small number of patterns. Seventy-one percent of the HIGH findings were in two repositories: the cloud-console simulator (avian-console-sim-react, used by labs across forty-five certifications) and a static-site dist/ folder that hadn't been rebuilt in a month. Once I clustered the findings by rule and repository, the path to zero became obvious.
The audit script
The first thing I built was the auditor itself. The premise: anything mechanical should be deterministic. Color contrast and focus-trap correctness need eyes; missing alt attributes do not.
The script is avian-audits/scripts/accessibility_audit.py, stdlib-only Python, about seven hundred lines. It implements fifteen rules covering the mechanical phases of WCAG 2.1 AA:
| Rule | Severity | WCAG | What it catches |
|---|---|---|---|
| `img-missing-alt` | HIGH | 1.1.1 | `<img>` without `alt=` |
| `svg-info-no-aria` | HIGH | 1.1.1 | Standalone informational `<svg>` lacking `role="img"` + `aria-label` (and not inside a labelled control) |
| `canvas-no-alt` | HIGH | 1.1.1 | `<canvas>` without `aria-label`/`aria-labelledby`/`aria-hidden` |
| `input-no-accessible-name` | HIGH | 1.3.1 | Form input with no label / `aria-label` / `aria-labelledby` / wrapping label |
| `input-adjacent-label-no-htmlfor` | MEDIUM | 1.3.1 | Sibling `<label>` exists but no `htmlFor`/`id` link |
| `multiple-h1` | MEDIUM | 1.3.1 | More than one `<h1>` per page-tree component |
| `missing-main-landmark` | HIGH | 1.3.1 | Repo with no `<main>` anywhere |
| `missing-skip-link` | HIGH | 2.4.1 | Repo with a layout but no skip-to-main-content link |
| `html-no-lang` | HIGH | 3.1.1 | HTML file with no `lang` on `<html>` |
| `generic-link-text` | MEDIUM | 2.4.4 | "click here" / "learn more" inside `<a>` / `<Link>` |
| `clickable-non-interactive` | HIGH | 2.1.1 | `<div>`/`<span>`/`<li>` with `onClick` lacking `role` + `tabIndex` + `onKeyDown` |
| `aria-hidden-focusable` | HIGH | 4.1.2 | Focusable element with `aria-hidden="true"` and no `tabIndex={-1}` |
| `positive-tabindex` | MEDIUM | 2.4.3 | `tabIndex={N > 0}` |
| `outline-none-no-replacement` | HIGH | 2.4.7 | CSS `outline: none` on `:focus` with no replacement; Tailwind `focus(-visible):outline-none` without a ring/outline/border replacement |
| `no-reduced-motion-guard` | MEDIUM | 2.3.3 | Repo defines `@keyframes` but no CSS file references `prefers-reduced-motion` |
| `missing-sr-only-utility` | MEDIUM | 1.3.1 | Repo never references an `sr-only` / `visually-hidden` utility |
The script auto-discovers UI-bearing repositories by walking the conventional monorepo parents (clients/, libs/, tools/, websites/, services/, automations/) and identifying any directory that contains TSX, JSX, or HTML files. It honors a small exemption list for repos with no UI surface (server-only services, layout-engine libraries, design archives). Test files, node_modules/, dist/, build/, coverage/, htmlcov/, and per-repo docs/ and examples/ paths are skipped.
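A minimal sketch of that discovery walk follows. The directory conventions come straight from the text; the exemption entry is a placeholder, and test-file filtering is omitted for brevity:

```python
from pathlib import Path

# Conventional monorepo parents named above; names below them are assumptions.
PARENTS = ["clients", "libs", "tools", "websites", "services", "automations"]
SKIP_PARTS = {"node_modules", "dist", "build", "coverage", "htmlcov", "docs", "examples"}
UI_SUFFIXES = {".tsx", ".jsx", ".html"}
EXEMPT = {"example-server-only-service"}  # hypothetical exemption-list entry

def discover_ui_repos(root: Path) -> list[Path]:
    """Return every direct child of a conventional parent directory that
    contains at least one TSX/JSX/HTML file outside the skipped paths."""
    repos = []
    for parent in PARENTS:
        base = root / parent
        if not base.is_dir():
            continue
        for repo in sorted(p for p in base.iterdir() if p.is_dir()):
            if repo.name in EXEMPT:
                continue
            has_ui = any(
                f.suffix in UI_SUFFIXES
                and not SKIP_PARTS.intersection(f.relative_to(repo).parts)
                for f in repo.rglob("*")
            )
            if has_ui:
                repos.append(repo)
    return repos
```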
The clever part is the JSX tokenizer. Every JSX-aware tool I tried (regex, tree-sitter, Babel) had a tradeoff. Regex is fast but breaks on onClick={() => x > 0} because the inner > is read as a tag close. Babel is correct but slow and brings a parser dependency I didn't want in an audit script. So I wrote a small character-by-character walker that tracks brace depth and string state: when it sees <, it walks forward, ignoring > inside {...} and string literals, until it finds the real closing > of the opening tag. About eighty lines. Every rule checks attributes via this walker, which means the script reads JSX correctly the first time on every run.
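The core of that walker fits in a couple dozen lines. This is an illustration of the technique, not the script's exact code; `find_tag_end` is a name I've made up:

```python
def find_tag_end(src: str, lt: int) -> int:
    """Given src[lt] == '<', return the index of the '>' that really closes
    the opening tag, skipping '>' inside {...} expressions and inside
    string literals. Returns -1 for an unterminated tag."""
    depth = 0      # JSX expression brace depth
    quote = None   # active string delimiter ('\"', \"'\", or '`'), if any
    i = lt + 1
    while i < len(src):
        ch = src[i]
        if quote:                      # inside a string literal
            if ch == "\\":
                i += 1                 # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in ('"', "'", "`"):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return i                   # the real end of the opening tag
        i += 1
    return -1
```

With the tag's full text in hand, every attribute check is a scan over a span that is guaranteed to be exactly one opening tag — which is what makes `onClick={() => x > 0}` parse correctly.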
The script writes two artifacts:
- `avian-audits/reports/accessibility-audit-report-YYYY-MM-DD.md` — Markdown summary with by-repo and by-rule tables
- `avian-audits/reports/accessibility-audit-findings-YYYY-MM-DD.json` — machine-readable findings for CI, dashboards, and diffing across runs (clustered in the sketch below)
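The JSON artifact is what makes the clustering from earlier cheap. A sketch, assuming each finding record carries `rule`, `repo`, and `severity` fields (the exact schema is the script's, not reproduced here):

```python
import json
from collections import Counter

with open("avian-audits/reports/accessibility-audit-findings-YYYY-MM-DD.json") as f:
    findings = json.load(f)

# Cluster HIGH findings by (rule, repo) to see where the mass is.
clusters = Counter(
    (item["rule"], item["repo"])
    for item in findings
    if item["severity"] == "HIGH"
)
for (rule, repo), count in clusters.most_common(10):
    print(f"{count:4d}  {rule:32s}  {repo}")
```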
In Mode 1 (the default), it exits with code 2 if any HIGH or CRITICAL findings remain. That's the gate.
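The gate itself is a few lines. A sketch of the Mode 1 behavior (the `medium_gate` flag approximates Mode 2, described later):

```python
import sys

def gate(findings: list[dict], medium_gate: bool = False) -> None:
    """Exit 2 when blocking findings remain; CI treats non-zero as failure."""
    blocking = {"CRITICAL", "HIGH"} | ({"MEDIUM"} if medium_gate else set())
    if any(item["severity"] in blocking for item in findings):
        sys.exit(2)
```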
Heuristics worth their weight
Determinism is great until your script flags fifty true positives and twelve false ones, and the false-positive review burns more time than the fixes. Three heuristic refinements made the audit usable in practice:
- Skip `aria-hidden` inputs and divs. An input with `aria-hidden="true"` is by definition not in the accessibility tree. Honeypot inputs, autofill catchers, hidden file pickers triggered by a styled button — all of these legitimately omit `aria-label`. Same logic for `<div onClick aria-hidden="true">` modal backdrops.
- Skip spread-prop forwarders. A generic component like `<Input ref={ref} {...props} />` delegates `aria-label` to the consumer. The wrapper itself can't statically declare a label. The script checks for `{...spread}` syntax and exempts the element.
- Recognize the conditional-attribute pattern. A common React idiom is `<div role={cond ? 'button' : undefined} tabIndex={cond ? 0 : undefined} onKeyDown={cond ? handler : undefined}>`. Statically, we can't verify the conditions match, but the developer is clearly aware of the requirement. When all three attributes appear as JSX expressions, the script treats the element as authored correctly. (The original implementation had a bug: `role` was already brace-stripped before this check ran. It took me a few minutes to notice the heuristic was a no-op.) A sketch of all three suppressions follows this list.
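The three suppressions compress into a small predicate. A sketch with invented names: assume `attrs` maps attribute names to their raw source text and `has_spread` was detected by the tokenizer:

```python
def is_suppressed(attrs: dict[str, str], has_spread: bool) -> bool:
    """True when an element is exempt from the accessible-name and
    clickable-non-interactive rules. Names here are illustrative."""
    # 1. aria-hidden elements never reach the accessibility tree.
    if attrs.get("aria-hidden") == "true":
        return True
    # 2. {...props} forwarders delegate labelling to the consumer.
    if has_spread:
        return True
    # 3. role/tabIndex/onKeyDown all authored as conditional JSX
    #    expressions: treat as authored-correctly. Compare the raw source
    #    text *before* brace-stripping — the no-op bug mentioned above
    #    came from checking an already-stripped value.
    trio = ("role", "tabIndex", "onKeyDown")
    if all(attrs.get(k, "").startswith("{") for k in trio):
        return True
    return False
```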
These three suppressions cleared dozens of confirmed false positives without weakening real-defect detection. Every suppression is documented inline in the script alongside the rule it modifies, so a future engineer reading accessibility_audit.py understands not just what's checked but what's deliberately not checked.
The fix waves
I ran the audit, looked at the clustering, and built a wave plan. Each wave targeted a class of fix, not a repository. That ordering matters: fixing the design-system primitives first means downstream consumers inherit the fixes, and a good codemod beats a hundred manual edits.
The biggest single intervention was a Python codemod. The console-sim repository had a stereotypical pattern across 257 dashboard files: `<label>Field name</label> <input ... />`, with no `htmlFor`/`id` linkage. Visually, the labels lined up with the inputs. Programmatically, no screen reader knew the input's name. A codemod that walks the JSX, finds adjacent `<label>` and input pairs, generates a stable id from the input's `data-testid` (or a slug of the label text if there's no testid), and rewrites both the label and the input to carry the linkage took about two hundred lines and fixed 2,235 inputs in eight seconds. That's three orders of magnitude faster than the equivalent human pass and an order of magnitude more reliable.
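The heart of that codemod, reduced to a sketch. The real one walks JSX with the brace-aware tokenizer; this regex version only shows the id derivation and the rewrite:

```python
import re

def slug(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def link_pair(label_text: str, input_tag: str) -> tuple[str, str]:
    """Rewrite an adjacent <label> / <input> pair to carry htmlFor/id,
    deriving a stable id from data-testid when one is present."""
    testid = re.search(r'data-testid="([^"]+)"', input_tag)
    input_id = testid.group(1) if testid else slug(label_text)
    new_label = f'<label htmlFor="{input_id}">{label_text}</label>'
    new_input = input_tag.replace("<input", f'<input id="{input_id}"', 1)
    return new_label, new_input
```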
A second codemod handled `<h1>` proliferation. Pages in the simulator render multiple panels, each with its own page-level heading. At the source level, that means multiple `<h1>` per file; at runtime, only one panel renders at a time. The codemod kept the first `<h1>` per file and demoted the rest to `<h2>`: 594 demotions across 241 files.
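A sketch of the demotion logic, assuming non-nested headings that open and close in source order (the shipped codemod is tokenizer-based):

```python
import re

def demote_extra_h1(source: str) -> str:
    """Keep the first <h1> in a file; turn every later one into <h2>."""
    opens = closes = 0

    def swap(match: re.Match) -> str:
        nonlocal opens, closes
        tag = match.group(0)
        if tag.startswith("</"):
            closes += 1
            return tag if closes == 1 else "</h2>"
        opens += 1
        return tag if opens == 1 else "<h2"  # rest of the tag is untouched

    # Match closing tags whole, and just the '<h1' of opening tags.
    return re.sub(r"</h1>|<h1(?=[\s>])", swap, source)
```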
A third codemod added `aria-label` to inputs that had a `data-testid` but no preceding sibling label, deriving the label from the testid (e.g., `net-add-device-name` becomes "Device name"). I had to rewrite this one twice; the first version's regex broke on JSX attribute values containing arrow functions. Brace-aware tokenizers earn their keep.
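The derivation step is the interesting part. A sketch matching the example above; the prefix-stripping rule (drop the first two dash-separated segments) is my guess at the testid convention:

```python
def label_from_testid(testid: str) -> str:
    """'net-add-device-name' -> 'Device name'. Assumes testids carry a
    two-segment area prefix; falls back to the whole id otherwise."""
    segments = testid.split("-")
    words = segments[2:] if len(segments) > 2 else segments
    return " ".join(words).capitalize()
```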
The waves outside the codemods went to subagents. A subagent is just an LLM session with a specific task, scoped to a list of files, with the audit findings JSON as input. I dispatched two in parallel (one for client apps, one for the tools fleet) and they came back with 195 fixes between them. The agents apply the fixes; I review the patterns and the typecheck output. The pattern review caught three confirmed false positives in client apps that the script's heuristics didn't yet suppress, which became the basis for Wave 5's heuristic refinements.
What's actually in the codebase now
A snapshot of the AccelaStudy AI surface area, post-audit:
| Metric | Count |
|---|---|
| UI-bearing repositories | 56 |
| TSX/JSX source files (production) | 2,185 |
| Native `<button>` elements | 2,527 |
| Form inputs (`input` / `select` / `textarea`) | 3,486 |
| `<a href>` links | thousands (uncounted; per-repo) |
| Total ARIA attribute uses | 3,019 |
| Explicit `aria-label` uses | 1,530 |
| `aria-hidden` uses (decorative) | 752 |
| Total `role=` uses | 1,001 |
| `role="button"` (custom clickable elements) | 101 |
| `role="img"` (canvas/SVG with text alternative) | 112 |
| `role="dialog"` (modal containers) | 109 |
| `tabIndex` uses | 150 |
| `onKeyDown` handlers (custom keyboard support) | 208 |
| Tailwind `focus-visible:` ring classes | 102 |
| Skip-to-main-content references | 200+ |
| Total fixes shipped in one day | 3,317 |
| HIGH findings before / after | 414 / 0 |
| Findings at every severity, after | 0 |
Three thousand nineteen ARIA attributes is not a vanity number. It's the count of places where we've explicitly chosen to extend or refine the accessibility tree beyond what native HTML provides. Every one of them is a deliberate design decision, and the audit will flag the regression if any of them disappears.
Two thousand five hundred twenty-seven native <button> elements is the more important metric, because it's the count of things we didn't have to make accessible by hand. Native semantics are the foundation; ARIA is the extension. The codebase leans heavily on native semantics: buttons, anchors, fieldsets, labels, headings. The ARIA layer covers the visualizations (the Knowledge Map canvas, the Behavioral Rings SVG, the Ring Forge), the custom widgets (the lab console toolbar, the segmented billing-cadence selector, the keyboard-driven drag-and-drop in activities), and the live regions (toasts, exam timers, chat output, narration logs).
Patterns that did the heavy lifting
A few patterns recur across the codebase. They're worth naming because they encode the "shape" of an accessible component once, and every consumer inherits the shape.
The interactive-checkbox pattern
<div
role="checkbox"
tabIndex={0}
aria-checked={isComplete}
onClick={() => toggle(id)}
onKeyDown={(e) => {
if (e.key === ' ' || e.key === 'Enter') {
e.preventDefault();
toggle(id);
}
}}
>
{label}
</div>
Used wherever a styled checkbox replaces the native control. The four ingredients (role, tabIndex, click handler, key handler) are non-negotiable; the audit script enforces all four together.
The dialog backdrop
<div className={styles.overlay} role="presentation" onClick={onClose}>
<div role="dialog" aria-modal="true" aria-labelledby="title">
<h2 id="title">Confirm</h2>
{/* content */}
</div>
</div>
role="presentation" removes the backdrop from the accessibility tree. The inner <div role="dialog"> carries the focus trap, the labelled-by reference, and the Escape-key handler. The audit catches backdrops that pretend to be buttons (and would pollute the keyboard tab order) and silences the rule for this pattern.
The decorative SVG inside a labelled control
<button aria-label="Close">
<svg aria-hidden="true" focusable="false">…</svg>
</button>
Every icon button. Every Lucide / Phosphor / Heroicons reference. The button carries the name; the SVG is a glyph, not content. Marking the SVG aria-hidden plus focusable="false" keeps it out of the accessibility tree and out of the tab order on legacy browsers.
The progress bar wrapper
<div
role="progressbar"
aria-valuenow={Math.round(percent)}
aria-valuemin={0}
aria-valuemax={100}
  aria-label={label}
>
  <div className={styles.fill} style={{ width: `${percent}%` }} aria-hidden="true" />
</div>
Used for the radial progress on the certifications page, the per-domain bars on the exam score report, and the mastery bar on the activity sidebar. The wrapper carries the role and the values; the inner fill is decorative.
The radio-group with arrow keys
<div
role="radiogroup"
aria-label="Billing cadence"
onKeyDown={(e) => {
if (['ArrowRight', 'ArrowDown', 'ArrowLeft', 'ArrowUp'].includes(e.key)) {
e.preventDefault();
onChange(billing === 'monthly' ? 'annual' : 'monthly');
}
}}
>
<IntervalButton role="radio" aria-checked={isMonthly} tabIndex={isMonthly ? 0 : -1} … />
<IntervalButton role="radio" aria-checked={isAnnual} tabIndex={isAnnual ? 0 : -1} … />
</div>
This is the WAI-ARIA radio pattern: only the selected option is in the tab order; arrow keys cycle through the rest. The subscribe flow's monthly/annual toggle uses it. Without the arrow-key handler, keyboard users couldn't discover the second option; the audit catches that omission.
What the script does not check
The script catches the mechanical eighty percent. It does not catch:
- Color contrast. Computed colors per theme, against per-component backgrounds, with consideration for state (hover, disabled, focus). This needs axe-core's color-contrast rule running against rendered DOM in a real browser.
- Touch target size. A `<button>` that's correctly labelled but only 24 pixels tall fails WCAG 2.5.5 on mobile. Computing the box model needs a layout engine.
- Modal focus-trap correctness. Catching whether Tab loops within the modal and Escape closes it requires actual interaction, not static analysis.
- Custom ARIA widget pattern correctness. A `<tablist>` + `<tab>` + `<tabpanel>` triple needs `aria-controls` on each tab pointing to a panel id and `aria-labelledby` on each panel pointing back. The script catches missing roles; it doesn't catch wiring errors between the three.
- Screen-reader narrative quality. "Knowledge map showing 400 concepts: 60% mastered, 25% in progress, 15% not started" is a meaningful text alternative for a canvas. "Knowledge map" is not. The script verifies the attribute exists; it doesn't verify the words inside it convey the data.
- Activity-format keyboard semantics. A drag-and-drop activity needs Space-to-grab, arrow-keys-to-move, Enter-to-drop, Escape-to-cancel, and live-region announcements of position. The script verifies an `onKeyDown` exists; it doesn't verify the full pattern.
These are the LLM-driven phases of the spec. They run after the script passes, on a slower cadence, and they need a human to confirm the result. We have an axe-core Playwright sweep across thirty-four routes for color contrast and a manual screen-reader pass that goes into a release readiness checklist.
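For flavor, here's roughly what the contrast sweep looks like with Playwright's Python bindings and axe-core injected from a CDN. This is a sketch, not our actual harness; the URL, route, and axe version pin are illustrative:

```python
from playwright.sync_api import sync_playwright

AXE_SRC = "https://cdn.jsdelivr.net/npm/axe-core@4.10.0/axe.min.js"

def contrast_violations(url: str) -> list[dict]:
    """Run only axe-core's color-contrast rule against a rendered page."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.add_script_tag(url=AXE_SRC)
        # page.evaluate awaits the promise axe.run returns.
        results = page.evaluate(
            "() => axe.run(document, {runOnly: {type: 'rule', values: ['color-contrast']}})"
        )
        browser.close()
        return results["violations"]

if __name__ == "__main__":
    for v in contrast_violations("http://localhost:3000/certifications"):
        print(v["id"], len(v["nodes"]), "nodes")
```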
What it actually takes
After three thousand-plus fixes in a day, here's what I think is non-negotiable for a fully accessible product, and what's nice-to-have:
| Area | Non-negotiable | Nice-to-have |
|---|---|---|
| Audit | Deterministic script in CI; exit non-zero on HIGH/CRITICAL | Coverage dashboard; per-rule trend charts |
| Spec | Single source-of-truth doc; modes formalized | Rendered as a website page |
| Codemods | Reusable for the most-common bulk fixes | Plugin into a pre-commit hook |
| Patterns | Documented and exemplified in the design system | Storybook stories for each pattern |
| Native HTML | Buttons, anchors, fieldsets, labels — used over div+role wherever possible | — |
| ARIA | Used to extend, never replace, native semantics | — |
| Focus | Visible indicator on every focusable element; :focus-visible not :focus | High-contrast mode tested |
| Keyboard | Every interactive control reachable; arrow-key patterns where applicable | Tab-order Playwright tests |
| Skip-link | Present at the top of every shell layout | Multiple targets (e.g., to nav, to main) |
| `lang` | Set on `<html>` for every page | Per-section overrides for non-English content |
| Reduced motion | @media (prefers-reduced-motion: reduce) guard wherever animations exist | Per-component opt-outs |
| Screen-reader testing | Manual pass with at least one of NVDA / VoiceOver / JAWS before each release | Recorded passes for regression comparison |
| Color contrast | Verified per theme | Computed in CI |
The first row is what gates a release. The rest gets there over time. The script catches the "broken at all" cases. Manual review catches the "could be better" cases. Both have a place; neither is sufficient on its own.
Continuous, not episodic
What I care most about is what happens in three months, when we ship a new activity format, a new lab dashboard, a new tool. The work this week was finite; the discipline is continuous.
The discipline lives in three places:
- `avian-audits/accessibility-audit.md` is the spec. It defines the target standard (WCAG 2.1 AA), the rules, the severities, the audit modes, and the fixes-section patterns. It updates in the same commit as the script. Treat it like an ADR.
- `avian-audits/scripts/accessibility_audit.py` is the executable. CI runs it. Mode 1 exits non-zero on HIGH/CRITICAL. Pull requests that introduce regressions get blocked at the review gate.
- The fix sections of the spec are the codemod inventory. When a new mechanical pattern surfaces, the rule and the codemod ship together.
Every six weeks, someone runs Mode 2 (which enforces the MEDIUM gate too). MEDIUMs accumulate slowly in a healthy codebase; the slower cadence is appropriate. The really judgment-heavy phases — color contrast, touch targets, modal focus traps, screen-reader quality — run on release boundaries, not on every commit.
If you're starting from where we were a week ago, my advice is: write the script first. Don't write the report; don't make the slide deck; don't even fix anything. Write the script. The script gives you a baseline number, the baseline tells you the size of the problem, and the size of the problem tells you whether to fix by hand, by codemod, or by subagent. Once the script is in place, every fix is cheap and every regression is impossible. That's the difference between aspirationally accessible and continuously accessible, and it's a one-week investment for a permanent payoff.
Numbers I want you to take away
- Fifty-six UI-bearing repositories audited. None excluded.
- Two thousand one hundred eighty-five TSX and JSX source files scanned in production code paths.
- Three thousand nineteen explicit ARIA attribute uses across the codebase. Every one of them is a deliberate design decision the audit catches if regressed.
- Three thousand four hundred eighty-six form inputs, every one with an accessible name (label, aria-label, or wrapping label).
- Three thousand three hundred seventeen fixes shipped in a single day across nine fix waves and seven codemods.
- Two thousand two hundred thirty-five of those fixes came from a single 200-line Python codemod.
- Four hundred fourteen HIGH-severity findings became zero. Zero CRITICAL throughout. Zero MEDIUM and zero LOW after the cleanup wave.
A learner using a screen reader, a keyboard, a switch device, voice control, magnification, or reduced-motion settings can now use AccelaStudy AI without hitting a barrier the rest of us would never even notice. That's not a finishing line; that's a starting point. It's also the bar every product in our fleet, and every team I work with, should be willing to clear.