Accessibility is the kind of work that gets pushed to the next sprint until something forces the issue. The forcing function for AccelaStudy AI was the launch window. We had fifty-six user-facing repositories across consumer apps, enterprise tools, internal back-office surfaces, and marketing sites, and we had a brand promise that adaptive learning works for every learner. Every learner means every learner: a screen-reader user, a switch-device user, a keyboard-only user, a person magnifying the screen to 200%. None of those people care that we built a beautiful adaptive engine if the buttons aren't reachable, the inputs aren't labeled, or the focus ring is invisible.
This article is the autopsy of one day's accessibility work. The numbers are real. The patterns are reusable. If you have a JavaScript codebase north of a thousand files and you've been treating WCAG as a checklist instead of as engineering, this is the playbook.
What "fully accessible" actually means
Plenty of apps claim to be accessible. Far fewer pass an actual audit. Almost none have an audit that runs in CI and refuses to ship regressions. The difference matters. WCAG 2.1 Level AA is a moving target made up of about fifty success criteria; "we tested with VoiceOver once" is not a strategy for staying compliant as a codebase grows.
I draw a hard line between three states a codebase can be in:
| State | What it means | What it requires |
|---|---|---|
| Aspirationally accessible | Some components are good; some aren't; nobody knows which | Hope and a styleguide |
| Audited accessible | A scan happened once and findings got fixed | Discipline for a sprint |
| Continuously accessible | Every commit is gated by a deterministic check; regressions can't merge | Engineering, scripts, codemods |
We were aspirationally accessible. We're now continuously accessible. That transition required three things: a script that catches regressions deterministically, a codebase pattern library that makes the right thing easy, and a set of codemods that did one-time mass fixes without humans touching three thousand files by hand.
The shape of the problem
AccelaStudy AI is one product in a fleet. The fleet shares a design system, a console-simulator library, an activities library, an authentication shell, and a layer of UI primitives. Around that core sit the consumer apps (avian-app-web, avian-app-electron), the recruiter and enterprise surfaces, the internal tools (twenty-one of them, from a Kanban board to a calendar to a unified observability dashboard), and the marketing websites. Fifty-six repositories, all in one monorepo, all needing to pass the same accessibility bar.
After running the first deterministic audit pass against fresh main, the script reported:
| Severity | Findings |
|---|---|
| CRITICAL | 0 |
| HIGH | 414 |
| MEDIUM | 312 |
| LOW | 0 |
| Total | 726 |
Seven hundred twenty-six issues clustered into a small number of patterns. Seventy-one percent of the HIGH findings were in two repositories: the cloud-console simulator (avian-console-sim-react, used by labs across forty-five certifications) and a static-site dist/ folder that hadn't been rebuilt in a month. Once I clustered the findings by rule and repository, the path to zero became obvious.
The audit script
The first thing I built was the auditor itself. The premise: anything mechanical should be deterministic. Color contrast and focus-trap correctness need eyes; missing alt attributes do not.
The script is avian-audits/scripts/accessibility_audit.py, stdlib-only Python, about seven hundred lines. It implements fifteen rules covering the mechanical phases of WCAG 2.1 AA:
| Rule | Severity | WCAG | What it catches |
|---|---|---|---|
| `img-missing-alt` | HIGH | 1.1.1 | `<img>` without `alt=` |
| `svg-info-no-aria` | HIGH | 1.1.1 | Standalone informational `<svg>` lacking `role="img"` + `aria-label` (and not inside a labelled control) |
| `canvas-no-alt` | HIGH | 1.1.1 | `<canvas>` without `aria-label`/`aria-labelledby`/`aria-hidden` |
| `input-no-accessible-name` | HIGH | 1.3.1 | Form input with no label / `aria-label` / `aria-labelledby` / wrapping label |
| `input-adjacent-label-no-htmlfor` | MEDIUM | 1.3.1 | Sibling `<label>` exists but no `htmlFor`/`id` link |
| `multiple-h1` | MEDIUM | 1.3.1 | More than one `<h1>` per page-tree component |
| `missing-main-landmark` | HIGH | 1.3.1 | Repo with no `<main>` anywhere |
| `missing-skip-link` | HIGH | 2.4.1 | Repo with a layout but no skip-to-main-content link |
| `html-no-lang` | HIGH | 3.1.1 | HTML file with no `lang` on `<html>` |
| `generic-link-text` | MEDIUM | 2.4.4 | "click here" / "learn more" inside `<a>` / `<Link>` |
| `clickable-non-interactive` | HIGH | 2.1.1 | `<div>`/`<span>`/`<li>` with `onClick` lacking `role` + `tabIndex` + `onKeyDown` |
| `aria-hidden-focusable` | HIGH | 4.1.2 | Focusable element with `aria-hidden="true"` and no `tabIndex={-1}` |
| `positive-tabindex` | MEDIUM | 2.4.3 | `tabIndex={N > 0}` |
| `outline-none-no-replacement` | HIGH | 2.4.7 | CSS `outline: none` on `:focus` with no replacement; Tailwind `focus(-visible):outline-none` without a ring/outline/border replacement |
| `no-reduced-motion-guard` | MEDIUM | 2.3.3 | Repo defines `@keyframes` but no CSS file references `prefers-reduced-motion` |
| `missing-sr-only-utility` | MEDIUM | 1.3.1 | Repo never references an `sr-only` / `visually-hidden` utility |
The script auto-discovers UI-bearing repositories by walking the conventional monorepo parents (clients/, libs/, tools/, websites/, services/, automations/) and identifying any directory that contains TSX, JSX, or HTML files. It honors a small exemption list for repos with no UI surface (server-only services, layout-engine libraries, design archives). Test files, node_modules/, dist/, build/, coverage/, htmlcov/, and per-repo docs/ and examples/ paths are skipped.
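A minimal sketch of that discovery walk follows. The directory conventions come straight from the text; the exemption entry is a placeholder, and test-file filtering is omitted for brevity:

```python
from pathlib import Path

# Conventional monorepo parents named above; names below them are assumptions.
PARENTS = ["clients", "libs", "tools", "websites", "services", "automations"]
SKIP_PARTS = {"node_modules", "dist", "build", "coverage", "htmlcov", "docs", "examples"}
UI_SUFFIXES = {".tsx", ".jsx", ".html"}
EXEMPT = {"example-server-only-service"}  # hypothetical exemption-list entry

def discover_ui_repos(root: Path) -> list[Path]:
    """Return every direct child of a conventional parent directory that
    contains at least one TSX/JSX/HTML file outside the skipped paths."""
    repos = []
    for parent in PARENTS:
        base = root / parent
        if not base.is_dir():
            continue
        for repo in sorted(p for p in base.iterdir() if p.is_dir()):
            if repo.name in EXEMPT:
                continue
            has_ui = any(
                f.suffix in UI_SUFFIXES
                and not SKIP_PARTS.intersection(f.relative_to(repo).parts)
                for f in repo.rglob("*")
            )
            if has_ui:
                repos.append(repo)
    return repos
```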
The clever part is the JSX tokenizer. Every JSX-aware tool I tried (regex, tree-sitter, Babel) had a tradeoff. Regex is fast but breaks on onClick={() => x > 0} because the inner > is read as a tag close. Babel is correct but slow and brings a parser dependency I didn't want in an audit script. So I wrote a small character-by-character walker that tracks brace depth and string state: when it sees <, it walks forward, ignoring > inside {...} and string literals, until it finds the real closing > of the opening tag. About eighty lines. Every rule checks attributes via this walker, which means the script reads JSX correctly the first time on every run.
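The core of that walker fits in a couple dozen lines. This is an illustration of the technique, not the script's exact code; `find_tag_end` is a name I've made up:

```python
def find_tag_end(src: str, lt: int) -> int:
    """Given src[lt] == '<', return the index of the '>' that really closes
    the opening tag, skipping '>' inside {...} expressions and inside
    string literals. Returns -1 for an unterminated tag."""
    depth = 0      # JSX expression brace depth
    quote = None   # active string delimiter ('\"', \"'\", or '`'), if any
    i = lt + 1
    while i < len(src):
        ch = src[i]
        if quote:                      # inside a string literal
            if ch == "\\":
                i += 1                 # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in ('"', "'", "`"):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return i                   # the real end of the opening tag
        i += 1
    return -1
```

With the tag's full text in hand, every attribute check is a scan over a span that is guaranteed to be exactly one opening tag — which is what makes `onClick={() => x > 0}` parse correctly.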
The script writes two artifacts:
- `avian-audits/reports/accessibility-audit-report-YYYY-MM-DD.md` — Markdown summary with by-repo and by-rule tables
- `avian-audits/reports/accessibility-audit-findings-YYYY-MM-DD.json` — machine-readable findings for CI, dashboards, and diffing across runs (clustered in the sketch below)
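The JSON artifact is what makes the clustering from earlier cheap. A sketch, assuming each finding record carries `rule`, `repo`, and `severity` fields (the exact schema is the script's, not reproduced here):

```python
import json
from collections import Counter

with open("avian-audits/reports/accessibility-audit-findings-YYYY-MM-DD.json") as f:
    findings = json.load(f)

# Cluster HIGH findings by (rule, repo) to see where the mass is.
clusters = Counter(
    (item["rule"], item["repo"])
    for item in findings
    if item["severity"] == "HIGH"
)
for (rule, repo), count in clusters.most_common(10):
    print(f"{count:4d}  {rule:32s}  {repo}")
```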
In Mode 1 (the default), it exits with code 2 if any HIGH or CRITICAL findings remain. That's the gate.
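The gate itself is a few lines. A sketch of the Mode 1 behavior (the `medium_gate` flag approximates Mode 2, described later):

```python
import sys

def gate(findings: list[dict], medium_gate: bool = False) -> None:
    """Exit 2 when blocking findings remain; CI treats non-zero as failure."""
    blocking = {"CRITICAL", "HIGH"} | ({"MEDIUM"} if medium_gate else set())
    if any(item["severity"] in blocking for item in findings):
        sys.exit(2)
```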
Heuristics worth their weight
Determinism is great until your script flags fifty true positives and twelve false ones, and the false-positive review burns more time than the fixes. Three heuristic refinements made the audit usable in practice:
- Skip `aria-hidden` inputs and divs. An input with `aria-hidden="true"` is by definition not in the accessibility tree. Honeypot inputs, autofill catchers, hidden file pickers triggered by a styled button — all of these legitimately omit `aria-label`. Same logic for `<div onClick aria-hidden="true">` modal backdrops.
- Skip spread-prop forwarders. A generic component like `<Input ref={ref} {...props} />` delegates `aria-label` to the consumer. The wrapper itself can't statically declare a label. The script checks for `{...spread}` syntax and exempts the element.
- Recognize the conditional-attribute pattern. A common React idiom is `<div role={cond ? 'button' : undefined} tabIndex={cond ? 0 : undefined} onKeyDown={cond ? handler : undefined}>`. Statically, we can't verify the conditions match, but the developer is clearly aware of the requirement. When all three attributes appear as JSX expressions, the script treats the element as authored correctly. (The original implementation had a bug: `role` was already brace-stripped before this check ran. It took me a few minutes to notice the heuristic was a no-op.) A sketch of all three suppressions follows this list.
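The three suppressions compress into a small predicate. A sketch with invented names: assume `attrs` maps attribute names to their raw source text and `has_spread` was detected by the tokenizer:

```python
def is_suppressed(attrs: dict[str, str], has_spread: bool) -> bool:
    """True when an element is exempt from the accessible-name and
    clickable-non-interactive rules. Names here are illustrative."""
    # 1. aria-hidden elements never reach the accessibility tree.
    if attrs.get("aria-hidden") == "true":
        return True
    # 2. {...props} forwarders delegate labelling to the consumer.
    if has_spread:
        return True
    # 3. role/tabIndex/onKeyDown all authored as conditional JSX
    #    expressions: treat as authored-correctly. Compare the raw source
    #    text *before* brace-stripping — the no-op bug mentioned above
    #    came from checking an already-stripped value.
    trio = ("role", "tabIndex", "onKeyDown")
    if all(attrs.get(k, "").startswith("{") for k in trio):
        return True
    return False
```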
These three suppressions cleared dozens of confirmed false positives without weakening real-defect detection. Every suppression is documented inline in the script alongside the rule it modifies, so a future engineer reading accessibility_audit.py understands not just what's checked but what's deliberately not checked.
The fix waves
I ran the audit, looked at the clustering, and built a wave plan. Each wave targeted a class of fix, not a repository. That ordering matters: fixing the design-system primitives first means downstream consumers inherit the fixes, and a good codemod beats a hundred manual edits.
The biggest single intervention was a Python codemod. The console-sim repository had a stereotypical pattern across 257 dashboard files: `<label>Field name</label> <input ... />`, with no `htmlFor`/`id` linkage. Visually, the labels lined up with the inputs. Programmatically, no screen reader knew the input's name. A codemod that walks the JSX, finds adjacent `<label>` and input pairs, generates a stable id from the input's `data-testid` (or a slug of the label text if there's no testid), and rewrites both the label and the input to carry the linkage took about two hundred lines and fixed 2,235 inputs in eight seconds. That's three orders of magnitude faster than the equivalent human pass and an order of magnitude more reliable.
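The heart of that codemod, reduced to a sketch. The real one walks JSX with the brace-aware tokenizer; this regex version only shows the id derivation and the rewrite:

```python
import re

def slug(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def link_pair(label_text: str, input_tag: str) -> tuple[str, str]:
    """Rewrite an adjacent <label> / <input> pair to carry htmlFor/id,
    deriving a stable id from data-testid when one is present."""
    testid = re.search(r'data-testid="([^"]+)"', input_tag)
    input_id = testid.group(1) if testid else slug(label_text)
    new_label = f'<label htmlFor="{input_id}">{label_text}</label>'
    new_input = input_tag.replace("<input", f'<input id="{input_id}"', 1)
    return new_label, new_input
```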
A second codemod handled `<h1>` proliferation. Pages in the simulator render multiple panels, each with its own page-level heading. At the source level, that means multiple `<h1>` per file; at runtime, only one panel renders at a time. The codemod kept the first `<h1>` per file and demoted the rest to `<h2>`: 594 demotions across 241 files.
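A sketch of the demotion logic, assuming non-nested headings that open and close in source order (the shipped codemod is tokenizer-based):

```python
import re

def demote_extra_h1(source: str) -> str:
    """Keep the first <h1> in a file; turn every later one into <h2>."""
    opens = closes = 0

    def swap(match: re.Match) -> str:
        nonlocal opens, closes
        tag = match.group(0)
        if tag.startswith("</"):
            closes += 1
            return tag if closes == 1 else "</h2>"
        opens += 1
        return tag if opens == 1 else "<h2"  # rest of the tag is untouched

    # Match closing tags whole, and just the '<h1' of opening tags.
    return re.sub(r"</h1>|<h1(?=[\s>])", swap, source)
```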
A third codemod added `aria-label` to inputs that had a `data-testid` but no preceding sibling label, deriving the label from the testid (e.g., `net-add-device-name` becomes "Device name"). I had to rewrite this one twice; the first version's regex broke on JSX attribute values containing arrow functions. Brace-aware tokenizers earn their keep.
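The derivation step is the interesting part. A sketch matching the example above; the prefix-stripping rule (drop the first two dash-separated segments) is my guess at the testid convention:

```python
def label_from_testid(testid: str) -> str:
    """'net-add-device-name' -> 'Device name'. Assumes testids carry a
    two-segment area prefix; falls back to the whole id otherwise."""
    segments = testid.split("-")
    words = segments[2:] if len(segments) > 2 else segments
    return " ".join(words).capitalize()
```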
The waves outside the codemods went to subagents. A subagent is just an LLM session with a specific task, scoped to a list of files, with the audit findings JSON as input. I dispatched two in parallel (one for client apps, one for the tools fleet) and they came back with 195 fixes between them. The agents apply the fixes; I review the patterns and the typecheck output. The pattern review caught three confirmed false positives in client apps that the script's heuristics didn't yet suppress, which became the basis for Wave 5's heuristic refinements.
What's actually in the codebase now
A snapshot of the AccelaStudy AI surface area, post-audit:
| Metric | Count |
|---|---|
| UI-bearing repositories | 56 |
| TSX/JSX source files (production) | 2,185 |
| Native `<button>` elements | 2,527 |
| Form inputs (`input` / `select` / `textarea`) | 3,486 |
| `<a href>` links | thousands (uncounted; per-repo) |
| Total ARIA attribute uses | 3,019 |
| Explicit `aria-label` uses | 1,530 |
| `aria-hidden` uses (decorative) | 752 |
| Total `role=` uses | 1,001 |
| `role="button"` (custom clickable elements) | 101 |
| `role="img"` (canvas/SVG with text alternative) | 112 |
| `role="dialog"` (modal containers) | 109 |
| `tabIndex` uses | 150 |
| `onKeyDown` handlers (custom keyboard support) | 208 |
| Tailwind `focus-visible:` ring classes | 102 |
| Skip-to-main-content references | 200+ |
| Total fixes shipped in one day | 3,317 |
| HIGH findings before / after | 414 / 0 |
| Findings at every severity, after | 0 |
Three thousand nineteen ARIA attributes is not a vanity number. It's the count of places where we've explicitly chosen to extend or refine the accessibility tree beyond what native HTML provides. Every one of them is a deliberate design decision, and the audit will flag the regression if any of them disappears.
Two thousand five hundred twenty-seven native <button> elements is the more important metric, because it's the count of things we didn't have to make accessible by hand. Native semantics are the foundation; ARIA is the extension. The codebase leans heavily on native semantics: buttons, anchors, fieldsets, labels, headings. The ARIA layer covers the visualizations (the Knowledge Map canvas, the Behavioral Rings SVG, the Ring Forge), the custom widgets (the lab console toolbar, the segmented billing-cadence selector, the keyboard-driven drag-and-drop in activities), and the live regions (toasts, exam timers, chat output, narration logs).
Patterns that did the heavy lifting
A few patterns recur across the codebase. They're worth naming because they encode the "shape" of an accessible component once, and every consumer inherits the shape.
The interactive-checkbox pattern
<div
role="checkbox"
tabIndex={0}
aria-checked={isComplete}
onClick={() => toggle(id)}
onKeyDown={(e) => {
if (e.key === ' ' || e.key === 'Enter') {
e.preventDefault();
toggle(id);
}
}}
>
{label}
</div>
Used wherever a styled checkbox replaces the native control. The four ingredients (role, tabIndex, click handler, key handler) are non-negotiable; the audit script enforces all four together.
The dialog backdrop
<div className={styles.overlay} role="presentation" onClick={onClose}>
<div role="dialog" aria-modal="true" aria-labelledby="title">
<h2 id="title">Confirm</h2>
{/* content */}
</div>
</div>
role="presentation" removes the backdrop from the accessibility tree. The inner <div role="dialog"> carries the focus trap, the labelled-by reference, and the Escape-key handler. The audit catches backdrops that pretend to be buttons (and would pollute the keyboard tab order) and silences the rule for this pattern.
The decorative SVG inside a labelled control
<button aria-label="Close">
<svg aria-hidden="true" focusable="false">…</svg>
</button>
Every icon button. Every Lucide / Phosphor / Heroicons reference. The button carries the name; the SVG is a glyph, not content. Marking the SVG aria-hidden plus focusable="false" keeps it out of the accessibility tree and out of the tab order on legacy browsers.
The progress bar wrapper
<div
role="progressbar"
aria-valuenow={Math.round(percent)}
aria-valuemin={0}
aria-valuemax={100}
  aria-label={label}
>
  <div className={styles.fill} style={{ width: `${percent}%` }} aria-hidden="true" />
</div>
Used for the radial progress on the certifications page, the per-domain bars on the exam score report, and the mastery bar on the activity sidebar. The wrapper carries the role and the values; the inner fill is decorative.
The radio-group with arrow keys
<div
role="radiogroup"
aria-label="Billing cadence"
onKeyDown={(e) => {
if (['ArrowRight', 'ArrowDown', 'ArrowLeft', 'ArrowUp'].includes(e.key)) {
e.preventDefault();
onChange(billing === 'monthly' ? 'annual' : 'monthly');
}
}}
>
<IntervalButton role="radio" aria-checked={isMonthly} tabIndex={isMonthly ? 0 : -1} … />
<IntervalButton role="radio" aria-checked={isAnnual} tabIndex={isAnnual ? 0 : -1} … />
</div>
This is the WAI-ARIA radio pattern: only the selected option is in the tab order; arrow keys cycle through the rest. The subscribe flow's monthly/annual toggle uses it. Without the arrow-key handler, keyboard users couldn't discover the second option; the audit catches that omission.
What the script does not check
The script catches the mechanical eighty percent. It does not catch:
- Color contrast. Computed colors per theme, against per-component backgrounds, with consideration for state (hover, disabled, focus). This needs axe-core's color-contrast rule running against rendered DOM in a real browser.
- Touch target size. A `<button>` that's correctly labelled but only 24 pixels tall fails WCAG 2.5.5 on mobile. Computing the box model needs a layout engine.
- Modal focus-trap correctness. Catching whether Tab loops within the modal and Escape closes it requires actual interaction, not static analysis.
- Custom ARIA widget pattern correctness. A `<tablist>` + `<tab>` + `<tabpanel>` triple needs `aria-controls` on each tab pointing to a panel id and `aria-labelledby` on each panel pointing back. The script catches missing roles; it doesn't catch wiring errors between the three.
- Screen-reader narrative quality. "Knowledge map showing 400 concepts: 60% mastered, 25% in progress, 15% not started" is a meaningful text alternative for a canvas. "Knowledge map" is not. The script verifies the attribute exists; it doesn't verify the words inside it convey the data.
- Activity-format keyboard semantics. A drag-and-drop activity needs Space-to-grab, arrow-keys-to-move, Enter-to-drop, Escape-to-cancel, and live-region announcements of position. The script verifies an `onKeyDown` exists; it doesn't verify the full pattern.
These are the LLM-driven phases of the spec. They run after the script passes, on a slower cadence, and they need a human to confirm the result. We have an axe-core Playwright sweep across thirty-four routes for color contrast and a manual screen-reader pass that goes into a release readiness checklist.
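For flavor, here's roughly what the contrast sweep looks like with Playwright's Python bindings and axe-core injected from a CDN. This is a sketch, not our actual harness; the URL, route, and axe version pin are illustrative:

```python
from playwright.sync_api import sync_playwright

AXE_SRC = "https://cdn.jsdelivr.net/npm/axe-core@4.10.0/axe.min.js"

def contrast_violations(url: str) -> list[dict]:
    """Run only axe-core's color-contrast rule against a rendered page."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.add_script_tag(url=AXE_SRC)
        # page.evaluate awaits the promise axe.run returns.
        results = page.evaluate(
            "() => axe.run(document, {runOnly: {type: 'rule', values: ['color-contrast']}})"
        )
        browser.close()
        return results["violations"]

if __name__ == "__main__":
    for v in contrast_violations("http://localhost:3000/certifications"):
        print(v["id"], len(v["nodes"]), "nodes")
```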
What it actually takes
After three thousand-plus fixes in a day, here's what I think is non-negotiable for a fully accessible product, and what's nice-to-have:
| Area | Non-negotiable | Nice-to-have |
|---|---|---|
| Audit | Deterministic script in CI; exit non-zero on HIGH/CRITICAL | Coverage dashboard; per-rule trend charts |
| Spec | Single source-of-truth doc; modes formalized | Rendered as a website page |
| Codemods | Reusable for the most-common bulk fixes | Plugin into a pre-commit hook |
| Patterns | Documented and exemplified in the design system | Storybook stories for each pattern |
| Native HTML | Buttons, anchors, fieldsets, labels — used over div+role wherever possible | — |
| ARIA | Used to extend, never replace, native semantics | — |
| Focus | Visible indicator on every focusable element; :focus-visible not :focus | High-contrast mode tested |
| Keyboard | Every interactive control reachable; arrow-key patterns where applicable | Tab-order Playwright tests |
| Skip-link | Present at the top of every shell layout | Multiple targets (e.g., to nav, to main) |
| `lang` | Set on `<html>` for every page | Per-section overrides for non-English content |
| Reduced motion | @media (prefers-reduced-motion: reduce) guard wherever animations exist | Per-component opt-outs |
| Screen-reader testing | Manual pass with at least one of NVDA / VoiceOver / JAWS before each release | Recorded passes for regression comparison |
| Color contrast | Verified per theme | Computed in CI |
The first row is what gates a release. The rest gets there over time. The script catches the "broken at all" cases. Manual review catches the "could be better" cases. Both have a place; neither is sufficient on its own.
Continuous, not episodic
What I care most about is what happens in three months, when we ship a new activity format, a new lab dashboard, a new tool. The work this week was finite; the discipline is continuous.
The discipline lives in three places:
- `avian-audits/accessibility-audit.md` is the spec. It defines the target standard (WCAG 2.1 AA), the rules, the severities, the audit modes, and the fixes-section patterns. It updates in the same commit as the script. Treat it like an ADR.
- `avian-audits/scripts/accessibility_audit.py` is the executable. CI runs it. Mode 1 exits non-zero on HIGH/CRITICAL. Pull requests that introduce regressions get blocked at the review gate.
- The fix sections of the spec are the codemod inventory. When a new mechanical pattern surfaces, the rule and the codemod ship together.
Every six weeks, someone runs Mode 2 (which enforces the MEDIUM gate too). MEDIUMs accumulate slowly in a healthy codebase; the slower cadence is appropriate. The really judgment-heavy phases — color contrast, touch targets, modal focus traps, screen-reader quality — run on release boundaries, not on every commit.
If you're starting from where we were a week ago, my advice is: write the script first. Don't write the report; don't make the slide deck; don't even fix anything. Write the script. The script gives you a baseline number, the baseline tells you the size of the problem, and the size of the problem tells you whether to fix by hand, by codemod, or by subagent. Once the script is in place, every fix is cheap and every regression is impossible. That's the difference between aspirationally accessible and continuously accessible, and it's a one-week investment for a permanent payoff.
Numbers I want you to take away
- Fifty-six UI-bearing repositories audited. None excluded.
- Two thousand one hundred eighty-five TSX and JSX source files scanned in production code paths.
- Three thousand nineteen explicit ARIA attribute uses across the codebase. Every one of them is a deliberate design decision the audit catches if regressed.
- Three thousand four hundred eighty-six form inputs, every one with an accessible name (label, aria-label, or wrapping label).
- Three thousand three hundred seventeen fixes shipped in a single day across nine fix waves and seven codemods.
- Two thousand two hundred thirty-five of those fixes came from a single 200-line Python codemod.
- Four hundred fourteen HIGH-severity findings became zero. Zero CRITICAL throughout. Zero MEDIUM and zero LOW after the cleanup wave.
A learner using a screen reader, a keyboard, a switch device, voice control, magnification, or reduced-motion settings can now use AccelaStudy AI without hitting a barrier the rest of us would never even notice. That's not a finishing line; that's a starting point. It's also the bar every product in our fleet, and every team I work with, should be willing to clear.