2.7 Leverage AI and LLMs Appropriately and Responsibly

Large language models (LLMs), such as those behind Claude, ChatGPT, and Gemini, are trained largely on publicly available text and code, much of it drawn from the public internet.

Unfortunately, a widespread and mistaken view treats accessibility as a separate experience or a marginal concern. Because that view has become somewhat institutionalized, much of the code these models learn from reflects it. As a result, agent-generated code is often rife with accessibility defects and anti-patterns. The problem will likely worsen as agentic coding becomes standard practice.

Even so, LLMs can help improve accessibility, provided their strengths and limitations are understood.

LLMs generally will not generate accessible code unless explicitly prompted to do so. Even when prompted, their output must be verified.
Prompt quality matters: specific, detailed instructions yield far better results than simply adding the word “accessible” to a prompt.
These tools are prone to hallucinations and overconfidence.
LLMs cannot perform accessibility audits, and they are not a substitute for dedicated testing tools such as axe DevTools, which are far more reliable.

Use LLMs to Support Some Workflows

LLMs can significantly reduce the effort of accessibility work that often gets neglected under time and resource constraints. However, a human should still make the final decision.

Writing alt text for images: Some LLMs are highly adept at describing the contents of an image and can often draft useful alternative text. However, they cannot reliably judge the context in which an image appears or whether it is purely decorative.
Building micro-interactions: When given explicit details, LLMs can build keyboard support and focus management into interactions from the initial implementation. However, they cannot determine whether the interaction itself is an accessible UX pattern.
Lowering the testing burden: Comprehensive test coverage often takes a back seat to feature development, especially tests for keyboard navigation or expected screen reader behavior. LLMs can help improve coverage for these input modalities, but a passing test does not guarantee a usable solution.
Using skills and instructions files: Skills and instructions files give agents reusable guidance for specialized tasks, so each prompt does not need to repeat the same level of detail. The agent typically finds a skill that matches its assigned task and uses those instructions to inform its output. These files can make a significant positive difference, but they do not guarantee an accessible result every time.
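To illustrate the kind of explicit detail worth putting in a prompt, and the kind of output that still needs human review, here is a minimal TypeScript sketch of keyboard logic an agent might be asked to produce for a horizontal tab list, along with the unit checks it might draft. It assumes the WAI-ARIA Authoring Practices tabs pattern: Left and Right arrows move focus and wrap around, and Home and End jump to the first and last tab. The function name `nextFocusIndex` is hypothetical, and the sketch does not cover tab activation or any DOM wiring.

```typescript
// Hypothetical helper: given a key press, return the index of the tab that
// should receive focus in a horizontal tab list. Assumes the WAI-ARIA
// Authoring Practices tabs pattern: Left/Right arrows move focus and wrap;
// Home and End jump to the first and last tab.
function nextFocusIndex(key: string, current: number, count: number): number {
  if (count <= 0) {
    return -1; // No tabs to focus.
  }
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count; // Wrap from last tab to first.
    case "ArrowLeft":
      return (current - 1 + count) % count; // Wrap from first tab to last.
    case "Home":
      return 0;
    case "End":
      return count - 1;
    default:
      // Not handled here; for example, Tab is left to the browser's default behavior.
      return current;
  }
}

// Example unit checks an agent might draft alongside the helper.
// Passing them says nothing about how a screen reader announces the tabs.
const checks: Array<[string, number, number, number]> = [
  ["ArrowRight", 2, 3, 0],
  ["ArrowLeft", 0, 3, 2],
  ["Home", 1, 3, 0],
  ["End", 0, 3, 2],
  ["Tab", 1, 3, 1],
];
for (const [key, current, count, expected] of checks) {
  const actual = nextFocusIndex(key, current, count);
  if (actual !== expected) {
    throw new Error(`${key} from ${current}: expected ${expected}, got ${actual}`);
  }
}
```

In a browser, the returned index would typically drive a roving `tabindex` and a call to the target tab's `focus()` method. Whether the finished widget follows an appropriate pattern and is announced correctly still has to be checked manually with assistive technology.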

Use LLMs for Research and Education

LLMs are good at synthesizing complex information and brainstorming. Use them to make technical standards easier to understand and to generate ideas. Because these tools are prone to hallucination and may have been trained on inaccurate information, it is important to read their responses critically.

Make the Web Content Accessibility Guidelines (WCAG) more approachable: A common criticism of WCAG is that it can be too dense or technical for non-specialists. LLMs can explain success criteria in plain language and offer additional examples of how to meet them. Those explanations should then be checked against the W3C's own Understanding and Techniques documents.
Rubber-ducking: LLMs can serve as a sounding board for working through ideas or identifying alternative solutions. They can also help workshop talking points for stakeholders or team members who may not yet be aligned on accessibility support.
Finding missing acceptance criteria: Acceptance criteria sometimes omit details about how a feature works with assistive technology. Use LLMs to review existing criteria and suggest items that may be missing.

Proceed with Caution

LLMs are powerful tools, but they are far from perfect. They cannot stand in for users, nor can they reliably predict what a screen reader will announce.

Accessibility cannot simply be automated away. It is part of the user experience, and it is about people. Keep the focus on delivering value to users, not on getting through compliance checklists faster.

One of the best ways to use LLMs effectively for accessibility is to learn accessibility fundamentals, including the guidance already in this playbook. That foundation informs both how to write effective prompts and how to evaluate whether a response solves the problem or makes matters worse.

How We Know We’re Doing This

We continue to test results manually
Our accessibility prompts contain specific instructions informed by experience with assistive technologies, and we review the results for accuracy
We have created guardrails, such as skills or instructions, and verified the results
We are using agents to build features by pointing them to reference solutions that are known to be accessible
Our agents are helping us achieve better test coverage that includes interactions involving assistive technology

How We Know We’re Coming up Short

We are using LLMs for accessibility, but haven’t built the foundational skills as a team
Our team views coding agents as a shortcut to meet accessibility compliance
We write generic accessibility prompts such as “add aria” and “make accessible”
We use LLMs to handle accessibility without any other changes to our processes, such as shifting left
We are using LLMs to assist with automating tests while neglecting proven, dedicated tooling