Producing Accessible Video, Audio, and Multimedia Content

Video, audio, and multimedia are used across the university and medical center and have become a regular part of our daily lives. Digital media is used in our classrooms, in meetings, in presentations, on websites, on social media platforms, in emails, and so on. It is important that these resources are designed to accommodate the diverse needs of all users, promoting equal access and opportunities to engage with the digital media.

The guidance on this page is intended to help users understand accessibility requirements for media and to provide instructions for creating accessible content. For detailed information on how to evaluate existing media, please review Evaluating Video, Audio, and Multimedia Content for Accessibility.

What is accessible video, audio, and multimedia?

Accessible video, audio, and multimedia can be used and understood by everyone. They are designed with best practices that meet the needs of people with disabilities and work better for everyone.

Basic Accessibility Requirements

Pre-recorded video with audio (multimedia)

Provide captions that are synchronized with the audio and include text descriptions of audio content.
Provide either an audio description track that describes the visual elements of the video or a text transcript that describes both its audio and visual elements.
Do not use autoplay for videos and audio.
Avoid blinking or flashing content that may trigger seizures.
Use an accessible media player to share your content.
If posting media on a website:
- Place text adjacent to the media describing its purpose, intent, or title.
- Place the transcript or a link to the transcript under the video.

Live video with audio (multimedia)

For live video, live captioning is required.
Do not use autoplay for videos and audio.
Avoid blinking or flashing content that may trigger seizures.
Use an accessible media player to share your content.
If posting media on a website, place text adjacent to the media describing its purpose, intent, or title.

Pre-recorded Video (no audio)

A transcript or an audio track describing the video is necessary.
Do not use autoplay.
Avoid blinking or flashing content that may trigger seizures.
Use an accessible media player to share your content.
If posting media on a website, place text adjacent to the media describing its purpose, intent, or title.

Pre-recorded audio (no video)

A transcript is required.
If posting media on a website:
- Place text adjacent to the media describing its purpose, intent, or title.
- Place the transcript or a link to the transcript either adjacent to the audio or in the description.

Live Audio (no video)

Real-time transcripts or captions are not required for live audio content (such as live podcasts), as this is a Level AAA requirement under WCAG. However, if someone requests them as an accommodation, the unit may need to provide them to comply with the Access for Individuals with Disabilities policy. If this accommodation is requested, please reach out to the ADA Digital Accessibility Center (accessibility@osu.edu) for guidance.

Captions

Captions are a synchronized text version of audio and video content. They can be either open (always visible) or closed (toggle on/off). Captions provide equitable access to people who cannot access or process the original audio. This can include people who are deaf, hard of hearing, neurodivergent or listening to audio in poor conditions.

In addition to spoken dialogue, captions must include meaningful non-speech information conveyed through sound, such as:

Sound effects: Briefly describe sound effects in square brackets, e.g., [doorbell ringing], [pants swishing].
Music:
- Briefly describe music in brackets, e.g., [drum solo], [twangy guitar country music].
- If the song is known, include the performer or composer and its title, e.g. [The McCoys singing “Hang on Sloopy”].
- Caption song lyrics with musical note icons (♪): one at the beginning of the lyric and two at the end, e.g., ♪ Oh come let’s sing Ohio’s praise ♪♪
Laughter and Applause: Briefly describe it in square brackets, e.g., [applause], [audience laughs].
Speaker identification: Ensure that every time a new speaker speaks, they are identified by name in parentheses, e.g., (Brutus Buckeye) Today I will be presenting on….
Location

For more information, see WCAG 2.1 Understanding SC 1.2.2 Captions (Prerecorded) (Level A)
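
As an illustration of the formatting conventions above, here is a minimal Python sketch that builds caption lines in those styles. The helper names (`sound`, `lyric`, `spoken`) are hypothetical and not part of any captioning tool; the sketch only shows what the finished text should look like.

```python
def sound(description: str) -> str:
    """Wrap a sound effect, music, laughter, or applause description in square brackets."""
    return f"[{description.strip()}]"


def lyric(line: str) -> str:
    """Mark a sung lyric with one musical note at the start and two at the end."""
    return f"♪ {line.strip()} ♪♪"


def spoken(speaker: str, text: str) -> str:
    """Identify a new speaker by name in parentheses before their dialog."""
    return f"({speaker.strip()}) {text.strip()}"


# Example caption lines, reusing the examples from the list above.
caption_lines = [
    spoken("Brutus Buckeye", "Today I will be presenting on captions."),
    sound("applause"),
    sound("The McCoys singing “Hang on Sloopy”"),
    lyric("Oh come let’s sing Ohio’s praise"),
]

for line in caption_lines:
    print(line)
```

A human still has to decide which sounds are meaningful enough to caption; the helpers only keep the punctuation consistent.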

Caption Language

Captions should be in the language of the spoken dialog. If a video contains multiple languages, captions should match the language of the words spoken and non-speech information should be captioned using the predominant language understood by the audience.

If the audio needs to be understood by an audience that speaks a different language, subtitles (see next section) in that language can be provided. This is not generally an accessibility requirement; however, it may be required in some circumstances under other university policies. Units who have questions regarding this topic are encouraged to reach out to the Office of Institutional Equity at equity@osu.edu. Units may also want to review the Spoken Language Accommodations page on the Office of Institutional Equity’s website.

Captions vs. Subtitles

“Captions” and “subtitles” are often used interchangeably, but they each have a distinct meaning.

Caption: a transcription of spoken dialog and non-speech audio primarily intended to provide equal access to deaf and hard of hearing individuals.
Subtitle: a transcription of spoken dialog, sometimes a translation or interpretation from the dialog language to the language of the viewer. Subtitles do not contain non-speech audio.

How to Create Captions

Automatic Captions with Human Review

Video hosting platforms such as Mediasite, Microsoft Stream, Zoom and YouTube have built-in captioning features. These features can significantly reduce the amount of time it takes to create captions. However, they sometimes transcribe dialog incorrectly or miss punctuation, and they cannot currently detect speakers, sound effects or music. Because of this, automatic captions must be reviewed and edited by a human to ensure accuracy.

Mediasite

The OSU instance of Mediasite has an AI-powered automated captioning feature built-in that generates very accurate captions of spoken dialog.

To use it:

Go to Mediasite.
Click “Add Presentation”.
Click the “Choose File” button in the Add Video section and select the video file to upload.
Give it a title and description and click the “Create Presentation” button.
Once the video file is uploaded, captions will be generated automatically. Wait for an email with the subject line starting with “Captioning completed for…”
Go back to Mediasite, then click on “My Presentations.”
Click on the presentation.
Click “Edit Captions.”
Delete the automated caption messages (“TO REQUEST ACCESSIBLE CAPTIONS VISIT GO.OSU.EDU/CCHELP” and “CAPTIONS PROVIDED BY AUTOMATED TRANSCRIPTION.”)
Listen to the video and adjust captions as needed. Ensure that you are indicating speakers, sound effects, music, etc.

To download the captions to create transcripts or for use on other platforms, such as YouTube or Microsoft Stream:

Go to “My Presentations.”
Click on the presentation.
Click “Downloads.”
Click “Captions (.dfxp)” or “Transcript (.txt).”

If the video was uploaded to Mediasite only for the automated captioning feature and does not need to be hosted there, we recommend deleting it after you have downloaded the captions.

Note: Microsoft Stream only supports caption files in the .vtt format. Content creators will need to convert the .dfxp to .vtt format (uploading captions to an online converter for certain types of videos may violate the Institutional Data Policy).

Microsoft Stream

Videos that are uploaded to Teams, OneDrive, and SharePoint are available in Microsoft Stream. Microsoft Stream has a built-in automatic captioning feature that currently produces mostly accurate captions of spoken dialog.

See Microsoft’s documentation on creating and editing captions in Microsoft Stream. Once the captions are generated, listen to the video and adjust captions as needed. Ensure that you are indicating speakers, sound effects, music, etc.

YouTube

YouTube can generate automatic captions. However, the captions often do not include punctuation and do not detect spoken dialog as well as the other options on this list, requiring longer manual review and remediation. It is recommended to use a different tool to generate and edit the captions, then upload the captions to the YouTube video.

PopeTech has a great article titled A complete guide for adding captions to YouTube videos that walks users through a step-by-step process. Additionally, YouTube Help provides the following instructional guides:

Zoom Cloud

In Zoom, the meeting organizer needs to enable automated captions. For online events, this is a way to meet the Digital Accessibility Policy in some circumstances with an approved exception (in other situations, a live captioner may be required; see Live Captioning).

If the video is going to be distributed after the meeting or event, the captions must be updated for accuracy. For editing captions, we recommend:

Importing the recording into Mediasite to generate and edit new captions. These may be more accurate than the captions generated by Zoom.
Editing the automated captions in the Zoom web portal.
Downloading them from Zoom and uploading and editing them in Microsoft Stream.

If the automated captions are not accurate enough to be easily edited, you may consider downloading the MP4 file and hiring a captioning provider.

Professional Captioning

Vendors can create captions and transcripts for you.

For pre-recorded videos, see Hiring a Captioning Supplier.
For live videos, see the Live Captioning page on the ADA Digital Accessibility Center’s website. Hiring a live CART captioning supplier may be required for certain types of events and is required if a participant requests the accommodation.

Pre-recorded vs. Live Captioning

Due to the challenges and limitations of live captioning, slight mistakes are more acceptable than they are in pre-recorded videos. Live captions, unlike pre-recorded video captions, are also typically behind the audio by a few seconds.

Automated live captioning is sometimes sufficient to meet the Digital Accessibility Policy; however, there are many situations where a professional captioner is required (see Live Captioning). This service is called “CART captioning” and is performed by a professional using a stenotype machine. The output is displayed during the live presentation.

CART captioning is supported by many tools we use at OSU, including Microsoft Teams and Zoom. The CART captioning providers are typically well acquainted with these tools and the captioning can be easily integrated into your live event. See instructions for hiring a captioning supplier.

If you are sharing a recording from an event that was live captioned, the captions need to be edited for accuracy and be synchronized to the audio track. See above for options to create and edit them.

Presentation Slides and Captioning Placement (Lower Thirds)

Most of the time, captions are positioned near the bottom center of the video. If your video contains text content, such as presentation slides or a lower third identifying an interviewee, the text might be obscured by the captions, preventing individuals who use captions from accessing that information.

Avoid placing any important text at the bottom of the video. Video creators who incorporate lower thirds should consider alternative placement, such as to the side of the speaker.

Transcripts

Transcripts are a text version of audio and/or video content. Transcripts offer an alternative method for blind users or users with vision loss to access the content. Additionally, they can be beneficial for visitors looking for information via search engines.

How to Create Transcripts

There are a number of easy ways to create transcripts! The easiest place to start is to use the captioning file as a starting point. If the captioning file is not available, Microsoft Word has a transcribing feature that allows users to record directly in Word or to upload an audio file. Definitely leverage these tools to avoid manually typing a transcript.

Starting from a Caption File

If you already have captions, you can use that file to create the transcript. Most caption-editing tools provide an option to export a plain text transcript.

Starting without a Caption File

If you don’t have captions, you’ll need to follow a different process: Transcribing Audio to Text. We recommend using the transcribe feature in Microsoft Word to generate transcripts.

Formatting Standards

Do not include timestamps. If your auto-generated file has timestamps, remove them.
Use regular sentence structure and punctuation.
Describe any unspoken (visual, text or audio) information that is contextually important, unless the narration within the video already summarizes it.
Include line breaks at the end of a complete thought or section.
Identify speakers. Identify speakers by full name the first time, then last name only on subsequent references.
- If a speaker’s name is not given, refer to them neutrally as Narrator, Interviewer, Speaker, or another term that fits the context of their role in the video.
- Identify a change in speakers by starting a paragraph with the new speaker's name and a colon.
Spell out any text presented on the screen when it is relevant.
Do not alter wording to correct grammar, but do edit out filler words such as “um.”
Look for special characters that do not paste correctly from copied content (e.g., quote marks, fractions, etc.)
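
Several of these formatting steps can be partly automated when starting from a caption file. The sketch below is a minimal Python example, assuming a WebVTT (.vtt) caption file in which speakers are identified in parentheses at the start of a cue, as described in the Captions section. It removes timestamps, joins caption lines into sentences, and starts a new paragraph with the speaker's name and a colon at each speaker change. The function name `vtt_to_transcript` is hypothetical.

```python
import re

# A cue that opens with "(Speaker Name)" marks a change of speaker.
SPEAKER = re.compile(r"^\(([^)]+)\)\s*(.*)$")


def vtt_to_transcript(vtt: str) -> str:
    """Turn WebVTT caption text into transcript paragraphs without timestamps."""
    paragraphs = []
    current = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # the WEBVTT header or a NOTE/STYLE block, not a cue
        text = " ".join(line.strip() for line in lines[timing_index + 1:] if line.strip())
        if not text:
            continue
        match = SPEAKER.match(text)
        if match:
            if current:
                paragraphs.append(" ".join(current))
            current = [f"{match.group(1)}: {match.group(2)}".strip()]
        else:
            current.append(text)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

The output is only a starting point. A person still needs to add descriptions of important visual information, shorten repeated speaker names to last names, and check for special characters that did not convert cleanly.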

Where to Place Transcripts

Because transcripts are plain text, there are many ways you can provide them, like a text file, on the webpage with the video/audio, or on a separate webpage. One common pattern is to place them in an accordion/disclosure element.

No matter how you provide them, they need to:

Meet all other accessibility standards.
Be easy to locate – we recommend placing them immediately below the video or audio player.
Be easy to identify – use the word “transcript” or “transcription” in the link or button text.
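
The placement guidance above can be sketched in code. The following minimal Python example generates HTML that puts a title next to the video, attaches a caption track, and places the transcript in a disclosure element immediately below the player. The function name, the heading level, and the file names in the usage note are assumptions for illustration; the native `<video>` element stands in for whichever accessible media player your site uses.

```python
from html import escape


def media_with_transcript(title, video_src, captions_src, transcript_paragraphs):
    """Build HTML for a video with captions and a transcript directly below it."""
    paragraphs = "\n".join(f"  <p>{escape(p)}</p>" for p in transcript_paragraphs)
    return (
        # Text adjacent to the media that states its title.
        f"<h2>{escape(title)}</h2>\n"
        # 'controls' is set and 'autoplay' is deliberately left out.
        f'<video controls src="{escape(video_src)}">\n'
        f'  <track kind="captions" srclang="en" label="English" src="{escape(captions_src)}">\n'
        f"</video>\n"
        # Disclosure element whose label contains the word "Transcript".
        f"<details>\n"
        f"  <summary>Transcript: {escape(title)}</summary>\n"
        f"{paragraphs}\n"
        f"</details>\n"
    )
```

For example, `media_with_transcript("Registration walkthrough", "walkthrough.mp4", "walkthrough.vtt", ["Narrator: Welcome."])` returns markup with the transcript in a `<details>` element right after the player.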

Audio Descriptions

An audio description is an audio-narrated description that helps blind users, low-vision users, and users with cognitive disabilities understand what is going on in the video. It provides information about speakers, scene changes, and on-screen text, and may describe the mood or visual reaction of characters.

The easiest way to do this is to integrate descriptions into the video narration. Here are some examples to illustrate the point:

Instead of saying, “When it’s time to register for classes, go to this page, click here and then add it here,” say, “When it’s time to register for classes, navigate to the shopping cart page, select the checkbox by the class that you want to register for, then select the ‘enroll in course’ button.”
Instead of saying, “Thank you for your time. If you have any questions here is my email,” say, “Thank you for your time. If you have any questions, my email is Brutus.Buckeye@osu.edu”.

Additional guidance on writing audio descriptions can be found at the DCMP Description Key.

If an audio description needs to be added, videos can be edited to insert the descriptions between speaker breaks. If there is not enough time between the speaker breaks, either the video needs to be edited to lengthen the pauses so the description fits, or a media player that supports audio description breaks, such as Able Player, can be used.

Unfortunately, audio descriptions are not yet well supported by YouTube. There are workarounds; see PopeTech’s article on how to create audio descriptions for accessible YouTube videos. Until YouTube fully supports audio descriptions, we highly recommend using Microsoft Stream or Mediasite whenever possible for videos that need audio description tracks.

Resources

DCMP Captioning Key – captioning guidelines published in 1994 by the Described and Captioned Media Program (DCMP). A reference for captioning entertainment and educational media, distributed internationally and translated into multiple languages.

DCMP Description Key – audio description guidelines developed in 2006 through a partnership between the Described and Captioned Media Program (DCMP) and the American Foundation for the Blind.

WCAG 2.1 Understanding SC 1.2 Time-based Media

WCAG 2.1 Audio Description or Media Alternative 1.2.3 (Prerecorded)

The W3C guide on making transcripts

Credit 

The OSU Digital Accessibility team would like to thank the following for sharing some of their digital accessibility best practices and guidance: OSU Engineering Technology Services, OSU Wexner Medical Center - Marketing and Strategic Communications, IT Accessibility at the University of Michigan, and University of Arkansas Explore Access.