JUMP START HTML5 (2014)
Chapter 10 Multimedia: The track Element
In the last chapter, we discussed how to make your media available to more users with cross-browser techniques. In this chapter, we'll look at how to make it accessible and "index-able", too, with the track element and WebVTT.
HTML5 multimedia comes with three challenges:
1. Lack of accessibility.
2. Lack of "index-ability".
3. Language barriers between the media and the viewer or listener.
Accessibility simply means supporting users with impairments or disabilities as fully as possible. Audible media isn't usable to hearing-impaired users. Visually-impaired users don't see visual media in the same way that fully-sighted users do. HTML5 audio and video present clear challenges for these users.
What's more, search engine software struggles to correctly index binary data, such as audio, video, and images. Even Google's Image Search relies on file names, alt attributes, and surrounding text rather than actual file indexing. By themselves, audio and video files are a bit of an information black hole. Data goes in, but is often difficult to extract and use.
Language barriers are another challenge of HTML5 multimedia. The viewer or listener may not understand, let alone be fluent in, the language of the media being played. Subtitles within the video can help. However, they aren't readily swappable while editing, viewing, or listening to media. And they too suffer from the indexing problem.
HTML5 defines a track element as a way to solve these problems. With it, we can add timed, text-based alternatives—such as subtitles, captions, and metadata—to our media files. In this chapter, we'll look at this element, and a syntax for captions and subtitles known as WebVTT.
The State of track Support
Before we begin, however, let's talk about browser support. The <track> element is at least partly supported by Internet Explorer 10+, Chrome 16+, Safari 6+, and Opera 16+. However, its utility is limited in some of those browsers.
Safari 6.0.5 for OS X doesn't actually make captions or subtitles available to the user, though the scripting interface is partially available. Captions are visible, however, in Safari 7. (In iOS 7, they're only available when the video is in full-screen mode.)
Opera 16+ for desktop supports track, but Opera for Android does not. On Android, Opera passes video handling off to the operating system's software. Rather than playing video in the browser, it launches an external application, making captions irrelevant. Opera's new Coast browser for iOS behaves similarly, using that platform's built-in video handling.
Firefox support for track is still in progress. Full support isn't yet available. However, partial support is available in the latest nightlies (Firefox 27.0a1 and higher). Enable it by typing about:config in the address bar, and changing the media.webvtt.enabled setting to true.
To provide captioning in browsers that lack support, take a look at Captionator.js and MediaElementJS.
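Before loading one of these fallback libraries, a script can test whether the browser exposes the text track scripting interface at all. What follows is a minimal sketch, assuming that a textTracks property on a video element is a reasonable proxy for native support. The function name supportsTextTracks and its doc parameter are illustrative and not part of either library.

```javascript
// Returns true when video elements created by `doc` expose a textTracks list.
// `doc` is normally the global `document`; it is a parameter so the check
// can also run against a stub outside the browser.
function supportsTextTracks(doc) {
  if (!doc || typeof doc.createElement !== 'function') {
    return false;
  }
  const video = doc.createElement('video');
  return Boolean(video) && 'textTracks' in video;
}
```

In a page you would call supportsTextTracks(document) and load a fallback when it returns false. Treat a true result as necessary rather than sufficient: as noted above, Safari 6.0.5 exposes part of the scripting interface without showing captions to the user.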
Captions, Subtitles, and audio
Most browsers don't fully support the track element when used with the audio element. There are no subtitles, no captions, and no menu for them. For audio files you currently have two options:
1. Include a text transcript in the same HTML document.
2. Use a video element instead of an audio element.
Since an audio file lacks intrinsic dimensions, the video element will render at its default 300x150-pixel size.
Adding the track Element
To use the track element, place it between the opening <video> tag and closing </video> tag, as shown below.
<video src="/path/to/media.file" controls>
<track src="/path/to/tracktext.vtt" srclang="en">
</video>
If you're using source elements instead, place your track tag or tags after your source tags.
<video controls>
<source src="/path/to/media.m4v" type="video/mp4">
<source src="/path/to/media.webm" type="video/webm">
<track src="/path/to/tracktext.vtt" srclang="en">
</video>
You've probably noticed that our track tag contains an src attribute. The value of src must be the URL of a text file containing the alternative version. In theory, this could be any captioning file format that the browser supports. In practice, it should be a WebVTT file. WebVTT enjoys support in every browser that supports the track element. TTML is an alternative captioning syntax, but so far, only Internet Explorer supports it.
A little later in this chapter, we'll discuss WebVTT's captioning syntax and how to use it.
Specifying Subtitles, Captions, and Metadata
In our examples above, we've left out the kind attribute. It's optional, but recommended. The kind attribute tells our browser the function of each text track and guides how it will be displayed.
The value of kind may be one of the following values:
· subtitles: used to provide a transcription, or translation of dialogue
· captions: used for transcription and translation, but also used to provide descriptions specifically for hearing-impaired users
· descriptions: used to describe the video component in cases where it's unavailable, or the user is visually impaired; synthesized as audio
· chapters: used for navigating the resource, similar to what you might find on a DVD menu
· metadata: data about the video that's intended for script or machine consumption
Of these five types, all but metadata are revealed to the user. Tracks of the descriptions kind are synthesized as audio, and will be heard rather than seen by the user. Both subtitles and captions are overlaid on the video file. For captions, the user interface may include a closed caption button that lets the user toggle captions on and off (Figure 10.1).
Figure 10.1. An example of a video with captions enabled in Chrome 32. Video still from "Sita Sings the Blues" by Nina Paley (sitasingstheblues.com)
Chapters are intended to be displayed as an interactive list in the user agent's interface. To date, however, no major browser fully supports this feature. Metadata tracks provide information about the media file or time range, and aren't displayed to the user at all.
If you don't set the kind attribute, your track will be treated as a subtitles track. That's the default state for the element. When the kind is subtitles, whether explicitly or implicitly, you must include a srclang attribute. Without the srclang attribute, subtitles will not work.
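The rules in this section are easy to check mechanically before markup is published. The sketch below is one way to do that, assuming each track is described by a plain object with src, kind, and srclang properties. The function name checkTrack and the wording of its messages are illustrative only.

```javascript
const TRACK_KINDS = ['subtitles', 'captions', 'descriptions', 'chapters', 'metadata'];

// Checks a plain description of a track element and returns a list of problems.
// An empty list means none of the rules sketched here were broken.
function checkTrack(track) {
  const problems = [];
  // A missing kind falls back to "subtitles", the default described above.
  const kind = track.kind === undefined ? 'subtitles' : track.kind;

  if (!TRACK_KINDS.includes(kind)) {
    problems.push('unknown kind: ' + kind);
  }
  if (!track.src) {
    problems.push('missing src');
  }
  if (kind === 'subtitles' && !track.srclang) {
    problems.push('subtitles tracks need a srclang');
  }
  return problems;
}
```

For example, checkTrack({ src: '/path/to/tracktext.vtt' }) reports the missing srclang, because the omitted kind defaults to subtitles.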
Using Multiple track Elements
Though it's perfectly valid to use multiple track elements, doing so is not supported in all browsers. To date, Internet Explorer 10+ and Safari 7 are the only browsers that support multiple track elements. Both browsers provide the user with a menu for selecting which text track to use.
Figure 10.2. The closed caption button in Chrome, when captions are turned on (top) and turned off (bottom)
When multiple track elements are present, Chrome and Opera will use the first track element listed. Rather than provide a menu of track elements, Chrome and Opera include a CC button (for "Closed Captioning") that toggles the current set of captions or subtitles on and off (Figure 10.2).
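Where the browser offers no menu, a script can switch between tracks itself through the TextTrack scripting API. The following is a rough sketch, assuming the browser implements the string mode values showing and disabled on each entry of video.textTracks (older implementations may differ). The function name showTrackForLanguage is illustrative.

```javascript
// Shows the first subtitles or captions track whose language matches `lang`
// and disables the other subtitle and caption tracks.
// `textTracks` is expected to behave like video.textTracks: an array-like
// list of objects with `kind`, `language`, and a writable `mode`.
function showTrackForLanguage(textTracks, lang) {
  let chosen = null;
  for (let i = 0; i < textTracks.length; i++) {
    const track = textTracks[i];
    if (track.kind !== 'subtitles' && track.kind !== 'captions') {
      continue; // leave chapters, descriptions, and metadata alone
    }
    if (chosen === null && track.language === lang) {
      track.mode = 'showing';
      chosen = track;
    } else {
      track.mode = 'disabled';
    }
  }
  return chosen;
}
```

In a page, you might call showTrackForLanguage(video.textTracks, 'fr') from the change handler of your own language menu. The function returns null when no track matches, so the caller can fall back to a default.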
Specifying the Language of Your Text Tracks
You'll also want to include a srclang attribute with your <track> tag. The value of this attribute must be a valid BCP 47 language code. Usually these codes are two letters, such as en for English, or de for German. But they could also include a country or region code such as fr-CA for Canadian French, or en-GB for British English.
There are loads of language and country or region codes—too many to list here. Commit the ones you use most to memory. Should you need other language and region codes, the best place to find them is Richard Ishida's Language Subtag Lookup tool.
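If your srclang values come from a database or user input, it can help to normalize them first. One possible approach, sketched below, leans on the standard Intl.getCanonicalLocales function, which is available in current browsers and Node.js. The wrapper name canonicalLanguageTag is an illustrative choice. Note that this checks only that a tag is well-formed, not that its subtags are actually registered.

```javascript
// Returns the canonical form of a language tag, or null when the tag is not
// well-formed. This checks structure only; it does not confirm that the
// subtags are registered.
function canonicalLanguageTag(tag) {
  try {
    const [canonical] = Intl.getCanonicalLocales(tag);
    return canonical === undefined ? null : canonical;
  } catch (error) {
    return null; // Intl.getCanonicalLocales throws a RangeError for malformed tags
  }
}
```

With this in place, a value such as es-mx is normalized to es-MX, while a malformed value such as en_US (with an underscore) yields null.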
Without the srclang attribute, subtitles will not work (captions and descriptions should). When present, some user agents allow the user to choose between tracks.
Figure 10.3. Selecting a language for subtitles or captions in Safari for iOS 7. Video still from "Sita Sings the Blues" by Nina Paley (sitasingstheblues.com)
For example, Safari for iOS 7 offers the user a captions menu when the video is viewed in full screen mode (Figure 10.3, above). It uses the value of srclang to set the language option in the menu.
Figure 10.4. By default, tracks in Internet Explorer are untitled. Video still from "Sita Sings the Blues" by Nina Paley (sitasingstheblues.com)
This is not the case for Internet Explorer, however. In Internet Explorer 10+ (Figure 10.4, above), tracks listed in the Closed Captions menu are untitled by default. To fix this, we need to add the label attribute to each of our tracks. As of publication, other browsers do not support multiple track elements.
Labeling Your Tracks
The label attribute is self-explanatory. It defines a name or title that the browser can use when exposing the track to the user.
<video src="/path/to/media.file" controls>
<track src="/path/to/en-us.vtt" srclang="en-US" label="English - USA">
<track src="/path/to/fr.vtt" srclang="fr" label="Français">
<track src="/path/to/es-MX.vtt" srclang="es-MX" label="Español - México">
</video>
Without the label attribute, Internet Explorer gives track elements unhelpful default titles, as we saw in Figure 10.4. When included, Internet Explorer will instead use the value of the label attribute in the caption selection menu (Figure 10.5).
Figure 10.5. When tracks have a label attribute, the label value becomes the name of the caption option. Video still from "Sita Sings the Blues" by Nina Paley (sitasingstheblues.com)
Safari handles labels a bit differently. Each item in Safari's captions and subtitles menu includes both the label and the language (Figure 10.6). To date, neither Firefox, Chrome, nor Opera makes multiple tracks available to the user.
Figure 10.6. Safari includes both the label and the language.
When selecting labels for your tracks, keep the following in mind:
· Avoid duplicating labels. It's confusing for the user.
· Labels should describe the content and/or purpose of the track (for example: "English - Fully captioned").
· Labels are not a substitute for a srclang attribute.
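The first guideline is simple to enforce with a small check. The sketch below assumes tracks are described by plain objects with a label property. The function name findDuplicateLabels is illustrative, and comparing labels without regard to case or surrounding whitespace is a design choice rather than a rule from the specification.

```javascript
// Returns the labels that appear on more than one track, compared without
// regard to case or surrounding whitespace. Results are lowercased.
function findDuplicateLabels(tracks) {
  const seen = new Set();
  const duplicates = new Set();
  for (const track of tracks) {
    const key = (track.label || '').trim().toLowerCase();
    if (key === '') {
      continue; // unlabeled tracks are a separate problem
    }
    if (seen.has(key)) {
      duplicates.add(key);
    }
    seen.add(key);
  }
  return Array.from(duplicates);
}
```

An empty result means every labeled track has a distinct label.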
Creating Text Tracks With WebVTT
WebVTT is the most widely supported text format for HTML5 captioning. Both Chrome and Internet Explorer support it, and World Wide Web Consortium members are working to standardize the syntax for all browsers. We'll show a few examples here. For a fuller discussion of WebVTT, consult the WebVTT draft specification.
What is WebVTT?
WebVTT is a simple, structured text format used to provide metadata, subtitles, and captions for web-based audio and video. Though technically plain text, WebVTT files must be served with a Content-Type: text/vtt response header. Sending this header usually requires adjusting your server configuration. Consult your server documentation or your web host's support team to learn how to do this.
WebVTT files must also use UTF-8 encoding. Check the settings or preferences for your text editor. It should use UTF-8 character encoding, and Unix/Linux line endings. (This should actually be your default since it's also required of HTML5.) You may also wish to adjust your server configuration to include a charset parameter with the Content-Type header. This will make the full response header Content-Type: text/vtt; charset=UTF-8. Again, consult your server documentation or your web hosting support team.
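If you serve files from your own script rather than a configured web server, the header has to be set in code. The sketch below shows only the mapping from a request path to a header value. The function name contentTypeFor and the application/octet-stream fallback are assumptions for illustration; only the text/vtt value comes from the discussion above.

```javascript
// Maps a request path to a Content-Type header value.
// Only the .vtt entry comes from the text above; the fallback is an assumption.
function contentTypeFor(requestPath) {
  // Ignore any query string before looking at the file extension.
  const pathOnly = requestPath.split('?')[0].toLowerCase();
  if (pathOnly.endsWith('.vtt')) {
    return 'text/vtt; charset=UTF-8';
  }
  return 'application/octet-stream';
}
```

In a Node.js http request handler, for instance, the result could be passed to response.setHeader('Content-Type', contentTypeFor(request.url)) before the file is sent.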
Creating a Simple WebVTT File
The first line of a WebVTT file must begin with the string WEBVTT. You may see examples online that start with WEBVTT FILE. The WEBVTT part is what's important here. A blank line must separate this string from the rest of your file.
WebVTT files consist of a series of cues. Each cue is made of a time range and a subtitle or caption. Times should use an hh:mm:ss.mls format, where hh is the number of hours, mm is minutes, ss is seconds, and mls is the number of milliseconds. For a cue that should appear on screen at 1 hour, 33 minutes, and 58.3 seconds, you'd use 01:33:58.300.
Browsers' parsers also accept a single-digit hours value, but only for hours. This means our previous cue could also be written as 1:33:58.300. The WebVTT specification itself calls for at least two digits when hours are present, and lets you omit the hours entirely when they are zero, so strict validators may prefer 01:33:58.300. Minutes and seconds, however, must be expressed using two digits. Milliseconds must use three, zero padding if necessary.
Start and end times for each cue must be separated by the string -->, with a space on either side. If, for example, your cue should start 30 seconds into your media file, and end 45.4 seconds into it, you'd note that like so.
00:00:30.000 --> 00:00:45.400
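When cue times come from a script or an editing tool, they usually arrive as plain numbers rather than formatted strings. The sketch below shows one way to produce the hh:mm:ss.mls format described above, assuming times are given as non-negative whole milliseconds. The function names formatTimestamp and formatTimeRange are illustrative.

```javascript
// Converts a non-negative whole number of milliseconds into hh:mm:ss.mls.
function formatTimestamp(totalMs) {
  const pad = (value, width) => String(value).padStart(width, '0');
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const seconds = totalSeconds % 60;
  const minutes = Math.floor(totalSeconds / 60) % 60;
  const hours = Math.floor(totalSeconds / 3600);
  return pad(hours, 2) + ':' + pad(minutes, 2) + ':' + pad(seconds, 2) + '.' + pad(ms, 3);
}

// Builds the timing line of a cue from start and end times in milliseconds.
function formatTimeRange(startMs, endMs) {
  return formatTimestamp(startMs) + ' --> ' + formatTimestamp(endMs);
}
```

Working in whole milliseconds avoids floating-point rounding surprises. Always writing two digits for the hours keeps the output acceptable to strict validators as well as browsers.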
Every time range is then followed by the text that makes up the cue. The simplest cue is plain text, and cues can break across multiple lines, as shown below. (Dialogue from the movie "Sita Sings the Blues," sitasingstheblues.com).
WEBVTT

0:06:57.200 --> 0:07:01.000
[Music]

0:07:02.500 --> 0:07:08.000
When? I don't remember what year. There's no year.
How do you know there's a year for that?

0:07:08.500 --> 0:07:10.000
I think they say the 14th century.

0:07:11.500 --> 0:07:13.000
The 14th century was recently.
I know but ...

0:07:13.500 --> 0:07:16.500
That's when the Moguls were ruling in India
The 11th then

0:07:17.500 --> 0:07:22.000
It's definitely B.C. It's B.C. for sure.
And I think it's Ayodhya.
Notice the blank lines between each cue? They're required. This is how the browser determines where one cue ends and the next begins.
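To see why the blank lines matter, consider how a program might split a file into cues. The following is a deliberately simplified sketch, not the parsing algorithm browsers actually use: it handles only the header and basic cues, discards any cue settings after the end time, and skips blocks without a timing line. The function name parseCues and the shape of the returned objects are illustrative.

```javascript
// Splits WebVTT source text into cues. A simplified sketch: blocks without
// a timing line (such as comments) are skipped, and anything after the end
// time on a timing line is dropped.
function parseCues(source) {
  const blocks = source
    .replace(/\r\n?/g, '\n') // normalize Windows and old Mac line endings
    .trim()
    .split(/\n{2,}/); // one or more blank lines separate blocks

  const header = blocks.shift();
  if (!header.startsWith('WEBVTT')) {
    throw new Error('A WebVTT file must begin with WEBVTT');
  }

  const cues = [];
  for (const block of blocks) {
    const lines = block.split('\n');
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) {
      continue; // not a cue
    }
    const [start, rest] = lines[timingIndex].split('-->').map((part) => part.trim());
    const end = rest.split(/\s+/)[0]; // keep only the end time
    cues.push({ start, end, text: lines.slice(timingIndex + 1).join('\n') });
  }
  return cues;
}
```

If the blank line between two cues were missing, this sketch would fold the second cue's timing line and text into the first cue's text, which is roughly the kind of failure you can expect from a real parser too.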
WebVTT Cue Spans
WebVTT supports a set of HTML5-like tags known as cue spans that offer simple formatting for subtitles and captions.
· c or cue class span
· i or italics span
· b or bold span
· u or underline span
· ruby and rt or ruby and ruby text spans
· v or voice cue span
· lang or language cue tag
Much like HTML tags, each cue span start tag begins with a < and ends with >. Ending tags begin with </ and end with >. Let's take our dialogue from above and add some voice cue span tags.
WEBVTT

0:06:57.200 --> 0:07:01.000
<i.music>[Music]</i>

0:07:02.500 --> 0:07:08.000
<v Man1>When? I don't remember what year. There's no year.
How do you know there's a year for that?</v>

0:07:08.500 --> 0:07:10.000
<v Woman>I think they say the 14th century.</v>

0:07:11.500 --> 0:07:13.000
<v Man1>The 14th century was recently</v>
<v Woman>I know but ...</v>

0:07:13.500 --> 0:07:16.500
<v Man1>-That's when the Moguls were ruling in India</v>
<v Woman>- The 11th then</v>

0:07:17.500 --> 0:07:22.000
<v Man2>It's definitely B.C. It's B.C. for sure.
And I think it's Ayodhya.</v>
Here, each character is denoted by a <v> tag. The name of each character (here Man1, Woman, and Man2) is also part of the tag. It's an attribute of sorts. We've also added an italic cue span, with a class of music, to our first cue.
Tip: Validate Your WebVTT
Poorly written WebVTT will keep captions and subtitles from working. Validate your WebVTT syntax using Anne van Kesteren's Live WebVTT Validator.
Voice cue span tags don't change the appearance of each subtitle by themselves. But they do add semantic data and become hooks for CSS, as we'll see in the next section.
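That semantic data is also available to scripts. As a rough illustration, the sketch below pulls the distinct speaker names out of a cue's raw text with a regular expression. It is not a full cue text parser, and the function name extractVoices is illustrative.

```javascript
// Lists the distinct speaker names found in <v> tags within a cue's text.
// A regular-expression sketch, not a full cue text parser.
function extractVoices(cueText) {
  // Matches <v Name> and <v.class Name>; the name runs up to the closing ">".
  const voiceTag = /<v(?:\.[^\s>]+)?\s+([^>]+)>/g;
  const voices = new Set();
  for (const match of cueText.matchAll(voiceTag)) {
    voices.add(match[1].trim());
  }
  return Array.from(voices);
}
```

A list like this could be used to build a legend of speakers, or to decide which voice-specific styles a page needs.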
Styling Subtitles and Captions with the ::cue Pseudo-element
Figure 10.7. Un-styled captions in Internet Explorer 11. Video still from "Sita Sings the Blues" by Nina Paley (sitasingstheblues.com)
Captions and subtitles are displayed in most browsers as white, sans-serif text. Internet Explorer (Figure 10.7) adds a text shadow for readability. Chrome, Safari, and Opera use a black background. They're always placed at the bottom of the video, and centered on screen.
Though WebVTT offers a syntax for caption alignment and placement, browser vendors haven't yet implemented it. For now, we have limited control over how native captions look and where they are placed on screen.
We can, however, adjust the appearance of captions and subtitles with CSS and the ::cue pseudo-element (currently only supported in Chrome, Safari 7, and Opera 16+).
Not all CSS properties can be used to style WebVTT captions and subtitles. Only a subset of properties is supported, largely related to text and color.
· color
· opacity
· visibility
· text-decoration
· text-outline
· text-shadow
· background properties such as background-color
· outline properties such as outline-color
· font properties such as font-family, but also line-height and white-space
Important: Change Caption Colors With Caution!
Ensure that your color selections create enough contrast by using a contrast ratio calculator.
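If you'd like to check contrast in your own build tools, the calculation behind those calculators is short. The sketch below follows the relative luminance and contrast ratio formulas from WCAG 2.x, assuming opaque colors written as #rgb or #rrggbb; a semi-transparent caption background would first need to be blended with whatever lies behind it. The function names are illustrative.

```javascript
// Converts "#rgb" or "#rrggbb" into [r, g, b] with each channel in 0..255.
function parseHexColor(hex) {
  let digits = hex.replace('#', '');
  if (digits.length === 3) {
    digits = digits.split('').map((d) => d + d).join('');
  }
  return [0, 2, 4].map((i) => parseInt(digits.slice(i, i + 2), 16));
}

// Relative luminance as defined by WCAG 2.x.
function relativeLuminance(rgb) {
  const [lr, lg, lb] = rgb.map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(hexA, hexB) {
  const a = relativeLuminance(parseHexColor(hexA));
  const b = relativeLuminance(parseHexColor(hexB));
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}
```

WCAG's AA level asks for a ratio of at least 4.5 to 1 for normal-sized text, so a check such as contrastRatio(textColor, backgroundColor) >= 4.5 is a reasonable starting point.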
Let's look at an example of styling the ::cue pseudo-element. Here ::cue is our selector, and we're using the font shorthand property to make our caption text bold.
::cue {
font: bold 18px / 1.5 sans-serif;
}
You can see the effect in Figure 10.8.
Figure 10.8. Applying bold text styling to caption text. Image from "Sita Sings the Blues" sitasingstheblues.com
That's not all we can do, however. The ::cue pseudo-element also resembles a function, and accepts a single argument. This argument must be one or more CSS selectors, such as a class name or element. Recall the markup we used in our snippet of dialogue from above.
0:07:11.500 --> 0:07:13.000
<v Man1>The 14th century was recently</v>
<v Woman>I know but ...</v>
We can use these tags with ::cue to create more specific CSS selectors. For example, we may want to use different colored text for each character.
::cue(v[voice=Man1]) {
color:#9f0;
background: rgba(0,0,0,.8);
}
::cue(v[voice=Woman]) {
color:#ece;
background: rgba(0,0,0,.8);
}
Here we've combined an element selector (v) and an attribute selector (voice) to assign each character a different color (Figure 10.9).
Figure 10.9. Caption text in which each character's lines are in a different color, as shown in Chrome 30
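With more than a handful of speakers, writing one rule per voice by hand gets tedious. One option is to generate the rules, as in the sketch below. The function name buildVoiceRules is illustrative, and the approach assumes the generated text is then placed in a stylesheet or style element of your own.

```javascript
// Builds ::cue rules that give each named voice its own text color.
// `voiceColors` maps a voice name (as written in the <v> tag) to a CSS color.
function buildVoiceRules(voiceColors) {
  return Object.entries(voiceColors)
    .map(([voice, color]) => {
      // Quote the attribute value so names containing spaces still match.
      const escaped = voice.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
      return '::cue(v[voice="' + escaped + '"]) {\n  color: ' + color + ';\n}';
    })
    .join('\n\n');
}
```

Calling buildVoiceRules({ Man1: '#9f0', Woman: '#ece' }) yields two rules equivalent to the color declarations in the hand-written versions above, with the voice names quoted.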
Perhaps we want to visually convey the character's volume or tone. We might add a class using dot syntax as shown below.
0:07:11.500 --> 0:07:13.000
<v.softly Man1>The 14th century was recently</v>
<v Woman>I know but ...</v>
Then in our CSS, we would pass the class name as our argument to ::cue.
::cue(.softly) {
font-size: .9em;
}
We've just scratched the surface of what you can do with track and WebVTT here. HTML5Rocks has an excellent tutorial on the basics of both, as well as some neat tricks achievable with the TextTrack scripting API.
What We've Learned
In this chapter, we looked at how to increase the accessibility and findability of our media files. In our next chapter, we'll take a look at combining markup and JavaScript to create a media player.