Advanced TV Standards: Into the Future with Jaunty Air and an Anchor Around our Necks


Donald A. Norman

Abstract

We dream of the information future, one made possible by the convergence of the technologies of entertainment, communications, and computation. The world opens up before us, allowing exciting advances in education, business, entertainment and recreation. Or does it? Much of this new vision requires the coherent integration of television and data, yet the Advanced TV standards seem to have ignored the world of information. The pictures are difficult to process, they carry forward old technologies, they do not provide much room for data, they do not provide for the two-way transmission of data. The standards, rather than allowing the dramatic advance in technology that has been hoped for, seem more like an anchor, holding us back from what is possible, from what is being dreamed.




I am a newcomer to the area of television. At Apple, I head the research group -- The Advanced Technology Group -- and like my colleagues in other information technology companies, I am helping today's computer technology move into the 21st Century, where information and social interaction will be pervasive, simpler, and more responsive to social needs. We are in the midst of an interesting revolution, one that I am sure historians 200 years from now will call one of the more profound technological changes in written history. This revolution is really about social interaction, collaboration, and access to knowledge. It isn't about telephones or computers or television.

Scenarios

One of the standard design exercises in my field is that of the scenario: you imagine some people with as much precision as possible about their gender, age, job, family, everyday problems. Then you try to see how these people would behave with the new technologies under discussion. Scenarios are important because they let us examine how the technology will really work in context from the very first, formative stages of the idea. Let me give you two simplified examples in which I will not elaborate upon the people or their lifestyles, but simply upon the kinds of technology we are envisioning.

Jim, an avid sports fan, is watching a baseball game with Susan and Billy. He's worried about his team, so he decides to check on the batting average of the current batter. He uses his slate controller to call up a menu of options and he selects "Player statistics." A picture and short bio appear in the color window.

Suddenly Barry Bonds is up, and, between pitches, the broadcaster mentions the home run he hit the previous night. Jim quickly calls up a menu that gets him to a file of "Hot Stuff" that the network provides. For only a small charge (added to his monthly bill), Jim retrieves the video of the home run and saves it on his home server. Some of the fans collect and trade these clips much like trading baseball cards. The ESPN logo is a bit like the logos on baseball cards. Every so often, he splices some of these clips together into his own "highlight video".

Bonds strikes out. Great. It seems like he's been doing that a lot lately, especially against right-handed pitchers. Susan says she remembers a game earlier this year against the Dodgers when he struck out three times. No one believes her, so Jim checks the library. He brings up the SportsQuery service (a commercial service on the Internet) and looks for games in the last six weeks between the Giants and Dodgers in which Bonds struck out three times. A few seconds later, SportsQuery returns with several probable matches: newspaper articles from the SF Examiner and the LA Times, and a video clip of Bonds in the game from the archives of one of the San Francisco TV stations. The Examiner article is the cheapest to retrieve (15 cents, billed to the monthly account). Jim requests it and everyone reads it aloud from the screen with great joy: Susan was right.


This scenario was about baseball, but it could just as easily have been about gardening, or cooking, or news. It could even be about advertisements. Suppose that advertisers always made available more information about their products. Some of the interesting scenarios that we have developed have advertising playing the role of providing needed product information, so much so that people would request it. Now imagine that, businesses of the world: consumers who beg you to give more information about your products.

In fact, people generally do want more information about products when they are ready to buy. Advertising is annoying because, in the current mode of television, it interrupts the flow of events and, moreover, is often not relevant. Newspaper and magazine advertisements don't interfere nearly as much because the nature of printed matter is such that advertisements are easy to skip when they are not relevant, and equally easy to concentrate upon when they are. TV doesn't have this property because of the sequential nature of the information flow. As a result, ads are more useful and less annoying in magazines and newspapers than on TV. Some of our scenarios show how this pattern could change to the point that consumers might even pay to see television advertisements.

Now look at another scenario -- school homework:

Jane, a high-school Junior, is watching her favorite Rock Star in concert on the family television screen in the family room. Her parents, Juan and Maria, come into the room. Juan turns to Jane: "Have you finished your paper on the Mona Lisa?"

Jane makes a face. Juan picks up the control slate and switches the television set to display the World Wide Web. His cable company provides cable access, with a subscription service to a fairly powerful Web "browser." Juan enters "Mona Lisa" on the slate, and the browser returns with several possible places to look. Jane and Maria pick the Louvre Museum in Paris. Up comes the Mona Lisa.

Maria says: "Jane, let's see your paper." Sheepishly, Jane sweeps her finger across the track pad and the screen divides to show the start of the essay: Jane's name, a title and one line of text "The Mona Lisa was painted by Leonardo DaVinci." In addition, the high school class Home Page is also displayed, with a button that, if clicked, will reveal the assignment. Jane carefully ignores it and quickly deletes the High School display from the screen.

Jane looks over the Louvre's text on the Mona Lisa and uses the "copy and paste" command to put a few relevant passages into her paper, all automatically marked as to their source, simplifying Jane's task of documenting her sources (and helping teach the distinction between using referenced quotations and plagiarizing).

By this time Jane has picked up some interest in who Da Vinci might have been, so the family follows some of the other suggestions of the Web browser to view the "Gates DaVinci" collection and to look at a short film in the National Public Television archives. Then they switch to Florence for more videos, paintings and text. Two hours later Jane has the material she needs for her paper (she will finish it later on her computer in her bedroom). The entire family has traveled, learned and worked together.


The basic scenario here is simple, although a bit optimistic: my teen-age children would never work quite so cooperatively with their parents. I present it here to show the potential of the marriage between TV and the NII for education, in this case both to let the family search for relevant information, but also to let them view retrieved television clips.

Note how both scenarios capture the essence of our vision for future television and information services: effortless traveling back and forth. The basic scenarios, by the way, are possible today on any computer hooked to the World Wide Web. What isn't possible is the smooth integration with commercial television. These scenarios require that viewers be able to direct their searches across the Internet for relevant material, implying low bandwidth data transmission from the home and text on the screen that is easy to read. To make this possible, we need close to the same quality text display now found on GUI-based computers, which is a far cry from what today's TV set can display. And finally, there is the ability to transfer text and images from one document to another.

In both of the scenarios and in any situation where a viewer requests more information, the viewer is learning. This is the ideal situation for education: allowing people to learn because they want to learn, because the information being requested and read is interesting and relevant to their concerns. Imagine when students can immediately look up information on topics that interest them, and then display, read, scan, and print out relevant findings. No longer is access limited to textbooks or even the local library: the library has become international.

I could present another dozen scenarios. The range of possibilities is enormous, from education for elementary school, high school, and college, for self-learning and business education. For home improvement, for travel, hobbies, and entertainment. For work and for play.

Note that we still maintain the distinction between "television" and "computers." We do not believe that either will supplant the other. To us, "television" and "computers" are primarily distinguished by usage. When people watch television, they are in "receive" mode -- receiving more information than they are providing. TV sets will have large screens and will typically be viewed by a group of people from a distance with only a limited amount of interaction. Their main function will be entertainment and education. Computers, on the other hand, are much more interactive in function. When people use computers, they will be generating much of the material -- writing letters or essays, drawing, financial activities, or communicating with others. Computers will therefore be smaller than television sets, meant to be used by one or two people at a time rather than the groups that congregate around television sets, with considerable interaction with the material. Technically, however, the television set and the computer will be very similar. Both will have communication ports to the external world, both will have high-resolution display screens, both will have CPU chips and considerable local storage and working memory. But they will look different because they will be constructed for very different usage patterns.

The Advanced Television Standards Process

To those of us in the computer business, these scenarios make great sense. In fact, the only question is when, not if. Most of us have given the HDTV developments only passing interest. Most people don't know what ATV stands for, or what the Grand Alliance is. I did a good deal of learning in getting ready for this talk. Mind you, we know that HDTV/ATV will be important for us, but we assume, therefore, that the natural regulatory process will play itself out and then we will build upon the result. The National Information Infrastructure (NII) and advanced television (ATV) seem like a natural marriage.

Hah. Blind faith is not a good way to proceed.

I look at what is happening and I am amazed. Um, what ever happened to the information revolution? It never got to the Grand Alliance. I think about those scenarios, and I don't see them happening. The standards process, as written, makes it very very difficult. Let me share with you my analysis of why: Basically, it is because of neglect of the requirements of the information world. We assumed that information from the NII -- the world of computers -- could share the screen with information from ATV -- the world of television. Nope.

The great promise of Advanced TV for the NII comes from some of its technological components:

1. Digital signals, that convey picture, sound, text, and data.
2. High resolution screens capable of displaying large amounts of text.
3. Two way interaction between source and viewer.

To my great surprise, none of these seem to be issues in ATV. Instead, the emphasis has been on entertainment and on the commercial model of current broadcast TV, one in which entertainment titles contain little text, and the sending of data, if it is to be done at all, is one-way, separate from the TV service. Two way interaction, if done at all, is very asymmetric, with the viewer perhaps selecting catalog items or shows, perhaps sending back a credit card number and purchase choice. The notion of symmetric interaction, where some viewers (such as a real estate firm, a High School, or even a private individual) might publish their own information for others, seems neglected.

I am very worried by what I see. The problem is two-fold. First of all, the members of the Grand Alliance, on the whole, are dominated by the television industry, yet we are in the midst of a technological revolution that is merging television with communication with computers. What happened to representatives from those other industries?

Second, the assumptions are all based on the cost model of current broadcasting and production, transmission, and reception facilities. Great care is taken to minimize cost. But cost minimization also means a slackening of quality.

You know, standards such as Advanced Television are going to be with us a very long time. We will never be able to change some of its conventions. Yet what is costly today will be cheap tomorrow. The standards will still be here -- the cost will be gone -- and the American public will be the loser.

Let me discuss my issues separately.

Digital signals, that convey picture, sound, text, and data.
There are a number of issues that are important to the computer industry if advanced television is to play an important role in the emerging National Information Infrastructure. Technically, they deal with:

In this talk I concentrate only upon the first four items because from the computer graphics point of view, one thing is clear: you have to have a clean, coherent image. This allows wonderful processing effects to take place, enhancing the experience and the understanding by the viewers.

But now, in the interest of economy, the picture is taken apart and shown in segments (half now, the other half later -- interlacing), and each element of the picture is distorted out of shape, wider horizontally than it is deep, vertically -- non-square pixels. Think about that for a moment. Interlacing is a technology whose time has long since passed. Interlace is a technology of the 1930s. It was essential to the success of NTSC, for it allowed a possible picture in the days when it was simply not possible to send a cohesive (progressively scanned) picture. But times have changed. We don't need interlacing anymore. Worse, it gets in the way. It degrades the image quality considerably. It makes computer processing and augmentation of the image extremely difficult, and, in this age of digital compression and MPEG-2, it doesn't even save much bandwidth.

Now, I realize that the proposals are for a family of standards, some of which meet our purposes quite well, others of which do not. Thus, interlacing is only present in a few of the formats, the rest are progressive scan. The problem is that the lowest quality images are apt to dominate for reasons of (false) economy. Let interlacing in and it will be difficult or impossible to eradicate. If the baseball game is transmitted with interlaced scan, our scenario fails. It won't help that there is an existing standard that calls for progressive scan if the interlaced scan is allowed and used.

Non-square pixels are an accident. They don't do anyone any good, but they certainly make life difficult for the computer graphics processing. To say nothing of diminishing the quality of text that is displayed on the screen.

Why are these such problems? Just imagine the difficulties. Suppose the TV is displaying a person running from right to left. The odd rows of the interlaced image show the person at a different location than the even rows. If you wanted your TV set to capture the image so you could print it out, you would have to settle for half an image (only the even or the odd lines). If you tried to take advantage of the full resolution of the picture and capture all the lines of the image, it would be a mishmash of images because of the interlacing. Note that if the image had been transmitted by progressive scan, there would be no problem: the entire image would be consistent with itself, and you would see a better quality picture and it would be easier to capture a high quality image, using all of the lines.
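The capture problem can be made concrete with a toy model. The sketch below is only an illustration under simple assumptions (a tiny text "image", a bar that moves four pixels between the two field captures; all names are hypothetical). It weaves even rows from one instant with odd rows from the next and compares the result with a progressive capture.

```python
# Toy model of capturing a moving object from an interlaced signal.
# Assumption: the object moves between the two field captures.
WIDTH, HEIGHT = 16, 6

def render_frame(object_x, width=WIDTH, height=HEIGHT, object_width=3):
    """Return a full frame (list of strings) with a vertical bar at object_x."""
    row = "".join(
        "#" if object_x <= x < object_x + object_width else "."
        for x in range(width)
    )
    return [row] * height

def weave(first_field_frame, second_field_frame):
    """Even rows come from the first instant, odd rows from the second."""
    return [
        first_field_frame[y] if y % 2 == 0 else second_field_frame[y]
        for y in range(len(first_field_frame))
    ]

# The bar moves 4 pixels to the left between the two field captures.
first_instant = render_frame(object_x=10)
second_instant = render_frame(object_x=6)

interlaced_capture = weave(first_instant, second_instant)
progressive_capture = first_instant  # every row comes from the same instant

for row in interlaced_capture:
    print(row)
```

In the woven capture, alternate rows disagree about where the bar is, which is the "mishmash" described above. In the progressive capture every row agrees.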

Sound complex? It is. It's a mess. For everyday cinema or video, none of this matters. For scientific or business visualization, it makes a big difference.

Interlacing saves money at the camera end, and then a little bit in transmission bandwidth. But it creates problems in production and in the display if there is to be computer enhancement or computer generated imagery added to the picture. With MPEG-2 compression, there is little saving with interlacing because any two adjacent lines of a progressively scanned image are apt to be very similar, so it gets maximum advantage of the compression.

When the computer industry first brought out Graphical User Interfaces about 15 years ago, the clear monitor of choice was the television set, for it was mass produced, inexpensive, and readily available. But it simply was not possible to use commercially available sets: high-quality text cannot be presented on NTSC (or PAL or SECAM, for that matter).

If you take a look at what has transpired, the computer industry has had to go to progressive scan, ever smaller pixel sizes, larger screens, and faster refresh rates.

The human eye is remarkably sensitive to flicker, especially at the periphery. The human eye is designed to be motion sensitive -- slight motion in the periphery gets your attention. Flicker seems like motion to the neural circuits. This is a special problem with big screens (or to be more precise, with wide viewing angles). Apple soon discovered that people want big screens. As a result, we have had to go to refresh rates greater than 60 and often 70 frames/second.

The film industry recognizes this too. By historical accident, film is taken at the rate of 24 frames/second, but if it is projected that way, especially with the wide screen that is so popular today, it flickers badly -- enough to create headaches and nausea. So the best quality movie projectors break the beam of light for each picture frame twice, meaning that each picture frame is projected three times so that the eye sees 72 flashes per second, thus minimizing brightness flicker.

Television has the added problem that it should really be using high frame rates to avoid brightness flicker, but because it is interlaced, even at these rates, there is interlace flicker when the object being projected moves between the transmission of the two parts of the interlace.

Interlacing is bad. High frame rates are good. Alas, we are stuck with our history, in which there are three major existing TV standards (PAL, SECAM, and NTSC) and one primary commercial motion picture standard: PAL and SECAM use 25 frames/second (interlaced), NTSC uses approximately 30 frames/second (interlaced), and movies use 24 frames/second.

Faster scan rates are needed to capture movement. Rapid display rates are needed to eliminate flicker. The existing rates of 24 (film), 25 (European TV) and 30 (NTSC) frames/second are too slow for both motion capture and flicker prevention.

Our studies show that it is possible to produce a reasonable compromise with 24 frames/second of progressively scanned images, because that allows easy display of 48 or 72 images/second and relatively simple conversion from 25 and 30 frames/second by schemes already in wide use. When large screen displays are used, it is easy to show each frame three times, yielding 72 images/second.
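The arithmetic behind the 48 and 72 images/second figures is simple frame repetition. A minimal sketch, with hypothetical function names, assuming each transmitted frame is simply displayed a whole number of times:

```python
FILM_FPS = 24  # transmitted progressive frames per second

def flashes_per_second(frames_per_second, repeats_per_frame):
    """Display rate when every frame is shown repeats_per_frame times."""
    return frames_per_second * repeats_per_frame

def repeat_frames(frames, repeats_per_frame):
    """Expand a frame sequence into the sequence actually shown on screen."""
    return [frame for frame in frames for _ in range(repeats_per_frame)]

double_rate = flashes_per_second(FILM_FPS, 2)  # each frame shown twice
triple_rate = flashes_per_second(FILM_FPS, 3)  # each frame shown three times
shown = repeat_frames(["A", "B"], 3)

print(double_rate, triple_rate, shown)
```

Because the display rate is an exact multiple of the frame rate, no frame has to be held longer than its neighbors, so motion stays even while brightness flicker is reduced.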

High resolution screens
It must be possible to put text on the screen that can actually be read. On my small computer screen, I can read a page of printed text that looks almost as clear and precise as the printed page itself -- not quite as good, but it is getting there rapidly.

On my big, expensive home TV, I'm lucky if I can read 12 lines of text. Movie titles scroll illegibly across the screen. Here is where the number of lines in the picture makes a big difference. For data, we need more, not less. But we can deal with numbers such as 640 wide by 480 deep. This is a small screen, but workable if the image that is presented has square pixels, progressive scan, and is refreshed frequently enough.

Conclusion: we need at least 640 by 480 pixels, displayed at least 24 frames/second with progressive scan, and refreshed at a 48 or 72 images/second rate. Note that a 640 by 480 image can easily be stretched by a factor of 4/3 to yield a screen aspect ratio of 16:9 or stretched by 3/2 to yield an aspect ratio of 2:1, the movie industry's choice.
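The stretch factors can be checked with exact fractions. This is a small worked calculation, not part of any proposal. A 640 by 480 image has a 4:3 aspect ratio, and widening it by 4/3 or 3/2 gives 16:9 and 2:1 respectively.

```python
from fractions import Fraction

base_ratio = Fraction(640, 480)                 # reduces to 4/3
wide_tv_ratio = base_ratio * Fraction(4, 3)     # horizontal stretch by 4/3
cinema_ratio = base_ratio * Fraction(3, 2)      # horizontal stretch by 3/2

print(base_ratio, wide_tv_ratio, cinema_ratio)
```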

Two way interaction between source and viewer
Current NTSC is barely capable of transmitting data. Basically, there are those 24 left-over lines in the vertical blanking interval. Not much room.

What we need is a standard that allows the transmission of data, if necessary by freezing the image on the screen and then using the normal image transmission period for data. In one frame, we could transmit a lot of data, and the viewer might never note that a single frame had been frozen one extra cycle.
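To get a rough feel for how much "a lot of data" might be, the sketch below divides a channel rate by the frame rate. The 19 Mbit/s channel rate is an assumption brought in for illustration and does not come from this talk. The calculation also ignores all framing and error-protection overhead, so treat the result as an order-of-magnitude estimate only.

```python
def bits_per_frame_period(channel_bits_per_second, frames_per_second):
    """Bits the channel carries during the time one frame would occupy."""
    return channel_bits_per_second / frames_per_second

# Assumption (not from the talk): a digital channel of roughly 19 Mbit/s.
ASSUMED_CHANNEL_BPS = 19_000_000
FRAMES_PER_SECOND = 24

bits = bits_per_frame_period(ASSUMED_CHANNEL_BPS, FRAMES_PER_SECOND)
kilobytes = bits / 8 / 1000

print(f"about {kilobytes:.0f} kilobytes in one frozen frame period")
```

Under these assumptions, freezing a single frame frees on the order of a hundred kilobytes, which in 1995 terms is many pages of text.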

What forms of data? Who knows? That's the whole point about technological revolutions: you don't know what is going to happen.

Therefore the data transmission standard must have a flexible, self defining structure that allows data formats that have not yet been invented to be sent in some future year. The current proposal is not sufficiently robust in its structure to deliver computer code or data with sufficient accuracy to be useful. It needs a layer of error correction.
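One common way to make a data stream self-defining is to prefix each record with a type tag and a length, so that a receiver can skip record types it does not understand. The sketch below is a hypothetical layout, not the format in any ATV proposal. Note also that the CRC-32 trailer used here only detects corruption. The error correction called for above would need a forward error-correcting code on top of this.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte type tag, 4-byte big-endian payload
# length, the payload itself, then a 4-byte CRC-32 over header + payload.
HEADER = struct.Struct(">4sI")
TRAILER = struct.Struct(">I")

def pack_record(type_tag: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a self-describing, checksummed record."""
    if len(type_tag) != 4:
        raise ValueError("type tag must be exactly 4 bytes")
    header = HEADER.pack(type_tag, len(payload))
    checksum = zlib.crc32(header + payload)
    return header + payload + TRAILER.pack(checksum)

def unpack_record(record: bytes):
    """Return (type_tag, payload); raise ValueError if the record is damaged."""
    if len(record) < HEADER.size + TRAILER.size:
        raise ValueError("record too short")
    type_tag, length = HEADER.unpack_from(record, 0)
    end = HEADER.size + length
    if end + TRAILER.size != len(record):
        raise ValueError("length field does not match record size")
    (checksum,) = TRAILER.unpack_from(record, end)
    if zlib.crc32(record[:end]) != checksum:
        raise ValueError("checksum mismatch")
    return type_tag, record[HEADER.size:end]

record = pack_record(b"TEXT", b"Bonds struck out three times")
print(unpack_record(record))
```

A receiver built years earlier can still read the tag and length of a record type invented later, and step over it safely. That is the kind of flexibility the paragraph above asks for.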

Once again: the ATV standards are apt to be with us for 50 years, and by then, technology will be very different. NTSC was invented before the notion of digital data, before the computer, before the transistor. The world has changed a lot since then.

The transmission has to allow for two-way interaction because small and large businesses, high schools, elementary schools, colleges, and universities are all going to want to generate and transmit their own information.

We want a world in which one television set provides the image for everyone on the NII.

Standards Are Forever

I haven't been able to discover why we have 50 Hz power in some parts of the world and 60 Hz power here, but that accidental decision dominated the technical choices in our original television standards of NTSC, SECAM, and PAL and seems impossible to remove today. Similarly, the use of 24 frames/second in movie theaters is equally arbitrary, and the fact that it matches neither the 25 nor the 30 frames/second of TV is equally bizarre. Interlacing was a technological hack, now elevated into a world-wide standard. Can we get rid of it?

Standards are forever, because once established, they simplify and dominate the lives of millions, even billions. The "qwerty" keyboard seems forever with us, as does the English system of measurement, at least in the United States. Which is the correct side of the road on which to drive? Right or left? Obviously, it doesn't matter, as long as everyone does the same, but wouldn't the world be better for automobile manufacturers and drivers if everyone had agreed upon the same standard? Think it would be possible to agree upon a single, world-wide standard now? No way.

I am sure each of you has your own experience with old technologies and methods that your industry is forced to maintain because, once upon a time, long ago in the past, it was the standard and today, there are far too many people who depend upon that ancient, antiquated method.

These are examples of standards that stay with us for a long time, causing inconvenience and expense. I worry about the impact of the ATV proposals in this way -- how many old standards are we perpetuating for yet another century?

But what of other standards that lock us in to inappropriate methods, that prevent advances? This is where I really worry. This is where I fear that insufficient thought has been given in the ATV process.

Standards based on costs are dangerous. The costs drop, while the standards stay.

Summing Up

We are in the midst of an information revolution in which the fields of entertainment, communications, and computation converge. If things go well, it will be possible to merge the strengths of each field, to create services and experiences not possible today. Many activities stand to benefit -- education, business, personal interaction, fun, entertainment, and recreation. On the other hand, if we do any one of the areas wrong, then the expected convergence will not occur: television, communication, and computation will go their separate ways, co-existing but not co-supporting.

The chance to establish critically important technological standards does not often occur. Today we are at that point. Are we to take the easy way, emphasizing the least common denominator, doing what we can get away with rather than what is best? Will we let the temporary expediency of costs lock deficient standards into place for a large fraction of the next century?

The proposed "Grand Alliance" standards for Advanced TV are very close to being satisfactory for the NII as well as for TV. In fact, to make them work only requires the deletion of some of the alternative formats. We are not asking for radical change. Indeed, we ask for simplification of the family of standards to one that will guarantee compatibility between the world of television and the world of information. Today's standards do not do that because they allow for inappropriate transmission standards in the guise of a temporary efficiency in costs. It is these inappropriate standards that we must eliminate.

I am here to urge you to do the right thing, not the cheap thing. Recognize that the Advanced Television Standards are really the Advanced Information Services Standards that will allow Advanced Television to become a major, central part of the National Information Infrastructure. Intelligent choices can lead us to great societal advances. Standards are forever. Costs are temporary. What is expedient and expensive today will be foolish and inexpensive later. Let us do the right thing.



Donald Norman is Vice President of Advanced Technology at Apple Computer, MS 301-4D, 1 Infinite Loop, Cupertino, CA 95014 USA (dnorman@apple.com). This paper was presented at the IEEE-SMPTE Digital System Information Exchange, Washington DC, August 15, 1995. Copyright © 1995 Apple Computer, Inc. All rights reserved. Gary Demos, Jim Burger, and Jim Miller assisted with the preparation of this paper.

When I presented this paper in Washington, someone in the audience objected that I was ignoring the bandwidth requirements, and that with moving images, the compression technologies would not save enough bandwidth unless some sort of "subsampling" of the image was used. The best "subsampling mechanism to save bandwidth," the objection went, is interlacing. This is false. If bandwidth needs to be preserved, there are better ways than the brute force method of chopping up the image into odd and even lines. Because human sensitivity to detail deteriorates with motion, the best way to preserve bandwidth would be to reduce the transmission of high spatial frequencies for moving images.
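The alternative can be sketched in a few lines. The code below is a deliberately crude illustration with hypothetical names, not a description of any real encoder: a scanline that changed since the previous frame is treated as "moving" and smoothed with a 3-tap average (removing fine detail), while an unchanged scanline keeps its full detail.

```python
def low_pass(row):
    """3-tap moving average; edge samples reuse their nearest neighbor."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3
        for i in range(len(row))
    ]

def motion_adaptive_filter(previous_frame, current_frame, threshold=0):
    """Smooth only the rows that changed since the previous frame."""
    filtered = []
    for previous_row, current_row in zip(previous_frame, current_frame):
        moved = any(
            abs(a - b) > threshold for a, b in zip(previous_row, current_row)
        )
        filtered.append(low_pass(current_row) if moved else list(current_row))
    return filtered

previous_frame = [[0, 9, 0, 9], [0, 9, 0, 9]]
current_frame = [[0, 9, 0, 9], [9, 0, 9, 0]]  # only the second row moved

result = motion_adaptive_filter(previous_frame, current_frame)
print(result)
```

The still row keeps its sharp alternation, while the moving row has its contrast reduced and so carries less high-frequency detail to encode. Unlike interlacing, every row of every frame still comes from the same instant.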