Windows Azure
Customer Solution Case Study
State of Georgia Makes Audio and Video Recordings Accessible and Searchable Online

“The Georgia Archives exists to help serve the state’s residents, legislators, and government officials, and we now have a new tool that enables anyone to watch government at work and explore areas of interest.”

David Carmicheal, Director, The Georgia Archives

The Georgia Archives sought a way to make audio and video recordings easily accessible and searchable on the web. The Archives chose a solution based on the Microsoft Research Audio Video Indexing System (MAVIS) and Windows Azure. The result is improved productivity and faster access to audio and video content for citizens, legislators, and other interested parties, and less work for Archives staff, all with minimal costs and no IT issues.

This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
Document published March 2012


Business Needs

The Georgia Archives collects the state’s permanent records and makes them accessible to all. It has well-established processes for providing access to paper records, which are relatively easy to digitize and search electronically. But audio and video recordings posed a greater challenge: the Archives lacked the means to make them broadly and efficiently accessible.

David Carmicheal, Director of the Georgia Archives, cites recordings of legislative sessions as a good example. “These recordings are typically hours in length, and the only information available about them is the recording date, which means that people can’t readily find content within a recording unless they listen to it, that is, after requesting a copy and waiting about a week for a copy to be made,” says Carmicheal. “Requests are also labor-intensive for Archives staff, requiring anywhere from two to eight hours to make a physical copy of an older, analog recording.”

The Archives briefly tried using speech recognition software to convert recorded audio to searchable text. However, the results were deemed too inaccurate, especially when working with strong accents and poor-quality recordings.

Solution

A conversation Carmicheal had with Microsoft led the Georgia Archives to the Microsoft Research Audio Video Indexing System (MAVIS). Here’s how MAVIS works and why it can be more accurate (and ultimately more useful) than other solutions that also use large-vocabulary continuous speech recognition (LVCSR) to convert audio to text, so that it can be searched:

Typical LVCSR systems have a preconfigured vocabulary, which makes them susceptible to inaccuracies due to factors such as accents and out-of-vocabulary terms, as may be the case with proper names. MAVIS helps overcome these challenges by using the Bing search engine to get more information about the content, which it then uses to expand its base vocabulary. MAVIS also preserves the confidence with which a word is recognized and which other potential matches were considered—a technique pioneered by Microsoft Research called Probabilistic Word-Lattice Indexing—and preserves time stamps to support direct navigation to keyword matches.
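
To make the indexing idea above concrete, here is a minimal sketch in Python. It is not the MAVIS implementation, which this case study does not describe in detail. It only illustrates the general principle of indexing every candidate word together with its confidence and time stamp, rather than keeping a single best transcript. The class names, the recording identifiers, the confidence threshold, and all of the data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    recording: str     # hypothetical recording identifier
    start_sec: float   # time stamp where the word was heard
    confidence: float  # recognizer's confidence for this candidate word


class LatticeIndex:
    """Toy inverted index that keeps every candidate word, not just the best one."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_slot(self, recording, start_sec, candidates):
        """Index one time slot; `candidates` maps word -> confidence."""
        for word, confidence in candidates.items():
            self._postings[word.lower()].append(
                Hit(recording, start_sec, confidence)
            )

    def search(self, word, min_confidence=0.1):
        """Return hits for `word`, most confident first."""
        hits = [
            h for h in self._postings.get(word.lower(), [])
            if h.confidence >= min_confidence
        ]
        return sorted(hits, key=lambda h: h.confidence, reverse=True)


index = LatticeIndex()
# Made-up data: the recognizer was unsure between two similar-sounding words.
index.add_slot("session-a", 12.5, {"peach": 0.6, "beach": 0.4})
index.add_slot("session-b", 301.0, {"beach": 0.9})
index.add_slot("session-a", 48.0, {"budget": 0.95})

for hit in index.search("beach"):
    print(hit.recording, hit.start_sec, hit.confidence)
```

In this toy example a search for "beach" still finds the slot in "session-a" where "peach" was the recognizer's top guess, and the stored time stamp is what would let a player jump straight to that point in the recording.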

The Georgia Archives initially tested MAVIS with 100 hours of recordings. Because of the compute-intensive nature of MAVIS processing, these initial tests were performed on servers managed by Microsoft Research. By the time Carmicheal was ready to test another 500 hours of content, Microsoft Research had MAVIS running on Windows Azure, as a way to make it easy to adopt without having to invest in server infrastructure and easier to scale based on workload. “I was impressed by the accuracy of MAVIS, and equally impressed by how quickly and inexpensively we could put it to work for us on Windows Azure,” says Carmicheal.

In May 2011, the Georgia Archives launched a site that enables users to search four years of recordings from the Georgia General Assembly. Legislators can use it to research why a bill did or did not pass, and citizens can use it to gain insight into the arguments for or against a bill—including the ability to hear the emotional charge of discussions on a topic.

Microsoft has since enlisted solution provider GreenButton, named Windows Azure Partner of the Year in 2011, to help early adopters such as the Georgia Archives continue to use MAVIS and to make it commercially available to other organizations. Carmicheal is evaluating a proposal from GreenButton for a turn-key approach, in which recordings of all legislative sessions will be uploaded to a website hosted on Windows Azure. Indexed recordings will be live and searchable within 24 hours, so that anyone can hear for themselves exactly what Georgia legislators are saying. “MAVIS works great and the price is very reasonable,” says Carmicheal. “Were we to have audio and video recordings transcribed, it would cost at least ten times as much.”

Benefits

By using Microsoft technology, the Georgia Archives has made its wealth of audio and video recordings easily accessible to all. Specific benefits include:

  • Improved productivity. People no longer need to wait up to a week for the Georgia Archives to duplicate a recording, nor do they need to listen to the entire recording to determine if it contains what they need. Instead, a single search shows hits across all recordings, including text snippets that show the search terms in context. Clicking on a snippet immediately takes the user directly to that portion of the recording.
  • Faster access to content. Under the proposal from GreenButton, uploaded content will be live and searchable within 24 hours. GreenButton can make this commitment because Windows Azure scales on demand, which makes it possible to devote as many servers as needed to the compute-intensive MAVIS processing algorithms right away.
  • Less work. Making audio and video recordings accessible and searchable online reduces the workload for the Georgia Archives, which will no longer need to spend as many as eight hours servicing a request for a copy of a recording.
  • Minimal costs and no IT issues. The Georgia Archives did not need to acquire servers, nor does it have to worry about system administration or backups. Similarly, as more content is added, the Archives will not need to worry about scalability or additional disk space.

“We have some really good information in our audio and video archives, but until now, it was too difficult for people to find it,” concludes Carmicheal. “The Georgia Archives exists to help serve the state’s residents, legislators, and government officials, and we now have a new tool that enables anyone to watch government at work and explore areas of interest.”
