5 THINGS: on Nexidia Dialogue Search
- What is Dialogue Search?
- Where is Dialogue Search used?
- How does Dialogue Search work?
- How accurate is it?
- How much does it cost?
Today we’re looking at Dialogue Search from Nexidia, a tool for indexing and searching content based on its audio. Let’s get geek started.
Question 1: What is Dialogue Search?
When not being used by the Government to monitor your phone calls, office conversations, or noises in the bathroom, the underlying technology in Nexidia’s Dialogue Search can be used to index the audio within your sound and video files and create a searchable database of those media files. This gives you, dear viewer, the ability to search on the details of your content: not just subject keywords, but the actual content, verbatim, within the media. Ooey gooey, chocolatey-good metadata.
This technology can be used either as a standalone searchable database or integrated into your NLE du jour or asset management system.
Many of you may be familiar with Phrase Find, which was a scaled-down version of Dialogue Search, available as a plug-in for some versions of Avid Media Composer, or within MediaSite by Sonic Foundry. But we’re getting a little ahead of ourselves. Let’s see where Dialogue Search is most beneficial.
Question 2: Where is Dialogue Search used?
Nexidia’s technology has many use cases, but as it pertains to post-production, it’s fantastic for logging. Consider reality television: many cameras, sometimes recording for most of the day. That’s a massive amount of footage to comb through for that one scathing soundbite. Imagine being able to search across all of it and instantly find the soundbite you need, or the pieces you need to build that epic Frankenbite. It’s a tremendous time saver.
In terms of metadata, Dialogue Search provides a way to find media in your archives without relying on manually entered search terms. Most manually entered terms in asset management cover broad topics or people, not phrases or quotes. Instead of searching by these broad concepts, as most of us do, you now have a much more granular way of searching content: in only a few seconds, and with little or no manual metadata entry.
The next place it shows great promise is closed captioning. Captioning is an expensive process and, due to new FCC regulations, mandatory for many content providers. The technology behind Dialogue Search looks promising for auto-generating captions, but at this point it is not accurate enough to meet the strict captioning guidelines. It is, however, accurate enough to be used in Nexidia’s QC product, which can analyze an existing caption file, compare it to the media file, and retime the captions to match the media. This drastically reduces the number of deliverables that get rejected and helps keep you compliant with FCC regulations.
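Nexidia’s QC product isn’t documented in detail here, so what follows is only a hypothetical Python sketch of the retiming idea. It assumes some aligner has already reported when each caption line is actually spoken (the `spoken_at` values are invented), estimates the typical drift as the median offset, and shifts every cue by that amount. A real tool would more likely retime cue by cue than apply one global offset.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cue:
    """One caption cue; times are in seconds (a hypothetical structure)."""
    start: float
    end: float
    text: str


def retime(cues, spoken_at):
    """Shift every cue by the median drift between each caption's start time
    and the time an aligner (assumed, not shown) detected that line being spoken."""
    if len(cues) != len(spoken_at):
        raise ValueError("need exactly one detected time per cue")
    # The median shrugs off a few badly aligned lines better than a mean would.
    offset = median(s - c.start for c, s in zip(cues, spoken_at))
    return [Cue(round(c.start + offset, 3), round(c.end + offset, 3), c.text) for c in cues]


if __name__ == "__main__":
    cues = [Cue(1.0, 3.0, "Hello there."), Cue(4.0, 6.0, "Welcome back."), Cue(7.0, 9.0, "Let's begin.")]
    detected = [1.5, 4.5, 7.6]  # made-up aligner output, in seconds
    for cue in retime(cues, detected):
        print(cue)
```

Here the captions run about half a second early, so every cue is pushed 0.5 s later.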
I know many of you are wondering about Phrase Find: will it come back as an option for Media Composer? As of Media Composer v8, Phrase Find is not available; there are licensing issues that Nexidia and Avid are still sorting out. No timetable has been set for a definitive yes or no. Depending on whom you talk to in either camp, discussions are going either well or horribly. Unfortunately, we’re just going to have to wait and see.
Question 3: How does Dialogue Search work?
The current Oxford dictionary lists over 170,000 words in the English language, not including obsolete and derivative words, regional terms, and slang. Maintaining an accurate database of whole words would be wholly inefficient, if not impossible. So the fine PhD-level folks at Nexidia broke down the individual sounds we make as we form words and used those discrete sounds as a blueprint for all speech. That makes it possible to search for any word or sound, even words not found in the dictionary.
It turns out that all American English words are made up of one or more of 40 individual sounds, called phonemes. Combinations of these 40 sounds form every word we use. (All of you Brits across the pond will be happy to know you’ve got one more than us Yanks, with 41 distinct phonemes… so you’ve got that AND better tea.)
Each language has its own set of phonemes. Nexidia has created indexes for several languages for Dialogue Search, with more in development, including:
- Latin American Spanish
- Modern Standard Arabic
Nexidia Dialogue Search, whether direct from Nexidia or as part of the Phrase Find add-on inside some versions of Media Composer, indexes media files and creates a database of all the individual sounds. This lets you search for any word you can think of: Nexidia matches the word you’re searching for with the phonemes that make it up. Because media files are indexed as soon as they are added, a search takes only seconds.
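Here’s a toy Python sketch of the index-then-search idea; it is not Nexidia’s actual method. A tiny hand-written pronunciation table (hypothetical, using ARPAbet-style symbols) converts words to phonemes, the “media” is pre-indexed as a timed phoneme stream, and a search looks up where the query’s first phoneme occurs and checks the rest. Real phonetic search scores fuzzy acoustic matches rather than requiring exact ones; this version only shows why indexing up front keeps the search itself quick, and why a sound that isn’t a dictionary word can still be found.

```python
from collections import defaultdict

# Hypothetical mini pronunciation lexicon using ARPAbet-style symbols (stress omitted).
LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
}


def to_phonemes(phrase):
    """Spell a phrase as one flat phoneme sequence using the toy lexicon."""
    return [p for word in phrase.lower().split() for p in LEXICON[word]]


def build_index(stream):
    """Build an inverted index: phoneme -> positions where it occurs.

    `stream` is a list of (phoneme, start_seconds) pairs, standing in for
    what an acoustic front end might emit when media is ingested."""
    index = defaultdict(list)
    for pos, (phoneme, _) in enumerate(stream):
        index[phoneme].append(pos)
    return index


def search(target, stream, index):
    """Return start times where the phoneme sequence `target` occurs contiguously."""
    if not target:
        return []
    hits = []
    # Only positions where the first phoneme occurs need checking.
    for pos in index.get(target[0], []):
        window = [p for p, _ in stream[pos:pos + len(target)]]
        if window == target:
            hits.append(stream[pos][1])
    return hits


if __name__ == "__main__":
    # Fake "indexed media": one phoneme every 0.1 s.
    spoken = to_phonemes("the cat sat on the mat")
    stream = [(p, round(i * 0.1, 2)) for i, p in enumerate(spoken)]
    index = build_index(stream)
    print(search(to_phonemes("the"), stream, index))  # → [0.0, 1.0]
    print(search(["AE", "T"], stream, index))         # → [0.3, 0.6, 1.3]
```

The second query is just the “-at” sound, which appears inside “cat”, “sat”, and “mat”.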
Dialogue Search not only plays the clips at the point where your word shows up, but can also export results as XML or AAF so you can bring them into your NLE. Nexidia also offers advanced interoperability with CatDV, plus a panel for Adobe Premiere.
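To show what exporting results for an NLE might involve, here is a Python sketch that turns hits into timecoded XML with the standard library’s `xml.etree.ElementTree`. The element and attribute names (`SearchResults`, `Hit`, `clip`, `timecode`, `score`) are invented for illustration and are not Nexidia’s export schema. The timecode helper assumes an integer, non-drop-frame rate, and AAF output isn’t attempted.

```python
import xml.etree.ElementTree as ET


def to_timecode(seconds, fps=24):
    """Convert seconds to non-drop-frame HH:MM:SS:FF (integer frame rates only)."""
    frames = round(seconds * fps)
    ff = frames % fps
    total_seconds = frames // fps
    hh, rem = divmod(total_seconds, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def results_to_xml(query, hits, fps=24):
    """Serialize (clip_name, seconds, score) tuples into a made-up XML layout."""
    root = ET.Element("SearchResults", query=query)
    for clip, seconds, score in hits:
        ET.SubElement(root, "Hit", clip=clip, timecode=to_timecode(seconds, fps), score=str(score))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(results_to_xml("frankenbite", [("A001_C003", 83.5, 87), ("B002_C011", 3725.0, 64)]))
```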
Question 4: How accurate is it?
Searching for a word isn’t a yes-or-no proposition. A poor recording, background noise, or slurred speech can certainly skew results. That’s why Nexidia’s technology scores the probability that the word you’re looking for is actually being spoken. Nexidia also lets the user define which scores count as accurate and which ones are bunk.
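A minimal Python sketch of that user-defined cutoff, assuming each hit carries a confidence score on a 0 to 100 scale (the actual scale Nexidia uses isn’t stated here). Hits below the threshold are discarded and the survivors are sorted best-first; raising the threshold trades missed matches for fewer false ones.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    clip: str
    seconds: float
    score: int  # hypothetical confidence on a 0-100 scale


def filter_hits(hits, threshold):
    """Drop hits scoring below the user's threshold and sort the rest best-first."""
    kept = [h for h in hits if h.score >= threshold]
    return sorted(kept, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    hits = [Hit("interview_01", 12.4, 92), Hit("b_roll_07", 3.1, 41), Hit("confessional_03", 88.0, 75)]
    for threshold in (50, 80):
        # A higher threshold keeps fewer, more trustworthy hits.
        print(threshold, [h.clip for h in filter_hits(hits, threshold)])
```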
Let’s see an example: [[LIVE DEMO]]
As you can see, in the right environment, it’s fantastic. As the recording fidelity decreases, however, so does the accuracy. It’s also heavily dependent on the speaker.
Those who tend to talk fast (which I am very guilty of) or who aren’t speaking their native tongue can also skew results. However, the way I look at it, ANY harvested metadata you can search by is a bonus.
Question 5: How much does it cost?
Like many software technologies nowadays, Nexidia offers both annual and perpetual licensing, and both use tiered pricing based on how many “active hours” of media can be searched and how many users, plus annual support. To reduce cost, many facilities limit the number of searchable active hours; for example, you may only want to index the current season of your show, not every season.
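To make the “index only the current season” tactic concrete, here is a hypothetical Python helper that decides which seasons stay active under an active-hours cap, newest first. The season numbers and hour counts are invented inputs, since the per-tier hour limits aren’t listed here.

```python
def choose_active(season_hours, budget_hours):
    """Pick which seasons stay searchable under an active-hours cap.

    `season_hours` maps season number -> hours of media. Seasons are taken
    newest-first, stopping at the first one that doesn't fit, so the active
    set is always the most recent run of seasons."""
    active, used = [], 0
    for season in sorted(season_hours, reverse=True):
        hours = season_hours[season]
        if used + hours > budget_hours:
            break
        active.append(season)
        used += hours
    return active, used


if __name__ == "__main__":
    # Made-up library: three seasons of a reality show.
    print(choose_active({1: 300, 2: 350, 3: 400}, budget_hours=500))
```

Stopping at the first season that doesn’t fit, rather than skipping ahead to squeeze in an older, smaller one, matches the goal of keeping the current material searchable.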
First, let’s look at Perpetual Licensing:
| Tier | Media Hours | Seats | Base Price | Annual Support |
All seats are floating as well, so pricing is based on concurrent users, not the total number of people who ever use the system.
There are also add-ons for your unique needs:
| Additional 500 hours | $7,495 |
| Additional 10 seats | $4,995 |
| Additional user seat | $600 |
| Unlimited user seats | $49,995 |
| Tier upgrade | Price of the new tier minus the price of your current tier |
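The add-on math can be sketched in Python using the perpetual prices listed above. The purchasing rules are assumptions, not Nexidia’s published terms: hours are bought in whole 500-hour blocks, seats can mix 10-packs with single seats, and the unlimited-seat option caps seat spending. Because the tier base prices aren’t listed here, the upgrade helper just takes both tier prices as inputs.

```python
# Perpetual add-on prices in USD, from the table above.
HOURS_BLOCK = 500
HOURS_BLOCK_PRICE = 7_495
TEN_SEAT_PACK_PRICE = 4_995
SINGLE_SEAT_PRICE = 600
UNLIMITED_SEATS_PRICE = 49_995


def hours_cost(extra_hours):
    """Assumes hours are sold only in whole 500-hour blocks (rounded up)."""
    blocks = -(-extra_hours // HOURS_BLOCK)  # ceiling division
    return blocks * HOURS_BLOCK_PRICE


def seat_cost(extra_seats):
    """Cheapest mix of 10-packs and single seats, capped by the unlimited option."""
    packs, leftover = divmod(extra_seats, 10)
    # One more 10-pack wins once the leftover singles cost more than a pack
    # (e.g. 9 x $600 = $5,400 > $4,995).
    mixed = packs * TEN_SEAT_PACK_PRICE + min(leftover * SINGLE_SEAT_PRICE, TEN_SEAT_PACK_PRICE)
    return min(mixed, UNLIMITED_SEATS_PRICE)


def upgrade_cost(new_tier_price, current_tier_price):
    """Tier upgrade price = new tier's price minus current tier's price."""
    if new_tier_price < current_tier_price:
        raise ValueError("an upgrade should move to a more expensive tier")
    return new_tier_price - current_tier_price


if __name__ == "__main__":
    print(hours_cost(600), seat_cost(23))
```

For example, 600 extra hours means two blocks ($14,990), and 23 extra seats come out cheapest as two 10-packs plus three singles ($11,790).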
Annual licensing follows the same pricing methodology:
| Tier | Media Hours | Seats | Base Price | Annual Support |
…and there are add-ons, just like with perpetual licensing:
| Additional 500 hours | $2,875 |
| Additional 10 seats | $1,995 |
| Additional user seat | $240 |
| Unlimited user seats | $19,995 |
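With both price lists on the table, here is a small Python sketch comparing them. For each add-on, it finds how many years of annual fees it takes to match the one-time perpetual price. This is our own back-of-the-envelope comparison, and it ignores perpetual support unless you supply a figure, since support pricing for add-ons isn’t listed.

```python
def breakeven_years(perpetual_price, annual_price, perpetual_support_per_year=0):
    """Years of annual licensing after which buying perpetually would have cost
    no more. Returns None if annual licensing never catches up.

    `perpetual_support_per_year` is a hypothetical input: the support fee for
    perpetual add-ons isn't listed here, so it defaults to 0."""
    yearly_gap = annual_price - perpetual_support_per_year
    if yearly_gap <= 0:
        return None
    return -(-perpetual_price // yearly_gap)  # ceiling division


# (perpetual, annual) add-on prices in USD, from the two tables above.
ADD_ONS = {
    "Additional 500 hours": (7_495, 2_875),
    "Additional 10 seats": (4_995, 1_995),
    "Additional user seat": (600, 240),
    "Unlimited user seats": (49_995, 19_995),
}

if __name__ == "__main__":
    for name, (perpetual, annual) in ADD_ONS.items():
        print(f"{name}: {breakeven_years(perpetual, annual)} years")
```

With support left at zero, each perpetual add-on costs roughly two and a half years of annual fees, so every one breaks even at 3 years; any real perpetual support fee pushes that point later.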
I know that for the average user, it’s a bit expensive. But at the facility level, if we factor in the cost per hour of a logger or editor over the course of a year, the hundreds of dollars an hour for transcription services, or the time spent combing through archives for footage, Dialogue Search becomes a cost-effective and efficient way to get your projects done faster.
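The facility-level argument boils down to simple arithmetic, sketched here in Python. Every figure in the example (footage volume, logging time per footage hour, hourly rate, share of logging avoided, license cost) is an invented placeholder, not a number from Nexidia or from this episode; a positive result means the tool pays for itself.

```python
def yearly_net_savings(footage_hours, logging_hours_per_footage_hour,
                       hourly_rate, fraction_of_logging_saved, license_cost_per_year):
    """Labor cost avoided by searching instead of manually logging, minus the license.

    Every input is a placeholder for your own facility's numbers."""
    manual_logging_cost = footage_hours * logging_hours_per_footage_hour * hourly_rate
    return manual_logging_cost * fraction_of_logging_saved - license_cost_per_year


if __name__ == "__main__":
    # Invented example: 1,000 hours of footage a year, 2 hours of logging per
    # footage hour at $30/hour, half of that logging avoided, $20,000/year in licenses.
    print(yearly_net_savings(1_000, 2.0, 30.0, 0.5, 20_000))
```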
Have other tools to analyze audio? Or how about other tools to harvest quality metadata? Let me know in the Comments section. Also, please subscribe and share this nerd goodness with the rest of your nerdy friends.
I plan to be back in 2 weeks with another 5 things. Until the next episode: learn more, do more – thanks for watching.