Speaker Search and Indexing for Multimedia Databases

dc.contributor.author: Silva, T.
dc.contributor.author: Karunarathne, D.D.
dc.contributor.author: Wikramanayake, G.N.
dc.contributor.author: Hewagamage, K.P.
dc.contributor.author: Dias, G.K.A.
dc.date.accessioned: 2011-10-04T08:45:02Z
dc.date.available: 2011-10-04T08:45:02Z
dc.date.issued: 2011
dc.description.abstract: This paper proposes an approach for indexing a collection of multimedia clips by the speakers heard in their audio tracks. A Bayesian Information Criterion (BIC) procedure segments each audio track, with silence detection carried out during segmentation. Mel-Frequency Cepstral Coefficients (MFCC) are extracted from each segment and sampled to serve as that segment's metadata. A Gaussian Mixture Model (GMM) is trained for each speaker, and an ensemble technique is proposed to reduce errors caused by the probabilistic nature of GMM training. The indexing system stores the sampled MFCC features as segment metadata and maintains the speaker metadata separately, so that either can be modified or extended independently. The system achieves a True Miss Rate (TMR) of around 20% and a False Alarm Rate (FAR) of around 10% for segments between 15 and 25 seconds in length; performance degrades as segments get shorter.
dc.identifier.uri: http://archive.cmb.ac.lk/handle/70130/102
dc.language.iso: en
dc.publisher: Infotel Lanka Society
dc.subject: Multimedia Databases
dc.subject: Speaker Search
dc.title: Speaker Search and Indexing for Multimedia Databases
dc.type: Research paper
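
The abstract mentions BIC-based segmentation without giving details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a one-dimensional feature per frame (the paper uses multi-dimensional MFCC vectors), a single Gaussian per segment, and hypothetical names (`delta_bic`, `best_change_point`, `penalty_weight`, `margin`) chosen for illustration. A positive score means that modelling the two halves separately is preferred over one shared Gaussian, which suggests a speaker change at that frame.

```python
import math
import random
from statistics import pvariance


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point in a 1-D feature sequence.

    Compares one Gaussian over all frames against two Gaussians, one on
    each side of `split`. Positive values favour a change point.
    """
    n = len(frames)
    left, right = frames[:split], frames[split:]
    d = 1  # feature dimension (assumption: 1-D instead of full MFCC vectors)
    n_params = d + d * (d + 1) / 2  # mean plus covariance parameters
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    gain = 0.5 * (
        n * math.log(pvariance(frames))
        - len(left) * math.log(pvariance(left))
        - len(right) * math.log(pvariance(right))
    )
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    candidate has a positive delta-BIC. `margin` keeps both sides large
    enough for a stable variance estimate."""
    scores = {
        t: delta_bic(frames, t, penalty_weight)
        for t in range(margin, len(frames) - margin)
    }
    t = max(scores, key=scores.get)
    return (t, scores[t]) if scores[t] > 0 else (None, scores[t])


if __name__ == "__main__":
    # Synthetic stand-in for two speakers: the feature mean shifts at frame 200.
    rng = random.Random(0)
    frames = [rng.gauss(0.0, 1.0) for _ in range(200)]
    frames += [rng.gauss(5.0, 1.0) for _ in range(200)]
    print(best_change_point(frames))
```

In a fuller system the same test would be applied to MFCC vectors using the log-determinant of the covariance matrix in place of the scalar variance, and repeated over a sliding window to find multiple change points.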

Files

Original bundle

Name:
abstract6.txt
Size:
974 B
Format:
Plain Text

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission