The International Arab Journal of Information Technology (IAJIT)


Environment Recognition for Digital Audio Forensics Using MPEG-7 and Mel Cepstral Features

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, little attention has been given to detecting the environment from files where foreground speech is present, which is a forensics scenario. In this paper, we perform several experiments focusing on the problems of environment recognition from audio, particularly for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. We propose a full set of MPEG-7 audio features combined with Mel Frequency Cepstral Coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environment sound even in the presence of a high amount of foreground human speech.
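The abstract does not state how the MPEG-7 descriptors and the MFCCs are combined. As a minimal sketch only, the following assumes feature-level fusion: each feature set is normalized per dimension and the two vectors are concatenated frame by frame. The function names `zscore_columns` and `fuse_features`, the normalization step, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore_columns(frames):
    """Normalize each feature dimension to zero mean and unit variance."""
    columns = list(zip(*frames))
    means = [mean(c) for c in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in frames]


def fuse_features(mpeg7_frames, mfcc_frames):
    """Concatenate two per-frame feature sets into one vector per frame (assumed fusion scheme)."""
    if len(mpeg7_frames) != len(mfcc_frames):
        raise ValueError("both feature sets must cover the same frames")
    mpeg7_norm = zscore_columns(mpeg7_frames)
    mfcc_norm = zscore_columns(mfcc_frames)
    return [a + b for a, b in zip(mpeg7_norm, mfcc_norm)]


# Made-up toy values: 3 frames, 2 stand-in "MPEG-7" dimensions and 3 stand-in "MFCC" dimensions.
mpeg7 = [[0.1, 200.0], [0.3, 220.0], [0.2, 210.0]]
mfcc = [[-5.0, 1.0, 0.5], [-4.0, 1.5, 0.5], [-6.0, 0.5, 0.5]]
fused = fuse_features(mpeg7, mfcc)
print(len(fused), len(fused[0]))
```

Normalizing before concatenation keeps descriptors with large numeric ranges from dominating the fused vector. A classifier would then be trained on the fused vectors; which classifier the paper uses is not stated in the abstract.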

The International Arab Journal of Information Technology, Vol. 10, No. 1, January 2013

Ghulam Muhammad received his BSc degree in computer science and engineering in 1997 from Bangladesh University of Engineering and Technology, and ME and PhD degrees in 2003 and 2006, respectively, from Toyohashi University of Technology, Japan. After serving as a Japan Society for the Promotion of Science (JSPS) fellow, he joined the College of Computer and Information Sciences at King Saud University, Saudi Arabia, as a faculty member. His research interests include automatic speech recognition, signal processing, and multimedia forensics.

Khaled Alghathbar (PhD, CISSP, CISM, PMP, BS7799 Lead Auditor) received his PhD in information technology from George Mason University, USA. He is an associate professor and the director of the Centre of Excellence in Information Assurance at King Saud University, Saudi Arabia. He is a security advisor for several government agencies. His main research interests are information security management, policies, biometrics, and design.