Abstract
This paper presents three content-based video indexing microservices that index video shots for the IMCOP Content Discovery Platform. These three services, together with numerous others, cooperate within the IMCOP platform to describe, enrich and relate multimedia data with respect to their audio, textual and visual content. Owing to the analysis they perform, the IMCOP platform can discover, recommend and deliver personalized multimedia content to its prospective recipients.
Because these recipients may also require personalized video content, services such as those presented here are essential: one discriminates between characters in videos, and the other two index video shots based on text and on speech, respectively. The paper presents the goals of these services, their approaches, and how they comply with the objectives of IMCOP’s microservices architecture. It also reports and discusses the research procedures and the results of the experiments carried out to verify that the services achieve high accuracy.
Notes
- 3. The flowcharts presented in Figs. 3, 4, 6 and 8 were inspired by the Fuji Xerox Video Indexing Technology website: https://www.fujixerox.com/eng/company/technology/production/multimedia/talkminer.html.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Baran, R., Partila, P., Wilk, R. (2019). Microservices Architecture for Content-Based Indexing of Video Shots. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_45
DOI: https://doi.org/10.1007/978-3-319-98678-4_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)