Microservices Architecture for Content-Based Indexing of Video Shots

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 833)

Abstract

This paper presents three content-based video indexing microservices that index video shots for the IMCOP Content Discovery Platform. These three services, together with numerous others, cooperate within the IMCOP platform to describe, enrich and relate multimedia data with respect to their audio, textual and visual content. Owing to the analysis these services perform, the IMCOP platform can discover, recommend and deliver personalized multimedia content to its various prospective recipients.
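To make the idea of services that "describe, enrich and relate" shots more concrete, the Python sketch below shows one possible way annotations emitted by several indexing services could be merged per shot and then used to relate shots to each other. The `(shot_id, modality, label)` tuple format, the modality names and the use of Jaccard similarity are illustrative assumptions only; they are not the IMCOP data model or the platform's actual relation method.

```python
from collections import defaultdict


def merge_annotations(annotations):
    """Merge (shot_id, modality, label) tuples into shot -> modality -> labels.

    The tuple layout is a hypothetical contract between indexing services;
    each service (audio, textual, visual) would emit its own tuples.
    """
    shots = defaultdict(lambda: defaultdict(set))
    for shot_id, modality, label in annotations:
        shots[shot_id][modality].add(label.lower())
    return shots


def shot_labels(shots, shot_id):
    """Union of a shot's labels across all modalities (empty if unknown)."""
    return set().union(*shots.get(shot_id, {}).values())


def relate(shots, a, b):
    """Jaccard similarity of two shots' label sets; 0.0 if both are empty."""
    la, lb = shot_labels(shots, a), shot_labels(shots, b)
    union = la | lb
    return len(la & lb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical output of three services for three shots.
    annotations = [
        ("shot-1", "visual", "character:anna"),
        ("shot-1", "textual", "breaking news"),
        ("shot-1", "audio", "election"),
        ("shot-2", "visual", "character:anna"),
        ("shot-2", "audio", "election"),
        ("shot-3", "textual", "weather"),
    ]
    shots = merge_annotations(annotations)
    print(relate(shots, "shot-1", "shot-2"))  # 2 shared labels out of 3
    print(relate(shots, "shot-1", "shot-3"))  # no shared labels
```

Here `shot-1` and `shot-2` share two of their three distinct labels (similarity 2/3), while `shot-3` shares none with `shot-1` (similarity 0.0). Keeping labels grouped by modality lets each service stay independent while the platform still relates shots across modalities.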

As these recipients may also require personalized video content, services such as the ones presented here, designed respectively to discriminate between characters in videos and to index video shots based on their text and speech, are essential. The goals of these services, their approaches, and how they comply with the objectives of the IMCOP microservices architecture are presented in detail. The research procedures and the results of the experiments carried out to evaluate their accuracy, which proved relatively high, are also reported and discussed.
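At its simplest, text- and speech-based indexing of shots maps terms recognized in a shot (for example by OCR of on-screen text or by a speech recognizer) to the shots in which they occur, while character discrimination can be framed as choosing the best-scoring known character. The Python sketch below illustrates both ideas with a toy inverted index and an argmax-with-threshold rule. The tokenizer, the `"ocr"`/`"asr"` source tags, the 0 to 1 score scale and the 0.6 threshold are hypothetical choices for illustration and are not taken from the paper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens; a stand-in for real OCR/ASR post-processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ShotIndex:
    """Toy inverted index mapping (source, term) pairs to sets of shot ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, shot_id, source, text):
        # source is a hypothetical tag such as "ocr" (on-screen text)
        # or "asr" (speech transcript) identifying the producing service.
        for term in tokenize(text):
            self.postings[(source, term)].add(shot_id)

    def search(self, query, sources=("ocr", "asr")):
        """Return shots containing every query term in at least one source."""
        result = None
        for term in tokenize(query):
            hits = set()
            for source in sources:
                hits |= self.postings.get((source, term), set())
            result = hits if result is None else result & hits
        return result if result is not None else set()


def discriminate(scores, threshold=0.6):
    """Return the best-scoring character, or None if no score reaches threshold.

    scores maps a character name to a similarity score (hypothetical 0..1 scale),
    e.g. as produced by comparing a detected face with reference faces.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    idx = ShotIndex()
    idx.add("shot-1", "ocr", "BREAKING NEWS: Election results")
    idx.add("shot-2", "asr", "the election campaign continues")
    print(idx.search("election"))                    # both shots
    print(idx.search("election", sources=("asr",)))  # speech only
    print(discriminate({"Anna": 0.82, "Tom": 0.41}))
```

Keeping postings per source allows a query to be restricted to on-screen text or to speech alone, which mirrors running text-based and speech-based indexing as separate services. Returning `None` below the threshold leaves a face unassigned rather than forcing it onto the closest known character.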

Notes

  1. https://searchmicroservices.techtarget.com/definition/RESTful-API.

  2. https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/service-oriented-architecture.

  3. The flowcharts presented in Figs. 3, 4, 6 and 8 were inspired by the Fuji Xerox Video Indexing Technology website: https://www.fujixerox.com/eng/company/technology/production/multimedia/talkminer.html.

  4. https://docs.opencv.org/3.3.0/dc/dc3/tutorial_py_matcher.html.

  5. https://docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html.

  6. https://opensource.google.com/projects/tesseract.

  7. http://kaldi-asr.org/.

  8. http://www.voxforge.org/.

  9. https://github.com/kaldi-asr/kaldi/tree/master/egs/hub4_english/s5.

References

  1. Baran, R., Dziech, A., Zeja, A.: A capable multimedia content discovery platform based on visual content analysis and intelligent data enrichment. Multimed. Tools Appl., 1–15 (2017). https://doi.org/10.1007/s11042-017-5014-1

  2. Wolff, E.: Microservices: Flexible Software Architectures. Addison-Wesley, Boston (2016)

  3. Baran, R., Zeja, A.: The IMCOP system for data enrichment and content discovery and delivery. In: Proceedings of the 2015 International Conference on Computational Science and Computational Intelligence, Las Vegas, USA, pp. 143–146 (2015)

  4. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)

  5. Bloehdorn, S., et al.: Semantic annotation of images and videos for multimedia analysis. In: Gómez-Pérez, A., Euzenat, J. (eds.) The Semantic Web: Research and Applications. ESWC 2005. LNCS, vol. 3532, pp. 592–607. Springer, Heidelberg (2005)

  6. Budnik, M., et al.: Learned features versus engineered features for semantic video indexing. In: 13th International Workshop on Content-Based Multimedia Indexing, Prague, pp. 1–6 (2015)

  7. Leszczuk, M., Grega, M.: Prototype software for video summary of bronchoscopy procedures with the use of mechanisms designed to identify, index and search. In: Piętka, E., Kawa, J. (eds.) Information Technologies in Biomedicine. Advances in Intelligent and Soft Computing, vol. 69, pp. 587–598. Springer, Heidelberg (2010)

  8. Grega, M., et al.: Multimed. Tools Appl. 68(1), 95–110 (2014)

  9. Zhang, H.J., Wu, J., Zhong, D., Smoliar, S.W.: An integrated system for content-based video retrieval and browsing. Pattern Recognit. 30(4), 643–658 (1997)

  10. Leszczuk, M., et al.: Video summarization framework for newscasts and reports – work in progress. In: Dziech, A., Czyżewski, A. (eds.) MCSS 2017, CCIS, vol. 785, pp. 86–97. Springer, Cham (2017)

  11. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 International Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, HI, USA, vol. 1, pp. 511–518. IEEE (2001)

  12. Baran, R., et al.: Face recognition for movie character and actor discrimination based on similarity scores. In: Proceedings of the 2016 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1333–1338. IEEE, Las Vegas (2016)

  13. Rublee, E., et al.: ORB: an efficient alternative to SIFT or SURF. In: 13th International Conference on Computer Vision (ICCV), pp. 2564–2571. IEEE, Barcelona (2011)

  14. http://research.wstkt.pl/?page_id=88

  15. Chen, S.S., et al.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: Proceedings of the 18th International Conference on Image Processing, Brussels, pp. 2609–2612. IEEE (2011)

  16. Baran, R., Partila, P., Wilk, R.: Automated text detection and character recognition in natural scenes based on local image features and contour processing techniques. In: Karwowski, W., Ahram, T. (eds.) IHSI 2018, AISC, vol. 722, pp. 42–48. Springer, Cham (2018)

  17. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Proceedings of the 2012 International Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, pp. 3538–3545. IEEE (2012)

  18. Povey, D., Ghoshal, A., Boulianne, G., et al.: The Kaldi speech recognition toolkit. In: Proceedings of the Workshop on Automatic Speech Recognition and Understanding. IEEE, Big Island (2011)

  19. O’Shaughnessy, D.: Invited paper: automatic speech recognition: history, methods and challenges. Pattern Recognit. 41(10), 2965–2979 (2008)

Author information

Correspondence to Remigiusz Baran.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Baran, R., Partila, P., Wilk, R. (2019). Microservices Architecture for Content-Based Indexing of Video Shots. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_45
