This work presents a comprehensive study of object detection using convolutional neural networks (CNNs) and pre-trained models. An object detection method has been developed that combines modern CNN architectures with a transfer learning strategy, initializing the network with the weights of pre-trained models. Particular attention is paid to the geometric transformation of the input image, which scales the image while preserving its aspect ratio and adds the minimum padding needed to reach the network's input size. To make the network output easier to interpret, a visualization module has been developed that overlays bounding boxes and class labels on the output image. The method is implemented in Python using the TensorFlow/Keras, OpenCV, and NumPy libraries. The results demonstrate faster network convergence and higher detection accuracy compared with training models from scratch. The practical value of the work lies in a complete development pipeline for a detection system, from data preparation to a ready-to-use inference module. The toolkit is general-purpose and can be quickly adapted to applied tasks (video surveillance, quality control, traffic monitoring) by replacing the dataset, without changing the software architecture.
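The geometric transformation described above (uniform scaling plus minimal padding, often called letterboxing) can be sketched as follows, together with the inverse mapping that a visualization module needs in order to draw predicted boxes on the original image. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the padding is split evenly between opposite sides so the content stays centred, and that boxes are given as pixel coordinates `(x_min, y_min, x_max, y_max)`. The names `Letterbox`, `letterbox_params`, and `box_to_original` are hypothetical, and the code computes only the geometry, not the pixel operations.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass(frozen=True)
class Letterbox:
    """Parameters of an aspect-ratio-preserving resize followed by padding."""
    scale: float     # single scale factor applied to both axes
    new_w: int       # width of the resized image before padding
    new_h: int       # height of the resized image before padding
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def letterbox_params(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Letterbox:
    """Fit a src_w x src_h image into dst_w x dst_h without distortion.

    The smaller of the two axis ratios is used for both axes, so the aspect
    ratio is kept; the leftover space on one axis becomes padding, split
    between the two sides (assumption: centred placement).
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all sizes must be positive")
    scale = min(dst_w / src_w, dst_h / src_h)
    # Guard against rounding pushing the resized size past the target.
    new_w = min(dst_w, round(src_w * scale))
    new_h = min(dst_h, round(src_h * scale))
    pad_w, pad_h = dst_w - new_w, dst_h - new_h
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return Letterbox(scale, new_w, new_h,
                     pad_left, pad_top, pad_w - pad_left, pad_h - pad_top)


def _clamp(value: float, upper: float) -> float:
    """Clamp value to the closed interval [0, upper]."""
    return min(max(value, 0.0), float(upper))


def box_to_original(box: Box, lb: Letterbox, src_w: int, src_h: int) -> Box:
    """Map a box predicted on the letterboxed input back to the original image.

    The padding offset is removed first, then the scaling is undone; parts
    of the box that fall in the padding are clipped to the image borders.
    """
    x0, y0, x1, y1 = box
    return (_clamp((x0 - lb.pad_left) / lb.scale, src_w),
            _clamp((y0 - lb.pad_top) / lb.scale, src_h),
            _clamp((x1 - lb.pad_left) / lb.scale, src_w),
            _clamp((y1 - lb.pad_top) / lb.scale, src_h))
```

For example, fitting a 640 x 480 frame into a 416 x 416 input gives a scale of 0.65, a resized image of 416 x 312, and 52 rows of padding at the top and at the bottom. The pixel work itself could then be done with OpenCV, e.g. `cv2.resize(img, (lb.new_w, lb.new_h))` (note that `dsize` is width first) followed by `cv2.copyMakeBorder(resized, lb.pad_top, lb.pad_bottom, lb.pad_left, lb.pad_right, cv2.BORDER_CONSTANT, value=(114, 114, 114))`; the grey fill value here is an arbitrary choice.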