Using ResNet and Transformer architectures in the problem of source code generation from an image
UDC 004.832.22
DOI: 10.26102/2310-6018/2025.49.2.002
This study examines several ways to optimize a system designed to generate source code from an image. The system consists of two parts: an autoencoder that processes images and extracts the necessary features from them, and a text-processing component built on LSTM blocks. In recent years, many new approaches have appeared that improve both image processing and text processing and prediction. In this study, the ResNet architecture was chosen to improve the image-processing part, and the Transformer architecture to improve the text-prediction part. In the experiments, systems built from various combinations of the original architecture, ResNet, and Transformer were compared, and prediction quality was assessed using the BLEU and chrF++ metrics as well as functional tests. The experiments showed that the combination of the ResNet and Transformer architectures gives the best result in the task of generating source code from an image, but this combination also requires the longest training time.
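To make the architectural combination concrete, below is a minimal sketch (not the author's implementation) of the ResNet + Transformer variant described in the abstract: a ResNet backbone encodes the GUI screenshot into a sequence of visual tokens, and a Transformer decoder predicts code/DSL tokens autoregressively. The vocabulary size, model dimensions, and the torchvision ResNet-18 backbone are illustrative assumptions.

```python
# Sketch of a ResNet image encoder combined with a Transformer decoder
# for image-to-code generation. All hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class Image2CodeModel(nn.Module):
    def __init__(self, vocab_size: int = 90, d_model: int = 256,
                 nhead: int = 8, num_layers: int = 4, max_len: int = 512):
        super().__init__()
        # ResNet-18 backbone without the classification head; its final
        # feature map (B, 512, H', W') is flattened into a token sequence.
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(512, d_model)

        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W); tokens: (B, T) previously generated code tokens.
        feats = self.cnn(images)                   # (B, 512, H', W')
        memory = self.proj(feats.flatten(2).transpose(1, 2))  # (B, H'*W', d_model)

        positions = torch.arange(tokens.size(1), device=tokens.device)
        tgt = self.tok_emb(tokens) + self.pos_emb(positions)
        # Causal mask so each position attends only to earlier tokens.
        t = tokens.size(1)
        causal_mask = torch.triu(
            torch.full((t, t), float("-inf"), device=tokens.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal_mask)
        return self.lm_head(out)                   # (B, T, vocab_size) logits


# Smoke test with random data.
model = Image2CodeModel()
logits = model(torch.randn(2, 3, 256, 256), torch.randint(0, 90, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 90])
```

The generated token sequences can then be scored against reference code with corpus-level BLEU and chrF++ (for example, via the sacrebleu library) alongside functional tests, in line with the evaluation described above.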
1. Beltramelli T. pix2code: Generating Code from a Graphical User Interface Screenshot. In: EICS '18: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 19–22 June 2018, Paris, France. New York: Association for Computing Machinery; 2018. https://doi.org/10.1145/3220134.3220135
2. Zhu Zh., Xue Zh., Yuan Z. Automatic Graphics Program Generation Using Attention-Based Hierarchical Decoder. In: Computer Vision – ACCV 2018: 14th Asian Conference on Computer Vision: Revised Selected Papers: Part VI, 02–06 December 2018, Perth, Australia. Cham: Springer; 2019. pp. 181–196. https://doi.org/10.1007/978-3-030-20876-9_12
3. Liu Ya., Hu Q., Shu K. Improving pix2code Based Bi-directional LSTM. In: 2018 IEEE International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), 16–18 November 2018, Shenyang, China. IEEE; 2019. pp. 220–223. https://doi.org/10.1109/AUTEEE.2018.8720784
4. Zou D., Wu G. Automatic Code Generation for Android Applications Based on Improved Pix2code. Journal of Artificial Intelligence and Technology. 2024;4(4):325–331. https://doi.org/10.37965/jait.2024.0515
5. Nikitin I.V. Influence of the TensorFlow Library's Version on the Quality of Code Generation from an Image. Modeling, Optimization and Information Technology. 2024;12(4). (In Russ.). https://doi.org/10.26102/2310-6018/2024.47.4.040
6. Nikitin I.V. Assessing the Quality of the Result in the Problem of Source Code Generation from an Image. Modeling, Optimization and Information Technology. 2025;13(1). (In Russ.). https://doi.org/10.26102/2310-6018/2025.48.1.030
7. He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, USA. IEEE; 2016. pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
8. Balduzzi D., Frean M., Leary L., Lewis J.P., Ma K.W.-D., McWilliams B. The Shattered Gradients Problem: If resnets are the answer, then what is the question? In: ICML'17: Proceedings of the 34th International Conference on Machine Learning, 06–11 August 2017, Sydney, Australia. 2017. pp. 342–350. https://doi.org/10.48550/arXiv.1702.08591
9. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser Ł., Polosukhin I. Attention Is All You Need. In: NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 04–09 December 2017, Long Beach, USA. New York: Curran Associates Inc.; 2017. pp. 6000–6010.
10. Chen W.-Y., Podstreleny P., Cheng W.-H., Chen Y.-Y., Hua K.-L. Code Generation From a Graphical User Interface Via Attention-Based Encoder-Decoder Model. Multimedia Systems. 2022;28(1):121–130. https://doi.org/10.1007/s00530-021-00804-7
11. Popović M. chrF++: Words Helping Character N-grams. In: Proceedings of the Second Conference on Machine Translation, 07–08 September 2017, Copenhagen, Denmark. Association for Computational Linguistics; 2017. pp. 612–618. https://doi.org/10.18653/v1/W17-4770
Keywords: code generation, image, machine learning, ResNet, transformers
For citation: Nikitin I.V. Using ResNet and Transformer architectures in the problem of source code generation from an image. Modeling, Optimization and Information Technology. 2025;13(2). URL: https://moitvivt.ru/ru/journal/pdf?id=1863 DOI: 10.26102/2310-6018/2025.49.2.002 (In Russ.).
Received 19.03.2025
Revised 31.03.2025
Accepted 04.04.2025