<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.49.2.046</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1932</article-id>
      <title-group>
        <article-title xml:lang="ru">Разработка улучшенного модуля дифференциальной активации с использованием Grad-CAM++ и семантической сегментации для изменения атрибутов лица</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0003-4103-2036</contrib-id>
          <name-alternatives>
<name name-style="eastern" xml:lang="ru">
              <surname>Гу</surname>
              <given-names>Чунюй</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Gu</surname>
              <given-names>Chongyu</given-names>
            </name>
          </name-alternatives>
          <email>chongyugu@gmail.com</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-2990-8245</contrib-id>
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Громов</surname>
              <given-names>Максим Леонидович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Gromov</surname>
              <given-names>Maxim Leonidovich</given-names>
            </name>
          </name-alternatives>
          <email>maxim.leo.gromov@gmail.com</email>
          <xref ref-type="aff">aff-2</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Национальный исследовательский Томский государственный университет</aff>
        <aff xml:lang="en">National Research Tomsk State University</aff>
      </aff-alternatives>
      <aff-alternatives id="aff-2">
        <aff xml:lang="ru">Национальный исследовательский Томский государственный университет</aff>
        <aff xml:lang="en">National Research Tomsk State University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.49.2.046</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1932"/>
      <abstract xml:lang="ru">
        <p>Современные методы изменения атрибутов лица страдают от двух системных проблем: нежелательной модификации второстепенных признаков и потери контекстных деталей (аксессуаров, фона, текстуры волос и т. д.), что приводит к артефактам и ограничивает их применение в задачах, требующих фотографической точности. Для решения этих проблем мы предлагаем улучшенный модуль дифференциальной активации, предназначенный для точного редактирования с сохранением контекстной информации. В отличие от существующего решения (EOGI), предложенное решение включает: использование градиентной информации второго и третьего порядка для точной локализации редактируемых областей, применение аугментации во время тестирования (TTA) и метода главных компонент (PCA) для центрирования карты активации классов (CAM) вокруг объектов и подавления значительной части шума, а также интеграцию данных семантической сегментации для повышения пространственной точности. Экспериментальная оценка на первых 1000 изображениях CelebA-HQ (разрешение 1024×1024 пикселей) демонстрирует значительное превосходство над современным методом EOGI: снижение среднего значения FID на 13,84 % (с 27,68 до 23,85), снижение среднего значения LPIPS на 7,03 % (с 0,327 до 0,304) и снижение среднего значения MAE на 10,57 % (с 0,0511 до 0,0457). Предложенный метод превосходит существующие подходы как в количественной оценке, так и в качественном сравнении. Результаты демонстрируют улучшенное сохранение деталей (например, серёг и фона), что делает метод применимым в задачах, требующих высокой фотореалистичности.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>Modern facial attribute editing methods suffer from two systemic issues: unintended modification of secondary features and loss of contextual details (such as accessories, background, and hair texture), which lead to artifacts and restrict their use in scenarios requiring photographic accuracy. To address these problems, we propose an improved differential activation module designed for precise editing while preserving contextual information. In contrast to the existing solution (EOGI), the proposed approach includes the use of second- and third-order gradient information for precise localization of editable areas, the application of test-time augmentation (TTA) and principal component analysis (PCA) to center the class activation map (CAM) around objects and suppress a large amount of noise, and the integration of semantic segmentation data to enhance spatial accuracy. The evaluation on the first 1,000 images of the CelebA-HQ dataset (1024×1024 resolution) demonstrates significant superiority over the state-of-the-art EOGI method: a 13.84% reduction in the average FID (from 27.68 to 23.85), a 7.03% reduction in the average LPIPS (from 0.327 to 0.304), and a 10.57% reduction in the average MAE (from 0.0511 to 0.0457). The proposed method outperforms existing approaches in both quantitative and qualitative comparisons. The results demonstrate improved preservation of details (e.g., earrings and backgrounds), making the method applicable to tasks demanding high photographic realism.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>глубокое обучение</kwd>
        <kwd>изменение атрибутов лица</kwd>
        <kwd>дифференциальная активация</kwd>
        <kwd>карты активации класса (CAM)</kwd>
        <kwd>семантическая сегментация</kwd>
        <kwd>генеративно-состязательная сеть (GAN)</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>deep learning</kwd>
        <kwd>facial attribute editing</kwd>
        <kwd>differential activation</kwd>
        <kwd>class activation maps (CAM)</kwd>
        <kwd>semantic segmentation</kwd>
        <kwd>generative adversarial network (GAN)</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Данная работа была поддержана грантом Китайского совета по стипендиям (CSC) № 201908090255.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">He Zh., Zuo W., Kan M., Shan Sh., Chen X. AttGAN: Facial Attribute Editing by Only Changing What You Want. IEEE Transactions on Image Processing. 2019;28(11):5464–5478. https://doi.org/10.1109/TIP.2019.2916751</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Qiu H., Yu B., Gong D., Li Zh., Liu W., Tao D. SynFace: Face Recognition with Synthetic Data. In: 2021 IEEE/CVF International Conference on Computer Vision (CVPR), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 10860–10870. https://doi.org/10.1109/ICCV48922.2021.01070</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Goodfellow I.J., Pouget-Abadie J., Mirza M., et al. Generative Adversarial Networks. arXiv. URL: https://arxiv.org/abs/1406.2661 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Xia W., Zhang Yu., Yang Yu., Xue J.-H., Zhou B., Yang M.-H. GAN Inversion: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(3):3121–3138. https://doi.org/10.1109/TPAMI.2022.3181070</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Karras T., Laine S., Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, Long Beach, CA, USA. IEEE; 2019. P. 4401–4410. https://doi.org/10.1109/TPAMI.2020.2970919</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T. Analyzing and Improving the Image Quality of StyleGAN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 8107–8116. https://doi.org/10.1109/CVPR42600.2020.00813</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Richardson E., Alaluf Yu., Patashnik O., et al. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20–25 June 2021, Nashville, TN, USA. IEEE; 2021. P. 2287–2296. https://doi.org/10.1109/CVPR46437.2021.00232</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Tov O., Alaluf Yu., Nitzan Yo., Patashnik O., Cohen-Or D. Designing an Encoder for Stylegan Image Manipulation. ACM Transactions on Graphics (TOG). 2021;40(4). https://doi.org/10.1145/3450626.3459838</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Alaluf Yu., Patashnik O., Cohen-Or D. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 6691–6700. https://doi.org/10.1109/ICCV48922.2021.00664</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Wang T., Zhang Yo., Fan Ya., Wang J., Chen Q. High-Fidelity GAN Inversion for Image Attribute Editing. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18–24 June 2022, New Orleans, LA, USA. IEEE; 2022. P. 11369–11378. https://doi.org/10.1109/CVPR52688.2022.01109</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Song H., Du Yo., Xiang T., Dong J., Qin J., He Sh. Editing Out-of-Domain GAN Inversion via Differential Activations. In: Computer Vision – ECCV 2022: 17th European Conference: Proceedings: Part XVII, 23–27 October 2022, Tel Aviv, Israel. Cham: Springer; 2022. P. 1–17. https://doi.org/10.1007/978-3-031-19790-1_1</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Chattopadhay A., Sarkar A., Howlader P., Balasubramanian V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 12–15 March 2018, Lake Tahoe, NV, USA. IEEE; 2018. P. 839–847. https://doi.org/10.1109/WACV.2018.00097</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), 22–29 October 2017, Venice, Italy. IEEE; 2017. P. 618–626. https://doi.org/10.1109/ICCV.2017.74</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Muhammad M.B., Yeasin M. Eigen-CAM: Class Activation Map Using Principal Components. In: 2020 International Joint Conference on Neural Networks (IJCNN), 19–24 July 2020, Glasgow, UK. IEEE; 2020. P. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, NV, USA. IEEE; 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90</mixed-citation>
      </ref>
      <ref id="cit16">
        <label>16</label>
        <mixed-citation xml:lang="ru">Lee Ch.-H., Liu Z., Wu L., Luo P. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 5548–5557. https://doi.org/10.1109/CVPR42600.2020.00559</mixed-citation>
      </ref>
      <ref id="cit17">
        <label>17</label>
        <mixed-citation xml:lang="ru">Karras T., Aila T., Laine S., Lehtinen J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv. URL: https://arxiv.org/abs/1710.10196 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit18">
        <label>18</label>
        <mixed-citation xml:lang="ru">Zhang R., Isola Ph., Efros A.A., Shechtman E., Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June 2018, Salt Lake City, UT, USA. IEEE; 2018. P. 586–595. https://doi.org/10.1109/CVPR.2018.00068</mixed-citation>
      </ref>
      <ref id="cit19">
        <label>19</label>
        <mixed-citation xml:lang="ru">Heusel M., Ramsauer H., Unterthiner Th., Nessler B., Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv. URL: https://arxiv.org/abs/1706.08500 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit20">
        <label>20</label>
        <mixed-citation xml:lang="ru">Shen Yu., Yang C., Tang X., Zhou B. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022;44(4):2004–2018. https://doi.org/10.1109/TPAMI.2020.3034267</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that they have no conflict of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>