<?xml version="1.0" encoding="UTF-8"?>
<article article-type="research-article" dtd-version="1.3" xml:lang="ru" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://metafora.rcsi.science/xsd_files/journal3.xsd">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">moitvivt</journal-id>
      <journal-title-group>
        <journal-title xml:lang="ru">Моделирование, оптимизация и информационные технологии</journal-title>
        <trans-title-group xml:lang="en">
          <trans-title>Modeling, Optimization and Information Technology</trans-title>
        </trans-title-group>
      </journal-title-group>
      <issn pub-type="epub">2310-6018</issn>
      <publisher>
        <publisher-name>Издательство</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.26102/2310-6018/2025.49.2.046</article-id>
      <article-id pub-id-type="custom" custom-type="elpub">1932</article-id>
      <title-group>
        <article-title xml:lang="ru">Разработка улучшенного модуля дифференциальной активации с использованием Grad-CAM++ и семантической сегментации для изменения атрибутов лица</article-title>
        <trans-title-group xml:lang="en">
          <trans-title>Development of an improved differential activation module using Grad-CAM++ and semantic segmentation for facial attribute editing</trans-title>
        </trans-title-group>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0003-4103-2036</contrib-id>
          <name-alternatives>
<name name-style="eastern" xml:lang="ru">
              <surname>Гу</surname>
              <given-names>Чунюй</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Gu</surname>
              <given-names>Chongyu</given-names>
            </name>
          </name-alternatives>
          <email>chongyugu@gmail.com</email>
          <xref ref-type="aff">aff-1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-2990-8245</contrib-id>
          <name-alternatives>
            <name name-style="eastern" xml:lang="ru">
              <surname>Громов</surname>
              <given-names>Максим Леонидович</given-names>
            </name>
            <name name-style="western" xml:lang="en">
              <surname>Gromov</surname>
              <given-names>Maxim Leonidovich</given-names>
            </name>
          </name-alternatives>
          <email>maxim.leo.gromov@gmail.com</email>
          <xref ref-type="aff">aff-2</xref>
        </contrib>
      </contrib-group>
      <aff-alternatives id="aff-1">
        <aff xml:lang="ru">Национальный исследовательский Томский государственный университет</aff>
        <aff xml:lang="en">National Research Tomsk State University</aff>
      </aff-alternatives>
      <aff-alternatives id="aff-2">
        <aff xml:lang="ru">Национальный исследовательский Томский государственный университет</aff>
        <aff xml:lang="en">National Research Tomsk State University</aff>
      </aff-alternatives>
      <pub-date pub-type="epub">
        <day>01</day>
        <month>01</month>
        <year>2026</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <elocation-id>10.26102/2310-6018/2025.49.2.046</elocation-id>
      <permissions>
        <copyright-statement>Copyright © Авторы, 2026</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/">
          <license-p>This work is licensed under a Creative Commons Attribution 4.0 International License</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://moitvivt.ru/ru/journal/article?id=1932"/>
      <abstract xml:lang="ru">
        <p>Современные методы изменения атрибутов лица страдают от двух системных проблем: нежелательной модификации второстепенных признаков и потери контекстных деталей (аксессуаров, фона, текстуры волос и т. д.), что приводит к артефактам и ограничивает их применение в задачах, требующих фотографической точности. Для решения этих проблем мы предлагаем улучшенный модуль дифференциальной активации, предназначенный для точного редактирования с сохранением контекстной информации. В отличие от существующего решения (EOGI), предложенное решение включает: использование градиентной информации второго и третьего порядка для точной локализации редактируемых областей, применение аугментации во время тестирования (TTA) и метода главных компонент (PCA) для центрирования карты активации классов (CAM) вокруг объектов и подавления значительной части шума, а также интеграцию данных семантической сегментации для повышения пространственной точности. Экспериментальная оценка на первых 1000 изображениях CelebA-HQ (разрешение 1024×1024 пикселей) демонстрирует значительное превосходство над современным методом EOGI: снижение среднего значения FID на 13,84 % (с 27,68 до 23,85), снижение среднего значения LPIPS на 7,03 % (с 0,327 до 0,304) и снижение среднего значения MAE на 10,57 % (с 0,0511 до 0,0457). Предложенный метод превосходит существующие подходы как в количественной оценке, так и в качественном сравнении. Результаты демонстрируют улучшенное сохранение деталей (например, серёг и фона), что делает метод применимым в задачах, требующих высокой фотореалистичности.</p>
      </abstract>
      <trans-abstract xml:lang="en">
        <p>Modern facial attribute editing methods suffer from two systemic issues: unintended modification of secondary features and loss of contextual details (such as accessories, background, and hair texture), which lead to artifacts and restrict their use in scenarios requiring photographic accuracy. To address these problems, we propose an improved differential activation module designed for precise editing while preserving contextual information. In contrast to the existing solution (EOGI), the proposed approach includes the use of second- and third-order gradient information for precise localization of editable areas, the application of test-time augmentation (TTA) and principal component analysis (PCA) to center the class activation map (CAM) around objects and suppress a large amount of noise, and the integration of semantic segmentation data to enhance spatial accuracy. The evaluation on the first 1,000 images of the CelebA-HQ dataset (1024×1024 resolution) demonstrates significant superiority over the state-of-the-art EOGI method: a 13.84% reduction in the average FID (from 27.68 to 23.85), a 7.03% reduction in the average LPIPS (from 0.327 to 0.304), and a 10.57% reduction in the average MAE (from 0.0511 to 0.0457). The proposed method outperforms existing approaches in both quantitative and qualitative comparisons. The results demonstrate improved preservation of details (e.g., earrings and backgrounds), making the method applicable to tasks demanding high photographic realism.</p>
      </trans-abstract>
      <kwd-group xml:lang="ru">
        <kwd>глубокое обучение</kwd>
        <kwd>изменение атрибутов лица</kwd>
        <kwd>дифференциальная активация</kwd>
        <kwd>карты активации класса (CAM)</kwd>
        <kwd>семантическая сегментация</kwd>
        <kwd>генеративно-состязательная сеть (GAN)</kwd>
      </kwd-group>
      <kwd-group xml:lang="en">
        <kwd>deep learning</kwd>
        <kwd>facial attribute editing</kwd>
        <kwd>differential activation</kwd>
        <kwd>class activation maps (CAM)</kwd>
        <kwd>semantic segmentation</kwd>
        <kwd>generative adversarial network (GAN)</kwd>
      </kwd-group>
      <funding-group>
        <funding-statement xml:lang="ru">Данная работа была поддержана грантом Китайского совета по стипендиям (CSC) № 201908090255.</funding-statement>
        <funding-statement xml:lang="en">The study was performed without external funding.</funding-statement>
      </funding-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="cit1">
        <label>1</label>
        <mixed-citation xml:lang="ru">He Zh., Zuo W., Kan M., Shan Sh., Chen X. AttGAN: Facial Attribute Editing by Only Changing What You Want. IEEE Transactions on Image Processing. 2019;28(11):5464–5478. https://doi.org/10.1109/TIP.2019.2916751</mixed-citation>
      </ref>
      <ref id="cit2">
        <label>2</label>
        <mixed-citation xml:lang="ru">Qiu H., Yu B., Gong D., Li Zh., Liu W., Tao D. SynFace: Face Recognition with Synthetic Data. In: 2021 IEEE/CVF International Conference on Computer Vision (CVPR), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 10860–10870. https://doi.org/10.1109/ICCV48922.2021.01070</mixed-citation>
      </ref>
      <ref id="cit3">
        <label>3</label>
        <mixed-citation xml:lang="ru">Goodfellow I.J., Pouget-Abadie J., Mirza M., et al. Generative Adversarial Networks. arXiv. URL: https://arxiv.org/abs/1406.2661 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit4">
        <label>4</label>
        <mixed-citation xml:lang="ru">Xia W., Zhang Yu., Yang Yu., Xue J.-H., Zhou B., Yang M.-H. GAN Inversion: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(3):3121–3138. https://doi.org/10.1109/TPAMI.2022.3181070</mixed-citation>
      </ref>
      <ref id="cit5">
        <label>5</label>
        <mixed-citation xml:lang="ru">Karras T., Laine S., Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15–20 June 2019, Long Beach, CA, USA. IEEE; 2019. P. 4401–4410. https://doi.org/10.1109/TPAMI.2020.2970919</mixed-citation>
      </ref>
      <ref id="cit6">
        <label>6</label>
        <mixed-citation xml:lang="ru">Karras T., Laine S., Aittala M., Hellsten J., Lehtinen J., Aila T. Analyzing and Improving the Image Quality of StyleGAN. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 8107–8116. https://doi.org/10.1109/CVPR42600.2020.00813</mixed-citation>
      </ref>
      <ref id="cit7">
        <label>7</label>
        <mixed-citation xml:lang="ru">Richardson E., Alaluf Yu., Patashnik O., et al. Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20–25 June 2021, Nashville, TN, USA. IEEE; 2021. P. 2287–2296. https://doi.org/10.1109/CVPR46437.2021.00232</mixed-citation>
      </ref>
      <ref id="cit8">
        <label>8</label>
        <mixed-citation xml:lang="ru">Tov O., Alaluf Yu., Nitzan Yo., Patashnik O., Cohen-Or D. Designing an Encoder for Stylegan Image Manipulation. ACM Transactions on Graphics (TOG). 2021;40(4). https://doi.org/10.1145/3450626.3459838</mixed-citation>
      </ref>
      <ref id="cit9">
        <label>9</label>
        <mixed-citation xml:lang="ru">Alaluf Yu., Patashnik O., Cohen-Or D. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10–17 October 2021, Montreal, QC, Canada. IEEE; 2021. P. 6691–6700. https://doi.org/10.1109/ICCV48922.2021.00664</mixed-citation>
      </ref>
      <ref id="cit10">
        <label>10</label>
        <mixed-citation xml:lang="ru">Wang T., Zhang Yo., Fan Ya., Wang J., Chen Q. High-Fidelity GAN Inversion for Image Attribute Editing. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18–24 June 2022, New Orleans, LA, USA. IEEE; 2022. P. 11369–11378. https://doi.org/10.1109/CVPR52688.2022.01109</mixed-citation>
      </ref>
      <ref id="cit11">
        <label>11</label>
        <mixed-citation xml:lang="ru">Song H., Du Yo., Xiang T., Dong J., Qin J., He Sh. Editing Out-of-Domain GAN Inversion via Differential Activations. In: Computer Vision – ECCV 2022: 17th European Conference: Proceedings: Part XVII, 23–27 October 2022, Tel Aviv, Israel. Cham: Springer; 2022. P. 1–17. https://doi.org/10.1007/978-3-031-19790-1_1</mixed-citation>
      </ref>
      <ref id="cit12">
        <label>12</label>
        <mixed-citation xml:lang="ru">Chattopadhay A., Sarkar A., Howlader P., Balasubramanian V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 12–15 March 2018, Lake Tahoe, NV, USA. IEEE; 2018. P. 839–847. https://doi.org/10.1109/WACV.2018.00097</mixed-citation>
      </ref>
      <ref id="cit13">
        <label>13</label>
        <mixed-citation xml:lang="ru">Selvaraju R.R., Cogswell M., Das A., Vedantam R., Parikh D., Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), 22–29 October 2017, Venice, Italy. IEEE; 2017. P. 618–626. https://doi.org/10.1109/ICCV.2017.74</mixed-citation>
      </ref>
      <ref id="cit14">
        <label>14</label>
        <mixed-citation xml:lang="ru">Muhammad M.B., Yeasin M. Eigen-CAM: Class Activation Map Using Principal Components. In: 2020 International Joint Conference on Neural Networks (IJCNN), 19–24 July 2020, Glasgow, UK. IEEE; 2020. P. 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626</mixed-citation>
      </ref>
      <ref id="cit15">
        <label>15</label>
        <mixed-citation xml:lang="ru">He K., Zhang X., Ren Sh., Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27–30 June 2016, Las Vegas, NV, USA. IEEE; 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90</mixed-citation>
      </ref>
      <ref id="cit16">
        <label>16</label>
        <mixed-citation xml:lang="ru">Lee Ch.-H., Liu Z., Wu L., Luo P. MaskGAN: Towards Diverse and Interactive Facial Image Manipulation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13–19 June 2020, Seattle, WA, USA. IEEE; 2020. P. 5548–5557. https://doi.org/10.1109/CVPR42600.2020.00559</mixed-citation>
      </ref>
      <ref id="cit17">
        <label>17</label>
        <mixed-citation xml:lang="ru">Karras T., Aila T., Laine S., Lehtinen J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv. URL: https://arxiv.org/abs/1710.10196 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit18">
        <label>18</label>
        <mixed-citation xml:lang="ru">Zhang R., Isola Ph., Efros A.A., Shechtman E., Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18–23 June 2018, Salt Lake City, UT, USA. IEEE; 2018. P. 586–595. https://doi.org/10.1109/CVPR.2018.00068</mixed-citation>
      </ref>
      <ref id="cit19">
        <label>19</label>
        <mixed-citation xml:lang="ru">Heusel M., Ramsauer H., Unterthiner Th., Nessler B., Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. arXiv. URL: https://arxiv.org/abs/1706.08500 [Accessed 19th April 2025].</mixed-citation>
      </ref>
      <ref id="cit20">
        <label>20</label>
        <mixed-citation xml:lang="ru">Shen Yu., Yang C., Tang X., Zhou B. InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022;44(4):2004–2018. https://doi.org/10.1109/TPAMI.2020.3034267</mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn fn-type="conflict">
        <p>The authors declare that they have no conflict of interest.</p>
      </fn>
    </fn-group>
  </back>
</article>