Keywords: single memory failures, single and multiple faults, double faults, frequency of multiple faults, measures to protect against multiple failures, double error correction codes
Protection against multiple memory faults
UDC 004.052.2
DOI: 10.26102/2310-6018/2024.45.2.025
This paper considers multiple faults in the memory chips of spacecraft on-board equipment caused by individual nuclear particles in outer space. A literature review shows that the danger of multiple faults is real and will grow as the design rules (feature sizes) of electronic components shrink. Double faults are currently the most pressing threat, because they are caused by charged particles with relatively low energies. Double faults can be adjacent or non-adjacent. Adjacent double faults are caused by a single nuclear particle upsetting two neighboring cells at once. Non-adjacent double faults result from the accumulation of single faults that occurred at different times in different storage cells of the same memory word. Under certain conditions, non-adjacent double errors can be avoided. Double adjacent errors are handled with error-correcting codes. These codes are relatively new, and no general description of their construction exists. They are guaranteed to correct single errors and double adjacent errors, but have a significant probability of miscorrecting a non-adjacent double error. For practical use, it is therefore necessary to determine the requirements on the form of the check matrix of these codes and to find a general algorithm for constructing them for different memory word lengths, with low redundancy and high performance, provided that the code is required to detect and correct only single and double adjacent errors and nothing more.
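The correction guarantee and the miscorrection risk described above can be illustrated with a minimal sketch. It assumes the usual syndrome-decoding view: a code corrects single and double adjacent errors if the syndromes of all such error patterns are nonzero and pairwise distinct. The 7-bit check matrix `H_COLUMNS` (each column written as a 4-bit integer) and the function names are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from itertools import combinations


def correctable_syndromes(columns):
    """Build the syndrome table for single and double adjacent errors.

    Returns a dict {syndrome: error positions}, or None if two correctable
    patterns share a syndrome or a syndrome is zero, i.e. the matrix does
    not correct all single and double adjacent errors.
    """
    n = len(columns)
    patterns = [(i,) for i in range(n)]              # single errors
    patterns += [(i, i + 1) for i in range(n - 1)]   # double adjacent errors
    table = {}
    for positions in patterns:
        syndrome = 0
        for p in positions:
            syndrome ^= columns[p]                   # XOR of the affected columns
        if syndrome == 0 or syndrome in table:
            return None
        table[syndrome] = positions
    return table


def nonadjacent_double_outcomes(columns, table):
    """Classify every non-adjacent double error by what the decoder would do."""
    counts = {"miscorrected": 0, "detected": 0, "undetected": 0}
    for i, j in combinations(range(len(columns)), 2):
        if j == i + 1:
            continue                                 # adjacent pairs are correctable
        syndrome = columns[i] ^ columns[j]
        if syndrome == 0:
            counts["undetected"] += 1
        elif syndrome in table:
            counts["miscorrected"] += 1              # decoder "fixes" the wrong bits
        else:
            counts["detected"] += 1
    return counts


# Hypothetical toy check matrix: 4 check bits (columns 1, 2, 4, 8), 3 data bits.
H_COLUMNS = [1, 2, 4, 8, 5, 14, 7]

table = correctable_syndromes(H_COLUMNS)
outcomes = nonadjacent_double_outcomes(H_COLUMNS, table)
print(outcomes)
```

For this toy matrix, 13 of the 15 nonzero syndromes are already assigned to correctable patterns, so 11 of the 15 possible non-adjacent double errors would be miscorrected and only 4 detected. A longer memory word with more check bits leaves more unused syndromes, which is one reason the form of the check matrix matters.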
For citation: Lepeshkina E.S. Protection against multiple memory faults. Modeling, Optimization and Information Technology. 2024;12(2). URL: https://moitvivt.ru/ru/journal/pdf?id=1551 DOI: 10.26102/2310-6018/2024.45.2.025 (In Russ.).
Received 14.04.2024
Revised 22.04.2024
Accepted 06.05.2024
Published 30.06.2024