He Qiang
·Paper Publications
Indexed by: Journal paper
Journal: Design, Automation and Test in Europe Conference (DATE 2023)
Included Journals: SCI
Affiliation of Author(s): Huazhong University of Science and Technology
Discipline: Engineering
First-Level Discipline: Electronic Science And Technology
Document Type: M
Key Words: SSD, multidimensional features, failure prediction, machine learning, system availability
DOI number: 10.23919/DATE56975.2023.10137082
Abstract: As SSD failures seriously lead to data loss and service interruption, proactive failure prediction is often used to improve system availability. However, the unidimensional SMART-based prediction models hardly predict all drive failures. Some other features applied in data centers and enterprise storage systems are not readily available in consumer storage systems (CSS). To further analyze related failures in production SSD-based CSS, we study nearly 2.3 million SSDs from 12 drive models based on a dataset of SMART logs, trouble tickets, and error logs. We discover that SMART, Firmware Version, WindowsEvent, and Blue Screen of Death (SFWB) are closely related to SSD failures. We further propose a multidimensional-based failure prediction approach (MFPA), which is portable in algorithms, SSD vendors, and PC manufacturers. Experiments on the datasets show that SFWB-based MFPA achieves a high true positive rate (98.18%) and low false positive rate (0.56%), which is 4% higher and 86% lower than the SMART-based model. It is robust and can continuously predict for 2–3 months without iteration, substantially improving the system availability.