Conference:
Special Session on Intelligent Internet of Things Security and Privacy (WISP 2024) at the 21st International Conference on Distributed Computing and Artificial Intelligence (DCAI 2024), 26-28 June 2024, Salamanca, Spain
Authors:
Xanthopoulou G, Siavvas M, Kalouptsoglou I, Kehagias D, Tzovaras D.
Abstract:
Automated classification of software requirements is valuable for software engineering. Recently, Natural Language Processing (NLP) and Machine Learning (ML) techniques have been used as an alternative to manual classification of requirements. In this study, we conduct a thorough empirical evaluation of several NLP methods for the efficient classification of software requirements. We focus both on the binary classification between functional and non-functional requirements (NFRs) and on the multi-class classification of NFRs into specific categories, such as security, performance, and usability. For this purpose, we collected and enriched a large dataset of labeled software requirements, placing particular emphasis on security-related requirements. A wide range of NLP-based models were constructed and compared, from simple ML models that use the Bag-of-Words (BoW) technique for text representation to the more advanced Large Language Models (LLMs) that have emerged recently. The results of our analysis demonstrate that all the examined NLP-based models can provide highly accurate requirements classification in both the binary and the multi-class setting, with Transformer-based models achieving the best predictive performance, thus revealing the benefits of transfer learning.