Predicting Human Values in Software Requirements with Machine Learning


The master’s thesis should explore the potential of machine learning (ML) methods to automatically identify and predict human values in software requirements.

In value-sensitive software engineering, capturing human values (e.g., privacy, sustainability, fairness, security) within requirements is essential but often neglected. Manual identification of values is labor-intensive, error-prone, and highly subjective, depending on individual stakeholders’ perspectives. Machine learning approaches, particularly natural language processing (NLP), offer the potential to detect patterns in textual requirements and classify them into relevant value categories, thereby supporting consistency and reducing bias in the requirements engineering process.

To address this, the thesis should analyze existing value classification frameworks, NLP methods, and ML techniques for text categorization. Different feature representations (e.g., static word embeddings, contextual embeddings from transformer models) and models (e.g., classical supervised classifiers, deep learning approaches) should be evaluated for their effectiveness in predicting values from unstructured requirements text.
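As a point of reference for such a comparison, a very simple baseline could look like the following sketch. It assumes single-label classification and uses a bag-of-words multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The class name, the toy requirements, and the value labels are hypothetical illustrations, not a prescribed method or dataset; embedding-based or transformer-based representations would replace the tokenization and counting steps.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesValueClassifier:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # requirements per value label
        self.word_counts = defaultdict(Counter)   # token counts per value label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Tokens never seen during training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; real experiments need a labeled requirements dataset.
TRAIN = [
    ("The system shall encrypt personal data before storing it.", "privacy"),
    ("Users shall be able to delete their personal data at any time.", "privacy"),
    ("The service shall minimize energy consumption during idle periods.", "sustainability"),
    ("The application shall report the energy usage of each device.", "sustainability"),
]

classifier = NaiveBayesValueClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
prediction = classifier.predict(
    "The app shall not share personal data with third parties."
)
print(prediction)
```

A baseline of this kind mainly serves as a lower bound: if a fine-tuned transformer does not clearly outperform it on the same data split, the added model complexity is hard to justify.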

The research focus should include:

  • Human Values & Requirements: Explore value frameworks and taxonomies.
  • ML for NLP: Apply and compare methods for value detection in requirements.
  • LLMs (optional): Use embeddings or fine-tuned models for classification.
  • Evaluation: Assess precision, recall, F1-score, and interpretability.
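The evaluation metrics named above can be computed per value label and then averaged. The sketch below assumes single-label predictions and uses macro-averaging, which weights every value category equally regardless of how often it occurs; the function names and the example labels are hypothetical.

```python
def precision_recall_f1(gold, predicted, label):
    """Return (precision, recall, F1) for one value label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(gold) | set(predicted))
    return sum(precision_recall_f1(gold, predicted, l)[2] for l in labels) / len(labels)


# Hypothetical labels for six requirements.
gold = ["privacy", "privacy", "fairness", "security", "privacy", "fairness"]
predicted = ["privacy", "fairness", "fairness", "security", "privacy", "privacy"]

for label in sorted(set(gold)):
    p, r, f = precision_recall_f1(gold, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro F1 = {macro_f1(gold, predicted):.2f}")
```

Value categories in requirements datasets are often imbalanced, so reporting per-label scores alongside the macro average helps reveal categories the classifier rarely predicts correctly.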

The resulting classification approach should be extensible, so that new value taxonomies or additional datasets can be integrated. The evaluation should compare ML-based value prediction with manual classification in terms of efficiency, consistency, and scalability.
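One possible way to quantify consistency between manual and ML-based classification is a chance-corrected agreement measure. The sketch below uses Cohen's kappa, which is an assumption made here for illustration and not a measure prescribed by the topic description; the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both label sequences must be non-empty and equally long.")
    n = len(labels_a)
    # Fraction of requirements on which both sources assign the same value.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both sources labeled independently at their own rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: manual annotation versus model output.
manual = ["privacy", "privacy", "fairness", "security", "privacy", "fairness"]
model = ["privacy", "fairness", "fairness", "security", "privacy", "privacy"]

kappa = cohens_kappa(manual, model)
print(f"kappa = {kappa:.3f}")
```

The same function can be applied to two human annotators, which gives a baseline for how much agreement can reasonably be expected from the model given the subjectivity of value identification.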

Thesis Goal

The goal of this Master’s thesis is to investigate machine learning approaches for predicting values in software requirements, develop a prototype ML-based classifier, and evaluate its effectiveness using real-world or synthetic requirements datasets.