Context: Data smells indicate potential issues in the data that warrant further investigation. A data smell is thus a hint that data values may be of low inherent quality, caused by violations of recommended best practices, poor-quality data sources, or poor data handling in preceding processes. Based on a literature study, several data smells have been identified and grouped into categories. However, no tool support is currently available that detects these data smells automatically.
Goal: Development of a software application that detects these data smells.
Procedure:
- Review available data validation libraries and examine whether they can be adapted to implement data smell detection
- Select a programming language and determine the implementation structure
- Implement the ‘data smell detection’ software application
- Ensure the adaptability and maintainability of the application (e.g. configurable thresholds)
- Evaluate the ‘data smell detection’ application on a test data set
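
To make the procedure more concrete, the following is a minimal sketch of how a detector with adjustable thresholds might be structured. It is only an illustration: the two smells (`dummy_value`, `extra_whitespace`), the function names, the placeholder strings and the default threshold values are all assumptions made for this example and are not taken from the literature study or from any existing library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Smell:
    """One finding: which smell was detected in which column, and how often."""
    column: str
    name: str
    ratio: float  # share of values in the column that show the smell


# Hypothetical per-value checks; a real tool would implement the smell catalogue.
def is_dummy_value(value: object) -> bool:
    # Assumed placeholder strings that stand in for missing data.
    return isinstance(value, str) and value.strip().lower() in {"", "n/a", "none", "unknown", "-"}


def has_extra_whitespace(value: object) -> bool:
    return isinstance(value, str) and value != value.strip()


CHECKS: Dict[str, Callable[[object], bool]] = {
    "dummy_value": is_dummy_value,
    "extra_whitespace": has_extra_whitespace,
}

# Assumed defaults: a smell is reported when its ratio exceeds the threshold.
DEFAULT_THRESHOLDS: Dict[str, float] = {"dummy_value": 0.1, "extra_whitespace": 0.0}


def detect_smells(
    table: Dict[str, Sequence[object]],
    thresholds: Optional[Dict[str, float]] = None,
) -> List[Smell]:
    """Run every check on every column and report ratios above the threshold."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    findings: List[Smell] = []
    for column, values in table.items():
        if not values:
            continue  # nothing to inspect in an empty column
        for name, check in CHECKS.items():
            ratio = sum(1 for v in values if check(v)) / len(values)
            if ratio > limits[name]:
                findings.append(Smell(column, name, ratio))
    return findings


if __name__ == "__main__":
    sample = {"city": ["Berlin", " Paris", "n/a", "Rome"], "age": [31, 45, 27, 52]}
    for finding in detect_smells(sample):
        print(finding)
```

Keeping the checks in a dictionary and the thresholds in a separate mapping is one possible way to address the adaptability requirement: a new smell can be added as one more function, and thresholds can be overridden per call without touching the detection logic.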