Even a model that is successful and provides accurate predictions can eventually become incorrect, because its own success changes the conditions it relies on. For instance, a crime prediction model that effectively forecasts crime locations may eventually prompt criminals to modify their behavior, thereby rendering the model's predictions inaccurate.
This phenomenon is sometimes referred to as "model impact" or "model-induced change." It arises when the use or awareness of a model triggers changes in the behavior or characteristics of the system being modeled, thereby invalidating the conditions on which the model's predictions rely.
In the case of a crime prediction model, significant success and widespread recognition make it likely that criminals will adapt their behavior in response to the predictions. They might avoid the areas the model flags or alter their methods of operation, diminishing the model's accuracy or rendering it completely ineffective over time.
This scenario highlights the need for ongoing evaluation and adaptation of predictive models. As conditions change, models should be regularly updated and refined to account for evolving dynamics. It is crucial to monitor the model's performance and reassess its assumptions and variables to ensure it remains effective in the face of changing behaviors or circumstances.
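Concretely, one simple way to notice such degradation early is to track accuracy over a rolling window of recent predictions, once true outcomes become available for comparison. The sketch below is a minimal illustration in Python; the class name, window size, and alert threshold are assumptions made for this example, not a standard API.

```python
from collections import deque

class AccuracyMonitor:
    """Track rolling accuracy over the most recent prediction outcomes."""

    def __init__(self, window_size=500, alert_threshold=0.80):
        self.outcomes = deque(maxlen=window_size)  # 1 if prediction was correct
        self.alert_threshold = alert_threshold

    def update(self, prediction, truth):
        """Record one prediction/outcome pair; return True if accuracy slipped."""
        self.outcomes.append(int(prediction == truth))
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough history to judge yet
        return sum(self.outcomes) / len(self.outcomes) < self.alert_threshold
```

An alert from such a monitor does not by itself say what went wrong; it is merely the trigger for the diagnosis discussed in the rest of this section.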
Additionally, it is important to consider the ethical implications of using predictive models in sensitive areas such as crime prediction. Transparency, fairness, and the proper use of such models should be prioritized to prevent unintended consequences and potential harm.
In summary, the success of a model can indeed lead to changes in conditions, which may eventually render the model less accurate or obsolete. Continuous monitoring, adaptation, and ethical considerations are essential in mitigating these challenges and ensuring the ongoing usefulness of predictive models.
There are, in fact, two distinct situations that can lead to model degradation: "model decay" and "concept drift." Model decay, as the term is used here, refers to the deterioration of a model's performance over time because the distribution of the input data changes, even though the underlying relationship between inputs and target still holds. Concept drift, on the other hand, refers to the phenomenon where the underlying concepts, that is, the relationships between the input variables and the target, change over time, degrading the model's predictive accuracy even when the inputs themselves look the same.
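To make the distinction concrete, here is a small, self-contained sketch on synthetic data; the sine-based labeling rule, the shift sizes, and the choice of logistic regression are all invented for illustration. A change in the input distribution P(x) pushes inputs into a region where the learned boundary no longer fits, while a change in the input-to-label relationship P(y|x) flips the meaning of the very same inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, x_mean=0.0, rule=lambda x: np.sin(x) > 0):
    """Inputs are Gaussian; labels follow a fixed (here sine-based) rule."""
    x = rng.normal(loc=x_mean, scale=1.0, size=(n, 1))
    y = rule(x[:, 0]).astype(int)
    return x, y

x_train, y_train = make_data(5000)
model = LogisticRegression().fit(x_train, y_train)
print("training regime:  ", model.score(*make_data(5000)))  # ~0.99

# Model decay via a change in P(x): the labeling rule is untouched, but
# inputs move into a region where the learned boundary no longer fits.
print("after P(x) shift: ", model.score(*make_data(5000, x_mean=4.0)))

# Concept drift: P(x) is unchanged, but the input-to-label rule flips.
print("after P(y|x) flip:", model.score(*make_data(5000, rule=lambda x: np.sin(x) < 0)))
```

In both cases the observed symptom is the same, namely falling accuracy, which is why monitoring alone cannot distinguish them.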
In the context of our example, the criminals' adaptation to the crime prediction model would be considered a form of concept drift. The model's assumptions about criminal behavior and the factors contributing to crime would no longer hold true as the criminals alter their behavior based on the model's predictions, leading to a decline in its accuracy.
Another example of concept drift is an email spam detector. Suppose the distribution of incoming emails stays the same, but what counts as spam changes because spammers adopt more sophisticated techniques to evade detection; that, too, is concept drift, since the relationship between an email's features and the spam label has changed. If, instead, the model relied on poor grammar and spelling to flag spam, and the change comes from new email programs that include better grammar and spell checkers, then it is the data distribution itself that has changed, and we are dealing with model decay.
An example of model decay for a crime prediction model is deterioration caused not by criminals changing their behavior but, for example, by gentrification or other socio-economic changes in a neighborhood that drive crime to different areas.
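In practice, telling the two situations apart starts with checking whether the input distribution itself has moved. A standard two-sample test such as Kolmogorov-Smirnov, applied feature by feature, can flag such a shift; the sketch below uses SciPy's ks_2samp on made-up feature values, and the sample sizes and significance level are arbitrary choices for this example.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

reference = rng.normal(0.0, 1.0, size=5000)  # feature values seen at training time
incoming = rng.normal(0.6, 1.0, size=1000)   # recent production values (shifted here)

stat, p_value = ks_2samp(reference, incoming)
if p_value < 0.01:
    print(f"input distribution shift detected (KS={stat:.3f}, p={p_value:.2g})")
```

A detected input shift points toward model decay, whereas degraded accuracy on an apparently unchanged input distribution points toward concept drift, although confirming the latter ultimately requires comparing predictions against fresh labels.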
Both model decay and concept drift highlight the need for continuous monitoring and adaptation of models to ensure their ongoing effectiveness in dynamic environments. Knowing which one we are dealing with tells us whether we need to re-label our data (concept drift) or simply retrain the model (general model decay). Under general model decay, the data distribution has changed over time without necessarily altering the underlying relationship between input features and the target variable, so the model needs retraining but the existing labels can usually be kept. Under concept drift, the relationship between the data and the labels has changed, for example because criminals or spammers altered their behavior, and therefore the labeling of the data may need to be revised.
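As a rough illustration of that decision, the routine below maps the two monitoring signals discussed above to a maintenance action. The signal names and the routing logic are simplifying assumptions for this sketch; in real systems the two phenomena can co-occur and the diagnosis is rarely this clean.

```python
def maintenance_action(input_shift_detected: bool, accuracy_dropped: bool) -> str:
    """Map drift-monitoring signals to a (simplified) maintenance decision."""
    if accuracy_dropped and not input_shift_detected:
        # Inputs look the same but predictions fail: the input-to-label
        # relationship likely changed, so fresh labels are needed.
        return "concept drift suspected: collect new labels, then retrain"
    if input_shift_detected:
        # Inputs moved; the existing labels may still be valid.
        return "model decay suspected: retrain on recent data"
    return "no action"
```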
To summarize, model decay may require retraining the model and adapting it to changing patterns without necessarily re-labeling the data, whereas concept drift changes the relationship between data and labels and may therefore require re-labeling or obtaining newly labeled data. The specific steps to address either problem depend on the nature of the problem and the available resources.