Integration of DEA-Based Inefficiency into Decision Tree Splitting Criteria
Abstract
This paper introduces a modified decision tree methodology that incorporates efficiency evaluation through Data Envelopment Analysis (DEA). Standard decision trees rely on class-distribution metrics such as the Gini index or entropy, overlooking the performance characteristics of individual observations. The proposed approach instead computes an inefficiency score for each Decision-Making Unit (DMU) from its input excesses and output shortfalls, and uses these scores to weight samples during tree construction. This integration allows the model to classify accurately while simultaneously highlighting poorly performing units. Results indicate that the DEA-informed decision tree captures inefficiency patterns and provides actionable insights for performance improvement while maintaining predictive accuracy.
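The abstract's pipeline can be sketched in two steps: score each DMU's inefficiency as the total input excess plus output shortfall (an additive, slacks-based DEA model under variable returns to scale), then turn those scores into sample weights for tree construction. The code below is a minimal illustration of that idea, not the paper's exact formulation; the `1 + inefficiency` weighting rule and all variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def additive_dea_inefficiency(X, Y):
    """Additive VRS DEA: for each DMU o, maximize total input excess plus
    output shortfall attainable over the convex hull of observed (X, Y)."""
    n, m = X.shape          # n DMUs, m inputs
    p = Y.shape[1]          # p outputs
    scores = np.empty(n)
    # Decision variables: [lambda (n), s_minus (m), s_plus (p)].
    # linprog minimizes, so negate the slack terms to maximize them.
    c = np.concatenate([np.zeros(n), -np.ones(m + p)])
    for o in range(n):
        A_eq = np.block([
            [X.T, np.eye(m), np.zeros((m, p))],       # X'lam + s- = x_o
            [Y.T, np.zeros((p, m)), -np.eye(p)],      # Y'lam - s+ = y_o
            [np.ones((1, n)), np.zeros((1, m + p))],  # sum(lam) = 1 (VRS)
        ])
        b_eq = np.concatenate([X[o], Y[o], [1.0]])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        scores[o] = -res.fun  # total slack = inefficiency score
    return scores

# Toy data: one input, one output per DMU. DMU 1 (input 4, output 4) is
# dominated by DMU 2 (input 3, output 6); DMUs 0 and 2 lie on the frontier.
X = np.array([[2.0], [4.0], [3.0]])  # inputs (excesses penalized)
Y = np.array([[4.0], [4.0], [6.0]])  # outputs (shortfalls penalized)
ineff = additive_dea_inefficiency(X, Y)
weights = 1.0 + ineff  # inefficient DMUs get heavier weight during splitting
print(ineff)           # approximately [0, 3, 0]
```

The resulting `weights` vector can be passed as `sample_weight` to any weighted tree learner (e.g. scikit-learn's `DecisionTreeClassifier.fit`), so that splits are pulled toward separating inefficient units, in the spirit of the method described above.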
Keywords:
Machine learning, Decision tree, Data envelopment analysis