Data-based decision trees as a first introduction to machine learning
Conference: 65th ISI World Statistics Congress 2025
Format: IPS Abstract - WSC 2025
Keywords: data science education, decision-tree, machine learning
Thursday 9 October 2 p.m. - 3:40 p.m. (Europe/Amsterdam)
Abstract
Yannik Fleischer, Rolf Biehler
Data science, artificial intelligence (AI), and machine learning (ML) affect everyone’s lives. A key driver is datafication, which turns aspects of many areas of life into data. For instance, user behavior on online platforms is recorded as digital data that feeds ML-based, AI-driven recommender systems. These systems give online platforms substantial influence, especially on adolescents. This influence coincides with various misconceptions about AI among students, as identified by Kim et al. (2023). Some students think that AI can solve “problems ‘magically’ through its intelligence” (Kim et al., 2023, p. 9835) or “that AI is flawless and complete without human interventions and input from the data” (Kim et al., 2023, p. 9838). Calls for incorporating data science topics into school education (e.g., Engel, 2017) are increasingly being heard, and AI and ML are being taken up in a growing number of innovative projects and in some curricula. The International Data Science in Schools Project (IDSSP) Curriculum (IDSSP Curriculum team, 2019) explicitly describes ML, and data-based decision trees (DTs) in particular, as a topic for all students. Data-based DTs serve to exemplify ML and are considered teachable even to non-experts and young students (Erickson & Engel, 2023). Because this is still a new topic, a didactic transposition (Chevallard & Bosch, 2014) is necessary to transform scholarly knowledge about it into something teachable in school.
We designed teaching units that use data-based DTs to exemplify machine learning in secondary education. The design comprises:
• an elementarized way of creating DTs from data that preserves the main aspects of professional DT algorithms (e.g., Breiman et al., 1984; Quinlan, 1993) while reducing potential learning obstacles for students,
• appropriate (digital) tools for teaching in secondary level,
• and pedagogical approaches for designing concrete teaching sequences.
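To make the idea of building a DT from data more concrete, the following Python sketch greedily grows a small classification tree. It is a hypothetical illustration, not the authors' elementarized method or their classroom tools. The split criterion (counting misclassified examples when each side predicts its majority class), the function names `best_split`, `build_tree`, and `predict`, and the toy data are all assumptions made for this example. Professional algorithms differ in detail: CART (Breiman et al., 1984) typically uses the Gini index and C4.5 (Quinlan, 1993) the gain ratio, and both add pruning.

```python
from collections import Counter

def majority(labels):
    """Most frequent class label in a node (used as the leaf prediction)."""
    return Counter(labels).most_common(1)[0][0]

def misclassification(labels):
    """Number of examples that the node's majority class would get wrong."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_split(rows, labels, features):
    """Try every feature and threshold; keep the split with the fewest errors.

    Assumption: numeric features, thresholds at midpoints between
    consecutive distinct values.
    """
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            errors = misclassification(left) + misclassification(right)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best

def build_tree(rows, labels, features, max_depth=2):
    """Grow the tree recursively; a leaf is simply a class label."""
    split = best_split(rows, labels, features) if max_depth > 0 else None
    # Stop if no split is possible or the best split does not reduce errors.
    if split is None or split[0] >= misclassification(labels):
        return majority(labels)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], features, max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], features, max_depth - 1),
    }

def predict(tree, row):
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical toy data (invented for illustration, not from the study).
rows = [{"hours": 1, "age": 12}, {"hours": 2, "age": 13},
        {"hours": 5, "age": 14}, {"hours": 6, "age": 15}]
labels = ["low", "low", "high", "high"]

tree = build_tree(rows, labels, ["hours", "age"])
print(tree)
print(predict(tree, {"hours": 4, "age": 13}))
```

On this toy data the sketch finds the single rule "hours ≤ 3.5 → low, otherwise high", which shows how a DT turns data into readable decision rules. Counting misclassifications is chosen here only because it is easy to follow by hand; it is a simplification, not a claim about which criterion the teaching units use.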
We present parts of our design and selected results of our accompanying classroom research.
References
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis. https://doi.org/10.1201/9781315139470
Chevallard, Y., & Bosch, M. (2014). Didactic Transposition in Mathematics Education. In S. Lerman (Ed.), Encyclopedia of Mathematics Education (pp. 170–174). Springer Netherlands.
Engel, J. (2017). Statistical Literacy for Active Citizenship: A Call for Data Science Education. Statistics Education Research Journal, 16(1), 44–49. https://doi.org/10.52041/serj.v16i1.213
Erickson, T., & Engel, J. (2023). What goes before the CART? Introducing classification trees with Arbor and CODAP. Teaching Statistics, 45(S1). https://doi.org/10.1111/test.12347
IDSSP Curriculum team. (2019). Curriculum Frameworks for Introductory Data Science. http://idssp.org/files/IDSSP_Frameworks_1.0.pdf
Kim, K., Kwon, K., Ottenbreit-Leftwich, A., Bae, H., & Glazewski, K. (2023). Exploring middle school students’ common naive conceptions of Artificial Intelligence concepts, and the evolution of these ideas. Education and Information Technologies, 28(8), 9827–9854. https://doi.org/10.1007/s10639-023-11600-3
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.