65th ISI World Statistics Congress 2025 | The Hague

Data-based decision trees as a first introduction to machine learning

Format: IPS Abstract - WSC 2025

Keywords: data science education, decision-tree, machine learning

Abstract

Yannik Fleischer, Rolf Biehler

Data science, artificial intelligence (AI), and machine learning (ML) affect everyone’s lives. A key driver is datafication, which turns aspects of many areas of life into data. For instance, user behavior on online platforms is recorded as digital data that fuels AI-driven recommender systems based on ML. These systems give online platforms substantial influence, especially on adolescents. This influence coincides with the misconceptions about AI documented by Kim et al. (2023): some students believe that AI can solve “problems ‘magically’ through its intelligence” (Kim et al., 2023, p. 9835) or “that AI is flawless and complete without human interventions and input from the data” (Kim et al., 2023, p. 9838). Calls for incorporating data science topics into school education (e.g., Engel, 2017) are increasingly being heard, and AI and ML are being taken up in innovative projects and in some curricula. The International Data Science in Schools Project (IDSSP) curriculum (IDSSP Curriculum team, 2019) explicitly describes ML, and data-based decision trees (DTs) in particular, as a topic for all students. Data-based DTs are used to exemplify ML and are seen as teachable even to non-experts and young students (Erickson & Engel, 2023). Since the topic is still new to schools, a didactic transposition (Chevallard & Bosch, 2014) is necessary to transform the scholarly knowledge into something teachable in school.

We designed teaching units on data-based DTs for secondary education to exemplify machine learning. The design contains:
• an elementarized form of creating DTs from data that preserves the main aspects of professional DT algorithms (e.g., Breiman et al., 1984; Quinlan, 1993) but reduces possible learning obstacles for students,
• appropriate (digital) tools for teaching at the secondary level,
• pedagogical approaches for designing concrete teaching sequences.
We present parts of our design and selected results of our accompanying classroom research.
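
To make the idea of a data-based DT concrete, the following minimal Python sketch chooses a single decision rule from data. It is an illustration only and not the elementarized procedure used in the teaching units, which this abstract does not specify. As a simplifying assumption, it scores candidate splits by the number of misclassified cases, whereas CART and C4.5 rely on impurity measures such as the Gini index or information gain. The function names, the variable names, and the toy data are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Return the most common label and the number of cases that differ from it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, len(labels) - count


def best_split(rows, feature, target):
    """Find the threshold on one numeric feature with the fewest misclassifications.

    Every midpoint between two neighbouring feature values is tried. Each
    branch predicts its majority label. Returns a tuple
    (threshold, left_label, right_label, errors), or None if the feature
    has fewer than two distinct values.
    """
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r[target] for r in rows if r[feature] <= threshold]
        right = [r[target] for r in rows if r[feature] > threshold]
        left_label, left_errors = majority(left)
        right_label, right_errors = majority(right)
        errors = left_errors + right_errors
        # Keep the first threshold that achieves the lowest error count.
        if best is None or errors < best[3]:
            best = (threshold, left_label, right_label, errors)
    return best


# Hypothetical toy data, invented for illustration only.
ROWS = [
    {"hours_online": 1, "uses_platform": "no"},
    {"hours_online": 2, "uses_platform": "no"},
    {"hours_online": 3, "uses_platform": "no"},
    {"hours_online": 4, "uses_platform": "yes"},
    {"hours_online": 5, "uses_platform": "no"},
    {"hours_online": 6, "uses_platform": "yes"},
    {"hours_online": 7, "uses_platform": "yes"},
]

threshold, left_label, right_label, errors = best_split(
    ROWS, "hours_online", "uses_platform"
)
print(f"hours_online <= {threshold}: predict {left_label}")
print(f"hours_online >  {threshold}: predict {right_label}")
print(f"misclassified cases: {errors} of {len(ROWS)}")
```

Applying the same search again to the cases in each branch would grow a deeper tree. The one remaining misclassified case also shows that a rule derived from data need not be flawless.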

References

Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Taylor & Francis. https://doi.org/10.1201/9781315139470

Chevallard, Y., & Bosch, M. (2014). Didactic Transposition in Mathematics Education. In S. Lerman (Ed.), Encyclopedia of Mathematics Education (pp. 170–174). Springer Netherlands.

Engel, J. (2017). Statistical Literacy for Active Citizenship: A Call for Data Science Education. Statistics Education Research Journal, 16(1), 44–49. https://doi.org/10.52041/serj.v16i1.213

Erickson, T., & Engel, J. (2023). What goes before the CART? Introducing classification trees with Arbor and CODAP. Teaching Statistics, 45(S1). https://doi.org/10.1111/test.12347

IDSSP Curriculum team. (2019). Curriculum Frameworks for Introductory Data Science. http://idssp.org/files/IDSSP_Frameworks_1.0.pdf

Kim, K., Kwon, K., Ottenbreit-Leftwich, A., Bae, H., & Glazewski, K. (2023). Exploring middle school students’ common naive conceptions of Artificial Intelligence concepts, and the evolution of these ideas. Education and Information Technologies, 28(8), 9827–9854. https://doi.org/10.1007/s10639-023-11600-3

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc.