In one overview: The WSC Scientific & Special Programme.
Data science is increasingly important in the branches of statistics that use large data sets and new data sources, e.g., administrative registers, satellites and aircraft, webcams, data voluntarily provided by internet users, and data harvested from the web. Analysing and processing these kinds of data requires data science methods and tools alongside “traditional” statistical methods. Applications of data science tools range from earth observation to official statistics, and the discussion of the advantages, disadvantages, limitations, and requirements of integrating alternative data sources with probability sample surveys is informing the debate in national and international statistical systems all over the world.
This Invited Paper Session (IPS) focuses on the most relevant methodological and applied issues of data science: the interpretability of machine learning tools, potential bias, the integration of new data sources with sample surveys to improve official statistics, and the analysis of huge amounts of meteorological and remote sensing data.
This IPS is proposed by the vice-chair and chair-elect of the ISI Special Interest Group on Data Science. It discusses methodological and applied issues and is balanced in terms of geography and gender.
Tree-based statistical learning techniques and explicative tools
Speaker: Rosanna Verde, Professor of Statistics - Università della Campania "Luigi Vanvitelli", Italy
Abstract. Machine learning tools are very popular in supervised classification when the number of observations and the number of variables are too large to predict the a priori classes. However, the classification process is highly automated, which poses a challenge for these widely consolidated techniques. An interesting contribution would be to provide interpretative and descriptive tools which, in addition to the accuracy of the prediction, allow us to understand the discriminating power of the descriptors selected as the most competitive in the construction of the trees. For this reason, a criterion for recognising the predictors that contribute most to the separation of the a priori groups should be combined with an embedding procedure that seeks multiple solutions and a final compromise. Aids to the interpretation of tree-based functional classifiers are still an open frontier. Some contributions are advanced on the choice of the best transformation of functional data to capture the differences between the classes to be predicted in terms of slope or rates of change. Applications to real data, in the medical and environmental fields, allow us to validate the proposed interpretative tools for tree-based classification methods.
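One common way to quantify the "discriminating power" of a predictor in a tree is the decrease in Gini impurity achieved by its best single split. The sketch below illustrates that general idea only; it is not the speaker's criterion, which the abstract does not specify. The function names and the toy data (`slope`, `noise`) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest impurity decrease from one threshold split on a numeric predictor."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try every observed value except the largest as a threshold.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_predictors(data, labels):
    """Rank predictors by their best single-split impurity decrease."""
    gains = {name: best_split_gain(column, labels) for name, column in data.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "slope" separates the classes, "noise" does not.
toy = {
    "slope": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],
    "noise": [5, 1, 4, 2, 6, 3],
}
classes = ["a", "a", "a", "b", "b", "b"]
ranking = rank_predictors(toy, classes)
print(ranking)
```

In this toy example the predictor that cleanly separates the two classes is ranked first. A full tree or ensemble would accumulate such gains over many splits.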
Mining Text for Bias in Written Comments of Student Evaluations of Teaching
Speaker: Daniel Jeske, University of California, Riverside (USA)
Philip Kass, University of California, Davis
Herbie Lee, University of California, Santa Cruz
Dylan Friel, University of California, Riverside
Yunzhe Li, University of California, Santa Cruz
Abstract. We discuss alternative predictive models that efficiently scan written course comments and determine the proportions of comments that reflect student satisfaction levels that are positive, mixed, or negative. We use the predictive model to investigate the degree of potential bias in written comments with respect to the gender, ethnicity, and rank of the instructor, and compare the findings to parallel bias studies of the corresponding numerical scores.
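The kind of pipeline described above, which labels each comment as positive, mixed, or negative and then compares proportions across instructor groups, can be sketched in a few lines. This is a deliberately simple keyword-based stand-in, not the authors' predictive model, which the abstract does not describe. The word lists, group labels, and sample comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE = {"clear", "helpful", "engaging", "great", "organized"}
NEGATIVE = {"confusing", "boring", "unfair", "disorganized", "rude"}
LABELS = ("positive", "mixed", "negative", "unclassified")


def classify(comment):
    """Label a comment by which hypothetical word lists it hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos and neg:
        return "mixed"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "unclassified"


def proportions(comments):
    """Share of comments falling in each label."""
    total = len(comments)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(comment) for comment in comments)
    return {label: counts[label] / total for label in LABELS}


def proportions_by_group(records):
    """records: iterable of (group, comment) pairs, e.g. grouped by instructor rank."""
    grouped = {}
    for group, comment in records:
        grouped.setdefault(group, []).append(comment)
    return {group: proportions(items) for group, items in grouped.items()}


# Hypothetical sample comments.
sample = [
    "Clear and helpful lectures.",
    "Engaging but the exams were unfair.",
    "Boring and confusing.",
    "Great course!",
]
overall = proportions(sample)
print(overall)
```

Comparing the output of `proportions_by_group` across groups gives only a descriptive first look. Any claim of bias would need a proper statistical comparison.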
Evolving Official Statistics: The Increasingly Varied Role of Data Science
Speaker: Linda J. Young, Chief Mathematical Statistician and Director, Research and Development Division, USDA NASS
Abstract. Sample surveys have been the foundation of official statistics produced by the US Department of Agriculture’s National Agricultural Statistics Service (NASS) and other National Statistical Institutes for more than half a century. Increasingly, information from diverse sources, such as administrative, weather, and remotely sensed data, is available and can be used to improve fully survey-based estimates. In addition, new products that inform official statistics can be developed, such as new metrics or maps of the scope and intensity of natural disasters. In this presentation, data science approaches that are being used in the production of official statistics are highlighted. Estimates of the propensity of response from a sampled unit have been incorporated in the sampling and data collection phases of surveys. Predictions of what crops will be grown where can inform editing processes. Survey and non-survey data have been combined through modeling to produce improved official statistics. The progress that has been made and important research questions that remain are discussed.
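As a rough illustration of how a survey estimate and a non-survey estimate might be combined, the sketch below uses textbook inverse-variance weighting. It assumes two independent, unbiased estimates of the same quantity. The abstract does not describe the models actually used at NASS, so this is not their method, and the function name and all numbers are hypothetical.

```python
def combine_estimates(survey_est, survey_var, other_est, other_var):
    """Inverse-variance weighted combination of two independent, unbiased estimates.

    Returns the combined estimate and its variance.
    """
    if survey_var <= 0 or other_var <= 0:
        raise ValueError("variances must be positive")
    # The weight on the survey estimate grows as the other source gets noisier.
    weight = other_var / (survey_var + other_var)
    combined = weight * survey_est + (1.0 - weight) * other_est
    combined_var = survey_var * other_var / (survey_var + other_var)
    return combined, combined_var


# Hypothetical numbers: survey estimate 100 (variance 16),
# model-based estimate from non-survey data 110 (variance 4).
estimate, variance = combine_estimates(100.0, 16.0, 110.0, 4.0)
print(estimate, variance)
```

Under these assumptions the combined variance is smaller than either input variance, which is the sense in which the combination can improve on a fully survey-based estimate.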
Spatio-temporal modelling of the Brazilian wildfires: The influence of human and meteorological variables
Speaker: Paulo Canas Rodrigues, Department of Statistics, Federal University of Bahia, Salvador, BA, Brazil
Abstract: Wildfires are one of the most common natural disasters in many world regions and actively impact quality of life. These events have become more frequent with the increasing effect of climate change, local policies, and human behaviour. This study considers historical data with the geographical locations of all the “fire spots” detected by the reference satellites covering the whole Brazilian territory between January 2011 and December 2020, comprising more than 1.8 million fire spots. These data were modelled with a spatial econometric model using meteorological variables (precipitation, air temperature, humidity, and wind speed) and a human variable (land-use transition and occupation) as covariates. We find that the change in land use from forest and green areas to farming has a significant positive impact on the number of fire spots for all six Brazilian biomes. (Joint work with Jonatha Pimentel and Rodrigo Bulhões)
Statistical Modelling alternatives to Machine Learning in complex survey data analysis
Speaker: Ross Darnell, Data61 CSIRO
Murray Aitkin, School of Mathematics and Statistics, University of Melbourne
Discussant: Elisabetta Carfagna, University of Bologna, Department of Statistical Sciences, Italy
Organiser: Prof. Elisabetta Carfagna
Chair: Prof. Elisabetta Carfagna
Speaker: Dr Paulo Canas Rodrigues
Speaker: Prof. Rosanna Verde
Speaker: Dr Daniel Jeske
Speaker: Ross Darnell
Speaker: Linda Young
Discussant: Elisabetta Carfagna