64th ISI World Statistics Congress

IPS 200 - Challenges of Natural Language Processing techniques in official statistics

Category: IPS
Tuesday 18 July, 10 a.m. - noon (Canada/Eastern), Room 104

This session invites practitioners and researchers to present research on the challenges of applying natural language processing (NLP) techniques innovatively in the production of official statistics. Official statistics are traditionally produced from structured data, often collected through surveys. In the past decades, these have been complemented with register and administrative data sources. More recently, by applying innovative data science techniques (e.g. NLP) to unstructured data (e.g. text), statistical offices are creating new, mostly experimental, statistics. More data is available, and previously hard-to-analyze unstructured texts (web pages, social media, …) can now be used in a timely and frequent way with NLP (amongst other techniques). Furthermore, the field of NLP is evolving rapidly, resulting in ever-increasing opportunities to extract information from unstructured texts. However, given the novelty of many NLP methods and underlying data sources, statistical institutes must be careful to ensure the quality of the resulting statistics and hence need to invest in more systematic knowledge about that quality.

Topics of interest in this session include, but are not limited to: quality frameworks and quality metrics for NLP applications in official statistics; applications of NLP to new data sources (social media data, web-scraped data, ...); applications of supervised classification of texts (sentiment analysis, automatic categorization of companies, product classification, ...); applications of unsupervised knowledge extraction from texts and visualisation of textual data.

The following speakers will shed light on these topics by presenting their work on applying NLP in official statistics: 

 

Statistics Canada (Shirin Roshanafshar), on “Classifying Respondent Comments from Canadian Census of Population”

Central Bureau of Statistics, Netherlands (Piet Daas), on “Categorizing company websites” 

INEGI, Mexico (Jael Perez), on “Methodological proposal to codify records of occupation and economic activity of the National Survey of Household Income and Expenses (ENIGH), using Deep Learning”

Statistics Flanders, Belgium (Michael Reusens), on “Challenges on using Twitter sentiment in official statistics” 

Statistics Poland (Dominik Dabrowski), on “Extracting meaningful information from web data on real estate – challenges and experiences from the Web Intelligence Network”  

 

Organiser: Dr Michael Reusens 

Chair: Dr Michael Reusens 

Speaker: Dr Cedric De Boom 

Speaker: Prof. Dr. Piet Daas

Speaker: Miss Shirin Roshanafshar 

Speaker: Klaudia Peszat

Speaker:  Jael Perez 
