Publications

See also the Google Scholar profiles of Prof. Chris Develder or Prof. Thomas Demeester. If you have trouble obtaining a copy of one of the papers below, please get in touch via chris.develder@ugent.be.

Published papers

J. De Baer, A.S. Doğruöz, T. Demeester and C. Develder, "Single- vs. dual-prompt dialogue generation with LLMs for job interviews in human resources", in Proc. 4th Generation, Evaluation & Metrics (GEM) workshop at ACL 2025, Vienna, Austria, 31 Jul. 2025, pp. 947-957.

Optimizing language models for use in conversational agents requires large quantities of example dialogues. Increasingly, these dialogues are synthetically generated by using powerful large language models (LLMs), especially in domains with challenges to obtain authentic human data. One such domain is human resources (HR). In this context, we compare two LLM-based dialogue generation methods for the use case of generating HR job interviews, and assess whether one method generates higher-quality dialogues that are more challenging to distinguish from genuine human discourse. The first method uses a single prompt to generate the complete interview dialogue. The second method uses two agents that converse with each other. To evaluate dialogue quality under each method, we ask a judge LLM to determine whether AI was used for interview generation, using pairwise interview comparisons. We demonstrate that despite a sixfold increase in token cost, interviews generated with the dual-prompt method achieve a win rate up to ten times higher than those generated with the single-prompt method. This difference remains consistent regardless of whether GPT-4o or Llama 3.3 70B is used for either interview generation or judging quality.
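
A minimal sketch of the two generation strategies compared here, assuming a generic chat-completion backend; llm is a stub (not the authors' code) and the prompts are illustrative only.

def llm(system_prompt, history):
    """Placeholder for a chat LLM call (e.g., GPT-4o or Llama 3.3 70B);
    returns the next utterance given a system prompt and the dialogue so far."""
    raise NotImplementedError("plug in your chat-completion backend here")

def single_prompt_interview(vacancy):
    # One call: the model writes the entire interview transcript at once.
    return llm(f"Write a complete job interview between an interviewer and "
               f"a candidate for this vacancy:\n{vacancy}", [])

def dual_prompt_interview(vacancy, n_turns=10):
    # Two agents with separate persona prompts converse turn by turn.
    # Roughly one LLM call per utterance, hence the higher token cost,
    # but more human-like transcripts according to the paper's judge setup.
    interviewer = f"You are an interviewer hiring for this vacancy:\n{vacancy}"
    candidate = f"You are a candidate applying for this vacancy:\n{vacancy}"
    history = []
    for _ in range(n_turns):
        history.append(("interviewer", llm(interviewer, history)))
        history.append(("candidate", llm(candidate, history)))
    return history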

@inproceedings{debaer2025,
author = {De Baer, Joachim and Doğruöz, A. Seza and Demeester, Thomas and Develder, Chris},
title = {Single- vs. dual-prompt dialogue generation with LLMs for job interviews in human resources},
booktitle = {Proc. 4th Generation, Evaluation \& Metrics (GEM) workshop at ACL 2025},
month = {31 Jul.},
year = {2025},
pages = {947--957},
address = {Vienna, Austria},
url = {https://aclanthology.org/2025.gem-1.74/}
}

J.-J. Decorte, J. Van Hautte, C. Develder and T. Demeester, "Efficient text encoders for labor market analysis", IEEE Access, Vol. 13, Jul. 2025, pp. 133596-133608.

Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the large label space. Finally, we present JobBERT V2, an improved job title normalization model that leverages extracted skills to produce high-quality job title representations. Experiments demonstrate that our models are efficient, accurate, and scalable, making them ideal for large-scale, real-time labor market analysis.
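
As a rough picture of how a bi-encoder with token-level attention can score a sentence against a large skill ontology; this is our own reading of the abstract, with random vectors standing in for real encoder outputs and invented shapes.

import numpy as np

def skill_score(token_embs, skill_emb):
    # The skill embedding attends over the sentence's token embeddings;
    # the score compares the skill with the attention-pooled result.
    attn = np.exp(token_embs @ skill_emb)
    attn /= attn.sum()                  # softmax over tokens
    return float((attn @ token_embs) @ skill_emb)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 64))      # one job-ad sentence, 12 tokens
skills = rng.normal(size=(1000, 64))    # embeddings for 1,000 ontology skills
tokens /= np.linalg.norm(tokens, axis=1, keepdims=True)
skills /= np.linalg.norm(skills, axis=1, keepdims=True)
scores = np.array([skill_score(tokens, s) for s in skills])
top5 = np.argsort(scores)[-5:][::-1]    # top-5 candidate skill labels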

@article{decorte2025access,
author = {Decorte, Jens-Joris and Van Hautte, Jeroen and Develder, Chris and Demeester, Thomas},
title = {Efficient text encoders for labor market analysis},
journal = {IEEE Access},
month = {Jul.},
year = {2025},
volume = {13},
pages = {133596--133608},
doi = {10.1109/ACCESS.2025.3589147}
}

F. Koulischer, J. Deleu, G. Raya, T. Demeester and L. Ambrogioni, "Dynamic negative guidance of diffusion models", in Proc. 13th Int. Conf. Learning Representations (ICLR 2025), Singapore, 24-28 Apr. 2025.

Negative Prompting (NP) is widely utilized in diffusion models, particularly in text-to-image applications, to prevent the generation of undesired features. In this paper, we show that conventional NP is limited by the assumption of a constant guidance scale, which may lead to highly suboptimal results, or even complete failure, due to the non-stationarity and state-dependence of the reverse process. Based on this analysis, we derive a principled technique called Dynamic Negative Guidance (DNG), which relies on a near-optimal time and state dependent modulation of the guidance without requiring additional training. Unlike NP, negative guidance requires estimating the posterior class probability during the denoising process, which is achieved with limited additional computational overhead by tracking the discrete Markov Chain during the generative process. We evaluate the performance of DNG on class removal on MNIST and CIFAR10, where we show that DNG leads to higher safety, preservation of class balance and image quality when compared with baseline methods. Furthermore, we show that it is possible to use DNG with Stable Diffusion to obtain more accurate and less invasive guidance than NP.
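
Schematically (our paraphrase, not the authors' code), negative guidance replaces NP's constant guidance scale with one modulated by the evolving posterior probability of the unwanted class; here the posterior is just an input, whereas the paper estimates it by tracking the discrete Markov chain during generation.

import numpy as np

def dng_noise(eps_uncond, eps_neg, p_neg, base_scale=2.0):
    # eps_uncond / eps_neg: noise predictions without and with the negative
    # condition. Plain negative prompting uses a constant scale; DNG shrinks
    # the push-away term as the sample leaves the unwanted class (schematic).
    scale = base_scale * p_neg
    return eps_uncond - scale * (eps_neg - eps_uncond)

eps_u, eps_n = np.zeros(4), np.ones(4)
for p_neg in (0.9, 0.5, 0.1):   # posterior fading out during denoising
    print(p_neg, dng_noise(eps_u, eps_n, p_neg))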

@inproceedings{koulischer2025iclr,
author = {Koulischer, Felix and Deleu, Johannes and Raya, Gabriel and Demeester, Thomas and Ambrogioni, Luca},
title = {Dynamic negative guidance of diffusion models},
booktitle = {Proc. 13th Int. Conf. Learning Representations (ICLR 2025)},
month = {24--28 Apr.},
year = {2025},
address = {Singapore},
url = {https://openreview.net/forum?id=6p74UyAdLa}
}

K. D'Oosterlinck, W. Xu, C. Develder, T. Demeester, A. Singh, C. Potts, D. Kiela and S. Mehri, "Anchored preference optimization and contrastive revisions: addressing underspecification in alignment", Trans. Assoc. Comput. Linguist., Vol. 13, Apr. 2025, pp. 442-460.

Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code and datasets are available.

@article{DOosterlinck2025,
author = {D'Oosterlinck, Karel and Xu, Winnie and Develder, Chris and Demeester, Thomas and Singh, Amanpreet and Potts, Christopher and Kiela, Douwe and Mehri, Shikib},
title = {Anchored preference optimization and contrastive revisions: addressing underspecification in alignment},
journal = {Trans. Assoc. Comput. Linguist.},
month = {Apr.},
year = {2025},
volume = {13},
pages = {442--460},
doi = {10.1162/tacl_a_00748}
}

M. De Raedt, F. Godin, C. Develder and T. Demeester, "Revisiting clustering for efficient unsupervised dialogue structure induction", Appl. Intell., Vol. 54, Apr. 2024, pp. 5278-5305.

In the development of a task-oriented dialogue system, defining the dialogue structure is a time-consuming task. Hence, several works have looked into automatically inferring it from data, e.g., actual conversations between a customer and a support agent. To recover such dialogue structure, recent methods based on discrete variational models learn to jointly encode and cluster utterances in dialogue states, but (i) represent utterances by only considering preceding dialogue context, and (ii) are slow to train since they are optimized with a compute-expensive decoding objective. We revisit and improve upon an existing efficient pipeline approach, commonly adopted as a baseline, that first encodes utterances and then clusters them with k-means to induce the dialogue structure. However, the existing approach represents utterances as bag-of-words or skip-thought vectors, which have been shown to perform poorly in semantic similarity tasks, and without considering dialogue context. We therefore first investigate the use of more powerful transformer-based encoders for encoding utterances. Next, we propose ELLoDAR, a method for learning representations that capture both preceding and subsequent dialogue context, inspired by word2vec training strategies. ELLoDAR is efficient since representations are learned directly in the encoding space by finetuning just a single linear layer on top of a frozen sentence encoder with a vector-to-vector regression training objective. Extensive experiments on representative datasets for dialogue structure induction (SimDial, Schema Guided Dialogues, DSTC2, and CamRest676) demonstrate that in terms of effectiveness to induce the correct dialogue structure, (i) clustering utterances represented by transformer-based encoders improves recent joint models by 13%–32% on standard cluster metrics, and (ii) clustering ELLoDAR’s representations yields additional improvements ranging from +20% to +26%, with speedups of 10x to 10,000x compared to the recent joint models.
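
A compact sketch of the pipeline as we read it from the abstract; random vectors replace a real frozen sentence encoder, and the exact regression target and training details are simplifications.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
utt = rng.normal(size=(500, 64))   # stand-in for frozen sentence-encoder outputs

# Vector-to-vector regression target: each utterance's dialogue context,
# here simply the previous and next utterance embeddings concatenated.
ctx = np.hstack([np.roll(utt, 1, axis=0), np.roll(utt, -1, axis=0)])

# A single linear layer on top of the frozen encoder is all that is trained.
lin = Ridge(alpha=1.0).fit(utt, ctx)

# Cluster the context-aware representations to induce dialogue states.
states = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(lin.predict(utt))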

@article{deraedt2024apin,
author = {De Raedt, Maarten and Godin, Fréderic and Develder, Chris and Demeester, Thomas},
title = {Revisiting clustering for efficient unsupervised dialogue structure induction},
journal = {Appl. Intell.},
month = {Apr.},
year = {2024},
volume = {54},
pages = {5278--5305},
doi = {10.1007/s10489-024-05455-5}
}

P. Rabaey, P. Decat, S. Heytens, D. Vogelaers, A. Mariman and T. Demeester, "Time-dependent complexity characterisation of activity patterns in patients with Chronic Fatigue Syndrome", BioPsychoSoc. Med., Vol. 18, No. 1, Apr. 2024, pp. 10.


Background — Chronic Fatigue Syndrome patients suffer from symptoms that cannot be explained by a single underlying biological cause. It is sometimes claimed that these symptoms are a manifestation of a disrupted autonomic nervous system. Prior works studying this claim from the complex adaptive systems perspective, have observed a lower average complexity of physical activity patterns in chronic fatigue syndrome patients compared to healthy controls. To further study the robustness of such methods, we investigate the within-patient changes in complexity of activity over time. Furthermore, we explore how these changes might be related to changes in patient functioning.
Methods — We propose an extension of the allometric aggregation method, which characterises the complexity of a physiological signal by quantifying the evolution of its fractal dimension. We use it to investigate the temporal variations in within-patient complexity. To this end, physical activity patterns of 7 patients diagnosed with chronic fatigue syndrome were recorded over a period of 3 weeks. These recordings are accompanied by physicians’ judgements in terms of the patients’ weekly functioning.
Results — We report significant within-patient variations in complexity over time. The obtained metrics are shown to depend on the range of timescales for which these are evaluated. We were unable to establish a consistent link between complexity and functioning on a week-by-week basis for the majority of the patients.
Conclusions — The considerable within-patient variations of the fractal dimension across scales and time force us to question the utility of previous studies that characterise long-term activity signals using a single static complexity metric. The complexity of a Chronic Fatigue Syndrome patient’s physical activity signal does not suffice to characterise their high-level functioning over time and has limited potential as an objective monitoring metric by itself.
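
For orientation, a generic textbook-style version of the underlying allometric aggregation computation, assuming equally spaced activity counts; the paper's contribution is tracking how this quantity evolves over time and across timescale ranges.

import numpy as np

def allometric_slope(signal, block_sizes=(1, 2, 4, 8, 16, 32)):
    # Aggregate the activity signal into non-overlapping blocks of growing
    # size, then regress log(variance) on log(mean) across the scales; the
    # fractal dimension used as the complexity measure is derived from the
    # slope b of this fit.
    means, variances = [], []
    for m in block_sizes:
        k = len(signal) // m
        blocks = np.asarray(signal[:k * m]).reshape(k, m).sum(axis=1)
        means.append(blocks.mean())
        variances.append(blocks.var())
    b, _ = np.polyfit(np.log(means), np.log(variances), 1)
    return b

rng = np.random.default_rng(0)
activity = rng.poisson(5.0, size=10_000)   # toy minute-by-minute counts
print(allometric_slope(activity))          # ~1 for uncorrelated counts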

@article{rabaey2024biomed,
author = {Paloma Rabaey and Peter Decat and Stefan Heytens and Dirk Vogelaers and Ann Mariman and Thomas Demeester},
title = {Time-dependent complexity characterisation of activity patterns in patients with Chronic Fatigue Syndrome},
journal = {BioPsychoSoc. Med.},
month = {Apr.},
year = {2024},
volume = {18},
number = {1},
pages = {10},
doi = {10.1186/s13030-024-00305-9}
}

S.K. Bitew, V. Schelstraete, K. Zaporojets, K. Van Nieuwenhove, R. Meganck and C. Develder, "Personality style recognition via machine learning: Identifying anaclitic and introjective personality styles from patients’ speech", Comput. Linguist. Netherlands J., Vol. 13, Mar. 2024, pp. 7-29.

In disentangling the heterogeneity observed in psychopathology, personality of the patients is considered crucial. While it has been demonstrated that personality traits are reflected in the language used by a patient, we hypothesize that this enables automatic inference of the personality type directly from speech utterances, potentially more accurately than through a traditional questionnaire-based approach explicitly designed for personality classification. To validate this hypothesis, we adopt natural language processing (NLP) and standard machine learning tools for classification. We test this on a dataset of recorded clinical diagnostic interviews (CDI) on a sample of 79 patients diagnosed with major depressive disorder (MDD) – a condition for which differentiated treatment based on personality styles has been advocated – and classified into anaclitic and introjective personality styles. We start by analyzing the interviews to see which linguistic features are associated with each style, in order to gain a better understanding of the styles. Then, we develop automatic classifiers based on (a) standardized questionnaire responses; (b) basic text features, i.e., TF-IDF scores of words and word sequences; (c) more advanced text features, using LIWC (linguistic inquiry and word count) and context-aware features using BERT (bidirectional encoder representations from transformers); (d) audio features. We find that automated classification with language-derived features (i.e., based on LIWC) significantly outperforms questionnaire-based classification models. Furthermore, the best performance is achieved by combining LIWC with the questionnaire features. This suggests that more work should be put into developing linguistically based automated techniques for characterizing personality; however, questionnaires still complement such methods to some extent.
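
To make variant (b) concrete, a TF-IDF-plus-linear-model baseline in scikit-learn looks like the following; the utterances and labels are toy examples, and the study's actual features, cross-validation, and hyperparameters are not reproduced.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["i keep letting everyone down", "nobody stays close to me"]  # toy
labels = ["introjective", "anaclitic"]                                # toy

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # TF-IDF over words and bigrams
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["i feel like a failure"]))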

@article{bitew2024clin,
author = {Bitew, Semere Kiros and Schelstraete, Vincent and Zaporojets, Klim and Van Nieuwenhove, Kimberly and Meganck, Reitske and Develder, Chris},
title = {Personality style recognition via machine learning: Identifying anaclitic and introjective personality styles from patients’ speech},
journal = {Comput. Linguist. Netherlands J.},
month = {Mar.},
year = {2024},
volume = {13},
pages = {7--29},
url = {https://clinjournal.org/clinj/article/view/169}
}

Y. Jiang, M. De Raedt, J. Deleu, T. Demeester and C. Develder, "Few-shot out-of-scope intent classification: Analyzing the robustness of prompt-based learning", Appl. Intell., Vol. 54, Jan. 2024, pp. 1474-1496.

Out-of-scope (OOS) intent classification is an emerging field in conversational AI research. The goal is to detect out-of-scope user intents that do not belong to a predefined intent ontology. However, establishing a reliable OOS detection system is challenging due to limited data availability. This situation necessitates solutions rooted in few-shot learning techniques. For such few-shot text classification tasks, prompt-based learning has been shown more effective than conventionally finetuned large language models with a classification layer on top. Thus, we advocate for exploring prompt-based approaches for OOS intent detection. Additionally, we propose a new evaluation metric, the Area Under the In-scope and Out-of-Scope Characteristic curve (AU-IOC). This metric addresses the shortcomings of current evaluation standards for OOS intent detection. AU-IOC provides a comprehensive assessment of a model’s dual performance capacities: in-scope classification accuracy and OOS recall. Under this new evaluation method, we compare our prompt-based OOS detector against 3 strong baseline models by exploiting the metadata of intent annotations, i.e., intent description. Our study found that our prompt-based model achieved the highest AU-IOC score across different data regimes. Further experiments showed that our detector is insensitive to a variety of intent descriptions. An intriguing finding shows that for extremely low data settings (1- or 5-shot), employing a naturally phrased prompt template boosts the detector’s performance compared to rather artificially structured template patterns.
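
Our reading of the AU-IOC definition, as a sketch rather than the paper's reference implementation: sweep the confidence threshold used to flag OOS, record in-scope accuracy and OOS recall at each point, and integrate one against the other.

import numpy as np

def au_ioc(confidence, correct, is_oos):
    # confidence: per-sample model confidence; correct: predicted intent
    # matches gold (meaningful for in-scope samples); is_oos: gold OOS mask.
    # Samples with confidence below the threshold are flagged as OOS.
    accs, recs = [], []
    for t in np.linspace(0.0, 1.0, 101):
        flagged = confidence < t
        accs.append((correct & ~flagged & ~is_oos).sum() / (~is_oos).sum())
        recs.append(flagged[is_oos].mean())
    order = np.argsort(recs)
    a, r = np.asarray(accs)[order], np.asarray(recs)[order]
    return float(np.sum((r[1:] - r[:-1]) * (a[1:] + a[:-1]) / 2))  # trapezoids

rng = np.random.default_rng(0)
is_oos = rng.random(500) < 0.3
conf = np.where(is_oos, rng.beta(2, 5, 500), rng.beta(5, 2, 500))
correct = (rng.random(500) < 0.85) & ~is_oos
print(au_ioc(conf, correct, is_oos))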

@article{jiang2024oos,
author = {Jiang, Yiwei and De Raedt, Maarten and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Few-shot out-of-scope intent classification: Analyzing the robustness of prompt-based learning},
journal = {Appl. Intell.},
month = {Jan.},
year = {2024},
volume = {54},
pages = {1474--1496},
doi = {10.1007/s10489-023-05215-x}
}

M. De Raedt, S.K. Bitew, F. Godin, T. Demeester and C. Develder, "Zero-shot cross-lingual sentiment classification under distribution shift: An exploratory study", in Proc. 3rd Multiling. Represent. Learn. Workshop (MRL 2023) at EMNLP 2023, Singapore, 7 Dec. 2023, pp. 5-66.

The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to benefit in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with 3 multilingual models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.

@inproceedings{deraedt2023mrl,
author = {De Raedt, Maarten and Bitew, Semere Kiros and Godin, Fréderic and Demeester, Thomas and Develder, Chris},
title = {Zero-shot cross-lingual sentiment classification under distribution shift: An exploratory study},
booktitle = {Proc. 3rd Multiling. Represent. Learn. Workshop (MRL 2023) at EMNLP 2023},
month = {7 Dec.},
year = {2023},
pages = {5--66},
address = {Singapore},
doi = {10.18653/v1/2023.mrl-1.5}
}

K. D'Oosterlinck, T. Demeester, C. Develder and C. Potts, "Flexible model interpretability through natural language model editing", in Proc. 6th BlackboxNLP Workshop: Anal. and Interpret. Neural Netw. for NLP (BlackboxNLP 2023) at EMNLP 2023, Singapore, 2023.

Model interpretability and model editing are crucial goals in the age of large language models. Interestingly, there exists a link between these two goals: if a method is able to systematically edit model behavior with regard to a human concept of interest, this editor method can help make internal representations more interpretable by pointing towards relevant representations and systematically manipulating them.

@inproceedings{doosterlinck2023blackboxnlp,
author = {D'Oosterlinck, Karel and Demeester, Thomas and Develder, Chris and Potts, Christopher},
title = {Flexible model interpretability through natural language model editing},
booktitle = {Proc. 6th BlackboxNLP Workshop: Anal. and Interpret. Neural Netw. for NLP (BlackboxNLP 2023) at EMNLP 2023},
year = {2023},
address = {Singapore}
}

K. D'Oosterlinck, S.K. Bitew, B. Papineau, C. Potts, T. Demeester and C. Develder, "CAW-coref: Conjunction-aware word-level coreference resolution", in Proc. 6th Workshop Comput. Models Ref. Anaphora and Coref. (CRAC 2023) at EMNLP 2023, Singapore, 6-7 Dec. 2023.

State-of-the-art coreference resolution systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves the performance on the OntoNotes test set by 0.9% F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code are available at https://github.com/KarelDO/wl-coref.

@inproceedings{doosterlinck2023crac,
author = {Karel D'Oosterlinck and Semere Kiros Bitew and Brandon Papineau and Christopher Potts and Thomas Demeester and Chris Develder},
title = {CAW-coref: Conjunction-aware word-level coreference resolution},
booktitle = {Proc. 6th Workshop Comput. Models Ref. Anaphora and Coref. (CRAC 2023) at EMNLP 2023},
month = {6--7 Dec.},
year = {2023},
address = {Singapore},
doi = {10.18653/v1/2023.crac-main.2}
}

K. D'Oosterlinck, F. Remy, J. Deleu, T. Demeester, C. Develder, K. Zaporojets, A. Ghodsi, S. Ellershaw, J. Collins and C. Potts, "BioDEX: Large-scale biomedical adverse drug event extraction for real-world pharmacovigilance", in Findings of the ACL: EMNLP 2023, Singapore, Dec. 2023, pp. 13425-13454.

Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical literature is paramount for public safety, but involves slow and costly manual labor. We set out to improve drug safety monitoring (pharmacovigilance, PV) through the use of Natural Language Processing (NLP). We introduce BioDEX, a large-scale resource for Biomedical adverse Drug Event Extraction, rooted in the historical output of drug safety reporting in the U.S. BioDEX consists of 65k abstracts and 19k full-text biomedical papers with 256k associated document-level safety reports created by medical experts. The core features of these reports include the reported weight, age, and biological sex of a patient, a set of drugs taken by the patient, the drug dosages, the reactions experienced, and whether the reaction was life threatening. In this work, we consider the task of predicting the core information of the report given its originating paper. We estimate human performance to be 72.0% F1, whereas our best model achieves 62.3% F1, indicating significant headroom on this task. We also begin to explore ways in which these models could help professional PV reviewers. Our code and data are available: https://github.com/KarelDO/BioDEX

@inproceedings{doosterlinck2023emnlp,
author = {Karel D'Oosterlinck and François Remy and Johannes Deleu and Thomas Demeester and Chris Develder and Klim Zaporojets and Aneiss Ghodsi and Simon Ellershaw and Jack Collins and Christopher Potts},
title = {BioDEX: Large-scale biomedical adverse drug event extraction for real-world pharmacovigilance},
booktitle = {Findings of the ACL: EMNLP 2023},
month = {Dec.},
year = {2023},
pages = {13425--13454},
address = {Singapore},
doi = {10.18653/v1/2023.findings-emnlp.896}
}

J. Huang, A. Geiger, K. D’Oosterlinck, Z. Wu and C. Potts, "Rigorously assessing natural language explanations of neurons", in Proc. 6th BlackboxNLP Workshop: Anal. and Interpret. Neural Netw. for NLP (BlackboxNLP 2023) at EMNLP 2023, Singapore, 2023, pp. 317-331.

Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging. To help address this, we develop two modes of evaluation for natural language explanations that claim individual neurons represent a concept in a text input. In the observational mode, we evaluate claims that a neuron a activates on all and only input strings that refer to a concept picked out by the proposed explanation E. In the intervention mode, we construe E as a claim that neuron a is a causal mediator of the concept denoted by E. We apply our framework to the GPT-4-generated explanations of GPT-2 XL neurons of Bills et al. (2023) and show that even the most confident explanations have high error rates and little to no causal efficacy. We close the paper by critically assessing whether natural language is a good choice for explanations and whether neurons are the best level of analysis.
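
The observational mode reduces to scoring the neuron as if the explanation were a binary classifier of inputs; below is a small sketch under that framing, with synthetic activations and an arbitrary firing threshold.

import numpy as np

def observational_errors(activations, concept_present, threshold=0.5):
    # Claim under test: neuron a activates on all and only inputs that
    # refer to the concept picked out by explanation E.
    fires = np.asarray(activations) > threshold
    concept = np.asarray(concept_present)
    only_err = (fires & ~concept).mean()   # fires although concept absent
    all_err = (~fires & concept).mean()    # silent although concept present
    return only_err, all_err

rng = np.random.default_rng(0)
print(observational_errors(rng.random(1000), rng.random(1000) < 0.2))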

@inproceedings{huang2023blackboxnlp,
author = {Jing Huang and Atticus Geiger and Karel D’Oosterlinck and Zhengxuan Wu and Christopher Potts},
title = {Rigorously assessing natural language explanations of neurons},
booktitle = {Proc. 6th BlackboxNLP Workshop: Anal. and Interpret. Neural Netw. for NLP (BlackboxNLP 2023) at EMNLP 2023},
year = {2023},
pages = {317--331},
address = {Singapore},
doi = {10.18653/v1/2023.blackboxnlp-1.24}
}

J.-J. Decorte, J. Van Hautte, J. Deleu, C. Develder and T. Demeester, "Career path prediction using resume representation learning and skill-based matching", in Proc. 3rd Workshop Recomm. Syst. Human Resour. (RecSys in HR 2023) at ACM RecSys 2023, Singapore, 19 Sep. 2023, pp. 1-9.

The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods for career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach achieves the strongest result with 43.01% recall@10.

@inproceedings{decorte2023recsys,
author = {Decorte, Jens-Joris and Van Hautte, Jeroen and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Career path prediction using resume representation learning and skill-based matching},
booktitle = {Proc. 3rd Workshop Recomm. Syst. Human Resour. (RecSys in HR 2023) at ACM RecSys 2023},
month = {19 Sep.},
year = {2023},
pages = {1--9},
address = {Singapore},
url = {https://ceur-ws.org/Vol-3490/RecSysHR2023-paper_1.pdf}
}

J.-J. Decorte, S. Verlinden, J. Van Hautte, J. Deleu, C. Develder and T. Demeester, "Extreme multi-label skill extraction training using large language models", in Proc. Int. Workshop AI For Human Resour. Public Employ. Serv. (AI4HR & PES) at ECML-PKDD 2023, Turin, Italy, 18 Sep. 2023, pp. 1-12.

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of 15 to 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.

@inproceedings{decorte2023ai4hr,
author = {Decorte, Jens-Joris and Verlinden, Severine and Van Hautte, Jeroen and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Extreme multi-label skill extraction training using large language models},
booktitle = {Proc. Int. Workshop AI For Human Resour. Public Employ. Serv. (AI4HR & PES) at ECML-PKDD 2023},
month = {18 Sep.},
year = {2023},
pages = {1--12},
address = {Turin, Italy}
}

S.K. Bitew, J. Deleu, C. Develder and T. Demeester, "Distractor generation for multiple-choice questions with predictive prompting and large language models", in Proc. 1st Int. Tut. Workshop on Responsible Knowledge Discovery in Education (RKDE 2023) at ECML-PKDD 2023, Turin, Italy, 18 Sep. 2023, pp. 1-16.

Large Language Models (LLMs) such as ChatGPT have demonstrated remarkable performance across various tasks and have garnered significant attention from both researchers and practitioners. However, in an educational context, we still observe a performance gap in generating distractors -- i.e., plausible yet incorrect answers -- with LLMs for multiple-choice questions (MCQs). In this study, we propose a strategy for guiding LLMs such as ChatGPT, in generating relevant distractors by prompting them with question items automatically retrieved from a question bank as well-chosen in-context examples. We evaluate our LLM-based solutions using a quantitative assessment on an existing test set, as well as through quality annotations by human experts, i.e., teachers. We found that on average 53% of the generated distractors presented to the teachers were rated as high-quality, i.e., suitable for immediate use as is, outperforming the state-of-the-art model. We also show the gains of our approach in generating high-quality distractors by comparing it with a zero-shot ChatGPT and a few-shot ChatGPT prompted with static examples.
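
A sketch of the retrieval-then-prompt recipe as we understand it: TF-IDF retrieval is our simplification, and the prompt wording, bank contents, and the final LLM call (omitted here) are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy question bank: (question, answer, known-good distractors).
bank = [
    ("What is the capital of France?", "Paris", ["Lyon", "Marseille", "Nice"]),
    ("What is the capital of Italy?", "Rome", ["Milan", "Naples", "Turin"]),
]

def build_prompt(question, answer, k=2):
    # Retrieve the k bank items most similar to the new question and lay
    # them out as in-context examples; the result would be sent to the LLM.
    questions = [q for q, _, _ in bank]
    vec = TfidfVectorizer().fit(questions + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(questions))[0]
    nearest = sorted(range(len(bank)), key=lambda i: -sims[i])[:k]
    parts = ["Generate three plausible but incorrect answers (distractors)."]
    for i in nearest:
        q, a, ds = bank[i]
        parts.append(f"Question: {q}\nAnswer: {a}\nDistractors: {', '.join(ds)}")
    parts.append(f"Question: {question}\nAnswer: {answer}\nDistractors:")
    return "\n\n".join(parts)

print(build_prompt("What is the capital of Spain?", "Madrid"))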

@inproceedings{bitew2023rkde,
author = {Bitew, Semere Kiros and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Distractor generation for multiple-choice questions with predictive prompting and large language models},
booktitle = {Proc. 1st Int. Tut. Workshop on Responsible Knowledge Discovery in Education (RKDE 2023) at ECML-PKDD 2023},
month = {18 Sep.},
year = {2023},
pages = {1--16},
address = {Turin, Italy}
}

Z. Wu, K. D'Oosterlinck, A. Geiger, A. Zur and C. Potts, "Causal proxy models for concept-based model explanations", in Proc. 40th Int. Conf. Machine Learn. (ICML 2023), Honolulu, HI, USA, 23-29 Jul. 2023, pp. 1-22.

Explainability methods for NLP systems encounter a version of the fundamental problem of causal inference: for a given ground-truth input text, we never truly observe the counterfactual texts necessary for isolating the causal effects of model representations on outputs. In response, many explainability methods make no use of counterfactual texts, assuming they will be unavailable. In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. The core of our proposal is the Causal Proxy Model (CPM). A CPM explains a black-box model N because it is trained to have the same actual input/output behavior as N while creating neural representations that can be intervened upon to simulate the counterfactual input/output behavior of N. Furthermore, we show that the best CPM for N performs comparably to N in making factual predictions, which means that the CPM can simply replace N, leading to more explainable deployed models.

@inproceedings{doosterlinck2023icml,
author = {Zhengxuan Wu and Karel D'Oosterlinck and Atticus Geiger and Amir Zur and Christopher Potts},
title = {Causal proxy models for concept-based model explanations},
booktitle = {Proc. 40th Int. Conf. Machine Learn. (ICML 2023)},
month = {23--29 Jul.},
year = {2023},
pages = {1--22},
address = {Honolulu, HI, USA},
url = {https://openreview.net/forum?id=1Hh1cIPJ7V}
}

M. De Raedt, F. Godin, T. Demeester and C. Develder, "IDAS: Intent Discovery with Abstractive Summarization", in Proc. 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023) at ACL 2023, Toronto, Canada, 14 Jul. 2023, pp. 71-88.

Intent discovery is the task of inferring latent intents from a set of unlabeled utterances, and is a useful step towards the efficient creation of new conversational agents. We show that recent competitive methods in intent discovery can be outperformed by clustering utterances based on abstractive summaries, i.e., `labels', that retain the core elements while removing non-essential information. We contribute the IDAS approach, which collects a set of descriptive utterance labels by prompting a Large Language Model, starting from a well-chosen seed set of prototypical utterances, to bootstrap an In-Context Learning procedure to generate labels for non-prototypical utterances. The utterances and their resulting noisy labels are then encoded by a frozen pre-trained encoder, and subsequently clustered to recover the latent intents. For the unsupervised task (without any intent labels) IDAS outperforms the state-of-the-art by up to +7.42% in standard cluster metrics for the Banking, StackOverflow, and Transport datasets. For the semi-supervised task (with labels for a subset of intents) IDAS surpasses 2 recent methods on the CLINC benchmark without even using labeled data.

@inproceedings{deraedt2023idas,
author = {De Raedt, Maarten and Godin, Fréderic and Demeester, Thomas and Develder, Chris},
title = {IDAS: Intent Discovery with Abstractive Summarization},
booktitle = {Proc. 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023) at ACL 2023},
month = {14 Jul.},
year = {2023},
pages = {71--88},
address = {Toronto, Canada},
url = {https://aclanthology.org/2023.nlp4convai-1.7},
doi = {10.18653/v1/2023.nlp4convai-1.7}
}

S.K. Bitew, J. Deleu, A.S. Doğruöz, C. Develder and T. Demeester, "Learning from partially annotated data: Example-aware creation of gap-filling exercises for language learning", in Proc. 18th Workshop Innovative Use of NLP for Building Educational Applications (BEA 2023) at ACL 2023, Toronto, Canada, 13 Jul. 2023, pp. 598-609.

Since performing exercises (including, e.g., practice tests) forms a crucial component of learning, and creating such exercises requires non-trivial effort from the teacher, there is great value in automatic exercise generation in digital tools in education. In this paper, we particularly focus on automatic creation of gap-filling exercises for language learning, specifically grammar exercises. Since providing any annotation in this domain requires human expert effort, we aim to avoid it entirely and explore the task of converting existing texts into new gap-filling exercises, purely based on an example exercise, without explicit instruction or detailed annotation of the intended grammar topics. We contribute (i) a novel neural network architecture specifically designed for the aforementioned gap-filling exercise generation task, and (ii) a real-world benchmark dataset for French grammar. We show that our model for this French grammar gap-filling exercise generation outperforms a competitive baseline classifier by 8 percentage points in F1, achieving an average F1 score of 82%. Our model implementation and the dataset are made publicly available to foster future research, thus offering a standardized evaluation and baseline solution of the proposed partially annotated data prediction task in grammar exercise creation.

@inproceedings{bitew2023bea,
author = {Bitew, Semere Kiros and Deleu, Johannes and Doğruöz, A. Seza and Develder, Chris and Demeester, Thomas},
title = {Learning from partially annotated data: Example-aware creation of gap-filling exercises for language learning},
booktitle = {Proc. 18th Workshop Innovative Use of NLP for Building Educational Applications (BEA 2023) at ACL 2023},
month = {13 Jul.},
year = {2023},
pages = {598--609},
address = {Toronto, Canada},
url = {https://aclanthology.org/2023.bea-1.51},
doi = {10.18653/v1/2023.bea-1.51}
}

Y. Jiang, K. Zaporojets, J. Deleu, T. Demeester and C. Develder, "CookDial: A dataset for task-oriented dialogs grounded in procedural documents", Appl. Intell., Vol. 53, No. 11, Jun. 2023, pp. 4748-4766.

This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with procedural knowledge understanding. The corpus contains 260 human-to-human task-oriented dialogs in which an agent, given a recipe document, guides the user to cook a dish. Dialogs in CookDial exhibit two unique features: (i) procedural alignment between the dialog flow and supporting document; (ii) complex agent decision-making that involves segmenting long sentences, paraphrasing hard instructions and resolving coreference in the dialog context. In addition, we identify three challenging (sub)tasks in the assumed task-oriented dialog system: (1) User Question Understanding, (2) Agent Action Frame Prediction, and (3) Agent Response Generation. For each of these tasks, we develop a neural baseline model, which we evaluate on the CookDial dataset. We publicly release the CookDial dataset, comprising rich annotations of both dialogs and recipe documents, to stimulate further research on domain-specific document-grounded dialog systems.

@article{jiang2022cookdial,
author = {Jiang, Yiwei and Zaporojets, Klim and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {CookDial: A dataset for task-oriented dialogs grounded in procedural documents},
journal = {Appl. Intell.},
month = {Jun.},
year = {2023},
volume = {53},
number = {11},
pages = {4748--4766},
doi = {10.1007/s10489-022-03692-0}
}

A. Hadifar, S.K. Bitew, J. Deleu, V. Hoste, C. Develder and T. Demeester, "Diverse content selection for educational question generation", in Proc. 17th Conf. Eur. Chapter Associat. Comput. Linguist.: Stud. Research Workshop (EACL SRW 2023), Dubrovnik, Croatia, 2-6 May 2023, pp. 123-133.

Question Generation (QG) systems have shown promising results in reducing the time and effort required to create questions for students. Typically, a first step in QG is to select the content to design a question for. In an educational setting, it is crucial that the resulting questions cover the most relevant/important pieces of knowledge the student should have acquired. Yet, current QG systems either consider just a single sentence or paragraph (thus do not include a selection step), or do not consider this educational viewpoint of content selection. Aiming to fill this research gap with a solution for educational document level QG, we thus propose to select contents for QG based on relevance and topic diversity. We demonstrate the effectiveness of our proposed content selection strategy for QG on 2 educational datasets. In our performance assessment, we also highlight limitations of existing QG evaluation metrics in light of the content selection problem.

@inproceedings{hadifar2023eacl,
author = {Hadifar, Amir and Bitew, Semere Kiros and Deleu, Johannes and Hoste, Veronique and Develder, Chris and Demeester, Thomas},
title = {Diverse content selection for educational question generation},
booktitle = {Proc. 17th Conf. Eur. Chapter Associat. Comput. Linguist.: Stud. Research Workshop (EACL SRW 2023)},
month = {2--6 May},
year = {2023},
pages = {123--133},
address = {Dubrovnik, Croatia},
url = {https://aclanthology.org/2023.eacl-srw.13},
doi = {10.18653/v1/2023.eacl-srw.13}
}

A. Hadifar, S.K. Bitew, J. Deleu, C. Develder and T. Demeester, "EduQG: A multi-format multiple-choice dataset for the educational domain", IEEE Access, Vol. 11, Feb. 2023, pp. 20885-20896.

Natural language processing technology has made significant progress in recent years, fuelled by increasingly powerful general language models. This has also inspired a sizeable body of work targeted specifically towards the educational domain, where the creation of questions (both for assessment and practice) is a laborious/expensive effort. Thus, automatic Question-Generation (QG) solutions have been proposed and studied. Yet, according to a recent survey of the educational QG community’s progress, a common baseline dataset unifying multiple domains and question forms (e.g., multiple choice vs. fill-the-gap), including readily available baseline models to compare against, is largely missing. This is the gap we aim to fill with this paper. In particular, we introduce a high-quality dataset in the educational domain, containing over 3,000 entries, comprising (i) multiple-choice questions, (ii) the corresponding answers (including distractors), and (iii) associated passages from the course material used as sources for the questions. Each question is phrased in two forms, normal and cloze (i.e., fill-the-gap), and correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom’s taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines are made available to support further research in question generation for education (https://github.com/hadifar/question-generation).

@article{hadifar2023access,
author = {Hadifar, Amir and Bitew, Semere Kiros and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {EduQG: A multi-format multiple-choice dataset for the educational domain},
journal = {IEEE Access},
month = {Feb.},
year = {2023},
volume = {11},
pages = {20885--20896},
doi = {10.1109/ACCESS.2023.3248790}
}

S.K. Bitew, A. Hadifar, L. Sterckx, J. Deleu, C. Develder and T. Demeester, "Learning to reuse distractors to support multiple choice question generation in education", IEEE Trans. Learn. Technol., 2022, pp. 1-16.

Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is to devise relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, by the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average 3 distractors out of the 10 shown to teachers were rated as high-quality distractors. We create a performance benchmark, and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test of 298 educational questions covering multiple subjects & languages and a 77k multilingual pool of distractor vocabulary for future research.

Learning to reuse distractors to support multiple choice question generation in education

S.K. Bitew, A. Hadifar, L. Sterckx, J. Deleu, C. Develder and T. Demeester


IEEE Trans. Learn. Technol., 2022, pp. 1-16.

@article{bitew2022,
author = {Bitew, Semere Kiros and Hadifar, Amir and Sterckx, Lucas and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Learning to reuse distractors to support multiple choice question generation in education},
journal = {IEEE Trans. Learn. Technol.},
year = {2022},
pages = {1--16},
doi = {10.1109/TLT.2022.3226523}
}

pubinproceedings

M. De Raedt, F. Godin, C. Develder and T. Demeester, "Robustifying sentiment classification by maximally exploiting few counterfactuals", in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2022), Abu Dhabi, UAE, 7-11 Dec. 2022, pp. 11386-11400.

Robustifying sentiment classification by maximally exploiting few counterfactuals

M. De Raedt, F. Godin, C. Develder and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2022), Abu Dhabi, UAE, 7-11 Dec. 2022, pp. 11386-11400.

For text classification tasks, finetuned language models perform remarkably well. Yet, they tend to rely on spurious patterns in training data, thus limiting their performance on out-of-distribution (OOD) test data. Among recent models aiming to avoid this spurious pattern problem, adding extra counterfactual samples to the training data has proven to be very effective. Yet, counterfactual data generation is costly, since it relies on human annotation. Thus, we propose a novel solution that only requires annotation of a small fraction (e.g., 1%) of the original training data, and uses automatic generation of extra counterfactuals in an encoding vector space. We demonstrate the effectiveness of our approach in sentiment classification, using IMDb data for training and other sets for OOD tests (i.e., Amazon, SemEval and Yelp). We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals: +3% compared to adding 100% extra in-distribution training samples, and +1.3% compared to alternative counterfactual approaches.
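
As an illustration only, the following minimal NumPy sketch shows the general recipe of estimating a counterfactual "direction" from a few annotated pairs of sentence encodings and applying it to the rest of the training set. The toy data, dimensionality, and single-offset assumption are invented for this sketch and are not the paper's actual construction.

import numpy as np

rng = np.random.default_rng(0)
d = 16
direction_true = rng.normal(size=d)            # toy "flip the label" direction
train = rng.normal(size=(1000, d))             # encoded training sentences

# Only a small fraction (here 1%) gets human-written counterfactuals,
# simulated as the original encoding shifted by the true direction plus noise.
orig = train[:10]
counter = orig + direction_true + 0.05 * rng.normal(size=(10, d))

# Estimate the counterfactual direction from the few annotated pairs ...
direction = (counter - orig).mean(axis=0)

# ... and synthesize counterfactual encodings for all remaining data
# (their labels would be flipped accordingly before retraining the classifier).
synthetic = train[10:] + direction
print(synthetic.shape)                          # (990, 16)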

Robustifying sentiment classification by maximally exploiting few counterfactuals

M. De Raedt, F. Godin, C. Develder and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2022), Abu Dhabi, UAE, 7-11 Dec. 2022, pp. 11386-11400.

@inproceedings{deraedt2022emnlp,
author = {De Raedt, Maarten and Godin, Fréderic and Develder, Chris and Demeester, Thomas},
title = {Robustifying sentiment classification by maximally exploiting few counterfactuals},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2022)},
month = {7--11 Dec.},
year = {2022},
pages = {11386--11400},
address = {Abu Dhabi, UAE},
url = {https://aclanthology.org/2022.emnlp-main.783}
}

pubinproceedings

P. Rabaey, C. De Boom and T. Demeester, "Neural Bayesian network understudy", in Proc. Workshop Causal Mach. Learn. Real-World Impact (CML4Impact 2022) at NeurIPS 2022, New Orleans, LA, USA, 2 Dec. 2022.

Neural Bayesian network understudy

P. Rabaey, C. De Boom and T. Demeester


in Proc. Workshop Causal Mach. Learn. Real-World Impact (CML4Impact 2022) at NeurIPS 2022, New Orleans, LA, USA, 2 Dec. 2022.

Bayesian Networks may be appealing for clinical decision-making due to their inclusion of causal knowledge, but their practical adoption remains limited as a result of their inability to deal with unstructured data. While neural networks do not have this limitation, they are not interpretable and are inherently unable to deal with causal structure in the input space. Our goal is to build neural networks that combine the advantages of both approaches. Motivated by the prospect of injecting causal knowledge while training such neural networks, this work presents initial steps in that direction. We demonstrate how a neural network can be trained to output conditional probabilities, providing approximately the same functionality as a Bayesian Network. Additionally, we propose two training strategies that allow encoding the independence relations inferred from a given causal structure into the neural network. We present initial results in a proof-of-concept setting, showing that the neural model acts as an understudy to its Bayesian Network counterpart, approximating its probabilistic and causal properties.
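
A minimal sketch of the understudy idea, assuming a toy two-node Bayesian network and a small PyTorch classifier (neither taken from the paper): trained on samples only, the network approximately recovers the conditional probability table.

import torch

torch.manual_seed(0)
# Ground-truth two-node Bayesian network: X ~ Bernoulli(0.3),
# with P(Y=1 | X=0) = 0.2 and P(Y=1 | X=1) = 0.9.
p_y1_given_x = torch.tensor([0.2, 0.9])
x = (torch.rand(5000) < 0.3).long()
y = (torch.rand(5000) < p_y1_given_x[x]).long()

# The "understudy": a small network trained on samples from the BN,
# which should learn to output the same conditional probabilities.
net = torch.nn.Sequential(torch.nn.Linear(1, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
opt = torch.optim.Adam(net.parameters(), lr=0.05)
for _ in range(300):
    logits = net(x.float().unsqueeze(1)).squeeze(1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y.float())
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    est = torch.sigmoid(net(torch.tensor([[0.0], [1.0]])).squeeze(1))
print(est)  # should be close to [0.2, 0.9]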

Neural Bayesian network understudy

P. Rabaey, C. De Boom and T. Demeester


in Proc. Workshop Causal Mach. Learn. Real-World Impact (CML4Impact 2022) at NeurIPS 2022, New Orleans, LA, USA, 2 Dec. 2022.

@inproceedings{rabaey2022neurips,
author = {Rabaey, Paloma and De Boom, Cedric and Demeester, Thomas},
title = {Neural Bayesian network understudy},
booktitle = {Proc. Workshop Causal Mach. Learn. Real-World Impact (CML4Impact 2022) at NeurIPS 2022},
month = {2 Dec.},
year = {2022},
address = {New Orleans, LA, USA}
}

pubinproceedings

K. Zaporojets, L.-A. Kaffee, T. Demeester, C. Develder and I. Augenstein, "TempEL: Linking dynamically evolving and newly emerging entities", in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov. - 9 Dec. 2022.

TempEL: Linking dynamically evolving and newly emerging entities

K. Zaporojets, L.-A. Kaffee, T. Demeester, C. Develder and I. Augenstein


in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov. - 9 Dec. 2022.

In our continuously evolving world, entities change over time and new, previously non-existing or unknown, entities appear. We study how this evolutionary scenario impacts the performance on a well-established entity linking (EL) task. For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities, and these target entities' descriptions. By capturing such temporal aspects, our newly introduced TempEL resource contrasts with currently existing entity linking datasets, which are composed of fixed mentions linked to a single static version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA). Indeed, for each of our collected temporal snapshots, TempEL contains links to entities that are continual, i.e., occur in all of the years, as well as completely new entities that appear for the first time at some point. Thus, we make it possible to quantify the performance of current state-of-the-art EL models for: (i) entities that are subject to changes over time in their Knowledge Base descriptions as well as their mentions' contexts, and (ii) newly created entities that were previously non-existing (e.g., at the time the EL model was trained). Our experimental results show that in terms of temporal performance degradation, (i) continual entities suffer a decrease of up to 3.1% EL accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This highlights the challenge of the introduced TempEL dataset and opens new research prospects in the area of time-evolving entity disambiguation.

TempEL: Linking dynamically evolving and newly emerging entities

K. Zaporojets, L.-A. Kaffee, T. Demeester, C. Develder and I. Augenstein


in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov. - 9 Dec. 2022.

@inproceedings{Zaporojets2022NeurIPS,
author = {Zaporojets, Klim and Kaffee, Lucie-Aimée and Demeester, Thomas and Develder, Chris and Augenstein, Isabelle},
title = {TempEL: Linking dynamically evolving and newly emerging entities},
booktitle = {Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022)},
month = {28 Nov. -- 9 Dec.},
year = {2022},
address = {New Orleans, LA, USA},
url = {https://openreview.net/forum?id=vrnqr3PG4yB}
}

pubinproceedings

E.D. Abraham, K. D'Oosterlinck, A. Feder, Y. Gat, A. Geiger, C. Potts, R. Reichart and Z. Wu, "CEBaB: Estimating the causal effects of real-world concepts on NLP model behavior", in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov.-9 Dec. 2022.

CEBaB: Estimating the causal effects of real-world concepts on NLP model behavior

E.D. Abraham, K. D'Oosterlinck, A. Feder, Y. Gat, A. Geiger, C. Potts, R. Reichart and Z. Wu


in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov.-9 Dec. 2022.

The increasing size and complexity of modern ML systems have improved their predictive capabilities but made their behavior harder to explain. Many techniques for model explanation have been developed in response, but we lack clear criteria for assessing these techniques. In this paper, we cast model explanation as the causal inference problem of estimating the causal effects of real-world concepts on the output behavior of ML models given actual input data. We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP). CEBaB consists of short restaurant reviews with human-generated counterfactual reviews in which an aspect (food, noise, ambiance, service) of the dining experience was modified. Original and counterfactual reviews are annotated with multiply-validated sentiment ratings at the aspect level and review level. The rich structure of CEBaB allows us to go beyond input features to study the effects of abstract, real-world concepts on model behavior. We use CEBaB to compare the quality of a range of concept-based explanation methods covering different assumptions and conceptions of the problem, and we seek to establish natural metrics for comparative assessments of these methods.

CEBaB: Estimating the causal effects of real-world concepts on NLP model behavior

E.D. Abraham, K. D'Oosterlinck, A. Feder, Y. Gat, A. Geiger, C. Potts, R. Reichart and Z. Wu


in Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022), New Orleans, LA, USA, 28 Nov.-9 Dec. 2022.

@inproceedings{abraham2022,
author = {Abraham, Eldar David and D'Oosterlinck, Karel and Feder, Amir and Gat, Yair and Geiger, Atticus and Potts, Christopher and Reichart, Roi and Wu, Zhengxuan},
title = {CEBaB: Estimating the causal effects of real-world concepts on NLP model behavior},
booktitle = {Proc. 36th Conf. Neural Inf. Process. Sys. (NeurIPS 2022)},
month = {28 Nov.--9 Dec.},
year = {2022},
address = {New Orleans, LA, USA},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/hash/701ec28790b29a5bc33832b7bdc4c3b6-Abstract-Conference.html}
}

pubinproceedings

S. Labat, A. Hadifar, T. Demeester and V. Hoste, "An emotional journey: Detecting emotion trajectories in Dutch customer service dialogues", in Proc. 8th Workshop Noisy User-generated Text (W-NUT 2022) at COLING 2022, Gyeongju, Republic of Korea, 16 Oct. 2022, pp. 106-112.

An emotional journey: Detecting emotion trajectories in Dutch customer service dialogues

S. Labat, A. Hadifar, T. Demeester and V. Hoste


in Proc. 8th Workshop Noisy User-generated Text (W-NUT 2022) at COLING 2022, Gyeongju, Republic of Korea, 16 Oct. 2022, pp. 106-112.

The ability to track fine-grained emotions in customer service dialogues has many real-world applications, but has not been studied extensively. This paper measures the potential of prediction models on that task, based on a real-world dataset of Dutch Twitter conversations in the domain of customer service. We find that modeling emotion trajectories has a small, but measurable benefit compared to predictions based on isolated turns. The models used in our study are shown to generalize well to different companies and economic sectors.

An emotional journey: Detecting emotion trajectories in Dutch customer service dialogues

S. Labat, A. Hadifar, T. Demeester and V. Hoste


in Proc. 8th Workshop Noisy User-generated Text (W-NUT 2022) at COLING 2022, Gyeongju, Republic of Korea, 16 Oct. 2022, pp. 106-112.

@inproceedings{labat2022,
author = {Labat, Sofie and Hadifar, Amir and Demeester, Thomas and Hoste, Véronique},
title = {An emotional journey: Detecting emotion trajectories in Dutch customer service dialogues},
booktitle = {Proc. 8th Workshop Noisy User-generated Text (W-NUT 2022) at COLING 2022},
month = {16 Oct.},
year = {2022},
pages = {106--112},
address = {Gyeongju, Republic of Korea},
url = {https://aclanthology.org/2022.wnut-1.12/}
}

pubinproceedings

J.-J. Decorte, J. Van Hautte, J. Deleu, C. Develder and T. Demeester, "Design of negative sampling strategies for distantly supervised skill extraction", in Proc. 2nd Workshop Recomm. Sys. Hum. Resour. at RecSys 2022 (RecSys in HR 2022), Seattle, WA, USA, 22 Sep. 2022.

Design of negative sampling strategies for distantly supervised skill extraction

J.-J. Decorte, J. Van Hautte, J. Deleu, C. Develder and T. Demeester


in Proc. 2nd Workshop Recomm. Sys. Hum. Resour. at RecSys 2022 (RecSys in HR 2022), Seattle, WA, USA, 22 Sep. 2022.

Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels, mentioned either explicitly or merely described implicitly, and the lack of finely annotated training corpora. Previous work on skill extraction overly simplifies the task to an explicit entity detection task, or builds on manually annotated training data that would be infeasible if applied to a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and that combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes, to stimulate further research on the task.
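
The toy Python sketch below illustrates the distant-supervision setup and two of the possible negative sampling strategies; the miniature vocabulary, the relatedness map standing in for the ESCO taxonomy, and all function names are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
vocab = {'python': 0, 'java': 1, 'negotiation': 2, 'communication': 3}
related = {0: [1], 1: [0], 2: [3], 3: [2]}   # toy stand-in for ESCO relatedness

def distant_label(sentence):
    """Distant supervision: literal matches of skill names become positives."""
    return [i for s, i in vocab.items() if s in sentence.lower()]

def sample_negatives(positives, strategy, k=2):
    """Pick negative skill labels for one sentence, under two strategies."""
    candidates = [i for i in vocab.values() if i not in positives]
    if strategy == 'random':
        return list(rng.choice(candidates, size=k, replace=False))
    # 'related' strategy: hard negatives, i.e. taxonomic neighbours first.
    hard = [j for p in positives for j in related[p] if j not in positives]
    return (hard + [c for c in candidates if c not in hard])[:k]

sent = "We seek a Python developer with strong communication skills."
pos = distant_label(sent)
print(pos, sample_negatives(pos, 'random'), sample_negatives(pos, 'related'))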

Design of negative sampling strategies for distantly supervised skill extraction

J.-J. Decorte, J. Van Hautte, J. Deleu, C. Develder and T. Demeester


in Proc. 2nd Workshop Recomm. Sys. Hum. Resour. at RecSys 2022 (RecSys in HR 2022), Seattle, WA, USA, 22 Sep. 2022.

@inproceedings{Decorte2022RecSysHR,
author = {Decorte, Jens-Joris and Van Hautte, Jeroen and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Design of negative sampling strategies for distantly supervised skill extraction},
booktitle = {Proc. 2nd Workshop Recomm. Sys. Hum. Resour. at RecSys 2022 (RecSys in HR 2022)},
month = {22 Sep.},
year = {2022},
address = {Seattle, WA, USA}
}

pubinproceedings

S. Labat, N. Ackaert, T. Demeester and V. Hoste, "Variation in the expression and annotation of emotions: a Wizard of Oz pilot study", in Proc. 1st Workshop Perspectivist Approaches to NLP @LREC2022 (NLPerspectives 2022), Marseille, France, 20 Jun. 2022, pp. 66-72.

Variation in the expression and annotation of emotions: a Wizard of Oz pilot study

S. Labat, N. Ackaert, T. Demeester and V. Hoste


in Proc. 1st Workshop Perspectivist Approaches to NLP @LREC2022 (NLPerspectives 2022), Marseille, France, 20 Jun. 2022, pp. 66-72.

This pilot study employs the Wizard of Oz technique to collect a corpus of written human-computer conversations in the domain of customer service. The resulting dataset contains 192 conversations and is used to test three hypotheses related to the expression and annotation of emotions. First, we hypothesize that there is a discrepancy between the emotion annotations of the participant (the experiencer) and the annotations of our external annotator (the observer). Furthermore, we hypothesize that the personality of the participants has an influence on the emotions they expressed, and on the way they evaluated (annotated) these emotions. We found that for an external, trained annotator, not all emotion labels were equally easy to work with. We also noticed that the trained annotator had a tendency to opt for emotion labels that were more centered in the valence-arousal space, while participants made more 'extreme' annotations. For the second hypothesis, we discovered a positive correlation between the personality trait extraversion and the emotion dimensions valence and dominance in our sample. Finally, for the third premise, we observed a positive correlation between the internal-external agreement on emotion labels and the personality traits conscientiousness and extraversion. Our insights and findings will be used in future research to conduct a larger Wizard of Oz experiment.

Variation in the expression and annotation of emotions: a Wizard of Oz pilot study

S. Labat, N. Ackaert, T. Demeester and V. Hoste


in Proc. 1st Workshop Perspectivist Approaches to NLP @LREC2022 (NLPerspectives 2022), Marseille, France, 20 Jun. 2022, pp. 66-72.

@inproceedings{labat2022lrec,
author = {Labat, Sofie and Ackaert, Naomi and Demeester, Thomas and Hoste, Véronique},
title = {Variation in the expression and annotation of emotions: a Wizard of Oz pilot study},
booktitle = {Proc. 1st Workshop Perspectivist Approaches to NLP @LREC2022 (NLPerspectives 2022)},
month = {20 Jun.},
year = {2022},
pages = {66--72},
address = {Marseille, France},
url = {https://aclanthology.org/2022.nlperspectives-1.9/}
}

pubinproceedings

Y. Jiang, A. Hadifar, J. Deleu, T. Demeester and C. Develder, "UGent-T2K at the 2nd DialDoc shared task: A retrieval-focused dialog system grounded in multiple documents", in Proc. DialDoc Workshop at ACL 2022, Dublin, Ireland, 26 May 2022, pp. 1-8.

UGent-T2K at the 2nd DialDoc shared task: A retrieval-focused dialog system grounded in multiple documents

Y. Jiang, A. Hadifar, J. Deleu, T. Demeester and C. Develder


in Proc. DialDoc Workshop at ACL 2022, Dublin, Ireland, 26 May 2022, pp. 1-8.

This work presents the contribution from the Text-to-Knowledge team of Ghent University (UGent-T2K) to the MultiDoc2Dial shared task on modeling dialogs grounded in multiple documents. We propose a pipeline system, comprising (1) document retrieval, (2) passage retrieval, and (3) response generation. We engineered these individual components mainly by, for (1)-(2), combining multiple ranking models and adding a final LambdaMART reranker, and, for (3), by adopting a Fusion-in-Decoder (FiD) model. We thus significantly boost the baseline system's performance (over +10 points for both F1 and SacreBLEU). Further, error analysis reveals two major failure cases, to be addressed in future work: (i) in case of topic shift within the dialog, retrieval often fails to select the correct grounding document(s), and (ii) generation sometimes fails to use the correctly retrieved grounding passage.

UGent-T2K at the 2nd DialDoc shared task: A retrieval-focused dialog system grounded in multiple documents

Y. Jiang, A. Hadifar, J. Deleu, T. Demeester and C. Develder


in Proc. DialDoc Workshop at ACL 2022, Dublin, Ireland, 26 May 2022, pp. 1-8.

@inproceedings{jiang2022acl,
author = {Jiang, Yiwei and Hadifar, Amir and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {UGent-T2K at the 2nd DialDoc shared task: A retrieval-focused dialog system grounded in multiple documents},
booktitle = {Proc. DialDoc Workshop at ACL 2022},
month = {26 May},
year = {2022},
pages = {1--8},
address = {Dublin, Ireland},
doi = {10.18653/v1/2022.dialdoc-1.12}
}

pubinproceedings

K. Zaporojets, J. Deleu, Y. Jiang, T. Demeester and C. Develder, "Towards consistent document-level entity linking: Joint models for entity linking and coreference resolution", in Proc. 60th Annual Meet. Assoc. Comput. Linguist. (ACL 2022), Dublin, Ireland, 22-27 May 2022, pp. 1-7.

Towards consistent document-level entity linking: Joint models for entity linking and coreference resolution

K. Zaporojets, J. Deleu, Y. Jiang, T. Demeester and C. Develder


in Proc. 60th Annual Meet. Assoc. Comput. Linguist. (ACL 2022), Dublin, Ireland, 22-27 May 2022, pp. 1-7.

We consider the task of document-level entity linking (EL), where it is important to make consistent decisions for entity mentions over the full document jointly. We aim to leverage explicit “connections” among mentions within the document itself: we propose to join EL and coreference resolution (coref) in a single structured prediction task over directed trees and use a globally normalized model to solve it. This contrasts with related works where two separate models are trained for each of the tasks and additional logic is required to merge the outputs. Experimental results on two datasets show a boost of up to +5% F1-score on both coref and EL tasks, compared to their standalone counterparts. For a subset of hard cases, with individual mentions lacking the correct EL in their candidate entity list, we obtain a +50% increase in accuracy.

Towards consistent document-level entity linking: Joint models for entity linking and coreference resolution

K. Zaporojets, J. Deleu, Y. Jiang, T. Demeester and C. Develder


in Proc. 60th Annual Meet. Assoc. Comput. Linguist. (ACL 2022), Dublin, Ireland, 22-27 May 2022, pp. 1-7.

@inproceedings{zaporojets2022acl,
author = {Zaporojets, Klim and Deleu, Johannes and Jiang, Yiwei and Demeester, Thomas and Develder, Chris},
title = {Towards consistent document-level entity linking: Joint models for entity linking and coreference resolution},
booktitle = {Proc. 60th Annual Meet. Assoc. Comput. Linguist. (ACL 2022)},
month = {22--27 May},
year = {2022},
pages = {1--7},
address = {Dublin, Ireland},
doi = {10.18653/v1/2022.acl-short.88}
}

pubinproceedings

S.K. Bitew, J. Deleu, C. Develder and T. Demeester, "Lazy low-resource coreference resolution: A study on leveraging black-box translation tools", in Proc. 4th Workshop Comput. Models of Reference, Anaphora and Coreference (CRAC 2021) at EMNLP 2021, Punta Cana, Dominican Republic, 11 Nov. 2021, pp. 1-6.

Lazy low-resource coreference resolution: A study on leveraging black-box translation tools

S.K. Bitew, J. Deleu, C. Develder and T. Demeester


in Proc. 4th Workshop Comput. Models of Reference, Anaphora and Coreference (CRAC 2021) at EMNLP 2021, Punta Cana, Dominican Republic, 11 Nov. 2021, pp. 1-6.

Large annotated corpora for coreference resolution are available for only a few languages. For machine translation, however, strong black-box systems exist for many languages. We empirically explore the appealing idea of leveraging such translation tools for bootstrapping coreference resolution in languages with limited resources. Two scenarios are analyzed, in which a large coreference corpus in a high-resource language is used for coreference predictions in a smaller language, i.e., by machine translating either the training corpus or the test data. In our empirical evaluation of coreference resolution using the two scenarios on several medium-resource languages, we find no improvement over monolingual baseline models. Our analysis of the various sources of error inherent to the studied scenarios reveals that, in fact, the quality of contemporary machine translation tools is the main limiting factor.

Lazy low-resource coreference resolution: A study on leveraging black-box translation tools

S.K. Bitew, J. Deleu, C. Develder and T. Demeester


in Proc. 4th Workshop Comput. Models of Reference, Anaphora and Coreference (CRAC 2021) at EMNLP 2021, Punta Cana, Dominican Republic, 11 Nov. 2021, pp. 1-6.

@inproceedings{bitew2021crac,
author = {Bitew, Semere Kiros and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Lazy low-resource coreference resolution: A study on leveraging black-box translation tools},
booktitle = {Proc. 4th Workshop Comput. Models of Reference, Anaphora and Coreference (CRAC 2021) at EMNLP 2021},
month = {11 Nov.},
year = {2021},
pages = {1--6},
address = {Punta Cana, Dominican Republic},
url = {https://aclanthology.org/2021.crac-1.6/}
}

pubinproceedings

M. De Raedt, F. Godin, P. Buteneers, C. Develder and T. Demeester, "A simple geometric method for cross-lingual linguistic transformations with pre-trained autoencoders", in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7-11 Nov. 2021.

A simple geometric method for cross-lingual linguistic transformations with pre-trained autoencoders

M. De Raedt, F. Godin, P. Buteneers, C. Develder and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7-11 Nov. 2021.

Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. For efficient learning, we investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
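
One simple reading of such a geometric mapping is a mean-offset translation between paired embeddings; the NumPy sketch below works under that assumption, with random vectors merely standing in for the frozen autoencoder's sentence embeddings.

import numpy as np

rng = np.random.default_rng(0)
d = 16
# Stand-ins for sentence embeddings from a frozen pre-trained autoencoder:
# sentences with property A (e.g., present tense) and counterparts with B.
emb_A = rng.normal(size=(20, d))
offset_true = rng.normal(size=d)
emb_B = emb_A + offset_true + 0.05 * rng.normal(size=(20, d))

# Learn the transformation as the mean difference vector over the pairs.
offset = (emb_B - emb_A).mean(axis=0)

# Apply it to a new embedding with property A to "steer" it towards B;
# in the real setup the result would be fed to the frozen decoder.
new_A = rng.normal(size=d)
steered = new_A + offset
print(np.linalg.norm(offset - offset_true))  # small: offset recovered on toy data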

A simple geometric method for cross-lingual linguistic transformations with pre-trained autoencoders

M. De Raedt, F. Godin, P. Buteneers, C. Develder and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7-11 Nov. 2021.

@inproceedings{deraedt2021emnlp,
author = {De Raedt, Maarten and Godin, Fréderic and Buteneers, Pieter and Develder, Chris and Demeester, Thomas},
title = {A simple geometric method for cross-lingual linguistic transformations with pre-trained autoencoders},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2021)},
month = {7--11 Nov.},
year = {2021},
address = {Punta Cana, Dominican Republic},
url = {https://aclanthology.org/2021.emnlp-main.792/}
}

pubarticle

A. Hadifar, J. Deleu, C. Develder and T. Demeester, "Exploration of block-wise dynamic sparseness", Pattern Recognit. Lett., Vol. 151, Nov. 2021, pp. 187-192.

Exploration of block-wise dynamic sparseness

A. Hadifar, J. Deleu, C. Develder and T. Demeester


Pattern Recognit. Lett., Vol. 151, Nov. 2021, pp. 187-192.

Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for dynamic sparseness, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform static sparseness baselines. In addition, our method can reach similar language modeling perplexities as the dense baseline, at half the computational cost at inference time.
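
A minimal NumPy sketch of the general block-wise idea follows; the tile-scoring heuristic here stands in for the learned, input-dependent gating of the paper, and all names are illustrative.

import numpy as np

def block_sparse_matvec(W, x, block, keep_frac=0.5):
    """Compute W @ x while dynamically skipping a fraction of weight tiles.

    W is partitioned into (block x block) tiles; per input, only the tiles
    with the highest input-dependent scores participate in the product.
    """
    out_dim, in_dim = W.shape
    assert out_dim % block == 0 and in_dim % block == 0
    n_out, n_in = out_dim // block, in_dim // block
    # Input-dependent tile score: energy of the input block times tile norm
    # (a crude stand-in for a small learned controller network).
    x_energy = np.array([np.linalg.norm(x[j*block:(j+1)*block]) for j in range(n_in)])
    scores = np.empty((n_out, n_in))
    for i in range(n_out):
        for j in range(n_in):
            tile = W[i*block:(i+1)*block, j*block:(j+1)*block]
            scores[i, j] = np.linalg.norm(tile) * x_energy[j]
    # Keep the top `keep_frac` fraction of tiles; the rest are skipped.
    k = max(1, int(keep_frac * n_out * n_in))
    thresh = np.sort(scores, axis=None)[-k]
    y = np.zeros(out_dim)
    for i in range(n_out):
        for j in range(n_in):
            if scores[i, j] >= thresh:
                y[i*block:(i+1)*block] += W[i*block:(i+1)*block, j*block:(j+1)*block] @ x[j*block:(j+1)*block]
    return y

W = np.random.randn(8, 8); x = np.random.randn(8)
print(np.allclose(block_sparse_matvec(W, x, block=4, keep_frac=1.0), W @ x))  # True: dense when all tiles kept
y_half = block_sparse_matvec(W, x, block=4, keep_frac=0.5)                    # half the tiles skipped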

Exploration of block-wise dynamic sparseness

A. Hadifar, J. Deleu, C. Develder and T. Demeester


Pattern Recognit. Lett., Vol. 151, Nov. 2021, pp. 187-192.

@article{hadifar2021prl,
author = {Hadifar, Amir and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {Exploration of block-wise dynamic sparseness},
journal = {Pattern Recognit. Lett.},
month = {Nov.},
year = {2021},
volume = {151},
pages = {187--192},
doi = {10.1016/j.patrec.2021.08.013}
}

pubinproceedings

J.-J. Decorte, J. Van Hautte, T. Demeester and C. Develder, "JobBERT: Understanding job titles through skills", in Proc. Int. Workshop Fair, Effective and Sustainable Talent at ECML-PKDD (FEAST 2021), Bilbao, Spain, 13-17 Sep. 2021.

JobBERT: Understanding job titles through skills

J.-J. Decorte, J. Van Hautte, T. Demeester and C. Develder


in Proc. Int. Workshop Fair, Effective and Sustainable Talent at ECML-PKDD (FEAST 2021), Bilbao, Spain, 13-17 Sep. 2021.

Job titles form a cornerstone of today’s human resources (HR) processes. Within online recruitment, they allow candidates to understand the contents of a vacancy at a glance, while internal HR departments use them to organize and structure many of their processes. As job titles are a compact, convenient, and readily available data source, modeling them with high accuracy can greatly benefit many HR tech applications. In this paper, we propose a neural representation model for job titles, by augmenting a pre-trained language model with co-occurrence information from skill labels extracted from vacancies. Our JobBERT method leads to considerable improvements compared to using generic sentence encoders, for the task of job title normalization, for which we release a new evaluation benchmark.

JobBERT: Understanding job titles through skills

J.-J. Decorte, J. Van Hautte, T. Demeester and C. Develder


in Proc. Int. Workshop Fair, Effective and Sustainable Talent at ECML-PKDD (FEAST 2021), Bilbao, Spain, 13-17 Sep. 2021.

@inproceedings{decorte2021feast,
author = {Decorte, Jens-Joris and Van Hautte, Jeroen and Demeester, Thomas and Develder, Chris},
title = {JobBERT: Understanding job titles through skills},
booktitle = {Proc. Int. Workshop Fair, Effective and Sustainable Talent at ECML-PKDD (FEAST 2021)},
month = {13--17 Sep.},
year = {2021},
address = {Bilbao, Spain}
}

pubinproceedings

S. Verlinden, K. Zaporojets, J. Deleu, T. Demeester and C. Develder, "Injecting knowledge base information into end-to-end joint entity and relation extraction and coreference resolution", in Findings of the ACL: ACL-IJCNLP 2021, Bangkok, Thailand, 1-6 Aug. 2021.

Injecting knowledge base information into end-to-end joint entity and relation extraction and coreference resolution

S. Verlinden, K. Zaporojets, J. Deleu, T. Demeester and C. Develder


in Findings of the ACL: ACL-IJCNLP 2021, Bangkok, Thailand, 1-6 Aug. 2021.

We consider a joint information extraction (IE) model, solving named entity recognition, coreference resolution and relation extraction jointly over the whole document. In particular, we study how to inject information from a knowledge base (KB) in such an IE model, based on unsupervised entity linking. The used KB entity representations are learned from either (i) hyperlinked text documents (Wikipedia), or (ii) a knowledge graph (Wikidata), and appear complementary in raising IE performance. Representations of corresponding entity linking (EL) candidates are added to text span representations of the input document, and we experiment with (i) taking a weighted average of the EL candidate representations based on their prior (in Wikipedia), and (ii) using an attention scheme over the EL candidate list. Results demonstrate an increase of up to 5% F1-score for the evaluated IE tasks on two datasets. Despite a strong performance of the prior-based model, our quantitative and qualitative analysis reveals the advantage of using the attention-based approach.
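
The following NumPy sketch illustrates the two described injection schemes for a single mention, under invented toy dimensions; it is a reading of the abstract, not the paper's implementation.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 16
span = rng.normal(size=d)                       # text span representation of a mention
cands = rng.normal(size=(5, d))                 # KB embeddings of 5 EL candidates
prior = np.array([0.6, 0.2, 0.1, 0.05, 0.05])   # candidate priors (e.g., from Wikipedia)

# (i) Prior-weighted average of the candidate entity representations.
kb_prior = prior @ cands

# (ii) Attention over the candidate list, conditioned on the span itself.
att = softmax(cands @ span)
kb_att = att @ cands

# Either KB vector is added to the span representation before the IE heads.
enriched = span + kb_att
print(enriched.shape)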

Injecting knowledge base information into end-to-end joint entity and relation extraction and coreference resolution

S. Verlinden, K. Zaporojets, J. Deleu, T. Demeester and C. Develder


in Findings of the ACL: ACL-IJCNLP 2021, Bangkok, Thailand, 1-6 Aug. 2021.

@inproceedings{verlinden2021,
author = {Verlinden, Severine and Zaporojets, Klim and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Injecting knowledge base information into end-to-end joint entity and relation extraction and coreference resolution},
booktitle = {Findings of the ACL: ACL-IJCNLP 2021},
month = {1--6 Aug.},
year = {2021},
address = {Bangkok, Thailand},
doi = {10.18653/v1/2021.findings-acl.171}
}

pubarticle

K. Zaporojets, G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Solving arithmetic word problems by scoring equations with recursive neural networks", Expert Syst. Appl., Vol. 174, 15 Jul. 2021.

Solving arithmetic word problems by scoring equations with recursive neural networks

K. Zaporojets, G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 174, 15 Jul. 2021.

Solving arithmetic word problems is a cornerstone task in assessing language understanding and reasoning capabilities in NLP systems. Recent works use automatic extraction and ranking of candidate solution equations providing the answer to arithmetic word problems. In this work, we explore novel approaches to score such candidate solution equations using tree-structured recursive neural network (Tree-RNN) configurations. The advantage of this Tree-RNN approach over using more established sequential representations is that it can naturally capture the structure of the equations. Our proposed method consists of transforming the mathematical expression of the equation into an expression tree. Further, we encode this tree into a Tree-RNN by using different Tree-LSTM architectures. Experimental results show that our proposed method (i) improves overall performance by more than 3 accuracy points compared to the previous state of the art, and by over 15 points on a subset of problems that require more complex reasoning, and (ii) outperforms sequential LSTMs by 4 accuracy points on such more complex problems.
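
As a rough illustration, the sketch below encodes candidate equations as expression trees with a toy recursive composition function and scores them; the actual model uses trained Tree-LSTM cells rather than this random-weight composition.

import numpy as np

rng = np.random.default_rng(0)
d = 8
leaf_emb = {s: rng.normal(size=d) for s in ['x', '2', '3', '5']}
W = {op: rng.normal(size=(d, 2 * d)) / np.sqrt(2 * d) for op in ['+', '*', '=']}
v = rng.normal(size=d)  # scoring vector

def encode(node):
    """Recursively encode a leaf or an ('op', left, right) expression tree."""
    if isinstance(node, str):
        return leaf_emb[node]
    op, left, right = node
    child = np.concatenate([encode(left), encode(right)])
    return np.tanh(W[op] @ child)

def score(tree):
    return float(v @ encode(tree))

# Two candidate equation trees for the same word problem:
cand1 = ('=', 'x', ('+', ('*', '2', '3'), '5'))
cand2 = ('=', 'x', ('*', '2', ('+', '3', '5')))
print(score(cand1), score(cand2))  # a trained scorer would rank the correct tree higher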

Solving arithmetic word problems by scoring equations with recursive neural networks

K. Zaporojets, G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 174, 15 Jul. 2021.

@article{zaporojets2021eswa,
author = {Zaporojets, Klim and Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Solving arithmetic word problems by scoring equations with recursive neural networks},
journal = {Expert Syst. Appl.},
month = {15 Jul.},
year = {2021},
volume = {174},
doi = {10.1016/j.eswa.2021.114704}
}

pubarticle

K. Zaporojets, J. Deleu, C. Develder and T. Demeester, "DWIE: An entity-centric dataset for multi-task document-level information extraction", Inf. Process. Manag., Vol. 58, No. 4, Jul. 2021.

DWIE: An entity-centric dataset for multi-task document-level information extraction

K. Zaporojets, J. Deleu, C. Develder and T. Demeester


Inf. Process. Manag., Vol. 58, No. 4, Jul. 2021.

This paper presents DWIE, the ‘Deutsche Welle corpus for Information Extraction’, a newly created multi-task dataset that combines four main Information Extraction (IE) annotation subtasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document. This contrasts with currently dominant mention-driven approaches that start from the detection and classification of named entity mentions in individual sentences. Further, DWIE presents two main challenges when building and evaluating IE models for it. First, the use of traditional mention-level evaluation metrics for NER and RE tasks on the entity-centric DWIE dataset can result in measurements dominated by predictions on more frequently mentioned entities. We tackle this issue by proposing a new entity-driven metric that takes into account the number of mentions that compose each of the predicted and ground truth entities. Second, the document-level multi-task annotations require the models to transfer information between entity mentions located in different parts of the document, as well as between different tasks, in a joint learning setting. To realize this, we propose to use graph-based neural message passing techniques between document-level mention spans. Our experiments show an improvement of up to 5.5 F1 percentage points when incorporating neural graph propagation into our joint model. This demonstrates DWIE’s potential to stimulate further research in graph neural networks for representation learning in multi-task IE. We make DWIE publicly available at https://github.com/klimzaporojets/DWIE.

DWIE: An entity-centric dataset for multi-task document-level information extraction

K. Zaporojets, J. Deleu, C. Develder and T. Demeester


Inf. Process. Manag., Vol. 58, No. 4, Jul. 2021.

@article{zaporojets2021dwie,
author = {Zaporojets, Klim and Deleu, Johannes and Develder, Chris and Demeester, Thomas},
title = {DWIE: An entity-centric dataset for multi-task document-level information extraction},
journal = {Inf. Process. Manag.},
month = {Jul.},
year = {2021},
volume = {58},
number = {4},
doi = {10.1016/j.ipm.2021.102563}
}

pubinproceedings

A. Hadifar, S. Labat, V. Hoste, C. Develder and T. Demeester, "A million tweets are worth a few points: Tuning transformers for customer support tasks", in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL 2021), Online, 6-11 Jun. 2021.

A million tweets are worth a few points: Tuning transformers for customer support tasks

A. Hadifar, S. Labat, V. Hoste, C. Develder and T. Demeester


in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL 2021), Online, 6-11 Jun. 2021.

In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully, due to the limited availability of and noise in their datasets. While prior research demonstrated the potential of migrating large open-domain pretrained models for domain-specific tasks, the appropriate (pre)training strategies have not yet been rigorously evaluated in such social media customer service settings, especially under multilingual conditions. We address this gap by (i) collecting a multilingual social media corpus containing customer service conversations (865k tweets), (ii) comparing various pipelines of pretraining and fine-tuning approaches, and (iii) applying them to 5 different end tasks. We show that pretraining a generic multilingual transformer model on our in-domain dataset, before finetuning on specific end tasks, consistently boosts performance, especially in non-English settings.

A million tweets are worth a few points: Tuning transformers for customer support tasks

A. Hadifar, S. Labat, V. Hoste, C. Develder and T. Demeester


in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL 2021), Online, 6-11 Jun. 2021.

@inproceedings{hadifar2021naacl,
author = {Hadifar, Amir and Labat, Sofie and Hoste, Véronique and Develder, Chris and Demeester, Thomas},
title = {A million tweets are worth a few points: Tuning transformers for customer support tasks},
booktitle = {Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL 2021)},
month = {6--11 Jun.},
year = {2021},
address = {Online},
url = {https://www.aclweb.org/anthology/2021.naacl-main.21/}
}

pubinproceedings

Y. Jiang, K. Zaporojets, J. Deleu, T. Demeester and C. Develder, "Recipe instruction semantics corpus (RISeC): Resolving semantic structure and zero anaphora in recipes", in Proc. 1st Conf. Asia-Pacific Chapter of the Assoc. Comput. Linguist. and 10th Int. Joint Conf. Natural Lang. Processing (AACL-IJCNLP 2020), Online, 4-7 Dec. 2020, pp. 821-826.

Recipe instruction semantics corpus (RISeC): Resolving semantic structure and zero anaphora in recipes

Y. Jiang, K. Zaporojets, J. Deleu, T. Demeester and C. Develder


in Proc. 1st Conf. Asia-Pacific Chapter of the Assoc. Comput. Linguist. and 10th Int. Joint Conf. Natural Lang. Processing (AACL-IJCNLP 2020), Online, 4-7 Dec. 2020, pp. 821-826.

We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid a priori pre-defining domain-specific predicates to recognize (e.g., the primitive instructions in MILK) and focus on basic understanding of the expressed semantics rather than directly reducing them to a simplified state representation (e.g., ProPara). We thus frame the semantic comprehension of procedural text such as recipes as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (linking actions to implicit arguments, e.g., results from previous recipe steps). Further, our Recipe Instruction Semantics Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extraction as well as identification of zero anaphora. These basic building blocks can facilitate more advanced downstream applications (e.g., question answering, conversational agents).

Recipe instruction semantics corpus (RISeC): Resolving semantic structure and zero anaphora in recipes

Y. Jiang, K. Zaporojets, J. Deleu, T. Demeester and C. Develder


in Proc. 1st Conf. Asia-Pacific Chapter of the Assoc. Comput. Linguist. and 10th Int. Joint Conf. Natural Lang. Processing (AACL-IJCNLP 2020), Online, 4-7 Dec. 2020, pp. 821-826.

@inproceedings{jiang2020aacl,
author = {Jiang, Yiwei and Zaporojets, Klim and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Recipe instruction semantics corpus (RISeC): Resolving semantic structure and zero anaphora in recipes},
booktitle = {Proc. 1st Conf. Asia-Pacific Chapter of the Assoc. Comput. Linguist. and 10th Int. Joint Conf. Natural Lang. Processing (AACL-IJCNLP 2020)},
month = {4--7 Dec.},
year = {2020},
pages = {821--826},
address = {Online},
url = {https://www.aclweb.org/anthology/2020.aacl-main.82}
}

pubinproceedings

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Adversarial perturbations for joint entity and relation extraction", in Proc. 28th Belgian Dutch Conf. Machine Learn. (BeneLearn 2019), Brussels, Belgium, 6-8 Nov. 2019.

Adversarial perturbations for joint entity and relation extraction

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. 28th Belgian Dutch Conf. Machine Learn. (BeneLearn 2019), Brussels, Belgium, 6-8 Nov. 2019.

The goal of the entity recognition and relation extraction task is to discover relational structures of entity mentions from unstructured texts. It is a central problem in information extraction, since it is critical for tasks such as knowledge base population and question answering. In this work, we focus on extending the training procedure of our newly proposed general purpose joint model [4] for entity recognition and relation extraction with adversarial training (AT) [2]. Our model performs the two tasks of entity recognition and relation extraction simultaneously. It achieves state-of-the-art performance in a number of different contexts (i.e., news, biomedical, real estate) and languages (i.e., English, Dutch) without relying on any manually engineered features or additional NLP tools. In summary, our proposed model (i) does not rely on external NLP tools or hand-crafted features, (ii) extracts entities and relations within the same text fragment (typically a sentence) simultaneously, and (iii) allows an entity to be involved in multiple relations at once. To evaluate the proposed AT method, we perform the same set of experiments while applying AT on top of our joint model. Compared to the baseline model, applying AT during training leads to a consistent additional increase in joint extraction effectiveness.

Adversarial perturbations for joint entity and relation extraction

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. 28th Belgian Dutch Conf. Machine Learn. (BeneLearn 2019), Brussels, Belgium, 6-8 Nov. 2019.

@inproceedings{bekoulis2019benelearn,
author = {Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Adversarial perturbations for joint entity and relation extraction},
booktitle = {Proc. 28th Belgian Dutch Conf. Machine Learn. (BeneLearn 2019)},
month = {6--8 Nov.},
year = {2019},
address = {Brussels, Belgium},
url = {http://ceur-ws.org/Vol-2491/abstract5.pdf}
}

pubinproceedings

L. De Raedt, R. Manhaeve, S. Dumančić, T. Demeester and A. Kimmig, "Neuro-Symbolic = Neural + Logical + Probabilistic", in Proc. 14th Int. Workshop Neural-Symbolic Learn. and Reasoning (NeSy 2019 @ IJCAI 2019), Macao, China, 12 Aug. 2019.

Neuro-Symbolic = Neural + Logical + Probabilistic

L. De Raedt, R. Manhaeve, S. Dumančić, T. Demeester and A. Kimmig


in Proc. 14th Int. Workshop Neural-Symbolic Learn. and Reasoning (NeSy 2019 @ IJCAI 2019), Macao, China, 12 Aug. 2019.

The overall goal of neuro-symbolic computation is to integrate high-level reasoning with low-level perception. We argue (1) that neuro-symbolic computation should integrate neural networks with the two most prominent methods for reasoning, that is, logic and probability, and (2) that neuro-symbolic integrated methods should have the pure neural, logical and probabilistic methods as special cases. We examine the state-of-the-art with regard to these claims and briefly position our own contribution DeepProbLog in this perspective.

Neuro-Symbolic = Neural + Logical + Probabilistic

L. De Raedt, R. Manhaeve, S. Dumančić, T. Demeester and A. Kimmig


in Proc. 14th Int. Workshop Neural-Symbolic Learn. and Reasoning (NeSy 2019 @ IJCAI 2019), Macao, China, 12 Aug. 2019.

@inproceedings{deraedt2019,
author = {De Raedt, Luc and Manhaeve, Robin and Dumančić, Sebastijan and Demeester, Thomas and Kimmig, Angelika},
title = {Neuro-Symbolic = Neural + Logical + Probabilistic},
booktitle = {Proc. 14th Int. Workshop Neural-Symbolic Learn. and Reasoning (NeSy 2019 @ IJCAI 2019)},
month = {12 Aug.},
year = {2019},
address = {Macao, China}
}

pubinproceedings

A. Hadifar, L. Sterckx, T. Demeester and C. Develder, "A self-training approach for short text clustering", in Proc. 4th Workshop Represent. Learn. for NLP (RepL4NLP) at ACL 2019, Florence, Italy, 2 Aug. 2019, pp. 194-199.

A self-training approach for short text clustering

A. Hadifar, L. Sterckx, T. Demeester and C. Develder


in Proc. 4th Workshop Represent. Learn. for NLP (RepL4NLP) at ACL 2019, Florence, Italy, 2 Aug. 2019, pp. 194-199.

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, and then uses assignments from a clustering algorithm as supervision to update weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
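
A compact PyTorch sketch of this style of self-training (in the spirit of DEC-like deep clustering) is given below; the random "sentence embeddings", network sizes, and number of clusters are placeholders, and the paper's autoencoder pretraining step is omitted.

import torch

torch.manual_seed(0)
X = torch.randn(200, 32)                          # stand-ins for pretrained sentence embeddings
enc = torch.nn.Linear(32, 8)                      # encoder to be refined by self-training
centers = torch.nn.Parameter(torch.randn(4, 8))   # cluster centres
opt = torch.optim.Adam(list(enc.parameters()) + [centers], lr=1e-2)

for step in range(50):
    z = enc(X)
    # Soft assignment q: Student's t kernel between points and centres.
    d2 = torch.cdist(z, centers) ** 2
    q = 1.0 / (1.0 + d2)
    q = q / q.sum(dim=1, keepdim=True)
    # Self-training target p: sharpened q, emphasising confident assignments.
    w = q ** 2 / q.sum(dim=0)
    p = (w / w.sum(dim=1, keepdim=True)).detach()
    loss = torch.nn.functional.kl_div(q.log(), p, reduction='batchmean')
    opt.zero_grad(); loss.backward(); opt.step()

print(q.argmax(dim=1)[:10])                       # cluster assignments after refinement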

A self-training approach for short text clustering

A. Hadifar, L. Sterckx, T. Demeester and C. Develder


in Proc. 4th Workshop Represent. Learn. for NLP (RepL4NLP) at ACL 2019, Florence, Italy, 2 Aug. 2019, pp. 194-199.

@inproceedings{hadifar2019repl4nlp,
author = {Hadifar, Amir and Sterckx, Lucas and Demeester, Thomas and Develder, Chris},
title = {A self-training approach for short text clustering},
booktitle = {Proc. 4th Workshop Represent. Learn. for NLP (RepL4NLP) at ACL 2019},
month = {2 Aug.},
year = {2019},
pages = {194--199},
address = {Florence, Italy},
url = {https://www.aclweb.org/anthology/papers/W/W19/W19-4322/}
}

pubarticle

C. De Boom, T. Demeester and B. Dhoedt, "Character-level recurrent neural networks in practice: comparing training and sampling schemes", Neural Comput. & Applic., Vol. 31, No. 8, Aug. 2019, pp. 4001-4017.

Character-level recurrent neural networks in practice: comparing training and sampling schemes

C. De Boom, T. Demeester and B. Dhoedt


Neural Comput. & Applic., Vol. 31, No. 8, Aug. 2019, pp. 4001-4017.

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm that is commonly used to train these networks on specific tasks. Many deep learning frameworks have their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other possibilities to choose from and other parameters to tune. In the existing literature, this is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these different schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states for correctly initializing the model on subsequences often leads to unstable training behavior, depending on the dataset.
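
For concreteness, here is a minimal PyTorch sketch of one of the discussed choices: training a character-level model on consecutive subsequences while transferring the (detached) hidden state between them instead of re-initializing it to zeros; all hyperparameters are arbitrary toy values.

import torch

torch.manual_seed(0)
text = "hello world, hello recurrent world! " * 50
vocab = sorted(set(text))
idx = torch.tensor([vocab.index(c) for c in text])

emb = torch.nn.Embedding(len(vocab), 16)
rnn = torch.nn.GRU(16, 32, batch_first=True)
out = torch.nn.Linear(32, len(vocab))
opt = torch.optim.Adam([*emb.parameters(), *rnn.parameters(), *out.parameters()])

seq_len, h = 20, None
for start in range(0, len(idx) - seq_len - 1, seq_len):
    x = idx[start:start + seq_len].unsqueeze(0)       # input characters
    y = idx[start + 1:start + seq_len + 1].unsqueeze(0)  # next-character targets
    # State-transfer scheme: carry the detached hidden state into the next
    # subsequence, so the model sees a correctly initialized context.
    o, h = rnn(emb(x), None if h is None else h.detach())
    loss = torch.nn.functional.cross_entropy(out(o).transpose(1, 2), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))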

Character-level recurrent neural networks in practice: comparing training and sampling schemes

C. De Boom, T. Demeester and B. Dhoedt


Neural Comput. & Applic., Vol. 31, No. 8, Aug. 2019, pp. 4001-4017.

@article{deboom2019,
author = {De Boom, Cedric and Demeester, Thomas and Dhoedt, Bart},
title = {Character-level recurrent neural networks in practice: comparing training and sampling schemes},
journal = {Neural Comput. & Applic.},
month = {Aug.},
year = {2019},
volume = {31},
number = {8},
pages = {4001--4017},
doi = {10.1007/s00521-017-3322-z}
}

pubinproceedings

S.K. Bitew, G. Bekoulis, J. Deleu, L. Sterckx, K. Zaporojets, T. Demeester and C. Develder, "Predicting suicide risk from online postings in Reddit: The UGent-IDLab submission to the CLPsych 2019 Shared Task A", in Proc. 6th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2019) at NAACL-HLT 2019, Minneapolis, MN, USA, 6 Jun. 2019, pp. 158-161.

Predicting suicide risk from online postings in Reddit: The UGent-IDLab submission to the CLPsych 2019 Shared Task A

S.K. Bitew, G. Bekoulis, J. Deleu, L. Sterckx, K. Zaporojets, T. Demeester and C. Develder


in Proc. 6th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2019) at NAACL-HLT 2019, Minneapolis, MN, USA, 6 Jun. 2019, pp. 158-161.

This paper describes IDLab's text classification systems submitted to Task A as part of the CLPsych 2019 shared task. The aim of this shared task was to develop automated systems that predict the degree of suicide risk of people based on their posts on Reddit. Bag-of-words features, emotion features and post-level predictions are used to derive user-level predictions. Linear models and ensembles of these models are used to predict final scores. We find that predicting fine-grained risk levels is much more difficult than flagging potentially at-risk users. Furthermore, we do not find clear added value from building richer ensembles compared to simple baselines, given the available training data and the nature of the prediction task.

Predicting suicide risk from online postings in Reddit: The UGent-IDLab submission to the CLPsych 2019 Shared Task A

S.K. Bitew, G. Bekoulis, J. Deleu, L. Sterckx, K. Zaporojets, T. Demeester and C. Develder


in Proc. 6th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2019) at NAACL-HLT 2019, Minneapolis, MN, USA, 6 Jun. 2019, pp. 158-161.

@inproceedings{bitew2019clpsych,
author = {Bitew, Semere Kiros and Bekoulis, Giannis and Deleu, Johannes and Sterckx, Lucas and Zaporojets, Klim and Demeester, Thomas and Develder, Chris},
title = {Predicting suicide risk from online postings in Reddit: The UGent-IDLab submission to the CLPsych 2019 Shared Task A},
booktitle = {Proc. 6th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2019) at NAACL-HLT 2019},
month = {6 Jun.},
year = {2019},
pages = {158--161},
address = {Minneapolis, MN, USA},
url = {https://www.aclweb.org/anthology/papers/W/W19/W19-3019/},
doi = {10.18653/v1/W19-3019}
}

pubinproceedings

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Sub-event detection from Twitter streams as a sequence labeling problem", in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL-HLT 2019), Minneapolis, MN, USA, 3-5 Jun. 2019, pp. 745-750.

Sub-event detection from Twitter streams as a sequence labeling problem

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL-HLT 2019), Minneapolis, MN, USA, 3-5 Jun. 2019, pp. 745-750.

This paper introduces improved methods for sub-event detection in social media streams, by applying neural sequence models not only on the level of individual posts, but also directly on the stream level. Current approaches to identify sub-events within a given event (e.g., a goal during a soccer match) essentially do not exploit the sequential nature of social media streams. We address this shortcoming by framing the sub-event detection problem in social media streams as a sequence labeling task and adopt a neural sequence architecture that explicitly accounts for the chronological order of posts. Specifically, we (i) establish a neural baseline that outperforms a graph-based state-of-the-art method for binary sub-event detection (2.7% F1 improvement), as well as (ii) demonstrate superiority of a recurrent neural network model on the post sequence level for labeled sub-events (2.4% F1 improvement over non-sequential models).
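
The sketch below (PyTorch, with random stand-ins for post representations) shows the core framing: a bidirectional recurrent model labels each post in chronological order, so sub-event decisions can depend on neighbouring posts; it is illustrative only.

import torch

torch.manual_seed(0)
posts = torch.randn(1, 120, 32)           # one match: 120 time-binned tweet representations
labels = torch.randint(0, 2, (1, 120))    # 1 = inside a sub-event, 0 = outside

# Sequence labeling over the *stream*: a BiLSTM sees posts in chronological
# order, so context from neighbouring posts informs each per-post decision.
lstm = torch.nn.LSTM(32, 16, bidirectional=True, batch_first=True)
head = torch.nn.Linear(32, 2)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

for _ in range(5):
    h, _ = lstm(posts)
    loss = torch.nn.functional.cross_entropy(head(h).transpose(1, 2), labels)
    opt.zero_grad(); loss.backward(); opt.step()

print(head(h).argmax(-1)[0, :10])         # predicted per-post sub-event labels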

Sub-event detection from Twitter streams as a sequence labeling problem

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL-HLT 2019), Minneapolis, MN, USA, 3-5 Jun. 2019, pp. 745-750.

@inproceedings{Bekoulis2019NAACL,
author = {Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Sub-event detection from Twitter streams as a sequence labeling problem},
booktitle = {Proc. Ann. Conf. North American Chapter Assoc. Comp. Linguist. (NAACL-HLT 2019)},
month = {3--5 Jun.},
year = {2019},
pages = {745--750},
address = {Minneapolis, MN, USA},
url = {https://www.aclweb.org/anthology/papers/N/N19/N19-1081/},
doi = {10.18653/v1/N19-1081}
}

pubarticle

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Joint entity recognition and relation extraction as a multi-head selection problem", Expert Syst. Appl., Vol. 114, Dec. 2018, pp. 34-45.

Joint entity recognition and relation extraction as a multi-head selection problem

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 114, Dec. 2018, pp. 34-45.

State-of-the-art models for joint entity recognition and relation extraction strongly rely on external natural language processing (NLP) tools such as POS (part-of-speech) taggers and dependency parsers. Thus, the performance of such joint models depends on the quality of the features obtained from these NLP tools. However, these features are not always accurate for various languages and contexts. In this paper, we propose a joint neural model which performs entity recognition and relation extraction simultaneously, without the need for any manually extracted features or the use of any external tool. Specifically, we model the entity recognition task using a CRF (Conditional Random Fields) layer and the relation extraction task as a multi-head selection problem (i.e., potentially identifying multiple relations for each entity). We present an extensive experimental setup, to demonstrate the effectiveness of our method using datasets from various contexts (i.e., news, biomedical, real estate) and languages (i.e., English, Dutch). Our model outperforms the previous neural models that use automatically extracted features, while it performs within a reasonable margin of feature-based neural models, or even beats them.
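
To make the multi-head selection formulation concrete, the NumPy sketch below scores every (head, relation) pair for each token with an independent sigmoid, so a token can participate in several relations at once; the bilinear scorer and toy sizes are assumptions for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_tokens, d, n_rel = 5, 8, 3           # toy sentence length, hidden size, relation types

H = rng.normal(size=(n_tokens, d))     # token encodings (e.g., from a BiLSTM)
U = rng.normal(size=(n_rel, d, d))     # one bilinear scorer per relation type

# Multi-head selection: for every token i, score every (head j, relation r)
# pair independently, rather than forcing a single head per token.
scores = np.einsum('id,rde,je->ijr', H, U, H)
probs = sigmoid(scores)
pred = probs > 0.5                     # pred[i, j, r]: token i has head j via relation r
print(pred.shape)                      # (5, 5, 3)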

Joint entity recognition and relation extraction as a multi-head selection problem

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 114, Dec. 2018, pp. 34-45.

@article{bekoulis2018eswa2,
author = {Giannis Bekoulis and Johannes Deleu and Thomas Demeester and Chris Develder},
title = {Joint entity recognition and relation extraction as a multi-head selection problem},
journal = {Expert Syst. Appl.},
month = {Dec.},
year = {2018},
volume = {114},
pages = {34--45},
doi = {10.1016/j.eswa.2018.07.032}
}

pubinproceedings

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Adversarial training for multi-context joint entity and relation extraction", in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 2830-36.

Adversarial training for multi-context joint entity and relation extraction

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 2830-2836.

Adversarial training (AT) is a regularization method that can be used to improve the robustness of neural network methods by adding small perturbations in the training data. We show how to use AT for the tasks of entity recognition and relation extraction. In particular, we demonstrate that applying AT to a general purpose baseline model for jointly extracting entities and relations, allows improving the state-of-the-art effectiveness on several datasets in different contexts (i.e., news, biomedical, and real estate data) and for different languages (English and Dutch).
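
A minimal sketch of the AT regularizer described above, assuming a fast-gradient-style perturbation of the word embeddings (the model, loss function, and epsilon are placeholders, and the paper's exact perturbation scaling may differ):

import torch

def adversarial_loss(model, embeddings, labels, loss_fn, epsilon=1e-2):
    # Assumption: `model` maps embedding tensors directly to logits.
    embeddings = embeddings.detach().requires_grad_(True)
    loss = loss_fn(model(embeddings), labels)
    grad, = torch.autograd.grad(loss, embeddings)
    # Small worst-case perturbation in the direction of the loss gradient.
    perturbation = epsilon * grad / (grad.norm() + 1e-12)
    adv_loss = loss_fn(model(embeddings + perturbation), labels)
    return loss + adv_loss   # joint objective on clean + adversarial input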

Adversarial training for multi-context joint entity and relation extraction

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 2830-2836.

@inproceedings{bekoulis2018emnlp,
author = {Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Adversarial training for multi-context joint entity and relation extraction},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018)},
month = {31 Oct. -- 4 Nov.},
year = {2018},
pages = {2830--2836},
address = {Brussels, Belgium},
url = {https://www.aclweb.org/anthology/papers/D/D18/D18-1307/},
doi = {10.18653/v1/D18-1307}
}

pubinproceedings

F. Godin, K. Demuynck, J. Dambre, W. De Neve and T. Demeester, "Explaining character-aware neural networks for word-level prediction: Do they discover linguistic rules?", in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 3275-3284.

Explaining character-aware neural networks for word-level prediction: Do they discover linguistic rules?

F. Godin, K. Demuynck, J. Dambre, W. De Neve and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 3275-3284.

Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns those models learn. Moreover, models are often compared only quantitatively while a qualitative analysis is missing. In this paper, we investigate which character-level patterns neural networks learn and if those patterns coincide with manually-defined word segmentations and annotations. To that end, we extend the contextual decomposition technique (Murdoch et al. 2018) to convolutional neural networks which allows us to compare convolutional neural networks and bidirectional long short-term memory networks. We evaluate and compare these models for the task of morphological tagging on three morphologically different languages and show that these models implicitly discover understandable linguistic rules.

Explaining character-aware neural networks for word-level prediction: Do they discover linguistic rules?

F. Godin, K. Demuynck, J. Dambre, W. De Neve and T. Demeester


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018), Brussels, Belgium, 31 Oct. - 4 Nov. 2018, pp. 3275-3284.

@inproceedings{godin2018emnlp,
author = {Godin, Frederic and Demuynck, Kris and Dambre, Joni and De Neve, Wesley and Demeester, Thomas},
title = {Explaining character-aware neural networks for word-level prediction: Do they discover linguistic rules?},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2018)},
month = {31 Oct. -- 4 Nov.},
year = {2018},
pages = {3275--3284},
address = {Brussels, Belgium},
url = {https://www.aclweb.org/anthology/D18-1365},
doi = {10.18653/v1/D18-1365}
}

pubinproceedings

T. Demeester, J. Deleu, F. Godin and C. Develder, "Predefined sparseness in recurrent sequence models", in Proc. SIGNLL Conf. Comput. Lang. Learn. (CoNLL 2018), Brussels, Belgium, 31 Oct. - 1 Nov. 2018, pp. 324-333.

Predefined sparseness in recurrent sequence models

T. Demeester, J. Deleu, F. Godin and C. Develder


in Proc. SIGNLL Conf. Comput. Lang. Learn. (CoNLL 2018), Brussels, Belgium, 31 Oct. - 1 Nov. 2018, pp. 324-333.

Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters.
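
As an illustration of predefined sparseness in word embeddings, a small sketch under stated assumptions (a frequency-sorted vocabulary and a linear schedule for the number of nonzero dimensions per word; this is not the released implementation): frequent words keep all embedding dimensions, while rare words keep only a short prefix, so the number of effectively trainable parameters drops.

import torch
import torch.nn as nn

class SparseEmbedding(nn.Module):
    def __init__(self, vocab_size=10000, dim=300, min_nnz=30):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(vocab_size, dim) * 0.01)
        # Fixed (non-trainable) mask: word rank 0 is the most frequent and
        # keeps all dims; rarer words keep linearly fewer, down to min_nnz.
        nnz = torch.linspace(dim, min_nnz, vocab_size).long()
        mask = torch.arange(dim)[None, :] < nnz[:, None]
        self.register_buffer("mask", mask.float())

    def forward(self, ids):
        # Masked entries stay zero and receive zero gradient.
        return (self.weight * self.mask)[ids]

emb = SparseEmbedding()
vectors = emb(torch.tensor([3, 9500]))   # frequent vs. rare word lookups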

Predefined sparseness in recurrent sequence models

T. Demeester, J. Deleu, F. Godin and C. Develder


in Proc. SIGNLL Conf. Comput. Lang. Learn. (CoNLL 2018), Brussels, Belgium, 31 Oct. - 1 Nov. 2018, pp. 324-333.

@inproceedings{demeester2018conll,
author = {Demeester, Thomas and Deleu, Johannes and Godin, Frederic and Develder, Chris},
title = {Predefined sparseness in recurrent sequence models},
booktitle = {Proc. SIGNLL Conf. Comput. Lang. Learn. (CoNLL 2018)},
month = {31 Oct. -- 1 Nov.},
year = {2018},
pages = {324--333},
address = {Brussels, Belgium},
url = {https://www.aclweb.org/anthology/papers/K/K18/K18-1032/},
doi = {10.18653/v1/K18-1032}
}

pubarticle

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "An attentive neural architecture for joint segmentation and parsing and its application to real estate ads", Expert Syst. Appl., Vol. 102, 15 Jul. 2018, pp. 100-112.

An attentive neural architecture for joint segmentation and parsing and its application to real estate ads

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 102, 15 Jul. 2018, pp. 100-112.

In processing human produced text using natural language processing (NLP) techniques, two fundamental subtasks that arise are (i) segmentation of the plain text into meaningful subunits (e.g., entities), and (ii) dependency parsing, to establish relations between subunits. Such structural interpretation of text provides essential building blocks for upstream expert system tasks: e.g., from interpreting textual real estate ads, one may want to provide an accurate price estimate and/or provide selection filters for end users looking for a particular property -- which all could rely on knowing the types and number of rooms, etc. In this paper we develop a relatively simple and effective neural joint model that performs both segmentation and dependency parsing together, instead of one after the other as in most state-of-the-art works. We will focus in particular on the real estate ad setting, aiming to convert an ad to a structured description, which we name property tree, comprising the tasks of (1) identifying important entities of a property (e.g., rooms) from classifieds and (2) structuring them into a tree format. In this work, we propose a new joint model that is able to tackle the two tasks simultaneously and construct the property tree by (i) avoiding the error propagation that would arise from the subtasks one after the other in a pipelined fashion, and (ii) exploiting the interactions between the subtasks. For this purpose, we perform an extensive comparative study of the pipeline methods and the new proposed joint model, reporting an improvement of over three percentage points in the overall edge F1 score of the property tree. Also, we propose attention methods, to encourage our model to focus on salient tokens during the construction of the property tree. Thus we experimentally demonstrate the usefulness of attentive neural architectures for the proposed joint model, showcasing a further improvement of two percentage points in edge F1 score for our application. While the results demonstrated are for the particular real estate setting, the model is generic in nature, and thus could be equally applied to other expert system scenarios requiring the general tasks of both (i) detecting entities (segmentation) and (ii) establishing relations among them (dependency parsing).

An attentive neural architecture for joint segmentation and parsing and its application to real estate ads

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


Expert Syst. Appl., Vol. 102, 15 Jul. 2018, pp. 100-112.

@article{bekoulis2018eswa,
author = {Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {An attentive neural architecture for joint segmentation and parsing and its application to real estate ads},
journal = {Expert Syst. Appl.},
month = {15 Jul.},
year = {2018},
volume = {102},
pages = {100--112},
doi = {10.1016/j.eswa.2018.02.031}
}

pubinproceedings

D. Weissenborn, P. Minervini, T. Dettmers, I. Augenstein, J. Welbl, T. Rocktäschel, M. Bosnjak, J. Mitchell, T. Demeester, P. Stenetorp and S. Riedel, "Jack the Reader - A machine reading framework", in Proc. 56th Ann. Meeting Assoc. Comput. Ling. - Demos Track (ACL 2018), Melbourne, Australia, 15-20 Jul. 2018.

Jack the Reader - A machine reading framework

D. Weissenborn, P. Minervini, T. Dettmers, I. Augenstein, J. Welbl, T. Rocktäschel, M. Bosnjak, J. Mitchell, T. Demeester, P. Stenetorp and S. Riedel


in Proc. 56th Ann. Meeting Assoc. Comput. Ling. - Demos Track (ACL 2018), Melbourne, Australia, 15-20 Jul. 2018.

Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping by component reuse, evaluation of new models on existing datasets as well as integrating new datasets and applying them on a growing set of implemented baseline models. Jack is currently supporting (but not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse.

Jack the Reader - A machine reading framework

D. Weissenborn, P. Minervini, T. Dettmers, I. Augenstein, J. Welbl, T. Rocktäschel, M. Bosnjak, J. Mitchell, T. Demeester, P. Stenetorp and S. Riedel


in Proc. 56th Ann. Meeting Assoc. Comput. Ling. - Demos Track (ACL 2018), Melbourne, Australia, 15-20 Jul. 2018.

@inproceedings{weissenborn2018,
author = {Dirk Weissenborn and Pasquale Minervini and Tim Dettmers and Isabelle Augenstein and Johannes Welbl and Tim Rocktäschel and Matko Bosnjak and Jeff Mitchell and Thomas Demeester and Pontus Stenetorp and Sebastian Riedel},
title = {Jack the Reader - A machine reading framework},
booktitle = {Proc. 56th Ann. Meeting Assoc. Comput. Ling. - Demos Track (ACL 2018)},
month = {15--20 Jul.},
year = {2018},
address = {Melbourne, Australia}
}

pubarticle

L. Sterckx, J. Deleu, T. Demeester and C. Develder, "Prior attention for style-aware sequence-to-sequence models", arXiv preprint, 2018.

Prior attention for style-aware sequence-to-sequence models

L. Sterckx, J. Deleu, T. Demeester and C. Develder


arXiv preprint, 2018.

We extend sequence-to-sequence models with the possibility to control the characteristics or style of the generated output, via attention that is generated a priori (before decoding) from a latent code vector. After training an initial attention-based sequence-to-sequence model, we use a variational auto-encoder conditioned on representations of input sequences and a latent code vector space to generate attention matrices. By sampling the code vector from specific regions of this latent space during decoding and imposing prior attention generated from it in the seq2seq model, output can be steered towards having certain attributes. This is demonstrated for the task of sentence simplification, where the latent code vector allows control over output length and lexical simplification, and enables fine-tuning to optimize for different evaluation metrics.

Prior attention for style-aware sequence-to-sequence models

L. Sterckx, J. Deleu, T. Demeester and C. Develder


arXiv preprint, 2018.

@article{sterckx2018arxiv,
author = {Sterckx, Lucas and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Prior attention for style-aware sequence-to-sequence models},
journal = {arXiv preprint},
year = {2018}
}

pubinproceedings

K. Zaporojets, L. Sterckx, J. Deleu, T. Demeester and C. Develder, "Predicting psychological health from childhood essays: The UGent-IDLab CLPsych 2018 shared task system", in Proc. 5th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2018) at NAACL-HLT 2018, New Orleans, LA, USA, 5 Jun. 2018, pp. 119-125.

Predicting psychological health from childhood essays: The UGent-IDLab CLPsych 2018 shared task system

K. Zaporojets, L. Sterckx, J. Deleu, T. Demeester and C. Develder


in Proc. 5th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2018) at NAACL-HLT 2018, New Orleans, LA, USA, 5 Jun. 2018, pp. 119-125.

This paper describes the IDLab system submitted to Task A of the CLPsych 2018 shared task. The goal of this task is predicting psychological health of children based on language used in hand-written essays and socio-demographic control variables. Our entry uses word- and character-based features as well as lexicon-based features and features derived from the essays such as the quality of the language. We apply linear models, gradient boosting as well as neural-network based regressors (feed-forward, CNNs and RNNs) to predict scores. We then make ensembles of our best performing models using a weighted average.
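
The final ensembling step can be pictured with a toy sketch (the model names, predictions, and weights below are invented; how the weights were tuned is not detailed in this abstract):

import numpy as np

preds = {                      # per-model predicted scores (hypothetical values)
    "ridge": np.array([3.1, 2.4, 4.0]),
    "gbm": np.array([2.9, 2.6, 4.2]),
    "cnn": np.array([3.3, 2.2, 3.8]),
}
weights = {"ridge": 0.2, "gbm": 0.5, "cnn": 0.3}   # weights sum to 1
ensemble = sum(w * preds[name] for name, w in weights.items())
print(ensemble)                # weighted-average ensemble prediction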

Predicting psychological health from childhood essays: The UGent-IDLab CLPsych 2018 shared task system

K. Zaporojets, L. Sterckx, J. Deleu, T. Demeester and C. Develder


in Proc. 5th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2018) at NAACL-HLT 2018, New Orleans, LA, USA, 5 Jun. 2018, pp. 119-125.

@inproceedings{zaporojets2018clpsych,
author = {Zaporojets, Klim and Sterckx, Lucas and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Predicting psychological health from childhood essays: The UGent-IDLab CLPsych 2018 shared task system},
booktitle = {Proc. 5th Ann. Workshop on Comput. Ling. Clin. Psychol. (CLPsych 2018) at NAACL-HLT 2018},
month = {5 Jun.},
year = {2018},
pages = {119--125},
address = {New Orleans, LA, USA},
url = {https://www.aclweb.org/anthology/papers/W/W18/W18-0613/},
doi = {10.18653/v1/W18-0613}
}

pubarticle

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Creation and evaluation of large keyphrase extraction collections with multiple opinions", Lang. Resour. Eval., Vol. 52, No. 2, Jul. 2018, pp. 503-532.

Creation and evaluation of large keyphrase extraction collections with multiple opinions

L. Sterckx, T. Demeester, J. Deleu and C. Develder


Lang. Resour. Eval., Vol. 52, No. 2, Jul. 2018, pp. 503-532.

While several Automatic Keyphrase Extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, (iii) and experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large user panel. Our user study shows that for a given document there seems to be a large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors.

Creation and evaluation of large keyphrase extraction collections with multiple opinions

L. Sterckx, T. Demeester, J. Deleu and C. Develder


Lang. Resour. Eval., Vol. 52, No. 2, Jul. 2018, pp. 503-532.

@article{Sterckx2017LREV,
author = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {Creation and evaluation of large keyphrase extraction collections with multiple opinions},
journal = {Lang. Resour. Eval.},
month = {Jul.},
year = {2018},
volume = {52},
number = {2},
pages = {503--532},
doi = {10.1007/s10579-017-9395-6}
}

pubarticle

C. De Boom, R. Agrawal, S. Hansen, E. Kumar, R. Yon, C.-W. Chen, T. Demeester and B. Dhoedt, "Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales", Multimed. Tools Applic., Vol. 77, Jun. 2018, pp. 15385-15407.

Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales

C. De Boom, R. Agrawal, S. Hansen, E. Kumar, R. Yon, C.-W. Chen, T. Demeester and B. Dhoedt


Multimed. Tools Applic., Vol. 77, Jun. 2018, pp. 15385-15407.

The amount of content on online music streaming platforms is immense, and most users only access a tiny fraction of this content. Recommender systems are the application of choice to open up the collection to these users. Collaborative filtering has the disadvantage that it relies on explicit ratings, which are often unavailable, and generally disregards the temporal nature of music consumption. On the other hand, item co-occurrence algorithms, such as the recently introduced word2vec-based recommenders, are typically left without an effective user representation. In this paper, we present a new approach to model users through recurrent neural networks by sequentially processing consumed items, represented by any type of embeddings and other context features. This way we obtain semantically rich user representations, which capture a user’s musical taste over time. Our experimental analysis on large-scale user data shows that our model can be used to predict future songs a user will likely listen to, both in the short and long term.

Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales

C. De Boom, R. Agrawal, S. Hansen, E. Kumar, R. Yon, C.-W. Chen, T. Demeester and B. Dhoedt


Multimed. Tools Applic., Vol. 77, Jun. 2018, pp. 15385-15407.

@article{deboom2018,
author = {De Boom, Cedric and Agrawal, Rohan and Hansen, Samantha and Kumar, Esh and Yon, Romain and Chen, Ching-Wei and Demeester, Thomas and Dhoedt, Bart},
title = {Large-scale user modeling with recurrent neural networks for music discovery on multiple time scales},
journal = {Multimed. Tools Applic.},
month = {Jun.},
year = {2018},
volume = {77},
pages = {15385--15407},
doi = {10.1007/s11042-017-5121-z}
}

pubarticle

S. Van Canneyt, P. Leroux, B. Dhoedt and T. Demeester, "Modeling and predicting the popularity of online news based on temporal and content-related features", Multimed. Tools Appl., Vol. 77, No. 1, Jan. 2018, pp. 1409-1436.

Modeling and predicting the popularity of online news based on temporal and content-related features

S. Van Canneyt, P. Leroux, B. Dhoedt and T. Demeester


Multimed. Tools Appl., Vol. 77, No. 1, Jan. 2018, pp. 1409-1436.

As the market of globally available online news is large and still growing, there is a strong competition between online publishers in order to reach the largest possible audience. Therefore an intelligent online publishing strategy is of the highest importance to publishers. A prerequisite for being able to optimize any online strategy, is to have trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and show how the influence of day versus night on the total view patterns can be taken into account to further increase the accuracy, without leading to more complex models. Second, we turn to the prediction of future popularity, given recently published content. By means of a new real-world dataset, we show that the combination of features related to content, meta-data, and the temporal behavior leads to significantly improved predictions, compared to existing approaches which only consider features based on the historical popularity of the considered articles. Whereas traditionally linear regression is used for the application under study, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.
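
A hedged sketch of the final prediction step, with scikit-learn's gradient tree boosting as a stand-in and entirely synthetic features and targets (the paper's actual features combine content, meta-data, and early view-pattern information):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Toy columns standing in for: early views, title length, publication hour.
X = rng.random((500, 3))
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.05, 500)   # toy popularity

model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X[:400], y[:400])
print(model.score(X[400:], y[400:]))   # R^2 on held-out articles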

Modeling and predicting the popularity of online news based on temporal and content-related features

S. Van Canneyt, P. Leroux, B. Dhoedt and T. Demeester


Multimed. Tools Appl., Vol. 77, No. 1, Jan. 2018, pp. 1409-1436.

@article{vancanneyt2018,
author = {Van Canneyt, Steven and Leroux, Philippe and Dhoedt, Bart and Demeester, Thomas},
title = {Modeling and predicting the popularity of online news based on temporal and content-related features},
journal = {Multimed. Tools Appl.},
month = {Jan.},
year = {2018},
volume = {77},
number = {1},
pages = {1409--1436},
doi = {10.1007/s11042-017-4348-z}
}

pubarticle

B. Deygers, K. Van Gorp and T. Demeester, "The B2 level and the dream of a common standard", Lang. Assess. Quarterly, Vol. 15, No. 1, Jan. 2018, pp. 44-58.

The B2 level and the dream of a common standard

B. Deygers, K. Van Gorp and T. Demeester


Lang. Assess. Quarterly, Vol. 15, No. 1, Jan. 2018, pp. 44-58.

In Flanders, Belgium, university admission of undergraduate international L2 students requires a certificate of an accredited test of Dutch. The two main university entrance tests used for certification share highly comparable oral components and CEFR-based oral rating criteria. This article discusses to what extent ratings on the oral components of these tests can be compared. The data used are the ratings of the oral performances of the same 82 candidates on both oral test components, which were administered within the same week. The correlation on the overall scores is high, but lower on the oral test component. Further analyses, including linear regression and multifaceted Rasch analysis, indicate that the B2 level was interpreted differently in the two tests. The results show that using the same language proficiency scales as the basis for rating scale criteria may lead to superficial correspondences or a perceived equivalence but does not necessarily lead to greater comparability of shared criteria. The findings of this study are especially useful for contexts in which different tests use similar criteria that are based on the same descriptors, and comparability is only assumed.

The B2 level and the dream of a common standard

B. Deygers, K. Van Gorp and T. Demeester


Lang. Assess. Quarterly, Vol. 15, No. 1, Jan. 2018, pp. 44-58.

@article{deygers2018,
author = {Deygers, Bart and Van Gorp, Koen and Demeester, Thomas},
title = {The B2 level and the dream of a common standard},
journal = {Lang. Assess. Quarterly},
month = {Jan.},
year = {2018},
volume = {15},
number = {1},
pages = {44--58},
doi = {10.1080/15434303.2017.1421955}
}

pubinproceedings

P. Minervini, T. Demeester, T. Rocktäschel and S. Riedel, "Adversarial sets for regularising neural link predictors", in Proc. 33rd Conf. Uncertainty in Artificial Intelligence (UAI 2017), Sydney, Australia, 11-15 Aug. 2017.

Adversarial sets for regularising neural link predictors

P. Minervini, T. Demeester, T. Rocktäschel and S. Riedel


in Proc. 33rd Conf. Uncertainty in Artificial Intelligence (UAI 2017), Sydney, Australia, 11-15 Aug. 2017.

In adversarial training, a set of models learn together by pursuing competing goals, usually defined on single data instances. However, in relational learning and other non-i.i.d. domains, goals can also be defined over sets of instances. For example, a link predictor for the is-a relation needs to be consistent with the transitivity property: if is-a(x_1, x_2) and is-a(x_2, x_3) hold, is-a(x_1, x_3) needs to hold as well. Here we use such assumptions for deriving an inconsistency loss, measuring the degree to which the model violates the assumptions on an adversarially-generated set of examples. The training objective is defined as a minimax problem, where an adversary finds the most offending adversarial examples by maximising the inconsistency loss, and the model is trained by jointly minimising a supervised loss and the inconsistency loss on the adversarial examples. This yields the first method that can use function-free Horn clauses (as in Datalog) to regularise any neural link predictor, with complexity independent of the domain size. We show that for several link prediction models, the optimisation problem faced by the adversary has efficient closed-form solutions. Experiments on link prediction benchmarks indicate that given suitable prior knowledge, our method can significantly improve neural link predictors on all relevant metrics.
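
A much-simplified sketch of the minimax setup for the transitivity clause (the DistMult-style scorer, dimensions, and plain gradient ascent are our assumptions for illustration; the paper instead derives efficient closed-form solutions for the adversary for several models):

import torch

rel = torch.randn(20, requires_grad=True)          # is-a relation embedding

def score(e1, e2):                                 # DistMult-style scorer (assumption)
    return torch.sigmoid((e1 * rel * e2).sum())

adv = torch.nn.Parameter(torch.randn(3, 20))       # adversarial entity embeddings
opt = torch.optim.Adam([adv], lr=0.1)
for _ in range(10):                                # adversary maximises the violation
    body = torch.min(score(adv[0], adv[1]), score(adv[1], adv[2]))
    inconsistency = torch.relu(body - score(adv[0], adv[2]))
    opt.zero_grad()
    (-inconsistency).backward()                    # gradient ascent on inconsistency
    opt.step()
# In the full objective, the same inconsistency loss would also be minimised
# with respect to the link predictor's parameters (here, rel).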

Adversarial sets for regularising neural link predictors

P. Minervini, T. Demeester, T. Rocktäschel and S. Riedel


in Proc. 33rd Conf. Uncertainty in Artificial Intelligence (UAI 2017), Sydney, Australia, 11-15 Aug. 2017.

@inproceedings{Minervini2017,
author = {Minervini, Pasquale and Demeester, Thomas and Rocktäschel, Tim and Riedel, Sebastian},
title = {Adversarial sets for regularising neural link predictors},
booktitle = {Proc. 33rd Conf. Uncertainty in Artificial Intelligence (UAI 2017)},
month = {11--15 Aug.},
year = {2017},
address = {Sydney, Australia}
}

pubinproceedings

L. Sterckx, J. Naradowsky, B. Byrne, T. Demeester and C. Develder, "Break it down for me: A study in automated lyric annotation", in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2017), Copenhagen, Denmark, 7-11 Sep. 2017, pp. 2064-2070.

Break it down for me: A study in automated lyric annotation

L. Sterckx, J. Naradowsky, B. Byrne, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2017), Copenhagen, Denmark, 7-11 Sep. 2017, pp. 2064-2070.

Comprehending lyrics, as found in songs and poems, can pose a challenge to human and machine readers alike. This motivates the need for systems that can understand the ambiguity and jargon found in such creative texts, and provide commentary to aid readers in reaching the correct interpretation.
We introduce the task of automated lyric annotation (ALA). Like text simplification, a goal of ALA is to rephrase the original text in a more easily understandable manner. However, in ALA the system must often include additional information to clarify niche terminology and abstract concepts. To stimulate research on this task, we release a large collection of crowdsourced annotations for song lyrics. We analyze the performance of translation and retrieval models on this task, measuring performance with both automated and human evaluation. We find that each model captures a unique type of information important to the task.

Break it down for me: A study in automated lyric annotation

L. Sterckx, J. Naradowsky, B. Byrne, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2017), Copenhagen, Denmark, 7-11 Sep. 2017, pp. 2064-2070.

@inproceedings{Sterckx2017EMNLP,
author = {Sterckx, Lucas and Naradowsky, Jason and Byrne, Bill and Demeester, Thomas and Develder, Chris},
title = {Break it down for me: A study in automated lyric annotation},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Processing (EMNLP 2017)},
month = {7--11 Sep.},
year = {2017},
pages = {2064--2070},
address = {Copenhagen, Denmark},
url = {https://www.aclweb.org/anthology/papers/D/D17/D17-1220/},
doi = {10.18653/v1/D17-1220}
}

pubinproceedings

G. Bekoulis, J. Deleu, T. Demeester and C. Develder, "Reconstructing the house from the ad: Structured prediction on real estate classifieds", in Proc. 15th Conf. Eur. Chapter Assoc. Comput. Ling. (EACL 2017), Vol. 2, Valencia, Spain, 3-7 Apr. 2017, pp. 274-279.

Reconstructing the house from the ad: Structured prediction on real estate classifieds

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. 15th Conf. Eur. Chapter Assoc. Comput. Ling. (EACL 2017), Vol. 2, Valencia, Spain, 3-7 Apr. 2017, pp. 274-279.

In this paper, we address the (to the best of our knowledge) new problem of extracting a structured description of real estate properties from their natural language descriptions in classifieds. We survey and present several models to (a) identify important entities of a property (e.g., rooms) from classifieds and (b) structure them into a tree format, with the entities as nodes and edges representing a part-of relation. Experiments show that a graph-based system deriving the tree from an initially fully connected entity graph, outperforms a transition-based system starting from only the entity nodes, since it better reconstructs the tree.

Reconstructing the house from the ad: Structured prediction on real estate classifieds

G. Bekoulis, J. Deleu, T. Demeester and C. Develder


in Proc. 15th Conf. Eur. Chapter Assoc. Comput. Ling. (EACL 2017), Vol. 2, Valencia, Spain, 3-7 Apr. 2017, pp. 274-279.

@inproceedings{Bekoulis2017EACL,
author = {Bekoulis, Giannis and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Reconstructing the house from the ad: Structured prediction on real estate classifieds},
booktitle = {Proc. 15th Conf. Eur. Chapter Assoc. Comput. Ling. (EACL 2017), Vol. 2},
month = {3--7 Apr.},
year = {2017},
pages = {274--279},
address = {Valencia, Spain},
url = {https://www.aclweb.org/anthology/papers/E/E17/E17-2044/}
}

pubinproceedings

L. Sterckx, C. Caragea, T. Demeester and C. Develder, "Supervised keyphrase extraction as positive unlabeled learning", in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1924-1929.

Supervised keyphrase extraction as positive unlabeled learning

L. Sterckx, C. Caragea, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1924-1929.

The problem of noisy and unbalanced training data for supervised keyphrase extraction results from the subjectivity of keyphrase assignment, which we quantify by crowdsourcing keyphrases for news and fashion magazine articles with many annotators per document. We show that annotators exhibit substantial disagreement, meaning that single annotator data could lead to very different training sets for supervised keyphrase extractors. Thus, annotations from single authors or readers lead to noisy training data and poor extraction performance of the resulting supervised extractor. We provide a simple but effective solution to still work with such data by reweighting the importance of unlabeled candidate phrases in a two-stage Positive Unlabeled Learning setting. We show that performance of trained keyphrase extractors approximates a classifier trained on articles labeled by multiple annotators, leading to higher average F1 scores and better rankings of keyphrases. We apply this strategy to a variety of test collections from different backgrounds and show improvements over strong baseline models.
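
A toy sketch of the two-stage positive-unlabeled reweighting idea (logistic regression as a stand-in classifier; the synthetic data and the exact weighting scheme are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 0).astype(int)             # toy "true" keyphrase labels
labeled_pos = y.copy()
labeled_pos[rng.random(1000) < 0.5] = 0   # roughly half the positives stay unlabeled

# Stage 1: treat unlabeled candidates as negatives.
stage1 = LogisticRegression().fit(X, labeled_pos)
p = stage1.predict_proba(X)[:, 1]
# Stage 2: unlabeled examples that look positive get a reduced negative weight.
weights = np.where(labeled_pos == 1, 1.0, 1.0 - p)
stage2 = LogisticRegression().fit(X, labeled_pos, sample_weight=weights)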

Supervised keyphrase extraction as positive unlabeled learning

L. Sterckx, C. Caragea, T. Demeester and C. Develder


in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1924-1929.

@inproceedings{sterckx2016emnlp,
author = {Sterckx, Lucas and Caragea, Cornelia and Demeester, Thomas and Develder, Chris},
title = {Supervised keyphrase extraction as positive unlabeled learning},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016)},
month = {1--5 Nov.},
year = {2016},
pages = {1924--1929},
address = {Austin, TX, USA},
url = {https://www.aclweb.org/anthology/papers/D/D16/D16-1198/},
doi = {10.18653/v1/D16-1198}
}

pubinproceedings

T. Demeester, T. Rocktäschel and S. Riedel, "Lifted rule injection for relation embeddings", in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1389-1399.

Lifted rule injection for relation embeddings

T. Demeester, T. Rocktäschel and S. Riedel


in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1389-1399.

Methods based on representation learning currently hold the state-of-the-art in many natural language processing and knowledge base inference tasks. Yet, a major challenge is how to efficiently incorporate commonsense knowledge into such models. A recent approach regularizes relation and entity representations by propositionalization of first-order logic rules. However, propositionalization does not scale beyond domains with only few entities and rules. In this paper we present a highly efficient method for incorporating implication rules into distributed representations for automated knowledge base construction. We map entity-tuple embeddings into an approximately Boolean space and encourage a partial ordering over relation embeddings based on implication rules mined from WordNet. Surprisingly, we find that the strong restriction of the entity-tuple embedding space does not hurt the expressiveness of the model and even acts as a regularizer that improves generalization. By incorporating a few commonsense rules, we achieve an increase of 2 percentage points in mean average precision over a matrix factorization baseline, while observing a negligible increase in runtime.
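
The lifted constraint can be sketched in a few lines (our simplification: for a mined implication body => head, a penalty enforces a component-wise partial order between the two relation embeddings, without ever grounding the rule over entity tuples):

import torch

def implication_loss(r_body, r_head):
    # Zero iff r_head >= r_body in every dimension (the lifted ordering).
    return torch.relu(r_body - r_head).sum()

r_body = torch.randn(50, requires_grad=True)   # illustrative relation embeddings
r_head = torch.randn(50, requires_grad=True)
loss = implication_loss(r_body, r_head)        # added to the main training loss
loss.backward()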

Lifted rule injection for relation embeddings

T. Demeester, T. Rocktäschel and S. Riedel


in Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016), Austin, TX, USA, 1-5 Nov. 2016, pp. 1389-1399.

@inproceedings{demeester2016emnlp,
author = {Demeester, Thomas and Rocktäschel, Tim and Riedel, Sebastian},
title = {Lifted rule injection for relation embeddings},
booktitle = {Proc. Conf. Empirical Methods in Natural Lang. Proc. (EMNLP 2016)},
month = {1--5 Nov.},
year = {2016},
pages = {1389--1399},
address = {Austin, TX, USA},
url = {https://www.aclweb.org/anthology/D16-1146},
doi = {10.18653/v1/D16-1146}
}

pubarticle

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Knowledge base population using semantic label propagation", Knowledge-Based Syst., Vol. 108, Sep. 2016, pp. 79-91.

Knowledge base population using semantic label propagation

L. Sterckx, T. Demeester, J. Deleu and C. Develder


Knowledge-Based Syst., Vol. 108, Sep. 2016, pp. 79-91.

Training relation extractors for the purpose of automated knowledge base population requires the availability of sufficient training data. The amount of manual labeling can be significantly reduced by applying distant supervision, which generates training data by aligning large text corpora with existing knowledge bases. This typically results in a highly noisy training set, where many training sentences do not express the intended relation. In this paper, we propose to combine distant supervision with minimal human supervision by annotating features (in particular shortest dependency paths) rather than complete relation instances. Such feature labeling eliminates noise from the initial training set, resulting in a significant increase of precision at the expense of recall. We further improve on this approach by introducing the Semantic Label Propagation (SLP) method, which uses the similarity between low-dimensional representations of candidate training instances to again extend the (filtered) training set in order to increase recall while maintaining high precision. Our strategy is evaluated on an established test collection designed for knowledge base population (KBP) from the TAC KBP English slot filling task. The experimental results show that SLP leads to substantial performance gains when compared to existing approaches while requiring an almost negligible human annotation effort.
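
A rough sketch of the propagation step (nearest-seed assignment with cosine similarity over synthetic vectors; the representations, the hard 1-nearest-neighbour rule, and the confidence threshold are illustrative assumptions rather than the paper's exact SLP formulation):

import numpy as np

rng = np.random.default_rng(0)
verified_X = rng.normal(size=(20, 50))      # hand-verified training instances
verified_y = rng.integers(0, 2, 20)         # their (clean) labels
candidates = rng.normal(size=(100, 50))     # noisy distant-supervision pool

def cosine_matrix(A, B):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

sim = cosine_matrix(candidates, verified_X)       # (100, 20) similarities
propagated = verified_y[sim.argmax(axis=1)]       # label of most similar seed
keep = sim.max(axis=1) > 0.2                      # keep only confident matches
train_X, train_y = candidates[keep], propagated[keep]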

Knowledge base population using semantic label propagation

L. Sterckx, T. Demeester, J. Deleu and C. Develder


Knowledge-Based Syst., Vol. 108, Sep. 2016, pp. 79-91.

@article{Sterckx2016KBS,
author = {Lucas Sterckx and Thomas Demeester and Johannes Deleu and Chris Develder},
title = {Knowledge base population using semantic label propagation},
journal = {Knowledge-Based Syst.},
month = {Sep.},
year = {2016},
volume = {108},
pages = {79--91},
doi = {10.1016/j.knosys.2016.05.015}
}

pubarticle

C. De Boom, S. Van Canneyt, T. Demeester and B. Dhoedt, "Representation learning for very short texts using weighted word embedding aggregation", Pattern Recogn. Lett., Vol. 80, Sep. 2016, pp. 150-156.

Representation learning for very short texts using weighted word embedding aggregation

C. De Boom, S. Van Canneyt, T. Demeester and B. Dhoedt


Pattern Recogn. Lett., Vol. 80, Sep. 2016, pp. 150-156.

Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, news recommendation, etc. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with the experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.
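
The core weighting scheme can be sketched in a few lines (toy embeddings and idf values; the paper additionally learns the weights with a median-based loss rather than using raw idf):

import numpy as np

emb = {"festival": np.array([0.9, 0.1]), "in": np.array([0.0, 0.2]),
       "ghent": np.array([0.7, 0.5])}             # toy word embeddings
idf = {"festival": 2.3, "in": 0.1, "ghent": 3.0}  # toy idf values

def represent(text):
    # idf-weighted average: rare (informative) words contribute more.
    words = [w for w in text.lower().split() if w in emb]
    weights = np.array([idf[w] for w in words])
    vectors = np.array([emb[w] for w in words])
    return (weights[:, None] * vectors).sum(0) / weights.sum()

print(represent("Festival in Ghent"))   # one vector for the whole fragment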

Representation learning for very short texts using weighted word embedding aggregation

C. De Boom, S. Van Canneyt, T. Demeester and B. Dhoedt


Pattern Recogn. Lett., Vol. 80, Sep. 2016, pp. 150-156.

@article{deboom2016prl,
author = {De Boom, Cedric and Van Canneyt, Steven and Demeester, Thomas and Dhoedt, Bart},
title = {Representation learning for very short texts using weighted word embedding aggregation},
journal = {Pattern Recogn. Lett.},
month = {Sep.},
year = {2016},
volume = {80},
pages = {150--156},
doi = {10.1016/j.patrec.2016.06.012}
}

pubinproceedings

C. De Boom, S. Leroux, S. Bohez, P. Simoens, T. Demeester and B. Dhoedt, "Efficiency evaluation of character-level RNN training schedules", in Proc. ICML 2016 Workshop Data Efficient Machine Learn. (DEML 2016), 24 Jun. 2016.

Efficiency evaluation of character-level RNN training schedules

C. De Boom, S. Leroux, S. Bohez, P. Simoens, T. Demeester and B. Dhoedt


in Proc. ICML 2016 Workshop Data Efficient Machine Learn. (DEML 2016), 24 Jun. 2016.

We present four training and prediction schedules from the same character-level recurrent neural network. The efficiency of these schedules is tested in terms of model effectiveness as a function of training time and amount of training data seen. We show that the choice of training and prediction schedule potentially has a considerable impact on the prediction effectiveness for a given training budget.

Efficiency evaluation of character-level RNN training schedules

C. De Boom, S. Leroux, S. Bohez, P. Simoens, T. Demeester and B. Dhoedt


in Proc. ICML 2016 Workshop Data Efficient Machine Learn. (DEML 2016), 24 Jun. 2016.

@inproceedings{deboom2016deml,
author = {De Boom, Cedric and Leroux, Sam and Bohez, Steven and Simoens, Pieter and Demeester, Thomas and Dhoedt, Bart},
title = {Efficiency evaluation of character-level RNN training schedules},
booktitle = {Proc. ICML 2016 Workshop Data Efficient Machine Learn. (DEML 2016)},
month = {24 Jun.},
year = {2016}
}

pubinproceedings

T. Demeester, T. Rocktäschel and S. Riedel, "Regularizing relation representations by first-order implications", in Proc. 5th Workshop Autom. Knowl. Base Constr. (AKBC 2016), San Diego, CA, USA, 17 Jun. 2016, pp. 75-80.

Regularizing relation representations by first-order implications

T. Demeester, T. Rocktäschel and S. Riedel


in Proc. 5th Workshop Autom. Knowl. Base Constr. (AKBC 2016), San Diego, CA, USA, 17 Jun. 2016, pp. 75-80.

Methods for automated knowledge base construction often rely on trained fixed-length vector representations of relations and entities to predict facts. Recent work showed that such representations can be regularized to inject first-order logic formulae. This enables to incorporate domain-knowledge for improved prediction of facts, especially for uncommon relations. However, current approaches rely on propositionalization of formulae and thus do not scale to large sets of formulae or knowledge bases with many facts. Here we propose a method that imposes first-order constraints directly on relation representations, avoiding costly grounding of formulae. We show that our approach works well for implications between pairs of relations on artificial datasets.

Regularizing relation representations by first-order implications

T. Demeester, T. Rocktäschel and S. Riedel


in Proc. 5th Workshop Autom. Knowl. Base Constr. (AKBC 2016), San Diego, CA, USA, 17 Jun. 2016, pp. 75-80.

@inproceedings{demeester2016akbc,
author = {Demeester, Thomas and Rocktäschel, Tim and Riedel, Sebastian},
title = {Regularizing relation representations by first-order implications},
booktitle = {Proc. 5th Workshop Autom. Knowl. Base Constr. (AKBC 2016)},
month = {17 Jun.},
year = {2016},
pages = {75--80},
address = {San Diego, CA, USA},
url = {https://www.aclweb.org/anthology/W16-1314},
doi = {10.18653/v1/W16-1314}
}

pubinproceedings

B. Vandersmissen, L. Sterckx, T. Demeester, A. Jalalvand, W. De Neve and R. Van de Walle, "An automated end-to-end pipeline for fine-grained video annotation using deep neural networks", in Proc. ACM Int. Conf. Multimedia Retr. (ICMR 2016), New York, NY, USA, 6-9 Jun. 2016.

An automated end-to-end pipeline for fine-grained video annotation using deep neural networks

B. Vandersmissen, L. Sterckx, T. Demeester, A. Jalalvand, W. De Neve and R. Van de Walle


in Proc. ACM Int. Conf. Multimedia Retr. (ICMR 2016), New York, NY, USA, 6-9 Jun. 2016.

The searchability of video content is often limited to the descriptions authors and/or annotators care to provide. The level of description can range from absolutely nothing to fine-grained annotations at the level of frames. Based on these annotations, certain parts of the video content are more searchable than others.
Within the context of the STEAMER project, we developed an innovative end-to-end system that attempts to tackle the problem of unsupervised retrieval of news video content, leveraging multiple information streams and deep neural networks. In particular, we extracted keyphrases and named entities from transcripts, subsequently refining these keyphrases and named entities based on their visual appearance in the news video content. Moreover, to allow for fine-grained frame-level annotations, we temporally located high-confidence keyphrases in the news video content. To that end, we had to tackle challenges such as the automatic construction of training sets and the automatic assessment of keyphrase imageability.
In this paper, we discuss the main components of our end-to-end system, capable of transforming textual and visual information into fine-grained video annotations.

An automated end-to-end pipeline for fine-grained video annotation using deep neural networks

B. Vandersmissen, L. Sterckx, T. Demeester, A. Jalalvand, W. De Neve and R. Van de Walle


in Proc. ACM Int. Conf. Multimedia Retr. (ICMR 2016), New York, NY, USA, 6-9 Jun. 2016.

@inproceedings{vandersmissen2016,
author = {Vandersmissen, Baptist and Sterckx, Lucas and Demeester, Thomas and Jalalvand, Azarakhsh and De Neve, Wesley and Van de Walle, Rik},
title = {An automated end-to-end pipeline for fine-grained video annotation using deep neural networks},
booktitle = {Proc. ACM Int. Conf. Multimedia Retr. (ICMR 2016)},
month = {6--9 Jun.},
year = {2016},
address = {New York, NY, USA},
doi = {10.1145/2911996.2912028}
}

pubarticle

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen and C. Develder, "Predicting relevance based on assessor disagreement: Analysis and practical applications for search evaluation", Inf. Retr., Vol. 19, No. 3, Jun. 2016, pp. 284-312.

Predicting relevance based on assessor disagreement: Analysis and practical applications for search evaluation

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen and C. Develder


Inf. Retr., Vol. 19, No. 3, Jun. 2016, pp. 284-312.

Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users, is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the predicted relevance model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge on the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance, can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain, which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections.

Predicting relevance based on assessor disagreement: Analysis and practical applications for search evaluation

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen and C. Develder


Inf. Retr., Vol. 19, No. 3, Jun. 2016, pp. 284-312.

@article{Demeester2015IR,
author = {Demeester, Thomas and Aly, Robin and Hiemstra, Djoerd and Nguyen, Dong and Develder, Chris},
title = {Predicting relevance based on assessor disagreement: Analysis and practical applications for search evaluation},
journal = {Inf. Retr.},
month = {Jun.},
year = {2016},
volume = {19},
number = {3},
pages = {284--312},
doi = {10.1007/s10791-015-9275-x}
}

pubinproceedings

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Ghent University-IBCN participation in the TAC KBP 2015 cold start slot filling task", in Proc. 8th Text Analysis Conf. (TAC 2015), Gaithersburg, MD, USA, 16-17 Nov. 2015.

Ghent University-IBCN participation in the TAC KBP 2015 cold start slot filling task

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 8th Text Analysis Conf. (TAC 2015), Gaithersburg, MD, USA, 16-17 Nov. 2015.

This paper presents the system of the UGENT IBCN team for the TAC KBP 2015 cold start (slot filling variant) task. This was the team’s second participation. The slot filling system uses distant supervision to generate training data combined with feature labeling and semi-supervision, and two different types of classifiers. We show that the noise reduction step significantly improves precision, and propose an application of word embeddings for slot filling.

Ghent University-IBCN participation in the TAC KBP 2015 cold start slot filling task

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 8th Text Analysis Conf. (TAC 2015), Gaithersburg, MD, USA, 16-17 Nov. 2015.

@inproceedings{sterckx2015tac,
author = {Lucas Sterckx and Thomas Demeester and Johannes Deleu and Chris Develder},
title = {Ghent University-IBCN participation in the TAC KBP 2015 cold start slot filling task},
booktitle = {Proc. 8th Text Analysis Conf. (TAC 2015)},
month = {16--17 Nov.},
year = {2015},
address = {Gaithersburg, MD, USA},
url = {https://tac.nist.gov/publications/2015/participant.papers/TAC2015.UGENT_IBCN.proceedings.pdf}
}

pubinproceedings

C. De Boom, S. Van Canneyt, S. Bohez, T. Demeester and B. Dhoedt, "Learning semantic similarity for very short texts", in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW 2015), Atlantic City, NJ, USA, 15-17 Nov. 2015.

Learning semantic similarity for very short texts

C. De Boom, S. Van Canneyt, S. Bohez, T. Demeester and B. Dhoedt


in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW 2015), Atlantic City, NJ, USA, 15-17 Nov. 2015.

Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms to become able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair short text fragments -- as a concatenation of separate words -- an adequate distributed sentence representation is needed, in existing literature often obtained by naively combining the individual word representations. We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching. This paper investigates the effectiveness of several such naive techniques, as well as traditional tf-idf similarity, for fragments of different lengths. Our main contribution is a first step towards a hybrid method that combines the strength of dense distributed representations -- as opposed to sparse term matching -- with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. Our new approach outperforms the existing techniques in a toy experimental set-up, leading to the conclusion that the combination of word embeddings and tf-idf information might lead to a better model for semantic content within very short text fragments.

Learning semantic similarity for very short texts

C. De Boom, S. Van Canneyt, S. Bohez, T. Demeester and B. Dhoedt


in Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW 2015), Atlantic City, NJ, USA, 15-17 Nov. 2015.

@inproceedings{deboom2015icdmw,
author = {De Boom, Cedric and Van Canneyt, Steven and Bohez, Steven and Demeester, Thomas and Dhoedt, Bart},
title = {Learning semantic similarity for very short texts},
booktitle = {Proc. IEEE Int. Conf. Data Min. Workshop (ICDMW 2015)},
month = {15--17 Nov.},
year = {2015},
address = {Atlantic City, NJ, USA},
doi = {10.1109/ICDMW.2015.86}
}

pubinproceedings

P. Barrio, L. Gravano and C. Develder, "Ranking deep web text collections for scalable information extraction", in Proc. 24th ACM Int. Conf. Inf. Knowl. Management (CIKM 2015), Melbourne, Australia, 19-23 Oct. 2015, pp. 153-162.

Ranking deep web text collections for scalable information extraction

P. Barrio, L. Gravano and C. Develder


in Proc. 24th ACM Int. Conf. Inf. Knowl. Management (CIKM 2015), Melbourne, Australia, 19-23 Oct. 2015, pp. 153-162.

Information extraction (IE) systems discover structured information from natural language text, to enable much richer querying and data mining than possible directly over the unstructured text. Unfortunately, IE is generally a computationally expensive process, and hence improving its efficiency, so that it scales over large volumes of text, is of critical importance. State-of-the-art approaches for scaling the IE process focus on one text collection at a time. These approaches prioritize the extraction effort by learning keyword queries to identify the “useful” documents for the IE task at hand, namely, those that lead to the extraction of structured “tuples.” These approaches, however, do not attempt to predict which text collections are useful for the IE task -- and hence merit further processing -- and which ones will not contribute any useful output -- and hence should be ignored altogether, for efficiency. In this paper, we focus on an especially valuable family of text sources, the so-called deep web collections, whose (remote) contents are only accessible via querying. Specifically, we introduce and study techniques for ranking deep web collections for an IE task, to prioritize the extraction effort by focusing on collections with substantial numbers of useful documents for the task. We study both (adaptations of) state-of-the-art resource selection strategies for distributed information retrieval, as well as IE-specific approaches. Our large-scale experimental evaluation over realistic deep web collections, and for several different IE tasks, shows the merits and limitations of the alternative families of approaches, and provides a roadmap for addressing this critically important building block for efficient, scalable information extraction.

Ranking deep web text collections for scalable information extraction

P. Barrio, L. Gravano and C. Develder


in Proc. 24th ACM Int. Conf. Inf. Knowl. Management (CIKM 2015), Melbourne, Australia, 19-23 Oct. 2015, pp. 153-162.

@inproceedings{Barrio2015CIKM,
author = {Barrio, Pablo and Gravano, Luis and Develder, Chris},
title = {Ranking deep web text collections for scalable information extraction},
booktitle = {Proc. 24th ACM Int. Conf. Inf. Knowl. Management (CIKM 2015)},
month = {19--23 Oct.},
year = {2015},
pages = {153--162},
address = {Melbourne, Australia},
doi = {10.1145/2806416.2806581}
}

pubinproceedings

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "When topic models disagree: Keyphrase extraction with multiple topic models", in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 123-124.

When topic models disagree: Keyphrase extraction with multiple topic models

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 123-124.

We explore how the unsupervised extraction of topic-related keywords benefits from combining multiple topic models. We show that averaging multiple topic models, inferred from different corpora, leads to more accurate keyphrases than when using a single topic model and other state-of-the-art techniques. The experiments confirm the intuitive idea that a prerequisite for the significant benefit of combining multiple models is that the models should be sufficiently different, i.e., they should provide distinct contexts in terms of topical word importance.
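
For illustration only (the paper's exact scoring is not reproduced here), averaging topical word importance over several models could look as follows, with each topic model reduced to a hypothetical dict mapping words to importance scores:

import numpy as np

def averaged_importance(word, models):
    """Mean topical importance of a word over several distinct topic models."""
    return float(np.mean([m.get(word, 0.0) for m in models]))

def score_keyphrase(phrase_words, models):
    # A candidate phrase is scored by the averaged importance of its words;
    # sufficiently different models contribute distinct topical contexts.
    return sum(averaged_importance(w, models) for w in phrase_words)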

When topic models disagree: Keyphrase extraction with multiple topic models

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 123-124.

@inproceedings{Sterckx2015WWWa,
author = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {When topic models disagree: Keyphrase extraction with multiple topic models},
booktitle = {Proc. 24th Int. World Wide Web Conf. (WWW 2015)},
month = {18--22 May},
year = {2015},
pages = {123--124},
address = {Florence, Italy},
doi = {10.1145/2740908.2742731}
}

pubinproceedings

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Topical word importance for fast keyphrase extraction", in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 121-122.

Topical word importance for fast keyphrase extraction

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 121-122.

We propose an improvement on a state-of-the-art keyphrase extraction algorithm, Topical PageRank (TPR), which incorporates topical information from topic models. While the original algorithm requires a random walk for each topic in the topic model being used, ours is independent of the topic model, computing only a single PageRank for each text regardless of the number of topics in the model. This drastically increases speed and enables use on large text collections with vast topic models, without altering the performance of the original algorithm.
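
A minimal sketch of the single-run idea, assuming a networkx word co-occurrence graph per document and a precomputed topical word importance dict (both hypothetical names): the topical information enters once, through the personalization vector, instead of through one random walk per topic.

import networkx as nx

def fast_topical_pagerank(graph, topical_importance, alpha=0.85):
    # One personalized PageRank per document; the small epsilon keeps the
    # personalization vector valid even if some words have zero importance.
    personalization = {n: topical_importance.get(n, 0.0) + 1e-12 for n in graph}
    return nx.pagerank(graph, alpha=alpha, personalization=personalization)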

Topical word importance for fast keyphrase extraction

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015, pp. 121-122.

@inproceedings{Sterckx2015WWWb,
author = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {Topical word importance for fast keyphrase extraction},
booktitle = {Proc. 24th Int. World Wide Web Conf. (WWW 2015)},
month = {18--22 May},
year = {2015},
pages = {121--122},
address = {Florence, Italy},
doi = {10.1145/2740908.2742730}
}

pubinproceedings

T. Demeester, D. Trieschnigg, K. Zhou, D. Nguyen and D. Hiemstra, "FedWeb greatest hits: Presenting the new test collection for federated web search", in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015.

FedWeb greatest hits: Presenting the new test collection for federated web search

T. Demeester, D. Trieschnigg, K. Zhou, D. Nguyen and D. Hiemstra


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015.

This paper presents ‘FedWeb Greatest Hits’, a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as well as on various other problems.

FedWeb greatest hits: Presenting the new test collection for federated web search

T. Demeester, D. Trieschnigg, K. Zhou, D. Nguyen and D. Hiemstra


in Proc. 24th Int. World Wide Web Conf. (WWW 2015), Florence, Italy, 18-22 May 2015.

@inproceedings{demeester2015fedweb,
author = {Demeester, Thomas and Trieschnigg, Dolf and Zhou, Ke and Nguyen, Dong and Hiemstra, Djoerd},
title = {FedWeb greatest hits: Presenting the new test collection for federated web search},
booktitle = {Proc. 24th Int. World Wide Web Conf. (WWW 2015)},
month = {18--22 May},
year = {2015},
address = {Florence, Italy},
doi = {10.1145/2740908.2742755}
}

pubinproceedings

L. Sterckx, T. Demeester, J. Deleu and C. Develder, "Using semantic clustering and active learning for noise reduction in distant supervision", in Proc. 4th Workshop on Automated Knowledge Base Construction (AKBC 2014) at NIPS 2014, Montreal, Canada, 13 Dec. 2014.

Using semantic clustering and active learning for noise reduction in distant supervision

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 4th Workshop on Automated Knowledge Base Construction (AKBC 2014) at NIPS 2014, Montreal, Canada, 13 Dec. 2014.

The use of external databases to generate training data, also known as Distant Supervision, has become an effective way to train supervised relation extractors, but this approach inherently suffers from noise. In this paper we propose a method for noise reduction in distantly supervised training data, using a discriminative classifier and semantic similarity between the contexts of the training examples. We describe an active learning strategy which exploits hierarchical clustering of the candidate training samples. To further improve the effectiveness of this approach, we study the use of several methods for dimensionality reduction of the training samples. We find that semantic clustering of training data combined with cluster-based active learning allows filtering the training data, hence facilitating the creation of a clean training set for relation extraction, at a reduced manual labeling cost.
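
A minimal sketch of the cluster-based selection step, under the assumption that each candidate training example has already been reduced to a (possibly dimensionality-reduced) feature row of a matrix X; the medoid choice and cluster count are illustrative, not the paper's settings.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def representatives_to_label(X, n_clusters=50):
    """Hierarchically cluster candidates; pick one central example per cluster."""
    Z = linkage(X, method="average", metric="cosine")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    picks = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = X[idx].mean(axis=0)
        # The example closest to the cluster centroid goes to the annotator;
        # its label can then be propagated to the rest of the cluster.
        picks.append(int(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]))
    return picks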

Using semantic clustering and active learning for noise reduction in distant supervision

L. Sterckx, T. Demeester, J. Deleu and C. Develder


in Proc. 4th Workshop on Automated Knowledge Base Construction (AKBC 2014) at NIPS 2014, Montreal, Canada, 13 Dec. 2014.

@inproceedings{Sterckx2014AKBC,
author = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {Using semantic clustering and active learning for noise reduction in distant supervision},
booktitle = {Proc. 4th Workshop on Automated Knowledge Base Construction (AKBC 2014) at NIPS 2014},
month = {13 Dec.},
year = {2014},
address = {Montreal, Canada}
}

pubinproceedings

M. Feys, T. Demeester, B. Fortuna, J. Deleu and C. Develder, "On the robustness of event detection evaluation: A case study", in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

On the robustness of event detection evaluation: A case study

M. Feys, T. Demeester, B. Fortuna, J. Deleu and C. Develder


in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

Research on evaluation of IR systems has led to the insight that a robust evaluation strategy requires tests on a large number of events/queries. However, especially for event detection, the number of manually labeled events may be limited. In this paper we investigate how to optimize the evaluation strategy in those cases to maximize robustness. We also introduce two new vector space models for event detection that aim to incorporate bursty information of terms and compare these with existing models. Our experiments show that by using graded relevance levels we can reduce the impact of subjectivity and ambiguity of event detection evaluation. We also show that although user disagreement is significant, it has no real impact on the ranking of the results.

On the robustness of event detection evaluation: A case study

M. Feys, T. Demeester, B. Fortuna, J. Deleu and C. Develder


in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

@inproceedings{Feys2014FIRE,
author = {Feys, Matthias and Demeester, Thomas and Fortuna, Blaz and Deleu, Johannes and Develder, Chris},
title = {On the robustness of event detection evaluation: A case study},
booktitle = {Proc. Forum for Inf. Retr. Evaluation (FIRE 2014)},
month = {5--7 Dec.},
year = {2014},
address = {Bangalore, India}
}

pubinproceedings

L. Mertens, T. Demeester, J. Deleu, M. Feys and C. Develder, "Entity linking: Test collections revisited", in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

Entity linking: Test collections revisited

L. Mertens, T. Demeester, J. Deleu, M. Feys and C. Develder


in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

This paper analyzes two important conditions that are usually taken for granted in the evaluation of information retrieval systems: the test queries should be representative of the intended application scenario, and a sufficient number of queries is needed to robustly assess system performance, as well as to discern performance differences between systems. Both issues have important consequences, as studied in this paper for the specific case of Entity Linking systems. We investigate two methods for automatic query generation, and show them to have a vast impact on evaluated system performance. We further demonstrate the effect a query set's size has on its ability to faithfully distinguish systems, and propose a method for assessing the possible impact that adding a specific number of queries to the set might have on system performance.

Entity linking: Test collections revisited

L. Mertens, T. Demeester, J. Deleu, M. Feys and C. Develder


in Proc. Forum for Inf. Retr. Evaluation (FIRE 2014), Bangalore, India, 5-7 Dec. 2014.

@inproceedings{Mertens2014FIRE,
author = {Mertens, Laurent and Demeester, Thomas and Deleu, Johannes and Feys, Matthias and Develder, Chris},
title = {Entity linking: Test collections revisited},
booktitle = {Proc. Forum for Inf. Retr. Evaluation (FIRE 2014)},
month = {5--7 Dec.},
year = {2014},
address = {Bangalore, India}
}

pubinproceedings

M. Feys, L. Sterckx, L. Mertens, J. Deleu, T. Demeester and C. Develder, "Ghent University-IBCN participation in TAC-KBP 2014 slot filling and cold start tasks", in Proc. 7th Text Analysis Conf. (TAC 2014), Gaithersburg, MD, USA, 17-18 Nov. 2014.

Ghent University-IBCN participation in TAC-KBP 2014 slot filling and cold start tasks

M. Feys, L. Sterckx, L. Mertens, J. Deleu, T. Demeester and C. Develder


in Proc. 7th Text Analysis Conf. (TAC 2014), Gaithersburg, MD, USA, 17-18 Nov. 2014.

This paper presents the system of the UGENT IBCN team for the TAC KBP 2014 slot filling and cold start (slot filling variant) tasks. This was the team’s first participation in both tasks. The slot filling system uses distant supervision to generate training data combined with a noise reduction step, and two different types of classifiers. We show that the noise reduction step significantly improves precision, and propose an application of word embeddings for slot filling.

Ghent University-IBCN participation in TAC-KBP 2014 slot filling and cold start tasks

M. Feys, L. Sterckx, L. Mertens, J. Deleu, T. Demeester and C. Develder


in Proc. 7th Text Analysis Conf. (TAC 2014), Gaithersburg, MD, USA, 17-18 Nov. 2014.

@inproceedings{Feys2014TAC,
author = {Feys, Matthias and Sterckx, Lucas and Mertens, Laurent and Deleu, Johannes and Demeester, Thomas and Develder, Chris},
title = {Ghent University-IBCN participation in TAC-KBP 2014 slot filling and cold start tasks},
booktitle = {Proc. 7th Text Analysis Conf. (TAC 2014)},
month = {17--18 Nov.},
year = {2014},
address = {Gaithersburg, MD, USA}
}

pubinproceedings

K. Zhou, T. Demeester, D. Nguyen, D. Hiemstra and D. Trieschnigg, "Aligning vertical collection relevance with user intent", in Proc. 23rd ACM Int. Conf. Inf. Knowl. Management (CIKM 2014), Shanghai, China, 3-7 Nov. 2014, pp. 1915-1918.

Aligning vertical collection relevance with user intent

K. Zhou, T. Demeester, D. Nguyen, D. Hiemstra and D. Trieschnigg


in Proc. 23rd ACM Int. Conf. Inf. Knowl. Management (CIKM 2014), Shanghai, China, 3-7 Nov. 2014, pp. 1915-1918.

Selecting and aggregating different types of content from multiple vertical search engines is becoming popular in web search. The user vertical intent, the verticals the user expects to be relevant for a particular information need, might not correspond to the vertical collection relevance, the verticals containing the most relevant content. In this work we propose different approaches to define the set of relevant verticals based on document judgments. We correlate the collection-based relevant verticals obtained from these approaches to the real user vertical intent, and show that they can be aligned relatively well. The set of relevant verticals defined by those approaches could therefore serve as an approximate but reliable ground-truth for evaluating vertical selection, avoiding the need for collecting explicit user vertical intent, and vice versa.

Aligning vertical collection relevance with user intent

K. Zhou, T. Demeester, D. Nguyen, D. Hiemstra and D. Trieschnigg


in Proc. 23rd ACM Int. Conf. Inf. Knowl. Management (CIKM 2014), Shanghai, China, 3-7 Nov. 2014, pp. 1915-1918.

@inproceedings{zhou2014,
author = {Zhou, Ke and Demeester, Thomas and Nguyen, Dong and Hiemstra, Djoerd and Trieschnigg, Dolf},
title = {Aligning vertical collection relevance with user intent},
booktitle = {Proc. 23rd ACM Int. Conf. Inf. Knowl. Management (CIKM 2014)},
month = {3--7 Nov.},
year = {2014},
pages = {1915--1918},
address = {Shanghai, China},
doi = {10.1145/2661829.2661941}
}

pubinproceedings

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra, "Overview of the TREC 2014 federated web search track", in Proc. 23rd Text Retr. Conf. (TREC 2014), Gaithersburg, MD, USA, 19-21 Nov. 2014.

Overview of the TREC 2014 federated web search track

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra


in Proc. 23rd Text Retr. Conf. (TREC 2014), Gaithersburg, MD, USA, 19-21 Nov. 2014.

The TREC Federated Web Search track facilitates research on federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 Resource Selection and Results Merging tasks are again included in FedWeb 2014, and we additionally introduced the task of vertical selection. Other new aspects are the required link between the Resource Selection and Results Merging tasks, and the importance of diversity in the merged results. After an overview of the new data collection and relevance judgments, the individual participants’ results for the tasks are introduced, analyzed, and compared.

Overview of the TREC 2014 federated web search track

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra


in Proc. 23rd Text Retr. Conf. (TREC 2014), Gaithersburg, MD, USA, 19-21 Nov. 2014.

@inproceedings{demeester2014trec,
author = {Demeester, Thomas and Trieschnigg, Dolf and Zhou, Ke and Hiemstra, Djoerd},
title = {Overview of the TREC 2014 federated web search track},
booktitle = {Proc. 23rd Text Retr. Conf. (TREC 2014)},
month = {19--21 Nov.},
year = {2014},
address = {Gaithersburg, MD, USA},
url = {https://trec.nist.gov/pubs/trec23/papers/overview-federated.pdf}
}

pubinproceedings

B. Fortuna, T. Demeester and C. Develder, "Towards large-scale event detection and extraction from news", in Proc. Large-scale Online Learn. and Decision Making Workshop (LSOLDM 2014), Windsor, UK, 10-12 Sep. 2014.

Towards large-scale event detection and extraction from news

B. Fortuna, T. Demeester and C. Develder


in Proc. Large-scale Online Learn. and Decision Making Workshop (LSOLDM 2014), Windsor, UK, 10-12 Sep. 2014.

Understanding and reasoning about textual data is one of the important topics in artificial intelligence and is being addressed by various research communities, ranging from knowledge representation, over natural language processing to text mining. Each community provides a different set of often overlapping intuitions, tools and methodologies for working with text.
Event processing from news and social media can be seen as a subtopic of text understanding [1, 2, 3, 4]. It comprises different tasks, including New and Retrospective Event Discovery, Event Type Classification and Event Template Extraction.
Research on event processing requires access to annotated data covering different tasks in the event processing pipeline. Over the last decades, several datasets have been created covering event discovery and event extraction, e.g., [5, 6, 7]. These datasets are rather limited in scope. For example, they contain articles from only a few selected sources, or they contain a limited number of annotated events with a high selection bias (e.g., towards larger or well-defined events like natural disasters or terrorism). Using such limited datasets to evaluate solutions for the event processing tasks may lead to favoring approaches that do not work well on real-world datasets. The main reasons for these limitations are (1) limited access to data resources and (2) the required and expensive manual annotations.
The main contributions we are working towards are (1) a systematic methodology for efficiently creating a large gold standard of manually annotated events over a large corpus of news articles with a realistic distribution over the covered topics and events, and (2) a resulting annotated corpus of 10,000 English general news articles embedded in 31 million news articles.

Towards large-scale event detection and extraction from news

B. Fortuna, T. Demeester and C. Develder


in Proc. Large-scale Online Learn. and Decision Making Workshop (LSOLDM 2014), Windsor, UK, 10-12 Sep. 2014.

@inproceedings{Fortuna2014,
author = {Fortuna, Blaz and Demeester, Thomas and Develder, Chris},
title = {Towards large-scale event detection and extraction from news},
booktitle = {Proc. Large-scale Online Learn. and Decision Making Workshop (LSOLDM 2014)},
month = {10--12 Sep.},
year = {2014},
address = {Windsor, UK}
}

pubinproceedings

L. Sterckx, T. Demeester, J. Deleu, L. Mertens and C. Develder, "Assessing quality of unsupervised topics in song lyrics", in Proc. 36th Eur. Conf. Inf. Retr. (ECIR 2014), Amsterdam, The Netherlands, 13-16 Apr. 2014.

Assessing quality of unsupervised topics in song lyrics

L. Sterckx, T. Demeester, J. Deleu, L. Mertens and C. Develder


in Proc. 36th Eur. Conf. Inf. Retr. (ECIR 2014), Amsterdam, The Netherlands, 13-16 Apr. 2014.

How useful are topic models based on song lyrics for applications in music information retrieval? Unsupervised topic models on text corpora are often difficult to interpret. Based on a large collection of lyrics, we investigate how well automatically generated topics are related to manual topic annotations. We propose to use the kurtosis metric to align unsupervised topics with a reference model of supervised topics. This metric is well-suited for topic assessments, as it turns out to be more strongly correlated with manual topic quality scores than existing measures for semantic coherence. We also show how it can be used for a detailed graphical topic quality assessment.
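
One way to read the kurtosis idea, sketched under our own assumptions (cosine similarity as the alignment measure; topics as word-probability vectors): a topic whose similarity profile against the supervised reference topics is sharply peaked, i.e. has high kurtosis, matches one reference topic cleanly rather than many topics vaguely.

import numpy as np
from scipy.stats import kurtosis

def alignment_kurtosis(topic, reference_topics):
    """Excess kurtosis of a topic's similarity profile against reference topics."""
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    sims = np.array([cosine(topic, ref) for ref in reference_topics])
    return float(kurtosis(sims))  # higher = more peaked = cleaner alignment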

Assessing quality of unsupervised topics in song lyrics

L. Sterckx, T. Demeester, J. Deleu, L. Mertens and C. Develder


in Proc. 36th Eur. Conf. Inf. Retr. (ECIR 2014), Amsterdam, The Netherlands, 13-16 Apr. 2014.

@inproceedings{Sterckx2014ECIR,
author = {Sterckx, Lucas and Demeester, Thomas and Deleu, Johannes and Mertens, Laurent and Develder, Chris},
title = {Assessing quality of unsupervised topics in song lyrics},
booktitle = {Proc. 36th Eur. Conf. Inf. Retr. (ECIR 2014)},
month = {13--16 Apr.},
year = {2014},
address = {Amsterdam, The Netherlands},
doi = {10.1007/978-3-319-06028-6_55}
}

pubinproceedings

S. Van Canneyt, M. Feys, S. Schockaert, T. Demeester, C. Develder and B. Dhoedt, "Detecting newsworthy topics in Twitter", in Proc. 2nd Workshop on Social News on the Web at WWW 2014 (SNOW 2014), Seoul, Korea, 8 Apr. 2014.

Detecting newsworthy topics in Twitter

S. Van Canneyt, M. Feys, S. Schockaert, T. Demeester, C. Develder and B. Dhoedt


in Proc. 2nd Workshop on Social News on the Web at WWW 2014 (SNOW 2014), Seoul, Korea, 8 Apr. 2014.

The task of the SNOW 2014 Data Challenge is to mine Twitter streams to provide journalists a set of headlines and complementary information that summarize the most newsworthy topics for a number of given time intervals. We propose a 4-step approach to solve this. First, a classifier is trained to determine whether a Twitter user is likely to post tweets about newsworthy stories. Second, tweets posted by these users during the time interval of interest are clustered into topics. For this clustering, the cosine similarity between a boosted tf-idf representation of the tweets is used. Third, we use a classifier to estimate the confidence that the obtained topics are newsworthy. Finally, for each obtained newsworthy topic, a descriptive headline is generated together with relevant keywords, tweets and pictures. Experimental results show the effectiveness of the proposed methodology.
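
A minimal sketch of the clustering step (step two), using plain rather than boosted tf-idf and an illustrative distance threshold; `tweets` stands in for the tweets posted by users classified as news-oriented (requires scikit-learn >= 1.2 for the `metric` argument).

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_tweets(tweets, distance_threshold=0.7):
    """Group tweets into candidate topics by tf-idf cosine similarity."""
    X = TfidfVectorizer().fit_transform(tweets).toarray()
    clustering = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold,
        metric="cosine", linkage="average")
    return clustering.fit_predict(X)  # one cluster id per tweet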

Detecting newsworthy topics in Twitter

S. Van Canneyt, M. Feys, S. Schockaert, T. Demeester, C. Develder and B. Dhoedt


in Proc. 2nd Workshop on Social News on the Web at WWW 2014 (SNOW 2014), Seoul, Korea, 8 Apr. 2014.

@inproceedings{vancanneyt2014snow,
author = {Van Canneyt, Steven and Feys, Matthias and Schockaert, Steven and Demeester, Thomas and Develder, Chris and Dhoedt, Bart},
title = {Detecting newsworthy topics in Twitter},
booktitle = {Proc. 2nd Workshop on Social News on the Web at WWW 2014 (SNOW 2014)},
month = {8 Apr.},
year = {2014},
address = {Seoul, Korea}
}

pubarticle

R. Aly, T. Demeester and S. Robertson, "Probabilistic models in IR and their relationships", Inf. Retr., Vol. 17, No. 2, Apr. 2014, pp. 177-201.

Probabilistic models in IR and their relationships

R. Aly, T. Demeester and S. Robertson


Inf. Retr., Vol. 17, No. 2, Apr. 2014, pp. 177-201.

A solid research path towards new information retrieval models is to further develop the theory behind existing models. A profound understanding of these models is therefore essential. In this paper, we revisit probability ranking principle (PRP)-based models, probability of relevance (PR) models, and language models, finding conceptual differences in their definition and interrelationships. The probabilistic model of the PRP has not been explicitly defined previously, but doing so leads to the formulation of two actual principles with different objectives. First, the belief probability ranking principle (BPRP), which considers uncertain relevance between known documents and the current query, and second, the popularity probability ranking principle (PPRP), which considers the probability of relevance of documents among multiple queries with the same features. Our analysis shows how some of the discussed PR models implement the BPRP or the PPRP while others do not. However, for some models the parameter estimation is challenging. Finally, language models are often presented as related to PR models. However, we find that language models differ from PR models in every aspect of a probabilistic model and the effectiveness of language models cannot be explained by the PRP.

Probabilistic models in IR and their relationships

R. Aly, T. Demeester and S. Robertson


Inf. Retr., Vol. 17, No. 2, Apr. 2014, pp. 177-201.

@article{demeester2014ir,
author = {Aly, Robin and Demeester, Thomas and Robertson, Stephen},
title = {Probabilistic models in IR and their relationships},
journal = {Inf. Retr.},
month = {Apr.},
year = {2014},
volume = {17},
number = {2},
pages = {177--201},
doi = {10.1007/s10791-013-9226-3}
}

pubinproceedings

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen, D. Trieschnigg and C. Develder, "Exploiting user disagreement for web search evaluation: An experimental approach", in Proc. 7th ACM Int. Conf. Web Search and Data Min. (WSDM 2014), New York, NY, USA, 24-28 Feb. 2014.

Exploiting user disagreement for web search evaluation: An experimental approach

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen, D. Trieschnigg and C. Develder


in Proc. 7th ACM Int. Conf. Web Search and Data Min. (WSDM 2014), New York, NY, USA, 24-28 Feb. 2014.

In order to express a more nuanced notion of relevance as compared to binary judgments, graded relevance levels can be used for the evaluation of search results. Especially in Web search, users strongly prefer top results over less relevant results, and yet they often disagree on which are the top results for a given information need. This paper proposes a method to capture this user disagreement and integrate it into the evaluation procedure.
First, we present experiments that investigate the user disagreement. After that, a probabilistic model is proposed that results in a weighting of the relevance levels with a probabilistic interpretation. This is followed by a validity analysis, and an explanation of how to integrate the model with well-established evaluation metrics. Finally, we discuss a specific application of the model, in the estimation of suitable combined page and snippet relevance weights from Web search assessments.

Exploiting user disagreement for web search evaluation: An experimental approach

T. Demeester, R. Aly, D. Hiemstra, D. Nguyen, D. Trieschnigg and C. Develder


in Proc. 7th ACM Int. Conf. Web Search and Data Min. (WSDM 2014), New York, NY, USA, 24-28 Feb. 2014.

@inproceedings{Demeester2014WSDM,
author = {Demeester, Thomas and Aly, Robin and Hiemstra, Djoerd and Nguyen, Dong and Trieschnigg, Dolf and Develder, Chris},
title = {Exploiting user disagreement for web search evaluation: An experimental approach},
booktitle = {Proc. 7th ACM Int. Conf. Web Search and Data Min. (WSDM 2014)},
month = {24--28 Feb.},
year = {2014},
address = {New York, NY, USA},
note = {Acceptance rate: 17% (64/376)},
doi = {10.1145/2556195.2556268}
}

pubinproceedings

L. Mertens, T. Demeester, J. Deleu and C. Develder, "UGent participation in the TAC 2013 entity-linking task", in Proc. 6th Text Analysis Conf. (TAC 2013), Gaithersburg, MD, USA, 18-19 Nov. 2013, pp. 1-12.

UGent participation in the TAC 2013 entity-linking task

L. Mertens, T. Demeester, J. Deleu and C. Develder


in Proc. 6th Text Analysis Conf. (TAC 2013), Gaithersburg, MD, USA, 18-19 Nov. 2013, pp. 1-12.

This article describes the system used by the UGent-IBCN team for participating in the Text Analysis Conference (TAC) 2013 English Entity-Linking task. We kept the overall rule-based workflow of our last year's submission, but significantly altered individual components. Most importantly, these changes include improved document pre-processing, new ways of candidate selection, and completely redesigned scoring and NIL-detection mechanisms. Finally, we provide detailed data of our system's performance.

UGent participation in the TAC 2013 entity-linking task

L. Mertens, T. Demeester, J. Deleu and C. Develder


in Proc. 6th Text Analysis Conf. (TAC 2013), Gaithersburg, MD, USA, 18-19 Nov. 2013, pp. 1-12.

@inproceedings{Mertens13_TAC,
author = {Mertens, Laurent and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {UGent participation in the TAC 2013 entity-linking task},
booktitle = {Proc. 6th Text Analysis Conf. (TAC 2013)},
month = {18--19 Nov.},
year = {2013},
pages = {1--12},
address = {Gaithersburg, MD, USA},
url = {https://tac.nist.gov//publications/2013/participant.papers/UGENT_IBCN.TAC2013.proceedings.pdf}
}

pubinproceedings

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra, "Overview of the TREC 2013 federated web search track", in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

Overview of the TREC 2013 federated web search track

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra


in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and to this end provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013. The focus was on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the relevance judgments for the test topics, the participants' individual approaches and results on both tasks are discussed. Promising research directions and an outlook on the 2014 edition of the track are provided as well.

Overview of the TREC 2013 federated web search track

T. Demeester, D. Trieschnigg, K. Zhou and D. Hiemstra


in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

@inproceedings{demeester2013trec,
author = {Demeester, Thomas and Trieschnigg, Dolf and Zhou, Ke and Hiemstra, Djoerd},
title = {Overview of the TREC 2013 federated web search track},
booktitle = {Proc. 22nd Text Retr. Conf. (TREC 2013)},
month = {19--22 Nov.},
year = {2013},
address = {Gaithersburg, MD, USA},
url = {https://trec.nist.gov/pubs/trec22/papers/FEDERATED.OVERVIEW.pdf}
}

pubinproceedings

R. Aly, D. Hiemstra, D. Trieschnigg and T. Demeester, "Mirex and Taily at TREC 2013", in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

Mirex and Taily at TREC 2013

R. Aly, D. Hiemstra, D. Trieschnigg and T. Demeester


in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

We describe the participation of the Lowlands team at the Web Track and the FedWeb track of TREC 2013. For the Web Track we used the Mirex Map-Reduce library with out-of-the-box approaches, and for the FedWeb Track we adapted our shard selection method Taily for resource selection. Here, our results were above the median and close to the maximum performance achieved.

Mirex and Taily at TREC 2013

R. Aly, D. Hiemstra, D. Trieschnigg and T. Demeester


in Proc. 22nd Text Retr. Conf. (TREC 2013), Gaithersburg, MD, USA, 19-22 Nov. 2013.

@inproceedings{aly2013,
author = {Aly, Robin and Hiemstra, Djoerd and Trieschnigg, Dolf and Demeester, Thomas},
title = {Mirex and Taily at TREC 2013},
booktitle = {Proc. 22nd Text Retr. Conf. (TREC 2013)},
month = {19--22 Nov.},
year = {2013},
address = {Gaithersburg, MD, USA},
url = {https://trec.nist.gov/pubs/trec22/papers/lowlands-web-federated.pdf}
}

pubinproceedings

R. Aly, D. Hiemstra and T. Demeester, "Taily: Shard selection using the tail of score distributions", in Proc. 36th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR 2013), Dublin, Ireland, 28 Jul.-1 Aug. 2013, pp. 673-682.

Taily: Shard selection using the tail of score distributions

R. Aly, D. Hiemstra and T. Demeester


in Proc. 36th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR 2013), Dublin, Ireland, 28 Jul.-1 Aug. 2013, pp. 673-682.

Search engines can improve their efficiency by selecting only few promising shards for each query. State-of-the-art shard selection algorithms first query a central index of sampled documents, and their effectiveness is similar to searching all shards. However, the search in the central index also hurts efficiency. Additionally, we show that the effectiveness of these approaches varies substantially with the sampled documents. This paper proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of score distributions based on the mean and variance of the score function's features in the collections and shards. Because Taily operates on term statistics instead of document samples, it is efficient and has deterministic effectiveness. Experiments on large web collections (Gov2, CluewebA and CluewebB) show that Taily achieves similar effectiveness to sample-based approaches, and improves upon their efficiency by roughly 20% in terms of used resources and response time.
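
The core of Taily is a method-of-moments Gamma fit. A minimal sketch, assuming the per-shard mean and variance of a query's document scores are already available (in the paper they are derived from term statistics, not from sampled documents):

from scipy.stats import gamma

def tail_probability(mean, variance, score_cutoff):
    """P(score > cutoff) under a Gamma(k, theta) fit by moments."""
    k = mean ** 2 / variance   # shape, since mean = k * theta
    theta = variance / mean    # scale, since variance = k * theta**2
    return float(gamma.sf(score_cutoff, a=k, scale=theta))

# Shards are then ranked by their estimated number of highly scored documents,
# i.e. the tail probability multiplied by the shard size.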

Taily: Shard selection using the tail of score distributions

R. Aly, D. Hiemstra and T. Demeester


in Proc. 36th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR 2013), Dublin, Ireland, 28 Jul.-1 Aug. 2013, pp. 673-682.

@inproceedings{aly2013sigir,
author = {Aly, Robin and Hiemstra, Djoerd and Demeester, Thomas},
title = {Taily: Shard selection using the tail of score distributions},
booktitle = {Proc. 36th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR 2013)},
month = {28 Jul.--1 Aug.},
year = {2013},
pages = {673--682},
address = {Dublin, Ireland},
doi = {10.1145/2484028.2484033}
}

pubinproceedings

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra, "Snippet-based relevance predictions for federated web search", in Proc. 35th Eur. Conf. Inf. Retr. (ECIR 2013), Moscow, Russia, 24-27 Mar. 2013, pp. 697-700.

Snippet-based relevance predictions for federated web search

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra


in Proc. 35th Eur. Conf. Inf. Retr. (ECIR 2013), Moscow, Russia, 24-27 Mar. 2013, pp. 697-700.

How well can the relevance of a page be predicted, purely based on snippets? This would be highly useful in a Federated Web Search setting where caching large amounts of result snippets is more feasible than caching entire pages. The experiments reported in this paper make use of result snippets and pages from a diverse set of actual Web search engines. A linear classifier is trained to predict the snippet-based user estimate of page relevance, but also, to predict the actual page relevance, again based on snippets alone. The presented results confirm the validity of the proposed approach and provide promising insights into future result merging strategies for a Federated Web Search setting.
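
A minimal sketch of the snippet-only setup, with hypothetical inputs: `snippet_features` is a feature matrix with one row per result snippet, and `page_relevant` holds the judged relevance of the corresponding pages.

from sklearn.linear_model import LogisticRegression

def train_snippet_classifier(snippet_features, page_relevant):
    """Linear classifier predicting page relevance from snippets alone."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(snippet_features, page_relevant)
    # clf.predict_proba(new_snippet_features)[:, 1] then ranks results without
    # fetching any pages, which is what makes caching snippets sufficient.
    return clf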

Snippet-based relevance predictions for federated web search

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra


in Proc. 35th Eur. Conf. Inf. Retr. (ECIR 2013), Moscow, Russia, 24-27 Mar. 2013, pp. 697-700.

@inproceedings{Demeester2013ECIR,
author = {Demeester, Thomas and Nguyen, Dong and Trieschnigg, Dolf and Develder, Chris and Hiemstra, Djoerd},
title = {Snippet-based relevance predictions for federated web search},
booktitle = {Proc. 35th Eur. Conf. Inf. Retr. (ECIR 2013)},
month = {24--27 Mar.},
year = {2013},
pages = {697--700},
address = {Moscow, Russia},
doi = {10.1007/978-3-642-36973-5_63}
}

pubinproceedings

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra, "What snippets say about pages in federated web search", in Proc. 8th Asia Inf. Retr. Soc. Conf. (AIRS 2012), Tianjin, China, 17-19 Dec. 2012.

What snippets say about pages in federated web search

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra


in Proc. 8th Asia Inf. Retr. Soc. Conf. (AIRS 2012), Tianjin, China, 17-19 Dec. 2012.

What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research questions from a global perspective. Our test collection covers the main Web search engines like Google, Yahoo!, and Bing, as well as a number of smaller search engines dedicated to multimedia, shopping, etc., and as such reflects a realistic Web environment.
Using a large set of relevance assessments, we are able to investigate the connection between snippet quality and page relevance. The dataset is strongly inhomogeneous, and although the assessors’ consistency is shown to be satisfying, care is required when comparing resources. To this end, a number of probabilistic quantities, based on snippet and page relevance, are introduced and evaluated.

What snippets say about pages in federated web search

T. Demeester, D. Nguyen, D. Trieschnigg, C. Develder and D. Hiemstra


in Proc. 8th Asia Inf. Retr. Soc. Conf. (AIRS 2012), Tianjin, China, 17-19 Dec. 2012.

@inproceedings{Demeester2012AIRS,
author = {Demeester, Thomas and Nguyen, Dong and Trieschnigg, Dolf and Develder, Chris and Hiemstra, Djoerd},
title = {What snippets say about pages in federated web search},
booktitle = {Proc. 8th Asia Inf. Retr. Soc. Conf. (AIRS 2012)},
month = {17--19 Dec.},
year = {2012},
address = {Tianjin, China},
doi = {10.1007/978-3-642-35341-3_21}
}

pubinproceedings

L. Mertens, T. Demeester, J. Deleu, P. Demeester and C. Develder, "UGent participation in the TAC 2012 entity-linking task", in Proc. 5th Text Analysis Conf. (TAC 2012), Gaithersburg, MD, USA, 14-15 Nov. 2012.

UGent participation in the TAC 2012 entity-linking task

L. Mertens, T. Demeester, J. Deleu, P. Demeester and C. Develder


in Proc. 5th Text Analysis Conf. (TAC 2012), Gaithersburg, MD, USA, 14-15 Nov. 2012.

This article describes in detail the system used by the UGent-IBCN team for participating in the Text Analysis Conference (TAC) 2012 Mono-Lingual Entity-Linking task. The presented system is essentially rule-based, following a generic framework that is highly optimised for each label (i.e. with different rules for persons, organisations, and locations). The main contribution of this work is in identifying a number of label-specific issues and presenting simple heuristic solutions that nonetheless allow building an efficient and effective system. The issues treated include resolving abbreviated organisation names, resolving popular nicknames, and taking into account American vs British spelling.

UGent participation in the TAC 2012 entity-linking task

L. Mertens, T. Demeester, J. Deleu, P. Demeester and C. Develder


in Proc. 5th Text Analysis Conf. (TAC 2012), Gaithersburg, MD, USA, 14-15 Nov. 2012.

@inproceedings{Mertens2012TAC,
author = {Mertens, Laurent and Demeester, Thomas and Deleu, Johannes and Demeester, Piet and Develder, Chris},
title = {UGent participation in the TAC 2012 entity-linking task},
booktitle = {Proc. 5th Text Analysis Conf. (TAC 2012)},
month = {14--15 Nov.},
year = {2012},
address = {Gaithersburg, MD, USA}
}

pubinproceedings

T.H. Van Duc, T. Demeester, J. Deleu and C. Develder, "UGent participation in the Microblog Track 2012", in Proc. 21st Text Retr. Conf. (TREC 2012), Gaithersburg, MD, USA, 6-9 Nov. 2012.

UGent participation in the Microblog Track 2012

T.H. Van Duc, T. Demeester, J. Deleu and C. Develder


in Proc. 21st Text Retr. Conf. (TREC 2012), Gaithersburg, MD, USA, 6-9 Nov. 2012.

In this paper, we describe the search system developed at Ghent University for the TREC 2012 Microblog Track, which ranks Twitter messages or ‘tweets’ from a fixed corpus in response to a number of search requests. Our system ranks the tweets based on a Logistic Regression classifier trained with data from the Microblog Track 2011. The features used for training the classifier include local tweet features, but also query expansion and tweet expansion features based on external Web data, which appear to significantly improve results.

UGent participation in the Microblog Track 2012

T.H. Van Duc, T. Demeester, J. Deleu and C. Develder


in Proc. 21st Text Retr. Conf. (TREC 2012), Gaithersburg, MD, USA, 6-9 Nov. 2012.

@inproceedings{VanDuc2012TREC,
author = {Van Duc, Thong Hoang and Demeester, Thomas and Deleu, Johannes and Develder, Chris},
title = {UGent participation in the Microblog Track 2012},
booktitle = {Proc. 21st Text Retr. Conf. (TREC 2012)},
month = {6--9 Nov.},
year = {2012},
address = {Gaithersburg, MD, USA}
}

pubinproceedings

D. Nguyen, T. Demeester, D. Trieschnigg and D. Hiemstra, "Federated search in the wild: The combined power of over a hundred search engines", in Proc. 21st ACM Int. Conf. Inf. Knowl. Management (CIKM 2012), Maui, HI, USA, 29 Oct.-2 Nov. 2012, pp. 1874-1878.

Federated search in the wild: The combined power of over a hundred search engines

D. Nguyen, T. Demeester, D. Trieschnigg and D. Hiemstra


in Proc. 21st ACM Int. Conf. Inf. Knowl. Management (CIKM 2012), Maui, HI, USA, 29 Oct.-2 Nov. 2012, pp. 1874-1878.

Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for federated search reflecting an actual web environment has been absent. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging and size estimation of uncooperative resources.

Federated search in the wild: The combined power of over a hundred search engines

D. Nguyen, T. Demeester, D. Trieschnigg and D. Hiemstra


in Proc. 21st ACM Int. Conf. Inf. Knowl. Management (CIKM 2012), Maui, HI, USA, 29 Oct.-2 Nov. 2012, pp. 1874-1878.

@inproceedings{nguyen2012cikm,
author = {Nguyen, Dong and Demeester, Thomas and Trieschnigg, Dolf and Hiemstra, Djoerd},
title = {Federated search in the wild: The combined power of over a hundred search engines},
booktitle = {Proc. 21st ACM Int. Conf. Inf. Knowl. Management (CIKM 2012)},
month = {29 Oct.--2 Nov.},
year = {2012},
pages = {1874--1878},
address = {Maui, HI, USA},
doi = {10.1145/2396761.2398535}
}

pubinproceedings

J. Deleu, A. De Moor, T. Demeester, B. Vermeulen and P. Demeester, "Named entity recognition on Flemish audio-visual and newspaper archives", in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 38-41.

Named entity recognition on Flemish audio-visual and newspaper archives

J. Deleu, A. De Moor, T. Demeester, B. Vermeulen and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 38-41.

This paper describes a number of specific issues that we needed to deal with in order to build an accurate Named Entity Recognition tool for multimedia archives in Dutch. The considered data consists of archival metadata from video collections, and large newspaper collections. For the video collections, the main challenge is to cope with a lack of capitalization in the metadata. To this end, specific capitalization features are calculated from Wikipedia. For the newspaper collections, the main concern is to create a system that maintains its performance over the course of many years. For that goal, special clustering features allow dealing with words that have not been encountered in training data. Results for the different components of the tool are reported on the target data, as well as on publicly available test data.

Named entity recognition on Flemish audio-visual and newspaper archives

J. Deleu, A. De Moor, T. Demeester, B. Vermeulen and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 38-41.

@inproceedings{deleu2012dir,
author = {Deleu, Johannes and De Moor, An and Demeester, Thomas and Vermeulen, Brecht and Demeester, Piet},
title = {Named entity recognition on Flemish audio-visual and newspaper archives},
booktitle = {Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012)},
month = {23--24 Feb.},
year = {2012},
pages = {38--41},
address = {Ghent, Belgium}
}

pubinproceedings

L. Mertens, T. Demeester, J. Deleu, C. Develder and P. Demeester, "Context-based person identification for news collections", in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 26-29.

Context-based person identification for news collections

L. Mertens, T. Demeester, J. Deleu, C. Develder and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 26-29.

In modern automated information extraction systems, Named Entity Disambiguation (NED) techniques are becoming increasingly important. The ambiguity of person names leads to a decrease in the output quality of search engines. This paper presents a two-stage rule-based NED model, based on a local and global context of the mentioned persons. A number of experiments with different scoring functions are reported, as well as a specific evaluation method to estimate the efficiency of the model on a real-life data collection in an unsupervised way.

Context-based person identification for news collections

L. Mertens, T. Demeester, J. Deleu, C. Develder and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 26-29.

@inproceedings{Mertens2012,
author = {Mertens, Laurent and Demeester, Thomas and Deleu, Johannes and Develder, Chris and Demeester, Piet},
title = {Context-based person identification for news collections},
booktitle = {Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012)},
month = {23--24 Feb.},
year = {2012},
pages = {26--29},
address = {Ghent, Belgium}
}

pubinproceedings

S. Vandamme, T. Wauters, T. Demeester and F. De Turck, "Implementation and evaluation of query filtering in a role ontology-enhanced search engine", in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 34-37.

Implementation and evaluation of query filtering in a role ontology-enhanced search engine

S. Vandamme, T. Wauters, T. Demeester and F. De Turck


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 34-37.

We designed a role ontology-enhanced multimedia search engine where the user can search and subsequently filter news items with queries and filter options describing the roles of the people who appear in the items, specifically politicians. The system makes use of a separate knowledge base with domain information on politics. We demonstrate that when a user fails to recollect the name of a politician, role-based queries combined with filter options tailored to the query and the result set quickly lead the user to both the name he failed to recollect and the intended results in the multimedia database.

Implementation and evaluation of query filtering in a role ontology-enhanced search engine

S. Vandamme, T. Wauters, T. Demeester and F. De Turck


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 34-37.

@inproceedings{vandamme2012dir,
author = {Vandamme, Stijn and Wauters, Tim and Demeester, Thomas and De Turck, Filip},
title = {Implementation and evaluation of query filtering in a role ontology-enhanced search engine},
booktitle = {Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012)},
month = {23--24 Feb.},
year = {2012},
pages = {34--37},
address = {Ghent, Belgium}
}

pubinproceedings

B. Van Den Bossche, B. Vermeulen, J. Deleu, T. Demeester and P. Demeester, "MediaHaven: Multimedia asset management with integrated NER and categorization", in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 85-86.

MediaHaven: Multimedia asset management with integrated NER and categorization

B. Van Den Bossche, B. Vermeulen, J. Deleu, T. Demeester and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 85-86.

In order to allow for flexible search and asset management on the textual metadata of multimedia archives, the extraction of information, and especially of named entities, is an essential step. In practice, these are of great help for applications like faceted search, input assistance, search suggestions, linking assets, etc. This paper describes MediaHaven, a Media Asset Management (MAM) system commercialized by Zeticon, a spin-off of Ghent University-IBBT. MediaHaven incorporates an advanced NER and categorisation system to improve the user experience.

MediaHaven: Multimedia asset management with integrated NER and categorization

B. Van Den Bossche, B. Vermeulen, J. Deleu, T. Demeester and P. Demeester


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 85-86.

@inproceedings{vandenbossche2012dir,
author = {Van Den Bossche, Bruno and Vermeulen, Brecht and Deleu, Johannes and Demeester, Thomas and Demeester, Piet},
title = {MediaHaven: Multimedia asset management with integrated NER and categorization},
booktitle = {Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012)},
month = {23--24 Feb.},
year = {2012},
pages = {85--86},
address = {Ghent, Belgium}
}

pubinproceedings

T.H. Van Duc, T. Demeester, C. Develder and H. Shin, "Effectiveness of learning to rank for finding user similarity in social media", in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 30-33.

Effectiveness of learning to rank for finding user similarity in social media

T.H. Van Duc, T. Demeester, C. Develder and H. Shin


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 30-33.

This paper focuses on an automatic and accurate approach for finding similar users in social networks. Many types of social networks could benefit from such techniques, but the focus in this paper is on online photo services. The similarity between users needs to be considered on two different levels, i.e., the semantic similarity (or correspondence in tagging behavior), and the similarity in terms of social relations. In recent work, heuristic formulas were introduced for the tag commonness (TC) and the link strength (LS), with an adaptive combination scheme to describe how relevant each of these similarity aspects is for particular users, in order to define the user similarity. This paper presents an experiment where a Learning-to-Rank approach is used to find suitable combinations of TC and LS related parameter values, taking into account the proficiency of users in tagging their photos, and their noticeability in the online community, in order to obtain an overall user similarity. The user experiments show that the results with this learning-to-rank approach are significantly better than with the former, heuristic, approach.

Effectiveness of learning to rank for finding user similarity in social media

T.H. Van Duc, T. Demeester, C. Develder and H. Shin


in Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012), Ghent, Belgium, 23-24 Feb. 2012, pp. 30-33.

@inproceedings{VanDuc2012DIR,
author = {Van Duc, Thong Hoang and Demeester, Thomas and Develder, Chris and Shin, Hyoseop},
title = {Effectiveness of learning to rank for finding user similarity in social media},
booktitle = {Proc. 12th Dutch-Belgian Inf. Retr. Workshop (DIR 2012)},
month = {23--24 Feb.},
year = {2012},
pages = {30--33},
address = {Ghent, Belgium}
}

pubinproceedings

R. Aly and T. Demeester, "Towards a better understanding of the relationship between probabilistic models in IR", in Proc. 3rd Int. Conf. Theory Inf. Retr. (ICTIR 2011), Bertinoro, Italy, 12-14 Sep. 2011, pp. 164-175.

Towards a better understanding of the relationship between probabilistic models in IR

R. Aly and T. Demeester


in Proc. 3rd Int. Conf. Theory Inf. Retr. (ICTIR 2011), Bertinoro, Italy, 12-14 Sep. 2011, pp. 164-175.

Probability of relevance (PR) models are generally assumed to implement the Probability Ranking Principle (PRP) of IR, and recent publications claim that PR models and language models are similar. However, a careful analysis reveals two gaps in the chain of reasoning behind this statement. First, the PRP considers the relevance of particular documents, whereas PR models consider the relevance of any query-document pair. Second, unlike PR models, language models consider draws of terms and documents. We bridge the first gap by showing how the probability measure of PR models can be used to define the probabilistic model of the PRP. Furthermore, we argue that given the differences between PR models and language models, the second gap cannot be bridged at the probabilistic model level. We instead define a new PR model based on logistic regression, which has a similar score function to the one of the query likelihood model. The performance of both models is strongly correlated, hence providing a bridge for the second gap at the functional and ranking level. Understanding language models in relation with logistic regression models opens ample new research directions which we propose as future work.

Towards a better understanding of the relationship between probabilistic models in IR

R. Aly and T. Demeester


in Proc. 3rd Int. Conf. Theory Inf. Retr. (ICTIR 2011), Bertinoro, Italy, 12-14 Sep. 2011, pp. 164-175.

@inproceedings{aly2011,
author = {Aly, Robin and Demeester, Thomas},
title = {Towards a better understanding of the relationship between probabilistic models in IR},
booktitle = {Proc. 3rd Int. Conf. Theory Inf. Retr. (ICTIR 2011)},
month = {12--14 Sep.},
year = {2011},
pages = {164--175},
address = {Bertinoro, Italy},
doi = {10.1007/978-3-642-23318-0_16}
}