Welcome to the website of the Computational Linguistics Group (CLAUSE) at Bielefeld University.
We work on natural language processing, in particular on learning models for language generation and understanding from data. Our goal is to develop machines that use language as flexibly and smoothly as humans do. This is why we are particularly interested in computational modeling of language use, visual language grounding, reference, pragmatics, and dialogue.
Back row: Manar Ali, Larissa Koch, Clara Lachenmaier, Judith Sieker, Simeon Junker, Sanne Hoeken, Emilie Sitter, Omar Momen
Front row: Özge Alaçam, Sina Zarrieß, Bastian Bunzeck
News
- Mar ’26: We are presenting 2 papers at EACL 2026 in Rabat, Morocco:
  - Omar Momen, Emilie Sitter, Berenike Herrmann, Sina Zarrieß, Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets, EACL (Main)
  - Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prevot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, and Leshem Choshen, BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data, EACL (Main)
- Nov ’25: We presented 5 papers at EMNLP 2025 and co-located workshops in Suzhou, China. Our paper “Are BabyLMs Deaf to Gricean Maxims?” (BabyLM Workshop) received an Outstanding Paper Award!
  - Özge Alacam, Sanne Hoeken, Andreas Säuberli, Hannes Gröner, Diego Frassinelli, Sina Zarrieß, and Barbara Plank, Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze, EMNLP (Main)
  - Marc Felix Brinner and Sina Zarrieß, SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts, EMNLP (Main)
  - Marc Felix Brinner, Tarek Al Mustafa, and Sina Zarrieß, Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them, EMNLP (Findings)
  - Raha Askari, Sina Zarrieß, Özge Alacam, and Judith Sieker, Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models, BabyLM Workshop – Outstanding Paper Award
  - Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, and Sina Zarrieß, Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning), BabyLM Workshop
- Sep ’25: We hosted Bialogue 2025, the 29th Workshop on the Semantics and Pragmatics of Dialogue, in Bielefeld (September 3–5, 2025)
- Jul/Aug ’25: Judith and Clara presented their paper “LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High” (Judith Sieker, Clara Lachenmaier, Sina Zarrieß) at CogSci 2025 in San Francisco!
- Jul/Aug ’25: We presented 9 papers at ACL 2025 and adjacent workshops in Vienna!
  - Clara Lachenmaier, Judith Sieker, Sina Zarrieß, Can LLMs Ground when they (Don’t) Know: A Study on Direct and Loaded Political Questions, ACL 2025 (Main)
  - Bastian Bunzeck, Sina Zarrieß, Subword models struggle with word learning, but surprisal hides it, ACL 2025 (Main)
  - Simeon Junker, Sina Zarrieß, SceneGram: Conceptualizing and Describing Tangrams in Scene Context, ACL 2025 (Findings)
  - Simeon Junker, Manar Ali, Larissa Koch, Sina Zarrieß, Hendrik Buschmeier, Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?, ACL 2025 (Findings)
  - Bastian Bunzeck, Daniel Duran, Sina Zarrieß, Do Construction Distributions Shape Formal Language Learning In German BabyLMs?, CoNLL 2025
  - Sina Zarrieß, Simeon Junker, Judith Sieker, Özge Alacam, Components of Creativity: Language Model-based Predictors for Clustering and Switching in Verbal Fluency, CoNLL 2025
  - Simeon Junker, ReproHum #0729-04: Human Evaluation Reproduction Report for “MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes”, GEM2 Workshop: Generation, Evaluation & Metrics
  - Emilie Sitter, Omar Momen, Florian Steig, J. Berenike Herrmann, Sina Zarrieß, Annotating Spatial Descriptions in Literary and Non-Literary Text, The 19th Linguistic Annotation Workshop (LAW XIX)
  - Sebastian Loftus, A. Mülthaler, Sanne Hoeken, Sina Zarrieß, Özge Alacam, Using LLMs and Preference Optimization for Agreement-Aware HateWiC Classification, 9th Workshop on Online Abuse and Harms (WOAH 2025)
- Jan ’25: Bastian presented his paper “Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas” (with Daniel Duran, Leonie Schade and Sina Zarrieß) at COLING 2025 in Abu Dhabi
- Dec ’24: Bastian’s article “The richness of the stimulus: Constructional variation and development in child-directed speech” (with Holger Diessel, University of Jena) has been published in First Language