Welcome to the website of the Computational Linguistics Group (CLAUSE) at Bielefeld University.
Broadly, we work on natural language processing and on learning models for language generation & understanding from data. Our aim is to develop machines that use language as flexibly and smoothly as humans do, which is why we are particularly interested in computational modeling of language use, visual language grounding, reference, pragmatics, and dialogue.
From left to right: Clara Lachenmaier, Ronja Utescher, Özge Alaçam, Marc Brinner, Nazia Attari, Simeon Junker, Sina Zarrieß, Henrik Voigt
Not pictured: Sanne Hoeken, Judith Sieker, Bastian Bunzeck, Omar Momen
News
- Jul/Aug ‘25: Judith and Clara will present their paper LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High (Judith Sieker, Clara Lachenmaier, Sina Zarrieß) at CogSci 2025 in San Francisco!
- Jul/Aug ‘25: We will present 10 papers at ACL 2025 and adjacent workshops in Vienna!
- Clara Lachenmaier, Judith Sieker, Sina Zarrieß, Can LLMs Ground when they (Don’t) Know: A Study on Direct and Loaded Political Questions, ACL 2025 (Main)
- Bastian Bunzeck, Sina Zarrieß, Subword models struggle with word learning, but surprisal hides it, ACL 2025 (Main)
- Marc Felix Brinner, Sina Zarrieß, SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts, ACL 2025 (Findings)
- Simeon Junker, Sina Zarrieß, SceneGram: Conceptualizing and Describing Tangrams in Scene Context, ACL 2025 (Findings)
- Simeon Junker, Manar Ali, Larissa Koch, Sina Zarrieß, Hendrik Buschmeier, Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?, ACL 2025 (Findings)
- Bastian Bunzeck, Daniel Duran, Sina Zarrieß, Do Construction Distributions Shape Formal Language Learning In German BabyLMs?, CoNLL 2025
- Sina Zarrieß, Simeon Junker, Judith Sieker, Özge Alaçam, Components of Creativity: Language Model-based Predictors for Clustering and Switching in Verbal Fluency, CoNLL 2025
- Simeon Junker, ReproHum #0729-04: Human Evaluation Reproduction Report for “MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes”, GEM2 Workshop: Generation, Evaluation & Metrics
- Emilie Sitter, Omar Momen, Florian Steig, J. Berenike Herrmann, Sina Zarrieß, Annotating Spatial Descriptions in Literary and Non-Literary Text, The 19th Linguistic Annotation Workshop (LAW XIX)
- Sebastian Loftus, A. Mülthaler, Sanne Hoeken, Sina Zarrieß, Özge Alaçam, Using LLMs and Preference Optimization for Agreement-Aware HateWiC Classification, 9th Workshop on Online Abuse and Harms (WOAH 2025)
- We are hosting Bialogue 2025, the 29th Workshop on the Semantics and Pragmatics of Dialogue, in Bielefeld (September 3–5, 2025)
- Jan ‘25: Bastian presented his paper Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas (with Daniel Duran, Leonie Schade and Sina Zarrieß) at COLING 2025 in Abu Dhabi
- Dec ‘24: Bastian’s article The richness of the stimulus: Constructional variation and development in child-directed speech (with Holger Diessel, U of Jena) has been published in First Language
- Nov ‘24: We presented 5 main conference papers and 4 workshop papers at EMNLP in Miami:
- Main conference:
  - Hateful Word in Context Classification (Sanne, Sina & Özge)
  - Eyes Don’t Lie: Subjective Hate Annotation and Detection with Gaze (Özge, Sanne & Sina)
  - Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training (Marc & Sina)
  - The Illusion of Competence: Evaluating the Effect of Explanations on Users’ Mental Models of Visual Question Answering Systems (Judith, Simeon, Ronja & Sina)
  - Evaluating Diversity in Automatic Poetry Generation (Sina)
- GenBench: The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns (Bastian & Sina)
- BlackboxNLP: How LLMs Reinforce Political Misinformation: Insights from the Analysis of False Presuppositions (Judith, Clara & Sina, non-archival), Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly (Bastian & Sina, non-archival)
- BabyLM challenge: Graphemes vs. phonemes: battling it out in character-based language models (Bastian & Sina)
- Simeon & Sina received the best paper award at INLG 2024 for their paper Resilience through Scene Context in Visual Referring Expression Generation
- Oct ‘24: Bastian presented his paper Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly at the MILLing conference in Gothenburg
- Sep ‘24: Bastian presented his ongoing work on Constructions in child-directed speech at the 10th International Conference of the German Cognitive Linguistics Association
- Jun ‘24: We hosted the 3rd annual NLG in the Lowlands workshop
- Mar ‘24: Clara presented her late-breaking report Towards Understanding the Entanglement of Human Stereotypes and System Biases in Human–Robot Interaction at the International Conference on Human-Robot Interaction (HRI) 2024 in Boulder (Colorado)!
- Dec ‘23: We presented four papers at EMNLP 2023 (and adjacent workshops) in Singapore:
- Methodological Insights in Detecting Subtle Semantic Shifts with Contextualized and Static Language Models (Sanne & Özge)
- Towards Detecting Lexical Change of Hate Speech in Historical Data (Sanne, Sina & Özge)
- When Your Language Model Cannot Even Do Determiners Right: Probing for Anti-Presuppositions and the Maximize Presupposition! Principle (Judith & Sina)
- GPT-wee: How Small Can a Small Language Model Really Get? (Bastian & Sina)