Open Thesis Topics

Measuring Vocabulary Overlap in Differentially Private Text Paraphrasing (Bachelor/Master)

An open question concerns how privacy mechanisms affect lexical fidelity and content retention. Most evaluations of private text generation focus on utility metrics and do not directly quantify how much of the lexical material survives privacy perturbation. The overlap between vocabulary spaces is an informative proxy for memorization risk (e.g., if overlap between original and privatized text is high) or for utility degradation (e.g., if overlap between texts from diverse topical domains is high).
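As a rough illustration of what such a measurement could look like, the sketch below computes the Jaccard overlap between the word types of an original and a privatized text. The choice of Jaccard similarity, the naive regex tokenizer, and the two example sentences are assumptions made for illustration only; a thesis would likely compare several overlap measures and use a proper tokenizer.

```python
import re


def vocabulary(text):
    """Return the set of lowercased word types in `text` (naive regex tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_overlap(text_a, text_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between the vocabularies of two texts."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    if not vocab_a and not vocab_b:
        return 0.0  # convention chosen here: two empty texts have no overlap
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


# Hypothetical example pair, not taken from any real privatization mechanism.
original = "The patient visited the clinic in Berlin on Monday."
privatized = "The person went to the hospital in a city on Monday."

print(f"overlap = {jaccard_overlap(original, privatized):.3f}")
```

A value near 1 would indicate that most of the original vocabulary survives the rewriting, while a value near 0 would indicate that little lexical material is retained.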

References

Meisenbacher, S., Chevli, M., Vladika, J., & Matthes, F. (2024). DP-MLM: Differentially private text rewriting using masked language models. arXiv preprint arXiv:2407.00637.

Arnold, S. (2025). Inspecting the Representation Manifold of Differentially-Private Text. arXiv preprint arXiv:2503.14991.

Contact: Stefan Arnold

A Corpus-based Study on Zero/That Complementizers in Conversational AI (Bachelor/Master)

In English, the most common type of object clause is introduced by the complementizer that, as in "I know that Peter will arrive soon." However, in most contexts this complementizer can be omitted, resulting in an asyndetic zero clause, as in "I know Peter will arrive soon."

This alternation between that and zero complementizers has been widely studied in natural language, but much less attention has been given to how such variation manifests in computationally generated language. Investigating the distribution and conditions of zero vs. that complementizers in AI-generated text can provide valuable insights into how closely machine language approximates human usage patterns, the degree of syntactic naturalness achieved by language models, and potential stylistic biases in AI communication.

A possible data source is WildChat, which provides a corpus of one million user-agent conversations.
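To give a first impression of how the alternation might be counted in such a corpus, the sketch below applies a crude surface heuristic: a matrix verb directly followed by "that" counts as a that-clause, and a matrix verb directly followed by a subject pronoun counts as a zero clause. The verb list, the pronoun list, and the sample text are assumptions for illustration. The heuristic misses zero clauses with full noun-phrase subjects (such as "I know Peter will arrive soon"), miscounts demonstrative "that" (as in "I know that man"), and ignores sentence boundaries, so a real study would rely on syntactic parsing.

```python
import re
from collections import Counter

# Hypothetical, deliberately small lists chosen for illustration.
MATRIX_VERBS = {"think", "know", "believe", "suppose"}
SUBJECT_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def count_complementizers(text):
    """Count 'that' vs. 'zero' complement clauses with a surface heuristic."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    # Look at each (verb, next token) pair of adjacent tokens.
    for verb, following in zip(tokens, tokens[1:]):
        if verb not in MATRIX_VERBS:
            continue
        if following == "that":
            counts["that"] += 1
        elif following in SUBJECT_PRONOUNS:
            counts["zero"] += 1
    return counts


sample = (
    "I know that Peter will arrive soon. I think he is late. "
    "We believe that it works. I suppose you agree."
)
print(count_complementizers(sample))
```

Applied separately to user turns and agent turns, counts of this kind could be compared to see whether generated text prefers one variant more strongly than human-written text does.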

References

Conde-Silvestre, J. C., & Calle-Martín, J. (2015). Zero that-clauses in the history of English. A historical sociolinguistic approach (1424–1681). Journal of Historical Sociolinguistics, 1(1), 57-86.

Shank, C., Bogaert, J. V., & Plevoets, K. (2016). The diachronic development of zero complementation: A multifactorial analysis of the that/zero alternation with think, suppose, and believe. Corpus Linguistics and Linguistic Theory, 12(1), 31-72.

Contact: Stefan Arnold