
I am a PhD researcher affiliated with Inria Paris, Sorbonne Université, and Qatent (A Questel Company). My research focuses on patent NLP, representation learning, information retrieval, and scientific and innovation discovery from large-scale scientific and technical corpora.
I am particularly interested in the potential and applications of self-supervised learning for long-form text, especially scientific and technical documents.
I am open to collaboration on patent NLP, information retrieval, and scientific discovery, and happy to connect with researchers and students.
PhD in Computer Science, Inria Paris (ALMAnaCH Team) and Sorbonne Université, 2023-present
Industrial PhD conducted in collaboration with Qatent (A Questel Company). Research topic: patent representation learning for innovation generation and technical trend analysis.
Supervisors: Benoît Sagot, Éric de La Clergerie, and Kim Gerdes.
Master 2 in Artificial Intelligence, Université Paris-Saclay, 2020-2021
GPA: 16.7/20, ranked 1/17. Main coursework included generative models, graphical models, natural language processing, multilingual NLP, and advanced optimisation.
Engineering Diploma, ENSIIE - L’École Nationale Supérieure d’Informatique pour l’Industrie et l’Entreprise, 2018-2021
Engineering curriculum in applied mathematics and computer science.
Bachelor in Information and Computing Science, Xidian University, 2015-2018
GPA: 3.60/4.0. Main coursework included mathematical analysis, matrix theory, optimisation, functional analysis, probability theory, and statistics.
Part-time Lecturer, Introduction to Natural Language Processing in English, Institut national des langues et civilisations orientales (INALCO), Sep. 2024-May 2025
Taught Introduction to Natural Language Processing in English, a master’s-level course for students in the PluriTAL program.
Teaching Assistant, Introduction to Machine Learning, Université Paris-Saclay, Sep. 2024-Dec. 2024
Supported the L3 undergraduate course Introduction to Machine Learning, taught by François Landes, by running labs, helping with assignments, and guiding students.
Research Engineer, Inria Paris, Oct. 2021-Nov. 2022
Worked on fine-grained patent classification in collaboration with INPI (French Intellectual Property Office), under the supervision of Kim Gerdes, Benoît Sagot, and Samir Ghamri Doudane.
Research Internship, LISN, Mar. 2021-Aug. 2021
Worked on technological term recognition and hypernym/hyponym prediction on patent texts, under the supervision of Kim Gerdes.
Younes Djemmal, You Zuo, Kim Gerdes, and Kirian Guiller. Citation-Driven Multi-View Training for Patent Embeddings: QaECTER and Sophia-Bench. Preprint, 2026.
You Zuo, Kim Gerdes, Éric de La Clergerie, and Benoît Sagot. Patent Representation Learning via Self-supervision. Preprint, 2025.
You Zuo, Kim Gerdes, Éric de La Clergerie, and Benoît Sagot. PatentEval: Understanding Errors in Patent Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2024).
You Zuo, Benoît Sagot, Kim Gerdes, Houda Mouzoun, and Samir Ghamri Doudane. Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons. In Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2023).
You Zuo, Houda Mouzoun, Samir Ghamri Doudane, Kim Gerdes, and Benoît Sagot. Patent Classification using Extreme Multi-label Learning: A Case Study of French Patents. In PatentSemTech 2022 Workshop, co-located with the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022).
You Zuo, Yixuan Li, Alma Parias Garcia, and Kim Gerdes. Technological Taxonomies for Hypernym and Hyponym Retrieval in Patent Texts. In Terminology & Ontology: Theories and Applications (ToTh 2022).