You Zuo

Industrial PhD Student

I am a PhD researcher affiliated with Inria Paris, Sorbonne Université, and Qatent (A Questel Company). My research focuses on patent NLP, representation learning, information retrieval, and scientific and innovation discovery from large-scale scientific and technical corpora.

I am particularly interested in the potential and applications of self-supervised learning for long-form text, especially scientific and technical documents.

I am open to collaboration on patent NLP, information retrieval, and scientific discovery, and happy to connect with researchers and students.

🎓 Education
  • PhD in Computer Science, Inria Paris (ALMAnaCH Team) and Sorbonne Université, 2023-present
    Industrial PhD conducted in collaboration with Qatent (A Questel Company). Research topic: patent representation learning for innovation generation and technical trend analysis.
    Supervisors: Benoît Sagot, Éric de La Clergerie, and Kim Gerdes.

  • Master 2 in Artificial Intelligence, Université Paris-Saclay, 2020-2021
    GPA: 16.7/20, ranked 1/17. Main coursework included generative models, graphical models, natural language processing, multilingual NLP, and advanced optimisation.

  • Engineering Diploma, ENSIIE - L’École Nationale Supérieure d’Informatique pour l’Industrie et l’Entreprise, 2018-2021
    Engineering curriculum in applied mathematics and computer science.

  • Bachelor in Information and Computing Science, Xidian University, 2015-2018
    GPA: 3.60/4.0. Main coursework included mathematical analysis, matrix theory, optimisation, functional analysis, probability theory, and statistics.

🧑‍🏫 Teaching
🧪 Research Experience
  • Research Engineer, Inria Paris, Oct. 2021-Nov. 2022
    Worked on fine-grained patent classification in collaboration with INPI (French Intellectual Property Office), under the supervision of Kim Gerdes, Benoît Sagot, and Samir Ghamri Doudane.

  • Research Internship, LISN, Mar. 2021-Aug. 2021
    Worked on technological term recognition and hypernym/hyponym prediction on patent texts, under the supervision of Kim Gerdes.

📰 News
  • May 2026: Our paper Learning Sparse Representations for Patent Search via Geometric Covering of Embedding Spaces was accepted for an oral presentation at CORIA 2026.
  • February 2026: Released a new preprint on citation-driven multi-view training for patent embeddings.
  • October 2025: Released a preprint on self-supervised patent representation learning.
  • September 2024: Started teaching Introduction to Natural Language Processing in English at INALCO and served as a teaching assistant for Introduction to Machine Learning at Université Paris-Saclay.
  • June 2024: Published PatentEval at NAACL 2024.
  • July 2023: Attended the 13th Lisbon Machine Learning School (LxMLS 2023).
  • June 2023: Published work on French patent classification at TALN 2023.
  • March 2023: Started my PhD at Inria Paris and Sorbonne Université in collaboration with Qatent (A Questel Company).
  • July 2022: Served as a student volunteer at SIGIR 2022, where I also presented our work on French patent classification at the PatentSemTech workshop.
  • October 2021: Gave a seminar talk at the ALMAnaCH team, Inria, on Tech-Taxonomy with a Text to Text Transfer Transformer, jointly with Kim Gerdes.
📚 Publications