Transformer models speaking the language of patents: Distribution, classification and search of patent applications with artificial intelligence at the EPO using state-of-the-art language model architectures


Artificial intelligence has been identified in the EPO's strategic plan as one of the key drivers for increasing efficiency and quality at several stages of the patent grant process. In this presentation we focus on EP-BERT, a domain-specific language model trained by the EPO's data science department. EP-BERT was trained from scratch to “understand” the specialised terminology and syntax used in patents. We present the motivation, training, fine-tuning and evaluation of models for three downstream tasks: pre-classification (the distribution of patent applications), classification into specific CPC sections, and search, where neural ranking offers a new approach to finding relevant documents in our ever-growing corpus. Our models are trained on the results produced by our examiner colleagues, and we demonstrate an innovative way to involve the whole organisation in actively developing AI tools, thereby raising AI literacy at the EPO in general while solving business-critical challenges.
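To illustrate the neural-ranking idea mentioned above, the following minimal sketch ranks candidate documents by cosine similarity between dense vector representations. The `toy_embed` function is a hypothetical stand-in for a real EP-BERT sentence embedding (it hashes character trigrams into a fixed-size vector); the document texts are invented examples, not EPO data.

```python
import hashlib
import math

def toy_embed(text, dim=64):
    # Stand-in for an EP-BERT sentence embedding: hashed character
    # trigrams pooled into a fixed-size, L2-normalised vector.
    vec = [0.0] * dim
    t = text.lower()
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def rank(query, docs):
    # Score every document against the query embedding, best first.
    q = toy_embed(query)
    scored = sorted(((cosine(q, toy_embed(d)), d) for d in docs),
                    reverse=True)
    return [d for _, d in scored]

docs = [
    "A rotor blade for a wind turbine with reduced noise emission.",
    "Pharmaceutical composition comprising a monoclonal antibody.",
    "Method for encrypting data packets in a wireless network.",
]
print(rank("wind turbine rotor blade", docs)[0])
```

In a production setting the toy embedding would be replaced by encodings from the fine-tuned language model, and the exhaustive scoring loop by an approximate nearest-neighbour index over the corpus.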