[GuestPost] How the European Patent Office uses AI to facilitate patent searches

13th November 2024

http://ipkitten.blogspot.com/2024/11/guestpost-how-european-patent-office.html

The AmeriKat has the t-shirt…now what?

In the second of a series on AI and patents from our KatFriends at GJE, Kate Voller reports on a recent CIPA webinar with the EPO on how the EPO is leveraging AI tools in examination – with the key message of “assisting”, not “replacing” examiners.

Over to Kate for the report:

“The European Patent Office (EPO) has embraced artificial intelligence (AI) to enhance the efficiency of its patent document searching process. In a recent CIPA webinar, Alexander Klenner-Bajaja of the EPO explained how the EPO leverages AI tools to support examiners, increasing productivity and improving the quality of patent searches.

We learnt that at the core of the EPO’s AI integration are several specialised tools designed to streamline the search process. Unsurprisingly, natural language processing (NLP) and machine translation technologies help translate and interpret the often complex language and claim-specific syntax. Computer vision is another key tool, using machine learning and neural networks to interpret and analyse visual content in patent documents, including figures, tables, and other graphical elements. This AI-powered technology automatically decodes information from graphical elements, which would often be overlooked in a text-based search alone.

The EPO was keen to emphasize that its AI tools are intended to assist, not replace, human examiners. While AI manages vast amounts of data, the human element remains crucial for final decisions. Examiners retain responsibility for the final review of relevant documents, ensuring their expertise and judgment remain central to the patent search and examination process.

A key advancement in the EPO’s AI efforts came in 2020 with the introduction of the EP-AutoCla model, an AI-powered classification system. The patent classification system is complex, structured hierarchically into a number of main sections, each divided into classes, groups, and subgroups. EP-AutoCla automatically classifies patent applications, relieving examiners of the time-consuming job of classifying documents manually.

It was interesting to learn that training the EP-AutoCla model presented some challenges, particularly due to sparse data in the lower branches of the classification tree. Manual work was required to enhance the training dataset at these lower levels to ensure the accuracy of the training process. The model uses supervised machine learning and was trained on a dataset comprising 8 million manually classified documents, with an additional 800,000 documents used for testing the model. Now fully integrated into the EPO’s search engine, EP-AutoCla suggests classifications and provides a confidence score indicating the likelihood of correctness.
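
To make the idea of a suggested classification with a confidence score concrete, here is a minimal sketch in Python. It is not the EPO's implementation: the webinar did not describe how EP-AutoCla computes its scores, so the softmax normalisation, the 0.2 threshold, the function names and the class labels below are all assumptions chosen purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into values that sum to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def suggest(scores, threshold=0.2):
    """Return (label, confidence) pairs at or above the threshold, best first.

    The threshold is a hypothetical cut-off, not a value given by the EPO.
    """
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [(label, round(p, 3)) for label, p in ranked if p >= threshold]


# Hypothetical raw scores from a classifier for three illustrative class labels.
raw_scores = {"G06N": 2.0, "H04L": 1.0, "A61K": -1.0}
suggestions = suggest(raw_scores)

for label, confidence in suggestions:
    print(f"{label}: {confidence}")
```

In this toy run the two highest-scoring labels clear the threshold and the third is dropped. A system that assigns several classes to one application might instead score each class independently, so the softmax here should be read as one possible design choice only.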

The EPO’s AI-enhanced search process involves several steps to narrow down relevant documents for examiners. Vector space modelling, which converts documents into vector representations, allows comparisons based on conceptual similarity rather than just keyword matching. This helps narrow down millions of potential documents to a manageable number for examiners to review. A k-nearest neighbours (k-NN) algorithm then generates a shortlist of documents that are conceptually similar to the new patent application, even if they don’t share identical keywords. Examiners review these shortlisted documents to finalize the search results. The success of this process is measured by whether at least one highly relevant “X citation” document appears in the pre-search results produced by the k-NN algorithm. In around 60% of cases, the top 80 documents generated include an “X citation”, demonstrating the system’s high accuracy. Though computationally intensive, this process saves examiners significant time, allowing them to focus on only the most relevant documents.
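
The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the EPO's system: the three-dimensional vectors are hand-written stand-ins for learned document embeddings, cosine similarity is assumed as the similarity measure, and the document ids, function names and test searches are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def k_nearest(query, corpus, k):
    """Return the ids of the k corpus documents most similar to the query."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]


def hit_rate(searches, corpus, k):
    """Fraction of searches whose top-k shortlist contains an X citation."""
    hits = sum(1 for query, x_citations in searches
               if set(k_nearest(query, corpus, k)) & x_citations)
    return hits / len(searches)


# Hypothetical prior-art documents, already converted to vectors.
corpus = {
    "D1": [0.9, 0.1, 0.0],
    "D2": [0.1, 0.9, 0.1],
    "D3": [0.0, 0.2, 0.9],
    "D4": [0.8, 0.3, 0.1],
}

# Each search pairs an application vector with its known X citations.
searches = [
    ([1.0, 0.2, 0.0], {"D1"}),
    ([0.0, 0.1, 1.0], {"D2"}),
]

print(k_nearest(searches[0][0], corpus, 2))
print(hit_rate(searches, corpus, 1), hit_rate(searches, corpus, 2))
```

Widening the shortlist from one document to two raises the hit rate in this toy data from 0.5 to 1.0, which mirrors the trade-off behind choosing a shortlist size such as 80: a longer list is more likely to contain an X citation but gives the examiner more to read. At the scale of millions of documents, an exhaustive comparison like this one would normally be replaced by an approximate nearest-neighbour index.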

The EPO is continuing to develop AI-driven features to further enhance patent searches and legal research. One upcoming feature is a figure content analysis model that identifies reference signs in prior art figures and maps them to corresponding text in patent descriptions, enabling more precise figure analysis not just based on pixel data but on the content represented in the images, regardless of orientation or style. A similar model is being designed for chemical formulae.
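
The text side of that mapping can be illustrated with a short Python sketch. It assumes the reference signs have already been read from the figure by some other component, and it uses a deliberately naive rule (the word immediately before a numeral is the part's name), so it should be read as an illustration of the idea rather than how the EPO's model works. The function names and the sample description are hypothetical.

```python
import re


def index_reference_signs(description):
    """Map each reference sign to the word that immediately precedes it.

    Naive assumption: signs appear as 'term 12' or 'term 14a' in the text.
    The first occurrence of each sign wins.
    """
    index = {}
    pattern = r"\b([A-Za-z]+)\s+(\d+[a-z]?)\b"
    for term, sign in re.findall(pattern, description):
        index.setdefault(sign, term.lower())
    return index


def label_figure(signs_in_figure, description):
    """Attach a description term to each sign detected in a figure."""
    index = index_reference_signs(description)
    return {sign: index.get(sign) for sign in signs_in_figure}


description = ("The pump 10 is mounted in a housing 12. "
               "The housing 12 has an inlet 14a.")

# Signs assumed to have been detected in the drawing; 99 is not in the text.
labels = label_figure(["12", "14a", "99"], description)
print(labels)
```

A sign that never appears in the description comes back as `None`, which is one simple way to flag figure labels that the text does not explain.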

In legal research, the EPO is working on an interactive platform, similar to ChatGPT, but specialized for legal documents. This tool will answer questions about case law and legal texts, providing evidence and citations to support its responses and minimize hallucinations. We were also excited to learn that a new version of Espacenet is being developed that will allow users to perform natural language queries, such as “find patents by company X about concept Y.”

Data privacy is a common concern when using AI with patent applications, particularly regarding whether uploading documents to an AI model constitutes a public disclosure. The EPO clarified that its classification model should not be used for unpublished documents, as it runs on a third-party cloud platform. However, the internal AI models, which operate on private servers, can be safely used for all document types, including newly filed and unpublished applications.

In conclusion, the EPO’s integration of AI marks a significant evolution in how patent documents are searched, classified, and analysed. While AI automates many aspects of the process, human examiners remain essential. The EPO’s AI tools help examiners manage growing volumes of data, making patent searches more efficient, accurate, and comprehensive. By continuously developing new AI applications, the EPO is setting a new standard for the future of patent examination.”

Content reproduced from The IPKat as permitted under the Creative Commons Licence (UK).