http://ipkitten.blogspot.com/2024/09/guest-post-german-court-finds-laions.html

The IPKat has received and is pleased to host the following guest post by former Kat Mirko Brüß (Rechtsanwalt Mirko Brüß) on yesterday’s important Hamburg decision on unlicensed text and data mining (TDM). This is the first decision in Europe tackling Articles 3 and 4 of the DSM Directive. Without further ado, then, here’s what Mirko has to say!

German Court finds LAION’s copying of images non-infringing

by Mirko Brüß
In what appears to be a first in Europe, the District Court of Hamburg has delivered a long-awaited ruling in a case brought by German photographer Robert Kneschke against LAION gemeinnütziger e.V., a German non-profit organization (case No. 310 O 227/23). While the judgment provides some useful answers, many big questions regarding the interplay of copyright and generative AI remain unresolved.
Because Kneschke’s assertion of infringement was very narrow, the Court did not address the legality of training AI models or of the outputs subsequently generated with such tools. Kneschke accused LAION of making a copy of one of his images when creating the LAION-5B dataset, thereby infringing his reproduction right under Section 16 UrhG (the German Copyright Act).

Background

To create the LAION-5B dataset, LAION first obtained a preexisting dataset from Common Crawl. In essence, such a dataset contains pairs of image links and text describing the images. In this case, the dataset contained almost 6 billion URLs of publicly available images. LAION downloaded all of the images, used its own software to check whether each image description matched the image’s content, and thus created an enhanced dataset (LAION-5B), which was subsequently used by many generative AI tools for training purposes, including Stable Diffusion.
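The matching step can be pictured as a similarity filter over image-text pairs. The sketch below is a minimal illustration under stated assumptions: neither the ruling nor this post describes LAION’s software, so it assumes, hypothetically, that each image and each caption has already been converted into an embedding vector by some image-text model. The names `Pair`, `cosine`, `filter_pairs` and the 0.3 threshold are invented for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """One candidate entry: an image URL, its caption and (hypothetical) embeddings."""
    url: str
    caption: str
    image_vec: list[float]  # assumed output of some image encoder
    text_vec: list[float]   # assumed output of some text encoder


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_pairs(pairs: list[Pair], threshold: float = 0.3) -> list[Pair]:
    """Keep only pairs whose image and caption embeddings are similar enough.

    The threshold is an illustrative assumption, not LAION's actual setting.
    """
    return [p for p in pairs if cosine(p.image_vec, p.text_vec) >= threshold]


if __name__ == "__main__":
    caption_vec = [0.8, 0.2, 0.1]  # toy embedding for "a cat on a sofa"
    pairs = [
        Pair("https://example.com/cat.jpg", "a cat on a sofa", [0.9, 0.1, 0.0], caption_vec),
        Pair("https://example.com/car.jpg", "a cat on a sofa", [0.0, 0.1, 0.9], caption_vec),
    ]
    print([p.url for p in filter_pairs(pairs)])  # only the cat image survives
```

The point of the toy vectors is only that a caption which does not describe its image scores low and is dropped, while a matching pair is kept for the enhanced dataset.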

The dataset included a hyperlink to an image on Bigstockphoto, a website that Kneschke used to promote and sell his images. It was undisputed before the Court that in 2021 LAION had in fact downloaded a copy of one of Kneschke’s photos, which was available on Bigstockphoto in low resolution and with a watermark. The website’s terms of service (TOS) at that time read:

You may not use automated programs, applets, bots or the like to access the Bigstock.com website or any content thereon for any purpose, including, by way of example only, downloading Content, indexing, scraping or caching any content on the website.

Three potential exceptions

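As the right of reproduction is an exclusive right of the author and LAION had not obtained a license from Kneschke to copy the image, the Court had to assess whether LAION could rely on one of three exceptions: Section 44a UrhG (temporary acts of reproduction), Section 44b UrhG (text and data mining) and Section 60d UrhG (text and data mining for scientific research purposes).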
The judges quickly dismissed Section 44a UrhG, finding the copying to be neither “transient” nor “incidental”. In the present case, the image file had been deliberately downloaded in order to analyze it using specific software. The download was therefore not a mere accompanying step in the analysis, but a conscious and actively controlled process that took place before the analysis.
Sections 44b UrhG and 60d UrhG both regulate TDM. These exceptions permit the reproduction of lawfully accessible works “in order to carry out text and data mining”, with TDM defined as the “automated analysis of individual or several digital or digitized works for the purpose of gathering information, in particular regarding patterns, trends and correlations”.
In an important finding, the Court held that the kind of scraping (followed by technical analysis) carried out by LAION can constitute TDM within the meaning of Section 44b UrhG. In its view, LAION’s analysis of the image file to match it against a pre-existing image description readily amounted to an analysis for the purpose of obtaining information about “correlations” (namely, whether or not images and image descriptions correspond). The judges expressly did not rule on whether the training of artificial intelligence as a whole falls within the limitation of Section 44b UrhG.
The Court also found that applying the TDM limitation was compatible with the three-step test (Article 5(5) InfoSoc Directive). The reproduction in the present case was limited to analyzing the image files for their correspondence with a pre-existing image description, along with their subsequent inclusion in a dataset. It was neither apparent nor asserted by the plaintiff that such use would adversely affect the possibilities of exploiting the works concerned.
While the dataset created in this way may subsequently be used to train artificial neural networks, and the AI-generated content produced in the process may compete with the works of (human) authors, this alone did not justify regarding the creation of datasets as an impairment of the exploitation of works within the meaning of Art. 5(5) InfoSoc Directive.
A key difference between Section 44b UrhG and Section 60d UrhG is that Section 44b(3) UrhG allows an opt-out, while Section 60d UrhG does not. Reproductions under Section 44b UrhG are permitted only if the rightholder has not reserved them, and for works available online such a reservation of use is effective only if it is made in a machine-readable format. As shown above, the Bigstockphoto website used its terms of service to opt out, prohibiting bots and other automated programs from “downloading Content, indexing, scraping or caching any content on the website”. While this text is, technically speaking, clearly “machine-readable”, the parties had very different views on whether it was a sufficient opt-out within the meaning of Section 44b UrhG. Most of the oral argument in July 2024 revolved around this issue: LAION argued for an opt-out to be made via “robots.txt” or a similar format, while Kneschke argued that the terms of service sufficed, pointing to Recital 18 of the DSM Directive, which states that it should be considered appropriate to reserve those rights by the use of machine-readable means, including metadata and the terms and conditions of a website or a service.
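For readers unfamiliar with the format LAION argued for: robots.txt is a plain-text file in which a site states which crawlers may fetch which paths, and it can be evaluated mechanically. Below is a minimal sketch using Python’s standard `urllib.robotparser` module; the robots.txt contents, the crawler name `ExampleDatasetBot` and the URL are hypothetical, and the sketch takes no position on whether robots.txt is required, or sufficient, under Section 44b(3) UrhG.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler from the whole site,
# allows every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleDatasetBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/images/photo-123.jpg"
    print(may_crawl(ROBOTS_TXT, "ExampleDatasetBot", url))  # False: explicitly disallowed
    print(may_crawl(ROBOTS_TXT, "SomeOtherBot", url))       # True: falls under "*"
```

A check like this only sees what is written in robots.txt; a prohibition expressed in a site’s terms of service, like the Bigstockphoto clause, is invisible to it, which is exactly where the parties’ disagreement lay.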
Unfortunately, the Court did not have to answer this question, as it found that LAION could rely on the exception in Section 60d UrhG, which applies even where rights have been expressly reserved.
Section 60d UrhG permits TDM for scientific research purposes under certain conditions. Universities, research institutes and other establishments conducting scientific research may make reproductions of works if they 1. pursue non-commercial purposes, 2. reinvest all their profits in scientific research or 3. act in the public interest based on a state-approved mandate.
This exception does not extend to research organisations cooperating with a private enterprise which exerts a certain degree of influence on the research organisation and has preferential access to the findings of its scientific research.
The Court found that LAION made the reproduction for research purposes. Although the creation of the dataset as such may not yet be associated with a gain in knowledge, it was considered a fundamental step towards using the dataset for later knowledge gain. It was sufficient that the dataset was published free of charge and thus made available (also) to researchers in the field of artificial neural networks.
According to the ruling, LAION was also pursuing non-commercial purposes. LAION’s non-commercial purpose in creating the LAION-5B dataset was apparent from the fact that LAION indisputably makes the dataset publicly available free of charge. In this respect, LAION’s organization and financing were irrelevant.
Kneschke had argued that LAION collaborated with Stability AI, which, he asserted, had direct influence on LAION through financing the dataset in question and through its own employees occupying relevant positions at LAION. Kneschke claimed that, according to an interview with its founder and managing director, Stability AI had financed the LAION-5B dataset.
The Hamburg Court did not find the counter-exception of Section 60d(2) sentence 3 UrhG to be met. The fact that two members of LAION also worked for a commercial company (likely Stability AI) did not prove that this company had a decisive influence on LAION’s research work. Kneschke had also failed even to assert that a private enterprise had preferential access to the findings of LAION’s scientific research. As a result, LAION could rely on the exception of Section 60d UrhG for the copy made in the creation of the LAION-5B dataset.

Some dicta on opting-out

Well aware of the importance of the Section 44b UrhG opt-out question, which they did not have to answer, the judges were kind enough to include some (non-binding) thoughts in their ruling. They noted that there are some indications that the exception in Section 44b(2) UrhG does not apply in the present case, since Kneschke had made a validly declared reservation of use within the meaning of subsection 3 of the provision. In particular, the reservation of use declared on the Bigstockphoto website was likely to meet the requirement of machine readability within the meaning of Section 44b(3) sentence 2 UrhG:
However, the chamber tends to regard a reservation of use formulated in “natural language” as “machine-understandable” as well. Nevertheless, the question of whether, and under what specific conditions, a reservation formulated in “natural language” can also be regarded as “machine-understandable” will always have to be answered in light of the state of technical development at the relevant time of use.
Accordingly, the European legislator has also stipulated in the AI Regulation that providers of general-purpose AI models must put in place a policy, in particular, to identify and comply with a reservation of rights expressed pursuant to Art. 4(3) of the DSM Directive, “including through state-of-the-art technologies” (Art. 53(1)(c) AI Regulation). These ‘state-of-the-art technologies’ undoubtedly also include AI applications that are capable of grasping the content of text written in natural language.
The judges considered it somewhat contradictory to allow providers of AI models to develop ever more powerful text-understanding and text-creating AI models, while not requiring them, within the scope of the limitation of Section 44b(3) sentence 2 UrhG, to use existing AI models to find, understand and comply with reservations made by rightsholders.
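To make the court’s point a little more concrete, even a crude automated check can flag the Bigstockphoto clause quoted earlier as a likely reservation. The sketch below is a deliberately naive keyword heuristic, not the kind of AI model the court had in mind, and it is not a legal test; the cue lists and the function name `looks_like_tdm_reservation` are hypothetical.

```python
import re

# Hypothetical cue lists for illustration only; a serious tool would need real
# language understanding (negation, context, other languages).
PROHIBITION_CUES = [r"\bmay not\b", r"\bmust not\b", r"\bprohibit\w*", r"\bnot permitted\b"]
AUTOMATION_CUES = [r"\bautomated\b", r"\bbots?\b", r"\bscrap\w*", r"\bcrawl\w*", r"\bdata mining\b"]


def looks_like_tdm_reservation(terms: str) -> bool:
    """Flag text that both prohibits something and mentions automated access or mining."""
    text = terms.lower()
    prohibits = any(re.search(p, text) for p in PROHIBITION_CUES)
    mentions_automation = any(re.search(p, text) for p in AUTOMATION_CUES)
    return prohibits and mentions_automation


if __name__ == "__main__":
    bigstock_clause = (
        "You may not use automated programs, applets, bots or the like to access "
        "the Bigstock.com website or any content thereon for any purpose, including, "
        "by way of example only, downloading Content, indexing, scraping or caching "
        "any content on the website."
    )
    print(looks_like_tdm_reservation(bigstock_clause))                              # True
    print(looks_like_tdm_reservation("Photos may be licensed for editorial use."))  # False
```

Keyword matching of this kind is easily fooled in both directions; the court’s reasoning points instead to language models capable of actually understanding such clauses, judged by the technology available at the relevant time.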
Kneschke has one month to appeal the judgment. Given its “first-of-its-kind” nature and the unanswered questions, it seems likely the case will go to the next instance.

Content reproduced from The IPKat as permitted under the Creative Commons Licence (UK).