Legal news

The CNIL issues new recommendations on the use of legitimate interest for the development of AI systems

On June 19, 2025, the French data protection authority (CNIL) published two new practical guidelines [1] [2] detailing its recommendations regarding the use of legitimate interest as a legal basis for developing artificial intelligence systems (“AIS”), particularly in cases involving the harvesting of data available online (web scraping). These guidelines, which are part of the CNIL’s ongoing work on the development of AI systems and the creation of databases used for their training, complement the practical guidelines published in 2024 [3].

The CNIL reiterates that legitimate interest will often be the appropriate legal basis for the development of AIS by a private organization. A public body may also rely on this legal basis “only when the activities in question are not strictly necessary for the performance of its specific missions but pertain to other legally implemented activities (such as processing for HR management)”.

The CNIL’s recommendations also emphasize that relying on legitimate interest requires fulfilling three conditions:

  • the interest pursued by the controller must be legitimate. In the context of AIS development, conducting scientific research or developing new systems and features for users of a service will, for example, be considered legitimate a priori;
  • the intended processing must be necessary to achieve the pursued interest (i.e., the interest cannot be achieved through less privacy-intrusive means). This implies, for example, ensuring that the development of the AIS is necessary to meet the controller’s intended goals;
  • the legitimate interest pursued by the controller must not disproportionately affect the interests, rights and freedoms of the data subjects, which requires balancing the rights and interests of the parties. In this regard, the CNIL highlights that the expected benefits of the AIS for the controller, for third parties such as end users, and for society (e.g., improving access to essential services or healthcare) can help justify the data processing.
    Concrete examples of measures to limit the impact of the processing on individuals are also provided (e.g., timely anonymization, the use of synthetic data, or offering a prior and discretionary right to object).

If the controller uses web scraping to build training databases, it may also rely on legitimate interest, provided certain conditions are met. Because this practice poses risks to the rights and interests of the data subjects, who have no control over the reuse of their data accessible online, its implementation requires particular vigilance.

The CNIL thus specifies that the controller must, in particular:

  • define clear data collection criteria in advance;
  • exclude categories of data that are not necessary from collection (e.g., through filters or by excluding certain websites);
  • respect the reasonable expectations of data subjects regarding the processing of their data, taking into account, for example, whether the data is publicly accessible and the nature of the websites being scraped;
  • exclude from collection websites that oppose the harvesting of their content for the purpose of creating databases for training AIS (e.g., by using robots.txt exclusion protocols or CAPTCHA mechanisms);
  • limit collection to freely accessible data (i.e., data viewable without login or account creation);
  • inform the data subjects, for example by publishing a list of websites affected by web scraping or by having the information disseminated by the publishers of the websites concerned.
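
By way of illustration only, some of the collection criteria listed above (excluding certain websites, honoring robots.txt opposition, and limiting collection to pages viewable without login) could be sketched as a pre-collection check. The sketch below is an assumption, not a method described by the CNIL: the names `CollectionPolicy`, `may_collect`, `ExampleTrainingBot` and the `requires_login` flag are hypothetical, and the only library used is Python's standard `urllib`.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass
class CollectionPolicy:
    """Hypothetical collection criteria, defined before any scraping starts."""
    user_agent: str
    # Hosts the controller has decided to exclude from collection.
    excluded_hosts: set[str] = field(default_factory=set)


def may_collect(url: str, robots_txt: str, policy: CollectionPolicy,
                requires_login: bool = False) -> bool:
    """Return True only if the page passes every check in this sketch.

    `robots_txt` is the text of the site's robots.txt file, fetched separately.
    `requires_login` is assumed to be determined by the caller.
    """
    host = urlparse(url).hostname or ""
    if host in policy.excluded_hosts:
        return False  # website excluded by the controller's own criteria
    if requires_login:
        return False  # not freely accessible data
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The website's opposition, as expressed in robots.txt, is respected.
    return parser.can_fetch(policy.user_agent, url)


# Illustrative robots.txt in which the site opposes one specific crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""

policy = CollectionPolicy(user_agent="ExampleTrainingBot",
                          excluded_hosts={"forum.example.org"})
allowed = may_collect("https://example.org/articles/1", SAMPLE_ROBOTS_TXT, policy)
```

In this sample, `allowed` is false because the website's robots.txt disallows the hypothetical `ExampleTrainingBot` crawler everywhere. A check of this kind addresses only the technical criteria; the remaining conditions (reasonable expectations, information of the data subjects) still call for a case-by-case assessment.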

The CNIL will soon publish further recommendations concerning the status of AI models under the GDPR, security aspects in AIS development, and data annotation practices.