AI and federated learning: A privacy-enhancing technology for AI drug development
Pharmaceutical innovation increasingly relies on AI to unlock insights from vast and diverse datasets. Yet, traditional AI approaches often require centralizing sensitive patient data: a process fraught with regulatory hurdles and privacy risks, especially under the EU GDPR.
AI thrives on large, high-quality datasets. In the pharmaceutical industry, however, patient data is typically siloed across hospitals, clinical trials, and companies. Centralizing this data for AI training can conflict with GDPR requirements such as data minimization, purpose limitation, and consent management. Cross-border data transfers and fragmented transparency obligations further complicate compliance.
Federated learning (FL) changes how organizations collaborate on data-driven projects. Rather than pooling raw data in a central location, FL lets each institution, whether a hospital, research center, or pharmaceutical company, keep its data securely on site. AI models are trained locally, and only the resulting local model updates (not the underlying patient data) are shared with a central server. The server aggregates these updates to refine a global model, which benefits from the collective intelligence of all participants without any site handing over its raw patient data.
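The training loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production protocol: it assumes a toy two-parameter linear model, a sample-weighted average on the server (in the style of the federated averaging algorithm), and hypothetical names (`local_update`, `aggregate`, `federated_round`) and data that do not come from any particular FL framework.

```python
from typing import List, Tuple

Weights = List[float]  # [slope, intercept] of a toy linear model (assumption)


def local_update(weights: Weights, data: List[Tuple[float, float]],
                 lr: float = 0.1, epochs: int = 20) -> Weights:
    """Runs at a site: train on private data, return only the new weights."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error on this site's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(updates: List[Weights], sizes: List[int]) -> Weights:
    """Runs at the server: sample-weighted average of the local weights."""
    total = sum(sizes)
    return [sum(u[i] * s for u, s in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def federated_round(global_weights: Weights,
                    sites: List[List[Tuple[float, float]]]) -> Weights:
    """One round: every site trains locally, the server averages the results."""
    updates = [local_update(list(global_weights), data) for data in sites]
    return aggregate(updates, [len(data) for data in sites])


# Hypothetical site datasets drawn from y = 2x + 1; they never leave the site.
site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b = [(0.5, 2.0), (1.5, 4.0)]
sites = [site_a, site_b]

global_weights = [0.0, 0.0]
for _ in range(50):
    global_weights = federated_round(global_weights, sites)
```

Only the weight lists returned by `local_update` reach `aggregate`; the `(x, y)` records stay inside each site's function call. In a real deployment that boundary would be a network link, and the updates themselves would still need protection.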
Federated learning directly addresses key GDPR challenges:
- Data minimization: No raw data leaves its origin, reducing unnecessary data transfers.
- Purpose limitation: Data is used locally for agreed training purposes, limiting secondary use.
- Consent management: If a patient withdraws consent, the withdrawal is handled at the site that holds that patient's data rather than across every participant.
- International transfers: Only model updates, not raw patient data, cross borders, reducing exposure to transfer restrictions.
- Security: No central repository of raw data means reduced risk of large-scale breaches.
While federated learning enhances privacy, it introduces new technical and legal considerations:
- Cybersecurity: Model updates may be vulnerable to adversarial attacks or data leakage.
- Interpretability: Limited access to raw data can reduce model transparency.
- Fairness: Bias may arise if some sites dominate the training process.
- Model content: Locally trained models may still contain residual personal data. Whether model updates can be considered "anonymous" requires a case‑by‑case assessment under EDPB Opinion 28/2024.
- Legal compliance: Local training must still be permitted and meet GDPR standards, and joint controllership questions may arise.
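The fairness point can be made concrete with simple arithmetic. The sketch below assumes the server weights each site by its number of samples, which is a common but not universal choice, and uses hypothetical site sizes and local estimates to show how one large site can dominate the aggregated result.

```python
from typing import List


def site_shares(sizes: List[int]) -> List[float]:
    """Fraction of the aggregate that each site contributes under sample weighting."""
    total = sum(sizes)
    return [s / total for s in sizes]


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Weighted average of one model parameter across sites."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical numbers: one large hospital and two small clinics.
sizes = [9000, 500, 500]
local_estimates = [0.20, 0.60, 0.70]  # the same parameter as learned at each site

shares = site_shares(sizes)                                  # influence per site
by_samples = weighted_mean(local_estimates, sizes)           # sample-weighted
by_site = weighted_mean(local_estimates, [1] * len(sizes))   # one vote per site
```

With these assumed numbers the large site carries 90% of the weight, so the sample-weighted value (about 0.245) sits close to that site's estimate, while equal weighting gives 0.5. Neither scheme is inherently fair; which one is appropriate depends on the populations each site represents.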
Federated learning is increasingly recognized as a privacy-enhancing technology for AI development, particularly within the pharmaceutical sector. By enabling organizations to collaborate on AI projects without sharing raw patient data, FL supports both innovation and compliance with data protection requirements. As regulatory expectations and best practices continue to evolve, careful planning and robust privacy measures will remain important for successful adoption.

