Federated Learning

TLDR: Federated learning trains a shared model across many devices without moving their data. The data stays local; only model updates are shared.

Federated learning trains a model across decentralized data. The data never leaves its source. Each device trains locally on its own data. Only model updates travel to a central server. The server combines those updates into a shared model. This protects privacy and cuts data transfer.

How Federated Learning Works

Distribute the Model: The server sends the current model to each device.
Train Locally: Each device trains the model on its own data.
Send Updates: Devices return model updates, not raw data.
Aggregate: The server averages the updates, often weighted by each device’s data size, into a new global model.
Repeat: The cycle continues until the model converges.
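
The five steps above can be sketched in a few lines of Python. This is a minimal illustration of federated averaging on a toy linear model, not a production implementation. The function names (local_train, federated_average, run_rounds, make_clients), the learning rate, and the synthetic data are all assumptions made for the example.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Train Locally: gradient descent for y = w*x + b on one device's data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Aggregate: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_rounds(clients, rounds=50):
    """Distribute, train, send, aggregate, repeat."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Distribute the Model: every client starts from the current global model.
        # Send Updates: only the trained weights come back, never the raw data.
        updates = [local_train(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_clients(seed=0):
    """Hypothetical devices holding private samples of y = 3x + 1."""
    rng = random.Random(seed)
    clients = []
    for n in (20, 50, 30):
        data = []
        for _ in range(n):
            x = rng.uniform(-1, 1)
            data.append((x, 3.0 * x + 1.0))
        clients.append(data)
    return clients

if __name__ == "__main__":
    w, b = run_rounds(make_clients(), rounds=200)
    print(f"global model: w={w:.3f}, b={b:.3f}")
```

The server in this sketch only ever sees (w, b) pairs and dataset sizes. Real systems send full weight tensors or gradients, sample a subset of devices per round, and usually stop after a fixed number of rounds or when a validation metric stops improving.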

Why Use Federated Learning

Privacy: Sensitive data stays on the device.
Compliance: It helps meet GDPR and HIPAA requirements.
Bandwidth: Model updates are often much smaller than the raw datasets they were trained on.
Personalization: Models adapt to each device’s local data.

Challenges of Federated Learning

Non-IID Data: Device data distributions differ widely.
Communication Cost: Many update rounds add overhead.
Security: Unprotected updates can leak information about the training data. Secure aggregation and differential privacy reduce that risk.
Device Variability: Devices differ in compute power and connectivity.
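
One common way to address the security challenge is to limit and blur each update before it leaves the device. The sketch below clips an update to a maximum L2 norm and adds Gaussian noise, which is the basic mechanism behind differentially private training. The function names and the default max_norm and stddev values are illustrative assumptions. Choosing values that give a formal privacy guarantee requires a proper privacy accounting step that this example does not perform.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]

def add_gaussian_noise(update, stddev, rng):
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [v + rng.gauss(0.0, stddev) for v in update]

def protect_update(update, max_norm=1.0, stddev=0.1, seed=None):
    """Clip, then add noise. Defaults are placeholders, not calibrated values."""
    rng = random.Random(seed)
    return add_gaussian_noise(clip_update(update, max_norm), stddev, rng)

if __name__ == "__main__":
    raw = [3.0, 4.0]  # hypothetical model update with L2 norm 5
    print(protect_update(raw, max_norm=1.0, stddev=0.1, seed=42))
```

Clipping bounds how much any single device can influence the global model, and the noise masks individual contributions once many updates are averaged. More noise means stronger protection but a less accurate model, so the two settings are usually tuned together.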

Federated vs Centralized Training

Centralized training pools all data in one place. Federated learning keeps data distributed across devices. Centralized training is simpler and often more accurate. Federated learning wins on privacy and regulatory compliance. Because the model already lives on the device, federated systems often run inference at the edge too.

Public Data Still Powers the Base Model

Federated learning protects private, on-device data. But models still need broad public data for pre-training. A strong foundation model starts on web-scale data. Bright Data’s datasets, AI data packages, and Web Scraper supply that public web data at scale.
