PayPal had a problem that sounds like a good problem to have. After 25 years of growth, acquiring Venmo, Braintree, and other services, and processing billions of transactions, the company had accumulated 400 petabytes of data spread across a dozen siloed systems. What started as a mountain of valuable customer insight had become a mountain range with no trails between the peaks.
Providing a unified view of a small business owner who used PayPal for online sales and Venmo for local transactions required complex, costly processes. Fraud detection models, personalization engines, and real-time analytics were all constrained by the same fragmentation. As PayPal describes it, their own success in growth and innovation had created complexity that threatened their next evolution.
The Migration
PayPal consolidated multiple platforms including what is believed to have been the world’s largest Teradata deployment, along with Hadoop clusters, Redshift, Snowflake, and other systems. They chose BigQuery as the destination, citing its fully managed architecture, the ability to scale compute and storage independently, and, critically, its native integrations with AI. The familiar SQL interface also mattered at PayPal’s engineering scale.
The execution team, working with Google Cloud Consulting, migrated more than 300 petabytes of data, decommissioned around 25% of workloads, and did it all with zero downtime. No customer impact. No service interruption. For a payments company operating across approximately 200 markets, that constraint was non-negotiable.
What They Got
The outcomes PayPal reports are specific. Queries run 2.5x to 10x faster, including the complex queries used by data scientists. Data available for model training is 16x fresher. Feature engineering, a step that underpins every AI model PayPal builds, now has instant access to clean, governed data instead of stale exports from fragmented systems.
Those numbers matter because they translate directly into better fraud detection, faster personalization, and more responsive customer experiences. The data foundation and the AI ambition are no longer in tension.
Speed to AI Is the Real Prize
The migration unlocked something that’s hard to put a number on but easy to feel in a product roadmap: velocity. When data for model training is 16x fresher and feature engineering has instant access to clean, governed data, the time between “we want to build this AI feature” and “this AI feature is in production” compresses dramatically. PayPal isn’t just running faster queries. It’s shipping AI capabilities at a pace that was structurally impossible when its data lived in a dozen siloed systems. Every new model, every personalization engine, every fraud detection improvement now builds on the same unified foundation instead of fighting the infrastructure before the work even starts.
Why This Matters Beyond PayPal
The PayPal story is really an argument that data infrastructure is AI infrastructure. You can’t build reliable AI models on fragmented, stale, ungoverned data. The companies that will win the next wave of AI-powered financial services are the ones that solved their data foundation problem first.
For ISVs building on Google Cloud, the pattern is worth noting. BigQuery isn’t just a data warehouse; it’s the platform that makes AI features possible at scale. If your customers’ data lives in silos, or if your own product’s data is fragmented across systems acquired over time, the PayPal playbook applies. The migration is the prerequisite, not the destination.
The ETL pipeline question is related. If you’re still moving data between systems to enable AI, that overhead compounds every time you add a new model or feature. A unified data platform changes that calculus entirely.
If you’re an ISV that has grown through acquisition and is sitting on a fragmented data landscape, and your internal conversation sounds like “we have too much data, it would take too long, and we can’t afford any downtime,” consider that this is exactly what PayPal said before they did it. One of the world’s largest payment companies, operating across 200 markets, moved 300 petabytes with zero customer impact. The question isn’t whether it’s possible. The question is whether you want to be building AI features on a unified data foundation, or still explaining to your board two years from now why your models are running on stale, siloed data.
Want to go deeper?
- PayPal’s historic data migration (Google Cloud Blog): PayPal’s own account of the migration, the technical choices, and the AI outcomes.
- PayPal’s Dataflow migration (Google Cloud Blog): how PayPal replaced its self-managed Flink infrastructure with Google Cloud Dataflow for real-time streaming analytics.
