An award-winning privacy-preserving data collaboration platform is seeking a Senior Cloud Data Engineer. The platform enables companies to analyze and collaborate on consumer data to gain insights, build predictive models, and monetize data without sharing the raw data or compromising consumer privacy. The Senior Cloud Data Engineer will be a lead technical contributor responsible for building and optimizing high-performance data processing engines.
Spark Optimization: Act as the in-house subject-matter expert (SME) on Spark internals; tune memory usage, shuffle behavior, and partitioning for cost-effective performance.
Cloud-Agnostic Development: Build pipelines using Python and Delta Lake, decoupling code from specific cloud providers and reducing reliance on GUI tools (e.g., Azure Data Factory, ADF).
Refactoring & Modernization: Migrate complex SQL-based ETL into modular, testable, and maintainable Python libraries.
Lakehouse Engineering...