Key Responsibilities:
·
Pipeline Development: Build and maintain robust ETL/ELT pipelines using Databricks (PySpark/Spark SQL) and related Azure data services. Ensure pipelines handle batch and streaming workloads reliably at scale.
·
Legacy Modernisation: Lead the migration of legacy data pipelines to modern cloud-native solutions on Azure and Databricks. Define migration strategies, manage cutover plans, and ensure zero data loss during transition.
·
Data Quality & Governance: Establish and enforce data quality frameworks, validation rules, and monitoring across all pipelines. Implement data lineage tracking, cataloguing (Unity Catalog), and access governance in line with organisational policies.
·
Data Modelling & Design: Design logical and physical data models for analytical and operational workloads. Define schemas, partitioning strategies, and optimisation patterns for performance and cost efficiency across Delta Lake and Azure SQL.
·
Stakeholder Engagement: Engage with the client’s Data & Analytics team on technical discussions, solution walkthroughs, and progress updates. Participate in requirement workshops and provide technical input to support estimation and planning.
·
DevOps & Automation: Build and maintain CI/CD pipelines for data workloads using Azure DevOps. Implement automated testing for data pipelines and automate deployments across environments.
·
Mentorship & Standards: Review code, enforce engineering standards, and mentor offshore and onsite data engineers. Drive adoption of best practices for Spark, SQL, Python, testing, and version control across the team.