Location: Remote
Salary/Rate: 400 - 449.64
Contact: Amy Hughes
Contact email: ahughes@skillfindergroup.com
Job ref: 19694USER_75
Consultant: Amy Hughes
Lead PySpark Engineer
As a Lead PySpark Engineer, you will design, develop, and optimise complex data processing solutions on AWS. You will work hands-on with PySpark, modernise legacy data workflows, and support large-scale SAS-to-PySpark migration programmes. This role requires strong engineering discipline, deep data expertise, and the ability to deliver production-ready data pipelines within a financial services environment.
Skill Profile:
- PySpark - P3 (Advanced)
- AWS - P3 (Advanced)
- SAS - P1 (Foundational)
Key Responsibilities
Technical Delivery
- Design, develop, and debug complex PySpark code for ETL/ELT and data-mart workloads.
- Convert and refactor SAS code into PySpark using SAS2PY tooling and manual optimisation (a simplified conversion sketch follows this list).
- Build production-ready PySpark solutions that are scalable, maintainable, and reliable.
- Modernise and stabilise legacy data workflows into cloud-native architectures.
- Ensure accuracy, quality, and reliability across data transformation processes.
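For illustration, the sketch below shows the general shape of such a conversion: a simple SAS DATA step (kept as comments) refactored into PySpark. The dataset, column names, and S3 paths are hypothetical and only stand in for the kind of code involved.

```python
# Hypothetical SAS DATA step being migrated:
#   data work.active_accounts;
#     set raw.accounts;
#     where status = 'ACTIVE';
#     balance_gbp = balance * fx_rate;
#   run;
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Source and target locations are placeholders, not a real client schema.
accounts = spark.read.parquet("s3://example-bucket/raw/accounts/")

active_accounts = (
    accounts
    .filter(F.col("status") == "ACTIVE")                             # SAS WHERE clause
    .withColumn("balance_gbp", F.col("balance") * F.col("fx_rate"))  # derived column
)

active_accounts.write.mode("overwrite").parquet(
    "s3://example-bucket/curated/active_accounts/"
)
```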
Cloud & Data Engineering (AWS-Focused)
- Build and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena.
- Optimise Spark workloads for performance, partitioning, cost efficiency, and scalability (illustrated in the sketch after this list).
- Use CI/CD pipelines and Git-based version control for deployment and automation.
- Collaborate with engineers, architects, and stakeholders to deliver cloud data solutions.
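As a rough sketch of the kind of pipeline work described above, the example below writes a date-partitioned Parquet dataset to S3 so that Athena (via a Glue catalog table) can prune partitions at query time. Bucket names and columns are placeholders, not a specific client setup.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned_pipeline_sketch").getOrCreate()

transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Partitioning the output by business date lets Athena prune partitions,
# which cuts both scan cost and query latency.
(
    transactions
    .withColumn("business_date", F.to_date("event_timestamp"))
    .repartition("business_date")   # align in-memory partitions with the output layout
    .write
    .mode("overwrite")
    .partitionBy("business_date")
    .parquet("s3://example-bucket/curated/transactions/")
)
```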
Core Technical Skills
PySpark & Data Engineering
- 5+ years of hands-on PySpark experience (P3).
- Ability to write production-grade data engineering code.
- Strong understanding of:
  - ETL/ELT
  - Data modelling
  - Facts & dimensions
  - Data marts
  - Slowly Changing Dimensions (SCDs), illustrated in the sketch after this list
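The SCD sketch referenced above might look roughly like the following cut-down Type 2 update in PySpark; the dimension layout, tracked attribute, and paths are assumptions made purely for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Current dimension state and the latest source extract (schemas are illustrative).
dim_customer = spark.read.parquet("s3://example-bucket/marts/dim_customer/")
source = spark.read.parquet("s3://example-bucket/staging/customer/")

current = dim_customer.filter(F.col("is_current"))

# Rows whose tracked attribute has changed need a new dimension version.
changed = (
    current.alias("d")
    .join(source.alias("s"), F.col("d.customer_id") == F.col("s.customer_id"))
    .filter(F.col("d.address") != F.col("s.address"))
)

# Close off the superseded versions...
expired = (
    changed.select("d.*")
    .withColumn("is_current", F.lit(False))
    .withColumn("valid_to", F.current_date())
)

# ...and create the new current versions.
new_versions = (
    changed.select("s.*")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_date())
    .withColumn("valid_to", F.lit(None).cast("date"))
)

# Unioning these back with the untouched rows, assigning surrogate keys, and
# handling late-arriving data are omitted here for brevity.
```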
Spark Performance & Optimisation
- Expertise in Spark execution, partitioning, performance tuning, and optimisation (a brief tuning sketch follows this list).
- Troubleshooting distributed data pipelines at scale.
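A minimal example of the sort of tuning in scope, assuming a large fact table joined to a small reference dimension; table names, columns, and the shuffle-partition setting are illustrative rather than recommended values.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

fact = spark.read.parquet("s3://example-bucket/curated/transactions/")
dim = spark.read.parquet("s3://example-bucket/marts/dim_merchant/")

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = fact.join(F.broadcast(dim), "merchant_id")

# Shuffle parallelism for the aggregation below; the right value depends on
# cluster size and data volume, so 400 is purely illustrative.
spark.conf.set("spark.sql.shuffle.partitions", "400")

daily_totals = (
    enriched
    .groupBy("business_date", "merchant_category")
    .agg(F.sum("amount").alias("total_amount"))
)

# explain() exposes the physical plan, e.g. confirming a broadcast hash join.
daily_totals.explain()
```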
Python & Engineering Quality
- Strong Python coding capability with emphasis on clean code and maintainability.
- Experience applying engineering best practices (a short Python sketch follows this list), including:
  - Parameterisation
  - Configuration management
  - Structured logging
  - Exception handling
  - Modular design
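The sketch below pulls those practices together in a deliberately small, framework-free Python module; the module name, configuration fields, and file paths are invented for the example.

```python
import json
import logging
from dataclasses import dataclass

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s %(message)s", level=logging.INFO
)
logger = logging.getLogger("pipeline.load_accounts")


@dataclass
class JobConfig:
    """Runtime parameters kept out of the code (fields are illustrative)."""
    source_path: str
    target_path: str
    run_date: str


def load_config(path: str) -> JobConfig:
    """Configuration management: read job parameters from a JSON file."""
    with open(path) as fh:
        return JobConfig(**json.load(fh))


def run(config: JobConfig) -> None:
    """Single, testable entry point for one transformation step."""
    logger.info("starting load: source=%s run_date=%s", config.source_path, config.run_date)
    try:
        # The actual read/transform/write logic would sit here.
        ...
    except Exception:
        # Log with traceback, then re-raise so the orchestrator marks the run failed.
        logger.exception("load failed for run_date=%s", config.run_date)
        raise
    logger.info("load complete: target=%s", config.target_path)


if __name__ == "__main__":
    run(load_config("config/load_accounts.json"))
```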
SAS & Legacy Analytics (P1)
- Foundational knowledge of SAS (Base SAS, Macros, DI Studio).
- Ability to understand and interpret legacy SAS code for migration.
Data Engineering & Testing
- Understanding of end-to-end data flows, orchestration, pipelines, and CDC.
- Experience executing ETL test cases and building unit and data-comparison tests (a minimal example follows).
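A minimal data-comparison test might look like the pytest sketch below, which runs a tiny transformation on a local Spark session and asserts on the collected rows; the transformation and schema are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def add_balance_gbp(df):
    """Transformation under test (a stand-in for a real pipeline function)."""
    return df.withColumn("balance_gbp", F.col("balance") * F.col("fx_rate"))


def test_balance_conversion(spark):
    source = spark.createDataFrame(
        [("A1", 100.0, 1.25)], ["account_id", "balance", "fx_rate"]
    )
    expected = [("A1", 100.0, 1.25, 125.0)]

    result = add_balance_gbp(source)

    # Data comparison: collect the small result and check the row values exactly.
    assert [tuple(row) for row in result.collect()] == expected
```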
Engineering Practices
- Proficient with Git workflows, branching, pull requests, and code reviews.
- Ability to document technical decisions, data flows, and architecture.
- Exposure to CI/CD tooling for data engineering pipelines.
AWS & Platform Skills (P3)
- Strong hands-on experience with:
  - S3
  - EMR/Glue
  - Glue Workflows
  - Athena
  - IAM
- Understanding of distributed computing and big data processing on AWS.
- Experience deploying and operating data pipelines in cloud environments.
Desirable Skills
- Experience in banking or financial services environments.
- Background in SAS modernisation or cloud migration programmes.
- Familiarity with DevOps practices and infrastructure-as-code tools (Terraform, CloudFormation).
- Experience working in Agile/Scrum delivery teams.
