Location: Remote
Salary/Rate: 400 - 449.64
Contact: Amy Hughes
Contact email: ahughes@skillfindergroup.com
Job ref: 19694USER_75
Consultant: Amy Hughes
Lead PySpark Engineer
As a Lead PySpark Engineer, you will design, develop, and optimise complex data processing solutions on AWS. You will work hands-on with PySpark, modernise legacy data workflows, and support large-scale SAS-to-PySpark migration programmes. This role requires strong engineering discipline, deep data expertise, and the ability to deliver production-ready data pipelines within a financial services environment.
Skill Profile:
- PySpark - P3 (Advanced)
- AWS - P3 (Advanced)
- SAS - P1 (Foundational)
Key Responsibilities
Technical Delivery
- Design, develop, and debug complex PySpark code for ETL/ELT and data-mart workloads.
- Convert and refactor SAS code into PySpark using SAS2PY tooling and manual optimisation (a simplified conversion sketch follows this list).
- Build production-ready PySpark solutions that are scalable, maintainable, and reliable.
- Modernise and stabilise legacy data workflows into cloud-native architectures.
- Ensure accuracy, quality, and reliability across data transformation processes.
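For illustration, the sketch below shows the general shape of such a conversion: a simple SAS DATA step (kept as comments) refactored into PySpark. The dataset, column names, and S3 paths are hypothetical and only stand in for the kind of code involved.

```python
# Hypothetical SAS DATA step being migrated:
#   data work.active_accounts;
#     set raw.accounts;
#     where status = 'ACTIVE';
#     balance_gbp = balance * fx_rate;
#   run;
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Source and target locations are placeholders, not a real client schema.
accounts = spark.read.parquet("s3://example-bucket/raw/accounts/")

active_accounts = (
    accounts
    .filter(F.col("status") == "ACTIVE")                             # SAS WHERE clause
    .withColumn("balance_gbp", F.col("balance") * F.col("fx_rate"))  # derived column
)

active_accounts.write.mode("overwrite").parquet(
    "s3://example-bucket/curated/active_accounts/"
)
```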
Cloud & Data Engineering (AWS-Focused)
- Build and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena.
- Optimise Spark workloads for performance, partitioning, cost efficiency, and scalability (illustrated in the sketch after this list).
- Use CI/CD pipelines and Git-based version control for deployment and automation.
- Collaborate with engineers, architects, and stakeholders to deliver cloud data solutions.
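As a rough sketch of the kind of pipeline work described above, the example below writes a date-partitioned Parquet dataset to S3 so that Athena (via a Glue catalog table) can prune partitions at query time. Bucket names and columns are placeholders, not a specific client setup.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partitioned_pipeline_sketch").getOrCreate()

transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")

# Partitioning the output by business date lets Athena prune partitions,
# which cuts both scan cost and query latency.
(
    transactions
    .withColumn("business_date", F.to_date("event_timestamp"))
    .repartition("business_date")   # align in-memory partitions with the output layout
    .write
    .mode("overwrite")
    .partitionBy("business_date")
    .parquet("s3://example-bucket/curated/transactions/")
)
```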
Core Technical Skills
PySpark & Data Engineering
- 5+ years of hands-on PySpark experience (P3).
- Ability to write production-grade data engineering code.
- Strong understanding of:
  - ETL/ELT
  - Data modelling
  - Facts & dimensions
  - Data marts
  - Slowly Changing Dimensions (SCDs), illustrated in the sketch after this list
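The SCD sketch referenced above might look roughly like the following cut-down Type 2 update in PySpark; the dimension layout, tracked attribute, and paths are assumptions made purely for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Current dimension state and the latest source extract (schemas are illustrative).
dim_customer = spark.read.parquet("s3://example-bucket/marts/dim_customer/")
source = spark.read.parquet("s3://example-bucket/staging/customer/")

current = dim_customer.filter(F.col("is_current"))

# Rows whose tracked attribute has changed need a new dimension version.
changed = (
    current.alias("d")
    .join(source.alias("s"), F.col("d.customer_id") == F.col("s.customer_id"))
    .filter(F.col("d.address") != F.col("s.address"))
)

# Close off the superseded versions...
expired = (
    changed.select("d.*")
    .withColumn("is_current", F.lit(False))
    .withColumn("valid_to", F.current_date())
)

# ...and create the new current versions.
new_versions = (
    changed.select("s.*")
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_date())
    .withColumn("valid_to", F.lit(None).cast("date"))
)

# Unioning these back with the untouched rows, assigning surrogate keys, and
# handling late-arriving data are omitted here for brevity.
```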
Spark Performance & Optimisation
- Expertise in Spark execution, partitioning, performance tuning, and optimisation (a brief tuning sketch follows this list).
- Troubleshooting distributed data pipelines at scale.
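A minimal example of the sort of tuning in scope, assuming a large fact table joined to a small reference dimension; table names, columns, and the shuffle-partition setting are illustrative rather than recommended values.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_sketch").getOrCreate()

fact = spark.read.parquet("s3://example-bucket/curated/transactions/")
dim = spark.read.parquet("s3://example-bucket/marts/dim_merchant/")

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = fact.join(F.broadcast(dim), "merchant_id")

# Shuffle parallelism for the aggregation below; the right value depends on
# cluster size and data volume, so 400 is purely illustrative.
spark.conf.set("spark.sql.shuffle.partitions", "400")

daily_totals = (
    enriched
    .groupBy("business_date", "merchant_category")
    .agg(F.sum("amount").alias("total_amount"))
)

# explain() exposes the physical plan, e.g. confirming a broadcast hash join.
daily_totals.explain()
```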
Python & Engineering Quality
- Strong Python coding capability with emphasis on clean code and maintainability.
- Experience applying engineering best practices (a short Python sketch follows this list), including:
  - Parameterisation
  - Configuration management
  - Structured logging
  - Exception handling
  - Modular design
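The sketch below pulls those practices together in a deliberately small, framework-free Python module; the module name, configuration fields, and file paths are invented for the example.

```python
import json
import logging
from dataclasses import dataclass

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s %(message)s", level=logging.INFO
)
logger = logging.getLogger("pipeline.load_accounts")


@dataclass
class JobConfig:
    """Runtime parameters kept out of the code (fields are illustrative)."""
    source_path: str
    target_path: str
    run_date: str


def load_config(path: str) -> JobConfig:
    """Configuration management: read job parameters from a JSON file."""
    with open(path) as fh:
        return JobConfig(**json.load(fh))


def run(config: JobConfig) -> None:
    """Single, testable entry point for one transformation step."""
    logger.info("starting load: source=%s run_date=%s", config.source_path, config.run_date)
    try:
        # The actual read/transform/write logic would sit here.
        ...
    except Exception:
        # Log with traceback, then re-raise so the orchestrator marks the run failed.
        logger.exception("load failed for run_date=%s", config.run_date)
        raise
    logger.info("load complete: target=%s", config.target_path)


if __name__ == "__main__":
    run(load_config("config/load_accounts.json"))
```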
SAS & Legacy Analytics (P1)
- Foundational knowledge of SAS (Base SAS, Macros, DI Studio).
- Ability to understand and interpret legacy SAS code for migration.
Data Engineering & Testing
- Understanding of end-to-end data flows, orchestration, pipelines, and CDC.
- Experience executing ETL test cases and building unit and data-comparison tests (a minimal example follows).
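A minimal data-comparison test might look like the pytest sketch below, which runs a tiny transformation on a local Spark session and asserts on the collected rows; the transformation and schema are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def add_balance_gbp(df):
    """Transformation under test (a stand-in for a real pipeline function)."""
    return df.withColumn("balance_gbp", F.col("balance") * F.col("fx_rate"))


def test_balance_conversion(spark):
    source = spark.createDataFrame(
        [("A1", 100.0, 1.25)], ["account_id", "balance", "fx_rate"]
    )
    expected = [("A1", 100.0, 1.25, 125.0)]

    result = add_balance_gbp(source)

    # Data comparison: collect the small result and check the row values exactly.
    assert [tuple(row) for row in result.collect()] == expected
```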
Engineering Practices
- Proficient with Git workflows, branching, pull requests, and code reviews.
- Ability to document technical decisions, data flows, and architecture.
- Exposure to CI/CD tooling for data engineering pipelines.
AWS & Platform Skills (P3)
- Strong hands-on experience with:
  - S3
  - EMR/Glue
  - Glue Workflows
  - Athena
  - IAM
- Understanding of distributed computing and big data processing on AWS.
- Experience deploying and operating data pipelines in cloud environments.
Desirable Skills
- Experience in banking or financial services environments.
- Background in SAS modernisation or cloud migration programmes.
- Familiarity with DevOps practices and infrastructure-as-code tools (Terraform, CloudFormation).
- Experience working in Agile/Scrum delivery teams.
