Lead PySpark Engineer

As a Lead PySpark Engineer, you will design, develop, and optimise complex data processing solutions on AWS. You will work hands-on with PySpark, modernise legacy data workflows, and support large-scale SAS-to-PySpark migration programmes. This role requires strong engineering discipline, deep data expertise, and the ability to deliver production-ready data pipelines within a financial services environment.

Skill Profile:

  • PySpark - P3 (Advanced)
  • AWS - P3 (Advanced)
  • SAS - P1 (Foundational)

Key Responsibilities

Technical Delivery

  • Design, develop, and debug complex PySpark code for ETL/ELT and data-mart workloads.
  • Convert and refactor SAS code into PySpark using SAS2PY tooling and manual optimisation.
  • Build production-ready PySpark solutions that are scalable, maintainable, and reliable.
  • Modernise and stabilise legacy data workflows, migrating them into cloud-native architectures.
  • Ensure accuracy, quality, and reliability across data transformation processes.

Cloud & Data Engineering (AWS-Focused)

  • Build and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena (a minimal deployment example follows this list).
  • Optimise Spark workloads for performance, partitioning, cost efficiency, and scalability.
  • Use CI/CD pipelines and Git-based version control for deployment and automation.
  • Collaborate with engineers, architects, and stakeholders to deliver cloud data solutions.
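
As one minimal deployment example, the sketch below starts an AWS Glue job run and checks its state with boto3. The job name, region, and argument are hypothetical, and it assumes an IAM role permitted to start and describe Glue job runs.

  import boto3

  # Hypothetical job name, region, and argument; assumes an IAM role
  # that may start and describe Glue job runs.
  glue = boto3.client("glue", region_name="eu-west-2")

  run = glue.start_job_run(
      JobName="daily-accounts-etl",                  # hypothetical Glue job
      Arguments={"--business_date": "2024-01-31"},   # passed to the job script
  )

  # Poll the run once; a real deployment would loop or use a workflow trigger.
  status = glue.get_job_run(JobName="daily-accounts-etl", RunId=run["JobRunId"])
  print(status["JobRun"]["JobRunState"])             # e.g. RUNNING, SUCCEEDED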

Core Technical Skills

PySpark & Data Engineering

  • 5+ years of hands-on PySpark experience (P3).
  • Ability to write production-grade data engineering code.
  • Strong understanding of:
    • ETL/ELT
    • Data modelling
    • Facts & dimensions
    • Data marts
    • Slowly Changing Dimensions (SCDs) - see the Type 2 sketch after this list
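
To make the dimensional-modelling expectations concrete, here is a minimal PySpark sketch of an SCD Type 2 close-and-open step. The table paths, customer_id key, and tracked address attribute are hypothetical, and it assumes the staging extract carries the same business columns as the dimension.

  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

  # Hypothetical inputs: the current dimension and today's source extract.
  dim = spark.read.parquet("s3://example-bucket/dim_customer/").alias("d")
  src = spark.read.parquet("s3://example-bucket/stg_customer/").alias("s")

  # Current dimension rows whose tracked attribute changed in the source.
  changed = (dim.join(src, F.col("d.customer_id") == F.col("s.customer_id"))
                .filter(F.col("d.is_current") &
                        (F.col("d.address") != F.col("s.address"))))

  # Close the superseded versions ...
  closed = (changed.select("d.*")
                   .withColumn("is_current", F.lit(False))
                   .withColumn("valid_to", F.current_date()))

  # ... and open replacement versions built from the source columns.
  opened = (changed.select("s.*")
                   .withColumn("is_current", F.lit(True))
                   .withColumn("valid_from", F.current_date())
                   .withColumn("valid_to", F.lit(None).cast("date")))

  # Unchanged rows pass through; the new dimension is the union of all three.
  untouched = dim.join(changed.select("d.customer_id"), "customer_id", "left_anti")
  new_dim = untouched.unionByName(closed).unionByName(opened)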

Spark Performance & Optimisation

  • Expertise in the Spark execution model, partitioning, performance tuning, and optimisation (see the sketch after this list).
  • Troubleshooting distributed data pipelines at scale.
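
As a flavour of the tuning work involved, the sketch below broadcasts a small dimension to avoid shuffling the large fact table, sets shuffle partitions explicitly, and inspects the physical plan before writing partitioned output. Paths, column names, and the partition count are placeholder assumptions, not recommended defaults.

  from pyspark.sql import SparkSession
  from pyspark.sql.functions import broadcast

  spark = (SparkSession.builder.appName("tuning_sketch")
           # Hypothetical values, sized for a specific cluster.
           .config("spark.sql.shuffle.partitions", "400")
           .config("spark.sql.adaptive.enabled", "true")
           .getOrCreate())

  facts = spark.read.parquet("s3://example-bucket/fact_txn/")      # large side
  branches = spark.read.parquet("s3://example-bucket/dim_branch/") # small side

  # Broadcasting the small dimension avoids shuffling the fact table.
  joined = facts.join(broadcast(branches), "branch_id")

  # Check the physical plan before running at scale, then write output
  # partitioned on the key that downstream queries filter on.
  joined.explain()
  (joined.repartition("business_date")
         .write.mode("overwrite")
         .partitionBy("business_date")
         .parquet("s3://example-bucket/mart_txn/"))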

Python & Engineering Quality

  • Strong Python coding capability with emphasis on clean code and maintainability.
  • Experience applying engineering best practices, brought together in the sketch after this list, including:
    • Parameterisation
    • Configuration management
    • Structured logging
    • Exception handling
    • Modular design
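
A compact sketch of how these practices fit together in one job, using only the Python standard library plus PySpark; the config file layout and its source/target keys are hypothetical.

  import argparse
  import json
  import logging

  from pyspark.sql import SparkSession

  logging.basicConfig(
      format="%(asctime)s %(levelname)s %(name)s %(message)s",
      level=logging.INFO)
  log = logging.getLogger("pipeline")


  def run(spark, source_path, target_path):
      """One modular, independently testable step: read, transform, write."""
      df = spark.read.parquet(source_path)
      df.write.mode("overwrite").parquet(target_path)


  def main():
      # Parameterisation: paths come from config, never hard-coded.
      parser = argparse.ArgumentParser()
      parser.add_argument("--config", required=True)
      args = parser.parse_args()

      # Hypothetical layout: {"source": "s3://...", "target": "s3://..."}
      with open(args.config) as fh:
          cfg = json.load(fh)

      spark = SparkSession.builder.appName("configurable_pipeline").getOrCreate()
      try:
          run(spark, cfg["source"], cfg["target"])
          log.info("pipeline succeeded, wrote %s", cfg["target"])
      except Exception:
          # Fail loudly with a stack trace rather than swallowing the error.
          log.exception("pipeline failed")
          raise
      finally:
          spark.stop()


  if __name__ == "__main__":
      main()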

SAS & Legacy Analytics (P1)

  • Foundational knowledge of SAS (Base SAS, Macros, DI Studio).
  • Ability to read and interpret legacy SAS code for migration (a paired SAS-to-PySpark example follows below).
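
To show what reading legacy SAS for migration looks like, the sketch below pairs an illustrative Base SAS DATA step with a PySpark equivalent. The dataset names, threshold, and derived rate column are invented for the example.

  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

  # Illustrative Base SAS original:
  #   data work.high_value;
  #       set work.accounts;
  #       where balance > 10000;
  #       rate = balance * 0.05;
  #   run;

  accounts = spark.read.parquet("s3://example-bucket/accounts/")  # hypothetical

  high_value = (accounts
                .filter(F.col("balance") > 10000)               # WHERE clause
                .withColumn("rate", F.col("balance") * 0.05))   # derived variable

Tools such as SAS2PY produce a first pass of this kind of translation; the manual optimisation mentioned earlier is the work of turning that output into idiomatic, performant DataFrame code.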

Data Engineering & Testing

  • Understanding of end-to-end data flows, orchestration, pipelines, and change data capture (CDC).
  • Experience executing ETL test cases and building unit and data-comparison tests (see the sketch after this list).
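
One common shape for such tests is a two-way exceptAll comparison between expected and actual frames, runnable under pytest against a local Spark session; the transformation under test here is a trivial stand-in.

  from pyspark.sql import SparkSession


  def assert_frames_match(expected, actual):
      """Row-level comparison: both directions of exceptAll must be empty."""
      missing = expected.exceptAll(actual).count()
      extra = actual.exceptAll(expected).count()
      assert missing == 0 and extra == 0, (
          f"{missing} missing row(s), {extra} unexpected row(s)")


  def test_filter_keeps_positive_balances():
      spark = (SparkSession.builder.master("local[2]")
               .appName("etl_test").getOrCreate())
      source = spark.createDataFrame([(1, 100.0), (2, -5.0)], ["id", "balance"])
      expected = spark.createDataFrame([(1, 100.0)], ["id", "balance"])
      assert_frames_match(expected, source.filter("balance > 0"))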

Engineering Practices

  • Proficient with Git workflows, branching, pull requests, and code reviews.
  • Ability to document technical decisions, data flows, and architecture.
  • Exposure to CI/CD tooling for data engineering pipelines.

AWS & Platform Skills (P3)

  • Strong hands-on experience with:
    • S3
    • EMR/Glue
    • Glue Workflows
    • Athena
    • IAM
  • Understanding of distributed computing and big data processing on AWS.
  • Experience deploying and operating data pipelines in cloud environments (a minimal S3 read/write sketch follows).
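
At its simplest, the S3/EMR/Athena combination looks like the sketch below: read raw files from S3, write curated, partitioned Parquet back, and let Athena query the result through the Glue Data Catalog. Bucket names, prefixes, and the trade_date partition column are hypothetical.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("s3_pipeline_sketch").getOrCreate()

  # Read raw CSVs from S3 (on EMR/Glue the s3:// scheme resolves natively).
  raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/trades/")

  # Write curated, partitioned Parquet that Athena can query once the
  # location is registered (or crawled) in the Glue Data Catalog.
  (raw.write
      .mode("overwrite")
      .partitionBy("trade_date")                 # hypothetical partition column
      .parquet("s3://example-bucket/curated/trades/"))

Partitioning on the column that Athena queries filter by keeps scans, and therefore cost, proportional to the data actually read.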

Desirable Skills

  • Experience in banking or financial services environments.
  • Background in SAS modernisation or cloud migration programmes.
  • Familiarity with DevOps practices and infrastructure-as-code tools (Terraform, CloudFormation).
  • Experience working in Agile/Scrum delivery teams.