Jaygovind Sahu

Downingtown, PA 19335 · (609) 789-7641 · j@jaygovindsahu.com

I am a Data Engineer with more than 13 years of experience in data modeling, data engineering, data warehousing, and software development. I currently enable business intelligence engineers and data scientists by building data pipelines with Python, AWS services, and other data engineering tools such as DBT and Airflow.


Experience

Data Engineer

Amazon
  • Architected a reusable data pipeline framework to ingest data from multiple vendors, using Airflow DAG generators with DBT templating and environment variables. Reusing this framework reduced development effort for the project by at least 60%, resulting in 50% shorter delivery estimates.
  • Developed a data pipeline to ingest highly unstructured data from an external source, transform it, and load it into Redshift, enabling data scientists to query the data easily via SQL and cutting data preparation time for downstream machine learning models by around 40%.
Skills: Amazon Web Services (AWS) · Pandas · DBT · SQL · Python · Apache Airflow
March 2022 - Present

Data Engineer

Vanguard
  • Implemented and maintained data pipelines for big data analytics using AWS EMR, Glue, and other AWS services, powering around 15 Tableau dashboards used by financial and business analysts. These reports and dashboards informed strategic decision making by leadership teams.
  • Developed a Python script that packaged code (mimicking the existing CI/CD pipeline), automatically spun up an AWS Glue job in a test AWS account from a local environment, and streamed logs to the console. At its peak, this saved the data engineering team, which used Glue extensively for data processing, around 40 hours per week in development effort.
Skills: Apache Spark · Amazon Web Services (AWS) · Python · BMC Control-M · SQL
April 2021 - March 2022

Data Engineer

Jornaya (a Verisk business)
  • Led development and enhancement efforts for a core product of the organization. Its data pipeline processed billions of client records and built useful reports for users, running on around 70 EMR clusters, around 10 Glue jobs, and AWS Lambda functions in a serverless architecture.
  • Optimized Apache Spark and AWS EMR configurations for a data pipeline serving around 50 vendors, cutting the product's AWS bill by more than 55% in one year and lowering the failure rate by almost 90%, for faster and more resilient delivery of results to customers.
  • Designed and implemented a data pipeline to asynchronously call an external API endpoint, transform the response, and write datasets to Amazon S3, using AWS SNS (for queuing), Lambda (for multi-process API calls and transformation), and Step Functions (for orchestration). This pipeline processed 1 billion records in around 25 minutes, helping deliver crucial reports to clients on time.
Skills: Terraform · Apache Spark · Amazon Web Services (AWS) · SQL · Python
December 2019 - April 2021

Data Developer

Tata Consultancy Services Ltd.
  • Built data solutions for 2 major clients in the banking and financial domain, contributing to more than 15 successful projects and numerous ad-hoc analyses for business users. Developed data pipelines across technologies and platforms, from legacy IBM Mainframes using COBOL, JCL, and DB2 to modern stacks using Python, Apache Spark, and AWS services.
Skills: IBM Mainframe · DB2 · SQL · COBOL II · Scala · Amazon Web Services (AWS) · Python
February 2010 - December 2019

Education

VSSUT, Burla, India

Bachelor of Technology
Electrical Engineering

CGPA: 7.5 / 10

August 2005 - May 2009

Skills

Programming Languages & Tools
  • Programming with Python
  • Data transformation and warehousing using DBT (Data Build Tool)
  • Data pipeline orchestration using Apache Airflow
  • Data transformation and analytics using Apache Spark
  • Data exploration, transformation and analysis using SQL
  • Infrastructure as a service using Amazon Web Services (AWS)
  • Big Data analytics using AWS EMR and Glue
  • Data analytics and machine learning using R
  • Basic web development using HTML5 and CSS3

Interests

When I am not working as a Data Engineer, I like to spend time on photography, experimenting with different techniques with my camera; you can find my work on my Adobe Stock contributor profile. As you may notice from my portfolio, flowers are my favorite photography subjects. I also love spending time in nature, mostly hiking.

If I am not outdoors hiking and clicking away with my camera, I like to watch comedy and action movies. I also enjoy learning new technologies, so I spend time experimenting with different languages, frameworks, and services.


Awards & Certifications

  • CSE6040x: Computing for Data Analysis - Georgia Institute of Technology (via edX)
  • PH526x: Using Python for Research - HarvardX (via edX)
  • Python Project: pillow, tesseract, and opencv (via Coursera - University of Michigan)
  • Data Collection and Processing with Python (via Coursera - University of Michigan)
  • Applied Plotting, Charting & Data Representation in Python (via Coursera - University of Michigan)

Contact Me

Send a message or just say "hi"!
