10 Essential Open Source Tools for Modern Data Engineering

In the age of big data, open-source tools have become the backbone of modern data engineering. They help build robust, scalable, and flexible data platforms without the vendor lock-in or high costs of proprietary solutions. In this blog post, I’ll walk you through 10 powerful open-source tools that every data engineer should know and ideally have in their stack.

1. Apache Airflow
🛠️ Category: Workflow Orchestration
Apache Airflow is a de facto industry standard for orchestrating complex data workflows as DAGs (Directed Acyclic Graphs). It lets you schedule and monitor data pipelines, and it has a vibrant ecosystem of providers for AWS, GCP, Spark, and more. ...

June 15, 2025
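The DAG idea behind Airflow can be illustrated without Airflow itself. The minimal sketch below uses Python's standard-library `graphlib` (Python 3.9+) to work out a valid run order for a hypothetical extract/transform/load pipeline. The task names are made up, and this is not Airflow's API: in Airflow you would declare the same dependencies between operators, for example with `extract >> transform`, and let the scheduler handle execution.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_order(dag):
    """Return task batches in dependency order.

    Tasks in the same batch do not depend on each other, so an
    orchestrator could run them in parallel.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a deterministic order
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, unlocking dependents
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(run_order(PIPELINE), start=1):
        print(step, batch)
```

The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, and `prepare()` rejects the graph with a `CycleError` instead of looping forever.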

PySpark vs Pandas: Choosing the Right Tool for Your Data Workflow

“The big-data world rewards the right abstraction at the right scale.” After more than a decade of wrestling with datasets that range from a few megabytes to multiple terabytes, I’m often asked, “Should I use Pandas or PySpark?” Both libraries live in the Python ecosystem, both expose a DataFrame API, and on the surface they feel remarkably similar. Yet they’re built for very different contexts. This post breaks down the key differences so you can pick the right hammer for the job. ...

June 11, 2025
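The similarity between the two DataFrame APIs is easiest to see side by side. Below is a minimal pandas sketch of a group-and-sum over a small, made-up sales table; the column names `region` and `amount` and the function `total_by_region` are hypothetical. The comments show what should be the roughly equivalent PySpark chain, which needs a running Spark session and is not executed here.

```python
import pandas as pd


def total_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Sum `amount` per `region` (hypothetical columns), sorted by region."""
    return (
        df.groupby("region", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total"})
        .sort_values("region", ignore_index=True)
    )


# Roughly equivalent PySpark version (requires a Spark session; not run here):
#   from pyspark.sql import SparkSession, functions as F
#   spark = SparkSession.builder.getOrCreate()
#   sdf = spark.createDataFrame(df)
#   (sdf.groupBy("region")
#       .agg(F.sum("amount").alias("total"))
#       .orderBy("region"))

if __name__ == "__main__":
    sales = pd.DataFrame(
        {"region": ["north", "south", "north"], "amount": [10, 5, 7]}
    )
    print(total_by_region(sales))
```

The two versions read almost line for line alike; the difference is where the work happens. pandas computes the result immediately in the memory of a single process, while the PySpark chain only builds a plan that Spark executes, potentially across a cluster, when an action such as `.show()` or `.collect()` is called.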

Charting My FinTech Journey — From Credit Risk to Asset Management

“The best way to predict the future is to create it.” - Peter Drucker

Welcome to singhprashant.in! 🎉 After more than a decade immersed in the fast-evolving world of financial technology, I’m excited to open this space where I can share lessons learned, industry insights, and the projects that keep me curious.

Quick Facts About Me
- 11 years in FinTech & Banking Analytics
- Credit risk analytics & scorecard development
- Branch banking business analytics across 70+ branches
- Credit analytics for retail & MSME portfolios
- Data engineering for regulatory and risk data pipelines
- Asset-management analytics in an investment-banking environment

What Drives Me ...

June 9, 2025