Built and optimized ETL pipelines using Apache Spark (PySpark) and Hive to process large datasets, improving data processing speed and reliability.
Designed and implemented scalable, cost-effective data pipelines on Google Cloud Platform (GCP) leveraging BigQuery, Dataproc, Cloud Storage, and Cloud Composer.
Developed real-time and batch processing solutions within the Hadoop ecosystem, integrating structured and semi-structured data sources for unified analytics.
Created interactive dashboards and reports using Looker Studio to visualize key performance indicators (KPIs) and deliver actionable business insights to stakeholders.
Collaborated in an Agile team to build end-to-end data engineering solutions, following best practices for data quality, transformation, and performance optimization.