Category: AWS

  • Caching in AWS Glue Spark: Boosting Performance with Efficient Data Reuse

    As a developer working with AWS Glue and Apache Spark, one of the most powerful tools in your performance optimization toolkit is caching. Caching can significantly reduce computation time and resource usage, especially in complex ETL (Extract, Transform, Load) pipelines where the same data is reused across multiple operations. In this blog, I’ll explain how…

  • Speeding Up Date-Based Queries in Amazon Athena: Simple Partition Tips

    Amazon Athena is a handy tool for digging into data stored in Amazon S3. When your data is split into partitions (like folders), writing smart queries can save time and money. This blog explains how to filter date-based partitions effectively, showing what works best for speed and efficiency. What’s Partition Pruning? Partition pruning is like…