Using AWS Glue to query AWS CloudWatch Logs
Alyas Gul
Date posted: 6 July 2020
Reading time: 21 minutes
Recently I was asked to provide a quick, streamlined way of querying AWS CloudWatch Logs via the AWS Console. These logs were already being streamed to an AWS S3 bucket, so my first thought was simply to interrogate them with CloudWatch Logs Insights. On closer investigation, however, I found some drawbacks to this option:
CloudWatch Logs Insights returns at most 10,000 rows per query. This would have been a problem in my case, as I expected some reports to generate more than 65,000 rows.
The raw logs contained JSON fields, which would have required overly complicated queries to produce useful output.
The solution also had to use standard SQL.
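To make the JSON drawback concrete, the sketch below flattens a nested JSON log message into a single level of columns, which is the shape a standard SQL table expects. It is plain Python for illustration only: the sample event, its field names, and the underscore-joined column naming are hypothetical, and in the process described here the restructuring is done by the Glue ETL script rather than by this function.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested JSON objects into one level of column-like keys.

    Nested keys are joined with `sep`, so {"http": {"status": 200}} becomes
    {"http_status": 200}. Lists are serialised to a JSON string because they
    do not map cleanly onto a single column.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))  # recurse into nested objects
        elif isinstance(value, list):
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# Hypothetical log event whose message field holds a JSON string.
event = {
    "timestamp": 1593993600000,
    "message": '{"level": "INFO", "http": {"method": "GET", "status": 200}}',
}

row = {"timestamp": event["timestamp"], **flatten(json.loads(event["message"]))}
print(row)
# {'timestamp': 1593993600000, 'level': 'INFO', 'http_method': 'GET', 'http_status': 200}
```

Once every field is its own column, a report becomes an ordinary `SELECT level, http_status ... WHERE ...` rather than a query that has to parse JSON inline.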
In the end, I chose AWS Glue to tackle the problem. Below is a step-by-step guide to the process.
What is AWS Glue?
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You point AWS Glue at your data stored on AWS, and AWS Glue discovers the data and stores the associated metadata (table definition and schema) in the AWS Glue Data Catalog. Once catalogued, your data is immediately searchable, queryable, and available for ETL.
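The "point AWS Glue at your data" step can also be driven from code. The following is a minimal sketch, assuming the boto3 Glue client (`boto3.client("glue")`) and an existing IAM role the crawler can assume; the function name and every name, ARN, and path passed to it are hypothetical. It creates a Data Catalog database, registers a crawler over the raw log prefix, and starts it.

```python
from typing import Any, Dict


def catalogue_raw_logs(glue: Any, database: str, crawler: str,
                       role_arn: str, s3_path: str) -> Dict[str, Any]:
    """Create a Data Catalog database and a crawler over raw logs, then start it.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns the arguments that were passed to create_crawler.
    """
    # A database is a named container for table definitions in the Data Catalog.
    glue.create_database(DatabaseInput={"Name": database})

    # The crawler reads the S3 prefix, infers a schema, and writes a table
    # definition into `database`; the log data itself stays in S3.
    crawler_args: Dict[str, Any] = {
        "Name": crawler,
        "Role": role_arn,  # IAM role the crawler assumes to read the bucket
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    glue.create_crawler(**crawler_args)

    # Starting a crawler is asynchronous; this call returns before the crawl ends.
    glue.start_crawler(Name=crawler)
    return crawler_args
```

With boto3 installed and credentials configured, a call might look like `catalogue_raw_logs(boto3.client("glue"), "cloudwatch_logs", "raw-logs-crawler", "arn:aws:iam::123456789012:role/GlueCrawlerRole", "s3://<your-log-bucket>/")`. The table appears in the catalogue only once the crawl completes. Calling the function a second time would fail at `create_database` because the database already exists, so a second crawler needs only the crawler calls.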
For more information on AWS Glue, see the official AWS Glue documentation.
Pre-requisites
A summary of the upcoming steps is listed below:
Create a database
Using an AWS Glue crawler, generate a table definition for the raw JSON data
Define an ETL script to restructure the raw data
Using a second crawler, create a Parquet-formatted table
Query the Parquet-formatted table via AWS Athena
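The final step can also be run outside the console. Below is a minimal sketch, assuming the boto3 Athena client (`boto3.client("athena")`); the function name, the polling interval, and the page size of 1,000 rows are illustrative choices, not part of the original process. It starts a query, polls until it finishes, then follows `NextToken` to page through every result row, which is how a result set far larger than one page (such as the 65,000-row reports mentioned earlier) can be read in full.

```python
import time
from typing import Any, Dict, Iterator, List


def run_athena_query(athena: Any, sql: str, database: str, output_s3: str,
                     poll_seconds: float = 1.0) -> Iterator[List[str]]:
    """Run `sql` in Athena and yield every result row as a list of strings.

    `athena` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} finished with state {state}")

    # Page through the results. For a SELECT, the first row is the header row.
    kwargs: Dict[str, Any] = {"QueryExecutionId": qid, "MaxResults": 1000}
    while True:
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells carry no VarCharValue key; they are returned as "".
            yield [cell.get("VarCharValue", "") for cell in row["Data"]]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

For example, `for row in run_athena_query(boto3.client("athena"), "SELECT * FROM my_parquet_table LIMIT 10", "cloudwatch_logs", "s3://<glue-temp-my-bucket>/athena-results/"): print(row)`, where the table name, database name, and results prefix are hypothetical. Because the function is a generator, nothing is sent to Athena until iteration starts.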
The process requires two S3 buckets:
An S3 bucket for the transform script and the Parquet table: s3://<glue-data-my-bucket>/
A temporary directory for AWS Glue: s3://<glue-temp-my-bucket>/temp
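The two buckets map onto a Glue job definition roughly as follows. This is a minimal sketch, assuming the job is created with boto3's `glue.create_job`; the function name, job name, role ARN, and script key `scripts/transform.py` are hypothetical. The script location points into the data bucket, and the temporary location is passed through Glue's `--TempDir` job argument.

```python
from typing import Any, Dict


def glue_job_args(name: str, role_arn: str, data_bucket: str, temp_dir: str,
                  script_key: str = "scripts/transform.py") -> Dict[str, Any]:
    """Build keyword arguments for boto3's glue.create_job(...).

    The transform script lives in the data bucket; Glue's temporary files
    go to `temp_dir` via the special `--TempDir` job argument.
    """
    if not (data_bucket.startswith("s3://") and temp_dir.startswith("s3://")):
        raise ValueError("expected s3:// URIs")
    # Join bucket and key with exactly one slash between them.
    script_location = data_bucket.rstrip("/") + "/" + script_key.lstrip("/")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {"--TempDir": temp_dir},
    }


args = glue_job_args(
    name="cloudwatch-logs-to-parquet",
    role_arn="arn:aws:iam::123456789012:role/GlueJobRole",
    data_bucket="s3://<glue-data-my-bucket>/",
    temp_dir="s3://<glue-temp-my-bucket>/temp",
)
print(args["Command"]["ScriptLocation"])
# s3://<glue-data-my-bucket>/scripts/transform.py
```

The returned dictionary could then be passed as `boto3.client("glue").create_job(**args)`. Building the arguments separately keeps the bucket layout easy to check before anything is created in AWS.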