My Path To AWS Certified Big Data Specialty

Amazon Web Services certifications are among the most respected in software engineering. I completed the AWS Big Data Specialty certification on Nov 25, 2019. This certification tests the candidate on two of the most in-demand skills right now: Cloud and Big Data technologies. Before taking this certification, I had completed 3 AWS Cloud certifications (Solutions Architect Associate, Developer Associate and SysOps Administrator Associate) and 1 Cloudera Big Data certification (CCA Spark and Hadoop Developer). The AWS Big Data Specialty certification was the logical next step, as it marries the worlds of Cloud and Big Data technologies. In this blog post, I share my preparation process and everything you need to know about this certification.

AWS Certified Big Data Specialty

Background

I have been working with AWS and Big Data technologies for a few years now. I had working knowledge of AWS services like EMR, EC2, S3, Lambda, DynamoDB, Redshift and QuickSight, and Big Data technologies such as Hadoop, Spark, Hive, Hue and HBase. Furthermore, I’m not new to preparing for and taking examinations. I have a strong interest in Cloud and Big Data technologies, and I wanted to expand my knowledge of Big Data services on AWS. My employer, CME Group, has been very supportive of AWS certifications and provides me with subscriptions to education websites such as O’Reilly, Lynda, Qwiklabs and A Cloud Guru. Most importantly, I love challenging myself to learn new things with clear end goals, such as completing a certification. For these reasons, I decided to take the AWS Big Data Specialty certification.

Exam Experience

The exam was 3 hours (180 minutes) long and the registration fee was $300. There were 65 questions, most of which were lengthy and scenario-based. Many of the questions had a single correct answer, while a few required selecting multiple answers. Each question described a technical need, and each answer choice offered a solution. Typically, 2 of the 4 choices could be eliminated, as they were distractors or made no sense if you’re familiar with those services. The other 2 choices were plausible answers, and I had to pick the better one. I often re-read the questions to catch keywords such as cost-effective, performance, minimal maintenance, latency and throughput, which point to the best of the possible solutions. It took me 150 minutes to answer all the questions. I had flagged a few questions I was not fully confident about, and I used the remaining 30 minutes to review them.

As a bonus, I took the exam at the College of DuPage (COD), my alma mater. This was my first visit to COD in 7 years; I hadn’t gone back since I transferred to the University of Illinois at Chicago (UIC). I was overwhelmed with nostalgia. I even checked out the computer lab where I took my first programming course.

The Computer Lab where I had my first programming course

Preparation

It’s hard to quantify the time I spent preparing for the exam. I already had working experience with some of the services and familiarity with most of the others. I started my preparation by taking different courses on AWS Big Data Specialty, getting hands-on experience with the big data services, watching official AWS videos, reading the FAQs/whitepapers and taking practice exams.

The starting point was the official webpage of the AWS Certified Big Data Specialty. I kept coming back to this page to check the Exam Blueprint for all the topics covered in the exam.

Tutorials

For any certification exam, I don’t rely on just one source of learning. I took three different courses on AWS Big Data Specialty.

I found all three courses very valuable in my preparation. They have different teaching styles and present the same concepts from different perspectives. While most of their content overlaps, together they reinforce the core concepts covered in the certification exam. Also, each course includes some exclusive information and tips.

Hands-on Practice

My goal has always been to learn not just the theory but also to get practical hands-on experience with the services. So, I experimented with all of the services covered in the certification. I strongly believe hands-on experience helps in passing the exam.

  • Completed Qwiklabs’ labs related to Big Data services such as Kinesis, Redshift, DynamoDB, EMR
  • Completed Linux Academy’s hands-on labs on their sandbox environment
  • Followed A Cloud Guru’s labs on personal AWS account
  • Followed Frank Kane’s labs on personal AWS account

AWS Study Materials

AWS provides a wealth of learning material on all its services. The materials below are a must for learning the architecture, the best practices and, most importantly, the intricate details of each service.

  • AWS YouTube videos on Big Data services, especially the deep-dive videos from re:Invent
  • FAQs on AWS Big Data services
  • AWS Whitepapers related to Big Data services

Practice Exams

I spent the last week before the exam taking practice exams to gauge my confidence, retaking them until I scored 100%. The official sample questions and exam readiness questions had a similar pattern and difficulty level to those in the actual exam.

Topics

Below are the AWS services and concepts one should be familiar with to pass this certification exam. This list is just for reference and may not be comprehensive; I will keep updating it. Feel free to comment about any topics that are missing.

  • Amazon Kinesis
    • Architecture & Terminology
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • Kinesis Data Analytics
    • Kinesis Video Streams
    • APIs
    • Kinesis Agents
    • Kinesis Producer Library (KPL) & Kinesis Client Library (KCL)
    • Scaling
    • Windowing and continuous functions
    • Best practices
    • Integration with other services
    • Limitations
  • Amazon DynamoDB
    • Architecture & Terminology
    • Table design
    • Table creation and configuration
    • On-demand provisioning & auto-scaling
    • Write Capacity Units (WCU) & Read Capacity Units (RCU)
    • Burst capacity
    • Adaptive capacity
    • Read consistencies
    • Partitions
    • Streams
    • Replication
    • Errors/Exceptions
    • DynamoDB Accelerator (DAX)
    • Local Secondary Indexes
    • Global Secondary Indexes
    • Best practices
    • Integration with other services
    • Limitations
    • Encryption & Security
  • Amazon Redshift
    • Architecture & Terminology
    • Table design
    • Distribution key
    • Sort key
    • COPY command
    • UNLOAD command
    • Instance types
    • Scaling
    • Workload management
    • Compression
    • Snapshots
    • Best practices
    • Access Control
    • Integration with other services
    • Limitations
    • Encryption & Security
  • Amazon S3
    • Different tiers
    • Version control
    • Bucket policies
    • Access Control List
    • Lifecycle Policies
    • Glacier Vault Lock
    • Encryption & Security
  • Amazon QuickSight
    • Visuals
    • Data preparation
    • Data refresh
    • Dashboards
    • Story
    • Row level security
    • Data sources
    • File formats
    • Third-party APIs
  • Amazon Elasticsearch Service
    • Architecture & Terminology
    • Shards
    • Access Policies
    • Integration with other services
    • Limitations
    • Encryption & Security
  • Amazon EC2
    • Instance Types
    • AMIs
    • Encryption & Security
    • Access
  • Amazon EMR
    • Creation and configuration
    • Install and setup applications/tools
    • Node types
    • Instance types
    • HDFS
    • Local storage
    • Instance store
    • EMRFS
    • Consistent view
    • Encryption & Security
  • AWS IoT Core
    • IoT Device SDK
    • Device Gateway
    • Message Broker
    • Authentication and Authorization
    • Registry
    • Device Shadow
    • Rules Engine
    • Alexa Voice Service (AVS) Integration
  • Amazon Machine Learning
    • ML Models
    • Data sources
    • Limitations
  • Amazon SageMaker
  • Amazon SQS
  • Amazon Athena
    • Data sources
    • Data format
    • Compression
  • AWS Glue
    • Data Catalog
    • Crawlers
    • Scheduling
    • Use cases
  • AWS Database Migration Service
  • AWS Schema Conversion Tool
  • AWS Migration Hub
  • AWS Data Pipeline
    • Datanodes
    • Activities
    • Preconditions
    • Schedules
  • AWS Snowball
  • AWS Storage Gateway
    • Tape Gateway
    • File Gateway
    • Volume Gateway
  • AWS Direct Connect
  • Amazon Virtual Private Cloud
  • AWS Key Management Service (KMS)
  • AWS CloudHSM
  • AWS Identity and Access Management (IAM)
  • Visualization Tools
    • D3.js
    • Chart.js
    • Highcharts.js
  • Big Data Technologies
    • Apache Hadoop
    • Apache Spark
    • Apache Hive
    • Apache HBase
    • Presto
    • Apache Phoenix
    • Apache Sqoop
    • Hue
  • Data Analysis/Data Exploration
    • Jupyter Notebook
    • Apache Zeppelin
  • Business Intelligence Tools
    • MicroStrategy

Final Words

The AWS Big Data Specialty certification, being a specialty exam, is quite challenging. It requires knowledge of a vast number of big data services provided on AWS. I strongly recommend getting hands-on experience with the core big data services such as EMR, S3, Kinesis, Redshift, DynamoDB and QuickSight. If you follow this preparation guide and are familiar with all the topics listed, you should be in a good place to take the exam. That said, I can’t guarantee that following this guide will lead to passing the certification. Each person is different, and you’ll have to adapt this guide to fit your needs.

Good luck with your certification journey. Feel free to ask any questions or share your thoughts in the comments section.

If you’re into taking certifications or interested in Big Data technology, check out my blog post on the Cloudera CCA Spark and Hadoop Developer (CCA175) certification.
