AWS Certified Big Data Specialty


Amazon Web Services certifications are among the most reputable in the field of Software Engineering. I successfully completed the AWS Big Data Specialty certification on Nov 25, 2019. This certification tests the candidate on two of the most sought-after skills right now: Cloud and Big Data technologies. Prior to taking this certification, I had completed 3 AWS Cloud Certifications (Solutions Architect Associate, Developer Associate, and SysOps Administrator Associate) and 1 Cloudera Big Data Certification (CCA Spark and Hadoop Developer). The AWS Big Data Specialty Certification was the logical next step as it marries the worlds of Cloud and Big Data technologies. In this blog post, I share my preparation process and everything you need to know about this certification.


I have been working with AWS and Big Data technologies for a few years now. I have a working knowledge of AWS services like EMR, EC2, S3, Lambda, DynamoDB, Redshift, and QuickSight, and of Big Data technologies such as Hadoop, Spark, Hive, Hue, and HBase. Furthermore, I’m not new to preparing for and taking examinations. I have a huge interest in Cloud and Big Data technologies, and I wanted to expand my knowledge of Big Data services on AWS. My employer, CME Group, has been very supportive of my taking AWS certifications and provides me with subscriptions to education websites such as O’Reilly, Lynda, Qwiklabs, and A Cloud Guru. Most importantly, I love challenging myself to learn new things with clear end goals such as completing a certification. For these reasons, I decided to take the AWS Big Data Specialty certification.

Exam Experience

The exam was 3 hours (180 minutes) long and the registration price was $300. There were 65 questions, most of which were lengthy and scenario-based. Most of the questions had a single correct answer, while a few required selecting multiple answers. Each question described a technical need, and each of the answer choices offered a solution. Typically, 2 out of the 4 choices could be eliminated as they were distractors or made no sense if you’re familiar with those services. The other 2 choices would be possible answers to the question, and I had to choose the best one among them. I had to re-read the questions to catch keywords such as cost-effective, performance, minimal maintenance, latency, and throughput in order to choose the best solution among all the possible ones. It took me 150 minutes to answer all the questions. I had flagged a few questions that I was not fully confident about, and I used the remaining 30 minutes to review them.
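
To put those numbers in perspective, here is a small pacing sketch in Python. It only uses the figures from my exam (65 questions, 180 minutes, 30 minutes kept for review); the function name and its parameters are my own illustration, not anything provided by AWS.

```python
def minutes_per_question(total_minutes: float, questions: int,
                         review_minutes: float = 0) -> float:
    """Average minutes available per question after reserving review time."""
    if questions <= 0:
        raise ValueError("questions must be positive")
    if not 0 <= review_minutes < total_minutes:
        raise ValueError("review_minutes must be between 0 and total_minutes")
    return (total_minutes - review_minutes) / questions


# Figures from this post: 65 questions in 180 minutes.
no_review = minutes_per_question(180, 65)
with_review = minutes_per_question(180, 65, review_minutes=30)

print(f"No review buffer:  {no_review:.2f} min/question")    # 2.77
print(f"30 min for review: {with_review:.2f} min/question")  # 2.31
```

In other words, keeping a 30-minute review buffer leaves a little over two minutes per question, which is tight for lengthy scenario-based questions.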

As a bonus, I took the exam at the College of DuPage (COD), my alma mater. This was the first time I had visited COD in 7 years, as I hadn’t gone back since I transferred to the University of Illinois at Chicago (UIC). I was overwhelmed with nostalgia. I even checked out the computer lab where I had my first programming course.

The Computer Lab where I had my first programming course


It’s hard to quantify the time I took to prepare for the exam. I already had working experience with some of the services and familiarity with most of the others. My preparation consisted of taking different courses on the AWS Big Data Specialty, getting hands-on experience with the big data services, watching official AWS videos, reading the FAQs and whitepapers, and taking the practice exams.

The starting point was the official webpage of the AWS Certified Big Data Specialty. I kept coming back to this page to look at the Exam Blueprint for all the topics covered in the exam.


For any certification exam, I don’t rely on just one source of learning. I took three different courses on the AWS Big Data Specialty.

I found all three courses very valuable in my preparation process. These courses have different teaching styles and teach the same concepts from different perspectives. While most of the content of these 3 courses overlaps, the repetition helps reinforce the core concepts covered in the certification exam. Also, each of these courses includes some exclusive information and tips.

Hands-on Practice

My goal has always been to learn not just the theory but also get practical hands-on experience with the services. So, I played around with all of the services covered in the certification. I strongly believe having hands-on experience helps pass the exam.

  • Completed Qwiklabs’ labs related to Big Data services such as Kinesis, Redshift, DynamoDB, and EMR
  • Completed Linux Academy’s hands-on labs in their sandbox environment
  • Followed A Cloud Guru’s labs on my personal AWS account
  • Followed Frank Kane’s labs on my personal AWS account

AWS Study Materials

AWS provides a ton of learning materials on all their services. The materials below are a must for learning the architecture, best practices, and, most importantly, the intricate details of each service.

  • AWS YouTube videos on Big Data services, especially the deep-dive videos from re:Invent
  • FAQs on AWS Big Data services
  • AWS Whitepapers related to Big Data services

Practice Exams

I spent the last week before the actual examination date taking practice exams to gauge my confidence. I retook them until I got 100%. The questions in the official sample questions and exam readiness questions had similar patterns and difficulty levels to those in the actual exam.


Below are all the AWS services and concepts one should be familiar with to pass this certification exam. This list is just for reference and it may not be comprehensive. I will keep updating the list. Feel free to post comments about any of the topics that are missing below.

  • Architecture & Terminology
  • Kinesis Data Streams
  • Kinesis Data Firehose
  • Kinesis Data Analytics
  • Kinesis Video Streams
  • APIs
  • Kinesis Agents
  • Kinesis Producer Library (KPL) & Kinesis Client Library (KCL)
  • Scaling
  • Windowing and continuous functions
  • Best practices
  • Integration with other services
  • Limitations
  • Architecture & Terminology
  • Table Design
  • Creating Table and configurations
  • On-demand provisioning & auto-scaling
  • Write Capacity Units (WCU) and Read Capacity Units (RCU)
  • Burst capacity
  • Adaptive capacity
  • Read consistencies
  • Partitions
  • Streams
  • Replication
  • Errors/Exceptions
  • DynamoDB Accelerator (DAX)
  • Local Secondary Indexes
  • Global Secondary Indexes
  • Best practices
  • Integration with other services
  • Limitations
  • Encryption & Security
  • Architecture & Terminology
  • Table design
  • Distribution key
  • Sort key
  • Copy
  • Unload
  • Instance types
  • Scaling
  • Workload management
  • Compression
  • Snapshots
  • Best practices
  • Access Control
  • Integration with other services
  • Limitations
  • Encryption & Security
  • Different tiers
  • Version control
  • Bucket policies
  • Access Control List
  • Lifecycle Policies
  • Vault Lock
  • Encryption & Security
  • Visuals
  • Data preparation
  • Data refresh
  • Dashboards
  • Story
  • Row-level security
  • Data sources
  • File formats
  • Third-party APIs
  • Architecture & Terminology
  • Shards
  • Access Policies
  • Integration with other services
  • Limitations
  • Encryption & Security
  • Instance Types
  • AMIs
  • Encryption & Security
  • Access
  • Creation and configuration
  • Install and setup applications/tools
  • Node types
  • Instance types
  • HDFS
  • Local storage
  • Instance store
  • Consistent view
  • Encryption & Security
  • IoT Device SDK
  • Device Gateway
  • Message Broker
  • Authentication and Authorization
  • Registry
  • Device Shadow
  • Rules Engine
  • Alexa Voice Service (AVS) Integration
  • ML Models
  • Data sources
  • Limitations
  • Data sources
  • Data format
  • Compression
  • Data Catalog
  • Crawlers
  • Scheduling
  • Use cases
  • Data nodes
  • Activities
  • Preconditions
  • Schedules
  • Tape Gateway
  • File Gateway
  • Volume Gateway
  • D3.js
  • Chart.js
  • Highcharts
  • Apache Hadoop
  • Apache Spark
  • Apache Hive
  • Apache HBase
  • Presto
  • Apache Phoenix
  • Apache Sqoop
  • Hue
  • Jupyter Notebook
  • Apache Zeppelin
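
Two of the topics above, DynamoDB capacity units and Kinesis Data Streams scaling, come down to simple arithmetic that is worth practicing. Below is a minimal Python sketch of that arithmetic. It assumes the limits as I understand them from the AWS documentation (1 RCU = one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; 1 WCU = one write per second of an item up to 1 KB; each Kinesis shard accepts 1 MB/s or 1,000 records/s and serves 2 MB/s of shared read throughput). The function names are my own, and you should verify the numbers against the current documentation.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs for a provisioned table; reads are billed in 4 KB increments."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs for a provisioned table; writes are billed in 1 KB increments."""
    return math.ceil(item_size_kb) * writes_per_second


def kinesis_shards_needed(ingest_mb_per_s: float, records_per_s: float,
                          egress_mb_per_s: float) -> int:
    """Shard count: the most demanding of the three per-shard limits wins."""
    return max(
        1,
        math.ceil(ingest_mb_per_s / 1),    # 1 MB/s write per shard
        math.ceil(records_per_s / 1000),   # 1,000 records/s write per shard
        math.ceil(egress_mb_per_s / 2),    # 2 MB/s shared read per shard
    )


print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(2.5, 50))                           # 150
print(kinesis_shards_needed(5, 3000, 12))                      # 6
```

For example, a 6 KB item rounds up to two 4 KB read units, so 100 strongly consistent reads per second need 200 RCUs, and half of that with eventually consistent reads.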


Final Words

The AWS Big Data Specialty Certification, being a specialty exam, is quite challenging. It requires knowledge of a vast number of big data services provided on AWS. I strongly recommend getting hands-on experience with the core big data services such as EMR, S3, Kinesis, Redshift, DynamoDB, and QuickSight. If you follow the preparation guide and are familiar with all the topics listed, you should be in a good place to take the exam. I am obliged to add a disclaimer that following this guide does not guarantee successful completion of the certification. Each person is different, and you’d have to adapt this guide to fit your needs.

Good luck with your certification journey. Feel free to ask any questions or provide your thoughts in the comment section.

If you’re into taking certifications or interested in Big Data technology, check out my blog post on Cloudera CCA Spark and Hadoop Developer (CCA175) Certification.

