AWS Certified Big Data Specialty

Amazon Web Services certifications are among the most respected in software engineering. I passed the AWS Big Data Specialty certification on Nov 25, 2019. This certification tests candidates on two of the most in-demand skill areas right now – Cloud and Big Data technologies. Prior to taking it, I had completed three AWS certifications (Solutions Architect Associate, Developer Associate, and SysOps Administrator Associate) and one Cloudera Big Data certification (CCA Spark and Hadoop Developer). The AWS Big Data Specialty certification was the logical next step, as it marries the worlds of Cloud and Big Data. In this blog post, I share my preparation process and everything you need to know about this certification.
Background
I have been working with AWS and Big Data technologies for a few years now. I have a working knowledge of AWS services like EMR, EC2, S3, Lambda, DynamoDB, Redshift, and QuickSight, and of Big Data technologies such as Hadoop, Spark, Hive, Hue, and HBase. I’m also no stranger to preparing for and taking examinations. I have a huge interest in Cloud and Big Data technologies and wanted to expand my knowledge of the Big Data services on AWS. My employer, CME Group, has been very supportive of AWS certifications and provides me with subscriptions to education websites such as O’Reilly, Lynda, Qwiklabs, and A Cloud Guru. Most importantly, I love challenging myself to learn new things with clear end goals, such as completing a certification. For these reasons, I decided to take the AWS Big Data Specialty certification.
Exam Experience
The exam was 3 hours (180 minutes) long and the registration price was $300. There were 65 questions, most of which were lengthy and scenario-based. Most questions had a single correct answer, while a few required selecting multiple answers. Each question described a technical need, and each answer choice offered a solution. Typically, 2 of the 4 choices could be eliminated as distractors, or because they made no sense if you’re familiar with the services involved. The other 2 choices were both plausible, and I had to pick the better of the two. I often re-read questions to catch keywords such as cost-effective, performance, minimal maintenance, latency, and throughput in order to choose the best solution among the possible ones. It took me 150 minutes to answer all the questions. I flagged the few questions I was not fully confident about and used the remaining 30 minutes to review them.
As a bonus, I took the exam at the College of DuPage (COD), my alma mater. It was my first visit to COD in 7 years, as I hadn’t gone back since transferring to the University of Illinois at Chicago (UIC). I was overwhelmed with nostalgia and even checked out the computer lab where I took my first programming course.

Preparation
It’s hard to quantify the time I spent preparing for the exam. I already had working experience with some of the services and familiarity with most of the others. I started my preparation by taking different courses on the AWS Big Data Specialty, getting hands-on experience with the big data services, watching official AWS videos, reading the FAQs and whitepapers, and taking practice exams.
The starting point was the official webpage of the AWS Certified Big Data Specialty. I kept coming back to this page to check the Exam Blueprint for all the topics covered in the exam.
Tutorials
For any certification exam, I don’t rely on just one source of learning. I took three different courses on the AWS Big Data Specialty.
- Stephane Maarek’s and Frank Kane’s course on Udemy
- Fernando Medina Corey’s course on Linux Academy
- Sanjay Kotecha’s course on A Cloud Guru
I found all three courses very valuable. They have different teaching styles and approach the same concepts from different perspectives. While their content largely overlaps, that overlap helps reinforce the core concepts covered in the exam, and each course also includes some exclusive information and tips.
Hands-on Practice
My goal has always been to learn not just the theory but also to get practical, hands-on experience with the services. So I played around with all of the services covered in the certification. I strongly believe hands-on experience helps in passing the exam.
- Completed Qwiklabs’ labs on Big Data services such as Kinesis, Redshift, DynamoDB, and EMR
- Completed Linux Academy’s hands-on labs in their sandbox environment
- Followed A Cloud Guru’s labs on my personal AWS account
- Followed Frank Kane’s labs on my personal AWS account
AWS Study Materials
AWS provides a ton of learning material for each of its services. The materials below are a must for learning the architecture, best practices, and, most importantly, the intricate details of each service.
- AWS YouTube videos on Big Data services, especially the deep-dive videos from re:Invent
- FAQs on AWS Big Data services
- AWS Whitepapers related to Big Data services
Practice Exams
I spent the last week before the exam taking practice tests to gauge my confidence, retaking them until I scored 100%. The official sample questions and exam-readiness questions had patterns and difficulty levels similar to those in the actual exam.
- Official AWS Big Data Specialty sample questions
- Official AWS Big Data Specialty exam readiness
- Linux Academy practice exam
- A Cloud Guru quizzes
Topics
Below are all the AWS services and concepts one should be familiar with to pass this certification exam. The list is for reference and may not be comprehensive; I will keep updating it. Feel free to post a comment about any topic that’s missing.
Amazon Kinesis
- Architecture & Terminology
- Kinesis Data Streams
- Kinesis Data Firehose
- Kinesis Data Analytics
- Kinesis Video Streams
- APIs
- Kinesis Agents
- Kinesis Producer Library (KPL) & Kinesis Client Library (KCL)
- Scaling
- Windowing and continuous functions
- Best practices
- Integration with other services
- Limitations
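To make the Kinesis APIs concrete, here is a minimal boto3 producer sketch. The stream name and payload are made up for illustration; this is a sketch, not an exam requirement.

```python
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Each record needs a partition key; records with the same key
# are routed to the same shard, which preserves their ordering.
response = kinesis.put_record(
    StreamName="clickstream-demo",  # hypothetical stream name
    Data=json.dumps({"user": "u123", "action": "click"}).encode("utf-8"),
    PartitionKey="u123",
)
print(response["ShardId"], response["SequenceNumber"])
```

For higher throughput, the KPL batches records automatically, and the KCL handles checkpointing and shard rebalancing on the consumer side.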
Amazon DynamoDB
- Architecture & Terminology
- Table Design
- Creating Table and configurations
- On-demand provisioning & auto-scaling
- Write Capacity Units (WCU) and Read Capacity Units (RCU)
- Burst capacity
- Adaptive capacity
- Read consistencies
- Partitions
- Streams
- Replication
- Errors/Exceptions
- DynamoDB Accelerator (DAX)
- Local Secondary Indexes
- Global Secondary Indexes
- Best practices
- Integration with other services
- Limitations
- Encryption & Security
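Provisioned capacity comes up constantly in exam questions, so it’s worth creating a table with explicit RCUs and WCUs at least once. A boto3 sketch with hypothetical table and attribute names:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# 1 WCU = one write/sec of up to 1 KB; 1 RCU = one strongly
# consistent read/sec of up to 4 KB (or two eventually
# consistent reads of the same size).
dynamodb.create_table(
    TableName="orders-demo",  # hypothetical table name
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```

A good partition key spreads traffic evenly; hot keys are a recurring theme in the scenario questions.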
Amazon Redshift
- Architecture & Terminology
- Table design
- Distribution key
- Sort key
- Copy
- Unload
- Instance types
- Scaling
- Workload management
- Compression
- Snapshots
- Best practices
- Access Control
- Integration with other services
- Limitations
- Encryption & Security
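COPY and UNLOAD are worth knowing cold. Below is a sketch using the psycopg2 driver; the cluster endpoint, credentials, IAM role, and S3 paths are all placeholders.

```python
import psycopg2

# Connect to the cluster endpoint (all values are placeholders).
conn = psycopg2.connect(
    host="demo-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="<password>",
)
conn.autocommit = True
cur = conn.cursor()

# COPY bulk-loads from S3 in parallel across node slices;
# it is the recommended way to load data into Redshift.
cur.execute("""
    COPY sales FROM 's3://demo-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftDemoRole'
    FORMAT AS CSV GZIP;
""")

# UNLOAD does the reverse: it exports query results back to S3.
cur.execute("""
    UNLOAD ('SELECT * FROM sales WHERE region = ''US''')
    TO 's3://demo-bucket/exports/sales_us_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftDemoRole';
""")
```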
Amazon S3
- Storage classes (tiers)
- Versioning
- Bucket policies
- Access Control List
- Lifecycle Policies
- Glacier Vault Lock
- Encryption & Security
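Lifecycle policies show up in many cost-optimization questions. Here’s a boto3 sketch that transitions objects to Glacier after 30 days and expires them after a year; the bucket name and prefix are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Transition logs to Glacier after 30 days, delete after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="demo-log-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```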
Amazon QuickSight
- Visuals
- Data preparation
- Data refresh
- Dashboards
- Story
- Row-level security
- Data sources
- File formats
- Third-party APIs
Amazon Elasticsearch Service
- Architecture & Terminology
- Shards
- Access Policies
- Integration with other services
- Limitations
- Encryption & Security
Amazon EC2
- Instance Types
- AMIs
- Encryption & Security
- Access
Amazon EMR
- Creation and configuration
- Install and setup applications/tools
- Node types
- Instance types
- HDFS
- Local storage
- Instance store
- EMRFS
- Consistent view
- Encryption & Security
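For EMR, it helps to have launched a cluster programmatically at least once. A minimal run_job_flow sketch, where the cluster name, instance sizes, and log bucket are illustrative:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a small Spark/Hive cluster: one master and two core nodes.
response = emr.run_job_flow(
    Name="demo-spark-cluster",  # hypothetical cluster name
    ReleaseLabel="emr-5.28.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",  # default instance profile
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://demo-bucket/emr-logs/",  # placeholder log bucket
)
print(response["JobFlowId"])
```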
Amazon IoT Core
- IoT Device SDK
- Device Gateway
- Message Broker
- Authentication and Authorization
- Registry
- Device Shadow
- Rules Engine
- Alexa Voice Service (AVS) Integration
Amazon Machine Learning
- ML Models
- Data sources
- Limitations
Amazon Athena
- Data sources
- Data format
- Compression
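Athena is serverless: you submit a query against data in S3 and pay per data scanned, which is why columnar formats and compression matter so much. A boto3 sketch with a placeholder database and output location:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Queries run asynchronously; results land in the S3 output location.
response = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) FROM sales GROUP BY region",
    QueryExecutionContext={"Database": "demo_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://demo-bucket/athena-results/"},
)
print(response["QueryExecutionId"])
```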
Amazon SQS
Amazon SageMaker
AWS Glue
- Data Catalog
- Crawlers
- Scheduling
- Use cases
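A Glue crawler populates the Data Catalog by inferring schemas from data in S3, after which Athena, EMR, and Redshift Spectrum can query against that catalog. A sketch, with the crawler name, role, and path made up:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# The crawler scans the S3 path, infers a schema, and writes
# table definitions into the named Data Catalog database.
glue.create_crawler(
    Name="sales-crawler",  # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="demo_db",
    Targets={"S3Targets": [{"Path": "s3://demo-bucket/sales/"}]},
)
glue.start_crawler(Name="sales-crawler")
```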
AWS Database Migration Service
AWS Schema Conversion Tool
AWS Migration Hub
AWS Data Pipeline
- Datanodes
- Activities
- Preconditions
- Schedules
AWS Snowball
AWS Storage Gateway
- Tape Gateway
- File Gateway
- Volume Gateway
AWS Direct Connect
Amazon Virtual Private Cloud
AWS Key Management Service (KMS)
AWS CloudHSM
AWS Identity and Access Management (IAM)
Visualization Tools
- D3.js
- Chart.js
- Highcharts.js
Big Data Technologies
- Apache Hadoop
- Apache Spark
- Apache Hive
- Apache HBase
- Presto
- Apache Phoenix
- Apache Sqoop
- Hue
Data Analysis/Data Exploration
- Jupyter Notebook
- Apache Zeppelin
Business Intelligence Tools
- MicroStrategy
Final Words
Being a specialty exam, the AWS Big Data Specialty certification is quite challenging. It requires knowledge of a vast number of big data services on AWS. I strongly recommend getting hands-on experience with the core big data services such as EMR, S3, Kinesis, Redshift, DynamoDB, and QuickSight. If you follow this preparation guide and are familiar with all the topics listed, you should be in a good place to take the exam. I’m obliged to add a disclaimer: I can’t guarantee you’ll pass the certification by following this guide. Everyone is different, and you should adapt this guide to fit your own needs.
Good luck with your certification journey. Feel free to ask any questions or provide your thoughts in the comment section.
If you’re into taking certifications or interested in Big Data technology, check out my blog post on Cloudera CCA Spark and Hadoop Developer (CCA175) Certification.