Cloudera CCA Spark and Hadoop Developer (CCA175) Certification – Preparation Guide
Cloudera’s CCA Spark and Hadoop Developer (CCA175) exam validates a candidate’s ability to use Big Data tools such as Hadoop, Spark, Hive, Impala, Sqoop, Flume and Kafka to solve hands-on problems. I passed the CCA175 certification exam on May 13, 2019 and wanted to share my experience. This article covers everything you should know about the CCA175 exam.
The CCA175 exam gives you 2 hours to solve 8-12 hands-on tasks on a Cloudera Enterprise cluster. Each task has to be solved using Big Data tools such as Hadoop, Spark, Hive, Sqoop, Flume and Kafka. The passing score is 70% and the exam costs $295 (USD). There are no prerequisites for this exam, and it can be taken remotely. All you need is a computer with a webcam and a good internet connection.
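To make the 70% threshold concrete, here is a small Python sketch that computes how many tasks you must solve for each possible exam size (8 to 12 tasks). It assumes every task carries equal weight and that there is no partial credit. That is an illustrative assumption, not Cloudera's published scoring rule, and the function name is hypothetical.

```python
def min_tasks_to_pass(num_tasks: int, passing_percent: int = 70) -> int:
    """Fewest fully solved tasks needed to reach passing_percent.

    Assumes (hypothetically) that all tasks are equally weighted and that
    there is no partial credit.
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    # Integer ceiling of num_tasks * passing_percent / 100 (avoids float rounding).
    return (num_tasks * passing_percent + 99) // 100


if __name__ == "__main__":
    # The exam has 8-12 tasks, so print the budget for each possible size.
    for n in range(8, 13):
        needed = min_tasks_to_pass(n)
        print(f"{n} tasks: solve at least {needed}, can miss up to {n - needed}")
```

Under these assumptions you can afford to miss 2 or 3 tasks, which is why it makes sense to skip a problem you are stuck on rather than sink time into it.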
Official certification page – https://www.cloudera.com/about/training/certification/cca-spark.html
Register for CCA175 – https://university.cloudera.com/content/cca175
Why Cloudera CCA175?
CCA175 requires good knowledge of, and hands-on experience with, technologies such as Hadoop, HDFS, Spark, Scala, PySpark, Sqoop, Hive, Flume, Kafka and Avro. I enjoy setting goals and working towards them, and I wanted to force myself to properly learn and practice these technologies. I tend to look through user guides and documentation only when I face issues during development. Preparing for a certification exam forces me to learn the topics formally and read through the documentation within a tight deadline. Unlike other Spark certification exams, CCA175 tests not just Spark but also other Big Data technologies. Furthermore, certifications help showcase that you possess the required knowledge in the domain.
One should be familiar with the following technologies to pass the exam.
- Apache Hadoop – Hadoop, HDFS, Yarn
- Apache Spark – Spark RDD, Spark Datasets, Spark SQL, Spark Streaming using both Scala and Python
- Apache Sqoop – import, import-all-tables, export, job, eval, list-tables, list-databases, create-hive-table, merge, codegen
- Apache Hive – DDL, DML, Partitioning, Windowing and Analytical functions
- Cloudera Impala
- Apache Avro
- Apache Flume
- Apache Kafka
I believe the main objective should be to learn the topics thoroughly instead of learning the bare minimum to pass the certification. I strongly recommend practicing everything in the resources below. This plan not only helps you complete the certification but also makes you proficient in these technologies.
- Cloudera Quickstart VM
  - Install the Cloudera Quickstart VM and get familiar with it – https://www.cloudera.com/downloads/quickstart_vms/5-13.html
- Itversity’s Courses
  - Take the following courses if you don’t have any prior experience with Spark, Sqoop and Hive.
    - CCA175 course with Scala – https://www.udemy.com/cca-175-spark-and-hadoop-developer-certification-scala/
    - CCA175 course with Python – https://www.udemy.com/cca-175-spark-and-hadoop-developer-python-pyspark/
- Official Programming Guides
  - Read and practice the official getting started and programming guides.
  - Apache Hadoop
    - HDFS Architecture – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
    - File System Shell – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
    - YARN – http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
  - Apache Hive
    - Language Manual – https://cwiki.apache.org/confluence/display/Hive/LanguageManual
    - Command Line Interface – https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
    - Data Definition Language – https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
    - Data Manipulation Language – https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
    - Windowing and Analytics Functions – https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
  - Apache Spark
    - Quick Start – https://spark.apache.org/docs/latest/quick-start.html
    - RDD Programming Guide – https://spark.apache.org/docs/latest/rdd-programming-guide.html
    - Spark SQL, DataFrames and Datasets Guide – https://spark.apache.org/docs/latest/sql-programming-guide.html
    - Spark Streaming Programming Guide – https://spark.apache.org/docs/latest/streaming-programming-guide.html
    - Submitting Applications – https://spark.apache.org/docs/latest/submitting-applications.html
  - Apache Sqoop
  - Apache Avro
    - Specifications – https://avro.apache.org/docs/current/spec.html
  - Apache Flume
  - Apache Kafka
    - Introduction – https://kafka.apache.org/intro
    - Quickstart – https://kafka.apache.org/quickstart
  - Cloudera Impala
- Arun’s Practice Problems
  - Arun Kumar Pasuparthi has a good set of questions covering all the main topics.
  - Problem Scenario 1 – http://arun-teaches-u-tech.blogspot.com/p/cca-175-prep-problem-scenario-1.html
  - Problem Scenario 2 – http://arun-teaches-u-tech.blogspot.com/p/cca-175-prep-problem-scenario-2.html
  - Problem Scenario 3 – http://arun-teaches-u-tech.blogspot.com/p/cca-175-hadoop-and-spark-developer-exam_28.html
  - Problem Scenario 4 – http://arun-teaches-u-tech.blogspot.com/p/cca-175-hadoop-and-spark-developer-exam_5.html
  - Problem Scenario 5 – http://arun-teaches-u-tech.blogspot.com/p/cca-175-hadoop-and-spark-developer-exam_9.html
  - Problem Scenario 6 – http://arun-teaches-u-tech.blogspot.com/p/problem-6.html
  - Problem Scenario 7 – http://arun-teaches-u-tech.blogspot.com/p/problem-7.html
- PRACTICE, PRACTICE, PRACTICE – Practice each topic until you are very comfortable with it. Refer to the documentation whenever you have any doubts.
Things to Remember Before Taking CCA175 Exam
- Have a computer with a webcam and a good internet connection.
- Make sure you have Google Chrome installed along with the ExamsLocal add-on. Verify that your computer is compatible with the exam by using the self-check – https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck.
- Keep an identification document, such as a driver’s license or passport, ready to verify your identity to the proctor.
- Keep your desk and room clear of any electronics and papers. The proctor will ask you to show the desk and room with your webcam to verify this.
- If you’re planning to take the exam on a laptop, connect it to an external monitor, as the laptop screen may be too small to view the remote desktop.
- Ensure no one else is in the room before starting the exam. Keep the doors locked if possible to prevent any disturbances.
- If you’re taking the exam from your workplace or a library, make sure the firewall is configured to allow connections to ExamsLocal.
- Drink water and eat before the exam, as you’re not allowed any food or drinks during the exam.
- Use the restroom just before the exam starts, as you’re not allowed any breaks during the exam.
Things to Remember During CCA175 Exam
- Be patient and remain calm. There’s no need to panic.
- Read all the questions before starting on the solutions. Start with the easy ones.
- Verify each solution right after solving it. Check the output location and the output format. You may not have time at the end of the exam to verify again.
- Be cognizant of the time. Skip a problem and come back to it later if you’re stuck.
- Keep in mind that you don’t need to score 100%, as the passing score is 70%. It’s okay to miss a problem. Don’t let one hard problem impact your ability to solve the others.
- Don’t sit idle while a program runs to generate its output. Let it run in the background and start working on the next problem.
- Always look towards the monitor and do not chew, talk or cover your mouth during the exam. The proctor may disconnect you from the exam if they find your activities suspicious.
Things to Remember After Taking CCA175 Exam
- Make a note of the things you found challenging during the exam. You can come back to this list later and close your gaps.
- Relax and be patient. You should receive the exam results within 24 hours. I received mine within 2 hours of completing the exam.
- If you pass the exam, you’ll receive your digital certificate and license within 48 hours. I received mine after 40 hours.
- If you didn’t pass the exam, remind yourself that this is not an easy exam and it’s okay if you didn’t make it. Practice the topics that you found challenging and come back stronger. DO NOT GIVE UP!
CCA175 is not an easy exam. Preparation requires at least a couple of months if your intention is to learn the topics thoroughly during the process. Keep reading and practice every scenario. If you follow the above plan, you’ll not only complete your certification but will also become proficient in these topics.
Please feel free to post your questions/thoughts below and share your success stories. All the best!
Good post. Was the Avro package included, or did you import it?
Rahul, all the libraries were already included. I didn’t have to import them explicitly.
When you said the Avro libraries were included, does that mean you don’t have to mention “--packages org.apache.spark:spark-avro_xxxxx” when you start the Spark shell?
So we can read Avro files without the above package option?
Sumit, that’s correct. The Avro package would already be installed in Spark’s lib directory, so we don’t have to specify it with the spark-shell command.
Hi, did you get any questions on Flume?
Hey Radhika. No, I didn’t get any questions on Flume. But I would recommend being familiar with the basics.
How much time would you allocate to go from the beginning of the course to the exam?
Ashwin – Great blog, with nice tips and suggestions for the preparation. I’m not able to see the Cloudera sandbox VM download link. It looks like Cloudera discontinued it recently, in Feb 2020; correct me if I’m wrong.
I spent a lot of time trying to download and install the Cloudera Quickstart VM, but no luck.
Link I’ve tried: https://www.cloudera.com/downloads/quickstart_vms/5-13.html
I’d appreciate it if you could point me to a way to get a sandbox installed with all the required technologies.
Thank you, Kiran. Yes, Cloudera Quickstart VM has been discontinued.
You can try Hortonworks sandbox. It comes with all the big data tools like Hadoop, Spark and Hive.
I read your blog posts on AWS Big Data Specialty and Cloudera CCA175. Both are very precise and would be helpful for exam preparation. However, I wanted to know which of the two would help more in getting a better job as a Data Engineer. I understand that skills from both certifications are required to become a data engineer. I intend to prepare for both areas but wish to take only one certification exam. In this case, which one would you recommend?
I am currently working as a Data Analyst and have 6-7 months of AWS experience.
Thank you for your feedback!
I would recommend pursuing the AWS certification and doing some side projects using Cloudera/Big Data technologies.
In my opinion, AWS Certification is valued more than Cloudera Certification. AWS certification has 3 years of validity while Cloudera’s has just 2 years of validity. Most importantly, with big data technologies, side projects demonstrate your skillset and knowledge more than certification.
Regarding Spark, do you still need to learn RDDs for this certification or will Datasets suffice?
Hi Pavi. The exam doesn’t require any specific api. You can solve the problems using RDD/Dataset/Spark SQL based on your preference.
Do we need to know Scala/Java for the exam, or is it fine if we are good at Python?
Hi Neelima. The exam doesn’t require us to use any particular language. We can use either Scala or Python. I would recommend being familiar with both because they are very similar and it’s easy to pick up the other if you already know one.
Thanks for the post and congrats on getting certified. I am preparing for the Cloudera Spark exam and wondering whether I need to learn Scala for it, as I am fairly comfortable writing Python code.
This is specifically in regard to Datasets, which aren’t available in Python.
Secondly, I read the exam requirements on the Cloudera website, and they don’t mention anything about Flume, Sqoop or Kafka. They mention only three broad skill sets to be tested, i.e. data analysis, configuration and ETL. I was wondering whether I need to prepare for Flume, Sqoop and Kafka too, or can simply skip them.
Hi Khurram. Thank you. You can use either Python or Scala; the exam only cares about the output. Sqoop, Flume and Kafka are not included in the new syllabus, so they can be skipped. Good luck!
Do you need both Python and Scala, or is it up to the exam taker to decide which programming language to use? For example, can you solve one task with Scala and another with Python?
Hi Nikola. The exam doesn’t require us to use any particular language. We can use either Scala or Python. I would recommend being familiar with both because they are very similar and it’s easy to pick up the other if you already know one.
I just want to get some experience with, or see, what the real CCA175 exam environment looks like. Is that possible?
Right now, I am practicing on CloudxLab, but I have heard that the actual environment looks like the Cloudera VM. Do I need to install the Cloudera VM? Any suggestions?
Cloudera has made it difficult to install their new sandbox VMs. You’ll be fine if you’re familiar with command line.
You can install the older version of Cloudera sandbox (cloudera-quickstart-vm-5.13) to get an idea.