Did you know that every minute:
50,000 photos are posted on Instagram,
500,000 photos are shared on Snapchat,
1,000,000 swipes are done on Tinder and
400,000 videos are watched on YouTube
Category: Apache Hadoop
Amazon EMR and Google Cloud Dataproc: Top 10 Common Features
Amazon Web Services and Google Cloud Platform are two of the three market leaders in cloud computing. They both offer similar cloud-native big data platforms to filter, transform, aggregate and process data at scale. Amazon EMR and Google Cloud Dataproc are Amazon Web Services’ and Google Cloud Platform’s managed big data platforms […]
Cloudera CCA Spark and Hadoop Developer (CCA175) Certification – Preparation Guide
Cloudera’s CCA Spark and Hadoop Developer (CCA175) exam validates the candidate’s ability to employ various Big Data tools such as Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, etc. to solve hands-on problems. I passed the CCA175 certification exam on May 13, 2019 and wanted to share my experience. This article has everything you should know about […]
Apache Spark on a Single Node/Pseudo Distributed Hadoop Cluster in macOS
This article describes how to set up and configure Apache Spark to run on a single node/pseudo distributed Hadoop cluster with the YARN resource manager. Apache Spark comes with a Spark Standalone resource manager by default. We can configure Spark to use the YARN resource manager instead of Spark’s own resource manager so that the resource […]
Single Node/Pseudo Distributed Hadoop Cluster on macOS
This article walks through setting up and configuring a single node Hadoop Cluster or pseudo-distributed cluster on macOS. A single node cluster is very useful for development as it reduces the need for an actual cluster for running quick tests. At the end of this tutorial, you’ll have a single node Hadoop cluster with all […]