Amazon Web Services (AWS) certifications validate cloud expertise and support effective cloud initiatives. AWS offers certifications in several tracks, such as Solutions Architect, Networking, and Database. I completed the AWS Certified Database Specialty on Feb 21, 2022. This certification requires expertise with on-premises and AWS cloud-based relational and non-relational databases. Currently, I work predominantly in […]



Cloud Computing - Similar to Car Rental?
Cloud Virtual Machine - Cloud Server
Serverless Computing - Function as a Service (FaaS)
Cloud Regions and Availability Zones
Cloud Block Storage as a Service
Cloud File Storage as a Service
Cloud Object Storage as a Service
Covid-19 hooks Zoom up with Oracle Cloud - Explained with Memes
This is Big Data
Cloud Desktop as a Service
Tag: big data
Find the version of Apache Hive from Command Line Interface (CLI)
The version of Apache Hive can be retrieved from the command line. You don’t have to navigate through the configuration files or browse through the user interface. Two commands can be used from the command line to obtain the version of Apache Hive. COMMAND #1 This command follows the popular convention used by other […]
This is Big Data
Did you know that every minute:
50,000 photos are posted on Instagram,
500,000 photos are shared on Snapchat,
1,000,000 swipes are done on Tinder and
4,000,000 videos are watched on YouTube
What is Cloud Computing?
We hear the term “Cloud Computing” a lot in the media, advertisements, news and in memes. Cloud Computing has continuously been a trending term throughout the last decade. But what does cloud computing mean? Cloud computing is the offering of computing as a service. Consumers can pay the cloud computing service for on-demand use of […]
My Path To AWS Certified Big Data Specialty
Amazon Web Services certifications are among the most reputable in the field of Software Engineering. I successfully completed the AWS Big Data Specialty certification on Nov 25, 2019. This certification tests the candidate on two of the most in-demand skills right now – Cloud and Big Data technologies. Prior to taking this certification, I […]
Execute Linux Commands from Spark Shell and PySpark Shell
Linux commands can be executed from Spark Shell and PySpark Shell. This comes in handy during development for running Linux commands such as listing the contents of an HDFS directory or a local directory. These methods are provided by the native libraries of the Scala and Python languages. Hence, we can even use these methods within […]
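As a rough illustration of the Python side, here is a minimal sketch using the standard library's `subprocess` module, which is available in any Python session, including the PySpark shell. The helper name `run_command` is hypothetical and is not taken from the original post, which may use a different approach.

```python
import subprocess


def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text.

    `run_command` is an illustrative helper name, not part of Spark or PySpark.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# List the contents of the current local directory.
print(run_command(["ls", "-l"]))

# On a machine with the Hadoop client installed, an HDFS listing could be
# requested the same way (the path "/" is only an example):
# print(run_command(["hdfs", "dfs", "-ls", "/"]))
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an exception when the command exits with a non-zero status.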
Apache Spark: Repartition vs Coalesce
Repartition can be used for increasing or decreasing the number of partitions, whereas Coalesce can only be used for decreasing the number of partitions. Coalesce is a less expensive operation than Repartition, as Coalesce reduces data movement between the nodes while Repartition shuffles all data over the network. Partitions What are partitions? The dataset in […]
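The difference can be sketched with a toy model in plain Python. This is not the Spark API: partitions are modeled as lists, and the function names `coalesce` and `repartition` below are illustrative stand-ins that only mimic the behavior described above, assuming a simple round-robin redistribution for the full shuffle.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy coalesce: merge whole partitions into n groups, no per-record shuffle."""
    if n >= len(parts):
        # Coalesce cannot increase the partition count.
        return [list(p) for p in parts]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Each existing partition moves as a unit into one target partition.
        merged[i * n // len(parts)].extend(part)
    return merged


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy repartition: redistribute every record across n new partitions."""
    records = [r for part in parts for r in part]
    out: Partitions = [[] for _ in range(n)]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(parts, 2))     # existing partitions are merged as units
print(repartition(parts, 2))  # every record is reassigned
print(len(coalesce(parts, 8)), len(repartition(parts, 8)))
```

In this model, `coalesce` keeps records from the same source partition together, while `repartition` touches every record, which mirrors why the full shuffle is the more expensive operation.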
Apache Spark on a Single Node/Pseudo Distributed Hadoop Cluster in macOS
This article describes how to set up and configure Apache Spark to run on a single node/pseudo distributed Hadoop cluster with the YARN resource manager. Apache Spark comes with a Spark Standalone resource manager by default. We can configure Spark to use the YARN resource manager instead of Spark’s own resource manager so that the resource […]
Single Node/Pseudo Distributed Hadoop Cluster on macOS
This article walks through setting up and configuring a single node Hadoop Cluster or pseudo-distributed cluster on macOS. A single node cluster is very useful for development as it reduces the need for an actual cluster for running quick tests. At the end of this tutorial, you’ll have a single node Hadoop cluster with all […]