Amazon QuickSight is a managed business analytics service that’s part of the Amazon Web Services suite. Amazon QuickSight offers capabilities to create dashboards with visualizations and perform ad hoc analysis to obtain insights from the data. Amazon QuickSight works with several AWS data sources such as RDS, Aurora and Redshift, and also other data sources […]



Cloud Computing - Similar to Car Rental?
Cloud Virtual Machine - Cloud Server
Serverless Computing - Function as a Service (FaaS)
Cloud Regions and Availability Zones
Cloud Block Storage as a Service
Cloud File Storage as a Service
Cloud Object Storage as a Service
Covid-19 hooks Zoom up with Oracle Cloud - Explained with Memes
This is Big Data
Cloud Desktop as a Service
Execute Linux Commands from Spark Shell and PySpark Shell
Linux commands can be executed from Spark Shell and PySpark Shell. This comes in handy during development for tasks such as listing the contents of an HDFS directory or a local directory. The methods for doing so are provided by the standard libraries of the Scala and Python languages. Hence, we can even use these methods within […]
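As a rough illustration of the Python side, here is a minimal sketch that assumes the standard-library `subprocess` module is the mechanism meant. The excerpt does not name the exact calls, so `run_command` is a hypothetical helper, and the HDFS path shown in the comment is a placeholder.

```python
import subprocess


def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text.

    Hypothetical helper for illustration; raises CalledProcessError on a
    non-zero exit code because check=True is set.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# List the current local directory. This works in any Python 3.7+ session,
# including one started with the PySpark shell.
print(run_command(["ls", "-a", "."]))

# Listing an HDFS directory would look similar, but it needs the `hdfs` CLI
# on the PATH, so it is left commented out. The path is a placeholder.
# print(run_command(["hdfs", "dfs", "-ls", "/user/example"]))
```

Passing the command as a list of arguments, rather than a single string with `shell=True`, avoids shell quoting issues when a path contains spaces.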
Course Review – Machine Learning A-Z: Hands-On Python & R In Data Science
I completed the Machine Learning A-Z: Hands-On Python & R In Data Science course on Udemy on Aug 1, 2019. I would say “Machine Learning A-Z for Programmers” is a more apt title for the course. It’s a beginner-friendly course aimed at programmers that covers a wide range of topics through hands-on programming with Python […]
Lean Six Sigma White Belt
I received my Lean Six Sigma White Belt on July 25, 2019 through my employer, CME Group. White Belt was a great way to get my feet wet with Lean Six Sigma. In this post, I provide a gist of what Lean Six Sigma is and share my experience. Lean Six Sigma Lean Six Sigma […]
Amazon EC2 Instances: M5 vs M5d vs M5a vs M5ad
Amazon Elastic Compute Cloud (EC2) is a service that offers compute capacity in the Amazon Web Services (AWS) cloud. Amazon EC2 M5 instances are fifth-generation EC2 instances suited to general-purpose computing, as they offer a balance of compute, memory and networking resources. M5 can be used as servers, caching fleets, app […]
Amazon EC2 Spot Instances: Most and Least Interrupted Instance Types
Amazon EC2 Spot Instances are one of three purchasing options for EC2 instances, the other two being On-Demand and Reserved Instances. Spot Instances are the cheapest of the three and are cost-effective for running fault-tolerant workloads. Before starting to use Spot Instances, it’s important to understand that Spot instances will be […]
Apache Sqoop: Import data from RDBMS to HDFS in ORC Format
The Apache Sqoop import tool can import data from an RDBMS (MySQL, Oracle, SQL Server, etc.) table to HDFS. Sqoop import provides native support for storing data as text files as well as in binary formats such as Avro and Parquet. There’s no native support for importing in ORC format. However, it’s still possible to import in […]
Cloudera CCA Spark and Hadoop Developer (CCA175) Certification – Preparation Guide
Cloudera’s CCA Spark and Hadoop Developer (CCA175) exam validates the candidate’s ability to employ various Big Data tools such as Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, etc. to solve hands-on problems. I passed the CCA175 certification exam on May 13, 2019 and wanted to share my experience. This article has everything you should know about […]
Apache Spark: Repartition vs Coalesce
Repartition can be used to increase or decrease the number of partitions, whereas Coalesce can only be used to decrease it. Coalesce is a less expensive operation than Repartition, as Coalesce reduces data movement between the nodes while Repartition shuffles all data over the network. Partitions What are partitions? The dataset in […]
Apache Spark on a Single Node/Pseudo Distributed Hadoop Cluster in macOS
This article describes how to set up and configure Apache Spark to run on a single node/pseudo distributed Hadoop cluster with the YARN resource manager. Apache Spark comes with a Spark Standalone resource manager by default. We can configure Spark to use the YARN resource manager instead of Spark’s own resource manager so that the resource […]