Machine Learning A-Z: Hands-On Python & R In Data Science

Certificate of Completion from Udemy

I completed the Machine Learning A-Z: Hands-On Python & R In Data Science course from Udemy on Aug 1, 2019. I would say “Machine Learning A-Z for Programmers” is a more apt title for the course. It’s a beginner-friendly course aimed at programmers that covers a wide range of topics with hands-on programming in Python and R.

In this blog post, I share everything you need to know about this course and provide a gist of each section. I had prior experience with Machine Learning, so this course was a refresher for me and provided hands-on practice with various Machine Learning libraries.

What to expect in the course?

The course starts with an introduction to Machine Learning and Data Preprocessing. Then, it covers a comprehensive list of Machine Learning algorithms. For each algorithm, a brief and simplified explanation is provided, followed by the implementation in Python and R using popular libraries. Finally, the course ends by going over various tools for evaluating the performance of models and finding the right model to use.

What does the course lack?

This course doesn’t provide an in-depth explanation of the math and stats behind many of the algorithms. Also, the programming part uses pre-existing libraries instead of implementing the algorithms from scratch.

Prerequisites

You need to be familiar with some programming language, preferably Python or R. This course is fully hands-on and involves writing code in Python and R. If you do not have experience with any programming language, I would recommend taking a course on Python first.

Price

The listed price for this course on Udemy is $199.99. However, you can almost always find it on sale for much less with coupons. I was able to get the course for $9.99 during a promotion. I can attest that I got far more than what I paid for.

Duration

The course has 41.5 hours of lecture videos. For every hour of video, I would add 2 hours for practicing and understanding. In total, the course can be completed in roughly 125 hours. You can also watch the videos at a faster playback speed (e.g., 1.25x or 1.5x).

Popularity

This course is one of the most popular Machine Learning courses available online. At the time of writing this post, this course had more than 400K students enrolled and had a 4.5 average rating from about 90K ratings.

Should you take the course?

You should take this course if you are looking for a beginner-friendly introduction to Machine Learning or want to get a feel for using popular Python libraries such as scikit-learn and Keras.

Contents of the course

  • Introduction
  • Data Preprocessing
  • Regression
    • Simple Linear Regression
    • Multiple Linear Regression
    • Polynomial Regression
    • Support Vector Regression (SVR)
    • Decision Tree Regression
    • Random Forest Regression
    • Evaluating Regression Models Performance
  • Classification
    • Logistic Regression
    • K-Nearest Neighbors (K-NN)
    • Support Vector Machine (SVM)
    • Kernel SVM
    • Naive Bayes
    • Decision Tree Classification
    • Random Forest Classification
    • Evaluating Classification Models Performance
  • Clustering
    • K-Means Clustering
    • Hierarchical Clustering
  • Association Rule Learning
    • Apriori
    • Eclat
  • Reinforcement Learning
    • Upper Confidence Bound (UCB)
    • Thompson Sampling
  • Natural Language Processing
    • Bag of Words
    • Stop words
    • Porter Stemmer
    • Feature Extraction
  • Deep Learning
    • Artificial Neural Networks
    • Convolutional Neural Networks
  • Dimensionality Reduction
    • Principal Component Analysis (PCA)
    • Linear Discriminant Analysis (LDA)
    • Kernel PCA
  • Model Selection and Boosting
    • Model Selection
    • XGBoost
Introduction

Machine Learning is a subset of Artificial Intelligence that applies statistical models and algorithms to learn from data and act with minimal human intervention. Usually, Machine Learning involves computer programs that learn from historical data, identify patterns, and make decisions or predictions.

Data Preprocessing

This section covers everything that is required before applying any Machine Learning algorithm. All the steps are covered: setting up the environment, importing the libraries, importing the dataset, dealing with missing data, splitting the data into train/test sets, and feature scaling. Finally, a template with all the data preprocessing steps is provided. This template comes in handy throughout the course and can also be used for Machine Learning projects outside the course; a minimal sketch of such a template follows the topic list below.

  • Importing the libraries
  • Importing the dataset
  • Python OOP basics
  • Missing Data
  • Categorical Data
  • Feature Scaling
  • Preprocessing template
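
To give a flavour of what this template looks like in Python, here is a minimal preprocessing sketch with pandas and scikit-learn. The file name and column names are my own placeholders, not the course’s exact dataset:

```python
# A minimal preprocessing sketch. "data.csv" and the column names
# ("Country", "Age", "Salary", "Purchased") are illustrative assumptions.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv("data.csv")
X = dataset[["Country", "Age", "Salary"]].copy()
y = dataset["Purchased"]

# Missing data: fill numeric gaps with the column mean
X[["Age", "Salary"]] = SimpleImputer(strategy="mean").fit_transform(X[["Age", "Salary"]])

# Categorical data: one-hot encode the text column
X = pd.get_dummies(X, columns=["Country"])

# Train/test split, then feature scaling (fit the scaler on the training set only)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

The important habit the course drills in is fitting the imputer and scaler on the training set only and then reusing them on the test set.
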
Regression

Regression is a group of Machine Learning models used to predict continuous values from given data. For example, Regression can be used to predict the salary (a continuous value) of employees based on independent variables such as age and years of experience.

This section starts off with the intuition behind Regression and goes over several models such as Linear Regression and Random Forest Regression. The section ends with several tools and statistics to measure the performance of Regression models.

  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector Regression (SVR)
  • Decision Tree Regression
  • Random Forest Regression
  • Evaluating Regression Models Performance
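
As a taste of the Python side of this section, here is a minimal Simple Linear Regression sketch with scikit-learn. The salary-vs-experience numbers are made up for illustration:

```python
# Fit a straight line to made-up salary data and inspect the fitted model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1], [2], [3], [5], [8], [10]])                    # years of experience
y = np.array([40_000, 45_000, 52_000, 65_000, 82_000, 95_000])   # salary

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

print("Coefficient:", model.coef_[0])     # salary increase per extra year
print("Intercept:", model.intercept_)     # predicted salary at zero experience
print("R^2:", r2_score(y, predictions))   # goodness of fit on the training data
```
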
Classification

Classification, as the name suggests, is the process of classifying data into different categories. In other words, Classification is a group of Machine Learning models used to predict categorical (non-continuous) values from given data. For example, Classification can be used to predict whether or not a customer is going to buy a product based on their past purchase history.

Similar to Regression, this section starts off with the intuition behind classification, then goes over classification models such as Logistic regression, K-nearest neighbors, and Random Forest Classification. The section ends with tools to evaluate the performance of Classification models.

  • Logistic Regression
  • K-Nearest Neighbors (K-NN)
  • Support Vector Machine (SVM)
  • Kernel SVM
  • Naive Bayes
  • Decision Tree Classification
  • Random Forest Classification
  • Evaluating Classification Models Performance
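
Here is a minimal Classification sketch with scikit-learn: Logistic Regression plus a confusion matrix. The bundled breast cancer dataset is a stand-in I chose for illustration; the course works with its own CSV files:

```python
# Train a logistic regression classifier and evaluate it on a held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))            # rows: actual, columns: predicted
print("Accuracy:", accuracy_score(y_test, y_pred))
```
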
Clustering

Clustering, or Cluster Analysis, is the process of grouping data points into clusters such that the members of a cluster have similar traits or exhibit similar behavior. For example, Clustering can be used to group customers with similar purchase histories; the customers in the same group would have similar interests, and each group can be served targeted ads based on its shared traits. Two clustering techniques, K-Means and Hierarchical Clustering, are covered in this course.

  • K-Means Clustering
  • Hierarchical Clustering
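
A rough sketch of what the K-Means code looks like with scikit-learn, on made-up customer data (annual income, spending score):

```python
# Group a handful of made-up customers into three clusters with K-Means.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [15, 39], [16, 81], [17, 6], [18, 77],
    [60, 50], [62, 42], [85, 13], [88, 17], [90, 90], [95, 79],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("Cluster labels:", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)
```
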
Association Rule Learning

Association Rule Learning is a Machine Learning method for discovering relationships between variables in a dataset. An application of Association Rule Learning would be to identify that whenever customers buy bread, they are more likely to buy ketchup as well. The Apriori and Eclat methods of Association Rule Learning are discussed in this section.

  • Apriori
  • Eclat
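
For illustration, here is a minimal Apriori sketch using the mlxtend library. That choice is my own assumption (it is not necessarily the library used in the course), and the basket data is made up:

```python
# Mine frequent itemsets and association rules from a tiny one-hot basket table.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each column is an item; True means the item was in that basket
baskets = pd.DataFrame({
    "bread":   [True, True, True, False, True],
    "ketchup": [True, True, False, False, True],
    "milk":    [False, True, True, True, False],
})

frequent_itemsets = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```
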
Reinforcement Learning

Reinforcement Learning is a Machine Learning method in which a program is given a set of possible actions and a desired outcome, and is rewarded for progressing towards that outcome. The program becomes more effective after each round by learning from previous iterations and trying to maximize the rewards earned. Two Reinforcement Learning algorithms, Upper Confidence Bound (UCB) and Thompson Sampling, are covered in this section.

  • Upper Confidence Bound (UCB)
  • Thompson Sampling
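
Here is a bare-bones UCB sketch in plain Python on a simulated ad-selection problem; the click-through rates are invented for illustration:

```python
# Upper Confidence Bound (UCB) on a simulated multi-armed bandit:
# repeatedly pick the ad with the highest optimistic estimate of its reward.
import math
import random

true_ctr = [0.05, 0.13, 0.09, 0.20]    # real click-through rates, unknown to the algorithm
n_rounds = 10_000
counts = [0] * len(true_ctr)           # times each ad was shown
rewards = [0] * len(true_ctr)          # clicks earned by each ad

for n in range(1, n_rounds + 1):
    ucb_values = []
    for i in range(len(true_ctr)):
        if counts[i] == 0:
            ucb_values.append(float("inf"))    # show every ad at least once
        else:
            mean = rewards[i] / counts[i]
            bonus = math.sqrt(2 * math.log(n) / counts[i])
            ucb_values.append(mean + bonus)
    ad = ucb_values.index(max(ucb_values))

    # Simulate showing the ad and observing a click (reward 1) or no click (0)
    reward = 1 if random.random() < true_ctr[ad] else 0
    counts[ad] += 1
    rewards[ad] += reward

print("Times each ad was selected:", counts)   # the best ad should dominate
```
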
Natural Language Processing

Natural Language Processing is an area of Machine Learning concerned with understanding and working with human languages. Speech recognition, natural language understanding, and natural language generation are among the biggest challenges of Natural Language Processing. The speech recognition in Siri and Google Assistant is a prime example of Natural Language Processing in action. This section of the course provides a demo of creating a program to classify restaurant reviews as positive or negative.

  • Bag of Words
  • Stop words
  • Porter Stemmer
  • Feature Extraction
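
A bare-bones version of that review-classification pipeline (stop-word removal, Porter stemming, and bag-of-words feature extraction) might look like this. The two reviews are made up, and I am assuming NLTK is installed with its stopwords corpus available:

```python
# Clean two toy reviews and turn them into bag-of-words feature vectors.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

nltk.download("stopwords", quiet=True)

reviews = ["Wow... Loved this place!", "The food was not tasty and too expensive."]
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

corpus = []
for review in reviews:
    words = re.sub("[^a-zA-Z]", " ", review).lower().split()   # keep letters only
    words = [stemmer.stem(w) for w in words if w not in stop_words]
    corpus.append(" ".join(words))

X = CountVectorizer().fit_transform(corpus).toarray()   # bag-of-words matrix
print(corpus)
print(X)
```

From here, the feature matrix can be fed into any of the classifiers from the Classification section, such as Naive Bayes.
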
Deep Learning

Deep Learning was my favorite section in this course. Deep Learning is a family of Machine Learning algorithms based on artificial neural networks. Intuitively, deep artificial neural networks try to mimic the neural networks of the human brain. Two deep learning architectures, Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN), are explained in this course.

  • Artificial Neural Networks
  • Convolutional Neural Networks
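
Here is a small Keras sketch of an artificial neural network for binary classification. The input size, layer widths, and random data are assumptions for illustration:

```python
# Build, train, and query a small fully connected network with Keras.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

X = np.random.rand(1000, 11)                     # 11 input features
y = (np.random.rand(1000) > 0.5).astype(int)     # binary target

model = Sequential([
    Input(shape=(11,)),
    Dense(6, activation="relu"),     # hidden layer 1
    Dense(6, activation="relu"),     # hidden layer 2
    Dense(1, activation="sigmoid"),  # output layer: probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, batch_size=32, epochs=5, verbose=0)

print(model.predict(X[:3]))   # predicted probabilities for the first three rows
```
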
Dimensionality Reduction

Dimensionality Reduction is the process of combining multiple factors or variables into fewer variables. Some variables may be correlated with each other, so it may be redundant to include them separately. Machine Learning algorithms also tend to run faster with fewer variables, and the data is easier to visualize. For example, if a dataset has separate variables for the actual temperature and the “feels like” temperature, they can be combined into one variable since they are correlated. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Kernel PCA are the three Dimensionality Reduction techniques explained in this course.

  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA)
  • Kernel PCA
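
A quick PCA example with scikit-learn, using a bundled toy dataset as a stand-in for the course’s own data:

```python
# Compress 13 correlated features into 2 principal components.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)          # 13 numeric features
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)    # 13 features -> 2 components

print("Reduced shape:", X_reduced.shape)
print("Variance explained:", pca.explained_variance_ratio_)
```
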
Model Selection

Model Selection is probably the most important part of the Machine Learning process. This section shows how to choose the right model and how to compare a model’s performance across different hyper-parameter values.

  • K-Fold Cross Validation
  • Grid Search
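
Here is a short sketch of both ideas with scikit-learn; the SVC model and the parameter grid are my own illustrative choices:

```python
# Estimate accuracy with k-fold cross validation, then tune hyper-parameters
# with a grid search (each combination is itself scored by cross validation).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaled up front to keep the sketch short

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)   # 10-fold cross validation
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)
```
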
Boosting

Boosting is the final section of the course. It’s a bonus section in which the concept of boosting is introduced and an example using the XGBoost library is shown. XGBoost is a scalable, distributed gradient-boosting library. In simple terms, it trains boosted models quickly by making efficient use of the available hardware.

  • XGBoost
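
A bare-bones XGBoost example: XGBClassifier follows the familiar scikit-learn fit/predict API, and the toy dataset is my own stand-in:

```python
# Train a gradient-boosted classifier with XGBoost and check its test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```
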

TL;DR

Machine Learning A-Z: Hands-On Python & R In Data Science is a great course for programmers to get introduced to Machine Learning. All the major concepts of Machine Learning are introduced without going deeper into the mathematical foundations. The course provides walk-throughs to program a comprehensive list of Machine Learning algorithms using popular libraries in Python and R.
