18 Types of Predictive Models in Data Science

7 min read · Aug 19, 2020

An Attempt to Categorize Supervised Learning Models (1.0)

There are many types of Predictive Models for forecasting, and it is a struggle to find a short write-up on the subject. Below is a walkthrough of several major types of forecasting, with discussions of methodology and data along the way, whether you are predicting a category, a number, text or an image.

The oldest model is (1) Multiple Linear Regression or Ordinary Least Squares Regression, which is likely to be the first model a Data Scientist would learn. It combines the engineered features of a dataset into a forecasted number, and it leads to various other models. Closely related are (2) Ridge Regression and (3) Lasso Regression. Both add a regularization penalty to Linear Regression (L2 for Ridge, L1 for Lasso), which can reduce overfitting and improve the accuracy of forecasts on new data.

Ridge and Lasso Regression: L1 and L2 Regularization

Complete Guide Using Scikit-Learn

towardsdatascience.com

Ridge Regression also helps with Multicollinearity, which is when features or factors are strongly correlated with one another, making the fitted coefficients unstable and reducing the accuracy of a forecast. For more, check out my prior article on one way to correct Multicollinearity.

How to Improve your Multi-Factor Model (by John Foxworthy)

How can you improve your multi-factor model? If you are comfortable with your feature selection to explain a target…

medium.com
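To make the Ridge point concrete, here is a minimal sketch, not taken from the linked articles, that fits Ordinary Least Squares and Ridge on two nearly identical features. The synthetic data, the helper names `fit_ols` and `fit_ridge`, and the penalty `alpha=1.0` are all my own assumptions for illustration. Lasso is left out because it has no closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1,
# so the two features are strongly collinear. Only x1 drives the target.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Ordinary Least Squares: minimize ||y - Xw||^2."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def fit_ridge(X, y, alpha):
    """Ridge: minimize ||y - Xw||^2 + alpha * ||w||^2 (the L2 penalty)."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ y)


w_ols = fit_ols(X, y)
w_ridge = fit_ridge(X, y, alpha=1.0)

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

With collinear features, OLS is free to split the effect between the two columns in unstable ways, while the Ridge penalty pulls the weights toward a smaller, more even split. The total effect of the two features should stay close to 3 in both fits.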

There are more variations of Multiple Linear Regression, a lot more, but these are the most commonly used in Data Science.

Moving forward, a contemporary model that was an inflection point for Data Science is the (4) Support Vector Machine framework, which is less sensitive to outliers than least squares: in its regression form, errors inside a small margin are ignored and larger errors are penalized only linearly. The link below provides great detail and lays groundwork for models further below.

Support Vector Regression Or SVR

This post is about SUPPORT VECTOR REGRESSION. Those who are in Machine Learning or Data Science are quite familiar with…

medium.com
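One way to see what sets Support Vector Regression apart from least squares is its loss function. The sketch below compares the squared loss with the epsilon-insensitive loss that SVR is built on. The numbers and the margin `epsilon=0.5` are made up for illustration, and this shows only the loss, not a full SVR fit.

```python
import numpy as np


def squared_loss(y_true, y_pred):
    """Loss used by least squares: grows with the square of the error."""
    return (y_true - y_pred) ** 2


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


y_true = np.array([10.0, 10.0, 10.0])
y_pred = np.array([10.2, 11.0, 20.0])  # small miss, medium miss, outlier-sized miss

sq = squared_loss(y_true, y_pred)
eps = epsilon_insensitive_loss(y_true, y_pred)

print("squared loss:            ", sq)
print("epsilon-insensitive loss:", eps)
```

The outlier-sized miss dominates the squared loss but counts far less under the epsilon-insensitive loss, which is why a single extreme point pulls an SVR fit less than it pulls a least squares fit.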

Models to Forecast Values (Image by Author)

The next inflection point in Data Science is the (5) Random Forest model, which leads to using Probability Methods for larger datasets and for imperfect datasets, such as those with missing values or imbalance, to name a few.

Switching gears to Trend Following a Time Series: you can fine-tune a model on a single array of data to forecast its next value with (6) ARIMA, or Autoregressive Integrated Moving Average. There are variant models that handle seasonality and other data attributes but, for more, take a look at my link below.

Price Prediction Evolution with ARIMA (by John Foxworthy)

Price Prediction Evolution in No Factor Modeling with ARIMA

An explanation and walk-through with Python Programming

medium.com
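As a rough illustration of the "Autoregressive" and "Integrated" parts of the name, here is a hand-rolled sketch of an ARIMA(1, 1, 0) style forecast: difference the series once, fit a one-lag regression on the differences by least squares, then add the forecasted change back to the last value. It has no Moving Average term and does not use maximum likelihood, so it is a simplification and not what a full ARIMA library does. The function name and the toy `prices` series are assumptions.

```python
import numpy as np


def forecast_arima_110(series):
    """One-step forecast from a simplified ARIMA(1, 1, 0) sketch."""
    y = np.asarray(series, dtype=float)
    d = np.diff(y)                       # "I": difference once to remove the trend
    prev, curr = d[:-1], d[1:]
    # "AR": least-squares fit of curr ~ c + phi * prev
    A = np.column_stack([np.ones_like(prev), prev])
    (c, phi), *_ = np.linalg.lstsq(A, curr, rcond=None)
    next_diff = c + phi * d[-1]          # forecast the next change
    return y[-1] + next_diff             # undo the differencing


# Toy series (made up): each change is 1 + 0.5 * the previous change.
prices = [100.0, 104.0, 107.0, 109.5, 111.75, 113.875]
next_price = forecast_arima_110(prices)
print("forecast for the next value:", next_price)
```

Because the toy series follows the one-lag rule exactly, the fit recovers it and the forecast continues the pattern. Real price data would be noisier and would need more history than six points.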

Also, a more contemporary model that is an improvement on ARIMA is Facebook's prediction model, (7) Prophet, as described in my link below.

The Facebook Prophet Prediction Model and Product Analytics (by John Foxworthy)

Humans have always wanted to know what the future holds, but in today’s information-intensive society, we have much…

medium.com

At the same time, if you want to forecast a category rather than a value, you can use (8) Logistic Regression, which is similar to Multiple Linear Regression, and (9) Support Vector Classifier, which is similar to Support Vector Regression.

Categories can be binary, like pass / fail, a customer churning from a website or not, up / down, customer demographic one or two, etc. The base model for predicting a category is essentially a very small Neural Network, which I will revisit later.
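Here is a minimal sketch of that idea: Logistic Regression written as a single sigmoid unit trained by gradient descent, which is the same building block a Neural Network repeats many times. The pass / fail data, the helper names, the learning rate and the number of steps are all assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient descent on the mean log-loss of one sigmoid unit."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)           # predicted probability of class 1
        grad = p - y                     # gradient of log-loss w.r.t. the score
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


# Hypothetical toy data: hours studied -> fail (0) or pass (1).
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
passed = np.array([0, 0, 0, 1, 1, 1])

# Standardize the feature so gradient descent behaves well.
x = (hours - hours.mean()) / hours.std()

w, b = fit_logistic(x, passed)
prob = sigmoid(x @ w + b)
labels = (prob >= 0.5).astype(int)

print("probabilities:", np.round(prob, 3))
print("predicted labels:", labels)
```

The only difference from Multiple Linear Regression is the sigmoid on the output and the log-loss used to train it. The weighted sum of features underneath is the same.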

Models to Forecast a Category (Image by Author)

The other models, (10) Random Forest and (11) Decision Trees, rely on the simplest answers as the best decision making, which is Occam's Razor. Each tree is a sequence of binary decisions, and when expanded to more trees it becomes a set of trees, or a forest. The stack of easy answers generates an overall complexity and captures different attributes of the data, unlike so many other models. A good introduction is below.

Random Forest — A powerful Ensemble Learning algorithm

Now plot and count the target variable to check if the target class is balanced or not…

towardsdatascience.com
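To show how a stack of easy answers becomes a forest, here is a stripped-down sketch: each "tree" is a single binary split (a decision stump), each one is trained on a random resample of the data, and the forest predicts by majority vote. A real Random Forest grows deeper trees and also samples features at each split, so treat this as bagged stumps and an assumption-laden simplification. The data and function names are made up.

```python
import numpy as np


def fit_stump(X, y):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for left_label in (0, 1):
                pred = np.where(X[:, j] <= t, left_label, 1 - left_label)
                errors = np.sum(pred != y)
                if best is None or errors < best[0]:
                    best = (errors, j, t, left_label)
    return best[1:]


def predict_stump(stump, X):
    j, t, left_label = stump
    return np.where(X[:, j] <= t, left_label, 1 - left_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))
        forest.append(fit_stump(X[idx], y[idx]))
    return forest


def predict_forest(forest, X):
    """Majority vote across all stumps."""
    votes = np.mean([predict_stump(s, X) for s in forest], axis=0)
    return (votes >= 0.5).astype(int)


# Toy data (made up): the class depends on the first feature only.
X = np.array([[1, 5], [2, 3], [3, 8], [4, 1],
              [6, 7], [7, 2], [8, 6], [9, 4]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

forest = fit_forest(X, y)
new_points = np.array([[0.5, 5.0], [9.5, 5.0]])
print("forest predictions:", predict_forest(forest, new_points))
```

Individual stumps can disagree near the boundary because each one saw a slightly different sample, and the vote is what smooths those disagreements out.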

To go further into different models, we have to talk about methodology.

Deterministic, Probabilistic or Bayesian? (Image by Unknown Artist)

Will history help us find an accurate target of the future?

Would assigning the new observations in our forecast to a past group of values help us understand the nature of our data?

Determinism, like so many methods, has its limits . . .

To change our approach and move away from historical dependency, we change our methodology to Bayesian and Probabilistic Methods.

Both Random Forest and Decision Trees above are probabilistic methods, but Bayesian is a framework and a philosophy that has been around for a long time. A good introduction is below . . .

Chapter 1 : Supervised Learning and Naive Bayes Classification — Part 1 (Theory)

Welcome to the stepping stone of Supervised Learning. We first discuss a small scenario that will form the basis of…

medium.com

If your datasets are small and you have domain knowledge from a subject matter expert, then non-deterministic methods may help, like the classical (12) Naïve Bayes. The subjective intuition of experience can lead to many powerful forecasts.

Is there more? Yes. So far we have relied on Machine Learning, but a layered version with depth is Deep Learning. This is where my knowledge fades, as I begin my Master of Science in Data Science at Northwestern University next month. Below are the survey results showing that the Base Models of regression and classification are the most used in production, i.e. (1) Multiple Linear Regression and (8) Logistic Regression.
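For a feel of how Naïve Bayes works on a small dataset, here is a count-based sketch for categorical features. It multiplies a prior for each class by the probability of each feature value given that class, treating the features as independent, which is the "naïve" part. The churn data, the feature meanings (plan type, contacted support) and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Count how often each feature value appears within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    values = defaultdict(set)               # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the class with the highest posterior score for one row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)                  # prior P(class)
        for i, v in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            num = feature_counts[(i, label)][v] + 1
            den = n + len(values[i])
            log_p += math.log(num / den)             # P(value | class)
        scores[label] = log_p
    return max(scores, key=scores.get)


# Hypothetical data: (plan, contacted support) -> churn or stay.
rows = [("free", "yes"), ("free", "yes"), ("free", "no"),
        ("paid", "no"), ("paid", "no"), ("paid", "yes")]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

model = fit_naive_bayes(rows, labels)
print(predict(model, ("free", "yes")))
print(predict(model, ("paid", "no")))
```

The prior is where a subject matter expert's judgment could enter: here it is simply the class frequency in the data, but it could be replaced with an expert's estimate.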

Most Popular Data Science Models (Image by Author)

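Please note, (16) Gradient Boosting Machine grew out of Leo Breiman's 1997 observation that boosting can be viewed as optimizing a cost function, and Jerome Friedman later developed it into explicit algorithms. It builds an ensemble of small Decision Trees in sequence, each one fit to the errors of the ensemble so far. A detailed explanation is below.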

Gradient Boosting from scratch

Simplifying a complex algorithm

medium.com
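The core loop of gradient boosting for regression is short enough to sketch. Each round fits a one-split tree to the current residuals and adds a scaled-down copy of it to the running prediction. This is a simplified version under my own assumptions (squared error, one feature, one-split trees, made-up data and function names) and not the implementation from the linked article or from any library.

```python
import numpy as np


def fit_regression_stump(x, residual):
    """Best single split of a 1-D feature under squared error."""
    best = None
    for t in np.unique(x)[:-1]:          # skip the max so the right side is non-empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]


def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared error
        stump = fit_regression_stump(x, residual)
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy data (made up): a sine curve that a single split cannot capture.
x = np.linspace(0.0, 6.0, 40)
y = np.sin(x)

model = fit_gbm(x, y)
mse_base = np.mean((y - y.mean()) ** 2)
mse_boost = np.mean((y - predict_gbm(model, x)) ** 2)
print("MSE predicting the mean:", mse_base)
print("MSE after boosting:     ", mse_boost)
```

The small learning rate is a design choice: each tree corrects only part of the remaining error, which usually makes the ensemble less prone to chasing noise.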

Back to Deep Learning, . . .

Models to Forecast a Value, Category, Image or Text (Above Image by Author)

If we repeated our base models for classification and regression, like (1) Multiple Linear and (8) Logistic, stacking many copies in layers, then we would have a Base Model Neural Network. The most popular version is the (13) Convolutional Neural Network, which is widely used for image tasks such as facial recognition because of its accuracy on images, and is also the Base Model for Computer Vision. Check out the CNN link below.

A Comprehensive Guide to Convolutional Neural Networks

A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way

Artificial Intelligence has been witnessing a monumental growth in bridging the gap between the capabilities of humans…

towardsdatascience.com
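The operation that gives the Convolutional Neural Network its name can be sketched in a few lines: slide a small grid of weights (a kernel) across an image and record the weighted sum at each position. In a trained CNN the kernels are learned. The one below is hand-picked to respond to a dark-to-bright vertical edge, and the tiny image is made up.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 4x4 image: dark on the left half, bright on the right half.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked kernel: right pixel minus left pixel.
edge_kernel = np.array([[-1.0, 1.0]])

response = conv2d(image, edge_kernel)
print(response)
```

The response is non-zero only in the column where the image switches from dark to bright, which is how a convolution layer turns raw pixels into features such as edges.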

I do not have knowledge about (14) Dense Neural Networks (DNN) and (18) Evolutionary Methods, but (15) Recurrent Neural Networks (RNN) are critical for Deep Learning forecasting of future values, categories, text and images. Natural Language Processing (NLP), in particular, uses RNNs for tasks such as sentiment analysis, and here is a helpful link below.

A Beginner’s Guide on Sentiment Analysis with RNN

Sentiment analysis probably is one the most common applications in Natural Language processing. I don’t have to…

towardsdatascience.com
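What makes a network "recurrent" is that it carries a hidden state from one step of a sequence to the next. Here is a minimal sketch of that forward pass for a plain RNN cell. The weights are random stand-ins for trained ones, and the sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4

# Randomly initialized weights stand in for trained ones (an assumption).
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)


def rnn_forward(inputs):
    """Run a plain RNN cell over a sequence and return every hidden state."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)


# A made-up sequence of 5 steps, each a vector of 3 numbers
# (for text, each step would be a word embedding).
sequence = rng.normal(size=(5, input_size))
states = rnn_forward(sequence)
print(states.shape)  # (5, 4)
```

For sentiment analysis, the last hidden state would typically be passed to a small classifier, such as a logistic unit, to produce the positive or negative label.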

Lastly, a recent improvement in language translation is the (17) Transformer, which makes some logical improvements on the shortcomings of prior models.

How Transformers Work

Transformers

Transformers are a type of neural network architecture that have been gaining popularity. Transformers were recently…

towardsdatascience.com
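The central piece of a Transformer is attention, where every token in a sequence looks at every other token and takes a weighted average of them. Below is a minimal sketch of scaled dot-product attention. A real Transformer first passes the tokens through learned projections to get the queries, keys and values, and uses several attention heads. Here the random token vectors are used directly, which is a simplifying assumption.

```python
import numpy as np


def softmax(z):
    """Turn each row of scores into weights that sum to 1."""
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each token matches each other token
    weights = softmax(scores)
    return weights @ V, weights       # weighted average of the values


rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))      # 3 made-up tokens, 4 numbers each

# Self-attention: the same tokens act as queries, keys and values.
out, weights = attention(tokens, tokens, tokens)
print("attention weights:\n", np.round(weights, 3))
print("output shape:", out.shape)
```

Unlike an RNN, nothing here steps through the sequence one position at a time, so all tokens can be processed in parallel.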

All the models above assume the (final) output is a prediction, but models can also serve other (possible) objectives that do NOT predict. Explanations and Indicators are the other objectives.

For example, the Fama-French 3 factor model of 1992 is a Multiple Linear Regression that attempts to explain the nature of U.S. Equity Returns. Eugene Fama would later win the Nobel prize in Economics in 2013, and Fama has NEVER changed his position against all forms of forecasting and prediction in the U.S. stock market. Eugene is only an explainer.

Separately, models can also be used as indicators that are broader than a prediction. Indicators imply a prediction, such as Linguists inferring a psychological profile from text, however inexact, or a Financial Economist warning about a possible Financial Market crash in the near future.

Finally, you are probably thinking there are too many models to figure out which one to use. Data Scientists usually lean on Methodological Pluralism, as your model type is a variable in itself.

The subject of Data Science, after all, is a re-packaging of Applied Statistics, and the recent exponential growth in the quantity of data, coupled with more computing resources, leaves little loyalty to any particular model.

(Image by Author)

Written by John T Foxworthy