18 Types of Predictive Models in Data Science
An Attempt to Categorize Supervised Learning Models (1.0)
There are many types of predictive models for forecasting, and it is a struggle to find a short write-up on the subject. Below is a walkthrough of several major types of forecasting, with discussions of methodology and data along the way . . . whether you are predicting a category, a number, a piece of text, or an image.
The oldest model is (1) Multiple Linear Regression, also known as Ordinary Least Squares Regression, which is likely the first model a Data Scientist learns . . . it maps engineered features to a forecasted number and is the starting point for many other models. Closely related are (2) Ridge Regression and (3) Lasso Regression, which add L2 and L1 penalties, respectively, to shrink coefficients and improve the stability and accuracy of Linear Regression forecasts.
Ridge and Lasso Regression: L1 and L2 Regularization
Ridge Regression also helps with Multicollinearity, which arises when features or factors are strongly correlated with one another and the overlap reduces the accuracy of a forecast. For more, check out my prior article on one way to correct Multicollinearity.
How to Improve your Multi-Factor Model (by John Foxworthy)
There are more variations of Multiple Linear Regression . . . a lot more, but these are the ones most commonly used in Data Science.
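For the hands-on reader, here is a minimal scikit-learn sketch of the three side by side on synthetic data . . . the alpha penalty values are illustrative, not tuned.

```python
# A minimal sketch comparing OLS, Ridge (L2), and Lasso (L1).
# The synthetic dataset and alpha values are illustrative, not tuned.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("Lasso (L1)", Lasso(alpha=1.0))]:
    model.fit(X_train, y_train)
    print(f"{name}: held-out R^2 = {model.score(X_test, y_test):.3f}")
```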
Moving forward, a contemporary model that was an inflection point for Data Science is the maximum-margin (4) Support Vector Machine framework, which fits its boundary using only the most informative observations, the support vectors. The link below provides great detail that will build on more models further below.
Support Vector Regression Or SVR
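A minimal SVR sketch might look like the following . . . the rbf kernel and the C and epsilon values are illustrative defaults, not tuned choices.

```python
# A minimal Support Vector Regression sketch; the kernel and the
# C / epsilon hyperparameters are illustrative defaults, not tuned.
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# SVMs are sensitive to feature scale, so scaling belongs in the pipeline.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
svr.fit(X, y)
print(svr.predict(X[:3]))  # forecasted numbers for the first three rows
```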
The next inflection point in Data Science is the (5) Random Forest model, which opens the door to probabilistic methods for larger and imperfect datasets . . . ones with missing values or class imbalance, to name a few.
Switching gears . . . to trend-following a Time Series . . . you can fine-tune a single array of data to predict the next future value with (6) ARIMA, or the Autoregressive Integrated Moving Average. There are variant models that handle seasonality and other data attributes, but for more, take a look at my link below . . .
Price Prediction Evolution with ARIMA (by John Foxworthy)
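As a rough illustration, a minimal ARIMA fit with statsmodels looks like this . . . the (1, 1, 1) order is an assumption for the sketch; in practice it is chosen from ACF/PACF plots or an information-criterion search.

```python
# A minimal ARIMA sketch; the (1, 1, 1) order is an illustrative
# assumption, normally selected via ACF/PACF plots or an AIC search.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.5, 1.0, size=120))  # synthetic trending series

results = ARIMA(series, order=(1, 1, 1)).fit()
print(results.forecast(steps=5))  # the next five future values
```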
Also, a more contemporary model that improves on ARIMA is Facebook's prediction model, (7) Prophet, as described in my link below . . .
The Facebook Prophet Prediction Model and Product Analytics (by John Foxworthy)
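For flavor, here is the minimal Prophet workflow . . . the synthetic daily series is illustrative; the only hard requirement is a DataFrame with 'ds' and 'y' columns.

```python
# A minimal Prophet sketch; Prophet expects a DataFrame with a 'ds'
# (date) column and a 'y' (value) column. The data here is synthetic.
import numpy as np
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=365, freq="D"),
    "y": np.sin(np.arange(365) * 2 * np.pi / 7) + np.arange(365) * 0.01,
})

m = Prophet()                 # trend plus weekly/yearly seasonality by default
m.fit(df)
future = m.make_future_dataframe(periods=30)   # extend 30 days ahead
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```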
At the same time, if you want to forecast a category rather than a value . . . you can use (8) Logistic Regression, which is analogous to Multiple Linear Regression, and (9) the Support Vector Classifier, which is analogous to Support Vector Regression.
Categories can be binary . . . pass / fail, churn a website or not, up / down, customer demographic one or two, etc. The base model for predicting a category is essentially a very small Neural Network . . . a single neuron with a sigmoid activation . . . which I will revisit later.
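A minimal sketch of both classifiers on an illustrative synthetic binary problem . . .

```python
# A minimal sketch of Logistic Regression and the Support Vector
# Classifier on a synthetic binary problem; nothing here is tuned.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("Support Vector Classifier", SVC(kernel="rbf"))]:
    clf.fit(X_train, y_train)
    print(f"{name}: held-out accuracy = {clf.score(X_test, y_test):.3f}")
```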
The tree-based models, (5) Random Forest and (10) Decision Trees, rely on the idea that the simplest answers make the best decisions . . . which is Occam's Razor. Each decision is a single tree, a binary branching process; expanded to many trees, it becomes a set of trees, a forest. The stack of easy answers generates an overall complexity and captures different attributes of the data, unlike so many other models. A good introduction is below, followed by a short sketch.
Random Forest — A powerful Ensemble Learning algorithm
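Here is a minimal sketch contrasting a single tree with a forest of trees . . . the number of trees is illustrative, not tuned.

```python
# A minimal sketch contrasting one Decision Tree with a Random Forest;
# the n_estimators value is illustrative, not tuned.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("Single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest:", cross_val_score(forest, X, y, cv=5).mean())
```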
To go further into different models, we have to talk about methodology.
Will history help us find an accurate target of the future?
Would assigning the new observations in our forecast to past groups of values help us understand the nature of our data?
Determinism, like so many methods, has its limits . . .
To move away from historical dependency, we change our methodology to Bayesian and probabilistic methods.
Both Random Forest and Decision Trees above are probabilistic methods, but Bayesian inference is a framework and a philosophy that has been around for a long time. A good introduction is below . . .
Chapter 1 : Supervised Learning and Naive Bayes Classification — Part 1 (Theory)
If your datasets are small and you have domain knowledge from a subject matter expert, then non-deterministic methods may help, like the classical (12) Naïve Bayes. The subjective intuition of experience can lead to many powerful forecasts . . .
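As a small illustration, here is Naïve Bayes on the classic iris dataset . . . GaussianNB is one of several variants, and that choice is an assumption for this sketch.

```python
# A minimal Naive Bayes sketch; GaussianNB assumes conditionally
# independent, normally distributed features: the "naive" assumption.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("Accuracy:", nb.score(X_test, y_test))
print("Class probabilities for one flower:", nb.predict_proba(X_test[:1]))
```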
Is there more? Yes . . . so far we have relied on Machine Learning, but a layered version with depth is Deep Learning. This is where my knowledge fades, as I begin my Master of Science in Data Science at Northwestern University next month. Industry surveys show that the Base Models of regression and classification, i.e. (1) Multiple Linear Regression and (8) Logistic Regression, are the most used in production.
Please note, the (11) Gradient Boosting Machine grows out of Leo Breiman's 1997 observation that boosting can be framed as an optimization problem, an idea later formalized by Jerome Friedman. It is like a Decision Tree, but it builds many shallow trees sequentially, each one correcting the errors of the last. A short sketch is below . . .
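Again with scikit-learn, and with illustrative hyperparameters rather than tuned ones . . .

```python
# A minimal Gradient Boosting sketch; each shallow tree is fit to the
# residual errors of the ensemble so far. Hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print("Held-out accuracy:", gbm.score(X_test, y_test))
```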
Back to Deep Learning . . .
If we stacked and repeated our base models for classification and regression, like (1) Multiple Linear and (8) Logistic, then we would have a basic Neural Network. The most popular version is the (13) Convolutional Neural Network, best known for image tasks such as facial recognition because of its accuracy . . . and it is also the Base Model for Computer Vision. Check out the CNN link below . . .
A Comprehensive Guide to Convolutional Neural Networks
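For the curious, a minimal Keras sketch of a small CNN . . . the architecture and the MNIST-sized 28 x 28 grayscale input are illustrative assumptions, and the commented-out fit call stands in for real image data.

```python
# A minimal CNN sketch in Keras for 10-class image classification;
# the architecture and the 28x28 grayscale input shape are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # learn local filters
    layers.MaxPooling2D(pool_size=2),                     # downsample
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5)  # x_train/y_train: real image data
```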
I do not have knowledge of (14) Dense Neural Networks (DNN) and (18) Evolutionary Methods, but the (15) Recurrent Neural Network (RNN) is critical for Deep Learning forecasting of future values, categories, text, and images. Natural Language Processing (NLP), in particular, utilizes RNNs, for example in sentiment analysis . . . and here is a helpful link below.
A Beginner’s Guide on Sentiment Analysis with RNN
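A minimal Keras sketch of an RNN sentiment classifier . . . the vocabulary size and sequence length are illustrative, and x_train / y_train are assumed to be padded integer sequences with 0 / 1 labels, e.g. from the IMDB dataset.

```python
# A minimal RNN sentiment sketch; vocab_size and maxlen are illustrative,
# and x_train/y_train are assumed padded integer sequences with 0/1 labels.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, maxlen = 10_000, 200

model = keras.Sequential([
    layers.Input(shape=(maxlen,)),
    layers.Embedding(vocab_size, 64),       # word indices -> dense vectors
    layers.LSTM(64),                        # read the sequence in order
    layers.Dense(1, activation="sigmoid"),  # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=3)
```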
Lastly, a recent improvement in language translation is (17) the Transformer, which fixes some shortcomings of prior sequence models . . . most notably, its attention mechanism looks at the whole sequence at once instead of one step at a time.
How Transformers Work
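Here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer . . . the shapes and random values are purely illustrative.

```python
# A minimal NumPy sketch of scaled dot-product attention, the core
# Transformer operation; shapes and random values are illustrative.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value by how well its key matches each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                          # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, 8 dims
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```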
All the models above assume the final output is for prediction, but a model can also serve other objectives that do NOT predict: explanation and indication.
For example . . . the Fama-French three-factor model of 1992 is a Multiple Linear Regression that attempts to explain the nature of U.S. equity returns. Eugene Fama would later win the Nobel Prize in Economics in 2013 . . . and Fama has never wavered from his position against all forms of forecasting and prediction in the U.S. stock market. Eugene is only an explainer . . .
Separately, models can also be used as indicators that imply, rather than state, a prediction . . . such as a linguist inferring a psychological profile from text, however inexact, or a Financial Economist warning of a possible Financial Market crash in the near future.
Finally, you are probably thinking there are too many models to figure out which one to use. Data Scientists usually lean on Methodological Pluralism, treating the model type itself as a variable to experiment with.
The subject of Data Science, after all, is a re-packaging of Applied Statistics, and the recent exponential growth in the quantity of data, coupled with more computing resources, leaves little loyalty to any particular model.