Predictive analytics uses a range of algorithms and techniques to forecast future outcomes from historical data. These algorithms are the backbone of operations research, and with AI they are now available to larger parts of an organization. They have been available for 50+ years, yet corporations often do not deploy them. When they are used, they are most often deployed in Excel, which limits the amount of data that can be ingested.

To be fully effective, these algorithms need to be deployed on large data sets (HANA, S/4HANA, BW, and Hadoop with BTP integration) drawn from multiple sources. With modern compute power and infrastructure, and with these algorithms integrated into an AI tool, they can be deployed effectively. The TekMetrix analyticA architecture uses AI LLM models optimized for software development to deploy these algorithms for high-value-add process scenarios. Here are some of the key algorithms and techniques used.

**What is a Sales Forecast Example?**

**Linear Regression**

**Description**: A statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

**Use Case**: Predicting sales volume, price, labor costs, material costs, commodity prices, stock prices, or **any continuous outcome**.

**Logistic Regression**

**Description**: Used for binary classification problems, it predicts the probability of a binary outcome (e.g., yes/no, true/false).

**Use Case**: Fraud detection, customer churn prediction, product quality.
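
As a minimal sketch of the idea, the following fits a one-feature logistic regression by gradient descent on log-loss. The churn data, the `fit_logistic` helper, and the learning-rate and epoch settings are illustrative assumptions, not part of any TekMetrix tooling.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data: months since last purchase -> churned (1) or retained (0)
months_inactive = [1, 2, 3, 4, 5, 6, 7, 8]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(months_inactive, churned)
p_short = sigmoid(w * 2 + b)  # churn probability after 2 inactive months
p_long = sigmoid(w * 8 + b)   # churn probability after 8 inactive months
```

The model outputs a probability rather than a hard yes/no, so a business threshold (for example, flag customers above 0.5) still has to be chosen separately.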

**Decision Trees**

**Description**: A tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.

**Use Case**: Classification and regression tasks, such as determining whether a customer will buy a product or evaluating a new product introduction.

**Random Forest**

**Description**: An ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes (classification) or mean prediction (regression) of the individual trees.

**Use Case**: Improving accuracy and robustness over single decision trees.

**Gradient Boosting Machines (GBM)**

**Description**: An ensemble technique that builds models sequentially, each new model correcting errors made by the previous ones.

**Use Case**: High-performance tasks like ranking, classification, and regression.

**Support Vector Machines (SVM)**

**Description**: A supervised learning model that analyzes data for classification and regression analysis by finding the hyperplane that best divides a dataset into classes.

**Use Case**: Image recognition, text categorization.

**Neural Networks**

**Description**: Computing systems inspired by the biological neural networks that constitute animal brains, capable of pattern recognition and learning from data.

**Use Case**: Complex tasks like image and speech recognition, natural language processing.

**K-Nearest Neighbors (KNN)**

**Description**: A non-parametric method used for classification and regression, where the input consists of the k closest training examples in the feature space.

**Use Case**: Recommender systems, anomaly detection.
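
The "k closest training examples" idea can be sketched in a few lines. The customer features, segment labels, and the `knn_predict` helper below are hypothetical and chosen only for illustration; distance is plain Euclidean distance on unscaled features, which is itself a simplifying assumption.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (orders per year, average basket value) -> segment
train = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((4, 18.0), "occasional"),
    ((20, 60.0), "loyal"),
    ((24, 55.0), "loyal"),
    ((22, 70.0), "loyal"),
]

segment = knn_predict(train, (21, 58.0), k=3)
```

In practice the features would usually be scaled first, since an unscaled feature with a large range dominates the distance.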

**Time Series Analysis**

**Description**: Techniques that analyze time-ordered data points to extract meaningful statistics and other characteristics.

**Use Case**: Forecasting stock prices, weather prediction.

- **ARIMA (AutoRegressive Integrated Moving Average)**: Combines autoregression, differencing, and moving average models.
- **Exponential Smoothing**: Applies weighted averages of past observations to forecast future values.
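
To make the "weighted averages of past observations" concrete, here is a minimal sketch of simple (single) exponential smoothing. The series, the smoothing factor `alpha=0.5`, and the function name are assumptions chosen for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels; the last level is the one-step-ahead forecast."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for observation in series[1:]:
        # New level = weighted blend of the latest observation and the old level
        level = alpha * observation + (1.0 - alpha) * level
        levels.append(level)
    return levels

monthly_units = [100, 110, 105, 120, 115]  # hypothetical monthly sales
levels = simple_exponential_smoothing(monthly_units, alpha=0.5)
forecast_next_month = levels[-1]
```

A larger `alpha` reacts faster to recent changes, while a smaller `alpha` smooths more heavily and weights older observations more.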

**Clustering Algorithms**

**Description**: Grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.

**Use Case**: Market segmentation, image compression.

- **K-Means Clustering**: Partitions data into k clusters, each represented by the mean of the points in the cluster.
- **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**: Finds core samples of high density and expands clusters from them.
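
A minimal sketch of k-means on one-dimensional data shows the assign-then-average loop. The spend figures and the deterministic starting centroids are assumptions made to keep the example reproducible; real implementations usually use randomized or k-means++ initialization.

```python
def k_means_1d(values, k, iterations=20):
    """Lloyd's algorithm on 1-D data with a deterministic, evenly spaced start."""
    ordered = sorted(values)
    # Assumption: start from evenly spaced order statistics instead of random picks
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        updated = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = updated
    return centroids, clusters

# Hypothetical annual spend per customer
annual_spend = [120, 150, 130, 900, 950, 1000, 5000, 5200]
centroids, clusters = k_means_1d(annual_spend, k=3)
```

Each resulting centroid can be read as the "typical" spend of one market segment.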

### Data Modeling

Data in the CPG industry comes from a variety of sources, including SAP sales and supply chain transaction data, marketing sources, and customer behavior. The industry is challenged with creating analytics on this data and using it for prediction and optimization. The complexity lies in creating a data model. Without proper knowledge of HANA columnar databases, open source databases, and POS integration technology, forecasting and analyzing SKU data is a challenge.

The predictions that need to be made include: which customer is going to buy which product next, how many purchases will they make, and what is the customer churn? What is the customer lifetime value, how often do customers buy, and when did they last buy a product? Other types of predictions cover trade and promotions, pricing, conversion costs, and marketing (SG&A) costs.

TekMetrix tools are readily available for making these predictions and more. SKU-level predictions can be made and structured into a SKU-level P&L, cash flow, and balance sheet. The P&L and other financial documents can be analyzed by SKU, customer, geography, brand, and other attributes, including aggregations. Customer lifetime value (CLV) can also be forecast into the future.

The question becomes: how do you design and develop an underlying data model that supports a variety of analytics and predictive metrics, including AI and ML capabilities embedded in the data model itself? **TekMetrix data, AI, and ML models and experience will accelerate your project timelines and add significant value to your business.**

**Defining the Right Questions**

To run your business most effectively, you need to make predictions about the future. Predictive analytical forecasting takes historical data and creates a single-period or multi-period forecast. TekMetrix data analysis helps examine past performance and predict future performance, for example:

- Customer purchases
- Customer lifetime value
- Warehouse inventory
- Cost of goods sold
- Revenue
- Customer returns
- Pricing
- Trade promotion costs
- Contribution margin impact of incremental sales due to trade promotions

We want to make "when will" predictions for a fixed period and multiple periods in the future.

- Inherently granular
- Customer behavior
- Forward looking
- Multi-platform
- Broadly applicable
- Multidisciplinary

**Predictive Statistics**

Predictive statistics uses statistics for prediction and forecasting, generally one period ahead of the current period. The mean of the prediction is the sample mean, which is an unbiased estimate of the mean of the true demand distribution. The standard deviation for prediction needs to be adjusted if there is insufficient data. If the data is normally distributed, then the data can be adjusted for predictive purposes.

If there is a trend in the data, then moving averages and the mean and standard deviation computations used for predictive purposes will **lag** the trend. Therefore, linear regression or exponential smoothing are additional forecast options.
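
The lag is easy to see on a small example. The sketch below compares a moving-average forecast with a linear-trend forecast on a steadily rising series; the demand figures and both helper functions are hypothetical.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def linear_trend_forecast(series):
    """Fit y = a + b*t by least squares and extrapolate one period ahead."""
    n = len(series)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(series) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, series)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # period n is the next, unseen period

demand = [100, 110, 120, 130, 140, 150]  # hypothetical demand rising by 10 per period
ma = moving_average_forecast(demand, window=3)  # mean of 130, 140, 150
trend = linear_trend_forecast(demand)
```

With demand rising by 10 each period, the natural next value is 160. The three-period moving average forecasts 140, which is below even the latest observation, while the trend line extrapolates to 160.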

**Regression Example - Predict 1 Period Ahead of the Current Period**

The regression equation and its variants are used in Artificial Intelligence (AI) and Machine Learning (ML) modeling. These equations help us understand the relationship between independent variables (features) and a dependent variable (target) by fitting a mathematical function to the data. Various types of regression models, such as linear regression, polynomial regression, and logistic regression, are commonly employed for prediction, classification, and modeling tasks. These models can be developed in SAP HANA, SAC, or PaPM. Additional tools include MATLAB, R, Python, or, for smaller models, Excel.

- Revenue growth management, as an example, can be optimized using regression analysis. The optimal price is the price that maximizes overall profit. Models are built to:
  - Quantify sales demand at different prices
  - Find the optimal price
- Optimization is performed with the general regression formula Y(n) = a + b1X1(n) + b2X2(n) + ... + bjXj(n) + E(n), where the X values are the independent variables, Y is the dependent variable, and E(n) is the error term
- Multiple independent variables can be modeled, for example Sales = a + b1(Price) + b2(Advertising) + E
- The regression equation and variants of the regression equation are used for AI and ML modeling
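
A minimal sketch of the pricing idea, assuming a single-variable demand model Sales = a + b(Price) and a constant unit cost: fit the demand line by least squares, then maximize profit = (price - unit cost) × demand. The price and volume figures are hypothetical and were constructed to lie exactly on a line; advertising and other predictors are left out for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return y_mean - slope * x_mean, slope

def optimal_price(intercept, slope, unit_cost):
    """Price that maximises (price - unit_cost) * (intercept + slope * price).

    Setting the derivative to zero gives price = (slope*unit_cost - intercept) / (2*slope).
    """
    if slope >= 0:
        raise ValueError("demand must fall as price rises for an interior optimum")
    return (slope * unit_cost - intercept) / (2 * slope)

# Hypothetical observations: price charged -> units sold
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
units_sold = [200.0, 180.0, 160.0, 140.0, 120.0]

a, b = fit_line(prices, units_sold)           # demand: units = a + b * price
best_price = optimal_price(a, b, unit_cost=4.0)
best_profit = (best_price - 4.0) * (a + b * best_price)
```

The closed-form optimum only holds for a linear demand curve; with more predictors or a nonlinear demand model, the profit function would be maximized numerically instead.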

**Key Performance Indicators Used in Forecasting Beyond Period 2:**

- A direct marketing example: regression analysis can be used to predict future, multi-period customer behavior:
  - Use key performance indicators of past customer behavior to predict future behavior
  - Regression models are also useful for this type of modeling and forecasting
  - Regression model predictions using RFM (recency, frequency, monetary value) models are used to forecast customer behavior
    - Recency: how recent were the customer's purchases (more important than frequency)?
    - Frequency: how many purchases did the customer make (more important than monetary value)?
    - Monetary value: what is the value of each of the purchases?
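
The three RFM inputs can be derived from a plain transaction list. The sketch below is illustrative only: the `(customer, date, amount)` layout, the `rfm` helper, and the sample rows are assumptions, and monetary value is summarized here as the average purchase amount.

```python
from datetime import date

def rfm(transactions, as_of):
    """Summarise (customer, purchase_date, amount) rows into RFM metrics."""
    summary = {}
    for customer, purchased_on, amount in transactions:
        entry = summary.setdefault(
            customer, {"last": purchased_on, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], purchased_on)  # most recent purchase
        entry["frequency"] += 1                           # number of purchases
        entry["monetary"] += amount                       # total spend so far
    return {
        customer: {
            "recency_days": (as_of - e["last"]).days,
            "frequency": e["frequency"],
            "avg_monetary": e["monetary"] / e["frequency"],
        }
        for customer, e in summary.items()
    }

# Hypothetical transactions
transactions = [
    ("C1", date(2024, 1, 5), 40.0),
    ("C1", date(2024, 3, 20), 60.0),
    ("C2", date(2023, 11, 2), 200.0),
]
metrics = rfm(transactions, as_of=date(2024, 4, 1))
```

These per-customer metrics would then serve as the independent variables in a regression that predicts next-period behavior.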

- Probability models can be used to forecast longer-term horizons
  - Buy till you die (BTYD) models are powerful probabilistic models used to make long-range projections and to answer "when" questions, such as when a customer will churn
  - Customer lifetime value modeling uses Pareto/NBD and BG/BB models

- Limitations of regression models
  - When forecasting more than one period beyond the current period, regression models are limited because they need input data
  - To make predictions for period 3, period 2 data can be used as the independent variable
  - Regression models are limited by the data that is available to forecast multiple periods into the future
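
One common workaround for the missing-input problem is to feed each prediction back in as the input for the next period. The sketch below assumes a lag-1 regression y[t] = a + b·y[t-1]; the helper names and the sales series are hypothetical.

```python
def fit_lag1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = y_mean - b * x_mean
    return a, b

def recursive_forecast(series, horizon):
    """Forecast `horizon` periods ahead, reusing each forecast as the next input."""
    a, b = fit_lag1(series)
    forecasts = []
    previous = series[-1]
    for _ in range(horizon):
        previous = a + b * previous  # the prediction becomes the next period's input
        forecasts.append(previous)
    return forecasts

sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical period sales
next_three = recursive_forecast(sales, horizon=3)
```

Because every step builds on an earlier prediction rather than on observed data, forecast errors compound as the horizon grows, which is the limitation described above.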

**Sales Forecast Example**

AI integration with Corporate Data and Linear Regression:

- Data Collection: Gather your historical sales data, including variables that might influence sales, such as marketing spend, seasonality, and economic indicators.
- Data Preprocessing: Clean and prepare the data for analysis. This involves handling missing values, encoding categorical variables, scaling numerical features, and removing unnecessary data from the data model; billions of records are sufficient.
- Feature Selection: Identify the independent variables (predictors) that are most relevant to predicting the sales. This might be total advertising spend, average product price, or number of holiday promotions.
- Model Training: Use the cleaned and preprocessed data to train the linear regression model. Copilot can help automate this by fitting the linear equation y = b0 + b1x1 + b2x2 + ... + bnxn, where y is the dependent variable (sales), and x1, x2, ..., xn are the independent variables.
- Model Evaluation: Assess the performance of the model using metrics like R-squared and Mean Squared Error (MSE) to ensure it accurately predicts sales based on the input features.
- Prediction: Use the trained model to make predictions on new data. This is where AI can shine, taking the input variables and outputting a sales forecast.

By managing these steps, TekMetrix AI integration can streamline the entire process of creating and continuously updating a sales forecast. Let’s say we have billions of records in S/4HANA and BW/4HANA. We input the data along with variables like marketing spend, holiday seasons, supply chain cycle times, availability, and econometric indicators. TekMetrix AI preprocesses the data, selects the best predictors, trains the forecast model, evaluates model performance, and then uses it to forecast next month’s sales.
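
The training, evaluation, and prediction steps can be sketched end to end on a toy data set. This is an illustrative example under stated assumptions, not the TekMetrix implementation: the two predictors (marketing spend and a holiday flag), the eight sample months, and all helper functions are hypothetical, and the model is fitted with the normal equations in plain Python.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, targets):
    """Least squares for y = b0 + b1*x1 + ... + bn*xn via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical months: (marketing spend, holiday flag) -> sales
features = [(10.0, 0.0), (12.0, 0.0), (15.0, 1.0), (11.0, 0.0),
            (18.0, 1.0), (14.0, 0.0), (20.0, 1.0), (16.0, 0.0)]
sales = [81.0, 85.0, 116.0, 82.0, 125.0, 93.0, 129.0, 98.0]

# Train on the first six months, hold out the last two for evaluation
coefs = fit_linear(features[:6], sales[:6])
holdout_pred = [predict(coefs, row) for row in features[6:]]
holdout_mse = mse(sales[6:], holdout_pred)
holdout_r2 = r_squared(sales[6:], holdout_pred)

# Forecast a new month with planned spend of 17 during a holiday period
next_month_forecast = predict(coefs, (17.0, 1.0))
```

Evaluating on held-out months rather than on the training data gives a more honest view of how the model would perform on next month's sales. At production scale the same steps would run inside the database or an ML library rather than in plain Python.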