Intellicus Enterprise Reporting and Business Insights 19.1

Predictive Analytics

June 28, 2020

The Data Science capabilities in Intellicus let you generate predictions from your data so that you can anticipate future trends and possibilities. You can add a Data Science engine step at the Query Object level, or perform predictive analytics at the report level.

Refer to “Working with Query Object” for guidance on adding a Data Science engine step at the Query Object level.

Once you create the necessary steps for a report to be generated from the Query Object, you can run the report and visualize your data with predictions.

You will see the following once you run the report:

Figure 46: Machine Learning Operations toolbar while running a report in Smart View

Note: The Machine Learning Operations toolbar is visible if you add a Data Science engine step with the necessary modular script at the Query Object level.

You can choose between Prediction only and Training and Prediction here. Prediction only uses the last trained model to generate predictions, whereas Training and Prediction retrains the model on the latest datasets before predicting. After making your choice, click Apply.

To save your choice as the default option for every run of the report, select the Save Values for Next Run check box.
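
The difference between the two modes can be sketched in plain Python. This is only an illustration and not Intellicus code: the functions `train` and `run`, the file name `myModel.pkl`, and the toy "model" (a simple mean) are all assumptions made for the example.

```python
import os
import pickle
import tempfile

MODEL_FILE = "myModel.pkl"  # hypothetical file name for the saved model


def train(values):
    """Toy 'model' (an assumption): it only remembers the mean of the data."""
    return {"mean": sum(values) / len(values)}


def run(mode, latest_data, model_dir):
    """Return a prediction using one of the two toolbar modes."""
    path = os.path.join(model_dir, MODEL_FILE)
    if mode == "Training and Prediction":
        # Retrain on the latest dataset, then persist the new model.
        model = train(latest_data)
        with open(path, "wb") as fh:
            pickle.dump(model, fh)
    elif mode == "Prediction only":
        # Reuse the last trained model; the latest data is not used for training.
        with open(path, "rb") as fh:
            model = pickle.load(fh)
    else:
        raise ValueError("unknown mode: " + mode)
    return model["mean"]


with tempfile.TemporaryDirectory() as workdir:
    first = run("Training and Prediction", [10, 20, 30], workdir)
    # New data has arrived, but 'Prediction only' still uses the old model.
    stale = run("Prediction only", [100, 200, 300], workdir)
    fresh = run("Training and Prediction", [100, 200, 300], workdir)
```

In this sketch, `stale` equals `first` because no retraining took place, while `fresh` reflects the newer data.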

Performing Predictive Analytics

With Intellicus, business users can perform predictive analytics to get predictions on their data. Predictive Analytics lets you enter your script directly at the report level and generate predictions on your data. Adding the script at the report level is most useful when your predictions do not create new variables or columns in your reports.

Turn on edit mode to see the Predictive Analytics option. You can perform predictive analytics in Smart View reports.

Figure 47: Tabs for Predictive Analytics and What-If Analysis

The Predictive Analytics box gives you the options shown in the image below:

Figure 48: Performing predictive analytics

Fields

This shows the fields present in your report so that you can choose the fields for which you want predictions. Clicking a field lets you write a Data Science script for that field.

Data Science Connection

Here you choose the Data Science engine to use for generating predictions. In this case, it is PyServer.

Prediction Script

Here you can write the script for the field(s) you choose.

Prediction Data source

Here you choose how the values of the independent variables are supplied: select Auto to let the Data Science engine analyze the variations in the independent variables itself, or select Data Source to provide the data yourself. Independent variables are the inputs used to generate predictions for the dependent variables.

For example, to predict the Sales (dependent variable) your company would achieve in the coming years, you have to provide marketing expenditure (independent variable), investment in infrastructure (independent variable), number of probable hires (independent variable), and so on.
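
The relationship between an independent and a dependent variable can be sketched as a one-variable least-squares fit. The numbers below are made up for illustration, `fit_line` is a hypothetical helper, and a real script would normally use several independent variables and a library model.

```python
# Made-up historic data, purely for illustration.
marketing_spend = [10.0, 20.0, 30.0, 40.0]  # independent variable
sales = [25.0, 45.0, 65.0, 85.0]            # dependent variable


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(marketing_spend, sales)

# Predict the dependent variable for a planned value of the independent one.
planned_spend = 50.0
predicted_sales = slope * planned_spend + intercept
```

The fitted line is learned from historic pairs of values; the prediction then needs a value of the independent variable for the future period, which is what the Auto and Data Source options supply.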

You can select Auto to let the Data Science engine learn the trend by reading your historic data and predict the values of the independent variables. If you have pre-decided values, you can provide them using the Data Source option.

Auto

In Auto, you specify the prediction data point as a numeric value. For instance, if you set the value to 4, predictions are made for 4 units, according to the intervals in your chart.

Figure 49: Select Auto in What-if variable source
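
As a rough illustration of "4 units as per the intervals in your chart", the sketch below lists the periods that a value of 4 would cover for a monthly and for a quarterly chart. The helper `next_periods` and the starting dates are assumptions for the example; how Intellicus derives the intervals internally is not described here.

```python
def next_periods(last_year, last_month, count, step_months=1):
    """Return the next `count` (year, month) periods after the last charted one."""
    periods = []
    year, month = last_year, last_month
    for _ in range(count):
        month += step_months
        # Carry any overflow past December into the year.
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
        periods.append((year, month))
    return periods


# Chart with monthly intervals ending in November 2020: 4 units = 4 months.
monthly = next_periods(2020, 11, count=4)

# Chart with quarterly intervals ending in December 2020: 4 units = 4 quarters.
quarterly = next_periods(2020, 12, count=4, step_months=3)
```

The same value therefore covers a different time span depending on the interval used in the chart.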

Data Source

Here you specify the query object that holds the prediction data (the independent variable values) used to get predictions for the fields you chose and wrote a script for.

Figure 50: Options if you select Data Source
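
The kind of data such a query object supplies can be sketched as a small CSV holding one row of independent variable values per period to be predicted. The column names reuse those from the example script in this article; the values are made up, and the in-memory CSV is only a stand-in for the real data source.

```python
import csv
import io

# Stand-in for the rows a prediction query object might return (made-up values).
prediction_csv = io.StringIO(
    "Rndspend,Marketingspend,State\n"
    "150000,300000,New York\n"
    "160000,320000,California\n"
)

reader = csv.DictReader(prediction_csv)
rows = list(reader)
columns = reader.fieldnames

# Numeric independent variables are converted; State stays categorical text.
numeric = [(float(r["Rndspend"]), float(r["Marketingspend"])) for r in rows]
states = [r["State"] for r in rows]
```

Each row here corresponds to one prediction, so two rows would yield two predicted values.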

After adding the script, you can verify that it is error free. Click OK if the verification succeeds.

Guidelines for writing script while viewing reports

  • The script needs a Meta section for declaring independent variables and categorical variables, and sections for Training and Prediction. Each section marker should start with #, and the placeholder should be enclosed in <% %> so that Intellicus can parse the modular structure. For example, #<% TRAINING.SECTION %>
  • The designer/data scientist should specify comma-separated independent variables and categorical variables in the comment section at the top of the script (a line starting with #) under META.SECTION.
  • Appropriate aggregation functions need to be specified when defining the independent variables. For example, #SUM(Marketing_Spent),SUM(R&D_Spent)
  • Independent variables can be either numeric or categorical data. For categorical data, the designer/data scientist should write the script to handle it (encoding, feature scaling and decoding) in both the training and the prediction script.
  • If the encoders created for categorical data need to persist, the designer/data scientist has to write this in the script.
  • The order in which the independent variables are specified is treated as the schema of the dataset used for training and prediction. Consider this order while writing the script (in case of indexes and ‘.’).
  • It is assumed that the training data is fetched from a CSV data source, so the first line of the training section should read that CSV into a dataset. The argument passed to the CSV reading function should be <%ThisControl.ReadData%>. For example, pd.read_csv('<%ThisControl.ReadData%>')
  • Like the training section, the first line of the prediction section must fetch the data to be used for prediction from a CSV data source. The argument passed to the reading function should be <%ThisControl.PredictionData%>. For example, predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
  • The predicted data in the script must be referred to by the placeholder <%ThisControl.PredictedData%>, ideally on the last line of the prediction section. For example, <%ThisControl.PredictedData%> = myModel.predict(predictionData.iloc[:,0:7])
  • If no prediction script is specified in the script editor, Intellicus runs an automatically generated prediction script.
  • The model created is saved as 'myModel' by default. This name is mandatory for the model you create.
  • Training happens only if a training script is provided; otherwise it is assumed that a trained model is used.
  • If a trained model is used, it is mandatory for the user to provide a prediction script.
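
The modular layout described in these guidelines can be illustrated with a small parser that splits a script at its section markers and reads the variable list from the Meta section. This is a sketch under assumptions, not the parser Intellicus uses: `split_sections` and `meta_variables` are hypothetical helpers, and the sample script is abbreviated.

```python
import re

# Matches marker lines such as "#<%META.SECTION%>" or "#<% TRAINING.SECTION %>".
MARKER = re.compile(r"^#\s*<%\s*([A-Z]+)\.SECTION\s*%>\s*$")


def split_sections(script):
    """Map each section name (META, TRAINING, PREDICTION) to its body text."""
    sections = {}
    current = None
    for line in script.splitlines():
        match = MARKER.match(line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def meta_variables(meta_body):
    """Read the comma-separated variables from the first non-empty Meta line."""
    first = next(line for line in meta_body.splitlines() if line.strip())
    return [item.strip() for item in first.lstrip("#").split(",")]


script = """#<%META.SECTION%>
#SUM(Rndspend),SUM(Marketingspend),State
#<% TRAINING.SECTION %>
dataset = pd.read_csv('<%ThisControl.ReadData%>')
#<%PREDICTION.SECTION%>
predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
"""

sections = split_sections(script)
variables = meta_variables(sections["META"])
```

The order of `variables` is the order in which the columns would appear in the training and prediction data, which is why index-based access such as `iloc` in the script has to follow it.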

Once you have added a script, you can click Verify to check that it is written correctly, and then click OK.
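
What the Verify step actually checks is not described in this article. As a rough pre-check of your own, the guidelines above can be turned into a few text tests, as in the following sketch; `check_script` and its messages are hypothetical and cover only the rules listed in this section.

```python
import re


def has_section(script, name):
    """True if a marker such as #<%TRAINING.SECTION%> appears in the script."""
    return re.search(r"#\s*<%\s*" + name + r"\.SECTION\s*%>", script) is not None


def has_placeholder(script, name):
    """True if <%ThisControl.NAME%> appears, allowing spaces inside <% %>."""
    return re.search(r"<%\s*ThisControl\." + name + r"\s*%>", script) is not None


def check_script(script):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    training = has_section(script, "TRAINING")
    prediction = has_section(script, "PREDICTION")
    if not has_section(script, "META"):
        problems.append("missing META.SECTION")
    if training:
        if not has_placeholder(script, "ReadData"):
            problems.append("training section should read <%ThisControl.ReadData%>")
        if not re.search(r"\bmyModel\s*=", script):
            problems.append("the trained model must be named myModel")
    elif not prediction:
        problems.append("without a training script, a prediction script is mandatory")
    if prediction:
        for name in ("PredictionData", "PredictedData"):
            if not has_placeholder(script, name):
                problems.append("prediction section should use <%ThisControl." + name + "%>")
    return problems


good_script = """#<%META.SECTION%>
#SUM(Rndspend)
#<%TRAINING.SECTION%>
dataset = pd.read_csv('<%ThisControl.ReadData%>')
myModel = object()
#<%PREDICTION.SECTION%>
predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
<%ThisControl.PredictedData%> = myModel.predict(predictionData)
"""
bad_script = "#<%META.SECTION%>\n#SUM(Rndspend)\n"

good_problems = check_script(good_script)
bad_problems = check_script(bad_script)
```

The checks only look at the script text; they do not run it, so a script that passes can still fail at run time.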

 

An example script for your reference:

#<%META.SECTION%>
#SUM(Rndspend),SUM(Marketingspend),State

#<%TRAINING.SECTION%>
import pandas as pd
import numpy as np
# Read the training data; the first three columns follow the META order.
dataset = pd.read_csv('<%ThisControl.ReadData%>')
X = dataset.iloc[:, :3].values
y = dataset.iloc[:, 3].values
# Encode the categorical column (State, index 2).
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
labelEncoder = LabelEncoder()
X[:, 2] = labelEncoder.fit_transform(X[:, 2])
# Note: categorical_features was removed in scikit-learn 0.22;
# newer versions need ColumnTransformer instead.
oneHotEncoder = OneHotEncoder(categorical_features=[2])
X = oneHotEncoder.fit_transform(X).toarray()
# Persist the encoders so that the prediction section can reuse them.
import pickle
pickle.dump(oneHotEncoder, open('<%ThisControl.Dir%>/oneHotEncoder.pkl', 'wb'))
pickle.dump(labelEncoder, open('<%ThisControl.Dir%>/labelEncoder.pkl', 'wb'))
# The model must be named myModel.
from sklearn.ensemble import RandomForestRegressor
myModel = RandomForestRegressor(n_estimators=20, random_state=0)
myModel.fit(X, y)

#<%PREDICTION.SECTION%>
import pandas as pd
predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
# Load the encoders saved by the training section.
import pickle
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
onHotEncoderPred = pickle.load(open('<%ThisControl.Dir%>/oneHotEncoder.pkl', 'rb'))
labelEncoderPred = pickle.load(open('<%ThisControl.Dir%>/labelEncoder.pkl', 'rb'))