Intellicus Enterprise Reporting and Business Insights 19.0

Guidelines to Write Script

June 28, 2020

Follow these guidelines when writing a script that performs predictive analytics while viewing reports.

  • The script must contain a Meta section that declares the independent and categorical variables, plus one section each for Training and Prediction. Each section starts with a comment line (#) holding a placeholder wrapped in <% %>, which lets Intellicus parse the script into modules. For example: #<%TRAINING.SECTION%>
  • The designer or data scientist should list the independent variables and categorical variables, comma-separated, in a comment line (a line starting with #) at the top of the script, under META.SECTION.
  • Specify an appropriate aggregation function for each independent variable. For example: #SUM(Marketing_Spent),SUM(R&D_Spent)
  • Independent variables can be numeric or categorical. For categorical data, the designer or data scientist must write the handling (encoding, feature scaling and decoding) in both the training and the prediction script.
  • If the encoders created for categorical data need to persist, the designer or data scientist must write that persistence logic in the script.
  • The order in which the independent variables are specified is treated as the schema of the dataset used for training and prediction. Keep this order in mind when writing the script (for example, when referring to columns by index or with ‘.’).
  • Training data is assumed to come from a CSV data source, so the training section should start by reading that CSV into a dataset. Pass <%ThisControl.ReadData%> as the argument to the CSV reading function. For example: pd.read_csv('<%ThisControl.ReadData%>')
  • Likewise, the prediction section must start by reading the data to be used for prediction from a CSV data source. Pass <%ThisControl.PredictionData%> as the argument to the reading function. For example: predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
  • The script must assign the predicted data to the placeholder <%ThisControl.PredictedData%>, ideally on the last line of the prediction section. For example: <%ThisControl.PredictedData%> = myModel.predict(predictionData.iloc[:,0:7])
  • If no prediction script is specified in the script editor, Intellicus runs an automatically generated prediction script.
  • The model is saved by default as ‘myModel’. This name is mandatory for the model you create.
  • Training runs only if a training script is provided; otherwise, Intellicus assumes that a trained model is being used.
  • If a trained model is used, the user must provide a prediction script.
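
The section markers and the Meta line described above can be illustrated with a small Python sketch. It is not Intellicus's own parser, only an approximation of the conventions listed here. The helper names `split_sections` and `parse_meta` are hypothetical, and the sketch assumes the variable list sits on one line that may or may not begin with #.

```python
import re

# Matches a section marker line such as "#<%TRAINING.SECTION%>" (inner spaces allowed).
MARKER = re.compile(r"^#\s*<%\s*([A-Z]+)\.SECTION\s*%>\s*$")

def split_sections(script):
    """Group script lines under the most recent section marker."""
    sections = {}
    current = None
    for line in script.splitlines():
        match = MARKER.match(line.strip())
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections

def parse_meta(meta_lines):
    """Return (aggregation, column) pairs in declaration order.

    A bare name such as "State" has no aggregation and yields (None, "State").
    """
    variables = []
    for line in meta_lines:
        text = line.strip().lstrip("#").strip()
        for item in text.split(","):
            item = item.strip()
            if not item:
                continue
            agg = re.fullmatch(r"(\w+)\((.+)\)", item)
            variables.append((agg.group(1), agg.group(2)) if agg else (None, item))
    return variables

# Illustrative script text; the section bodies are placeholders, not real logic.
script = """#<%META.SECTION%>
#SUM(Rndspend),SUM(Marketingspend),State
#<%TRAINING.SECTION%>
myModel = None
#<%PREDICTION.SECTION%>
pass
"""

sections = split_sections(script)
schema = parse_meta(sections["META"])
print(list(sections))
print(schema)
```

The position of each entry in `schema` corresponds to a column position in the dataset, so the third entry (State) is the column a script would address as index 2.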

Once you have added a script, use Verify to check that it is written correctly, and then click OK.
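
The guidelines on categorical data leave encoder persistence to the script author. The pattern (fit an encoder during training, pickle it, and load it again during prediction) can be tried outside Intellicus with the standard library alone. The following is a minimal sketch under stated assumptions: a plain dictionary stands in for scikit-learn's LabelEncoder, a temporary directory stands in for the directory that <%ThisControl.Dir%> resolves to, and `fit_label_mapping` and `encode` are hypothetical helper names.

```python
import os
import pickle
import tempfile

def fit_label_mapping(values):
    """Map each distinct label to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

def encode(mapping, values):
    """Encode labels with a fitted mapping; unseen labels raise KeyError."""
    return [mapping[value] for value in values]

# Assumption: a temporary directory plays the role of <%ThisControl.Dir%>.
work_dir = tempfile.mkdtemp()
encoder_path = os.path.join(work_dir, "labelEncoder.pkl")

# Training step: fit on the categorical column and persist the mapping.
training_states = ["New York", "California", "Florida", "California"]
mapping = fit_label_mapping(training_states)
with open(encoder_path, "wb") as handle:
    pickle.dump(mapping, handle)

# Prediction step: load the persisted mapping and reuse it without refitting.
with open(encoder_path, "rb") as handle:
    loaded_mapping = pickle.load(handle)

prediction_states = ["Florida", "New York"]
encoded = encode(loaded_mapping, prediction_states)
print(encoded)  # [1, 2]
```

Refitting on the prediction data alone would assign Florida the code 0 and New York the code 1, which no longer matches the codes the model saw during training. Loading the persisted encoder avoids that mismatch.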

An example script for your reference:

#<%META.SECTION%>
#SUM(Rndspend),SUM(Marketingspend),State
#<%TRAINING.SECTION%>
import pandas as pd
import numpy as np
dataset = pd.read_csv('<%ThisControl.ReadData%>')
X = dataset.iloc[:, :3].values
y = dataset.iloc[:, 3].values
from sklearn.preprocessing import OneHotEncoder,LabelEncoder
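labelEncoder = LabelEncoder()
X[:, 2] = labelEncoder.fit_transform(X[:, 2])
# Note: the categorical_features argument was removed in scikit-learn 0.22; newer versions need sklearn.compose.ColumnTransformer instead.
oneHotEncoder = OneHotEncoder(categorical_features=[2])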
X = oneHotEncoder.fit_transform(X).toarray()
import pickle
pickle.dump(oneHotEncoder, open('<%ThisControl.Dir%>/oneHotEncoder.pkl', 'wb'))
pickle.dump(labelEncoder, open('<%ThisControl.Dir%>/labelEncoder.pkl', 'wb'))
from sklearn.ensemble import RandomForestRegressor
myModel = RandomForestRegressor(n_estimators=20, random_state=0)
myModel.fit(X,y)
#<%PREDICTION.SECTION%>
import pandas as pd
predictionData = pd.read_csv('<%ThisControl.PredictionData%>')
import pickle
from sklearn.preprocessing import OneHotEncoder,LabelEncoder
onHotEncoderPred = pickle.load(open('<%ThisControl.Dir%>/oneHotEncoder.pkl', 'rb'))
labelEncoderPred = pickle.load(open('<%ThisControl.Dir%>/labelEncoder.pkl', 'rb'))