Intellicus Enterprise Reporting and Business Insights 19.1

Guidelines to Write Machine Learning Script

June 28, 2020

You can write scripts for the Python environment from within Intellicus. There are a few guidelines that you need to follow when creating or adding scripts.

These guidelines exist so that Intellicus can parse and process your script and pass it on to the Python environment.

Intellicus suggests writing the Python script in a modular fashion at the Query Object level. Doing so gives you the options Training and Prediction and Prediction only on the Machine Learning Operations Toolbar at report execution time. Training can then be scheduled, which saves time when an end user only wants to view predictions.

For instance, if you schedule the training for the later hours of the day and a user wants to view predictions based on the trained model the next day, the user only needs to select the Prediction only option while running the report.

The following guidelines apply when writing or adding a script at the Query Object level.

  • The script needs to have sections for Training and Prediction. Each section should start with a # followed by a placeholder enclosed in <% %>, so that Intellicus can parse and understand the modularization. For example, #<% TRAINING.SECTION %>
  • The first statement of the Training and Prediction scripts (after any imports, as in the example script) should read the CSV, and the last line of the Prediction script should write it. The argument passed when reading should be <% StepName.data %>. For example, pd.read_csv('<% Train.data %>') / predictionData.to_csv('<% ThisStep.Data %>')
  • Data from a previous step should be referred to as 'StepName.data'. For example, if you created the step as Train in the transformation area, the input must be 'Train.data'.
  • The model created is saved as 'myModel' by default. This name is mandatory for the model you create, because it is the name used when communicating with the Python environment.
  • Training happens only if a training script is provided; otherwise it is assumed that an already trained model is used.
  • If a trained model is used, the user must still provide a prediction script.
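The guidelines above can be turned into a quick local pre-check before a script is pasted into Intellicus. The following is only a minimal sketch: the function names `split_sections` and `check_script` are hypothetical, the regular expressions are one reading of the rules listed above, and none of this reflects how Intellicus itself parses or verifies a script.

```python
import re

# Marker format from the guidelines: '#' followed by <% NAME.SECTION %>.
# Whitespace inside the delimiters is tolerated (an assumption), because the
# article shows both "#<% TRAINING.SECTION %>" and "#<%TRAINING.SECTION%>".
MARKER = re.compile(r"^#\s*<%\s*(TRAINING|PREDICTION)\.SECTION\s*%>\s*$")
READ = re.compile(r"read_csv\(\s*['\"]<%\s*\w+\.data\s*%>['\"]", re.IGNORECASE)
WRITE = re.compile(r"to_csv\(\s*['\"]<%\s*\w+\.data\s*%>['\"]", re.IGNORECASE)


def split_sections(script):
    """Map each section name to its non-blank lines (hypothetical helper)."""
    sections = {}
    current = None
    for raw in script.splitlines():
        line = raw.strip()
        match = MARKER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return sections


def check_script(script):
    """Return a list of guideline violations; an empty list means none found."""
    problems = []
    sections = split_sections(script)
    training = sections.get("TRAINING")
    prediction = sections.get("PREDICTION")
    # A prediction script is mandatory, even when a trained model is reused.
    if not prediction:
        problems.append("prediction section is missing or empty")
    elif not WRITE.search(prediction[-1]):
        problems.append("last prediction line does not write <%Step.data%>")
    # Every section that is present should read its input CSV.
    for name, lines in sections.items():
        if lines and not any(READ.search(line) for line in lines):
            problems.append(f"{name.lower()} section never reads <%Step.data%>")
    # The trained model must be bound to the mandatory name myModel.
    if training and not any(re.match(r"myModel\s*=", line) for line in training):
        problems.append("training section does not assign myModel")
    return problems


# A prediction-only script is acceptable: the training section is optional.
predict_only = """#<%PREDICTION.SECTION%>
import pandas as pd
predictionData = pd.read_csv('<%Predict.data%>')
predictionData.to_csv('<%ThisStep.Data%>')
"""
print(check_script(predict_only))  # prints []
```

A check like this only catches structural slips such as a missing marker or a misnamed model. It does not replace the Verify step inside Intellicus.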

Once you have added a script, you can use Verify to check that it is correctly written, and then click Ok. Use Save or Save As to save your query object so that it can be used in reporting.

An example script for your reference:

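#<%TRAINING.SECTION%>
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Read the data produced by the step named Training
dataset = pd.read_csv('<%Training.Data%>')

# The first three columns are features, the fourth is the target
X = dataset.iloc[:, :3].values
y = dataset.iloc[:, 3].values

# Convert the categorical third column to integer labels
labelEncoder = LabelEncoder()
X[:, 2] = labelEncoder.fit_transform(X[:, 2])

# One-hot encode column 2 and pass the other columns through unchanged.
# OneHotEncoder(categorical_features=[2]) was removed in scikit-learn 0.22;
# ColumnTransformer (available from 0.20) selects the column instead.
oneHotEncoder = ColumnTransformer(
    [('onehot', OneHotEncoder(), [2])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
X = oneHotEncoder.fit_transform(X)

# Save the fitted encoders so that the prediction section can reuse them
with open('<%ThisStep.Dir%>/oneHotEncoder.pkl', 'wb') as f:
    pickle.dump(oneHotEncoder, f)
with open('<%ThisStep.Dir%>/labelEncoder.pkl', 'wb') as f:
    pickle.dump(labelEncoder, f)

# The model must be named myModel
myModel = RandomForestRegressor(n_estimators=20, random_state=0)
myModel.fit(X, y)

#<%PREDICTION.SECTION%>
import pickle

import pandas as pd

# Read the data produced by the step named Prediction
predictionData = pd.read_csv('<%Prediction.Data%>')

# Load the encoders fitted during training
with open('<%ThisStep.Dir%>/oneHotEncoder.pkl', 'rb') as f:
    oneHotEncoderPred = pickle.load(f)
with open('<%ThisStep.Dir%>/labelEncoder.pkl', 'rb') as f:
    labelEncoderPred = pickle.load(f)

# Encode the same three feature columns used in training. predictionData
# stays a DataFrame so that the Predicted column can be added to it below.
features = predictionData.iloc[:, :3].values
features[:, 2] = labelEncoderPred.transform(features[:, 2])
features = oneHotEncoderPred.transform(features)

# myModel is the mandatory name under which the trained model is referenced
predictionData['Predicted'] = myModel.predict(features)

# The last line writes the result for this step
predictionData.to_csv('<%ThisStep.Data%>')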
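Because the <% %> placeholders are filled in by Intellicus, the example script will not run as-is in a plain Python interpreter. To try a script outside Intellicus, the placeholders can be swapped for local paths first. The following is a minimal sketch of such a helper: `render_placeholders` and the `local_paths` mapping are hypothetical names, the file names are made up for illustration, and the case-insensitive, whitespace-tolerant matching is an assumption based on the spellings used in this article.

```python
import re

# Matches <%Name.Suffix%>, allowing optional spaces inside the delimiters.
PLACEHOLDER = re.compile(r"<%\s*([\w.]+)\s*%>")


def render_placeholders(script, values):
    """Replace each <%Key%> with values[key.lower()]; keep unknown keys as-is."""
    def substitute(match):
        key = match.group(1).lower()
        # Unknown placeholders (such as the section markers) are left untouched.
        return values.get(key, match.group(0))
    # Using a function as the replacement avoids backslash processing,
    # so Windows-style paths are inserted literally.
    return PLACEHOLDER.sub(substitute, script)


# Hypothetical local paths standing in for what Intellicus would supply.
local_paths = {
    "training.data": "train.csv",
    "prediction.data": "predict.csv",
    "thisstep.dir": "work",
    "thisstep.data": "output.csv",
}

line = "dataset = pd.read_csv('<% Training.Data %>')"
print(render_placeholders(line, local_paths))
# prints: dataset = pd.read_csv('train.csv')
```

The section markers are deliberately left in place, since they are ordinary Python comments and do not affect a local run. When both sections are run in one interpreter session, myModel from the training section is still in scope for the prediction section.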