Intellicus provides a Data Science Engine step for use during data transformation, and Predictive Analytics at the Report level. This section mainly discusses how to add the Data Science Engine step at the Query Object level and what its benefits are.
You must have connection(s) to the database(s) from which you extract data for transformation and for the Data Science Engine step.
To start transforming data, log in to Intellicus and go to Navigate > Design > Query Object.
A Query Object is where you extract your data from different databases and transform it for loading and/or for use in reporting. You can learn more about working with a Query Object here: “WorkingwithQueryObjects”
Adding the Data Science Engine step at the Query Object level helps when predictions on your data add new variables and columns to tables. For example, in a market basket analysis, the clusters that form may require new columns and variables in the table. This can be achieved during data preparation, and hence such algorithms need to be defined at the Query Object level.
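As a minimal sketch of how a prediction can add a new column to a table, the R snippet below clusters a small table and stores the cluster assignment as an extra column. The data frame, its column names, and the choice of k-means are assumptions made for illustration only; they are not taken from Intellicus.

```r
# Hypothetical customer data; the values are made up for this sketch.
customers <- data.frame(
  customer_id = 1:6,
  spend  = c(10, 12, 11, 95, 100, 98),
  visits = c(1, 2, 1, 9, 10, 9)
)

set.seed(42)  # make the random cluster initialization repeatable
fit <- kmeans(customers[, c("spend", "visits")], centers = 2, nstart = 5)

# The cluster assignment becomes a new column in the table.
customers$cluster <- fit$cluster
print(customers)
```

Because the result carries an extra column, any later step that consumes this table has to be defined with that column in mind, which is why such scripts belong in the data preparation stage.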
You can also perform Data Cleansing and other Data Science engine related transformation tasks by creating a script at the Query Object level.
Data Science engines train on your data to produce predictions. You can input training as well as prediction data based on the following conditions.
- If you have separate data for training and prediction, you need to add data for training as well as for prediction.
- If you want training and prediction on the same data, you need to add only one data source.
- If you already have a trained model in your script, you need not add training data.
Figure 3: Adding Data
Training and prediction data consist of independent and dependent variables. Independent variables are the fields in the data source that help in making predictions, i.e., in determining the values of the dependent variables. For instance, if you want to predict your sales for next year, sales is the dependent variable, and its prediction depends on independent variable fields such as marketing expenditure, support expenditure, and talent acquisition. Hence, make sure your data provides adequate information.
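The relationship between independent and dependent variables can be sketched in R as follows. The column names, the numbers, and the use of a linear model are assumptions chosen for illustration; any model supported by your Data Science engine could take their place.

```r
# Hypothetical training data; the numbers are made up for this sketch.
history <- data.frame(
  marketing = c(10, 20, 30, 40, 50, 60),   # independent variable
  support   = c(5, 3, 8, 6, 9, 4),         # independent variable
  sales     = c(45, 59, 94, 108, 137, 142) # dependent variable
)

# Train: sales is modelled from the independent variables.
model <- lm(sales ~ marketing + support, data = history)

# Predict: the new data only needs the independent variables.
next_year <- data.frame(marketing = 70, support = 7)
predicted_sales <- predict(model, newdata = next_year)
print(predicted_sales)
```

If a field that drives sales is missing from the training data, the model has no way to account for it, which is why the data needs to carry adequate information.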
Adding Data Science Engine step
As with the Data Source, Join, Union, and other steps, drag and drop the Data Science Engine step from the left pane into the transformation area and create the necessary links.
The Data Science Engine step takes two inputs. This step transmits your data to the Data Science engines to perform machine learning and modelling. You can add the Data Science Engine step before or after any other transformation step.
You can add the Data Science Engine step before performing functions such as join, union, and formula fields, so that you can apply those functions to the predicted data to prepare it further.
Figure 4: Adding Data Science Engine step before transforming with other steps
Alternatively, you can add the step in between or after the other transformation steps if you want to make predictions on your transformed data.
Figure 5: Adding Data Science Engine step after transforming the data with other steps
Adding Data Science script
Select the Data Science Engine step in the Query Object transformation area to see the following fields:
Figure 6: Data Science Engine properties
The built-in script editor gives you the flexibility to write your own Data Science scripts inside Intellicus. This saves time and lets you verify your script by running it on a small set of data, so that you can detect and correct errors.
Data Science Engine Properties
|Property|Value|Description|
|---|---|---|
|Data Source Engine|R Job|Here you need to select the Data Science Engine you want to use|
|Script|Sample Script|Here you can see the Data Science Engine script you have created|
|Edit|Type Yourself|Click the Edit button to create Data Science Engine script or edit an already created script.|
When you click the Edit button, the script editor box will open.
Here you can view the fields in your script and write R script for relevant fields. You can also verify your script to check if it is error-free.
Guidelines for writing script at Query Object level
You can write your scripts for Data Science engines inside the Intellicus environment. There are a few guidelines that you need to follow while creating scripts.
The guidelines are laid down so that Intellicus can understand and process your script and transmit it to the Data Science engine for predictions.
Intellicus suggests creating the R script in a modular fashion at the Query Object level. This gives you options such as Training only, Training and Prediction, and Prediction only at report execution time, in the Machine Learning Operations Toolbar. Training can then be scheduled, which saves time when an end user only wants to view the predictions.
For instance, if you schedule the training for late hours and a user wants to view predictions based on the trained model the next day, the user just needs to select the Prediction only option while creating reports.
The guidelines for writing scripts at the Query Object level differ from those for using Predictive Analytics at the Report level. The guidelines for the Query Object level are:
- The script needs to have sections for Training and Prediction. Each section should start with a placeholder line that begins with #. The placeholder must be enclosed in <% %> so that Intellicus can parse and understand the modularization, e.g., #<% TRAINING.SECTION %>
- The first line of the Training script and of the Prediction script should read the CSV, and the last line of the Prediction script should write it. The argument passed in the reading line should be <% Stepname.data %>, e.g., read.csv('<% Train.data %>')
- Data from the previous step should be referred to as 'StepName.data'. For example, if you named the step Train in the transformation area, the input must be 'Train.data'.
- The model created is saved by default as 'myModel'. This name is mandatory for the model you create, as it is referred to while communicating with the Data Science engines.
- Training happens only if a training script is provided; otherwise, it is assumed that an already trained model is used.
- If a trained model is used, it is mandatory for the user to provide a prediction script.
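The layout these guidelines describe can be sketched as below. So that the sketch runs outside Intellicus, it uses two temporary CSV files as stand-ins for the placeholders; the variables train_path and predict_path, the sample data, and the linear model are assumptions for illustration. Inside Intellicus, the read and write lines would instead use placeholders such as '<% Train.data %>', and the prediction section would begin with its own section placeholder.

```r
# Stand-in CSV files so the sketch runs outside Intellicus (assumption).
train_path   <- tempfile(fileext = ".csv")
predict_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = 3 * (1:10) + 1), train_path, row.names = FALSE)
write.csv(data.frame(x = c(11, 12)), predict_path, row.names = FALSE)

#<% TRAINING.SECTION %>
trainingDataset <- read.csv(train_path)        # first line: read the CSV
myModel <- lm(y ~ x, data = trainingDataset)   # the model must be named myModel

# Prediction section (its section placeholder line would go here).
predictionDataset <- read.csv(predict_path)    # first line: read the CSV
predictionDataset$y_pred <- predict(myModel, newdata = predictionDataset)
write.csv(predictionDataset, file = predict_path)  # last line: write the CSV
```

Keeping the two sections separate is what allows training to be skipped when only predictions are needed.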
Once you have added a script, click Verify to check that it is written correctly, and then click OK. Use Save or Save As to save your Query Object so that you can use it in reporting.
An example script for your reference –
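library(randomForest)  # provides randomForest(); assumes the package is installed on the R engine
# Training: read the data of the step named Train and fit the model, which must be named myModel
trainingDataset = read.csv('<%Train.Data%>')
myModel = randomForest(x = trainingDataset[1:15], y = trainingDataset$TEMP, ntree = 500)
# Prediction: read the data of the step named Predict, add the predicted column, and write the result
predictionDataset = read.csv('<%Predict.Data%>')
y_pred = predict(myModel, data.frame(predictionDataset[1:15]))
predictionDataset$ExpectedTemp <- y_pred
write.csv(predictionDataset, file = '<%Predict.Data%>')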