As with the Data Source, Join, and Union steps, you add the Data Science Engine step by dragging it from the left pane into the transformation area and creating the necessary links.
The Data Science Engine step accepts a maximum of two inputs. It sends your data to a Data Science Engine for machine learning and modelling. You can add the Data Science Engine step before or after any other transformation step.
You can add the Data Science Engine step before functions such as join, union, and formula fields, so that you can apply those functions to the processed data and prepare it further.
Figure 3: Adding Data Science Engine step before transforming with other steps
Alternatively, you can add the step in the middle of your data preparation or after it, if you want to make predictions on the transformed data.
Figure 4: Adding Data Science Engine step after transforming the data with other steps
You can use the Data Science Engine step to train a model on your historical data and then generate predictions for future tuples. Which inputs you provide, training data, prediction data, or both, depends on the following conditions:
- If you have separate data for training and prediction, add both a training input and a prediction input.
- If you want to train and predict on the same data, add only one data source.
- If you already have a trained model in your script, you do not need to add training data.
Figure 5: Adding Data Science Engine step having two inputs, one for training and another for prediction
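The three conditions above can be summarised as a small decision function. This is only an illustrative sketch in Python, not part of the product: the function name `required_inputs` and its parameters are hypothetical and simply restate the rules listed above.

```python
from typing import List

MAX_INPUTS = 2  # the Data Science Engine step accepts at most two inputs


def required_inputs(has_trained_model: bool, same_data: bool) -> List[str]:
    """Return the inputs to link to the step (hypothetical helper).

    has_trained_model: the script already contains a trained model.
    same_data: training and prediction run on the same data source.
    """
    if has_trained_model:
        # No training data is needed; only the data to predict on.
        inputs = ["prediction"]
    elif same_data:
        # A single data source serves both purposes.
        inputs = ["training and prediction"]
    else:
        # Separate sources: one for training, one for prediction.
        inputs = ["training", "prediction"]
    assert len(inputs) <= MAX_INPUTS
    return inputs


print(required_inputs(has_trained_model=False, same_data=False))
```

In every case the result stays within the two-input limit of the step.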
Training and prediction data consist of independent and dependent variables. Independent variables are the fields in the data source that drive the prediction, that is, they determine the values of the dependent variables. For instance, if you want to predict your sales for next year, sales is the dependent variable, and it depends on independent variable fields such as marketing expenditure, support expenditure, and talent acquisition expenditure. You therefore need to make sure that your data provides adequate information.
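To make the distinction concrete, the sketch below separates training rows into independent values and the dependent value, and rejects rows that lack any of the required fields. It is an assumption-laden illustration in Python: the field names, the figures, and the helper `split_variables` are all invented for this example and do not come from the product.

```python
# Hypothetical field names, chosen to mirror the sales example above.
DEPENDENT = "sales"
INDEPENDENT = [
    "marketing_expenditure",
    "support_expenditure",
    "talent_acquisition_expenditure",
]


def split_variables(rows, independent=INDEPENDENT, dependent=DEPENDENT):
    """Split training rows into independent values (X) and dependent values (y)."""
    X, y = [], []
    for row in rows:
        # Every training row needs all independent fields and the dependent field.
        missing = [f for f in independent + [dependent] if row.get(f) is None]
        if missing:
            raise ValueError(f"row is missing fields: {missing}")
        X.append([row[f] for f in independent])
        y.append(row[dependent])
    return X, y


# Made-up historical data, for illustration only.
training_rows = [
    {"marketing_expenditure": 120, "support_expenditure": 40,
     "talent_acquisition_expenditure": 30, "sales": 900},
    {"marketing_expenditure": 150, "support_expenditure": 45,
     "talent_acquisition_expenditure": 35, "sales": 1100},
]

X, y = split_variables(training_rows)
print(X)
print(y)
```

Prediction rows would carry only the independent fields, since the dependent value is what the model is asked to produce.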