Summary Results
Model results
A detailed summary of all the performance metrics for a model can be viewed by clicking on its name.
Models can also be sorted and filtered on multiple criteria, which is useful for ranking models by performance on a chosen metric. For example, models can be sorted by highest Average Recall in the testing dataset. Other ways to evaluate and compare the models are to sort by Gini, or to sort and then compare the difference between the whole-dataset values and the testing-set values. A small difference between the two indicates that the model generalises well to unseen data, which is indicative of better real-world performance in a live environment. A significantly lower test-set Gini, on the other hand, indicates that the model is overfitting to the training set.
Filters can be applied to model types, for example, to display only fuzzy logic models. Thresholds can also be applied, for example, to show only models with a recall of at least 70%. Models can also be filtered by name to display only a specific batch of models.
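The same sorting, filtering, and train/test comparison logic can be mimicked outside the interface. Below is a minimal pandas sketch; the summary table is hypothetical and its column names (`test_recall`, `train_gini`, `test_gini`) are illustrative, not the platform's actual export schema:

```python
import pandas as pd

# Hypothetical model summary table with illustrative values.
models = pd.DataFrame({
    "name":        ["model_a", "model_b", "model_c"],
    "type":        ["fuzzy", "tree", "fuzzy"],
    "test_recall": [0.74, 0.69, 0.81],
    "train_gini":  [0.62, 0.71, 0.58],
    "test_gini":   [0.59, 0.48, 0.55],
})

# Rank by test-set recall, then keep only models meeting a 70% floor.
ranked = models.sort_values("test_recall", ascending=False)
shortlisted = ranked[ranked["test_recall"] >= 0.70]

# A large train/test Gini gap flags likely overfitting.
shortlisted = shortlisted.assign(
    gini_gap=shortlisted["train_gini"] - shortlisted["test_gini"]
)
print(shortlisted)
```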
The summary for each binary model provides:
- Details on the dataset and its instances
- Overall precision and accuracy scores
- Confusion matrices for the training populations
- Accuracy of each population split by target value
- Gini and Kolmogorov Smirnov (KS) metrics
- Receiver Operating Characteristic (ROC) graphs
- Details of the model build and the job log
- The recall is the proportion of correctly classified instances for a given class.
- The precision is the proportion of correctly classified instances for each predicted class.
- The ROC curve is a graph of the true positive rate against the false positive rate, with the area under the curve (AUC) being the key metric. A diagonal line indicates random selection.
- The Gini is a metric derived from the AUC score, where Gini = 2 * AUC - 1.
- The KS statistic is the maximum distance between the empirical cumulative distribution functions of the scores of the two classes (see the sketch after this list).
- The display is different for a fuzzy logic model, where the Gini metric and the ROC curve are excluded.
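For reference, the relationships between these metrics can be reproduced with scikit-learn and SciPy. This is a minimal sketch on illustrative data; the arrays and the 0.5 decision threshold are assumptions, not platform output:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Illustrative labels and positive-class probabilities (assumed values).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.2, 0.4, 0.7, 0.9, 0.3, 0.6, 0.5, 0.8])
y_pred = (y_score >= 0.5).astype(int)

recall = recall_score(y_true, y_pred)        # correct / actual positives
precision = precision_score(y_true, y_pred)  # correct / predicted positives
auc = roc_auc_score(y_true, y_score)         # area under the ROC curve
gini = 2 * auc - 1                           # Gini derived from the AUC

# KS: maximum distance between the empirical CDFs of the scores
# of the two classes.
ks = ks_2samp(y_score[y_true == 1], y_score[y_true == 0]).statistic
print(recall, precision, auc, gini, ks)
```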
Things to look out for:
- If the recall is very different between the two classes, it is an indicator of an imbalanced model.
- In the ROC curve, if the green line (training data) is much closer to the top-left corner than the red line (testing data), the model is overfitting to the training data (a plotting sketch follows this list).
- If the metrics for the testing set or the whole set are much lower than the equivalent training metrics, the model is unlikely to generalise well to live data.
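The green-versus-red ROC comparison can be reproduced with matplotlib and scikit-learn. The dataset and classifier here are synthetic stand-ins, purely to make the sketch runnable; any scored binary classifier would do:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data and a simple model, for illustration only.
X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

fpr_tr, tpr_tr, _ = roc_curve(y_tr, clf.predict_proba(X_tr)[:, 1])
fpr_te, tpr_te, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])

plt.plot(fpr_tr, tpr_tr, color="green", label="training")
plt.plot(fpr_te, tpr_te, color="red", label="testing")
plt.plot([0, 1], [0, 1], "k--", label="random")  # diagonal = random selection
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```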
The metrics in the detailed view of a regression model are different (a computation sketch follows this list):
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- RMSE/Range and MAE/Range
- Regression Error Characteristic (REC) curve instead of ROC.
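A minimal sketch of how these error metrics relate, using scikit-learn on illustrative values (the arrays are assumptions):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error)

# Illustrative actuals and predictions.
y_true = np.array([10.0, 12.5, 9.0, 15.0, 11.0])
y_pred = np.array([11.0, 12.0, 8.5, 14.0, 12.5])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

# Dividing by the target's range puts the errors on a comparable,
# unitless scale across different targets.
target_range = y_true.max() - y_true.min()
rmse_over_range = rmse / target_range
mae_over_range = mae / target_range
```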
The Actual vs Predicted tab shows the level of disparity between the model's predictions and the actual known outcomes.
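An equivalent plot can be sketched with matplotlib on synthetic data; points close to the diagonal are well predicted:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic actual and predicted values, for illustration only.
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 100, size=200)
y_pred = y_true + rng.normal(0, 8, size=200)  # predictions with some error

plt.scatter(y_true, y_pred, alpha=0.5)
lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
plt.plot(lims, lims, "k--")  # perfect-prediction line
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()
```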
In a continuous model, the output membership functions are displayed underneath the model results.
Moving the cursor over a linguistic label displays its exact boundaries in a tooltip.
Switching to the grid view displays each membership function in detail.
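For intuition, the sketch below draws three illustrative triangular membership functions. The linguistic labels and boundary values are assumptions chosen for demonstration, not values taken from any real model:

```python
import numpy as np
import matplotlib.pyplot as plt

def triangular(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to zero at c."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

x = np.linspace(0, 100, 500)
# Hypothetical labels and boundaries.
labels = {"Low": (0, 20, 45), "Medium": (30, 50, 70), "High": (55, 80, 100)}

for name, (a, b, c) in labels.items():
    plt.plot(x, triangular(x, a, b, c), label=name)

plt.xlabel("Model output")
plt.ylabel("Degree of membership")
plt.legend()
plt.show()
```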
The summary for a multiclass model is similar to that of the binary models, as shown below.
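As an illustration of the per-class view, the sketch below computes a multiclass confusion matrix and per-class recall with scikit-learn on made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up three-class labels and predictions, for illustration only.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted
per_class_recall = recall_score(y_true, y_pred, average=None)
macro_recall = recall_score(y_true, y_pred, average="macro")
print(cm, per_class_recall, macro_recall)
```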
Features that were used in a model can be viewed by clicking on the Feature tab.
Clicking on a feature name brings up a chart of the data within it. The chart can be viewed as grouped, stacked, or scaled; a plotting sketch follows the figures below.
Figure 1: Distribution of a feature with respect to the target feature in a binary classification model.
Figure 2: Distribution of a feature with respect to the target feature in a regression model.
Figure 3: Distribution of a feature with respect to the target feature in a multiclass classification model.
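Charts like those in Figures 1 to 3 can be approximated with pandas. In this sketch the column names and data are assumptions, used only to show the grouped, stacked, and scaled views:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical feature/target data with assumed column names.
df = pd.DataFrame({
    "feature": ["A", "A", "B", "B", "B", "C", "A", "C", "B", "C"],
    "target":  [0, 1, 0, 0, 1, 1, 0, 0, 1, 1],
})

counts = pd.crosstab(df["feature"], df["target"])

counts.plot(kind="bar")                # grouped: bars side by side
counts.plot(kind="bar", stacked=True)  # stacked: bars on top of each other
# Scaled: each bar normalised to 100% to compare class proportions.
counts.div(counts.sum(axis=1), axis=0).plot(kind="bar", stacked=True)
plt.show()
```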