Model Building
Create New Model
Models can be created from the "New Model" page under the "Intelligence Task" section.
Select the desired model types, the number of models, and their configurations.
You can choose which feature set to use for modelling from the drop-down menu under Feature Set. Otherwise, the model will be built using the default feature set for the project.
Existing models can be included as inputs: the score of the existing model is added as a feature alongside all the features of the new model to be created. Care must be taken to ensure that the feature sets are compatible.
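As a rough illustration of this idea outside the platform (the data and model choices below are hypothetical, not the platform's internals):

```python
# A minimal sketch of the "model as input" idea using scikit-learn.
# The data and models here are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Train the existing model on the shared feature set.
existing_model = LogisticRegression(max_iter=1000).fit(X, y)

# Its score becomes one extra column alongside the original features.
score = existing_model.predict_proba(X)[:, [1]]
X_augmented = np.hstack([X, score])

# The new model is trained on the augmented feature set.
new_model = GradientBoostingClassifier().fit(X_augmented, y)
```

In practice, out-of-fold scores would typically be used here to avoid leaking the training labels into the new model.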
Logistic Regression
Perform Embedded Feature Selection: Performs a step-wise feature selection to select a subset of the feature set for building the model. This is enabled by default because LR models generally perform best with a small number of inputs.
Threshold Probability: The probability threshold used to accept or reject a feature within the embedded feature selection.
Probability to Accept/Reject Features: Determines the rate at which features are accepted for modelling within the embedded feature selection. The acceptance rate must be lower than the rejection rate in order to reduce the feature space.
Click here for more information on step-wise regression
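The selection procedure itself is internal to the platform, but a classic forward step-wise scheme driven by an acceptance p-value looks roughly like the sketch below (the threshold value and the use of statsmodels are illustrative assumptions):

```python
# A rough sketch of forward step-wise feature selection driven by
# p-values. The acceptance threshold is illustrative, not the
# platform's default.
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y, p_accept=0.05):
    """Greedily add the feature with the lowest p-value while it
    stays below the acceptance threshold."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {}
        for j in remaining:
            cols = selected + [j]
            fit = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0)
            pvals[j] = np.asarray(fit.pvalues)[-1]  # candidate's p-value
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_accept:
            break  # no remaining feature passes the acceptance test
        selected.append(best)
        remaining.remove(best)
    return selected
```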
Neural Network
- Hyper Auto: The method used to tune the hyper-parameters. It is set to automatic + comprehensive auto-tuning by default, but other methods can be selected for more precise tuning.
- Sampling: Decides whether the neural network is built on a subset of the data rather than the full dataset.
- Boosting: Decides how the neural network model should be created. By default a single model is built, but you can choose to build a set of models and ensemble them using either a genetic algorithm or AdaCost.
- Threshold: Binary neural network models produce a score between 0 and 1, and most commonly use 0.5 as the threshold value when predicting a class for each instance. However, the settings here allow the threshold value to be adjusted automatically to achieve a user-specified pass rate, the default population pass rate, or a balance in accuracy between the two classes (a sketch of pass-rate thresholding follows at the end of this section).
- Hidden Node Properties/NN Properties: These adjust various hyper-parameters related to the learning and tuning process of the Neural Network.
- Sampling Properties: These relate to how the data is split into batches of instances for modelling. Please read here for further information.
- Boosting Properties: Disabled if the method used is not AdaCost
- Ensemble GA Properties: Disabled if the method used is not Genetic Algorithm Ensemble
For further information about the hyperparameters, please refer to the manual: Neural-Networks-manual.pdf
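As an illustration of the pass-rate option described under Threshold above, a cut-off can be derived from the score distribution as follows (a minimal sketch with synthetic scores, not the platform's implementation):

```python
# A minimal sketch of adjusting a binary classifier's threshold so
# that a target fraction of the population passes. The scores and
# target rate are synthetic stand-ins.
import numpy as np

def threshold_for_pass_rate(scores, pass_rate):
    """Return the score cut-off above which `pass_rate` of instances fall."""
    return np.quantile(scores, 1.0 - pass_rate)

scores = np.random.rand(10_000)          # stand-in for model scores in [0, 1]
threshold = threshold_for_pass_rate(scores, pass_rate=0.20)
passed = scores > threshold              # ~20% of instances pass
```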
GLM
- GLM Type: Selects the type of GLM to be generated, i.e. the distribution family along with its link function (see the sketch after this list).
- Solver: The solver optimises the model by coordinating parameter updates that attempt to improve the loss. IRLS (Iteratively Reweighted Least Squares) and Newton are the two solvers available to choose from.
- Max Iterations: The maximum number of training iterations to perform.
- Perform Embedded Feature Selection: Performs a step-wise feature selection to select a subset of the feature set for building the model. This is enabled by default because GLM models generally perform best with a small number of inputs.
- Probability of Accepting/Rejecting Feature: Determines the rate at which features are accepted for modelling within the embedded feature selection. The acceptance rate must be lower than the rejection rate in order to reduce the feature space.
- Remove Reference Category: For categorical variables, the reference category of each variable can be removed, which can help avoid numerical instability.
Please note that GLM models are only available for regression problems.
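For readers who want to experiment with the same concepts outside the platform, the family/link choice and IRLS fitting can be sketched with statsmodels (an illustrative example with synthetic data; the families and solvers available on the platform may differ):

```python
# An illustrative GLM with an explicit distribution family using
# statsmodels; synthetic data, not the platform's implementation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 3)))
y = rng.poisson(lam=np.exp(X @ np.array([0.2, 0.5, -0.3, 0.1])))

# Poisson family with its canonical log link; statsmodels fits GLMs
# with IRLS by default, and maxiter caps the training iterations.
result = sm.GLM(y, X, family=sm.families.Poisson()).fit(maxiter=100)
print(result.summary())
```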
Fuzzy Logic Binary Classifier
- No. Membership Function Per Dimension: Selects the number of membership functions to be generated from a continuous or mixed variable. The default is three: low, medium, and high. Increasing it to five will add very low and very high as possible antecedents.
- Weights for Minority Class: Assigns the importance the minority class should be given in predictability. Increasing it above 1 will favour the minority class more than normal, which may be useful if the dataset is extremely skewed and/or the sample size is small.
- Default Membership Function Shape: The trapezium membership function shape is selected by default; the alternative is triangular membership functions.
- Rule filter threshold per 10,000 instances: This parameter tunes how "demanding" the system is to consider a rule valid. If set to 0, the system will consider rules with lower dominance to be valid. Increasing this value will make the system demand a slightly higher dominance to accept rules as potential members of the system. In any case, it is highly recommended to keep this value very low (in the order of magnitude of the default value).
- Type of dominance: In general, for standard dominance, it is useful to know that this figure increases when:
- The frequency of the rule increases: that is, the more frequent the pattern is, the higher the dominance.
- The confidence of the rule increases: that is, the more frequently the pattern points to the right decision, the more we trust the rule, hence the higher the dominance.
When using soft-capped dominance, how much frequency and confidence contribute changes drastically: the importance of frequency is drastically reduced, hence the value of confidence becomes much more relevant. This impacts the final rule base in a couple of ways:
- Some rules that are very accurate but occur scarcely will be given much more relevance/importance, so it is likely that some novel rules will show up compared with the standard dominance approach.
- It will be easier for rules to reach high levels of dominance, hence rule bases will, in general, contain more high-dominance rules.
The value "Support soft cap threshold" represents how many times a pattern has to occur before its frequency stops being a weighting factor in the dominance. Intuitively, as the pattern represented by a rule occurs more often, the value of its dominance increases; but as soon as it reaches the number specified in this parameter (20 by default), that is, as soon as it occurs 20 times, the dominance will no longer increase due to frequency. It will only grow due to an increase in confidence.
This approach will in general avoid relying on rules that simply happen very frequently, giving much more importance to accuracy (i.e. how frequently the rule is correct).
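The exact dominance formula is internal to the platform, but the behaviour described above can be pictured with the following assumed sketch (an illustration only, not the platform's implementation):

```python
# An assumed illustration of the behaviour described above, not the
# platform's actual dominance formula.
def dominance(count, confidence, soft_cap=None):
    """Rule dominance grows with frequency (count) and confidence.
    With a soft cap, the frequency contribution stops growing once
    the rule has occurred `soft_cap` times (20 by default in the UI)."""
    frequency_term = count if soft_cap is None else min(count, soft_cap)
    return frequency_term * confidence

dominance(100, 0.9)               # standard: frequency keeps contributing
dominance(100, 0.9, soft_cap=20)  # soft-capped: same as a rule seen 20 times
```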
- Rule learning: the type of algorithm that will choose the best rule base:
- All rules: this option is deprecated and should not be used. It will be removed in future versions.
- Genetic Algorithm: uses a standard single-objective GA.
- Multi-Objective GA: uses a 2-objective GA, to maximise performance and minimise the size of the rule base at the same time.
- Genetic Algorithm Parameters: These decide various factors regarding the GA optimisation process.
The fuzzy logic model configuration is slightly different for continuous output predictions.
Fuzzy Logic Estimator (Regression projects)
- Fitness Function Type: Selects the function the model should use for optimisation. Squared Error is the RMSE (Root Mean Square Error) and Relative Error is the MAPE (Mean Absolute Percentage Error); both are written out after this list.
- Downscale highest MF: If the highest output membership function has long tails, it may be better to select this option to prevent outliers from skewing the fuzzy set centroid.
- Downscaling Type: Decides the method used to apply downscaling, if checked. Self-distribution scales the membership function according to the density of the points, which can reduce accuracy in sparse regions but increases the performance of the model overall. Linear decreasing scales the membership function down along a straight line from the lower to the upper threshold.
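For reference, the two fitness functions named above can be written out as follows (illustrative NumPy definitions):

```python
# The two fitness functions from the list above, written out in NumPy.
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error; assumes y_true has no zeros."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```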
Fuzzy Logic Multiclass Classifier
Genetic Algorithm Parameters: The GA parameters are slightly different for multiclass output predictions. An extra penalty term is used so that all classes are identified roughly equally, regardless of their proportion in the population; this accounts for imbalance in the recall between classes.
There are two extra parameters for penalty in fuzzy configurations of multiclass projects:
- Lambda for Penalty: This is a new parameter that controls the importance of the penalty term.
- Exponent for Penalty: This parameter controls the importance of individual differences between classes.
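The exact fitness expression is not given in this documentation; one plausible reading of how the two parameters combine is sketched below (an assumption for illustration only, not the platform's implementation):

```python
# An assumed illustration of how the two penalty parameters could
# combine; this is not the platform's documented fitness function.
import numpy as np

def penalised_fitness(base_fitness, class_recalls, lam, exponent):
    """Penalise imbalance in per-class recall: `lam` scales the whole
    penalty term, `exponent` controls how strongly individual
    differences between classes are weighted."""
    recalls = np.asarray(class_recalls)
    penalty = np.sum(np.abs(recalls - recalls.mean()) ** exponent)
    return base_fitness - lam * penalty
```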
XGBoost
XGBoost stands for eXtreme Gradient Boosting. It is an implementation of gradient-boosted decision trees designed for speed and performance. It is a powerful opaque-box model and is available on the Temenos XAI platform for regression, multiclass classification, and binary classification models.
To run an XGBoost model, click New Model, select the XGBoost icon, and then click Start a New Model.
You can change the default configuration by clicking the Configuration button, which exposes many parameters for building the XGBoost model. The majority of the model parameters are the same for binary, multiclass, and regression models.
- Max Depth: Maximum depth of a tree. The complexity of the model increases with depth, making it more prone to overfitting.
- Min Child Weight: Stops partitioning if the sum of instance weights in a leaf node is less than 'Min Child Weight'. The larger 'Min Child Weight' is, the more conservative the model will be.
- ETA: Shrinks the feature weights at each boosting step, acting like a learning rate to keep the boosting process conservative.
- Subsample: Denotes the fraction of observations to be randomly sampled for each tree. Lower values make the algorithm more conservative and prevent overfitting, but values that are too small might lead to under-fitting.
- Colsample by: Denotes the fraction of columns to be randomly sampled for each tree.
- No of rounds: Represents the number of boosting rounds.
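These options map directly onto the standard parameters of the xgboost Python package; a minimal training sketch follows (data and parameter values are illustrative, not the platform defaults):

```python
# A minimal sketch mapping the options above onto the standard
# xgboost Python API. Data and parameter values are illustrative only.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = (np.random.rand(1000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "max_depth": 6,             # Max Depth
    "min_child_weight": 1,      # Min Child Weight
    "eta": 0.3,                 # ETA (learning rate)
    "subsample": 0.8,           # Subsample
    "colsample_bytree": 0.8,    # Colsample by
}
booster = xgb.train(params, dtrain, num_boost_round=100)  # No of rounds
```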