Encoding
After the dataset has been uploaded and processed, you can set the encoding options. Please note: these options cannot be changed once any model has been created.
Logistic Regression
- Split categorical columns: If checked, dummy variables are generated from categorical features for modelling. If unchecked, the strategy selected under the Encoding strategy option is used instead.
- Use continuous bins if they are configured: If binnings have been created for some continuous features, these binnings may be used in place of the raw features.
- Encoding strategy: This selects the way features will be internally encoded into numbers:
  - Weight of Evidence Based: This is a logarithmic measure of the separation between classes. For more information on weight of evidence, read here.
  - Frequency Based: Encodes categorical data according to how frequent the different values are.
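
The three categorical encodings above can be illustrated with a minimal sketch. It is not the product's implementation: the function names, the `smoothing` parameter and the sign convention for weight of evidence (log of the share of positives over the share of negatives) are assumptions made for illustration only.

```python
import math
from collections import Counter

def dummy_encode(values):
    """One 0/1 indicator per distinct category (dummy variables)."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

def frequency_encode(values):
    """Map each category to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def woe_encode(values, targets, smoothing=0.5):
    """Map each category to ln(share of positives / share of negatives).

    `smoothing` (a hypothetical choice) is added to every count so that
    a category with no positives or no negatives does not produce log(0).
    """
    pos = Counter(v for v, t in zip(values, targets) if t == 1)
    neg = Counter(v for v, t in zip(values, targets) if t == 0)
    categories = set(values)
    total_pos = sum(pos.values()) + smoothing * len(categories)
    total_neg = sum(neg.values()) + smoothing * len(categories)
    return {
        c: math.log(((pos[c] + smoothing) / total_pos)
                    / ((neg[c] + smoothing) / total_neg))
        for c in categories
    }

# Toy data: a categorical feature and a binary target.
colours = ["red", "red", "red", "blue", "blue", "green"]
labels = [1, 1, 0, 0, 0, 1]

dummies = dummy_encode(colours)
freq = frequency_encode(colours)
woe = woe_encode(colours, labels)
```

In this toy data, categories seen mostly with the positive class receive a positive weight of evidence and categories seen mostly with the negative class receive a negative one, whereas the frequency encoding ignores the target entirely.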
Neural Network
The squeeze technique helps the model handle long-tailed distributions by contracting extreme values. The remaining options are the same as those described under Logistic Regression.
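
The exact transformation behind the squeeze option is not specified here. As a rough illustration of what contracting extreme values can look like, the sketch below uses a signed logarithmic transform; the function name `squeeze` and the choice of transform are assumptions, not the product's method.

```python
import math

def squeeze(x):
    """Signed log transform (an assumed stand-in for the squeeze option).

    Roughly linear near zero, logarithmic for large |x|, and
    symmetric around zero so negative values are handled too.
    """
    return math.copysign(math.log1p(abs(x)), x)

# A long-tailed toy feature: most values are small, a few are extreme.
raw = [0.0, 1.0, 10.0, 1000.0, -1000.0]
squeezed = [squeeze(v) for v in raw]
```

The ordering of the values is preserved, but the gap between 10 and 1000 shrinks from 990 to under 5, so a handful of extreme records no longer dominates the input scale.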
Fuzzy Logic
- Default Membership Function Type: This option establishes how fuzzy sets are generated from the data for continuous features:
  - Equally Spaced: The feature range is divided into equally sized ranges to define the different regions for the fuzzy sets.
  - Distribution Based: The fuzzy set boundaries are derived using the distribution of the underlying feature.
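
The difference between the two membership function types can be sketched as follows. The sketch assumes triangular fuzzy sets and uses quantiles as the distribution-based rule; both are illustrative assumptions, as are all function names, and the product may use different shapes or boundary rules.

```python
def equally_spaced_centres(values, n_sets):
    """Place fuzzy set centres at equal steps across the feature range (n_sets >= 2)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_sets - 1)
    return [lo + i * step for i in range(n_sets)]

def quantile_centres(values, n_sets):
    """Place fuzzy set centres at evenly spaced quantiles of the data (n_sets >= 2)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / (n_sets - 1))] for i in range(n_sets)]

def triangular_membership(x, left, centre, right):
    """Degree (0 to 1) to which x belongs to a triangular fuzzy set."""
    if x == centre:
        return 1.0
    if left < x < centre:
        return (x - left) / (centre - left)
    if centre < x < right:
        return (right - x) / (right - centre)
    return 0.0

# A skewed toy feature: most ages are near 20, one is 60.
ages = [18, 19, 20, 21, 22, 23, 25, 30, 60]
equal = equally_spaced_centres(ages, 3)
quant = quantile_centres(ages, 3)

# Membership of age 30 in the middle set under each scheme.
mid_equal = triangular_membership(30, equal[0], equal[1], equal[2])
mid_quant = triangular_membership(30, quant[0], quant[1], quant[2])
```

With skewed data, the equally spaced scheme puts the middle centre at 39, where there are no observations, while the quantile scheme puts it at 22, where most of the data lies.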
You must click 'Save Encoding Options' for your changes to be applied.
If models have already been built, the options are greyed out.