Encoding

After the dataset has been uploaded and processed, you can configure the encoding settings. Please note: these settings cannot be changed once any model has been created.

Logistic Regression

Split categorical columns: If checked, dummy variables are generated from categorical features for modelling. If unchecked, categorical features are encoded using the strategy selected under the Encoding strategy option.
Use continuous bins if they are configured: If binnings have been created for some continuous features, these binnings may be used in place of the raw feature values.
Encoding strategy: This selects the way features will be internally encoded into numbers:
- Weight of Evidence Based: This is a logarithmic measure of the separation between classes. For more information on weight of evidence read here.
- Frequency Based: Encodes categorical data according to how frequent the different values are.
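
The encoding options above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, the smoothing constant and the sign convention for weight of evidence (log of the share of positives over the share of negatives) are assumptions made for illustration only.

```python
import math
from collections import Counter


def one_hot(values):
    """Dummy variables: one 0/1 column per category (columns in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def frequency_encode(values):
    """Map each category to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def weight_of_evidence(values, targets, smoothing=0.5):
    """Map each category to ln(share of positives / share of negatives).

    `smoothing` is added to every count to avoid log(0); this is an
    assumption of the sketch, not a documented product setting.
    """
    pos = Counter(v for v, t in zip(values, targets) if t == 1)
    neg = Counter(v for v, t in zip(values, targets) if t == 0)
    categories = set(values)
    total_pos = sum(pos.values()) + smoothing * len(categories)
    total_neg = sum(neg.values()) + smoothing * len(categories)
    return {
        c: math.log(((pos[c] + smoothing) / total_pos)
                    / ((neg[c] + smoothing) / total_neg))
        for c in categories
    }


# Hypothetical toy data: a categorical feature and a binary target.
colours = ["red", "red", "blue", "blue", "blue", "green"]
defaulted = [1, 1, 0, 0, 1, 0]

dummies = one_hot(colours)
freq = frequency_encode(colours)
woe = weight_of_evidence(colours, defaulted)
```

In this toy data "red" only appears with the positive class, so its weight of evidence is positive, while "green" only appears with the negative class and gets a negative value. The frequency encoding ignores the target entirely.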

Neural Network

The squeeze technique helps the model handle long-tailed distributions by contracting extreme values. The remaining options are the same as those described under Logistic Regression.
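
The exact transform behind the squeeze technique is not described here. As a rough illustration of what contracting extreme values can look like, the sketch below uses a signed logarithm; this choice of transform and the function name are assumptions, not the product's method.

```python
import math


def squeeze(x):
    """Signed log transform: sign(x) * ln(1 + |x|).

    Assumed stand-in for the squeeze technique. It keeps the ordering of
    values and leaves small values almost unchanged in scale, while
    pulling very large magnitudes in towards the rest of the data.
    """
    return math.copysign(math.log1p(abs(x)), x)


# Hypothetical long-tailed feature values.
incomes = [-50.0, 0.0, 10.0, 1_000.0, 1_000_000.0]
squeezed = [squeeze(v) for v in incomes]
```

After the transform the largest value is a few times bigger than the typical ones rather than thousands of times bigger, which is the kind of effect the option is aiming for.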

Fuzzy Logic

Default Membership Function Type: This option establishes how fuzzy sets are generated from the data for continuous features:
- Equally Spaced: The feature range is divided into equally sized ranges to define the different regions for the fuzzy sets.
- Distribution based: The fuzzy set boundaries are derived using the distribution of the underlying feature.
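
The difference between the two membership function types can be sketched as follows. This is an illustration under stated assumptions: triangular fuzzy sets, three sets per feature, and quantiles as the distribution-based rule are all choices made for the example, and the function names are hypothetical.

```python
import statistics


def equally_spaced_centres(values, n_sets):
    """Place fuzzy set peaks at equal intervals across the feature range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_sets - 1)
    return [lo + i * step for i in range(n_sets)]


def distribution_based_centres(values, n_sets):
    """Place fuzzy set peaks at the minimum, interior quantiles and maximum."""
    inner = statistics.quantiles(values, n=n_sets - 1, method="inclusive")
    return [min(values)] + inner + [max(values)]


def triangular_memberships(x, centres):
    """Degree of membership of x in each triangular set peaked at a centre."""
    degrees = []
    for i, c in enumerate(centres):
        left = centres[i - 1] if i > 0 else None
        right = centres[i + 1] if i < len(centres) - 1 else None
        if x == c:
            d = 1.0
        elif left is not None and left < x < c:
            d = (x - left) / (c - left)      # rising edge
        elif right is not None and c < x < right:
            d = (right - x) / (right - c)    # falling edge
        else:
            d = 0.0
        degrees.append(d)
    return degrees


# Hypothetical skewed feature: most values are small, a few are large.
values = [1, 2, 2, 3, 3, 3, 4, 10, 50, 100]

equal = equally_spaced_centres(values, 3)
by_dist = distribution_based_centres(values, 3)

memberships_equal = triangular_memberships(2, equal)
memberships_dist = triangular_memberships(2, by_dist)
```

With equally spaced sets the value 2 belongs almost entirely to the lowest set, because the middle peak sits at 50.5. With distribution-based sets the middle peak moves to the median (3.0), so the same value is split evenly between the low and middle sets.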

You must click 'Save Encoding Options' for your changes to be applied.

If models have been built, the options will be greyed out.