Encoding

After the dataset has been uploaded and processed, you can configure its encoding settings. Note that these settings cannot be changed once any model has been built.

Logistic Regression

  • Split categorical columns: If checked, categorical features are expanded into dummy (indicator) variables for modelling. If unchecked, they are encoded using the strategy selected under Encoding Strategy.
  • Use continuous bins if they are configured: If checked, any binnings created for continuous features are used in place of the raw feature values.
  • Encoding strategy: Determines how features are encoded internally as numbers:
    • Weight of Evidence Based: Each category is replaced by its weight of evidence, a logarithmic measure of how well that category separates the classes.
    • Frequency Based: Each category is encoded according to how frequently it occurs in the data.
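The options above can be illustrated with a small, dependency-free Python sketch. It shows dummy (one-hot) encoding, frequency encoding as the share of rows per category, weight of evidence as ln(share of class-1 rows / share of class-0 rows), and bin lookup for continuous values. All function names, the smoothing constant and the bin edges are hypothetical; the product's exact formulas (including its WoE sign convention, which varies between sources) are not documented here, so treat this as an illustration under those assumptions.

```python
import bisect
import math
from collections import Counter


def apply_bins(values, edges):
    """Map each raw value to the index of its bin; `edges` are sorted interior cut points."""
    return [bisect.bisect_right(edges, x) for x in values]


def dummy_encode(values):
    """One 0/1 indicator column per distinct category (sorted for a stable column order)."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def frequency_encode(values):
    """Replace each category with the fraction of rows in which it appears."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def woe_table(values, targets, smoothing=0.5):
    """Weight of evidence per category: ln(share of class-1 rows / share of class-0 rows).

    `smoothing` is an assumed additive constant that avoids log(0) for
    categories observed in only one class.
    """
    pos_total = sum(targets)
    neg_total = len(targets) - pos_total
    categories = sorted(set(values))
    k = len(categories)
    table = {}
    for c in categories:
        pos = sum(1 for v, t in zip(values, targets) if v == c and t == 1)
        neg = sum(1 for v, t in zip(values, targets) if v == c and t == 0)
        pos_share = (pos + smoothing) / (pos_total + smoothing * k)
        neg_share = (neg + smoothing) / (neg_total + smoothing * k)
        table[c] = math.log(pos_share / neg_share)
    return table


def woe_encode(values, targets, smoothing=0.5):
    """Replace each category with its weight of evidence."""
    table = woe_table(values, targets, smoothing)
    return [table[v] for v in values]


colour = ["red", "red", "blue", "green", "blue", "red"]
target = [1, 1, 0, 0, 1, 0]
print(dummy_encode(colour))
print(frequency_encode(colour))
print(woe_encode(colour, target))
print(apply_bins([5, 12, 31], edges=[10, 20, 30]))
```

With this convention a positive WoE means the category is relatively more common among class-1 rows. A category with no class-1 rows ("green" in the example) gets a finite negative value rather than an undefined one because of the smoothing term.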

Neural Network

The squeeze option helps the model handle long-tailed distributions by compressing extreme values. The remaining options are the same as those described under Logistic Regression.
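The exact squeeze function is not specified here. The sketch below uses a signed logarithm, one common way to contract extreme values: it leaves small values nearly unchanged while large magnitudes grow only logarithmically. Treat both the choice of function and the name `squeeze` as assumptions rather than the product's implementation.

```python
import math


def squeeze(x: float) -> float:
    """Signed log transform: sign(x) * ln(1 + |x|).

    Assumed stand-in for the product's squeeze; it preserves order and sign
    while pulling long-tail values towards the bulk of the distribution.
    """
    return math.copysign(math.log1p(abs(x)), x)


values = [-50.0, 0.0, 1.0, 10.0, 1_000.0, 1_000_000.0]
squeezed = [squeeze(v) for v in values]
print(squeezed)
```

Because the transform is monotonic, the ordering of values is preserved; only the distances between extreme values shrink, which keeps a few very large inputs from dominating the network's weighted sums.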

Fuzzy Logic

  • Default Membership Function Type: This option establishes how fuzzy sets are generated from the data for continuous features:
    • Equally Spaced: The feature range is divided into equally sized ranges to define the different regions for the fuzzy sets.
    • Distribution Based: The fuzzy set boundaries are derived from the distribution of the underlying feature.
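The two membership function types differ in where the fuzzy sets are placed. The sketch below assumes triangular membership functions and places the set centres either at equal intervals across the feature range (Equally Spaced) or at evenly spaced empirical quantiles (Distribution Based). The triangular shape, the quantile rule and all function names are hypothetical; the product's actual boundary rules are not documented here. It requires `n_sets >= 2`.

```python
def equally_spaced_centres(values, n_sets):
    """Centres at equal intervals between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_sets - 1)
    return [lo + i * step for i in range(n_sets)]


def distribution_based_centres(values, n_sets):
    """Centres at evenly spaced empirical quantiles (nearest rank, no interpolation)."""
    s = sorted(values)
    n = len(s)
    return [s[round(i * (n - 1) / (n_sets - 1))] for i in range(n_sets)]


def triangular_memberships(x, centres):
    """Degree of membership of x in each triangular set.

    Each set peaks at its centre and falls to 0 at the neighbouring centres;
    the outermost sets act as shoulders. With distinct centres the degrees sum to 1.
    """
    degrees = []
    last = len(centres) - 1
    for i, c in enumerate(centres):
        left = centres[i - 1] if i > 0 else None
        right = centres[i + 1] if i < last else None
        if x == c:
            degrees.append(1.0)
        elif left is not None and left < x < c:
            degrees.append((x - left) / (c - left))
        elif right is not None and c < x < right:
            degrees.append((right - x) / (right - c))
        elif (left is None and x < c) or (right is None and x > c):
            degrees.append(1.0)  # shoulder beyond the first or last centre
        else:
            degrees.append(0.0)
    return degrees


skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 40]
print(equally_spaced_centres(skewed, 3))
print(distribution_based_centres(skewed, 3))
print(triangular_memberships(10, distribution_based_centres(skewed, 3)))
```

On a long-tailed feature like `skewed`, equal spacing puts the middle set far from most of the data, whereas the distribution-based centres follow where the values actually cluster.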

Click 'Save Encoding Options' to apply your changes.

If models have already been built, the options are greyed out and cannot be edited.
