Audit

After you upload your dataset, you must review any warnings that are displayed before you can start modelling.

The first blocking warnings are intended to keep PII (Personally Identifiable Information) off the platform. A categorical feature is flagged as possible PII when its number of distinct values exceeds 200 or 70% of the number of records in the dataset, whichever is lower.
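The threshold rule can be illustrated with a short Python sketch. This is only an illustration of the rule as described here, not the platform's actual implementation; the function names `pii_threshold` and `looks_like_pii` are hypothetical.

```python
def pii_threshold(n_records, max_distinct=200, max_fraction=0.7):
    """Return the lower of the two limits: 200 distinct values
    or 70% of the number of records."""
    return min(max_distinct, max_fraction * n_records)


def looks_like_pii(values):
    """Flag a categorical feature whose distinct-value count
    exceeds the threshold (hypothetical helper)."""
    values = list(values)
    n_distinct = len(set(values))
    return n_distinct > pii_threshold(len(values))


# 100 records: the limit is 70% of 100 = 70, which is lower than 200.
small = [f"user_{i}" for i in range(80)] + ["user_0"] * 20
print(looks_like_pii(small))

# 1000 records: the limit is 200, which is lower than 70% of 1000 = 700.
large = [f"code_{i % 200}" for i in range(1000)]
print(looks_like_pii(large))
```

In the first case the feature has 80 distinct values against a limit of 70, so it is flagged. In the second it has exactly 200 distinct values, which does not exceed the limit of 200.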

After resolving those, you can choose on the Audit page whether to exclude or include the features in question.

There are several reasons why the exclusion of a feature will be suggested or enforced:

  • Too many distinct categorical values: If a feature has over 200 distinct values, the user is asked to exclude or retain it. If it has over 10,000 distinct values, it cannot be retained.
  • Features with only one unique value provide no discriminatory or predictive power and are excluded.
  • If a group of features is highly correlated, i.e. the correlation between them exceeds the threshold defined in the encoding section, the user needs to select which one to retain.
  • There are also warnings for potentially misleading values, such as numbers in scientific notation (e.g. 6.2e+8). These features are automatically retained.
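As a rough illustration, the single-feature rules above could be expressed as follows. This is a sketch under assumptions rather than the platform's code: the function name `audit_feature`, the returned labels and the regular expression are all hypothetical, and the correlation check is left out because its threshold is defined in the encoding section.

```python
import re

# Hypothetical pattern for values written in scientific notation, e.g. "6.2e+8".
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def audit_feature(values):
    """Return an illustrative audit outcome for one categorical feature."""
    distinct = set(values)
    if len(distinct) <= 1:
        return "excluded"              # no predictive power
    if len(distinct) > 10_000:
        return "must exclude"          # may not be retained
    if len(distinct) > 200:
        return "user decides"          # exclude or retain
    if any(isinstance(v, str) and SCI_NOTATION.match(v) for v in distinct):
        return "retained with warning" # misleading values, kept automatically
    return "retained"


print(audit_feature(["yes", "yes", "yes"]))
print(audit_feature(["6.2e+8", "1.5e+3", "7"]))
```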

The "Auto-Fix All" button resolves all audit warnings by applying the default choices: it excludes the features that raised errors and retains the first feature from each group of correlated features.
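The effect of this default resolution can be sketched in Python. The function `auto_fix` and its inputs are hypothetical and only mirror the behaviour described above. The sketch assumes each correlated group is an ordered list, and that the features that raised errors are already known.

```python
def auto_fix(error_features, correlated_groups):
    """Return the set of features to exclude under the default choices."""
    excluded = set(error_features)
    for group in correlated_groups:
        # Keep the first feature of each correlated group, drop the rest.
        excluded.update(group[1:])
    return excluded


excluded = auto_fix(
    error_features=["customer_id"],
    correlated_groups=[["height_cm", "height_in"], ["price", "price_eur", "price_usd"]],
)
print(sorted(excluded))
```

Here `height_cm` and `price` are retained as the first member of their groups, while the remaining group members and `customer_id` are excluded.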
