
Missing Values

Missing values are common, and you need a strategy for treating them. A missing value can signify several different things in your data: perhaps the data was not available or not applicable, or the event did not happen. It could also be that the person who entered the data did not know the right value or forgot to fill it in. Data mining methods vary in how they treat missing values. Typically, they ignore the missing values, exclude any records containing them, replace them with the mean, or infer them from existing values.

Missing Values Replacement Policies:
  • Ignore the records with missing values.
  • Replace them with a global constant (e.g., “?”).
  • Fill in missing values manually based on your domain knowledge.
  • Replace them with the variable mean (if numerical) or the most frequent value (if categorical).
  • Use modeling techniques such as nearest neighbors, Bayes’ rule, decision trees, or the EM algorithm to predict them.
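
As a rough illustration of three of the simpler policies above (dropping records, a global constant, and mean or most-frequent-value replacement), here is a minimal sketch in plain Python. The toy records, the field names `age` and `city`, and the use of `None` as the missing-value marker are all assumptions made for this example, not part of any particular tool.

```python
from statistics import mean, mode

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "city": "Oslo"},
    {"age": None, "city": "Rome"},
    {"age": 35, "city": None},
    {"age": 40, "city": "Oslo"},
]

# Policy: ignore (drop) records that contain any missing value.
complete = [r for r in records if None not in r.values()]

# Policy: replace every missing value with a global constant.
flagged = [{k: "?" if v is None else v for k, v in r.items()} for r in records]

# Policy: variable mean for the numerical field,
# most frequent value for the categorical field.
age_mean = mean(r["age"] for r in records if r["age"] is not None)
city_mode = mode(r["city"] for r in records if r["city"] is not None)
imputed = [
    {
        "age": age_mean if r["age"] is None else r["age"],
        "city": city_mode if r["city"] is None else r["city"],
    }
    for r in records
]
```

Note that the mean and the most frequent value are computed from the observed values only. Dropping records is the simplest option but discards half of this toy data set, which is one reason replacement policies are often preferred when many records are incomplete.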
 