Missing Values
Missing values are common, and you need a strategy for treating them. A missing value can signify several different things in your data. Perhaps
the data was not available or not applicable, or the event did not happen. It could be that the person who entered the data did not know the right value or
forgot to fill it in. Data mining methods vary in how they treat missing values. Typically, they ignore the missing values,
exclude any records containing them, replace them with the mean, or infer them from existing values.
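Before choosing a treatment, it usually helps to see how much is missing. The following is a minimal Python sketch of that step and of the "exclude records" option. The sample records, the function names, and the convention that `None` marks a missing value are all assumptions made for illustration.

```python
# Hypothetical sample data; None marks a missing value (an assumption).
records = [
    {"age": 25, "income": 50000, "city": "Paris"},
    {"age": None, "income": 62000, "city": "Lyon"},
    {"age": 41, "income": None, "city": None},
    {"age": 33, "income": 48000, "city": "Paris"},
]


def count_missing(rows):
    """Return the number of missing entries for each variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


def drop_incomplete(rows):
    """Keep only the records that have no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


missing_counts = count_missing(records)
complete_records = drop_incomplete(records)
print(missing_counts)
print(len(complete_records))
```

Note that dropping incomplete records here discards half of the sample, which is why the per-variable counts are worth checking first.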
Missing Value Replacement Policies:
- Ignore the records with missing values.
- Replace them with a global constant (e.g., “?”).
- Fill in missing values manually based on your domain knowledge.
- Replace them with the variable mean (if numerical) or the most frequent value (if categorical).
- Use modeling techniques such as nearest neighbors, Bayes’ rule, decision trees, or the EM algorithm.
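Several of these policies can be sketched in a few lines of Python. The example below is illustrative only: the function names and sample data are hypothetical, `None` is assumed to mark a missing value, and the nearest-neighbor step is a deliberately simple one-neighbor, one-feature version rather than a full modeling approach.

```python
from collections import Counter
from statistics import mean


def fill_with_constant(values, constant="?"):
    """Replace each missing value with a global constant."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Replace missing numerical values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    average = mean(observed)
    return [average if v is None else v for v in values]


def fill_with_mode(values):
    """Replace missing categorical values with the most frequent observed one."""
    observed = [v for v in values if v is not None]
    most_frequent = Counter(observed).most_common(1)[0][0]
    return [most_frequent if v is None else v for v in values]


def fill_from_nearest(rows, target, feature):
    """Fill a missing `target` from the record whose `feature` is closest."""
    donors = [r for r in rows
              if r[target] is not None and r[feature] is not None]
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input records are not modified
        if row[target] is None and row[feature] is not None and donors:
            nearest = min(donors, key=lambda d: abs(d[feature] - row[feature]))
            row[target] = nearest[target]
        filled.append(row)
    return filled


# Hypothetical sample data.
ages = [25, None, 41, 33]
cities = ["Paris", "Lyon", None, "Paris"]
people = [
    {"age": 25, "income": 50000},
    {"age": 40, "income": None},
    {"age": 43, "income": 70000},
]

filled_ages = fill_with_mean(ages)
filled_cities = fill_with_mode(cities)
flagged_cities = fill_with_constant(cities)
filled_people = fill_from_nearest(people, target="income", feature="age")
```

Mean and mode replacement keep every record but shrink the variable's spread, while the nearest-neighbor variant borrows a value from a similar record. Which trade-off is acceptable depends on the data and on how many values are missing.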