Bivariate Analysis - Categorical & Numerical

Line Chart with Error Bars 

A line chart with error bars displays information as a series of data points connected by straight line segments. Each data point is the average of the numerical data for the corresponding category of the categorical variable, with an error bar showing the standard error. It summarizes how the two variables are related and how the numerical variable varies from one category to the next (iris_linechart.xlsx).  
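
The values plotted on such a chart can be computed directly. The following is a minimal sketch, assuming the standard error is the sample standard deviation divided by the square root of the group size; the function name `mean_and_standard_error` and the sample data are hypothetical and are not taken from iris_linechart.xlsx.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(categories, values):
    """Return {category: (mean, standard_error)} for a numerical variable."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    summary = {}
    for category, group in groups.items():
        # Standard error of the mean = sample standard deviation / sqrt(n).
        # stdev() needs at least two values per category.
        standard_error = stdev(group) / sqrt(len(group))
        summary[category] = (mean(group), standard_error)
    return summary


# Hypothetical measurements, for illustration only.
species = ["a", "a", "a", "b", "b", "b"]
length = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
summary = mean_and_standard_error(species, length)
```

Each entry of `summary` gives the height of one point on the line and the half-length of its error bar.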


Combination Chart 

A combination chart uses two or more chart types to emphasize that the chart contains different kinds of information. Here, we use a bar chart to show the distribution of a binned numerical variable and a line chart to show the percentage of the selected category from the categorical variable. The combination chart is well suited to showing the predictive power of a predictor (X-axis) against a target (Y-axis).  
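
The two series behind a combination chart can be sketched as follows: a count per bin for the bars and the percentage of the selected category per bin for the line. The function name `combination_chart_data`, the bin edges, and the data are all hypothetical; bins are assumed to be half-open intervals `[low, high)`.

```python
def combination_chart_data(values, labels, selected, edges):
    """Return one (low, high, count, percent_selected) row per bin."""
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        # Labels of the records whose numerical value falls in [low, high).
        in_bin = [label for value, label in zip(values, labels) if low <= value < high]
        count = len(in_bin)
        # Percentage of the selected category; 0.0 for an empty bin.
        percent = 100.0 * in_bin.count(selected) / count if count else 0.0
        rows.append((low, high, count, percent))
    return rows


# Hypothetical predictor and target, for illustration only.
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
target = ["no", "no", "no", "yes", "no", "yes", "yes", "yes"]
rows = combination_chart_data(predictor, target, "yes", [0, 3, 6, 9])
```

A percentage line that rises or falls steadily across the bins suggests that the predictor carries information about the target.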


Z-test and t-test 

The Z-test and the t-test serve the same purpose: they assess whether the averages of two groups are statistically different from each other. This analysis is appropriate for comparing the averages of a numerical variable for two categories of a categorical variable.  


The smaller the probability (p-value) associated with Z, the stronger the evidence that the two averages are different.  
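
A two-sample Z-test can be sketched as below. This assumes the usual large-sample form, Z = (mean1 - mean2) / sqrt(var1/n1 + var2/n2), and a two-sided probability taken from the standard normal distribution; the function name `z_test` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def z_test(sample1, sample2):
    """Return (Z, two-sided p-value) for the difference between two means."""
    n1, n2 = len(sample1), len(sample2)
    # Standard error of the difference, using the sample variances.
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    z = (mean(sample1) - mean(sample2)) / standard_error
    # Probability of a value at least as extreme as |Z| under the standard normal.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical groups of 40 values each, for illustration only.
group_a = [i % 5 + 10 for i in range(40)]
group_b = [i % 5 + 11 for i in range(40)]
z, p_value = z_test(group_a, group_b)
```

With both groups larger than 30, the normal approximation used here is the case the Z-test is meant for.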

t-test 

When n_{1} or n_{2} is less than 30, we use the t-test instead of the Z-test.  
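
For small samples, the t statistic can be sketched as follows. The text does not say which variant of the t-test is meant, so this assumes the pooled-variance (equal variances) form with n1 + n2 - 2 degrees of freedom; the function name `pooled_t_statistic` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Return (t, degrees_of_freedom) assuming equal group variances."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_variance = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical small samples, for illustration only.
t, df = pooled_t_statistic([1.0, 2.0, 3.0], [4.0, 6.0, 8.0])
```

Turning `t` into a probability requires the t distribution with `df` degrees of freedom, which the Python standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind` returns both the statistic and the p-value.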


Example:  
Is there a significant difference between the means (averages) of the numerical variable (Temperature) in two different categories of the categorical variable (O-Ring Failure)?  






The low probability (0.0156) means that the difference between the average temperature for failed O-rings and the average temperature for intact O-rings is significant at the 5% level.  

Analysis of Variance (ANOVA) 

The ANOVA test assesses whether the averages of more than two groups are statistically different from each other. This analysis is appropriate for comparing the averages of a numerical variable for more than two categories of a categorical variable.  
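
The F statistic of a one-way ANOVA can be sketched as the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova_f` and the data below are hypothetical and serve only as an illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values for three categories, for illustration only.
f, df_between, df_within = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
)
```

A large F relative to the F distribution with (`df_between`, `df_within`) degrees of freedom indicates that at least one group average differs from the others; if SciPy is available, `scipy.stats.f_oneway` returns the statistic together with its p-value.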




Example:  
Is there a significant difference between the averages of the numerical variable (Humidity) in the three categories of the categorical variable (Outlook)? 









There is no significant difference between the averages of Humidity in the three categories of Outlook.  

