Tuesday, April 16, 2013

Star Plot

 
 
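A star plot is a graphic representation of the relative behavior of all variables in a multivariate data set. Each variable is drawn as a spoke radiating from a common center, with the spoke's length reflecting that variable's value. Star plots are used to examine the relative values for a single data point and to locate similar or dissimilar points.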
 
The image above is an example star plot from NASA, with some of the most desirable design results represented in the center.
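As a rough sketch of how the spokes of a star plot could be laid out, the snippet below spaces the variables evenly around a circle and scales each value by that variable's maximum. The function name `star_vertices`, the scaling by maxima, and the sample numbers are all assumptions made for illustration, not part of the NASA example.

```python
import math

def star_vertices(values, maxima):
    """Return one (x, y) vertex per variable for a single data point.

    Assumption: each spoke length is the value divided by that variable's
    maximum, so every spoke falls between 0 and 1.
    """
    n = len(values)
    vertices = []
    for k, (value, maximum) in enumerate(zip(values, maxima)):
        radius = value / maximum          # relative value of this variable
        angle = 2 * math.pi * k / n       # spokes are evenly spaced
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# Hypothetical data point with four variables and their maxima.
vertices = star_vertices([5.0, 10.0, 2.0, 4.0], [10.0, 10.0, 4.0, 8.0])
```

Joining the vertices in order gives the star shape for that data point; similar points produce similar shapes.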

Correlation Matrix

 
A correlation matrix describes the correlation between each pair of variables in a set. Correlation matrices list the variable names down the first column and across the first row. The diagonal of a correlation matrix (the numbers that go from the upper left to the lower right) always consists of ones, because these are the correlations between each variable and itself (and a variable is always perfectly correlated with itself).
  
The above image is a correlation matrix showing the correlation between the cluster moments for electrons and pions.
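To make the structure of a correlation matrix concrete, here is a minimal sketch that builds one with the Pearson correlation coefficient, using only the standard library. The variable names and numbers are made up for illustration and have nothing to do with the electron and pion data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlate every variable with every other; rows and columns share one order."""
    names = list(columns)
    return [[pearson(columns[r], columns[c]) for c in names] for r in names]

# Hypothetical variables: "b" rises with "a", while "c" falls as "a" rises.
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
```

The diagonal entries come out as 1 (up to rounding), and the matrix is symmetric because the correlation of a with b is the same as that of b with a.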

Similarity Matrix

 
A similarity matrix is a matrix of values that show how similar each pair of data points is. When the matrix is drawn as a color-coded image, the more discolored a cell is, the greater the difference between the two data points it compares. Similarity matrices are closely related to their counterparts, distance matrices and substitution matrices.
 
The image above is a similarity matrix that shows the analysis of a group of human genes involved in colon and colorectal cancer.
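The sketch below shows one way a similarity matrix could be computed. The choice of cosine similarity is an assumption (many other measures exist, and the gene analysis in the image may well use a different one), and the sample points are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(points):
    """Entry [i][j] holds the similarity between point i and point j."""
    return [[cosine_similarity(p, q) for q in points] for p in points]

# Hypothetical data: the first two points share a direction, the third is orthogonal.
points = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sim = similarity_matrix(points)
```

Here the first two points score 1 against each other and 0 against the third, which is the kind of contrast a color-coded image makes visible at a glance.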

Stem and Leaf Plot

 
 
A stem-and-leaf plot is a device for presenting quantitative data in a graphical format. A basic stem-and-leaf plot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves. Typically the leaf is the last digit of a value and the stem is the remaining leading digits, so 23 has stem 2 and leaf 3. Stem-and-leaf plots show the frequency with which certain classes of values occur. These frequencies in turn can be used to create a histogram.
 
The image above depicts a stem-and-leaf plot displaying data on new housing starts in the USA.
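A stem-and-leaf plot is simple enough to build by hand, and the following sketch does it for non-negative whole numbers, taking the units digit as the leaf and everything above it as the stem. The function names and the sample values are assumptions for illustration; they are not the housing-start figures.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens and above) with sorted leaves (units)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)

def render(groups):
    """Draw one row per stem, including stems that have no leaves."""
    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

# Hypothetical sample values.
plot = stem_and_leaf([12, 15, 21, 23, 23, 38, 40])
text = render(plot)
```

The number of leaves on each row is the frequency of that class, which is exactly the count a histogram with bins of width ten would show.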
 
 
 

Box Plot

 
Box plots, used in descriptive statistics, are a convenient way of depicting data through their five-number summaries: the minimum, the first quartile, the median, the third quartile, and the maximum. The box spans the first to the third quartile with a line at the median, and whiskers extend toward the extremes. The plot may be drawn either vertically or horizontally and may also indicate which observations, if any, might be considered outliers. The spacings between the different parts of the box help indicate the degree of dispersion and skewness in the data.
 
The image above portrays what a typical version of a box plot looks like.
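The numbers behind a box plot can be computed with the standard library, as in this sketch of a five-number summary (minimum, first quartile, median, third quartile, maximum). Two things here are assumptions: quartile conventions vary, and this uses the "inclusive" method of `statistics.quantiles`; and outliers are flagged with the common rule of 1.5 times the interquartile range, which is a convention rather than something the text above prescribes. The sample values are invented.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def outliers(values, k=1.5):
    """Values lying more than k interquartile ranges outside the box (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one unusually large observation.
values = [1, 2, 3, 4, 5, 6, 7, 8, 30]
summary = five_number_summary(values)
flagged = outliers(values)
```

For this sample the box runs from 3 to 7 with the median at 5, and the value 30 falls outside the upper fence at 13, so it would be drawn as an individual outlier point.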

Histogram

 
 
The histogram is used for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets. A histogram is a graphical display that shows the distribution of the data as adjacent rectangles. Each rectangle is drawn over an interval, with an area proportional to the frequency of that interval. The height is equal to the frequency density of the interval, that is, the frequency divided by the interval's width.
 
The above image is a histogram that maps the range and variability of slope strikes within a grid model. This typically corresponds to the directions of ridges and valleys. In this example, there is a slight trend at zero degrees (North).
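The relationship between frequency, area, and height in a histogram can be checked with a few lines of code. This is a minimal sketch with invented values and bin edges; the function name `histogram` and the choice to include the right edge only in the last bin are assumptions.

```python
def histogram(values, edges):
    """Return (left, right, count, density) for each bin defined by the edges.

    Bins are half-open [left, right), except the last bin, which also
    includes its right edge.
    """
    bins = []
    last = len(edges) - 2  # index of the final bin
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        if i == last:
            count = sum(1 for v in values if left <= v <= right)
        else:
            count = sum(1 for v in values if left <= v < right)
        width = right - left
        bins.append((left, right, count, count / width))  # height = frequency density
    return bins

# Hypothetical values with unequal bin widths.
bins = histogram([1, 2, 2, 3, 5, 6, 9, 10], [0, 2, 4, 10])
```

Multiplying each bin's density by its width gives back the count, which is why the area of a rectangle, not its height, represents the frequency when bins have unequal widths.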
 
 

Parallel Coordinate Graph

 
 
Parallel coordinate graphs provide a way of visualizing high-dimensional geometry and analyzing large sets of multivariate data. Each variable gets its own axis, and the axes are drawn parallel to one another. Each data point is then drawn as a single line that connects its value on every axis.
 
The above image is a parallel coordinate graph that displays a 3D parallel coordinate view of all cells and nine selected genes of the Drosophila species. In this 3D visualization, spatial and gene expression information are clearly separated while the basic character of spatial gene expression patterns is preserved in one dimension. Besides spatial information, the view also displays gene expression information along the third dimension.
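For the ordinary 2D case, the geometry of a parallel coordinate graph can be sketched as follows: each variable is rescaled to the range 0 to 1 on its own axis, and each record becomes a polyline across the axes. The min-max rescaling, the function name `parallel_coordinates`, and the sample records are assumptions for illustration and are unrelated to the Drosophila data.

```python
def parallel_coordinates(records):
    """Turn each record into a polyline of (axis index, height in [0, 1]) points."""
    columns = list(zip(*records))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    lines = []
    for record in records:
        line = []
        for axis, value in enumerate(record):
            span = highs[axis] - lows[axis]
            # A constant variable has no spread, so place it mid-axis.
            height = (value - lows[axis]) / span if span else 0.5
            line.append((axis, height))
        lines.append(line)
    return lines

# Hypothetical records with three variables each; the third never varies.
records = [(1.0, 200.0, 3.0), (2.0, 100.0, 3.0), (3.0, 150.0, 3.0)]
lines = parallel_coordinates(records)
```

Rescaling each axis separately keeps variables with very different units comparable; without it, the variable with the largest numbers would dominate the picture.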