Wednesday, August 25, 2010

Statistical analysis in MATLAB

1) Mode:
The mode of a data sample is the element that occurs most often in the collection.
x=[1 2 3 3 3 4 4]
mode(x) % returns 3, the most frequent value
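A quick tally of x = [1 2 3 3 3 4 4] shows why 3 is returned. If several values tie for the highest count, MATLAB's mode returns the smallest of them.

```latex
\begin{array}{c|cccc}
\text{value} & 1 & 2 & 3 & 4 \\ \hline
\text{count} & 1 & 1 & 3 & 2
\end{array}
\qquad\Rightarrow\qquad \operatorname{mode}(x) = 3
```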
2) Median:
The median is the numeric value separating the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers is found by sorting the observations from lowest to highest and picking the middle one; when the number of observations is even, it is the mean of the two middle values.
median(x) % returns 3
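Worked by hand for x = [1 2 3 3 3 4 4]: with n = 7 sorted values, the middle one sits at position (n+1)/2 = 4.

```latex
x_{(1)},\dots,x_{(7)} = 1,\ 2,\ 3,\ \mathbf{3},\ 3,\ 4,\ 4
\qquad
\operatorname{median}(x) = x_{((n+1)/2)} = x_{(4)} = 3
```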
3) Mean:
The arithmetic mean is the sum of the values divided by the number of values.
mean(x) % returns 2.8571
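The same computation written out for x = [1 2 3 3 3 4 4]:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1+2+3+3+3+4+4}{7} = \frac{20}{7} \approx 2.8571
```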


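4) Quartiles and percentiles:
The first quartile is the 25th percentile, the second quartile is the 50th percentile (the median), and the third quartile is the 75th percentile. In general, the kth percentile is the value below which k percent of the data fall; prctile(x, k) computes it.
prctile(x, 25) % 25th percentile (first quartile), returns 2.25
prctile(x, 50) % 50th percentile, returns 3, i.e. the median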
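The 2.25 comes from linear interpolation. A sketch, assuming the default scheme described in MATLAB's prctile documentation: the ith sorted value x_(i) is treated as the 100(i - 0.5)/n percentile, and percentiles in between are interpolated linearly. For x = [1 2 3 3 3 4 4] (n = 7), that places x_(2) = 2 at about 21.43 and x_(3) = 3 at about 35.71:

```latex
\begin{aligned}
p_i &= \frac{100\,(i - 0.5)}{n}, \qquad p_2 = \frac{150}{7} \approx 21.43, \quad p_3 = \frac{250}{7} \approx 35.71 \\
\operatorname{prctile}(x, 25) &= x_{(2)} + \frac{25 - p_2}{p_3 - p_2}\,\bigl(x_{(3)} - x_{(2)}\bigr)
  = 2 + \frac{25 - 150/7}{100/7}\,(3 - 2) = 2.25
\end{aligned}
```

Under the same rule the third quartile falls between x_(5) = 3 (at 450/7 ≈ 64.29) and x_(6) = 4 (at 550/7 ≈ 78.57), so prctile(x, 75) = 3 + 0.75 = 3.75.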

5) Skewness:
Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right.
skewness(x) % returns -0.5954
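By default, MATLAB's skewness does not correct for bias: it divides the third central moment by the second central moment raised to the power 3/2, both averaged over n. For x = [1 2 3 3 3 4 4] the deviations from the mean 20/7 are -13/7, -6/7, 1/7 (three times) and 8/7 (twice), which gives:

```latex
\begin{aligned}
m_k &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k \\
m_2 &= \frac{1}{7}\cdot\frac{169 + 36 + 3\cdot 1 + 2\cdot 64}{49} = \frac{48}{49} \approx 0.9796 \\
m_3 &= \frac{1}{7}\cdot\frac{-2197 - 216 + 3\cdot 1 + 2\cdot 512}{343} = -\frac{1386}{2401} \approx -0.5773 \\
\text{skewness}(x) &= \frac{m_3}{m_2^{3/2}} \approx \frac{-0.5773}{0.9695} \approx -0.5954
\end{aligned}
```

The negative sign comes from the single value 1, which sits much farther below the mean than any value sits above it.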
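6) Kurtosis:
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3; distributions that are more outlier-prone than the normal have kurtosis greater than 3, and less outlier-prone ones have kurtosis less than 3.
kurtosis(x) % returns 2.3594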
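MATLAB's kurtosis likewise defaults to the bias-uncorrected form m4/m2², with the central moments m_k averaged over n. It is not the excess kurtosis, so a normal distribution scores 3 rather than 0. For x = [1 2 3 3 3 4 4], with deviations from the mean of -13/7, -6/7, 1/7 (three times) and 8/7 (twice):

```latex
\begin{aligned}
m_2 &= \frac{1}{7}\cdot\frac{169 + 36 + 3\cdot 1 + 2\cdot 64}{49} = \frac{48}{49} \approx 0.9796 \\
m_4 &= \frac{1}{7}\cdot\frac{28561 + 1296 + 3\cdot 1 + 2\cdot 4096}{2401} = \frac{38052}{16807} \approx 2.2641 \\
\text{kurtosis}(x) &= \frac{m_4}{m_2^{2}} = \frac{38052/16807}{2304/2401} = \frac{151}{64} = 2.359375
\end{aligned}
```

A value below 3 says this small sample is less outlier-prone than a normal distribution.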
7) Standard deviation:
For a vector X, std(X) returns the square root of an unbiased estimator of the variance of the population from which X is drawn, as long as X consists of independent, identically distributed samples.
std(x) % returns 1.0690
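For x = [1 2 3 3 3 4 4] (mean 20/7), the squared deviations from the mean sum to 48/7, and std divides that sum by n - 1 = 6 before taking the square root:

```latex
\begin{aligned}
\sum_{i=1}^{7} (x_i - \bar{x})^2 &= \frac{169 + 36 + 3\cdot 1 + 2\cdot 64}{49} = \frac{336}{49} = \frac{48}{7} \\
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{48/7}{6}} = \sqrt{\frac{8}{7}} \approx 1.0690
\end{aligned}
```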
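8) Variance:
The variance measures how far the values spread out from their mean; it is built from the squared deviations from the mean. By default, var divides their sum by n-1 (the sample variance), so var(x) is the square of std(x).
var(x) % returns 1.1429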
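The effect of the normalization is easy to see on x = [1 2 3 3 3 4 4], whose squared deviations from the mean sum to 48/7. var(x) divides by n - 1, while var(x,1) divides by n:

```latex
\begin{aligned}
\operatorname{var}(x) &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{48/7}{6} = \frac{8}{7} \approx 1.1429 \\
\operatorname{var}(x, 1) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{48/7}{7} = \frac{48}{49} \approx 0.9796
\end{aligned}
```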
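9) Moment:
A moment is a quantitative measure of the shape of a set of points. In MATLAB, moment(x, k) returns the kth central moment, i.e. the mean of (x - mean(x)).^k.
moment(x, 2) % returns 0.9796, the second central moment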
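The central moment definition, applied to x = [1 2 3 3 3 4 4]. The first central moment is always 0; the second is the variance normalized by n (not n - 1), which is why it is smaller than var(x):

```latex
\begin{aligned}
m_k &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad m_1 = 0 \\
m_2 &= \frac{1}{7}\sum_{i=1}^{7} \Bigl(x_i - \frac{20}{7}\Bigr)^2 = \frac{1}{7}\cdot\frac{48}{7} = \frac{48}{49} \approx 0.9796
\end{aligned}
```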
10) Covariance:
Covariance measures how much two variables change together.
y2=[1 3 4 5 6 7 8]
cov(x,y2) % returns a 2-by-2 matrix: the diagonal holds the variances of x and y2, the off-diagonal entries their covariance
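Worked out for x = [1 2 3 3 3 4 4] and y2 = [1 3 4 5 6 7 8] (means 20/7 and 34/7), using n - 1 = 6 in every denominator as cov does by default. Here Σ x_i y_i = 112 and Σ y_i² = 200:

```latex
\begin{aligned}
s_{x y_2} &= \frac{1}{n-1}\Bigl(\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}\Bigr)
  = \frac{1}{6}\Bigl(112 - 7\cdot\frac{20}{7}\cdot\frac{34}{7}\Bigr) = \frac{1}{6}\cdot\frac{104}{7} = \frac{52}{21} \approx 2.4762 \\
s_{y_2}^2 &= \frac{1}{6}\Bigl(200 - 7\cdot\Bigl(\frac{34}{7}\Bigr)^2\Bigr) = \frac{122}{21} \approx 5.8095 \\
C &= \begin{pmatrix} s_x^2 & s_{x y_2} \\ s_{x y_2} & s_{y_2}^2 \end{pmatrix}
   \approx \begin{pmatrix} 1.1429 & 2.4762 \\ 2.4762 & 5.8095 \end{pmatrix}
\end{aligned}
```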
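11) Linear regression:
Linear regression models the relationship between a scalar dependent variable y and one or more explanatory variables X using linear predictor functions whose unknown parameters are estimated from the data. For a straight-line least-squares fit, polyfit(x, y, 1) returns the coefficients highest power first, i.e. the slope and then the intercept.
polyfit(x,y2,1) % returns 2.1667 -1.3333, i.e. y2 ≈ 2.1667x - 1.3333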
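For a degree-1 fit, the least-squares slope and intercept have a closed form: the slope is the ratio of the summed cross-deviations to the summed squared deviations of x, and the intercept makes the line pass through the point of means. For x = [1 2 3 3 3 4 4] and y2 = [1 3 4 5 6 7 8] those sums are 104/7 and 48/7:

```latex
\begin{aligned}
\hat\beta_1 &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{104/7}{48/7} = \frac{13}{6} \approx 2.1667 \\
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{x} = \frac{34}{7} - \frac{13}{6}\cdot\frac{20}{7} = -\frac{4}{3} \approx -1.3333
\end{aligned}
```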
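12) One-sample t-test:
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. ttest(y2,0) tests the null hypothesis that the data in y2 come from a normal distribution with mean 0 and unknown variance, against the two-sided alternative that the mean is not 0.
[h,p,ci] = ttest(y2,0) % returns h = 1, p = 0.0018, ci = [2.6280 7.0863]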


The output h = 1 means the test rejects the null hypothesis that y2 comes from a population with mean 0 at the default α = 0.05 significance level: the p value is below 0.05, and the 95% confidence interval on the mean does not contain 0.
Because this p value (0.0018) is also below 0.01, the test still rejects the null hypothesis when the significance level is lowered to α = 0.01:
[h,p] = ttest(y2,0,'Alpha',0.01) % returns h = 1, p = 0.0018
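The numbers behind ttest(y2,0) can be reproduced by hand, assuming the default two-sided test. For y2 = [1 3 4 5 6 7 8] (n = 7), the mean is 34/7 ≈ 4.85714 and the sample standard deviation is sqrt(122/21) ≈ 2.41030, so the standard error is about 0.91101. The statistic is compared with a t distribution with n - 1 = 6 degrees of freedom, whose 97.5th percentile is about 2.44691:

```latex
\begin{aligned}
t &= \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{4.85714 - 0}{2.41030/\sqrt{7}} = \frac{4.85714}{0.91101} \approx 5.33 \\
p &= 2\,P\bigl(T_6 \ge |t|\bigr) \\
\text{CI}_{95\%} &= \bar{y} \pm t_{0.975,\,6}\,\frac{s}{\sqrt{n}} = 4.85714 \pm 2.44691 \cdot 0.91101 \approx [2.6280,\ 7.0863]
\end{aligned}
```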
