
Time Management

"PhD life is all about self-motivation .. treat it like a day job. Set strict working hours and study activities, and if you don't complete them in the time allotted then do as you would as a good employee - work overtime" - Duggi Zuram
Showing posts with label Computational Statistics. Show all posts

Wednesday, May 30, 2007

Continuous Distributions

EXPONENTIAL

Used to model the amount of time until a specific event occurs, or the time between independent events. Examples:
    • the time until a computer locks up
    • the time between arrivals of telephone calls
    • the time until a part fails
MATLAB: expcdf(x,1/λ), where the mean is E(X) = 1/λ.
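For readers without MATLAB's Statistics Toolbox, the CDF is easy to sketch in pure Python (the function name `exp_cdf` is mine; note MATLAB's `expcdf` takes the mean 1/λ rather than the rate λ):

```python
import math

def exp_cdf(x, lam):
    """CDF of the exponential distribution with rate lambda: F(x) = 1 - exp(-lambda*x)."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)

# With rate lam = 2 the mean waiting time is E[X] = 1/lam = 0.5.
# Probability the event occurs within one mean waiting time: 1 - e^-1
print(exp_cdf(0.5, 2))
```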

GAMMA f(x;λ,t)

When t is a positive integer, the gamma distribution can be used to model the amount of time one has to wait until t events have occurred.

MATLAB: gampdf(x,t,1/λ).

CHI-SQUARE

A gamma distribution where λ = 0.5 and t = v/2, where v is a positive integer, is called a chi-square distribution with v degrees of freedom. The chi-square distribution is used to derive the distribution of the sample variance and is important for goodness-of-fit tests in statistical analysis.
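That relationship can be checked numerically. Below is a pure-Python sketch of the gamma pdf (helper names are mine), with the chi-square obtained by setting λ = 0.5 and t = v/2; setting t = 1 recovers the exponential distribution:

```python
import math

def gamma_pdf(x, t, lam):
    """Gamma pdf with shape t and rate lambda: f(x) = lam^t x^(t-1) e^(-lam x) / Gamma(t)."""
    if x <= 0:
        return 0.0
    return (lam ** t) * (x ** (t - 1)) * math.exp(-lam * x) / math.gamma(t)

def chi2_pdf(x, v):
    """Chi-square with v degrees of freedom = gamma with lam = 0.5, t = v/2."""
    return gamma_pdf(x, v / 2, 0.5)

# t = 1 reduces the gamma pdf to the exponential pdf lam*exp(-lam x),
# so chi-square with v = 2 is exponential with rate 0.5:
print(gamma_pdf(2.0, 1, 0.5))   # 0.5*exp(-1)
print(chi2_pdf(2.0, 2))         # same value
```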

WEIBULL

Closely related to the exponential distribution.
Applied to problems of reliability and life testing.
Used to model the distribution of the time it takes for objects to fail.

BETA

Can be used to model a random variable that takes on values over a bounded interval, from 0 to 1.

MULTIVARIATE NORMAL

Tuesday, May 29, 2007

Normal Distribution [Miller 1985 and Martinez 2001]

History:
  1. Also known as the Gaussian distribution.
  2. First studied in the 18th century, when scientists observed an astonishing degree of regularity in errors of measurement.
  3. The observed error distributions were approximated by a distribution called the 'normal curve of errors' (bell shaped), produced by the normal distribution equation, which is determined by the expected value and variance.
Properties:
  1. The PDF approaches zero as x approaches ±∞
  2. It is centered at the mean μ, and the maximum value occurs at x = μ
  3. The PDF for the normal distribution is symmetric about the mean μ
MATLAB Command:
  1. normcdf(x,mu,sigma)
  2. normpdf(x,mu,sigma)
  3. normspec(specs, mu, sigma)
MATLAB Example:
%Set up parameters for the normal distribution
mu = 5;
sigma = 2;
%Set up the upper and lower limit specs
specs = [2, 8];
prob = normspec(specs, mu, sigma);

Equations:

GAUSSIAN ( NORMAL ) DISTRIBUTION ( PDF ), MEAN, AND VARIANCE

f(x; μ, σ) = ( 1/(σ√(2π)) ) e^( −(x−μ)²/(2σ²) ) , −∞ < x < ∞
E[X] = μ ; V(X) = σ²

STANDARDIZED RAN VAR, STANDARD NORMAL DISTRIBUTION ( CDF )

Definition: the standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Any normal random variable X is standardized by z = (x − μ)/σ.
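Standardization is what lets a single table (or function) serve every normal distribution. A pure-Python sketch using the error function (helper name mine); with μ = 5 and σ = 2 it also reproduces the probability that the `normspec` example above computes for the interval [2, 8]:

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function; standardizing z = (x - mu)/sigma
    reduces any normal random variable to the standard normal."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(norm_cdf(0))                            # 0.5: symmetric about the mean
print(norm_cdf(8, 5, 2) - norm_cdf(2, 5, 2))  # P(2 <= X <= 8), about 0.8664
```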

GAUSSIAN APPROXIMATION TO THE BINOMIAL DISTRIBUTION

Used to approximate the binomial distribution when n is large and p is close to 0.5, i.e., p is not small enough to use the Poisson approximation.
Rule of thumb: use the normal approximation to the binomial distribution only when np and n(1−p) are both greater than 5.

Theorem (stated without proof): if X is a random variable having the binomial distribution with parameters n and p, and Z = (X − np)/√(np(1−p)),
then the limiting form of the distribution function of this standardized random variable as n → ∞ is the standard normal distribution.
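A numeric sketch of the approximation (pure Python, helper names mine): with n = 100 and p = 0.5, both np and n(1−p) equal 50 > 5, and the normal CDF with a continuity correction lands close to the exact binomial probability:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p, k = 100, 0.5, 55
mu, sigma = n * p, math.sqrt(n * p * (1 - p))   # np and sqrt(np(1-p))
approx = norm_cdf((k + 0.5 - mu) / sigma)       # +0.5 is the continuity correction
exact = binom_cdf(k, n, p)
print(exact, approx)                            # both about 0.864
```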

Friday, May 25, 2007

Uniform Distribution [Miller 1985] MATLAB [Martinez 2001]

The uniform distribution for continuous random variables: the random variable X is uniformly distributed over the interval (a, b).

f(x; a, b) = 1/(b − a) for a ≤ x ≤ b , and 0 elsewhere
F(x) = (x − a)/(b − a) for a ≤ x ≤ b
E[X] = (a + b)/2 ; V(X) = (b − a)²/12

EXAMPLE MATLAB
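MATLAB provides `unifpdf` and `unifcdf` for this distribution; as a toolbox-free sketch (helper names mine), the pdf, cdf, mean, and variance over (a, b) are:

```python
def unif_pdf(x, a, b):
    """Uniform(a, b) density: constant 1/(b-a) on the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def unif_cdf(x, a, b):
    """Uniform(a, b) CDF: linear ramp from 0 at a to 1 at b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 10.0
print(unif_pdf(5, a, b))     # 0.125 = 1/(b-a)
print(unif_cdf(6, a, b))     # 0.5: 6 is the midpoint of (2, 10)
print((a + b) / 2)           # mean (a+b)/2
print((b - a) ** 2 / 12)     # variance (b-a)^2/12
```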

Thursday, May 24, 2007

Poisson distribution [Miller and Freund] MATLAB [Martinez]

The Poisson distribution is an approximation to the binomial distribution when n → ∞ and p → 0 (p small, so that np is moderate).
It is derived from the binomial distribution equation by substituting p = λ/n:

f(x; λ) = e^(−λ) λ^x / x! , where λ = np

Expected value E[X] = λ and variance V(X) = λ (substitute np = λ and let p → 0).

Example:

5% of the books bound at a certain bindery are defective. Find the probability that 2 of 100 books bound by this bindery are defective:

λ = np = 100 × 0.05 = 5
f(2; 5) = e^(−5) 5² / 2! ≈ 0.0842

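A pure-Python check of this example (helper name mine): the Poisson pmf with λ = np = 5 at x = 2, compared with the exact binomial value it approximates:

```python
import math

def poisson_pmf(x, lam):
    """f(x; lambda) = e^(-lambda) lambda^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

# 5% defective, 100 books: lambda = n*p = 100 * 0.05 = 5
lam = 100 * 0.05
approx = poisson_pmf(2, lam)
exact = math.comb(100, 2) * 0.05**2 * 0.95**98   # exact binomial for comparison
print(approx, exact)                             # about 0.084 vs 0.081
```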
Poisson Process:

The formula above extends to processes taking place over a continuous interval of time, i.e., events occurring at points in time or space.

To find the probability of x successes during a time interval of length T, we divide the interval T into n equal parts of length t, with probability of success p = αt, where α is the average (mean) number of successes per unit time.

Assumption:
  1. The probability of a success during a very small time interval ∆t is given by p = α∆t.
  2. The probability of > one success during such a small time interval ∆t is negligible.
  3. The probability of a success during such a time interval does not depend on what happened prior to that time.

The formula for the Poisson distribution can be extended further by expanding the parameter λ:
λ = np = (T/t)(αt) = αT

Note: however, most of the time we use the symbol λ to represent α.

Example:

A bank receives on average λ = 6 bad checks per day. What is the probability that it will receive:
a: 4 bad checks on any given day?

    • f(x;λT) = f(4;6(1)) = 0.1339
    • MATLAB: prob = poisspdf(4,6) = 0.1339
    • or prob = poisscdf(4,6)-poisscdf(3,6) = 0.1339

b: at most 10 bad checks over any two consecutive days?

    • f(x;λT) = f(x;6(2)) = f(0;12) + f(1;12) + ... + f(10;12) = 0.3472
    • MATLAB: prob = sum(poisspdf(0:10,12)) = 0.3472
    • or prob = poisscdf(10,12) = 0.3472
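Both answers check out in pure Python (helper names mine), mirroring the `poisspdf`/`poisscdf` calls above:

```python
import math

def poisson_pmf(x, lam):
    """f(x; lambda) = e^(-lambda) lambda^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_cdf(k, lam):
    """P(X <= k) by summing the pmf."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))

# (a) exactly 4 bad checks in one day: lambda*T = 6*1
print(poisson_pmf(4, 6))        # about 0.1339
# (b) at most 10 bad checks in two days: lambda*T = 6*2 = 12
print(poisson_cdf(10, 12))      # about 0.3472
```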

Wednesday, May 23, 2007

Binomial Distribution [Miller and Freund] Matlab [ Martinez 2001]

Repeated trials with x successes in n trials, i.e., x successes and n − x failures in n attempts.
Assumptions involved:
  1. Only 2 possible outcomes for each trial: success and failure
  2. The probability of success is the same for each trial
  3. There are n trials, where n is constant
  4. The n trials are independent
Trials satisfying these assumptions are referred to as Bernoulli trials.

b(x; n, p) = C(n, x) p^x (1 − p)^(n−x) for x = 0, 1, 2, ... , n

Expected Value (mean) E[X]= np ; Variance V(x) = np(1-p)
where:

n : trials
x : successes
n - x : failures
p : probability of success
1-p : probability of failure
C(n, x) : called the binomial coefficient, which is the number of combinations of x objects selected from a set of n objects = n! / (x!(n − x)!)

MATLAB EXAMPLE on the binomial distribution using both the probability mass function and the cumulative distribution function.
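The pmf and cdf can also be sketched in pure Python (helper names mine); summing the pmf over all x returns 1, and np and np(1−p) give the mean and variance stated above:

```python
import math

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)"""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(k, n, p):
    """P(X <= k) by summing the pmf."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

n, p = 10, 0.3
print(binom_pmf(3, n, p))        # the mode sits near np = 3
print(binom_cdf(n, n, p))        # 1.0: the pmf sums to one
print(n * p, n * p * (1 - p))    # mean np and variance np(1-p)
```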

Tuesday, May 22, 2007

Computational Statistics [Martinez 2001] [Miller] [Mario F. Triola 2000]

2.2 : PROBABILITY

Random experiment: a process or action whose outcome cannot be predicted with certainty and would likely change when the experiment is repeated.

Random variable, X
: the outcome from a random experiment; formally, a function defined over the elements of a sample space (Miller et al.)
x is the observed value of the random variable X
discrete random variable - can take values from a finite or countably infinite set
continuous random variable - can take values from an interval of real numbers

Sample space, S
: the set of all outcomes from an experiment. Example: for a 6-sided die, the sample space is {1, 2, 3, 4, 5, 6}
AXIOM 2: P(S) = 1

Event, E
: a subset of outcomes in the sample space
AXIOM 1: the probability of event E must be between 0 and 1: 0 ≤ P(E) ≤ 1

Mutually exclusive events:
two events that cannot occur simultaneously or jointly. The notion extends to any number of events as long as all pairs of events are considered.
AXIOM 3: for mutually exclusive events E1, E2, ..., EK
P(E1 ∪ E2 ∪ ... ∪ EK) = ∑ i=1 to K P(Ei)

Probability :
Measure of the likelihood that some event will occur

Probability distribution,
f(x): describes the probabilities associated with each possible value of the random variable

Cumulative distribution function, cdf,
F(x): the probability that the random variable X assumes a value
less than or equal to a given x. F(x) takes values from zero to one

Probability density function
: the probability distribution for continuous random variables
P(a ≤ X ≤ b) = ∫ from a to b f(x) dx ; the total area under the curve = 1
associated cdf: F(x) = P(X ≤ x) = ∫ from −∞ to x f(t) dt

Probability mass function:
the probability distribution of discrete random variables
f(xi) = P(X = xi) ; i = 1, 2, ...
associated cdf: F(a) = ∑ over xi ≤ a of f(xi)

Equal likelihood model:
an experiment where each of n outcomes is equally likely,
and each outcome is assigned a probability mass of 1/n

Relative frequency method:
conduct the experiment n times and record the outcomes;
the probability is assigned by P(E) = f/n, where f is the number of times E occurred
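The two assignment methods can be compared in a short simulation (pure Python): the equal-likelihood model says P(rolling a 6) = 1/6, and the relative frequency f/n from repeated experiments should converge to it:

```python
import random

random.seed(1)

# Equal likelihood model: a fair 6-sided die assigns probability mass 1/6 per face.
# Relative frequency method: repeat the experiment n times, then P(E) ~ f/n.
n = 100_000
f = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
print(f / n)   # close to 1/6 ~ 0.1667
```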

2.3 : CONDITIONAL PROBABILITY AND INDEPENDENCE

Conditional probability: event E given event F →

P(E|F) = P(E ∩ F)/P(F) ; P(F) > 0
P(E ∩ F) = P(F) P(E|F) or P(E) P(F|E)

Independent events:
P(E ∩ F) = P(E) P(F)
P(E|F) = P(E ∩ F)/P(F) = P(E)
Therefore P(E) = P(E|F) and P(F) = P(F|E)

if extended to k events

P(E1 ∩ E2 ∩ ... ∩ EK) = ∏ i=1 to K P(Ei)

Bayes' theorem: the initial probability is called the prior probability.
New information is used to update the prior probability to obtain the posterior probability.

P(Er|F) = the probability of event Er given event F, i.e., that the 'effect' F was 'caused' by the event Er

P(Er|F) = P(Er ∩ F) / P(F) = P(Er) P(F|Er) / P(F)
P(F) = P(E1 ∩ F) + P(E2 ∩ F) + ... + P(EK ∩ F)
= P(E1) P(F|E1) + ... + P(EK) P(F|EK)
therefore

P(Er|F) = P(Er) P(F|Er) / [P(E1) P(F|E1) + ... + P(EK) P(F|EK)]

Example:

Tom services 60% of all cars and fails to wash the windshield 1 time in 10.
George services 15% of all cars and fails to wash the windshield 1 time in 10.
Jack services 20% of all cars and fails to wash the windshield 1 time in 20.
Peter services 5% of all cars and fails to wash the windshield 1 time in 20.
If a customer later complains that her windshield was not washed,
what is the probability that her car was serviced by Jack?


P(Er|F) = P(Er ∩ F)/P(F)
P(Er|F) = P(Er)P(F|Er)/[P(E1)P(F|E1)+P(E2)P(F|E2)+...+P(E4)P(F|E4)]
P(Er|F) = (0.2)(1/20)/[(0.6)(1/10)+(0.15)(1/10)+(0.2)(1/20)+(0.05)(1/20)]
P(Er|F) = 0.114

Therefore the probability that the unwashed windshield (effect F) was caused by Jack (event Er) is 0.114. This shows that even though Jack fails only 1 windshield in 20 cars, about 11% of windshield failures are his responsibility.
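The computation above translates directly into a few lines of Python (the dictionary layout is mine):

```python
# Priors P(Er): share of cars each person services, from the example above.
priors = {"Tom": 0.60, "George": 0.15, "Jack": 0.20, "Peter": 0.05}
# Likelihoods P(F | Er): chance each person fails to wash the windshield.
fail = {"Tom": 1/10, "George": 1/10, "Jack": 1/20, "Peter": 1/20}

# Total probability of the effect F (an unwashed windshield).
p_f = sum(priors[k] * fail[k] for k in priors)
# Posterior for each person by Bayes' theorem.
posterior = {k: priors[k] * fail[k] / p_f for k in priors}
print(round(posterior["Jack"], 3))   # 0.114
```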


2.4: EXPECTATION

Mathematical expectations play an increasingly important role in scientific decision making, as it is generally considered rational to select whichever alternative has the most promising mathematical expectation.

The mean provides a measure of the central tendency of the distribution.
  • Mean of n measurements: arithmetic mean (data treatment)
    • μ = ∑x / n
  • Mean of grouped data (frequency)
    • μ = ∑(f x) / N
  • Mean of a probability distribution
    • The mean or expected value of a random variable is defined using the pdf or pmf.
    • The expected value is the sum over all possible values of the random variable, each weighted by the probability that X takes on that value, i.e., the probabilities of obtaining a1, a2, ..., ai are p1, p2, ..., pi:
    • μ = E[X] = a1p1 + a2p2 + ... + aipi = ∑ xi f(xi)
The variance is a measure of dispersion in the distribution (how much a single random variable varies). A large variance means that an observed value is likely to be far from the mean μ.
  • Variance of n observations
    • V(X) = ∑(x−μ)² /(n−1) = [n∑x² −(∑x)²]/[n(n−1)]
  • Variance of grouped data (frequency)
    • V(X) = [n∑x²f −(∑xf)²]/[n(n−1)]
  • Variance of a probability distribution
    • The variance is the sum of the squared distances from the mean,
    • each weighted by the probability that X = x.
    • V(X) = E[(X−μ)²] = ∑(x−μ)² f(x)
    • V(X) = E[X²]−μ² = E[X²]−(E[X])²
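A minimal sketch of these two formulas in pure Python (helper name mine), using a fair die as the probability distribution:

```python
def mean_var(values, probs):
    """Mean E[X] = sum of x*f(x); variance V(X) = E[X^2] - (E[X])^2."""
    mu = sum(x * p for x, p in zip(values, probs))
    ex2 = sum(x * x * p for x, p in zip(values, probs))
    return mu, ex2 - mu * mu

# Fair die: each face weighted by probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6
mu, var = mean_var(faces, probs)
print(mu, var)   # 3.5 and 35/12 ~ 2.9167
```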
2.5: COMMON DISTRIBUTIONS

Discrete distributions: binomial, Poisson

Continuous distributions: uniform, normal, exponential, gamma, chi-square, Weibull, beta, multivariate normal