Inferential Statistics

Inferential statistics is a branch of statistics that deals with making predictions or inferences about a population based on a sample of data taken from the population. In this section, I'll discuss probability distributions, sampling distributions, estimation, and hypothesis testing.

Probability Distributions

Probability distributions describe the likelihood of different outcomes in an experiment or event. The most commonly used ones in inferential statistics are the normal distribution, the binomial distribution, and the Poisson distribution.

Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric about the mean, so values near the mean occur more frequently than values far from it. Its probability density function is:

\(f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)

where \(f(x)\) is the probability density at \(x\), \(\mu\) is the mean, \(\sigma\) is the standard deviation, and \(x\) is a value of the random variable.
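To make the density formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `normal_pdf` is my own choice for illustration; the cross-check uses `statistics.NormalDist` from the standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Density of the standard normal at its mean, which equals 1 / sqrt(2 * pi).
density_at_mean = normal_pdf(0.0)

# Cross-check against the standard library implementation.
reference = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
```

Note that the returned value is a density, not a probability; probabilities come from integrating the density over an interval.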

Binomial Distribution

The binomial distribution is a discrete probability distribution of the number of successes in a sequence of \(n\) independent trials, each of which has only two possible outcomes (success or failure) and the same probability of success \(p\). Its probability mass function is:

\(P(X=k) = {n \choose k}p^k(1-p)^{n-k}\)

where \(P(X=k)\) is the probability of \(k\) successes in \(n\) trials, \(p\) is the probability of success, and \({n \choose k}\) is the binomial coefficient.
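The formula translates almost directly into code. The sketch below uses `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the coin-flip numbers are made up for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 0 or k > n:
        return 0.0  # outside the support of the distribution
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


# Hypothetical example: exactly 3 heads in 10 flips of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)  # 120 / 1024 = 0.1171875

# The probabilities over k = 0..n should sum to 1.
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```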

Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, assuming the events occur independently of each other and at a constant average rate. Its probability mass function is:

\(P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}\)

where \(P(X=k)\) is the probability of \(k\) events occurring in the interval, \(\lambda\) is the rate parameter (the expected number of events in the interval), and \(k!\) is the factorial of \(k\).
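As a quick illustration, the Poisson formula can be evaluated with nothing more than the `math` module. The name `poisson_pmf` and the rate of 2 events per interval are assumptions chosen for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events in an interval when lam events are expected."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0  # a count cannot be negative
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical example: on average 2 events per interval.
prob_no_events = poisson_pmf(0, 2.0)   # equals exp(-2)
prob_two_events = poisson_pmf(2, 2.0)  # equals 2 * exp(-2)
```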

Sampling Distributions

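A sampling distribution is the probability distribution of a statistic, such as the sample mean, across repeated random samples of the same size drawn from the same population. The central limit theorem states that, for independent observations from a population with finite variance, the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.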
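The central limit theorem is easier to believe after a small simulation. The sketch below is only an illustration under assumptions of my own: it draws samples from an exponential distribution with mean 1 (a strongly right-skewed population), and the helper names, sample sizes, trial count, and seed are arbitrary choices.

```python
import random
import statistics


def sample_means(n: int, trials: int, rng: random.Random) -> list:
    """Draw `trials` samples of size n from an exponential(1) population; return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


def skewness(values: list) -> float:
    """Standardized third moment; it is 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / spread) ** 3 for v in values])


rng = random.Random(42)  # fixed seed so the run is reproducible
means_small = sample_means(n=2, trials=5000, rng=rng)
means_large = sample_means(n=50, trials=5000, rng=rng)

# The means of larger samples should be less skewed (closer to a normal shape)
# and should cluster more tightly around the population mean of 1.
skew_small = skewness(means_small)
skew_large = skewness(means_large)
spread_large = statistics.stdev(means_large)  # theory suggests about 1 / sqrt(50)
```

Comparing `skew_small` with `skew_large` shows the asymmetry of the population fading as the sample size grows, which is the behavior the theorem describes.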

Point Estimation

Point estimation involves using a single value, computed from the sample, to estimate a population parameter. The most commonly used point estimator is the sample mean, which estimates the population mean and is calculated using the formula:

\(\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i\)

where \(\bar{x}\) is the sample mean, \(n\) is the sample size, and \(x_i\) are the individual values in the sample.
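For completeness, here is the sample mean as a short Python sketch. The data values are invented for the example, and `statistics.fmean` is shown only as a standard-library cross-check.

```python
from statistics import fmean


def sample_mean(values: list) -> float:
    """Arithmetic mean of the observations: the sum divided by the sample size."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


sample = [4.0, 7.0, 9.0, 10.0]  # hypothetical observations
x_bar = sample_mean(sample)     # (4 + 7 + 9 + 10) / 4 = 7.5
x_bar_stdlib = fmean(sample)    # same result from the standard library
```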

Interval Estimation

Interval estimation involves using a range of values to estimate the population parameter. The most commonly used interval estimator is the confidence interval. For the population mean, a large-sample confidence interval based on the normal approximation is:

\((\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}}, \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}})\)

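where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size, and \(z_{\alpha/2}\) is the critical value of the standard normal distribution for a confidence level of \(1-\alpha\) (for example, \(z_{\alpha/2} \approx 1.96\) for a 95% confidence interval, where \(\alpha = 0.05\)). For small samples from a normal population, the critical value is usually taken from the t-distribution with \(n-1\) degrees of freedom instead.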
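The interval formula can be sketched in a few lines of Python. This is a minimal version that assumes the normal approximation is acceptable for the data at hand; the function name `z_confidence_interval` and the measurements are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def z_confidence_interval(values: list, confidence: float = 0.95) -> tuple:
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    x_bar = fmean(values)
    margin = z * stdev(values) / math.sqrt(n)    # stdev uses the n - 1 denominator
    return x_bar - margin, x_bar + margin


# Hypothetical measurements.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
lower_95, upper_95 = z_confidence_interval(data, confidence=0.95)
lower_99, upper_99 = z_confidence_interval(data, confidence=0.99)
```

A higher confidence level gives a wider interval, which is why the 99% bounds enclose the 95% bounds for the same data.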

Hypothesis Testing

Hypothesis testing involves using sample data to assess a claim about a population parameter. The null hypothesis typically states that there is no effect or no difference (for example, that two population means are equal), while the alternative hypothesis states that there is an effect or a difference. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. A small p-value (conventionally below a chosen significance level \(\alpha\), such as 0.05) is taken as evidence against the null hypothesis.
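To illustrate the p-value definition, the sketch below computes a two-sided p-value for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis (that is, a z-test, which is my assumption for this example). The function name and the statistic values are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability, under a standard normal null, of a statistic at least as extreme as z."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail  # count both tails: values above |z| and below -|z|


p_at_zero = two_sided_p_value(0.0)       # a statistic of 0 is not extreme at all
p_at_critical = two_sided_p_value(1.96)  # close to the conventional 0.05 threshold
```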

One of the most commonly used hypothesis tests is the t-test. The independent two-sample t-test is used to test whether the means of two populations differ. Assuming the two populations have equal variances, the test statistic is:

\(t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)

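where \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means, \(n_1\) and \(n_2\) are the sample sizes, \(t\) is the t-statistic, and \(s_p\) is the pooled standard deviation, \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\), with \(s_1^2\) and \(s_2^2\) the sample variances. Under the null hypothesis, \(t\) follows a t-distribution with \(n_1+n_2-2\) degrees of freedom.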
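The pooled t-statistic can be computed with the standard library alone. The sketch below assumes equal population variances, as the pooled formula does; the function name `pooled_t_statistic` and the two groups are invented for illustration. Turning the statistic into a p-value requires the t-distribution, which the standard library does not provide, so that step is left out.

```python
import math
from statistics import fmean, variance


def pooled_t_statistic(group_1: list, group_2: list) -> tuple:
    """Return (t, degrees_of_freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(group_1), len(group_2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    if pooled_var == 0:
        raise ValueError("both groups are constant, so t is undefined")
    s_p = math.sqrt(pooled_var)
    t = (fmean(group_1) - fmean(group_2)) / (s_p * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, df


# Hypothetical groups: means 3 and 4, both with sample variance 2.5.
t_stat, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
# Here s_p * sqrt(1/5 + 1/5) = sqrt(2.5 * 0.4) = 1, so t = (3 - 4) / 1 = -1 with 8 degrees of freedom.
```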

by selvastics