Statistics

STATISTICS (JEE and CBSE)

Measures of Central Tendency:

  • Mean: Average of a set of numbers.

  • Median: Middle value of a set of numbers when arranged in ascending order; for an even number of values, the average of the two middle values.

  • Mode: Most frequently occurring value in a set of numbers.
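The three measures above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the list `scores` is hypothetical data chosen only for illustration.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [2, 4, 4, 5, 10]

mean_value = statistics.mean(scores)      # sum of values divided by their count
median_value = statistics.median(scores)  # middle value of the sorted data
mode_value = statistics.mode(scores)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Here the single large value 10 pulls the mean (5) above the median (4), which is why the median is often preferred for skewed data.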

Measures of Dispersion:

  • Range: Difference between the largest and smallest values in a set of numbers.

  • Variance: Average of the squared differences between each number and the mean.

  • Standard deviation: Square root of the variance.
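As a rough illustration of the definitions above, the sketch below computes the range, variance, and standard deviation step by step. The list `data` is hypothetical, and the variance divides by n (the population form used in the definition above); a sample variance would divide by n - 1 instead.

```python
import math

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 10]
n = len(data)
mean = sum(data) / n

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Variance: average of the squared deviations from the mean (divides by n).
variance = sum((x - mean) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(data_range, variance, std_dev)
```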

Probability:

  • Sample space: Set of all possible outcomes of an experiment.

  • Event: A subset of the sample space.

  • Independent events: Events for which the occurrence of one does not change the probability of the other, so that P(A ∩ B) = P(A) P(B).

  • Conditional probability: Probability of an event occurring given that another event has already occurred.

  • Bayes’ theorem: Formula for updating the probability of an event A after observing an event B: P(A|B) = P(B|A) P(A) / P(B), where P(A) is the prior probability of A.
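A short numerical sketch may make conditional probability and Bayes' theorem more concrete. The scenario (a diagnostic test) and all three input probabilities below are hypothetical values assumed for illustration.

```python
# Hypothetical inputs: D = "has the condition", + = "test is positive".
prior = 0.01           # P(D): assumed prevalence
sensitivity = 0.95     # P(+ | D)
false_positive = 0.05  # P(+ | not D)

# Total probability of a positive result, P(+).
evidence = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+).
posterior = sensitivity * prior / evidence

print(round(posterior, 3))
```

With these assumed numbers the posterior is about 0.161, far below the test's 95% sensitivity, because the condition is rare to begin with.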

Random Variables:

  • Discrete random variable: Random variable that can take only a finite or countably infinite number of values.

  • Continuous random variable: Random variable that can take any value within a specified range.

  • Probability distribution: Function that gives the probability of each possible value of a discrete random variable, or of each interval of values for a continuous random variable.

  • Binomial distribution: Probability distribution of the number of successes in a fixed number of independent trials, each of which has only two outcomes and the same probability of success.

  • Poisson distribution: Probability distribution of the number of events occurring in a fixed interval of time or space if these events occur with a known average rate and independently of the time since the last event.

  • Normal distribution: Also known as a Gaussian distribution, a probability distribution characterized by a bell-shaped curve.
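The binomial and Poisson distributions above have simple closed-form probability functions. The sketch below implements the standard formulas with the `math` module; the function names `binomial_pmf` and `poisson_pmf` are illustrative names, not part of any library.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for k events when the average number of events per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))

# Probability of no events in an interval with an assumed average rate of 2.
print(poisson_pmf(0, 2.0))
```

Summing `binomial_pmf(k, n, p)` over k = 0, ..., n gives 1, which is a quick check that the function describes a valid distribution.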

Sampling:

  • Sampling methods: Techniques used to select a sample from a population.

  • Simple random sampling: Each member of the population has an equal chance of being selected.

  • Stratified random sampling: The population is divided into strata, and then a simple random sample is taken from each stratum.

  • Systematic sampling: Members of the population are selected at regular intervals.

  • Cluster sampling: The population is divided into groups (clusters) of elements that are close together, and a random selection of whole clusters is sampled. This is more cost-efficient when the elements are widely spread within the sampling frame.
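The four sampling methods can be compared side by side on a toy population. This is only a sketch under assumed settings: the population of 100 numbered members, the two strata, the interval `k`, and the cluster size are all hypothetical choices.

```python
import random

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population: members numbered 1..100

# Simple random sampling: every member has an equal chance of selection.
simple = rng.sample(population, 10)

# Stratified random sampling: split into strata, then sample within each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Systematic sampling: random starting point, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sampling: form clusters of neighbouring members, then take whole clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [m for cluster in chosen_clusters for m in cluster]

print(simple, stratified, systematic, cluster_sample, sep="\n")
```

Note the structural difference: stratified sampling takes a few members from every group, while cluster sampling takes every member from a few groups.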

Hypothesis Testing:

  • Null hypothesis: Statement that there is no difference between two groups or populations; it is assumed true until the data give sufficient evidence against it.

  • Alternative hypothesis: Statement that there is a difference between two groups or populations; it is accepted when the null hypothesis is rejected.

  • Type I error: Error of rejecting a null hypothesis that is actually true.

  • Type II error: Error of failing to reject a null hypothesis that is actually false.

  • Level of significance: Maximum probability of committing a Type I error.

  • Test statistic: A value calculated from the sample data and used to decide whether to reject the null hypothesis.

  • p-value: Probability of obtaining a test statistic as extreme as, or more extreme than, the observed test statistic, assuming that the null hypothesis is true.

  • Chi-square test: A statistical test used to determine whether the observed frequencies of events in one or more categories differ significantly from the expected frequencies.

  • t-test: Statistical test used to determine whether the means of two independent groups are significantly different from each other.

  • F-test: Statistical test based on the F-distribution, most commonly used to compare the variances of two independent groups.

  • One-way ANOVA: Statistical test that compares the sample means of three or more independent groups to assess whether all the population means may be equal.
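To show how a test statistic is compared with a critical value at a chosen level of significance, here is a minimal sketch of two of the tests above worked by hand. All data are hypothetical; the critical values are the usual table values at the 5% level, and the t statistic uses the pooled-variance form, which assumes the two groups have equal variances.

```python
import math
import statistics

# Chi-square goodness-of-fit test: is a die fair?
# Hypothetical observed counts for faces 1..6 over 60 rolls.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face under H0
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
CHI2_CRITICAL = 11.07  # table value: 5 degrees of freedom, 5% level
reject_fair_die = chi_square > CHI2_CRITICAL

# Two-sample t statistic with pooled variance (hypothetical groups).
group_a = [5, 7, 9]
group_b = [2, 4, 6]
n_a, n_b = len(group_a), len(group_b)
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(
    pooled_var * (1 / n_a + 1 / n_b))
T_CRITICAL = 2.776  # table value: 4 degrees of freedom, two-tailed, 5% level
reject_equal_means = abs(t_stat) > T_CRITICAL

print(chi_square, reject_fair_die)
print(t_stat, reject_equal_means)
```

In both cases the statistic falls below its critical value, so the null hypothesis is not rejected at the 5% level for this assumed data.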

Linear Regression:

  • Least squares method: Method for finding the line that best fits a set of data points by minimizing the sum of the squared vertical distances between the points and the line.

  • Regression line: Line that best fits a set of data points based on linear regression.

  • Correlation coefficient: Measures the strength and direction of a linear relationship between two variables.

  • Coefficient of determination: Measure of the goodness of fit that indicates how close the data are to the fitted regression line. For a regression line with one predictor, it equals the square of the correlation coefficient.
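The quantities in this section can be computed directly from sums of deviations. The following sketch fits a least squares line y = intercept + slope * x to hypothetical data and derives the correlation coefficient and coefficient of determination from the same sums; the variable names are illustrative.

```python
import math

# Hypothetical paired observations, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-deviations.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares estimates for the regression line.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Correlation coefficient and coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

print(slope, intercept, r, r_squared)
```

For this data the fitted line is approximately y = 2.2 + 0.6x, and the coefficient of determination of about 0.6 means the line accounts for roughly 60% of the variation in y.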