Notes from Toppers
Measures of Central Tendency:
- Mean (Arithmetic Mean)
- Defined as the sum of the observations divided by the number of observations. (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Mean of Grouped Data)
- Median:
- Defined as the middle value when the observations are arranged in ascending order (Refer to NCERT Class 11, Chapter 15, Statistics (Part 1) - Median of Ungrouped Data)
- Mode:
- Defined as the value that appears most frequently in the data (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Mode of Ungrouped Data)
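The three measures above can be checked on a small data set. The following is a minimal sketch using Python's standard-library `statistics` module; the marks are made-up values chosen only for illustration.

```python
import statistics

# Hypothetical marks of seven students (illustrative data, not from the notes)
marks = [40, 45, 45, 50, 55, 60, 90]

mean_value = statistics.mean(marks)      # sum of observations / number of observations
median_value = statistics.median(marks)  # middle value of the sorted data
mode_value = statistics.mode(marks)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

Note that the single large mark (90) pulls the mean above the median, which is why the median is often preferred for skewed data.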
Measures of Dispersion:
- Range:
- Defined as the difference between the largest and smallest observations (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Range of Grouped Data)
- Variance:
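- Defined as the sum of squared deviations from the mean divided by the number of observations minus one. This is the sample variance; the population variance divides by the number of observations instead. (Refer NCERT Class 12, Chapter 25, Probability)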
- Standard Deviation (SD):
- Defined as square root of the variance. Represents how spread out the observations are from the mean (Refer NCERT Class 12, Chapter 25, Probability)
- Quartile Deviation (Q.D or H):
- Defined as half of the difference between upper and lower quartiles. (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Quartile Deviation)
- Interquartile Range (IQR):
- Defined as the difference between the upper and lower quartiles. (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Interquartile Range)
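The measures of dispersion listed above can be computed as in the sketch below, which uses only Python's standard library. The data values are hypothetical. Quartiles depend on the interpolation convention; this sketch relies on the default "exclusive" method of `statistics.quantiles`, which may differ from the textbook method.

```python
import statistics

# Hypothetical observations (illustrative only)
data = [4, 8, 15, 16, 23, 42]

data_range = max(data) - min(data)            # largest minus smallest observation
sample_variance = statistics.variance(data)   # divides by n - 1 (sample variance)
sample_sd = statistics.stdev(data)            # square root of the sample variance

# Three cut points that split the data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                  # interquartile range
quartile_deviation = iqr / 2   # half of the interquartile range

print(data_range, sample_variance, sample_sd, iqr, quartile_deviation)
```

For the population versions (dividing by n), `statistics.pvariance` and `statistics.pstdev` can be used instead.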
Skewness and Kurtosis:
- Skewness:
- Describes the asymmetry of the data distribution. It can be positive (longer right tail) or negative (longer left tail). (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Karl Pearson’s Coefficient of Skewness)
- Kurtosis:
- Describes the peakedness or flatness of the data distribution compared to a normal distribution (Refer NCERT Class 11, Chapter 15, Statistics (Part 1) - Karl Pearson’s Coefficient of Kurtosis)
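One common way to quantify skewness and kurtosis is through central moments. The sketch below uses the moment-based coefficients (third and fourth standardized moments), which is an assumption on my part: the notes refer to Karl Pearson's coefficients, and textbooks vary in which formula they use. The helper names and data are hypothetical.

```python
import statistics

def central_moment(values, k):
    """k-th central moment: average of (x - mean)^k over all observations."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment coefficient of kurtosis minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_skewed), excess_kurtosis(symmetric))
```

A positive skewness value indicates a longer right tail, and a negative excess kurtosis indicates a flatter shape than the normal distribution.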
Correlation and Regression:
- Scatter Diagrams:
- A graphical representation that shows the relationship between two variables where each data point represents a pair of measurements. (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Scatter Plot)
- Karl Pearson’s Coefficient of Correlation (r):
- A measure of the linear relationship between two variables, ranges from -1 to 1. (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Correlation)
- Spearman’s Rank Correlation Coefficient (rs):
- Used to measure the monotonic relationship between two variables. (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Spearman’s Rank Correlation)
- Regression Analysis:
- Used to predict the value of one variable (dependent variable) based on the values of one or more other variables (independent variables).
- Linear Regression Equation:
- An equation of the form y = mx + b, where ‘m’ is the slope and ‘b’ is the intercept. Commonly used for simple linear regression.
- Residuals:
- The difference between observed values and predicted values in regression analysis. (Refer NCERT Class 12, Chapter 25, Probability)
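The correlation and regression ideas above can be tied together in one small example. This is a minimal sketch with hypothetical data and helper names; the ranking helper assumes there are no tied values, and the line is fitted by ordinary least squares.

```python
import statistics

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation between x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties, for simplicity."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    m = sxy / sxx
    return m, my - m * mx

# Hypothetical data: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

m, b = fit_line(hours, scores)
# Residual = observed value minus value predicted by the fitted line
residuals = [yi - (m * xi + b) for xi, yi in zip(hours, scores)]

print(pearson_r(hours, scores), spearman_rs(hours, scores), m, b, residuals)
```

Because the scores rise steadily with the hours, the rank correlation is exactly 1 even though the points do not lie perfectly on a straight line.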
Probability:
- Basic Concepts (Sample Space, Events, Probability) (Refer NCERT Class 12, Chapter 13, Probability)
- Conditional Probability:
- The probability of occurrence of an event given that another event has already occurred. (Refer NCERT Class 12, Chapter 13, Probability)
- Bayes’ Theorem:
- A method to calculate conditional probabilities based on prior probabilities, likelihoods, and total probabilities. (Refer NCERT Class 12, Chapter 13, Probability)
- Independent Events:
- Events are said to be independent if the occurrence of one event does not affect the probability of occurrence of the other.
- Mutually Exclusive Events:
- Events are mutually exclusive if the occurrence of one event precludes the occurrence of the other.
- Addition Rule of Probability:
- If \(A\) and \(B\) are two events, the probability of either \(A\) or \(B\) occurring is \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\).
- Multiplication Rule of Probability:
- If \(A\) and \(B\) are two events, the probability of both \(A\) and \(B\) occurring is \(P(A \cap B) = P(A) \cdot P(B|A)\).
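The rules above can be verified by enumeration on a small sample space. The sketch below assumes a single roll of a fair die, with two hypothetical events chosen for illustration, and uses exact fractions so the identities can be compared without rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (illustrative assumption)
sample_space = {1, 2, 3, 4, 5, 6}
A = {x for x in sample_space if x % 2 == 0}  # event A: the roll is even
B = {x for x in sample_space if x > 3}       # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# Conditional probability: P(B|A) = P(A and B) / P(A)
p_b_given_a = prob(A & B) / prob(A)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
addition_holds = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Multiplication rule: P(A and B) = P(A) * P(B|A)
multiplication_holds = prob(A & B) == prob(A) * p_b_given_a

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * prob(A) / prob(B)

# Independence would require P(A and B) = P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)

print(p_b_given_a, p_a_given_b, addition_holds, multiplication_holds, independent)
```

Here A and B are neither independent nor mutually exclusive: they share the outcomes 4 and 6, and knowing that the roll is even changes the probability that it is greater than 3.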
Random Variables and Probability Distributions:
- Random Variable:
- A variable whose value is determined by the outcome of a random event.
- Discrete Probability Distributions:
- Probability distribution of discrete random variables, commonly including the binomial, Poisson and hypergeometric distributions. (Refer NCERT Class 12, Chapter 13, Probability)
- Continuous Probability Distributions:
- Probability distribution of continuous random variables, commonly including the normal and exponential distributions. (Refer NCERT Class 12, Chapter 13, Probability)
- Central Limit Theorem:
- States that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent and the population has a finite variance.
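Two of the ideas above can be illustrated numerically: a discrete distribution (binomial) and the Central Limit Theorem. The following is a rough sketch under assumed parameters (n = 10, p = 0.3 for the binomial; samples of size 50 from an exponential population with mean 1). The random seed is fixed only so that the run is repeatable.

```python
import math
import random
import statistics

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Full probability distribution of X ~ Binomial(10, 0.3)
pmf = [binomial_pmf(k, 10, 0.3) for k in range(11)]
binomial_mean = sum(k * pk for k, pk in enumerate(pmf))  # expected value, equals n * p

# Central Limit Theorem: means of samples from a skewed (exponential) population
rng = random.Random(0)
sample_size = 50
num_samples = 2000
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # should be close to 1 / sqrt(sample_size)

print(sum(pmf), binomial_mean, mean_of_means, sd_of_means)
```

Although the exponential population is strongly right-skewed, a histogram of `sample_means` would look roughly bell-shaped and centred near 1.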
Sampling Techniques:
- Simple Random Sampling: Every member of the population has an equal chance of being selected. (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Random Sampling)
- Stratified Random Sampling: The population is divided into groups/strata and then simple random sampling is carried out within each stratum (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Stratified Random Sampling)
- Cluster Random Sampling: Instead of selecting individual elements randomly, groups or clusters of individuals are randomly selected from the population (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Cluster Sampling)
- Systematic Random Sampling: Selecting individuals at regular intervals from a predetermined starting point (Refer NCERT Class 11, Chapter 15, Statistics (Part 2) - Systematic Sampling)
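The four sampling techniques above can be sketched with Python's standard-library `random` module. The population (IDs 1 to 100), the two strata, and the cluster size of 10 are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical population of 100 unit IDs

# Simple random sampling: every unit has an equal chance of selection
simple = rng.sample(population, 10)

# Stratified random sampling: simple random sampling within each stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = [unit for members in strata.values() for unit in rng.sample(members, 5)]

# Cluster sampling: randomly select whole clusters, then keep every unit in them
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

# Systematic sampling: random start, then every k-th unit
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

print(simple, stratified, cluster_sample, systematic, sep="\n")
```

Stratified sampling guarantees that each stratum is represented, whereas cluster sampling may miss large parts of the population but is usually cheaper to carry out.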
Hypothesis Testing:
- Null Hypothesis (H0) and Alternative Hypothesis (H1):
- H0 states that there is no effect or no significant difference (for example, between two populations), while H1 states that such an effect or difference exists. (Refer NCERT Class 12, Chapter 15, Probability)
- Type I and Type II Errors:
- Type I Error (rejecting H0 when it is true) and Type II Error (failing to reject H0 when it is false).
- Level of Significance:
- The probability of rejecting the null hypothesis when it is actually true, that is, the probability of a Type I Error (commonly 0.05 or 0.01).
- Critical Value:
- The value that separates the rejection region from the acceptance region in hypothesis testing.
- P-value:
- The probability of obtaining a test statistic as extreme as or more extreme than the observed sample result, assuming that the null hypothesis is true. (Refer NCERT Class 12, Chapter 15, Probability)
- One-sample Z-test:
- Used to test whether the population mean is equal to a specified value, based on the sample mean, when the population variance is known.
- Two-sample Z-test:
- Used to test whether the means of two independent normally distributed populations are equal when population variances are known.
- Chi-square test:
- Used to determine if there is a significant difference between observed frequencies and expected frequencies of categories. (Refer NCERT Class 12, Chapter 15, Probability)
- Student’s t-test:
- Used when population variance is unknown, to test whether the mean of a population is equal to a specified value and to compare the means of two populations.
- One-way ANOVA:
- Used to compare means of more than two independent groups/populations.
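As a worked example of the terms above (test statistic, critical value, p-value, level of significance), here is a minimal sketch of a two-sided one-sample Z-test using `statistics.NormalDist` from Python's standard library. All numbers are hypothetical, and the population standard deviation is assumed to be known.

```python
from statistics import NormalDist

# Hypothetical inputs (illustrative only)
sample_mean = 52.0  # observed sample mean
mu_0 = 50.0         # value of the population mean under H0
sigma = 10.0        # known population standard deviation
n = 100             # sample size
alpha = 0.05        # level of significance

standard_error = sigma / n ** 0.5
z = (sample_mean - mu_0) / standard_error  # test statistic

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value = 2 * (1 - standard_normal.cdf(abs(z)))

# Critical value separating the rejection region from the non-rejection region
critical_value = standard_normal.inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > critical_value  # equivalent to p_value < alpha

print(z, p_value, critical_value, reject_h0)
```

With these numbers the test statistic falls just inside the rejection region, so H0 is rejected at the 5% level but would not be rejected at the 1% level.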
Confidence Intervals:
- Confidence Interval for Mean:
- Defined as a range of values, computed from the sample, that is expected to contain the true population mean with a stated confidence level (for example, 95%). (Refer NCERT Class 12, Chapter 15, Probability)
- Confidence Interval for Proportion: (Refer NCERT Class 12, Chapter 15, Probability)
- Confidence Interval for Variance:
- Defined as a range of values that contains the true variance of the population with a specified level of confidence.
- Confidence Interval for Regression Slope:
- Defined as a range of values that contains the true slope of the regression line with a specific confidence level.
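Two of the intervals above can be computed with the normal distribution alone. The sketch below assumes a known population standard deviation for the mean and uses the simple normal-approximation (Wald) interval for the proportion; all input numbers are hypothetical. Intervals for the variance and the regression slope need the chi-square and t distributions, which are not in Python's standard library, so they are left out here.

```python
from statistics import NormalDist

confidence = 0.95
# z multiplier for a two-sided interval (about 1.96 for 95% confidence)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Confidence interval for the mean, population standard deviation known
sample_mean = 52.0  # hypothetical sample mean
sigma = 10.0        # hypothetical known population standard deviation
n = 100             # sample size
margin_mean = z_star * sigma / n ** 0.5
mean_interval = (sample_mean - margin_mean, sample_mean + margin_mean)

# Confidence interval for a proportion (normal approximation)
successes = 60      # hypothetical number of successes
trials = 200        # hypothetical number of trials
p_hat = successes / trials
margin_prop = z_star * (p_hat * (1 - p_hat) / trials) ** 0.5
proportion_interval = (p_hat - margin_prop, p_hat + margin_prop)

print(mean_interval, proportion_interval)
```

A higher confidence level gives a larger multiplier and therefore a wider interval, while a larger sample size narrows it.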
Non-Parametric Tests:
- Sign Test:
- Used to compare two matched samples when data is not normally distributed. (Refer NCERT Class 12, Chapter 14, Mathematical Reasoning)
- Wilcoxon Signed-rank Test: