Grouped variance:

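Suppose there are \(c\) groups of samples and only the per-group statistics are known: for group \(i\), the sample size \(n_i\geqslant 2\), the sample mean \(\overline{X_i}=\dfrac{1}{n_i}\sum\limits_{j=1}^{n_i} X_{ij}\), and the sample variance \(S_i^2=\dfrac{1}{n_i-1}\sum\limits_{j=1}^{n_i}(X_{ij}-\overline{X_i})^2\). When these \(c\) groups are merged into one combined sample, the statistics of the combined sample can be computed from the per-group statistics: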

\[ \begin{align} n&=\sum_{i=1}^{c}{n_i},\\ \overline{X}&=\frac{\sum\limits_{i=1}^{c}{n_i\overline{X_i}}}{n},\\ S^2&=\frac{\sum\limits_{i=1}^{c}{\left[(n_i-1)S_i^2+n_i\overline{X_i}^2\right]}-n\overline{X}^2}{n-1}\\ S^2&=\frac{n}{n-1}\left(\overline{X^2} - \overline{X}^2\right) \end{align} \]

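where \(\overline{X^2}:=\dfrac{1}{n}\sum\limits_{i=1}^{c}\sum\limits_{j=1}^{n_i} X_{ij}^2\) is the mean of the squared observations over the combined sample.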
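The merging formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `merge_group_stats`, the tuple layout `(n_i, mean_i, var_i)`, and the sample data are all assumptions made for the example.

```python
from statistics import mean, variance

def merge_group_stats(groups):
    """Combine per-group (n_i, mean_i, var_i) triples into (n, mean, var).

    var_i and the returned var are sample variances (divisor n - 1).
    """
    n = sum(n_i for n_i, _, _ in groups)
    total_mean = sum(n_i * m_i for n_i, m_i, _ in groups) / n
    # (n_i - 1) * S_i^2 + n_i * mean_i^2 is the sum of squared observations in group i.
    sum_sq = sum((n_i - 1) * v_i + n_i * m_i ** 2 for n_i, m_i, v_i in groups)
    total_var = (sum_sq - n * total_mean ** 2) / (n - 1)
    return n, total_mean, total_var

# Hypothetical raw data, used only to check the merged result against a direct computation.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 6.0, 8.0, 10.0]
stats = [(len(g), mean(g), variance(g)) for g in (group_a, group_b)]
n, m, v = merge_group_stats(stats)
```

With these inputs, the merged values should agree with `mean` and `variance` applied to the concatenated data, up to floating-point rounding.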

In addition, the above relies on a formula that is commonly used in mathematical statistics: \[ {\color{gray} \sum_{i=1}^{n}{(x_i-\overline{x})^2} =\sum_{i=1}^{n}\left(x_i^2 - \overline{x}^2 \right) =\left(\sum_{i=1}^{n} x_i^2\right) - \left(n\overline{x}^2\right) =n\left(\overline{x^2} - \overline{x}^2 \right) } \]
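As a sketch of how this identity leads to the merged variance: applying it inside each group gives the sum of squares of that group, and applying it to the combined sample (with the double index \(X_{ij}\) from the definitions above) gives the merged formula.

\[ \begin{align} (n_i-1)S_i^2 &= \sum_{j=1}^{n_i} X_{ij}^2 - n_i\overline{X_i}^2, \notag \\ (n-1)S^2 &= \sum_{i=1}^{c}\sum_{j=1}^{n_i} X_{ij}^2 - n\overline{X}^2 = \sum_{i=1}^{c}\left[(n_i-1)S_i^2 + n_i\overline{X_i}^2\right] - n\overline{X}^2. \notag \end{align} \]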

To make them easier to remember, the relationships among these statistics can be written in the following symmetric form:

\[ \begin{align} n&=\sum_{i=1}^{c}{n_i},\\ n\overline{X}&=\sum\limits_{i=1}^{c}{n_i\overline{X_i}},\\ (n-1)S^2+n\overline{X}^2&=\sum\limits_{i=1}^{c}{\left[(n_i-1)S_i^2+n_i\overline{X_i}^2\right]} \end{align} \]

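Note: although the expression below is mathematically guaranteed to be non-negative, in practice, when the variance happens to be very small, rounding error can make the computed result negative. A safe workaround in that case is to take the absolute value.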

\[ S^2=\frac{n}{n-1}\left(\overline{X^2} - \overline{X}^2\right) \notag \]

\[ S^2=\frac{n}{n-1}\left|\overline{X^2} - \overline{X}^2\right| \notag \]
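A minimal Python sketch of the guarded formula follows. The function name `sample_variance_from_moments` is hypothetical, and the code assumes the input is a list of at least two floats.

```python
def sample_variance_from_moments(values):
    """Sample variance computed as n/(n-1) * |mean(x^2) - mean(x)^2|."""
    n = len(values)
    mean_x = sum(values) / n
    mean_x2 = sum(x * x for x in values) / n
    diff = mean_x2 - mean_x ** 2
    # Rounding error can push diff slightly below zero when the variance is tiny;
    # taking the absolute value keeps the result non-negative.
    return n / (n - 1) * abs(diff)

# Hypothetical inputs: one ordinary case and one with zero variance around a large mean.
ordinary = sample_variance_from_moments([1.0, 2.0, 3.0, 4.0])
near_constant = sample_variance_from_moments([1e8 + 0.1] * 5)
```

Clamping with `max(diff, 0.0)` is another common choice; the absolute value is used here only because it matches the formula above.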

Covariance

\[ \begin{align} \mathrm{COV}(X, Y) & = \mathrm{E}[(X - \mathrm{E}(X))(Y - \mathrm{E}(Y))] \notag \\ & = \mathrm{E}(XY) - \mathrm{E}(X) \mathrm{E}(Y) \notag \end{align} \]

\[ \begin{align} \mathrm{COV}(X, Y) & = \dfrac{n}{n-1} \overline{(X - \overline{X})(Y - \overline{Y})} \notag \\ & = \dfrac{n}{n-1} (\overline{XY} - \overline{X} \cdot \overline{Y}) \notag \end{align} \]

-- population covariance
covar_pop(x, y) = avg(x * y) - avg(x) * avg(y)
-- sample covariance
covar_samp(x, y) = (avg(x * y) - avg(x) * avg(y)) * count(1) / (count(1) - 1)
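The two identities above can be checked with a small Python sketch. The helper `avg` and the functions `covar_pop` and `covar_samp` here are illustrative stand-ins that mirror the names in the snippet, not the SQL built-ins themselves, and the data is made up.

```python
def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def covar_pop(xs, ys):
    """Population covariance: avg(x * y) - avg(x) * avg(y)."""
    return avg([x * y for x, y in zip(xs, ys)]) - avg(xs) * avg(ys)

def covar_samp(xs, ys):
    """Sample covariance: population covariance scaled by n / (n - 1)."""
    n = len(xs)
    return covar_pop(xs, ys) * n / (n - 1)

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
pop = covar_pop(xs, ys)    # mathematically 2.75
samp = covar_samp(xs, ys)  # mathematically 11/3
```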

\[ \begin{align} \mathrm{COV}(X, Y) &= \dfrac{n}{n-1} \left(\dfrac{1}{n}\sum_{i=1}^c n_i\overline{X_i Y_i} - \dfrac{1}{n}\sum_{i=1}^c n_i\overline{X_i} \cdot \dfrac{1}{n}\sum_{i=1}^c n_i\overline{Y_i}\right) \notag \\ &= \dfrac{1}{n-1} \left(\sum_{i=1}^c n_i\overline{X_i Y_i} - \dfrac{1}{n}\sum_{i=1}^c n_i\overline{X_i} \cdot \sum_{i=1}^c n_i\overline{Y_i}\right) \notag \end{align} \]
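The grouped covariance formula can be sketched the same way. This is an illustration under assumptions: each group is summarized as a hypothetical tuple `(n_i, mean_x_i, mean_y_i, mean_xy_i)`, and the names `summarize` and `merge_group_covariance` are invented for the example.

```python
def summarize(xs, ys):
    """Per-group summary: (n_i, mean of x, mean of y, mean of x * y)."""
    n = len(xs)
    return (n, sum(xs) / n, sum(ys) / n, sum(x * y for x, y in zip(xs, ys)) / n)

def merge_group_covariance(groups):
    """Sample covariance of the combined sample from per-group summaries."""
    n = sum(n_i for n_i, _, _, _ in groups)
    sum_x = sum(n_i * mx for n_i, mx, _, _ in groups)
    sum_y = sum(n_i * my for n_i, _, my, _ in groups)
    sum_xy = sum(n_i * mxy for n_i, _, _, mxy in groups)
    # Second line of the formula: (sum_xy - sum_x * sum_y / n) / (n - 1).
    return (sum_xy - sum_x * sum_y / n) / (n - 1)

# Hypothetical data split into two groups.
group_1 = ([1.0, 2.0], [2.0, 4.0])
group_2 = ([3.0, 4.0], [5.0, 9.0])
merged = merge_group_covariance([summarize(*group_1), summarize(*group_2)])
```

For this data the merged value should equal the sample covariance of the four pairs taken together, which is 11/3 up to rounding.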

