Central tendency and variability

Author

Qingyao Zhang

Published

May 9, 2026

data("well", package = "Keng")

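1 Central tendency

Three statistics describe central tendency: the mode, the median, and the arithmetic mean.

1.1 Mode

The mode is the value with the highest frequency. Take the item flourish101 as an example: by tabulating frequencies or drawing a bar chart, we can readily see that 6 is the most common score on flourish101, so its mode is 6.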
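The frequency count described above can be sketched in base R with table(). The vector below holds only the six flourish101 responses listed in the row-means table of section 1.3, so it illustrates the technique rather than reproducing the full-sample result. The helper name stat_mode is hypothetical.

```r
# Six flourish101 responses copied from the row-means table in section 1.3
# (an illustrative subset, not the full sample)
flourish101 <- c(7, 4, 6, 7, 6, 6)

# Frequency of each distinct score
freq <- table(flourish101)
freq

# Hypothetical helper: return the most frequent value(s);
# ties yield more than one mode
stat_mode <- function(x) {
  freq <- table(x)
  as.numeric(names(freq)[freq == max(freq)])
}

stat_mode(flourish101)
```

A bar chart of the same counts could be drawn with barplot(freq).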

1.2 Median

Percentiles

# The 50th percentile, i.e., the median
quantile(well$flourish1, probs = 0.5)
##   50% 
## 5.375
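As a quick check that the 50th percentile and the median coincide, the sketch below compares quantile() with median(). It uses the first six row means printed in section 1.3 as an illustrative subset, so the value differs from the full-sample result.

```r
# First six row means printed in section 1.3 (illustrative subset)
x <- c(5.750, 4.375, 5.625, 6.375, 6.500, 5.875)

# The 50th percentile from quantile() (default type 7) ...
p50 <- quantile(x, probs = 0.5)

# ... agrees with median(); with an even number of values both
# average the two middle observations
med <- median(x)

unname(p50)
med
```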

1.3 Mean

\[ M = \frac{\sum{x_i}}{n} \]

Here \(n\) is the number of observations.

# Typical level of well-being
mean(well$flourish1, na.rm = TRUE)
## [1] 5.32125
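The role of na.rm = TRUE is easiest to see on a vector that actually contains a missing value. The sketch below takes the first six row means printed further down and appends one hypothetical NA. It also recomputes the mean as the sum of the observed values divided by their count.

```r
# First six row means printed in section 1.3, plus one hypothetical
# missing response (NA) added for illustration
x <- c(5.750, 4.375, 5.625, 6.375, 6.500, 5.875, NA)

# Without na.rm, a single NA makes the mean NA
mean(x)

# With na.rm = TRUE the NA is dropped before averaging
m <- mean(x, na.rm = TRUE)

# Same result from the formula: sum of observed values / number observed
observed <- x[!is.na(x)]
m_formula <- sum(observed) / length(observed)

m
m_formula
```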

Row means and column means of a data frame, using eudaimonic well-being at the first time point as an example

rownames	flourish101	flourish102	flourish103	flourish104	flourish105	flourish106	flourish107	flourish108	rowMeans
1	7	6	6	7	6	4	6	4	?
2	4	2	4	6	7	4	4	4	?
3	6	4	6	6	5	6	6	6	?
4	7	6	4	7	7	7	7	6	?
5	6	5	6	7	7	7	7	7	?
6	6	5	6	6	6	6	6	6	?
colMeans	?	?	?	?	?	?	?	?	?
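The cells marked ? can be filled in with rowMeans() and colMeans(). The sketch below re-enters the six rows of the table by hand, so it runs without the well data. The object name items is hypothetical. The row means should match the first six values of well$flourish1_1 printed further down.

```r
# The six rows shown in the table above, entered by hand
items <- data.frame(
  flourish101 = c(7, 4, 6, 7, 6, 6),
  flourish102 = c(6, 2, 4, 6, 5, 5),
  flourish103 = c(6, 4, 6, 4, 6, 6),
  flourish104 = c(7, 6, 6, 7, 7, 6),
  flourish105 = c(6, 7, 5, 7, 7, 6),
  flourish106 = c(4, 4, 6, 7, 7, 6),
  flourish107 = c(6, 4, 6, 7, 7, 6),
  flourish108 = c(4, 4, 6, 6, 7, 6)
)

# One mean per respondent (row): the scale score
row_means <- rowMeans(items, na.rm = TRUE)

# One mean per item (column)
col_means <- colMeans(items, na.rm = TRUE)

row_means
col_means

# The bottom-right cell: the mean of all 48 responses
grand_mean <- mean(as.matrix(items))
grand_mean
```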

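# Compute dimension or scale scores
# Compute the flourish score at the first time point
# Inspect the variable names
names(well)

# Row means selected by column index; na.rm = TRUE drops missing values before averaging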
well$flourish1_1 <- rowMeans(well[287:294], na.rm = TRUE)
well$flourish1_1
##   [1] 5.750 4.375 5.625 6.375 6.500 5.875 4.000 7.000 4.375 5.625 5.750 5.125
##  [13] 3.375 4.875 3.250 4.000 6.500 6.500 5.250 5.125 6.500 4.625 2.875 4.125
##  [25] 5.750 4.750 5.500 5.625 5.625 5.250 5.625 5.125 6.000 5.000 4.625 5.500
##  [37] 5.000 4.250 4.625 7.000 4.375 5.125 5.500 4.500 6.625 5.000 6.000 6.125
##  [49] 4.875 4.375 5.000 4.750 5.000 5.750 4.625 3.000 1.750 4.500 6.250 5.000
##  [61] 5.500 4.750 4.875 3.625 4.500 4.375 6.125 5.250 4.500 4.125 6.000 4.500
##  [73] 5.375 5.625 6.000 6.000 5.750 4.000 5.875 6.000 6.125 5.875 5.750 6.000
##  [85] 5.375 6.375 4.125 4.750 6.750 5.500 5.875 6.000 5.750 5.125 4.125 4.875
##  [97] 5.000 3.500 5.500 6.250 4.875 5.125 4.625 4.125 3.000 5.000 6.000 4.625
## [109] 4.250 5.375 7.000 5.625 6.000 5.875 7.000 6.750 7.000 6.500 4.625 5.250
## [121] 4.875 6.000 5.000 6.000 6.000 5.750 6.625 5.250 7.000 4.125 4.000 6.500
## [133] 6.875 6.125 7.000 4.750 5.000 6.250 5.625 6.000 5.375 7.000 5.750 5.250
## [145] 6.000 4.250 5.125 5.500 5.625 5.750 5.250 5.375 5.750 5.750 5.000 4.875
## [157] 4.375 4.375 7.000 5.875 4.500 3.875 5.875 6.250 6.750 5.875 6.625 5.875
## [169] 4.000 6.125 5.375 4.625 6.000 4.625 4.875 5.375 5.750 6.500 5.000 6.500
## [181] 4.875 4.500 5.000 5.000 4.500 6.000 7.000 5.875 5.625 4.625 4.500 6.625
## [193] 5.625 5.375 4.250 5.750 5.000 4.375 4.625 4.000

# Row means selected by column name; na.rm = TRUE drops missing values before averaging
flourish_items <- c("flourish101","flourish102","flourish103","flourish104",
                    "flourish105","flourish106","flourish107","flourish108")

well$flourish1_2 <- rowMeans(well[flourish_items], na.rm = TRUE)
well$flourish1_2
##   [1] 5.750 4.375 5.625 6.375 6.500 5.875 4.000 7.000 4.375 5.625 5.750 5.125
##  [13] 3.375 4.875 3.250 4.000 6.500 6.500 5.250 5.125 6.500 4.625 2.875 4.125
##  [25] 5.750 4.750 5.500 5.625 5.625 5.250 5.625 5.125 6.000 5.000 4.625 5.500
##  [37] 5.000 4.250 4.625 7.000 4.375 5.125 5.500 4.500 6.625 5.000 6.000 6.125
##  [49] 4.875 4.375 5.000 4.750 5.000 5.750 4.625 3.000 1.750 4.500 6.250 5.000
##  [61] 5.500 4.750 4.875 3.625 4.500 4.375 6.125 5.250 4.500 4.125 6.000 4.500
##  [73] 5.375 5.625 6.000 6.000 5.750 4.000 5.875 6.000 6.125 5.875 5.750 6.000
##  [85] 5.375 6.375 4.125 4.750 6.750 5.500 5.875 6.000 5.750 5.125 4.125 4.875
##  [97] 5.000 3.500 5.500 6.250 4.875 5.125 4.625 4.125 3.000 5.000 6.000 4.625
## [109] 4.250 5.375 7.000 5.625 6.000 5.875 7.000 6.750 7.000 6.500 4.625 5.250
## [121] 4.875 6.000 5.000 6.000 6.000 5.750 6.625 5.250 7.000 4.125 4.000 6.500
## [133] 6.875 6.125 7.000 4.750 5.000 6.250 5.625 6.000 5.375 7.000 5.750 5.250
## [145] 6.000 4.250 5.125 5.500 5.625 5.750 5.250 5.375 5.750 5.750 5.000 4.875
## [157] 4.375 4.375 7.000 5.875 4.500 3.875 5.875 6.250 6.750 5.875 6.625 5.875
## [169] 4.000 6.125 5.375 4.625 6.000 4.625 4.875 5.375 5.750 6.500 5.000 6.500
## [181] 4.875 4.500 5.000 5.000 4.500 6.000 7.000 5.875 5.625 4.625 4.500 6.625
## [193] 5.625 5.375 4.250 5.750 5.000 4.375 4.625 4.000

# Row means selected by column name
# Generate the item names with paste0()
paste0("flourish", 101:108)
## [1] "flourish101" "flourish102" "flourish103" "flourish104" "flourish105"
## [6] "flourish106" "flourish107" "flourish108"
well$flourish1_3 <- rowMeans(well[paste0("flourish", 101:108)], na.rm = TRUE)
well$flourish1_3
##   [1] 5.750 4.375 5.625 6.375 6.500 5.875 4.000 7.000 4.375 5.625 5.750 5.125
##  [13] 3.375 4.875 3.250 4.000 6.500 6.500 5.250 5.125 6.500 4.625 2.875 4.125
##  [25] 5.750 4.750 5.500 5.625 5.625 5.250 5.625 5.125 6.000 5.000 4.625 5.500
##  [37] 5.000 4.250 4.625 7.000 4.375 5.125 5.500 4.500 6.625 5.000 6.000 6.125
##  [49] 4.875 4.375 5.000 4.750 5.000 5.750 4.625 3.000 1.750 4.500 6.250 5.000
##  [61] 5.500 4.750 4.875 3.625 4.500 4.375 6.125 5.250 4.500 4.125 6.000 4.500
##  [73] 5.375 5.625 6.000 6.000 5.750 4.000 5.875 6.000 6.125 5.875 5.750 6.000
##  [85] 5.375 6.375 4.125 4.750 6.750 5.500 5.875 6.000 5.750 5.125 4.125 4.875
##  [97] 5.000 3.500 5.500 6.250 4.875 5.125 4.625 4.125 3.000 5.000 6.000 4.625
## [109] 4.250 5.375 7.000 5.625 6.000 5.875 7.000 6.750 7.000 6.500 4.625 5.250
## [121] 4.875 6.000 5.000 6.000 6.000 5.750 6.625 5.250 7.000 4.125 4.000 6.500
## [133] 6.875 6.125 7.000 4.750 5.000 6.250 5.625 6.000 5.375 7.000 5.750 5.250
## [145] 6.000 4.250 5.125 5.500 5.625 5.750 5.250 5.375 5.750 5.750 5.000 4.875
## [157] 4.375 4.375 7.000 5.875 4.500 3.875 5.875 6.250 6.750 5.875 6.625 5.875
## [169] 4.000 6.125 5.375 4.625 6.000 4.625 4.875 5.375 5.750 6.500 5.000 6.500
## [181] 4.875 4.500 5.000 5.000 4.500 6.000 7.000 5.875 5.625 4.625 4.500 6.625
## [193] 5.625 5.375 4.250 5.750 5.000 4.375 4.625 4.000

# Check that the three results agree
all(well$flourish1_1 == well$flourish1_2)
## [1] TRUE
all(well$flourish1_1 == well$flourish1_3)
## [1] TRUE
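The == comparison returns TRUE here, but exact comparison of doubles can fail when two results differ only by rounding error, and all() returns NA if any compared element is missing. A tolerance-based check with all.equal() is a common alternative. The sketch below uses made-up vectors a and b rather than the well data.

```r
# Hypothetical vectors that differ only by floating-point rounding
a <- c(0.1 + 0.2, 0.5)
b <- c(0.3, 0.5)

# Exact comparison fails because 0.1 + 0.2 is not stored exactly as 0.3
exact <- all(a == b)

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a character description when the inputs differ
close_enough <- isTRUE(all.equal(a, b))

exact
close_enough
```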

2 Variability

2.1 Range

\[ max - min \]

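# Range of flourish1
# range() returns the minimum and maximum; diff() gives max - min
range(well$flourish1)
## [1] 1.75 7.00
diff(range(well$flourish1))
## [1] 5.25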

Quartiles

2.2 Interquartile range

\[ IQR = Q_{75} - Q_{25} \]

# Interquartile range of flourish1
Q25 <- quantile(well$flourish1, 0.25)
Q75 <- quantile(well$flourish1, 0.75)
as.numeric(Q75 - Q25)
## [1] 1.375
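Base R also provides IQR(), which computes the same difference in one call. The sketch below checks the two approaches against each other on the first six row means printed in section 1.3, an illustrative subset rather than the full sample.

```r
# First six row means printed in section 1.3 (illustrative subset)
x <- c(5.750, 4.375, 5.625, 6.375, 6.500, 5.875)

# Manual version: 75th percentile minus 25th percentile
q <- quantile(x, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

# Built-in version; IQR() uses the same default quantile type (7)
iqr_builtin <- IQR(x)

iqr_manual
iqr_builtin
```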

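2.3 Variance

\[\begin{align} SS &= \sum{(x_i-M_x)^2} \\ s^2_{biased} &= \frac{SS}{n} \\ s^2_{unbiased} &= \frac{SS}{n-1} \end{align}\]

In the formulas above, SS denotes the sum of squares, \(M_x\) the mean of \(x\), and \(n\) the number of observations. \(s^2_{biased}\) is the biased variance and \(s^2_{unbiased}\) is the unbiased variance.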

# Variance of flourish1
# Biased variance from the formula (n = 200)
sum((well$flourish1 - mean(well$flourish1))^2)/200
## [1] 0.8582047
# Unbiased variance from the formula
sum((well$flourish1 - mean(well$flourish1))^2)/(200-1)
## [1] 0.8625173
# Unbiased variance with the function var()
var(well$flourish1)
## [1] 0.8625173
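Because the two variances share the same sum of squares, the biased version can also be obtained from var() by rescaling with (n - 1)/n, which avoids hard-coding the sample size. This is a minimal sketch: the helper name var_biased is hypothetical, and the data are the first six row means from section 1.3.

```r
# First six row means printed in section 1.3 (illustrative subset)
x <- c(5.750, 4.375, 5.625, 6.375, 6.500, 5.875)

# Hypothetical helper: biased (divide-by-n) variance, ignoring NAs
var_biased <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^2) / n
}

n <- length(x)

# var() divides by n - 1, so rescaling by (n - 1) / n gives the biased variance
v_unbiased <- var(x)
v_biased <- var_biased(x)

v_biased
v_unbiased * (n - 1) / n
```

Taking sqrt() of either quantity gives the corresponding standard deviation.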

2.4 Standard deviation

\[\begin{align} s_{biased} &= \sqrt{s^2_{biased}} \\ s_{unbiased} &= \sqrt{s^2_{unbiased}} \end{align}\]

\(s_{biased}\) is the biased standard deviation and \(s_{unbiased}\) is the unbiased standard deviation.

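# Standard deviation of flourish1
# Biased standard deviation, from the biased-variance formula
sqrt(sum((well$flourish1 - mean(well$flourish1))^2)/200)
## [1] 0.9263934
# Unbiased standard deviation, from the unbiased-variance formula
sqrt(sum((well$flourish1 - mean(well$flourish1))^2)/(200-1))
## [1] 0.9287181
# Unbiased standard deviation with the function sd()
sd(well$flourish1)
## [1] 0.9287181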

3 Descriptive statistics for multiple variables

# Descriptive statistics with describe() from the psych package
library(psych)
describe(well[347:349])
##           vars   n mean   sd median trimmed  mad  min max range  skew kurtosis
## flourish1    1 200 5.32 0.93   5.38    5.34 0.93 1.75   7  5.25 -0.40     0.36
## flourish2    2 200 5.32 1.09   5.38    5.36 0.93 1.00   7  6.00 -0.81     1.54
## flourish3    3 200 5.09 0.97   5.00    5.07 1.11 1.38   7  5.62 -0.15     0.23
##             se
## flourish1 0.07
## flourish2 0.08
## flourish3 0.07
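For readers without psych installed, a few of the columns above can be approximated in base R. The sketch below is not how describe() is implemented. The function name describe_basic and the toy data frame are hypothetical, and only n, mean, sd, median and se (sd divided by the square root of n) are reproduced.

```r
# Hypothetical toy data; with the real data one could pass
# e.g. well[c("flourish1", "flourish2", "flourish3")] instead
scores <- data.frame(
  a = c(5.750, 4.375, 5.625, 6.375, 6.500, 5.875),
  b = c(4.000, 7.000, 4.375, 5.625, 5.750, NA)
)

# Hypothetical helper: one row of summary statistics per numeric column,
# computed on the non-missing values of each column
describe_basic <- function(df) {
  stats <- lapply(df, function(x) {
    x <- x[!is.na(x)]
    n <- length(x)
    c(n = n, mean = mean(x), sd = sd(x), median = median(x),
      se = sd(x) / sqrt(n))
  })
  as.data.frame(do.call(rbind, stats))
}

res <- describe_basic(scores)
round(res, 2)
```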