Common Continuous Random Variables

5.3. Common Continuous Random Variables

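5.3.1. Uniform Distribution

Uniform distribution: Let \(X\) be a random variable taking values in the interval \((a,b)\), where \(a<b\) are two parameters. If the probability density function of \(X\) is

\[ p(x) = \frac{1}{b-a}, \quad a< x<b \]

(and \(p(x)=0\) elsewhere), then \(X\) is said to follow the uniform distribution, written \(X\sim U(a,b)\).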

import numpy as np
import matplotlib.pyplot as plt


def prompt_float(message: str) -> float:
    while True:
        try:
            return float(input(message))
        except ValueError:
            print("Please enter a valid number.")


def read_parameters():
    print("Uniform Distribution Visualizer")
    while True:
        a = prompt_float("Enter the lower bound a (may be negative, e.g. 0): ")
        b = prompt_float("Enter the upper bound b (must be greater than a, e.g. 1): ")
        if b <= a:
            print("b must be greater than a; please try again.\n")
            continue
        return a, b


def plot_uniform_distribution(a: float, b: float) -> None:
    width = b - a
    margin = max(width * 0.1, 1.0)
    x_min = a - margin
    x_max = b + margin
    x = np.linspace(x_min, x_max, 1000)
    pdf = np.zeros_like(x)
    mask = (x >= a) & (x <= b)
    pdf[mask] = 1 / (b - a)

    plt.figure(figsize=(10, 6))
    plt.plot(x, pdf, color="skyblue", linewidth=2)
    plt.fill_between(x, pdf, where=mask, color="skyblue", alpha=0.3)
    plt.axvline(x=a, color="gray", linestyle="--", alpha=0.7)
    plt.axvline(x=b, color="gray", linestyle="--", alpha=0.7)
    plt.xlabel("Random variable X")
    plt.ylabel("Density f(X)")
    plt.title(f"Uniform Distribution U({a}, {b})")
    plt.grid(True, alpha=0.3)
    plt.xlim(x_min, x_max)
    plt.ylim(0, max(pdf.max() * 1.2, 0.2))
    plt.tight_layout()
    plt.show()


def main():
    a, b = read_parameters()
    plot_uniform_distribution(a, b)


if __name__ == "__main__":
    main()

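5.3.2. Normal Distribution

Normal distribution: Let \(X\) be a random variable taking values in \((-\infty,\infty)\). If the probability density function of \(X\) is

\[ p(x) =\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left\{-\frac{1}{2\sigma^2} (x-\mu)^2\right\}, \quad x \in R \]

then \(X\) is said to follow the normal distribution, written \(X\sim N(\mu,\sigma^2)\), with parameters \(\mu \in R\) and \(\sigma^2 >0\).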

Remark 5.3

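The normal distribution was first obtained by the French mathematician Abraham de Moivre as an approximation to the binomial distribution, and was later derived by the German mathematician Carl Friedrich Gauss in his study of measurement errors. Because of the lasting influence of Gauss's work, the normal distribution is also called the Gaussian distribution.
The density function \(p(x)\) is a bell-shaped curve: high in the middle, low in the tails, and symmetric about \(x=\mu\).
The two parameters \(\mu\) and \(\sigma^2\) determine the location and the shape of the density: \(\mu\) is called the location parameter and \(\sigma^2\) (through \(\sigma\)) the scale parameter.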

import numpy as np
import matplotlib.pyplot as plt


def prompt_float(message: str) -> float:
    while True:
        try:
            return float(input(message))
        except ValueError:
            print("Please enter a valid number.")


def prompt_positive_float(message: str) -> float:
    while True:
        try:
            value = float(input(message))
        except ValueError:
            print("Please enter a valid number.")
            continue
        if value <= 0:
            print("The parameter must be greater than 0.")
            continue
        return value


def read_parameters():
    print("Normal Distribution Visualizer")
    mean = prompt_float("Enter the mean μ (e.g. 0): ")
    variance = prompt_positive_float("Enter the variance σ^2 (> 0, e.g. 1): ")
    return mean, variance


def plot_normal_distribution(mean: float, variance: float) -> None:
    std_dev = np.sqrt(variance)
    span = max(1.0, 4 * std_dev)  # cover most of the density even when mean/variance shift
    x_min = mean - span
    x_max = mean + span
    x = np.linspace(x_min, x_max, 1000)
    y = (1 / (np.sqrt(2 * np.pi * variance))) * np.exp(-(x - mean) ** 2 / (2 * variance))

    plt.figure(figsize=(10, 6))
    plt.plot(x, y, color="skyblue", linewidth=2)
    plt.xlabel("Random variable X")
    plt.ylabel("Density f(X)")
    plt.title(f"Normal Distribution: mean={mean}, variance={variance}")
    plt.grid(True, alpha=0.3)
    plt.xlim(x_min, x_max)
    plt.ylim(0, max(0.1, y.max() * 1.2))
    plt.tight_layout()
    plt.show()


def main():
    mean, variance = read_parameters()
    plot_normal_distribution(mean, variance)


if __name__ == "__main__":
    main()

It is natural to collect all normal distributions into a single family:

\[ \mathcal{P} = \{N(\mu,\sigma^2):\mu \in R, \sigma^2 >0\} \]

One member of this family is especially important: the standard normal distribution, with \(\mu = 0\) and \(\sigma^2 = 1\). We discuss it in detail below.

Remark 5.4

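The density function of the standard normal distribution is:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}z^2\right\} \]

The distribution function of the standard normal distribution is:

\[ \Phi(z) = \int_{-\infty}^z \phi(x)\text{d} x \]

Commonly used formulas for standard normal probabilities, where \(Z\sim N(0,1)\):

a. \(\Phi (-z)=P(Z\le -z)=P(Z\ge z)=1-\Phi(z)\)
b. \(P(Z>z)=1-\Phi(z)\)
c. \(P(a<Z<b)=\Phi(b)-\Phi(a)\)
d. \(P(\left | Z \right | <c)=2\Phi(c)-1, \quad c\ge 0\)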
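Formulas a, c and d can be checked numerically. The sketch below is only an illustration: it assumes the standard identity \(\Phi(z)=\frac{1}{2}\left[1+\operatorname{erf}(z/\sqrt{2})\right]\), and the helper name `std_normal_cdf` and the sample values of `z`, `a`, `b`, `c` are chosen here for demonstration, not taken from the text.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) for Z ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Arbitrary illustration values (assumptions, not from the text).
z, a, b, c = 1.3, -0.5, 1.2, 1.96

# Formula a: Phi(-z) = 1 - Phi(z)
lhs_a = std_normal_cdf(-z)
rhs_a = 1.0 - std_normal_cdf(z)

# Formula c: P(a < Z < b) = Phi(b) - Phi(a)
prob_between = std_normal_cdf(b) - std_normal_cdf(a)

# Formula d: P(|Z| < c) = 2 * Phi(c) - 1
prob_abs = 2.0 * std_normal_cdf(c) - 1.0

print(f"Phi(-z) = {lhs_a:.6f}, 1 - Phi(z) = {rhs_a:.6f}")
print(f"P({a} < Z < {b}) = {prob_between:.4f}")
print(f"P(|Z| < {c}) = {prob_abs:.4f}")
```

Using `math.erf` keeps the example dependency-free; a statistics library could be substituted if one is already in use.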

Theorem 5.2

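If the random variable \(X\sim N(\mu,\sigma^{2})\), then \(Z=\frac{X-\mu }{\sigma } \sim N(0,1)\).

Let \(F_{X}(x)\) and \(F_{Z}(z)\) denote the distribution functions of \(X\) and \(Z\), and let \(p_{X}(x)\) and \(p_{Z}(z)\) denote their density functions.

By the definition of the distribution function (and using \(\sigma>0\)):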

\[\begin{split}\begin{aligned} F_{Z}(z) &=P(Z\le z)\\ &=P\left(\frac{X-\mu }{\sigma } \le z\right)\\ &=P(X\le \mu +\sigma z )\\ &=F_{X} (\mu +\sigma z) \end{aligned}\end{split} \tag{5.1}\]

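Since the normal distribution function is strictly increasing and differentiable everywhere, we can differentiate both sides with respect to \(z\):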

\[\begin{split}\begin{aligned} p_{Z}(z) &=\frac{\mathrm{d}}{\mathrm{d} z} F_{Z}(z)\\ &=\frac{\mathrm{d}}{\mathrm{d} z} F_{X}(\mu +\sigma z)\\ &= p_{X}(\mu +\sigma z)\cdot \sigma \\ &=\frac{1}{\sqrt{2\pi \sigma ^{2}} } \exp\left\{-\frac{1}{2\sigma ^{2}}(\mu +\sigma z -\mu )^{2} \right\}\cdot \sigma \\ &=\frac{1}{\sqrt{2\pi \sigma ^{2}} } \exp\left\{-\frac{\sigma^2 z^{2}}{2\sigma ^{2}} \right\}\cdot \sigma \\ &=\frac{\sigma}{\sqrt{2\pi \sigma ^{2}} } \exp\left\{-\frac{z^{2}}{2} \right\}\\ &=\frac{\sigma}{\sigma\sqrt{2\pi } } \exp\left\{-\frac{z^{2}}{2} \right\}\\ &=\frac{1}{\sqrt{2\pi } } \exp\left\{-\frac{z^{2}}{2} \right\} \end{aligned}\end{split} \tag{5.2}\]

It follows that:

\[ Z=\frac{X-\mu }{\sigma } \sim N(0,1) \]
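In practice the theorem means that any normal probability reduces to a standard normal one: \(P(X\le x)=\Phi\left(\frac{x-\mu}{\sigma}\right)\). A minimal sketch of this reduction follows; the helper names and the parameters \(\mu=10\), \(\sigma=2\) are hypothetical values picked purely for illustration.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) for X ~ N(mu, sigma^2), computed by standardizing x."""
    return std_normal_cdf((x - mu) / sigma)


# Hypothetical example: X ~ N(10, 4), i.e. mu = 10 and sigma = 2.
mu, sigma = 10.0, 2.0

# P(8 < X < 13) = Phi((13 - mu) / sigma) - Phi((8 - mu) / sigma) = Phi(1.5) - Phi(-1)
prob = normal_cdf(13.0, mu, sigma) - normal_cdf(8.0, mu, sigma)
print(f"P(8 < X < 13) = {prob:.4f}")
```

Only \(\Phi\) needs to be tabulated or implemented; every other normal distribution is handled by the change of variable.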

Remark 5.5

The \(3\sigma\) rule: for \(X\sim N(\mu,\sigma^2)\),

\(P(\mu -\sigma <X<\mu +\sigma )=2\Phi(1)-1\approx 0.6827\)
\(P(\mu -2\sigma <X<\mu +2\sigma )=2\Phi(2)-1\approx 0.9545\)
\(P(\mu -3\sigma <X<\mu +3\sigma )=2\Phi(3)-1\approx 0.9973\)
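The three coverage probabilities can be reproduced without a table. The following sketch assumes the identity \(\Phi(z)=\frac{1}{2}\left[1+\operatorname{erf}(z/\sqrt{2})\right]\); the names `std_normal_cdf` and `coverage` are illustrative choices.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# P(mu - k*sigma < X < mu + k*sigma) = 2 * Phi(k) - 1, independent of mu and sigma.
coverage = {k: 2.0 * std_normal_cdf(k) - 1.0 for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"P(|X - mu| < {k} sigma) = {p:.4f}")
```

Because standardization removes \(\mu\) and \(\sigma\), the same three numbers apply to every normal distribution.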

5.3.3. Exponential Distribution

Exponential distribution: Let \(X\) be a random variable with density function:

\[\begin{split} p(x)=\left\{\begin{matrix} \lambda e^{-\lambda x} ,&x\ge 0 \\ 0,&x<0 \end{matrix}\right. \end{split}\]

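Then \(X\) is said to follow the exponential distribution, written \(X\sim Exp(\lambda)\), with parameter \(\lambda>0\).

From the density function we can compute the distribution function:

\[\begin{split} F_{X} (x) = \left\{\begin{matrix} \int_{0}^{x}p(t)dt=\int_{0}^{x}\lambda e^{-\lambda t} dt= -e^{-\lambda t}\big|_{0}^{x} =1-e^{-\lambda x} ,&x\ge 0 \\ 0,&x<0 \end{matrix}\right. \end{split}\]

Like the geometric distribution, the exponential distribution is memoryless.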

Theorem 5.3 (Memorylessness of the exponential distribution)

If the random variable \(X\sim Exp(\lambda)\), then for any \(s>0\) and \(t>0\):

\[ P(X>t+s|X>s)=P(X>t) \]
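The identity follows from the survival function \(P(X>x)=1-F_X(x)=e^{-\lambda x}\): since \(\{X>t+s\}\subset\{X>s\}\), the conditional probability equals \(e^{-\lambda(t+s)}/e^{-\lambda s}=e^{-\lambda t}\). A small numerical sketch of this step is given below; the function name `exp_survival` and the values of `lambd`, `s`, `t` are arbitrary choices for illustration.

```python
import math


def exp_survival(x: float, lambd: float) -> float:
    """P(X > x) = exp(-lambd * x) for X ~ Exp(lambd) and x >= 0."""
    return math.exp(-lambd * x)


# Arbitrary illustration values (assumptions, not from the text).
lambd, s, t = 0.5, 2.0, 3.0

# P(X > t + s | X > s) = P(X > t + s) / P(X > s), because {X > t + s} implies {X > s}.
conditional = exp_survival(t + s, lambd) / exp_survival(s, lambd)
unconditional = exp_survival(t, lambd)

print(f"P(X > t+s | X > s) = {conditional:.6f}")
print(f"P(X > t)           = {unconditional:.6f}")
```

The two printed values agree up to floating-point rounding for any positive `lambd`, `s` and `t`.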

import numpy as np
import matplotlib.pyplot as plt


def prompt_positive_float(message: str) -> float:
    while True:
        try:
            value = float(input(message))
        except ValueError:
            print("Please enter a valid number.")
            continue
        if value <= 0:
            print("The parameter must be greater than 0.")
            continue
        return value


def read_parameter():
    print("Exponential Distribution Visualizer")
    lambd = prompt_positive_float("Enter the parameter λ (> 0, e.g. 1): ")
    return lambd


def plot_exponential_distribution(lambd: float) -> None:
    x_max = max(10.0, 5.0 / lambd)
    x = np.linspace(0, x_max, 1000)
    pdf = lambd * np.exp(-lambd * x)
    x_threshold = 2
    exceed_prob = np.exp(-lambd * x_threshold)
    exceed_pct = exceed_prob * 100

    plt.figure(figsize=(12, 8))
    plt.plot(x, pdf, color="skyblue", linewidth=3, label=f"PDF (λ={lambd})")

    mask = x > x_threshold
    plt.fill_between(x[mask], pdf[mask], color="lightgreen", alpha=0.5,
                     label=f"P(X > {x_threshold}) = {exceed_pct:.1f}%")

    peak_y = lambd * np.exp(-lambd * (x_threshold + 0.5))
    plt.annotate(f"{exceed_pct:.1f}%",
                 xy=(x_threshold + 0.5, peak_y),
                 xytext=(x_threshold + 1.5, peak_y + 0.2),
                 arrowprops=dict(facecolor="black", shrink=0.05, width=1.5, headwidth=8),
                 fontsize=12)

    plt.xlabel("Time interval X")
    plt.ylabel("Density f(X)")
    plt.title("Exponential Distribution and Tail Probability")
    plt.grid(True, alpha=0.3)
    plt.xlim(0, x_max)
    plt.ylim(0, max(0.5, pdf.max() * 1.2))
    plt.legend()
    plt.figtext(0.15, 0.01,
                "Notes:\n"
                "1. Memoryless property: P(X>s+t | X>s) = P(X>t).\n"
                f"2. Shaded area represents P(X > {x_threshold}).",
                fontsize=10, bbox=dict(facecolor="white", alpha=0.85))
    plt.tight_layout(rect=[0, 0.08, 1, 0.97])
    plt.show()


def main():
    lambd = read_parameter()
    plot_exponential_distribution(lambd)


if __name__ == "__main__":
    main()

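The Poisson distribution and the exponential distribution are closely related, as the following example shows.

Example 5.4

Suppose the number of failures \(N(t)\) of a device during a time interval \((0,t)\) of length \(t\) follows a Poisson distribution with parameter \(\lambda t\) (so it depends on the length \(t\)), with \(N(0)=0\). Then the time \(T\) of the first failure, measured from time \(0\), follows an exponential distribution with parameter \(\lambda\).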
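The key step is that the first failure occurs after time \(t\) exactly when no failure has occurred in \((0,t)\), so \(P(T>t)=P(N(t)=0)=e^{-\lambda t}\) and hence \(F_T(t)=1-e^{-\lambda t}\) for \(t\ge 0\). The sketch below spells this out numerically; the helper names and the rate `lambd = 0.8` are hypothetical and serve only as an illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def first_failure_cdf(t: float, lambd: float) -> float:
    """P(T <= t) = 1 - P(N(t) = 0), where N(t) ~ Poisson(lambd * t)."""
    return 1.0 - poisson_pmf(0, lambd * t)


def exp_cdf(t: float, lambd: float) -> float:
    """Distribution function of Exp(lambd) for t >= 0."""
    return 1.0 - math.exp(-lambd * t)


lambd = 0.8  # hypothetical failure rate
for t in (0.5, 1.0, 2.5):
    print(f"t={t}: via Poisson {first_failure_cdf(t, lambd):.6f}, "
          f"via Exp {exp_cdf(t, lambd):.6f}")
```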

5.3.4. Gamma Distribution

Gamma function: The function

\[ \Gamma (\alpha )=\int_{0}^{+\infty } x^{\alpha -1} e^{-x} dx,\alpha >0 \]

is called the gamma function.

Several commonly used properties follow from this definition.

Property 5.1

\(\Gamma (1 )=1, \quad \Gamma (\frac{1}{2} )=\sqrt{\pi }\);
\(\Gamma (\alpha +1 )=\alpha \Gamma(\alpha )\); in particular, \(\Gamma (n+1 )=n\Gamma(n)=n!\) for every positive integer \(n\).
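These properties are easy to confirm numerically with the standard library function `math.gamma`. The snippet below is a quick sanity check rather than a proof; the test value `alpha = 3.7` is an arbitrary choice.

```python
import math

# Gamma(1) = 1 and Gamma(1/2) = sqrt(pi)
gamma_one = math.gamma(1.0)
gamma_half = math.gamma(0.5)

# Recurrence Gamma(alpha + 1) = alpha * Gamma(alpha), checked at an arbitrary alpha
alpha = 3.7
recurrence_gap = math.gamma(alpha + 1.0) - alpha * math.gamma(alpha)

# Gamma(n + 1) = n! for positive integers n
factorial_pairs = [(n, math.gamma(n + 1), math.factorial(n)) for n in range(1, 8)]

print(f"Gamma(1) = {gamma_one}, Gamma(1/2) = {gamma_half:.6f}, sqrt(pi) = {math.sqrt(math.pi):.6f}")
print(f"Gamma(alpha + 1) - alpha * Gamma(alpha) = {recurrence_gap:.2e}")
for n, g, f in factorial_pairs:
    print(f"n={n}: Gamma(n+1) = {g:.1f}, n! = {f}")
```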

Based on the gamma function, we define the gamma distribution.

Gamma distribution: Let \(X\) be a random variable with density function:

\[\begin{split} p(x)=\left\{\begin{matrix} \frac{\lambda ^{\alpha } }{\Gamma (\alpha )}x^{\alpha -1}e^{-\lambda x} & ,x\ge 0\\ 0&,x<0\end{matrix}\right. \end{split}\]

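Then \(X\) is said to follow the gamma distribution, written \(X\sim Ga(\alpha,\lambda)\), where \(\alpha>0\) is the shape parameter and \(\lambda>0\) is the rate parameter (the reciprocal of the scale parameter).

Remark 5.6

When \(\alpha = 1\), \(Ga(1,\lambda)=Exp(\lambda)\).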

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gamma as gamma_func


def prompt_positive_float(message: str) -> float:
    while True:
        try:
            value = float(input(message))
        except ValueError:
            print("Please enter a valid number.")
            continue
        if value <= 0:
            print("The parameter must be greater than 0.")
            continue
        return value


def read_parameters():
    print("Gamma Distribution Visualizer")
    alpha = prompt_positive_float("Enter the shape parameter α (> 0, e.g. 2): ")
    lambd = prompt_positive_float("Enter the rate parameter λ (> 0, e.g. 1): ")
    return alpha, lambd


def plot_gamma_distribution(alpha: float, lambd: float) -> None:
    mean = alpha / lambd
    std = np.sqrt(alpha) / lambd
    x_max = max(5.0, mean + 5 * std)
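    # Start slightly above 0: for α < 1 the density is unbounded at x = 0,
    # which would otherwise produce inf and break the y-axis limits below.
    x = np.linspace(x_max / 1000, x_max, 1000)
    pdf = (lambd ** alpha * x ** (alpha - 1) * np.exp(-lambd * x)) / gamma_func(alpha)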

    plt.figure(figsize=(10, 6))
    plt.plot(x, pdf, color="skyblue", linewidth=2)
    plt.xlabel("Random variable X")
    plt.ylabel("Density f(X)")
    plt.title(f"Gamma Distribution: α={alpha}, λ={lambd}")
    plt.grid(True, alpha=0.3)
    plt.xlim(0, x_max)
    plt.ylim(0, max(0.1, pdf.max() * 1.2))
    plt.tight_layout()
    plt.show()


def main():
    alpha, lambd = read_parameters()
    plot_gamma_distribution(alpha, lambd)


if __name__ == "__main__":
    main()

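The following example describes the relation between the Poisson distribution and the gamma distribution. Its proof is similar to the argument for the relation between the Poisson and exponential distributions discussed earlier, and is left to students as a self-study exercise.

Example 5.5

Suppose the number of shocks \(N(t)\) occurring in \((0,t)\) follows a Poisson distribution with parameter \(\lambda t\). Show that the arrival time \(S_{n}\) of the \(n\)-th shock follows the gamma distribution \(Ga(n,\lambda)\).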
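One way to approach the proof is to note that the \(n\)-th shock has arrived by time \(t\) exactly when at least \(n\) shocks occurred in \((0,t)\), so \(P(S_n\le t)=P(N(t)\ge n)=1-\sum_{k=0}^{n-1}e^{-\lambda t}\frac{(\lambda t)^k}{k!}\). The sketch below compares this Poisson tail with a numerical integral of the \(Ga(n,\lambda)\) density. It is a plausibility check under assumed values (`n = 3`, `lambd = 1.5`, `t = 2.0`), and the composite Simpson rule is simply one convenient choice of integrator.

```python
import math


def poisson_tail(n: int, t: float, lambd: float) -> float:
    """P(N(t) >= n) for N(t) ~ Poisson(lambd * t)."""
    mean = lambd * t
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(n))
    return 1.0 - below


def gamma_pdf(x: float, alpha: float, lambd: float) -> float:
    """Density of Ga(alpha, lambd) at x >= 0 (used here with alpha >= 1 only)."""
    return lambd ** alpha * x ** (alpha - 1) * math.exp(-lambd * x) / math.gamma(alpha)


def gamma_cdf(t: float, alpha: float, lambd: float, steps: int = 2000) -> float:
    """Composite Simpson approximation of the density's integral over [0, t]; steps must be even."""
    h = t / steps
    total = gamma_pdf(0.0, alpha, lambd) + gamma_pdf(t, alpha, lambd)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * gamma_pdf(i * h, alpha, lambd)
    return total * h / 3.0


# Assumed illustration values: third shock, rate 1.5, time horizon 2.
n, lambd, t = 3, 1.5, 2.0
tail = poisson_tail(n, t, lambd)
cdf = gamma_cdf(t, n, lambd)
print(f"P(N(t) >= n) = {tail:.8f}")
print(f"P(S_n <= t)  = {cdf:.8f}")
```

Differentiating the Poisson tail with respect to \(t\) gives the \(Ga(n,\lambda)\) density analytically, which is the step the exercise asks for.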

5.3.5. Beta Distribution

Beta function: The function

\[ B(a,b)=\int_{0}^{1} x^{a-1} (1-x)^{b-1} dx,a>0,b>0 \]

is called the beta function.

Several commonly used properties follow from this definition.

Property 5.2

\(B(a,b)=B(b,a)\);
\(B(a,b)=\frac{\Gamma (a)\Gamma (b)}{\Gamma (a+b)}\).
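Both properties can be checked numerically by integrating the definition directly. The sketch below uses a midpoint rule, chosen here because it never evaluates the integrand at the endpoints \(0\) and \(1\); the function names and the test values `a = 2.5`, `b = 4.0` are illustrative assumptions.

```python
import math


def beta_integral(a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of B(a, b) = integral of x^(a-1) (1-x)^(b-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x ** (a - 1) * (1.0 - x) ** (b - 1)
    return total * h


def beta_via_gamma(a: float, b: float) -> float:
    """B(a, b) expressed through the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


# Arbitrary illustration values with a, b >= 1, so the integrand stays bounded.
a, b = 2.5, 4.0
print(f"integral B(a, b)    = {beta_integral(a, b):.8f}")
print(f"integral B(b, a)    = {beta_integral(b, a):.8f}")
print(f"gamma-based B(a, b) = {beta_via_gamma(a, b):.8f}")
```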

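Based on the beta function, we define the beta distribution.

Beta distribution: Let \(X\) be a random variable with density function:

\[\begin{split} p(x)=\left\{\begin{matrix} \frac{\Gamma (a+b)}{\Gamma (a)\Gamma (b)}x^{a-1} (1-x)^{b-1}, &0<x<1, \\ 0,& \text{otherwise}. \end{matrix}\right. \end{split}\]

Then \(X\) is said to follow the beta distribution, written \(X\sim Be(a,b)\), where \(a>0\) and \(b>0\) are both shape parameters.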

Remark 5.7

In particular, when \(a=1\) and \(b=1\), \(Be(1,1) = U(0,1)\).

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gamma as gamma_func


def prompt_positive_float(message: str) -> float:
    while True:
        try:
            value = float(input(message))
        except ValueError:
            print("Please enter a valid number.")
            continue
        if value <= 0:
            print("The parameter must be greater than 0.")
            continue
        return value


def read_parameters():
    print("Beta Distribution Visualizer")
    a = prompt_positive_float("Enter the shape parameter a (> 0, e.g. 2): ")
    b = prompt_positive_float("Enter the shape parameter b (> 0, e.g. 5): ")
    return a, b


def plot_beta_distribution(a: float, b: float) -> None:
    # Stay strictly inside (0, 1): the density is unbounded at an endpoint
    # when a < 1 or b < 1.
    x = np.linspace(0.001, 0.999, 1000)
    norm_const = gamma_func(a + b) / (gamma_func(a) * gamma_func(b))
    pdf = norm_const * x ** (a - 1) * (1 - x) ** (b - 1)

    plt.figure(figsize=(10, 6))
    plt.plot(x, pdf, color="skyblue", linewidth=2)
    plt.fill_between(x, pdf, color="skyblue", alpha=0.3)
    plt.xlabel("Random variable X")
    plt.ylabel("Density f(X)")
    plt.title(f"Beta Distribution Be({a}, {b})")
    plt.grid(True, alpha=0.3)
    plt.xlim(0, 1)
    plt.ylim(0, max(pdf.max() * 1.2, 0.2))
    plt.tight_layout()
    plt.show()


def main():
    a, b = read_parameters()
    plot_beta_distribution(a, b)


if __name__ == "__main__":
    main()