Big Data Mini Assignment 10
Original · March 24, 2025 · About 2 min read
Code
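import numpy as np
import pandas as pd
# Task 01: create a 100x6 DataFrame with values drawn uniformly from [-3, 3)
df = pd.DataFrame(np.random.uniform(-3, 3, (100, 6)))
# Task 02: summary statistics
print("Summary statistics:")
print(df.describe())
# Task 03: first six rows
print("\nFirst 6 rows:")
print(df.head(6))
# Task 04: count outliers with count()
# df[mask] keeps matching values and turns the rest into NaN; count() skips NaN
ab_low = df[df < -2].count()
ab_high = df[df > 2].count()
# Outliers per column
ab_each_column = ab_low + ab_high
# Total number of outliers
ab_total = ab_each_column.sum()
print("\nOutliers per column:")
print(ab_each_column)
print(f"\nTotal outliers: {ab_total}")
# Task 05: replace outliers by clipping values to [-2, 2]
df_clipped = df.clip(lower=-2, upper=2)
# Task 06: summary statistics after the replacement
print("\nSummary statistics after replacing outliers:")
print(df_clipped.describe())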
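As a cross-check on Task 04 and Task 05, the same counts can be obtained from a single boolean mask, and the clipped frame can be tested for leftover outliers. This is a minimal sketch and not part of the original submission: the fixed seed (42) and the names `rng`, `mask`, `outliers_per_column`, `outliers_total` and `remaining` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (42) so the sketch is repeatable; the original script is unseeded.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.uniform(-3, 3, (100, 6)))

# Original approach: non-outliers become NaN, and count() skips NaN.
count_based = df[df < -2].count() + df[df > 2].count()

# Alternative: one boolean mask; True counts as 1 when summed.
mask = df.abs() > 2
outliers_per_column = mask.sum()
outliers_total = int(mask.to_numpy().sum())

# After clipping, no value should remain outside [-2, 2].
df_clipped = df.clip(lower=-2, upper=2)
remaining = int((df_clipped.abs() > 2).to_numpy().sum())

print(outliers_per_column)
print(f"Total outliers: {outliers_total}, remaining after clip: {remaining}")
```

Both ways of counting should agree column by column. The mask version avoids building two intermediate DataFrames, while the `count()` version follows the wording of the task.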
Results
D:\603\pythonProject\.venv\Scripts\python.exe D:\603\pythonProject\.venv\exel.py
Summary statistics:
                0           1           2           3           4           5
count  100.000000  100.000000  100.000000  100.000000  100.000000  100.000000
mean    -0.112085    0.122001   -0.069930   -0.101406   -0.139906   -0.142223
std      1.850131    1.828267    1.617101    1.843482    1.742310    1.706797
min     -2.996145   -2.954085   -2.983177   -2.990333   -2.937669   -2.985818
25%     -1.876335   -1.301980   -1.175447   -1.596082   -1.518596   -1.656389
50%     -0.034308    0.490965   -0.084749   -0.147729   -0.250711   -0.280208
75%      1.637706    1.784660    1.207824    1.661378    1.127157    1.295735
max      2.935492    2.970352    2.872992    2.993210    2.952396    2.782881
First 6 rows:
          0         1         2         3         4         5
0  0.126067 -2.738518 -0.160284  0.864180  0.357249  2.727832
1 -1.096388 -2.828590 -0.106547  1.106489 -1.701636  2.079736
2 -1.213274  1.050751 -0.915891  2.707627 -1.217941 -2.900689
3 -0.050563  2.716919 -1.750794 -1.579348  2.101684  2.719516
4 -0.018052 -0.825421  0.211574  2.942076 -2.428454 -1.329317
5  0.337108 -0.799997 -2.178480  1.694386 -2.918015 -0.059098
Outliers per column:
0    38
1    39
2    29
3    38
4    36
5    35
dtype: int64
Total outliers: 215
Summary statistics after replacing outliers:
                0           1           2           3           4           5
count  100.000000  100.000000  100.000000  100.000000  100.000000  100.000000
mean    -0.051001    0.121375   -0.037447   -0.077101   -0.143032   -0.134711
std      1.584272    1.545277    1.422294    1.531447    1.453672    1.506347
min     -2.000000   -2.000000   -2.000000   -2.000000   -2.000000   -2.000000
25%     -1.876335   -1.301980   -1.175447   -1.596082   -1.518596   -1.656389
50%     -0.034308    0.490965   -0.084749   -0.147729   -0.250711   -0.280208
75%      1.637706    1.784660    1.207824    1.661378    1.127157    1.295735
max      2.000000    2.000000    2.000000    2.000000    2.000000    2.000000
Process finished with exit code 0
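A quick sanity check on the reported total of 215: for values uniform on [-3, 3), the chance of landing outside [-2, 2] is 2/6 = 1/3, so roughly 200 of the 600 cells should be outliers. The sketch below works this out with the binomial mean and standard deviation, assuming the cells are independent draws. The variable names are illustrative, and the value 215 is taken from the run shown above.

```python
import math

n_cells = 100 * 6           # number of cells in the DataFrame
p_outlier = (1 + 1) / 6     # the two tails below -2 and above 2 have length 1 each, out of 6

# Binomial mean and standard deviation for the outlier count
expected = n_cells * p_outlier
std_dev = math.sqrt(n_cells * p_outlier * (1 - p_outlier))

observed = 215              # total reported in the run above
z_score = (observed - expected) / std_dev

print(f"expected = {expected:.1f}, std = {std_dev:.2f}, z = {z_score:.2f}")
```

Under these assumptions the observed 215 sits about 1.3 standard deviations above the expected 200, which is unremarkable for a single unseeded run.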