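🍦 SciPy is a scientific computing library built on top of NumPy. It provides utilities for optimization, signal processing, and more, and was created by Travis Oliphant, the creator of NumPy.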

1 SciPy

SciPy
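- Install it from the command line with pip install scipy.
- Import it into your application with the import keyword: import scipy.
- To check the SciPy version, read the version string stored in the __version__ attribute.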

import scipy
from scipy import constants                    # import the constants submodule

print(constants.liter)                         # 0.001, one liter in cubic meters
print(scipy.__version__)                       # check the SciPy version

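2 Constants

Constants
- PI is one example of a scientific constant; dir() lists every name defined in the constants module.
- Unit categories
  - Mass: returns the specified unit in kilograms.
  - Time: returns the specified unit in seconds.
  - Length: returns the specified unit in meters.
  - Angle: returns the specified unit in radians.
  - Energy: returns the specified unit in joules.
  - Power: returns the specified unit in watts.
  - Force: returns the specified unit in newtons.
  - Pressure: returns the specified unit in pascals.
  - Area: returns the specified unit in square meters.
  - Volume: returns the specified unit in cubic meters.
  - Speed: returns the specified unit in meters per second.
  - Temperature: returns the specified unit in kelvins.
  - Binary prefixes: return the prefix as a power-of-two multiplier (for example, kibi returns 1024).
  - Metric (SI) prefixes: return the prefix as a power-of-ten multiplier (for example, kilo returns 1000.0).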

from scipy import constants

print(constants.pi)                            # print the value of pi
print(dir(constants))                          # list every name in the module
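
Because each unit constant is expressed in its SI base unit, converting a quantity comes down to a multiplication or a division. The following is a minimal sketch rather than part of the original walkthrough: the quantities (5 miles, 90 minutes, 100 degrees Celsius) are made-up inputs, and convert_temperature() is a scipy.constants helper used here because temperature scales need an offset, not just a factor.

```python
from scipy import constants

# 5 miles expressed in kilometers: miles -> meters -> kilometers
distance_km = 5 * constants.mile / constants.kilo

# 90 minutes expressed in hours: minutes -> seconds -> hours
duration_h = 90 * constants.minute / constants.hour

# Temperature scales differ by an offset, so use the dedicated helper
boiling_k = constants.convert_temperature(100, "Celsius", "Kelvin")

print(distance_km)   # about 8.04672
print(duration_h)    # 1.5
print(boiling_k)     # about 373.15
```

Multiplying by a unit constant converts into SI units, and dividing by one converts back out of them.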

2-1 Mass and Time

from scipy import constants

print(constants.gram)                         # 0.001, gram [mass]
print(constants.carat)                        # 0.0002, carat
print(constants.metric_ton)                   # 1000.0, metric ton
print(constants.long_ton)                     # 1016.0469088, long (imperial) ton
print(constants.grain)                        # 6.479891e-05, grain
print(constants.atomic_mass)                  # 1.66053904e-27, atomic mass constant
print(constants.m_u)                          # 1.66053904e-27, atomic mass constant (alias)
print(constants.u)                            # 1.66053904e-27, atomic mass constant (alias)
print(constants.short_ton)                    # 907.1847399999999, short (US) ton
print(constants.stone)                        # 6.3502931799999995, stone
print(constants.troy_pound)                   # 0.37324172159999996, troy pound
print(constants.lb)                           # 0.45359236999999997, pound
print(constants.pound)                        # 0.45359236999999997, pound
print(constants.oz)                           # 0.028349523124999998, ounce
print(constants.ounce)                        # 0.028349523124999998, ounce
print(constants.troy_ounce)                   # 0.031103476799999998, troy ounce
print("--------------------")

print(constants.minute)                       # 60.0, minute [time]
print(constants.hour)                         # 3600.0, hour
print(constants.day)                          # 86400.0, day
print(constants.week)                         # 604800.0, week
print(constants.year)                         # 31536000.0, year (365 days)
print(constants.Julian_year)                  # 31557600.0, Julian year (365.25 days, used in astronomy)

2-2 Length and Angle

from scipy import constants

print(constants.fermi)                        # 1e-15, fermi (used for the size of atomic nuclei) [length]
print(constants.angstrom)                     # 1e-10, angstrom (used for the size of atoms and molecules)
print(constants.micron)                       # 1e-06, micron
print(constants.nautical_mile)                # 1852.0, nautical mile (used for measurements at sea)
print(constants.inch)                         # 0.0254, inch
print(constants.au)                           # 149597870691.0, astronomical unit (distances between celestial bodies)
print(constants.astronomical_unit)            # 149597870691.0, astronomical unit (alias)
print(constants.yard)                         # 0.9143999999999999, yard
print(constants.mile)                         # 1609.3439999999998, mile
print(constants.light_year)                   # 9460730472580800.0, light year (interstellar distances)
print(constants.survey_foot)                  # 0.3048006096012192, survey foot (used in land surveying)
print(constants.survey_mile)                  # 1609.3472186944373, survey mile (used in land surveying)
print(constants.foot)                         # 0.30479999999999996, foot
print(constants.parsec)                       # 3.085677581491367e+16, parsec (interstellar distances)
print(constants.mil)                          # 2.5399999999999997e-05, mil (one thousandth of an inch)
print(constants.pt)                           # 0.00035277777777777776, typographic point
print(constants.point)                        # 0.00035277777777777776, typographic point (alias)
print("----------------------")

print(constants.arcsec)                       # 4.84813681109536e-06, arcsecond, 1/3600 of a degree [angle]
print(constants.arcsecond)                    # 4.84813681109536e-06, arcsecond, 1/3600 of a degree
print(constants.degree)                       # 0.017453292519943295, degree
print(constants.arcmin)                       # 0.0002908882086657216, arcminute, 1/60 of a degree
print(constants.arcminute)                    # 0.0002908882086657216, arcminute, 1/60 of a degree

2-3 Energy and Power

from scipy import constants

print(constants.calorie)                      # 4.184, calorie [energy]
print(constants.calorie_th)                   # 4.184, thermochemical calorie
print(constants.erg)                          # 1e-07, erg (work done by a force of 1 dyne acting over 1 cm)
print(constants.calorie_IT)                   # 4.1868, International Table calorie
print(constants.ton_TNT)                      # 4184000000.0, ton of TNT (energy released by exploding one ton of TNT)
print(constants.Btu)                          # 1055.05585262, British thermal unit
print(constants.Btu_IT)                       # 1055.05585262, British thermal unit (International Table)
print(constants.eV)                           # 1.6021766208e-19, electron volt
print(constants.electron_volt)                # 1.6021766208e-19, electron volt
print(constants.Btu_th)                       # 1054.3502644888888, British thermal unit (thermochemical)
print("------------------")

print(constants.hp)                           # 745.6998715822701, horsepower [power]
print(constants.horsepower)                   # 745.6998715822701, horsepower (imperial)

2-4 Force and Pressure

from scipy import constants

print(constants.dyn)                          # 1e-05, dyne (CGS unit of force) [force]
print(constants.dyne)                         # 1e-05, dyne (CGS unit of force)
print(constants.kgf)                          # 9.80665, kilogram-force
print(constants.kilogram_force)               # 9.80665, kilogram-force
print(constants.lbf)                          # 4.4482216152605, pound-force
print(constants.pound_force)                  # 4.4482216152605, pound-force
print("------------------")

print(constants.atm)                          # 101325.0, standard atmosphere [pressure]
print(constants.atmosphere)                   # 101325.0, standard atmosphere
print(constants.bar)                          # 100000.0, bar (used for the pressure of gases, liquids, and solids)
print(constants.psi)                          # 6894.757293168361, pound-force per square inch
print(constants.torr)                         # 133.32236842105263, torr (used for low gas pressures)
print(constants.mmHg)                         # 133.32236842105263, millimeter of mercury

2-5 Area and Volume

from scipy import constants

print(constants.hectare)                      # 10000.0, hectare [area]
print(constants.acre)                         # 4046.8564223999992, acre (used for land area)
print("----------------------")

print(constants.liter)                        # 0.001, liter [volume]
print(constants.litre)                        # 0.001, litre
print(constants.gallon_imp)                   # 0.00454609, imperial gallon
print(constants.fluid_ounce_imp)              # 2.84130625e-05, imperial fluid ounce
print(constants.barrel)                       # 0.15898729492799998, barrel (used for crude oil and other liquids)
print(constants.bbl)                          # 0.15898729492799998, barrel (abbreviation, usually the US oil barrel)
print(constants.gallon)                       # 0.0037854117839999997, gallon (US)
print(constants.gallon_US)                    # 0.0037854117839999997, US gallon
print(constants.fluid_ounce)                  # 2.9573529562499998e-05, fluid ounce (US)
print(constants.fluid_ounce_US)               # 2.9573529562499998e-05, US fluid ounce

2-6 Speed and Temperature

from scipy import constants

print(constants.mach)                         # 340.5, Mach 1, the speed of sound (approx., at 15 °C and 1 atm) [speed]
print(constants.speed_of_sound)               # 340.5, speed of sound (same value as mach)
print(constants.knot)                         # 0.5144444444444445, knot
print(constants.kmh)                          # 0.2777777777777778, kilometer per hour
print(constants.mph)                          # 0.44703999999999994, mile per hour
print("-------------------")

print(constants.zero_Celsius)                 # 273.15, 0 °C in kelvins (the freezing point of water) [temperature]
print(constants.degree_Fahrenheit)            # 0.5555555555555556, one Fahrenheit degree (a temperature difference) in kelvins

2-7 Binary and Metric Prefixes

from scipy import constants

print(constants.kibi)                         # 1024, kibi (Ki), 2**10 [binary]
print(constants.mebi)                         # 1048576, mebi (Mi), 2**20
print(constants.gibi)                         # 1073741824, gibi (Gi), 2**30
print(constants.tebi)                         # 1099511627776, tebi (Ti), 2**40
print(constants.pebi)                         # 1125899906842624, pebi (Pi), 2**50
print(constants.exbi)                         # 1152921504606846976, exbi (Ei), 2**60
print(constants.zebi)                         # 1180591620717411303424, zebi (Zi), 2**70
print(constants.yobi)                         # 1208925819614629174706176, yobi (Yi), 2**80
print("-------------------------")

print(constants.deci)                         # 0.1, 10**-1 [metric]
print(constants.centi)                        # 0.01, 10**-2
print(constants.milli)                        # 0.001, 10**-3
print(constants.micro)                        # 1e-06, 10**-6
print(constants.nano)                         # 1e-09, 10**-9
print(constants.pico)                         # 1e-12, 10**-12
print(constants.femto)                        # 1e-15, 10**-15
print(constants.atto)                         # 1e-18, 10**-18
print(constants.zepto)                        # 1e-21, 10**-21
print("-------------------------")

print(constants.deka)                         # 10.0, 10**1 [metric]
print(constants.hecto)                        # 100.0, 10**2
print(constants.exa)                          # 1e+18, 10**18
print(constants.zetta)                        # 1e+21, 10**21
print(constants.yotta)                        # 1e+24, 10**24
print(constants.kilo)                         # 1000.0, 10**3
print(constants.mega)                         # 1000000.0, 10**6
print(constants.giga)                         # 1000000000.0, 10**9
print(constants.tera)                         # 1000000000000.0, 10**12
print(constants.peta)                         # 1000000000000000.0, 10**15

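3 Optimizers

Optimizers
- A set of procedures defined in SciPy that find the minimum of a function or the root of an equation.
- NumPy can find roots of polynomials and solve linear equations, but it cannot find roots of nonlinear equations.
- For an equation such as x + cos(x) = 0, use SciPy's optimize.root, which takes two required arguments.
  - fun, a function representing the equation.
  - x0, an initial guess for the root.
- optimize.root returns an object containing information about the solution.
- Minimizing a function
  - In this context, a function represents a curve.
  - A curve has high points and low points; the high points are called maxima and the low points minima.
  - The highest point of the whole curve is the global maximum; the other high points are local maxima.
  - The lowest point of the whole curve is the global minimum; the other low points are local minima.
- Use scipy.optimize.minimize() to minimize a function. It accepts the following arguments.
  - fun (the function to minimize), x0 (an initial guess for the minimizer), options (a dictionary of extra solver options), callback (a function called after each iteration).
  - method (the name of the solver to use, for example CG, BFGS, Newton-CG, L-BFGS-B, TNC, COBYLA, SLSQP).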

import numpy as np
from scipy.optimize import root
from scipy.optimize import minimize


def eqn(x):
    return x + np.cos(x)                      # np.cos accepts the array that root() passes in


myroot = root(eqn, 0)                         # find a root of x + cos(x), starting from 0
print(myroot)                                 # print all information about the solution
print("--------------------------------------------------")
print(myroot.x)                               # the root itself is stored in the x attribute
print("--------------------------------------------------")


def eqnr(y):
    return y**2 + y + 2


mymin = minimize(eqnr, 0, method="BFGS")      # minimize y^2 + y + 2 with the BFGS method
print(mymin)
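
The result objects are easier to trust once their fields are checked against something known. The sketch below reuses the same two functions and, as an illustration only, adds a callback that records each iterate and an options dictionary with an arbitrary maxiter of 50. The expected minimizer (-0.5) and minimum (1.75) come from completing the square for y^2 + y + 2.

```python
import numpy as np
from scipy.optimize import minimize, root


def eqn(x):
    return x + np.cos(x)


def eqnr(y):
    # minimize() passes a 1-element array; return a plain scalar
    return float(y[0] ** 2 + y[0] + 2)


# Root finding: check convergence and the residual at the reported root
sol = root(eqn, x0=0)
residual = abs(eqn(sol.x[0]))
print(sol.success, sol.x[0], residual)

# Minimization: record each iterate through the callback
iterates = []
res = minimize(
    eqnr,
    x0=0,
    method="BFGS",
    callback=lambda xk: iterates.append(xk.copy()),
    options={"maxiter": 50},
)
print(res.success, res.x[0], res.fun)   # minimizer near -0.5, minimum near 1.75
print(len(iterates))                    # number of recorded iterates
```

Checking success and the residual before using x is a cheap guard, since a solver can stop without having converged.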

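4 Sparse Data

Sparse Data
- Data in which most elements are unused, that is, carry no information.
  - Sparse data: a data set in which most values are 0.
  - Dense array: the opposite of a sparse array; most values are nonzero.
- In scientific computing, sparse data often arises when working with partial derivatives in linear algebra.
- SciPy's scipy.sparse module handles sparse data. Two types of sparse matrix are mainly used.
  - CSC: Compressed Sparse Column.
  - CSR: Compressed Sparse Row, created by passing an array to scipy.sparse.csr_matrix().
- The data attribute shows the stored values, and the count_nonzero() method counts the nonzero entries.
- eliminate_zeros() removes explicitly stored zero entries from the matrix, and sum_duplicates() merges duplicate entries by adding them together.
- tocsc() converts from CSR to CSC. Sparse matrices also support the usual operations of ordinary matrices.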

import numpy as np
from scipy.sparse import csr_matrix

arr1 = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr1))                       # create a CSR matrix from an array
print("-------------")

arr2 = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
print(csr_matrix(arr2).data)                  # view the stored data (the nonzero entries)
print("-------------")
print(csr_matrix(arr2).count_nonzero())       # count the nonzero entries
print("-------------")

mat1 = csr_matrix(arr2)
mat1.eliminate_zeros()                        # remove explicitly stored zeros
print(mat1)
print("-------------")

mat2 = csr_matrix(arr2)
mat2.sum_duplicates()                         # merge duplicate entries by summing them
print(mat2)
print("-------------")

arr3 = csr_matrix(arr2).tocsc()               # convert from CSR to CSC
print(arr3)
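
On a matrix built from a dense array, eliminate_zeros() has nothing to remove, so its effect is easy to miss. As a sketch under one assumption (that a stored value is overwritten with 0 by hand, purely for illustration), the example below shows the three arrays behind a CSR matrix and how an explicit zero is dropped.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(dense)

print(mat.data)      # stored values, row by row
print(mat.indices)   # column index of each stored value
print(mat.indptr)    # row i owns data[indptr[i]:indptr[i + 1]]

mat.data[0] = 0                       # overwrite a stored value with an explicit zero
stored_before = mat.nnz               # still 3 stored entries
nonzero_before = mat.count_nonzero()  # but only 2 of them are nonzero

mat.eliminate_zeros()
stored_after = mat.nnz                # the explicit zero is no longer stored
print(stored_before, nonzero_before, stored_after)
```

The distinction matters because nnz counts stored entries, while count_nonzero() counts entries whose value is actually nonzero.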

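5 Graph Data

Graph Data
- Use the scipy.sparse.csgraph module to work with graph data structures.
- Adjacency matrix: a matrix that represents which vertices are adjacent to each other.
  - connected_components() finds all connected components.
  - Dijkstra: dijkstra() finds the shortest path in a graph from one element to another. It accepts the following arguments.
    - limit (the maximum path weight), indices (the index of an element; only paths starting from that element are returned).
    - return_predecessors (a Boolean; True also returns the predecessors needed to reconstruct each path, False does not).
  - Floyd-Warshall: floyd_warshall() finds the shortest paths between all pairs of elements.
  - Bellman-Ford: bellman_ford() also finds the shortest paths between all pairs of elements and can handle negative weights.
  - Depth-first order
    - depth_first_order() returns a depth-first traversal starting from a node.
    - It takes two arguments: the graph and the starting element of the traversal.
  - Breadth-first order
    - breadth_first_order() returns a breadth-first traversal starting from a node.
    - It takes two arguments: the graph and the starting element of the traversal.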

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.sparse.csgraph import bellman_ford
from scipy.sparse.csgraph import floyd_warshall
from scipy.sparse.csgraph import depth_first_order
from scipy.sparse.csgraph import breadth_first_order
from scipy.sparse.csgraph import connected_components

arr1 = np.array([[0, 1, 2], [1, 0, 0], [2, 0, 0]])
arr2 = csr_matrix(arr1)
print(connected_components(arr2))             # find all connected components
print("----------------------------------------------------------")

arr3 = csr_matrix(arr1)                       # shortest paths from element 0 to every element
print(dijkstra(arr3, return_predecessors=True, indices=0))
print("----------------------------------------------------------")

arr4 = csr_matrix(arr1)                       # shortest paths between all pairs of elements
print(floyd_warshall(arr4, return_predecessors=True))
print("----------------------------------------------------------")

arr5 = csr_matrix(arr1)                       # Bellman-Ford from element 0 (this method also accepts negative weights)
print(bellman_ford(arr5, return_predecessors=True, indices=0))
print("----------------------------------------------------------")

arr6 = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1]
])
arr7 = csr_matrix(arr6)
print(depth_first_order(arr7, 1))             # depth-first traversal of the adjacency matrix, starting at element 1
print("----------------------------------------------------------")

arr8 = csr_matrix(arr6)
print(breadth_first_order(arr8, 1))           # breadth-first traversal of the adjacency matrix, starting at element 1
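
With return_predecessors=True, dijkstra() returns distances plus a predecessor array, not the paths themselves. A rough sketch of turning that array into an actual path follows. The four-node weighted graph and the path_to() helper are hypothetical, introduced only for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Directed, weighted graph: entry [i, j] is the weight of edge i -> j (0 = no edge)
weights = np.array([
    [0, 1, 5, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])
graph = csr_matrix(weights)

dist, pred = dijkstra(graph, directed=True, indices=0, return_predecessors=True)


def path_to(target, predecessors):
    """Walk the predecessor array backwards from target to the source."""
    path = [target]
    while predecessors[path[-1]] >= 0:       # a negative value marks "no predecessor"
        path.append(int(predecessors[path[-1]]))
    return path[::-1]


print(dist)               # distances from node 0 to every node
print(path_to(3, pred))   # nodes visited on the shortest path from 0 to 3
```

Here the direct edge 0 -> 2 costs 5, so the cheaper route to node 3 goes through node 1 for a total weight of 4.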

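6 Spatial Data

Spatial Data
- Data represented in a geometric space, such as points in a coordinate system. SciPy provides the scipy.spatial module for it.
- Triangulation: dividing a polygon into triangles, from which its area can be computed. Delaunay() generates a triangulation from a set of points.
- Convex hull: the smallest convex polygon that covers all the given points; create one with ConvexHull().
- KDTrees: a data structure optimized for nearest-neighbor queries, which efficiently answers which points are closest to a given point.
  - KDTree(): returns a KDTree object (k-dimensional space).
  - query(): returns the distance to the nearest neighbor and that neighbor's index in the input points.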

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import KDTree
from scipy.spatial import Delaunay
from scipy.spatial import ConvexHull

points1 = np.array([[2, 4], [3, 4], [3, 0], [2, 2], [4, 1]])
simplices = Delaunay(points1).simplices       # simplices holds the vertex indices of each triangle
plt.triplot(points1[:, 0], points1[:, 1], simplices)
plt.scatter(points1[:, 0], points1[:, 1], color="r")
plt.show()                                    # plot the triangulation of points1

points2 = np.array([[2, 4], [3, 4], [3, 0], [2, 2], [4, 1],
                    [1, 2], [5, 0], [3, 1], [1, 2], [0, 2]])
hull = ConvexHull(points2)
hull_points = hull.simplices
plt.scatter(points2[:, 0], points2[:, 1])
for simplex in hull_points:
    plt.plot(points2[simplex, 0], points2[simplex, 1], "k-")
plt.show()                                    # plot the convex hull of points2

points3 = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points3)
res = kdtree.query((1, 1))
print(res)                                    # nearest neighbor of the point (1, 1), as (distance, index)
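
The query() method can also return more than one neighbor through its k argument. A small sketch with the same four points follows; the choice of k=2 is arbitrary.

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([(1, -1), (2, 3), (-2, 3), (2, -3)])
tree = KDTree(points)

# Ask for the two nearest neighbors of (1, 1)
distances, indices = tree.query((1, 1), k=2)

print(distances)          # nearest first: 2.0, then sqrt(5)
print(indices)            # positions in `points`: 0, then 1
print(points[indices])    # the neighbor coordinates themselves
```

The returned indices refer to rows of the array the tree was built from, so indexing that array recovers the coordinates.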

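Distance Matrix
- The distance between two vectors need not be the straight-line length; it can also be the angle between them as seen from the origin, the number of unit steps required, and so on.
- Euclidean distance: the straight-line distance, euclidean().
- Cosine distance: one minus the cosine of the angle between the two points A and B, cosine().
- City block (Manhattan) distance: the distance computed using moves in the four axis directions only, cityblock().
- Hamming distance: the proportion of positions at which two sequences differ; a way of measuring the distance between binary sequences, hamming().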

from scipy.spatial.distance import cosine
from scipy.spatial.distance import hamming
from scipy.spatial.distance import euclidean
from scipy.spatial.distance import cityblock

p1 = (1, 0)
p2 = (10, 2)
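res1 = cityblock(p1, p2)                      # city block distance between the given points
print(res1)

res2 = euclidean(p1, p2)                      # Euclidean distance between the given points
print(res2)

p3 = (True, False, True)
p4 = (False, True, True)
res3 = hamming(p3, p4)                        # Hamming distance between the given points
print(res3)

res4 = cosine(p1, p2)                         # cosine distance between the given points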
print(res4)
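
To make the four definitions concrete, the sketch below recomputes each distance by hand with NumPy and prints it next to the SciPy result. The manual_* names are introduced here for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cityblock, cosine, euclidean, hamming

p1 = np.array([1, 0])
p2 = np.array([10, 2])

manual_cityblock = np.abs(p1 - p2).sum()               # |1 - 10| + |0 - 2| = 11
manual_euclidean = np.sqrt(((p1 - p2) ** 2).sum())     # sqrt(81 + 4)
manual_cosine = 1 - (p1 @ p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))

b1 = np.array([True, False, True])
b2 = np.array([False, True, True])
manual_hamming = np.mean(b1 != b2)                     # 2 of 3 positions differ

print(manual_cityblock, cityblock(p1, p2))
print(manual_euclidean, euclidean(p1, p2))
print(manual_cosine, cosine(p1, p2))
print(manual_hamming, hamming(b1, b2))
```

Note that the cosine distance is close to 0 here because the two vectors point in nearly the same direction, even though they are far apart in Euclidean terms.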

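7 Interpolation

Interpolation
- A method of generating points between given points. For example, for the points 1 and 2, interpolation can find the points 1.33 and 1.66.
- Machine learning often has to deal with missing data in a data set, and interpolation is commonly used to fill in those values.
- SciPy's scipy.interpolate module offers many functions for interpolation.
  - 1-D interpolation: interp1d() interpolates a distribution with one variable; the points are fitted with a curve (piecewise linear by default).
  - Spline interpolation: UnivariateSpline() fits the points with a piecewise function defined by polynomials, called a spline.
  - Radial basis function interpolation: uses Rbf(); a radial basis function is a function defined relative to a fixed reference point.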

import numpy as np
from scipy.interpolate import Rbf
from scipy.interpolate import interp1d
from scipy.interpolate import UnivariateSpline

xs1 = np.arange(10)
ys1 = 2*xs1 + 1
interp_func = interp1d(xs1, ys1)
newarr1 = interp_func(np.arange(2.1, 3, 0.1))
print(newarr1)                                # interpolated values at 2.1, 2.2, ..., 2.9 for the given xs1 and ys1
print("------------------------------------------------------------------")

xs2 = np.arange(10)
ys2 = xs2**2 + np.sin(xs2) + 1
interp_func = UnivariateSpline(xs2, ys2)
newarr2 = interp_func(np.arange(2.1, 3, 0.1))
print(newarr2)                                # univariate spline values at 2.1, 2.2, ..., 2.9 for nonlinear points
print("------------------------------------------------------------------")

xs3 = np.arange(10)
ys3 = xs3**2 + np.sin(xs3) + 1
interp_func = Rbf(xs3, ys3)
newarr3 = interp_func(np.arange(2.1, 3, 0.1))
print(newarr3)                                # radial basis function values at 2.1, 2.2, ..., 2.9 for xs3 and ys3
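
The missing-data use case mentioned above could look roughly like this. The sketch assumes a tiny made-up series with one NaN and fills it with interp1d() built from the known points only; it is one possible approach, not the only one.

```python
import numpy as np
from scipy.interpolate import interp1d

xs = np.arange(5, dtype=float)
ys = np.array([1.0, 3.0, np.nan, 7.0, 9.0])     # the value at x = 2 is missing

known = ~np.isnan(ys)
fill = interp1d(xs[known], ys[known])            # linear interpolation by default

filled = ys.copy()
filled[~known] = fill(xs[~known])                # evaluate only at the missing positions
print(filled)
```

Because the known points lie on the line y = 2x + 1, the gap at x = 2 is filled with 5.0. By default interp1d() raises an error for points outside the range of the known x values, so gaps at either end need separate handling.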

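8 Matlab Arrays

Matlab Arrays
- SciPy provides the scipy.io module, which has functions for working with Matlab arrays.
- savemat(): exports data in Matlab format.
  - filename (the name of the file to save the data to), mdict (a dictionary containing the data).
  - do_compression (a Boolean specifying whether to compress the result; defaults to False).
- loadmat(): imports data from a Matlab file; filename is a required argument.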

import numpy as np
from scipy import io                          # note: a matlab directory must exist in the current folder, otherwise savemat fails

arr1 = np.arange(10)                          # export arr1 as the variable vec to arr1_file.mat
io.savemat("matlab/arr1_file.mat", {"vec": arr1})

arr2 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
io.savemat("matlab/arr2_file.mat", {"vec": arr2})
mydata = io.loadmat("matlab/arr2_file.mat")   # savemat exports, loadmat imports
print(mydata)                                 # import from arr2_file.mat; returns a dictionary of variable names and arrays
print(mydata["vec"])                          # show only the array from the Matlab data

mydata = io.loadmat("matlab/arr2_file.mat", squeeze_me=True)
print(mydata["vec"])                          # squeeze_me=True removes the extra dimension that was added
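
To try the round trip without creating a matlab directory first, one option is to write into a temporary directory. This is a sketch under that assumption; the file name vec_file.mat is hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy import io

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vec_file.mat")   # hypothetical file name
    io.savemat(path, {"vec": arr})

    raw = io.loadmat(path)
    squeezed = io.loadmat(path, squeeze_me=True)

print(raw["vec"].shape)                          # 2-D: one row, ten columns
print(squeezed["vec"].shape)                     # 1-D again after squeezing
print([k for k in raw if k.startswith("__")])    # metadata keys added by loadmat
```

Matlab has no 1-D arrays, which is why the vector comes back as a 1 x 10 matrix unless squeeze_me=True is passed.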

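9 Significance Tests

Significance Tests
- SciPy's scipy.stats module provides functions for performing statistical significance tests.
- Techniques and key terms
  - Statistical hypothesis: an assumption about a population parameter.
  - Null hypothesis: assumes that the observation is not statistically significant.
  - Alternative hypothesis: assumes that the observation is due to some cause; it is the alternative to the null hypothesis.
  - One-tailed test: a test in which the hypothesis checks only one side of the value.
  - Two-tailed test: a test in which the hypothesis checks both sides of the value.
  - Alpha value: the significance level.
  - P value: the probability, assuming the null hypothesis is true, of seeing data at least as extreme as the data actually observed.
  - If the p value <= alpha, reject the null hypothesis and call the data statistically significant; otherwise, do not reject the null hypothesis.
- T-test (two-tailed)
  - Determines whether there is a significant difference between the means of two variables, that is, whether they could belong to the same distribution.
  - ttest_ind(): takes two independent samples (they need not be the same size) and returns a t statistic and a p value.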

import numpy as np
from scipy.stats import ttest_ind

v1 = np.random.normal(size=100)
v2 = np.random.normal(size=100)

res = ttest_ind(v1, v2)                       # test whether v1 and v2 come from the same distribution
pes = ttest_ind(v1, v2).pvalue                # return only the p value
print(res)
print(pes)
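
The decision rule (compare the p value with alpha) can be sketched as follows. The seed, the alpha of 0.05, and the mean shift of 2 are all arbitrary choices made for this illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
alpha = 0.05                            # assumed significance level

a = rng.normal(loc=0.0, size=100)
b = rng.normal(loc=2.0, size=100)       # same spread, mean shifted by 2

shifted = ttest_ind(a, b)
same = ttest_ind(a, a)                  # identical samples: no difference at all

for label, result in [("a vs b", shifted), ("a vs a", same)]:
    decision = "reject" if result.pvalue <= alpha else "fail to reject"
    print(label, result.pvalue, decision, "the null hypothesis")
```

Comparing a sample with itself gives a t statistic of 0 and a p value of 1, which is a handy sanity check on the direction of the rule.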

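9-1 KS Test

KS Test
- The KS test checks whether given values follow a distribution. It can be used as a one-tailed or two-tailed test; by default it is two-tailed.
- kstest() takes the values to test and a CDF as its two arguments; the CDF can be a string naming a distribution or a callable that returns probabilities.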

import numpy as np
from scipy.stats import kstest

v = np.random.normal(size=100)
res = kstest(v, "norm")                        # test whether the given values follow a normal distribution
print(res)
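
Both ways of passing the CDF, and the one-tailed variant, might be exercised as in the sketch below. It assumes uniform samples on [0, 1) as a deliberately non-normal input and uses norm.cdf from scipy.stats as the callable; the seed and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(0)
u = rng.uniform(size=200)                      # uniform on [0, 1), clearly not standard normal

by_name = kstest(u, "norm")                    # CDF given as a distribution name
by_callable = kstest(u, norm.cdf)              # CDF given as a callable
one_sided = kstest(u, "norm", alternative="less")

print(by_name.statistic, by_name.pvalue)
print(by_callable.statistic, by_callable.pvalue)
print(one_sided.pvalue)
```

The name and the callable describe the same CDF, so the two-sided results agree. The statistic is at least 0.5 here because the standard normal CDF is already 0.5 at 0, while no sample lies below 0.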

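9-2 Descriptive Statistics

Descriptive Statistics
- Use the describe() function to get a summary of the values in an array.
- Returned fields: number of observations, minimum and maximum, mean, variance, skewness, kurtosis.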

import numpy as np
from scipy.stats import describe

v = np.random.normal(size=100)
res = describe(v)                              # show descriptive statistics for the values in the array
print(res)
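
The returned fields can be read by name. Using a small fixed input instead of random data, as in this sketch, makes the numbers easy to verify by hand.

```python
from scipy.stats import describe

res = describe([1, 2, 3, 4, 5])

print(res.nobs)        # number of observations: 5
print(res.minmax)      # (min, max) pair: 1 and 5
print(res.mean)        # 3.0
print(res.variance)    # 2.5, the sample variance (divides by n - 1)
print(res.skewness)    # 0 for symmetric data
print(res.kurtosis)    # about -1.3, excess kurtosis (a normal distribution gives 0)
```

Note that the variance uses n - 1 in the denominator by default, which differs from the default of np.var().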

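9-3 Normality Test

Normality Test
- Based on skewness and kurtosis, normaltest() returns the p value for the null hypothesis that the data come from a normal distribution.
- Skewness: a measure of the symmetry of the data. It is 0 for a normal distribution; negative means the data are skewed to the left, positive means skewed to the right.
- Kurtosis: a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Positive kurtosis means heavy tails, negative kurtosis means light tails.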

import numpy as np
from scipy.stats import normaltest
from scipy.stats import skew, kurtosis

v = np.random.normal(size=100)
print(skew(v))                                 # skewness
print(kurtosis(v))                             # kurtosis
print(normaltest(v))                           # test whether the data come from a normal distribution
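
For contrast, the sketch below runs the same test on a normal sample and on an exponential sample, which is strongly right-skewed. The seed, the sample size of 500, and the alpha of 0.05 are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import normaltest, skew

rng = np.random.default_rng(0)
alpha = 0.05                                   # assumed significance level

samples = {
    "normal": rng.normal(size=500),
    "exponential": rng.exponential(size=500),  # right-skewed, so not normal
}

for name, values in samples.items():
    result = normaltest(values)
    verdict = "looks non-normal" if result.pvalue <= alpha else "no evidence against normality"
    print(name, skew(values), result.pvalue, verdict)
```

A large p value does not prove normality; it only means this test found no evidence against it.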

Python

#SciPy #ScientificComputing #SignificanceTesting

Python SciPy

https://stitch-top.github.io/2021/07/09/python/python08-python-scipy/

Author

Dr.626

Published on

July 9, 2021, 23:25:33
