NumPy Basics

NumPy is the fundamental package for scientific computing with Python. 

It contains among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data.

Arbitrary data-types can be defined.

This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Enough preamble. NumPy's importance speaks for itself, so let's go straight to the code.

import numpy as np

data = np.arange(12).reshape(3, 4)

data
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

  • Operations are element-wise by default
data * 10
array([[  0,  10,  20,  30],
       [ 40,  50,  60,  70],
       [ 80,  90, 100, 110]])
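
To make the element-wise rule concrete, here is a small sketch. The names squared, row_offsets and shifted are made up for illustration. The point is that `*` multiplies matching elements (it is not a matrix product) and that a smaller operand is broadcast to the array's shape.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Arithmetic operators act on each element independently.
scaled = data * 10        # the scalar is broadcast to every element
squared = data * data     # element-wise product, not a matrix product

# A 1-D array of length 4 is broadcast across the 3 rows.
row_offsets = np.array([100, 200, 300, 400])
shifted = data + row_offsets
```

Broadcasting only works when the trailing dimensions are compatible, so a length-3 array could not be added to this (3, 4) array without reshaping it first.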

data.shape
(3, 4)

data.dtype
dtype('int64')

data.ndim
2

  • Boolean arrays can be used for indexing
data[data < 5] = 0
data
array([[ 0,  0,  0,  0],
       [ 0,  5,  6,  7],
       [ 8,  9, 10, 11]])
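
Boolean indexing can also select values rather than overwrite them. The sketch below starts from a fresh array; mask, selected, both and zeroed are illustrative names, not part of the original example.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

mask = data < 5                 # boolean array with the same shape as data
selected = data[mask]           # 1-D copy of the elements where mask is True

# Combine conditions with & and | (plus parentheses), not `and` / `or`.
both = data[(data > 2) & (data % 2 == 0)]

# np.where builds a new array and leaves data untouched,
# unlike the in-place assignment data[mask] = 0.
zeroed = np.where(mask, 0, data)
```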

data.T
array([[ 0,  0,  8],
       [ 0,  5,  9],
       [ 0,  6, 10],
       [ 0,  7, 11]])

np.dot(data.T, data)
array([[ 64,  72,  80,  88],
       [ 72, 106, 120, 134],
       [ 80, 120, 136, 152],
       [ 88, 134, 152, 170]])
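
For two 2-D arrays, np.dot is an ordinary matrix product. As a quick check, assuming Python 3.5 or later where the @ operator is available, this sketch rebuilds the zeroed data array and compares the two spellings. gram_dot and gram_at are illustrative names.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
data[data < 5] = 0                 # same state as the array used above

gram_dot = np.dot(data.T, data)    # (4, 3) times (3, 4) gives (4, 4)
gram_at = data.T @ data            # operator form of the same matrix product

same = np.array_equal(gram_dot, gram_at)
```

The result of multiplying a matrix's transpose by the matrix itself is always symmetric, which is a handy sanity check on the output.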

Unary ufuncs

np.sqrt(data)
array([[ 0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  2.23606798,  2.44948974,  2.64575131],
       [ 2.82842712,  3.        ,  3.16227766,  3.31662479]])

np.exp(data)
array([[  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          1.00000000e+00],
       [  1.00000000e+00,   1.48413159e+02,   4.03428793e+02,
          1.09663316e+03],
       [  2.98095799e+03,   8.10308393e+03,   2.20264658e+04,
          5.98741417e+04]])

Binary ufuncs

x = np.random.randn(8)
x
array([ 1.19911478, -0.87448992,  0.32508143, -0.29231038,  0.78727289,
       -0.57976679, -0.65701734, -0.03742968])

y = np.random.randn(8)
y
array([ 0.83002571, -0.11149519, -0.26294768, -0.18584947,  0.04441277,
       -1.46969178, -0.84550823, -1.481617  ])

z = np.power(x, y)
z
array([ 1.16266997,         nan,  1.34375634,         nan,  0.98943356,
               nan,         nan,         nan])
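
The nan entries appear wherever the base is negative: a negative number raised to a non-integer power has no real-valued result, so NumPy returns nan and emits a RuntimeWarning. The sketch below reproduces this with hand-picked values (not the random ones above) and shows one possible workaround, casting the base to complex.

```python
import numpy as np

x = np.array([1.5, -0.5, 4.0])    # hand-picked sample bases
y = np.array([2.0, 0.5, -0.5])    # hand-picked sample exponents

# Silence the "invalid value" warning raised for the negative base.
with np.errstate(invalid="ignore"):
    z = np.power(x, y)            # the middle entry is nan

# With a complex base the same power is well defined.
z_complex = np.power(x.astype(complex), y)
```

Whether a complex result is appropriate depends on the application. Filtering out negative bases beforehand is an equally valid choice.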

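Ternary selection with np.where

np.where(cond, x, y) is the vectorized form of the expression x if cond else y. Strictly speaking, it is not a ufunc.

x = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = np.where(cond, x, y)
result
array([ 1.1,  2.2,  1.3,  1.4,  2.5])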
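
To spell out what np.where computes, the sketch below compares it with a plain Python list comprehension and shows that either branch may be a scalar. The names expected and capped are illustrative.

```python
import numpy as np

x = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
y = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python equivalent: slower, but it makes the semantics explicit.
expected = [xv if c else yv for xv, yv, c in zip(x, y, cond)]
result = np.where(cond, x, y)

# A scalar branch is broadcast: cap every value above 1.25 at 1.25.
capped = np.where(x > 1.25, 1.25, x)
```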

Mathematical and statistical methods

data = np.arange(12).reshape(3, 4)
data
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

data.mean()
5.5

data.mean(axis=0)
array([ 4.,  5.,  6.,  7.])

data.mean(axis=1)
array([ 1.5,  5.5,  9.5])

data.sum(axis=1)
array([ 6, 22, 38])
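
The axis argument names the dimension that gets collapsed: axis=0 reduces over rows and leaves one value per column, while axis=1 leaves one value per row. A minimal sketch, with keepdims added as an optional extra that the examples above do not use:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

col_means = data.mean(axis=0)     # collapses the 3 rows, shape (4,)
row_means = data.mean(axis=1)     # collapses the 4 columns, shape (3,)

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against the original array.
row_means_2d = data.mean(axis=1, keepdims=True)   # shape (3, 1)
centered = data - row_means_2d                    # each row now has mean 0
```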

Cumulative sum

data.cumsum(0)
array([[ 0,  1,  2,  3],
       [ 4,  6,  8, 10],
       [12, 15, 18, 21]])

Cumulative product

data.cumprod(0)
array([[  0,   1,   2,   3],
       [  0,   5,  12,  21],
       [  0,  45, 120, 231]])
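
The positional argument in cumsum(0) and cumprod(0) is the axis. The sketch below shows how the result changes with the axis, and what happens when it is omitted. The variable names are illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

down_columns = data.cumsum(axis=0)   # running total from the top row downward
along_rows = data.cumsum(axis=1)     # running total from left to right
flattened = data.cumsum()            # no axis: the array is flattened first
```

Without an axis the result is 1-D with one entry per element, so its last value equals data.sum().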

Methods for boolean arrays

arr = np.random.randn(100)
(arr > 0).sum()  # number of positive values
57

any and all:

  • any tests whether at least one value in the array is True
  • all tests whether every value in the array is True

bools = np.array([False, False, True, False])
bools.any()
True

bools.all()
False
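
Both methods also accept an axis and work on non-boolean arrays, where nonzero values count as True. A small sketch on made-up data:

```python
import numpy as np

arr = np.array([[1, -2, 3],
                [4, 5, 6]])               # made-up sample data
positive = arr > 0

any_positive = positive.any()             # at least one element is positive
all_positive = positive.all()             # fails because of the -2
rows_all_positive = positive.all(axis=1)  # one answer per row

# On a numeric array, any() checks for nonzero values.
has_nonzero = np.array([0, 0, 3]).any()
```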

Sorting

arr = np.random.randn(5, 3)
arr
array([[ 0.3300801 , -0.1068226 , -0.45340813],
       [-1.70614802, -1.15734883, -0.45054381],
       [ 1.04189531, -0.40779194, -0.95224459],
       [-0.88946027,  1.61607895,  0.53116522],
       [ 0.35602213,  0.61704588, -0.63842772]])

arr.sort(1)  # sort each row in place
arr
array([[-0.45340813, -0.1068226 ,  0.3300801 ],
       [-1.70614802, -1.15734883, -0.45054381],
       [-0.95224459, -0.40779194,  1.04189531],
       [-0.88946027,  0.53116522,  1.61607895],
       [-0.63842772,  0.35602213,  0.61704588]])
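
Note that arr.sort modifies the array in place. When the original order is still needed, np.sort and np.argsort are the usual alternatives. A minimal sketch on made-up integers:

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])            # made-up sample data

sorted_copy = np.sort(arr, axis=1)     # new array; arr is unchanged so far
order = np.argsort(arr, axis=1)        # indices that would sort each row

returned = arr.sort(axis=1)            # sorts arr itself and returns None
```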

Linear algebra

Unique values

np.unique finds the unique values in an array and returns them in sorted order.

names = np.array(['Bob', 'Bob', 'Cer', 'Vivien', 'Cer'])
np.unique(names)
array(['Bob', 'Cer', 'Vivien'],
      dtype='|S6')
# The dtype above is from Python 2; on Python 3 the same call reports dtype='<U6'.
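
np.unique can also report how often each value occurs through its return_counts parameter. A short sketch; the variable names values and counts are illustrative:

```python
import numpy as np

names = np.array(["Bob", "Bob", "Cer", "Vivien", "Cer"])

# values is sorted; counts[i] is the number of occurrences of values[i].
values, counts = np.unique(names, return_counts=True)

frequency = dict(zip(values.tolist(), counts.tolist()))
```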

Reading and writing files

np.save("names", names)  # writes names.npy; the .npy suffix is appended automatically

np.load("names.npy")
array(['Bob', 'Bob', 'Cer', 'Vivien', 'Cer'],
      dtype='|S6')
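
A round trip can be verified without leaving files behind by writing into a temporary directory. The sketch below also uses np.savez to store several arrays in one .npz archive. The scores array and the file names are made up for illustration.

```python
import os
import tempfile

import numpy as np

names = np.array(["Bob", "Bob", "Cer", "Vivien", "Cer"])
scores = np.arange(5)                      # hypothetical second array

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names")
    np.save(path, names)                   # creates names.npy
    loaded = np.load(path + ".npy")

    archive = os.path.join(tmp, "bundle.npz")
    np.savez(archive, names=names, scores=scores)
    with np.load(archive) as bundle:       # arrays are read by keyword name
        loaded_scores = bundle["scores"]
```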

Use np.c_[] and np.r_[] to add columns and rows, respectively

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.ones(3)
c = np.arange(10, 19).reshape(3, 3)

np.c_[a, b]  # add a column
array([[ 1.,  2.,  3.,  1.],
       [ 4.,  5.,  6.,  1.],
       [ 7.,  8.,  9.,  1.]])

np.r_[a, b]  # add a row (fails)
ValueError: all the input arrays must have same number of dimensions

np.r_[a, b.reshape(1, 3)]  # add a row (works): the number of dimensions must match and the shape must fit
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [ 1.,  1.,  1.]])

np.c_[a, b.reshape(3, 1)]
array([[ 1.,  2.,  3.,  1.],
       [ 4.,  5.,  6.,  1.],
       [ 7.,  8.,  9.,  1.]])
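
The same results can be spelled with ordinary functions, which some readers find easier to follow than the index-style np.c_ and np.r_. This is a sketch of equivalents, not a recommendation of one style over the other. The result names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.ones(3)

with_col = np.column_stack([a, b])   # 1-D b becomes a column, like np.c_[a, b]
with_row = np.vstack([a, b])         # 1-D b is treated as a single row

# np.concatenate needs matching dimensions, so b is reshaped to (3, 1) first.
with_col_explicit = np.concatenate([a, b[:, np.newaxis]], axis=1)
```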

Using insert

np.insert is more concise.

np.insert(a, 0, values=b, axis=1)
array([[1, 1, 2, 3],
       [1, 4, 5, 6],
       [1, 7, 8, 9]])

np.insert(a, 1, values=b, axis=1)
array([[1, 1, 2, 3],
       [4, 1, 5, 6],
       [7, 1, 8, 9]])

np.insert(a, 0, values=b, axis=0)
array([[1, 1, 1],
       [1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
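
Two details are easy to miss in the outputs above: np.insert returns a new array rather than modifying a, and the inserted values are converted to the dtype of a (which is why the results are integers, unlike the float results from np.c_). The sketch below uses a made-up fractional b to make the conversion visible.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([0.5, 0.5, 0.5])          # made-up fractional values

# The values are cast to a's integer dtype, so 0.5 is truncated to 0.
inserted = np.insert(a, 0, values=b, axis=1)

# Converting a to float first preserves the fractional values.
inserted_float = np.insert(a.astype(float), 0, values=b, axis=1)
```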