100 numpy exercisesの解説 76~100

100 numpy exercisesの解説 51~75 - minus9d’s diary の続きです。引き続き、 numpy-100/100 Numpy exercises.ipynb at master · rougier/numpy-100 · GitHub を片手にご覧ください。

76. Consider a one-dimensional array Z, build a two-dimensional array whose first row is (Z[0],Z[1],Z[2]) and each subsequent row is shifted by 1 (last row should be (Z[-3],Z[-2],Z[-1])

例えばZ = np.array([0, 1, …, 9])のとき、以下のような行列を作るのが目標です。

この回答例を理解するためには、strideという概念を理解する必要があります。

stridesとは、「Pythonによるデータ分析入門」を読む　NumPy編 - 車輪を再発明によると、"「『隣』の要素にアクセスするために、ポインタをいくら動かせばいいか」を表す数値"です。例えば、ある3x4行列のstrideをnp.arange(12).reshape(3,4).stridesというコードで調べると、(32,8)とわかります。8の意味は、「ポインタを一個右に動かすためには8バイト移動する」という意味で、32の意味は、「ポインタを一個下に動かすためには32バイト移動する」という意味です。np.arange(12).reshape(3,4)で生成した行列のdtypeはint64なので、計算が合っています。

回答例で使われているstride_tricks.as_stridedは、stridesの値を任意に指定することで、ポインタの進め方を偽装する関数です。回答例を以下に示します。

def rolling(a, window):
    shape = (a.size - window + 1, window)  # この場合 (8, 3)
    strides = (a.itemsize, a.itemsize)     # この場合 (8, 8)
    return stride_tricks.as_strided(a, shape=shape, strides=strides)  # (a)
Z = rolling(np.arange(10), 3)

式(a)は、「shapeが(10,)であるndarray aを、stridesを(8, 8)と偽装して、shapeが(8, 3)になるように舐めていく」という意味になります。stridesが(8, 8)ということは、「ポインタを一個右に動かすにも、ポインタを一個下に動かすにも、8バイト移動する」ということになります。

79. Consider 2 sets of points P0,P1 describing lines (2d) and a point p, how to compute distance from p to each line i (P0[i],P1[i])?

P0とP1とを結ぶ直線と、点pとの距離を求める問題です。想定解では10組分のP0, P1を考えているのでややこしいコードになっていますが、解法自体は高校数学の範囲です。

はてなブログで数式を綴るのが苦痛なので、ここからは画像で。

80. Consider an arbitrary array, write a function that extract a subpart with a fixed shape and centered on a given element (pad with a fill value when necessary)

理解できていません…。

81. Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], …, [11,12,13,14]]?

なぜかQ76と実質問題です。

82. Compute a matrix rank

matrix_rank()という関数が提供されている (numpy.linalg.matrix_rank — NumPy v1.13.dev0 Manual ) のでこれを使うのが良さそうです。

84. Extract all the contiguous 3x3 blocks from a random 10x10 matrix

Q76, Q81の応用編です。

86. Consider a set of p matrices wich shape (n,n) and a set of p vectors with shape (n,1). How to compute the sum of of the p matrix products at once? (result has shape (n,1))

テンソル積を行う関数tensordot()を用いて、pnnのテンソルと、pn1のテンソルの積をとり、n*1の結果をえる問題です。tensordot()についてはpython - Understanding tensordot - Stack Overflowを見るとよくイメージがわくと思います。

88. How to implement the Game of Life using numpy arrays?

ライフゲームの実装。各セルについて隣り合う8セルの状況を調べるのに、8通りにずらしたarrayを足し合わせる技に感心しました。

89. How to get the n largest values of an array

np.argsort(arr)は、配列arrを昇順にソートしたときの、配列のインデックスを返します。

Z = [9, 12, 16, 10, 14, 1, 4, 18, 13, 5]
print(np.argsort(Z))  # "[5 6 9 0 3 1 8 4 2 7]" とprintされる

np.argpartition(arr, idx)は、配列arrのうち、値が小さいものidx + 1個分のインデックスを返します。

Z = [9, 12, 16, 10, 14, 1, 4, 18, 13, 5]
print(np.argpartition(Z, 3))  # "[5 9 6 0 1 3 2 7 8 4] とprintされる

この場合、[5 9 6 0]までが、もっとも小さい要素ベスト4に対応するインデックスです。必ずしも小さい順になっている保証はないことに注意です。また、以降の[1 3 2 7 8 4]は何の意味もない値のはずです。

90. Given an arbitrary number of vectors, build the cartesian product (every combinations of every item)

デカルト積(=直積)を実装する問題です。pythonの標準ライブラリだと

list(itertools.product([1, 2, 3], [4, 5], [6, 7]))

というコードで直積を実現できますが、これをnumpyで実装しようという問題です。

回答例は理解できていません…。

92. Consider a large vector Z, compute Z to the power of 3 using 3 different methods

自分の手元では以下の結果でした。Q69に続き、np.einsum()は高速です。

1 loop, best of 3: 2.68 s per loop
1 loop, best of 3: 452 ms per loop
1 loop, best of 3: 299 ms per loop

93. Consider two arrays A and B of shape (8,3) and (2,2). How to find rows of A that contain elements of each row of B regardless of the order of the elements in B?

Aを構成する各行のうち、「Bを構成するどの行とも、共通する要素が存在する」ような行を探すという問題です。一見してわかりにくい問題なので、例を見ます。

np.random.seed(0)
A = np.random.randint(0,5,(8,3))
B = np.random.randint(0,5,(2,2))

で生成されるのは

A
[[4 0 3]
 [3 3 1]
 [3 2 4]
 [0 0 4]
 [2 1 0]
 [1 1 0]
 [1 4 3]
 [0 3 0]]
B
[[2 3]
 [0 1]]

です。Aの最初の行[4 0 3]は、3がBの0行目に、1がBの1行目に存在するので、題意を満たします。一方、Aの3行目[3 2 4]は、Bの1行目の要素を一つも持たないため、題意を満たしません。

回答例をうまく説明することが難しいので、中間値を示すことでお茶を濁します。

C = A[..., np.newaxis, np.newaxis] == B

により、8x3x3x3のTrue, False表を作成します。さらに

C.any((3,1))

として

[[ True  True]
 [ True  True]
 [ True False]
 [False  True]
 [ True  True]
 [False  True]
 [ True  True]
 [ True  True]]

C.any((3,1)).all()

として

[ True  True False False  True False  True  True]

となります。

96. Given a two dimensional array, how to extract unique rows?

理解できていません…。

98. Considering a path described by two vectors (X,Y), how to sample it using equidistant samples (★★★)?

ある軌跡の座標列がベクトルX, Yで与えられるとき、この軌跡をリサンプルして補間する？という問題のようです。要再読

99. Given an integer n and a 2D array X, select from X the rows which can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows which only contain integers and which sum to n.

行列Xの各行のうち、整数のみを含み、合計がnになる行を探す問題です。M = np.logical_and.reduce(np.mod(X, 1) == 0, axis=-1)が全要素が整数であるという条件を、(X.sum(axis=-1) == n)が合計nであるという条件を表しています。

100. Compute bootstrapped 95% confidence intervals for the mean of a 1D array X (i.e., resample the elements of an array with replacement N times, compute the mean of each sample, and then compute percentiles over the means).

bootstrapとは、与えられた一つのサンプルから、シミュレーションにより統計値を求める方法です。

X = np.random.randn(100)
- 長さ100のサンプルが一つで与えられます。
N = 1000; idx = np.random.randint(0, X.size, (N, X.size))
- 1000回、復元抽出ありで、100個の要素を取り出すシミュレーションを実施します。
means = X[idx].mean(axis=1)
- 1000回のシミュレーションのそれぞれで取り出した要素の平均値をとります
confint = np.percentile(means, [2.5, 97.5])
- 1000個分の平均値をソートして、中央の95%の範囲をとったものが、95%の信頼区間になります。