Numpy

모플로 2021. 7. 20. 22:19

1. Numpy

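Numerical Python
Python is an interpreted language, so it is slow when handling large amounts of data; the Numpy package is used to deal with this
Provides linear algebra functionality
Internally implemented in C
Uses a data structure called ndarray (N-dimensional array)
An array holds only one data type
Does not support dynamic typing (every element shares the array's dtype)
Most operations can be done without a for loop (vectorized operations)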
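A minimal sketch of the last three points, assuming only that NumPy is installed and imported as `np` (the variable names are illustrative): mixing ints and floats upcasts the whole array to a single dtype, and one arithmetic expression on the array replaces an explicit Python loop.

```python
import numpy as np

# Mixed int/float input: NumPy picks one common dtype for every element
arr = np.array([1, 2.5, 3])
print(arr.dtype)  # float64

# Loop version: square each element one at a time in Python
squared_loop = [x ** 2 for x in arr]

# Vectorized version: the loop runs inside NumPy's compiled code
squared_vec = arr ** 2
print(squared_vec.tolist())  # [1.0, 6.25, 9.0]
```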

import numpy as np

temp = np.array(["1",2,3], float)  # the string "1" is also cast to float
type(temp[0]) -> numpy.float64

dtype

Returns the data type of the array's elements

temp = np.array(["1",2,3], float)
temp.dtype -> float64

shape

Information about how the dimensions are configured: a tuple holding the length of each axis

temp = np.array([["1",2,3]], float)
temp.shape -> (1, 3)

temp2 = np.array(["1",2,3], float)
temp2.shape -> (3,)

three tensors

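Each time a dimension is added, it goes at the front of the shape tuple and the existing dimensions are pushed back (to the right)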

np.array([1,2,3,4], float).shape -> (4,)
np.array([[1,2,3,4],[1,2,3,4]], float).shape -> (2,4)
np.array([[[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              ],float).shape -> (3, 2, 4)

ndim

Number of dimensions

np.array([[[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              ],float).ndim -> 3

size

Total number of elements

np.array([[[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              ],float).size -> 24
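A short sketch tying `shape`, `ndim` and `size` together on the same 3-D example used above (the name `t` is just for illustration): `ndim` is always the length of the shape tuple, and `size` is always the product of its entries.

```python
import numpy as np

# Same 3-D example as above: 3 blocks, each 2 rows x 4 columns
t = np.array([[[1, 2, 3, 4], [1, 2, 3, 4]],
              [[1, 2, 3, 4], [1, 2, 3, 4]],
              [[1, 2, 3, 4], [1, 2, 3, 4]]], float)

print(t.shape)  # (3, 2, 4)
print(t.ndim)   # 3  -> equals len(t.shape)
print(t.size)   # 24 -> equals 3 * 2 * 4
```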

reshape

Changes the dimensions (shape) of the data without changing the elements

np.array([[[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              ],float).reshape(6,4)
-> 
array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

When you don't know exactly how many rows are needed, pass -1 and the row count is inferred from the size

np.array([[[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              [[1,2,3,4],[1,2,3,4]],
              ],float).reshape(-1,2).shape -> (12, 2)
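A sketch of how `-1` is resolved, assuming a 24-element array as in the example above: only one axis may be `-1`, and the total size must stay the same, otherwise `reshape` raises a `ValueError`.

```python
import numpy as np

a = np.arange(24)  # 24 elements: 0..23

print(a.reshape(6, 4).shape)      # (6, 4)
print(a.reshape(-1, 2).shape)     # (12, 2): -1 is inferred as 24 / 2
print(a.reshape(2, -1, 3).shape)  # (2, 4, 3): at most one -1 per call

# 24 is not divisible by 5, so no valid shape exists
failed = False
try:
    a.reshape(5, -1)
except ValueError as e:
    failed = True
    print("ValueError:", e)
```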

flatten

Converts N-dimensional data into one dimension

np.array([[1,2,3,4],[1,2,3,4]]).flatten() -> array([1, 2, 3, 4, 1, 2, 3, 4])
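A related detail, sketched below: `flatten()` always returns a new copy, while `ravel()` returns a view when the array is contiguous in memory, so writing into the raveled result can change the original.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [1, 2, 3, 4]])

flat_copy = m.flatten()  # always a new copy
flat_view = m.ravel()    # a view here, because m is contiguous in memory

flat_copy[0] = 100  # does not touch m
flat_view[1] = 200  # writes through to m[0, 1]

# m[0, 0] is still 1, m[0, 1] is now 200
print(m.tolist())  # [[1, 200, 3, 4], [1, 2, 3, 4]]
```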

indexing

temp = np.array([[1,2,3,4],[1,2,3,4]])

# both are equivalent
temp[0][0]
temp[0,0]

slicing

temp = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
temp[:,2:]  -> array([[3, 4],[3, 4],[3, 4]])
temp[1,1:3] -> array([2,3])
temp[1:3] -> array([[1, 2, 3, 4],[1, 2, 3, 4]])
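One behavior worth knowing, shown with the same `temp` array: basic slices are views, so assigning into a slice modifies the original array. Calling `.copy()` gives an independent array.

```python
import numpy as np

temp = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

row_slice = temp[1:3]    # rows 1 and 2 -> shape (2, 4)
col_slice = temp[:, 2:]  # every row, columns 2 onward -> shape (3, 2)

# Writing to the slice changes the original array
col_slice[0, 0] = 99
print(temp[0, 2])  # 99

# .copy() breaks the link
safe = temp[1:3].copy()
safe[0, 0] = -1
print(temp[1, 0])  # 1
```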

arange

Creates an array from a specified range of values

np.arange(30) -> array([0, 1, 2, 3, 4, 5, ..., 27, 28, 29])


# start, stop (exclusive), step
np.arange(0, 5, 0.5) -> array([0., 0.5, 1., ..., 4., 4.5])

np.arange(30).reshape(-1,5)  # -> shape (6, 5)
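A small sketch of the `arange` calls above: the stop value is never included, and combining `arange` with `reshape(-1, 5)` yields 6 rows of 5 consecutive numbers.

```python
import numpy as np

print(np.arange(5).tolist())    # [0, 1, 2, 3, 4]: stop is exclusive
halves = np.arange(0, 5, 0.5)   # 10 values from 0.0 to 4.5
print(halves.size)              # 10

grid = np.arange(30).reshape(-1, 5)
print(grid.shape)               # (6, 5)
print(grid[1].tolist())         # [5, 6, 7, 8, 9]
```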

ones, zeros, empty

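zeros: an array initialized with 0


np.zeros(shape=(10,), dtype=np.int8) -> [0,0,0,0,0,0,0,0,0,0]

np.zeros((2,5)) -> [[0.,0.,0.,0.,0.],[0.,0.,0.,0.,0.]]

ones: an array initialized with 1
empty:
- creates an ndarray with only the shape given, without filling in values
- the memory is not initialized, so it contains garbage values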
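A minimal sketch of the three constructors: `zeros` defaults to `float64`, `dtype` can be overridden, and the contents of `empty` are arbitrary, so only its shape is meaningful until values are assigned.

```python
import numpy as np

z = np.zeros((2, 5))               # float64 zeros by default
o = np.ones((2, 5), dtype=np.int8)
e = np.empty((2, 5))               # values are whatever was already in memory

print(z.dtype, o.dtype)  # float64 int8
print(e.shape)           # (2, 5); do not rely on e's values before assigning them
```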

something_like

Creates a new array with the same shape (and dtype) as an existing array

temp = np.zeros((2,5))
np.ones_like(temp) -> [[1.,1.,1.,1.,1.],[1.,1.,1.,1.,1.]]
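A sketch showing that the `*_like` family copies both shape and dtype from the template array (here an `int32` array chosen for illustration); `full_like` fills with an arbitrary value.

```python
import numpy as np

base = np.arange(10, dtype=np.int32).reshape(2, 5)

ones = np.ones_like(base)     # same shape (2, 5) and dtype int32
zeros = np.zeros_like(base)
full = np.full_like(base, 7)  # every element set to 7

print(ones.shape, ones.dtype)  # (2, 5) int32
```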

identity

Creates an identity matrix


np.identity(n=3, dtype=np.int8)
-> array([[1,0,0],
          [0,1,0],
          [0,0,1]], dtype=int8)

eye

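Creates a matrix whose values are 1 along a diagonal


# k: offset of the diagonal (column index where it starts in row 0)

np.eye(N=3, M=5, k=2, dtype=np.int8)
-> array([[0,0,1,0,0],
          [0,0,0,1,0],
          [0,0,0,0,1]], dtype=int8)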
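A short comparison of `identity` and `eye`: `identity(n)` is the square, `k=0` special case of `eye`, while `eye` also accepts a non-square shape and a shifted diagonal (negative `k` moves it below the main diagonal).

```python
import numpy as np

# identity(n) equals eye(n) with the default k=0
print(np.array_equal(np.identity(3), np.eye(3)))  # True

print(np.eye(3, 5, k=2, dtype=np.int8).tolist())
print(np.eye(3, k=-1, dtype=np.int8).tolist())  # [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
```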

diag

Retrieves the values on a diagonal


matrix = np.arange(9).reshape(3,3)

# k: offset of the diagonal (start index)

np.diag(matrix, k=1)
-> [1,5]
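A sketch of `np.diag` with different offsets on the same 3x3 matrix. Note that the function works both ways: given a 2-D array it extracts a diagonal, given a 1-D array it builds a diagonal matrix.

```python
import numpy as np

matrix = np.arange(9).reshape(3, 3)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

print(np.diag(matrix).tolist())        # [0, 4, 8]: main diagonal
print(np.diag(matrix, k=1).tolist())   # [1, 5]: one above
print(np.diag(matrix, k=-1).tolist())  # [3, 7]: one below

# 1-D input: build a 2-D matrix with these values on the diagonal
print(np.diag([1, 2, 3]).tolist())     # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
```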

random sampling

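Creates an array by sampling from a data distribution
Uniform distribution

np.random.uniform(0,1,10).reshape(2,5)   # low, high, size

Normal distribution

np.random.normal(0,1,10).reshape(2,5)    # mean, standard deviation, size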
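The calls above use the legacy `np.random` module functions. A sketch of the same sampling with the newer `Generator` API (`np.random.default_rng`, available since NumPy 1.17), assuming a fixed seed of 0 so that the draws are reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator -> reproducible draws

u = rng.uniform(0, 1, 10).reshape(2, 5)  # low, high, size
n = rng.normal(0, 1, 10).reshape(2, 5)   # mean (loc), std (scale), size

print(u.shape, n.shape)  # (2, 5) (2, 5)
```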

sum

Adds up all values


temp = np.arange(1,11)
temp.sum(dtype=np.float64)   # np.float was removed in NumPy 1.24
-> 55.0

axis

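The axis of the dimension along which an operation function is applied
Think of axis 0 as always the most recently added (outermost) dimension: when a new dimension is added, the existing axes each shift back by one (axis + 1)

temp = np.arange(1,13).reshape(3,4)

temp.sum(axis=1)   # sum across each row
-> [10,26,42]

temp.sum(axis=0)   # sum down each column
-> [15,18,21,24]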
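A 3-D sketch of the same idea: the axis passed to `sum` is the one that disappears from the result shape, and axis 0 is the outermost dimension.

```python
import numpy as np

# shape (2, 3, 4): axis 0 has length 2, axis 1 length 3, axis 2 length 4
t = np.arange(24).reshape(2, 3, 4)

print(t.sum(axis=0).shape)  # (3, 4): the outermost axis is collapsed
print(t.sum(axis=1).shape)  # (2, 4)
print(t.sum(axis=2).shape)  # (2, 3)
print(t.sum())              # 276: no axis -> sum of every element
```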

mathematical functions

Many mathematical operations are available (e.g. mean, std, sqrt, exp)
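A few of these functions on the same `np.arange(1, 11)` array used in the sum example. Aggregates such as `mean` and `std` reduce the array, while `sqrt` and `exp` apply element-wise; `std` uses the population formula (`ddof=0`) by default.

```python
import numpy as np

temp = np.arange(1, 11)

print(temp.mean())              # 5.5
print(temp.std())               # population standard deviation (ddof=0)
roots = np.sqrt(temp[:4])       # element-wise square root of 1, 2, 3, 4
print(temp.max(), temp.min())   # 10 1
```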

concat

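Repeatedly appending is faster with a Python list
- Reason: a NumPy array has a fixed size, so every concatenation must allocate new storage and copy all the data. A Python list is not a linked list but a dynamic array of references that over-allocates space, so append is amortized O(1)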
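A sketch of the two patterns, assuming an arbitrary size of `n = 1000`: growing an array with `np.concatenate` inside a loop copies the whole array on every iteration, while appending to a list and converting once at the end does a single allocation for the final array. Both produce the same result; only the cost differs.

```python
import numpy as np

n = 1_000

# Slow pattern: every np.concatenate call allocates a new array and copies everything
arr = np.array([], dtype=np.int64)
for i in range(n):
    arr = np.concatenate((arr, [i]))

# Faster pattern: append to a Python list (amortized O(1)), convert once at the end
items = []
for i in range(n):
    items.append(i)
arr2 = np.array(items, dtype=np.int64)

print(np.array_equal(arr, arr2))  # True
```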
stack

vstack (vertical)

a = np.array([1,2,3])  
b = np.array([4,5,6])  
np.vstack((a,b))  
-> [[1,2,3],[4,5,6]]

hstack (horizontal)

a = np.array([[1],[2],[3]])  
b = np.array([[4],[5],[6]])  
np.hstack((a,b))  
-> [[1,4],[2,5],[3,6]]

concatenate

a = np.array([[1,2,3]])  
b = np.array([[4,5,6]])  
np.concatenate((a,b), axis=0)  
-> [[1,2,3],[4,5,6]]

a = np.array([[1,2], [3,4]])  
b = np.array([[5,6]])

np.concatenate( (a,b.T), axis=1)  
-> [[1,2,5],[3,4,6]]

# the T attribute returns the transpose
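A sketch relating the stacking functions above: for 1-D inputs, `vstack` treats each array as a row, which matches `concatenate` along axis 0 after reshaping each input to shape (1, 3), while `hstack` on 1-D inputs simply joins them end to end.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))  # shape (2, 3): 1-D inputs become rows
c = np.concatenate((a.reshape(1, -1), b.reshape(1, -1)), axis=0)  # same result
h = np.hstack((a, b))  # 1-D inputs are joined: shape (6,)

print(np.array_equal(v, c))  # True
print(h.tolist())            # [1, 2, 3, 4, 5, 6]
```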

Arithmetic operations

By default NumPy applies arithmetic operations element-wise, between values at the same position
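A sketch of element-wise arithmetic on two 2x2 arrays. One common pitfall: `*` is the element-wise product, not matrix multiplication; the matrix product is `@` (or `np.dot` for 2-D arrays).

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[10, 20], [30, 40]])

print((x + y).tolist())  # [[11, 22], [33, 44]]: element-wise sum
print((x * y).tolist())  # [[10, 40], [90, 160]]: element-wise product
print((x @ y).tolist())  # [[70, 100], [150, 220]]: matrix product
```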

broadcasting

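When two arrays of different shapes are combined, the smaller one is broadcast (stretched) so the operation is applied at every position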

temp = np.array([1,2,3])  
temp+3  
-> [4,5,6]

# same shapes: plain element-wise addition, no broadcasting needed
temp = np.array([1,2,3])
temp2 = np.array([2,3,4])
temp + temp2
-> [3, 5, 7]
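A sketch of broadcasting between two arrays (rather than an array and a scalar): shapes are compared from the right, and axes of length 1 are stretched to match. A column of shape (3, 1) plus a row of shape (3,) gives a (3, 3) grid, while trailing lengths 3 and 2 are incompatible and raise a `ValueError`.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)

grid = col + row                   # shape (3, 3)
print(grid.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# Trailing lengths 3 and 2 cannot be broadcast together
failed = False
try:
    row + np.array([1, 2])
except ValueError as e:
    failed = True
    print("ValueError:", e)
```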

all & any

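Broadcasting happens when the values are compared with a scalar
all: whether every value satisfies the condition
any: whether at least one value satisfies the condition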

temp = np.arange(10)  
temp > 5 -> [False, False, False ... True, True]

np.all(temp>5) -> False

np.any(temp>5) -> True
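Like `sum`, both functions accept an `axis` argument. A sketch on a 2x5 array, checking the condition per row and per column:

```python
import numpy as np

temp = np.arange(10).reshape(2, 5)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

print(np.all(temp > 4, axis=1).tolist())  # [False, True]: per row
print(np.any(temp > 7, axis=0).tolist())  # [False, False, False, True, True]: per column
```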

where

Extracts indices, or chooses values by a condition

a = np.array([1,2,3])

1) condition, value if True, value if False
np.where(a>1, 5, 6)
-> [6,5,5]

2) returns the indices where the condition is True
np.where(a>1)
-> (array([1, 2]),)
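A sketch of both forms of `np.where`. In the three-argument form the branches may be arrays, not just scalars. In the one-argument form the result is a tuple with one index array per dimension, which can be used directly to index the original array.

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.where(a > 1, 5, 6).tolist())       # [6, 5, 5]
print(np.where(a > 1, a * 10, a).tolist())  # [1, 20, 30]: branches can be arrays

idx = np.where(a > 1)     # tuple with one index array per dimension
print(idx[0].tolist())    # [1, 2]
print(a[idx].tolist())    # [2, 3]

m = np.array([[0, 5], [7, 0]])
rows, cols = np.where(m > 0)         # 2-D input -> two index arrays
print(rows.tolist(), cols.tolist())  # [0, 1] [1, 0]
```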

argmax, argmin

argmax: index of the largest value
argmin: index of the smallest value

a = np.array([1,2,13,0,5,6,7])
np.argmax(a) -> 2
np.argmin(a) -> 3

# index of the largest value along an axis

a = np.array([[1,7,3],[4,5,6]])
np.argmax(a, axis=0) -> [1,0,1]
np.argmax(a, axis=1) -> [1,2]
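A sketch contrasting `argmax` (positions) with `max` (values) on the same 2x3 array. Without an axis, `argmax` returns an index into the flattened array.

```python
import numpy as np

a = np.array([[1, 7, 3], [4, 5, 6]])

print(np.argmax(a, axis=0).tolist())  # [1, 0, 1]: row index of each column's max
print(np.argmax(a, axis=1).tolist())  # [1, 2]: column index of each row's max
print(np.max(a, axis=0).tolist())     # [4, 7, 6]: the max values themselves
print(np.argmax(a))                   # 1: index into the flattened array
```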

boolean index

A NumPy array can extract the values that satisfy a condition, returned as an array
All comparison operations can be used
Returns the values (not the indices)

temp = np.array([1,2,3,4,5])  
temp[temp>3]  
-> [4,5]
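A sketch of what happens under the hood: the comparison builds a boolean mask, and indexing with the mask keeps the `True` positions. Conditions are combined with `&` and `|`, with parentheses around each comparison (Python's `and`/`or` do not work element-wise on arrays).

```python
import numpy as np

temp = np.array([1, 2, 3, 4, 5])

mask = temp > 3
print(mask.tolist())        # [False, False, False, True, True]
print(temp[mask].tolist())  # [4, 5]

print(temp[(temp > 1) & (temp < 5)].tolist())    # [2, 3, 4]
print(temp[(temp == 1) | (temp == 5)].tolist())  # [1, 5]
```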

fancy index

A way of extracting values by using an array as the index values

a = np.array([2,4,6,8], float)  
b = np.array([0,0,1,3,2],int)  
a[b]  
-> [2,2,4,8,6]

# same as the method above
a.take(b)
-> [2,2,4,8,6]

# 2-D
a = np.array([[1,4],[9,16]], float)
b = np.array([0,0,1,1,0], int)
c = np.array([0,1,1,1,1], int)

a[b,c]
-> [1,4,16,16,4]
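A sketch of the 2-D case: the two index arrays are paired element by element, so `a[b, c]` picks `a[b[i], c[i]]` for each `i`. A single index list selects whole rows, and unlike slicing, fancy indexing returns a copy.

```python
import numpy as np

a = np.array([[1, 4], [9, 16]], float)
b = np.array([0, 0, 1, 1, 0])
c = np.array([0, 1, 1, 1, 1])

# Picks a[0,0], a[0,1], a[1,1], a[1,1], a[0,1]
print(a[b, c].tolist())    # [1.0, 4.0, 16.0, 16.0, 4.0]

# A single index list on a 2-D array selects whole rows
print(a[[1, 0]].tolist())  # [[9.0, 16.0], [1.0, 4.0]]

# Fancy indexing returns a copy, so the original is unchanged
picked = a[b, c]
picked[0] = -1
print(a[0, 0])  # 1.0
```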

save

Saved using pickle
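A minimal round-trip sketch, assuming a temporary directory and the hypothetical file names `arr.pickle` and `arr.npy`. It saves the array with `pickle`, as described above, and for comparison with NumPy's own `.npy` format via `np.save` / `np.load`.

```python
import os
import pickle
import tempfile

import numpy as np

arr = np.arange(10).reshape(2, 5)

with tempfile.TemporaryDirectory() as tmp:
    # 1) pickle: works for any Python object, including ndarrays
    pkl_path = os.path.join(tmp, "arr.pickle")  # hypothetical file name
    with open(pkl_path, "wb") as f:
        pickle.dump(arr, f)
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)

    # 2) NumPy's own .npy format
    npy_path = os.path.join(tmp, "arr.npy")  # hypothetical file name
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

print(np.array_equal(arr, from_pickle), np.array_equal(arr, from_npy))  # True True
```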
