Improving the performance of a NumPy mapping operation

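I have a NumPy array of shape (4, X, Y), where the first dimension holds an (R, G, B, A) quadruplet. My goal is to convert the X*Y RGBA quadruplets into X*Y floating-point values, using a dictionary that maps quadruplets to values.

My current code: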

codeTable = {
    (255, 255, 255, 127): 5.5,
    (128, 128, 128, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

for i in range(rows):
    for j in range(cols):
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

data is a NumPy array of shape (4, rows, cols), and new_data has shape (rows, cols).

The code works fine but takes quite a long time. How should I optimize it? Here is a full example:

import numpy

codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

# test data: shape (4, rows, cols)
rows = 3
cols = 2
data = numpy.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

new_data = numpy.zeros((rows, cols), numpy.float32)

# Look up each pixel's RGBA quadruplet; unknown quadruplets map to -9999
for i in range(rows):
    for j in range(cols):
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

# expected result for `new_data`:
# array([[ 5.50000000e+00, 7.50000000e+00],
#        [ 6.50000000e+00, -9.99900000e+03],
#        [ 6.50000000e+00, -9.99900000e+03]], dtype=float32)

Source

2016-06-04 Kévin Lesénéchal

How many 'rows' and 'cols' are there? – Will

@Will Thousands of each. –

This may help: http://stackoverflow.com/questions/36480358/whats-a-fast-non-loop-way-to-apply-a-dict-to-a-ndarray-meaning-use-elements – hpaulj

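Here is an approach that returns your expected result, though with such a small amount of data it is hard to know whether it will be faster for you. Since it avoids the double for loop, however, I expect you will see a decent speedup.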

import numpy
import pandas as pd


codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}

# test data
rows = 3
cols = 2
data = numpy.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

# Reference result from the original double loop
new_data = numpy.zeros((rows, cols), numpy.float32)

for i in range(rows):
    for j in range(cols):
        new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

def create_output(data):
    # Reshape your two data sources to be a bit more sane:
    # one row per pixel, one column per channel
    reshaped_data = data.reshape((4, -1))
    df = pd.DataFrame(reshaped_data).T

    # One row per table entry: R, G, B, A, value
    reshaped_codeTable = []
    for key, value in codeTable.items():
        reshaped_codeTable.append(list(key) + [value])
    ct = pd.DataFrame(reshaped_codeTable)

    # Merge on the data, replace missing merges with -9999
    result = df.merge(ct, how='left')
    newest_data = result[4].fillna(-9999)

    # Reshape (a Series has no reshape method in current pandas,
    # so convert to a NumPy array first)
    output = newest_data.to_numpy().reshape(rows, cols)
    return output

output = create_output(data)
print(output)
# array([[ 5.50000000e+00, 7.50000000e+00],
#        [ 6.50000000e+00, -9.99900000e+03],
#        [ 6.50000000e+00, -9.99900000e+03]])

print(numpy.array_equal(new_data, output))
# True

Source

2016-06-05 17:31:23
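As an aside that is not part of the answer above, the following minimal sketch isolates the left merge the answer relies on. The three-pixel frame and the column names r, g, b, a and value are hypothetical, chosen only to make the join easier to read.

```python
import pandas as pd

# Hypothetical three-pixel example; column names are for illustration only.
pixels = pd.DataFrame(
    [[0, 0, 0, 0], [9, 9, 9, 9], [128, 129, 130, 255]],
    columns=["r", "g", "b", "a"],
)
table = pd.DataFrame(
    [[128, 129, 130, 255, 6.5], [0, 0, 0, 0, 7.5]],
    columns=["r", "g", "b", "a", "value"],
)

# A left merge keeps every pixel row in its original order and joins on the
# shared columns (r, g, b, a); pixels absent from the table get NaN.
merged = pixels.merge(table, how="left")

# Replace the NaN of unmatched pixels with the sentinel value.
values = merged["value"].fillna(-9999).to_numpy()
print(values)
```

The second pixel (9, 9, 9, 9) has no entry in the table, so it ends up as -9999, while the first and third pixels receive 7.5 and 6.5.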

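Your solution seems to work only for square input data and fails when 'cols != rows'. But thanks for the idea, I will look into it. In any case, the speed is much more satisfying than my naive double-loop solution. –

Fixed! It now uses the requested number of rows and columns. –

Well, your code does not work with other data shapes. I have updated my initial message with a more complex example. Your code returns the right results, but at the wrong positions in the output array. –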
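Since the comments above turn on where each result lands in the output, here is a small check, offered as a sketch rather than as anything from the thread, of how data.reshape(4, -1).T orders the pixels. The test array is hypothetical: each value encodes its own channel and position so that the ordering can be verified directly.

```python
import numpy as np

rows, cols = 3, 2

# Hypothetical data: channel c of pixel (i, j) holds 100*c + 10*i + j,
# so every value encodes its own position.
c, i, j = np.meshgrid(np.arange(4), np.arange(rows), np.arange(cols), indexing="ij")
data = 100 * c + 10 * i + j            # shape (4, rows, cols)

flat = data.reshape(4, -1).T           # shape (rows*cols, 4), one row per pixel

# Row k of `flat` is pixel (k // cols, k % cols): row-major (C) order.
for k in range(rows * cols):
    assert np.array_equal(flat[k], data[:, k // cols, k % cols])

# A per-pixel result computed in that order is put back in place
# by reshape(rows, cols), not reshape(cols, rows).
per_pixel = flat[:, 0]                 # stand-in for the looked-up values
restored = per_pixel.reshape(rows, cols)
assert np.array_equal(restored, data[0])
```

If a vectorized lookup preserves the row order of the flattened pixels, reshaping its output to (rows, cols) should therefore place every value at the same position the double loop would.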

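The numpy_indexed package (disclaimer: I am its author) contains a vectorized, nd-array-capable variant of list.index, which can be used to solve your problem efficiently and concisely:

import numpy as np
import numpy_indexed as npi

# Table keys as an (n_keys, 4) array, values as an (n_keys,) array
map_keys = np.array(list(codeTable.keys()))
map_values = np.array(list(codeTable.values()))
# Index of each pixel's quadruplet in map_keys; unmatched pixels are masked
indices = npi.indices(map_keys, data.reshape(4, -1).T, missing='mask')
remapped = np.where(indices.mask, -9999, map_values[indices.data]).reshape(data.shape[1:])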

Source

2016-06-12 09:27:02

Your solution seems to work like a charm. Thanks! I will report on the performance gain later. –

Looking forward to the performance comparison! –
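For readers who would rather avoid an extra dependency, the same lookup can probably be done with plain NumPy. The sketch below is not from the thread: it swaps in a different technique, packing each quadruplet into a single 32-bit integer and binary-searching the sorted keys with np.searchsorted. The function name map_rgba and its default parameter are made up for illustration, and the code assumes every channel value lies in the range 0..255.

```python
import numpy as np

def map_rgba(data, code_table, default=-9999.0):
    """Map each RGBA quadruplet in `data` (shape (4, rows, cols)) to a float."""

    def pack(rgba):
        # Pack four 8-bit channels into one 32-bit integer per pixel.
        # Assumption: every channel value is in the range 0..255.
        rgba = np.asarray(rgba, dtype=np.uint32)
        return (rgba[0] << 24) | (rgba[1] << 16) | (rgba[2] << 8) | rgba[3]

    keys = pack(np.array(list(code_table.keys())).T)       # shape (n_keys,)
    values = np.array(list(code_table.values()), dtype=np.float32)

    # Sort the packed keys so that np.searchsorted can binary-search them.
    order = np.argsort(keys)
    keys, values = keys[order], values[order]

    packed = pack(data)                                    # shape (rows, cols)
    pos = np.searchsorted(keys, packed)
    pos = np.clip(pos, 0, len(keys) - 1)                   # keep indices in range
    found = keys[pos] == packed                            # exact matches only
    return np.where(found, values[pos], np.float32(default))


codeTable = {
    (253, 254, 255, 127): 5.5,
    (128, 129, 130, 255): 6.5,
    (0, 0, 0, 0): 7.5,
}
data = np.array([
    [[253, 0], [128, 0], [128, 0]],
    [[254, 0], [129, 144], [129, 0]],
    [[255, 0], [130, 243], [130, 5]],
    [[127, 0], [255, 120], [255, 5]],
])

result = map_rgba(data, codeTable)
print(result)
```

On the test data from the question this gives the same values as the double loop. Whether it is faster than the approaches above on arrays with thousands of rows and columns would need to be measured.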
