CUDA Thrust - Run Indices

I am trying to build a "run-length encoder" with CUDA Thrust that produces a report of the runs occurring within a file. This "report" will be used later to perform the run-length encoding step itself.

Input sequence:

inputSequence = [a, a, b, c, a, a, a];

Output sequence:

runChar = [a, a]; 
runCount = [2, 3]; 
runPosition = [0, 4];

The output describes a run of 2 starting at position 0 and a run of 3 starting at position 4.
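To pin down exactly what is wanted, here is a minimal sequential C++ sketch (plain host code, not Thrust, and not from the original post) that walks the input once and records every run of length 2 or more together with its start position. The names Runs and find_runs and the min_len parameter are hypothetical; the point is only to have a simple reference to check a GPU version against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: one entry per run that is kept.
struct Runs {
    std::vector<char> runChar;      // character of the run
    std::vector<int> runCount;      // length of the run
    std::vector<int> runPosition;   // index where the run starts
};

// Sequential reference: record every run of equal consecutive characters
// whose length is at least min_len (2 in the question).
Runs find_runs(const std::vector<char>& in, int min_len) {
    assert(min_len >= 1);
    Runs r;
    std::size_t i = 0;
    while (i < in.size()) {
        // advance j to one past the end of the run starting at i
        std::size_t j = i + 1;
        while (j < in.size() && in[j] == in[i]) ++j;
        const int len = static_cast<int>(j - i);
        if (len >= min_len) {
            r.runChar.push_back(in[i]);
            r.runCount.push_back(len);
            r.runPosition.push_back(static_cast<int>(i));
        }
        i = j;  // the next run starts where this one ended
    }
    return r;
}
```

For inputSequence = [a, a, b, c, a, a, a] and min_len = 2 this returns runChar = [a, a], runCount = [2, 3] and runPosition = [0, 4], matching the expected output above.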

The Thrust run-length encoder example shown below outputs two arrays: one for the output characters and one for the run lengths.

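I would like to modify it so that runs shorter than 2 are excluded and the position at which each run occurs is output as well. Here is what I have built so far: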

// input data on the host 
    const char data[] = "aaabbbbbcddeeeeeeeeeff"; 

    const size_t N = (sizeof(data)/sizeof(char)) - 1; 

    // copy input data to the device 
    thrust::device_vector<char> input(data, data + N); 

    // allocate storage for output data and run lengths 
    thrust::device_vector<char> output(N); 
    thrust::device_vector<int> lengths(N); 

    // print the initial data 
    std::cout << "input data:" << std::endl; 
    thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, "")); 
    std::cout << std::endl << std::endl; 

    // compute run lengths 
    size_t num_runs = thrust::reduce_by_key 
            (input.begin(), input.end(),   // input key sequence 
            thrust::constant_iterator<int>(1), // input value sequence 
            output.begin(),      // output key sequence 
            lengths.begin()      // output value sequence 
            ).first - output.begin();   // compute the output size 

    // print the output 
    std::cout << "run-length encoded output:" << std::endl; 
    for(size_t i = 0; i < num_runs; i++) 
     std::cout << "(" << output[i] << "," << lengths[i] << ")"; 
    std::cout << std::endl; 

    return 0;
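For reference, thrust::reduce_by_key with thrust::constant_iterator<int>(1) as the value sequence adds up a 1 for every element in each group of consecutive equal keys, so it emits one (character, run length) pair per run. A sequential C++ sketch of that same reduction follows; it is my own illustration, and encode_runs is a hypothetical name, not a Thrust function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sequential analog of
//   reduce_by_key(keys, keys + n, constant_iterator<int>(1), out_keys, out_lengths)
// Consecutive equal keys form one group; summing a 1 per element of the
// group yields the run length.
std::vector<std::pair<char, int>> encode_runs(const std::string& in) {
    std::vector<std::pair<char, int>> runs;
    for (char c : in) {
        if (!runs.empty() && runs.back().first == c)
            ++runs.back().second;     // same key as the previous element: extend the run
        else
            runs.emplace_back(c, 1);  // key changed: start a new run of length 1
    }
    // sanity check: every input character is counted exactly once
    std::size_t total = 0;
    for (const auto& r : runs) total += static_cast<std::size_t>(r.second);
    assert(total == in.size());
    return runs;
}
```

encode_runs("aaabbbbbcddeeeeeeeeeff") yields (a,3)(b,5)(c,1)(d,2)(e,9)(f,2), the same pairs the Thrust code above computes for this input.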

Source

2014-11-18 rmitchell

One possible approach:

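Take the output lengths and perform an exclusive_scan on them. This produces a vector holding the starting index of each run.

Use stream compaction (remove_if) to remove the elements of all three arrays (output, lengths and indexes) whose corresponding length is 1. This takes two operations: the first cleans up output and indexes, using lengths as the stencil, and the second operates directly on lengths. (The code below uses the copying variant, copy_if, which keeps the elements whose length is not 1 and leaves the original arrays intact.) Operating on all three arrays at once is possible, but it may make the output-size computation slightly more complicated. Exactly how to handle this depends on which data sets you want to keep.

Here is a fully worked example that extends your code: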

$ cat t601.cu 
#include <iostream> 
#include <iterator>   // std::ostream_iterator 
#include <thrust/device_vector.h> 
#include <thrust/copy.h> 
#include <thrust/reduce.h> 
#include <thrust/scan.h> 
#include <thrust/iterator/constant_iterator.h> 
#include <thrust/iterator/zip_iterator.h> 

// predicate: true for every run length other than 1 
struct is_not_one{ 
    template <typename T> 
    __host__ __device__ 
    bool operator()(T data) const { 
        return data != 1; 
    } 
}; 

int main(){ 

// input data on the host 
    const char data[] = "aaabbbbbcddeeeeeeeeeff"; 

    const size_t N = (sizeof(data)/sizeof(char)) - 1; 

    // copy input data to the device 
    thrust::device_vector<char> input(data, data + N); 

    // allocate storage for output data and run lengths 
    thrust::device_vector<char> output(N); 
    thrust::device_vector<int> lengths(N); 

    // print the initial data 
    std::cout << "input data:" << std::endl; 
    thrust::copy(input.begin(), input.end(), std::ostream_iterator<char>(std::cout, "")); 
    std::cout << std::endl << std::endl; 

    // compute run lengths 
    size_t num_runs = thrust::reduce_by_key 
            (input.begin(), input.end(),   // input key sequence 
            thrust::constant_iterator<int>(1), // input value sequence 
            output.begin(),      // output key sequence 
            lengths.begin()      // output value sequence 
            ).first - output.begin();   // compute the output size 

    // print the output 
    std::cout << "run-length encoded output:" << std::endl; 
    for(size_t i = 0; i < num_runs; i++) 
     std::cout << "(" << output[i] << "," << lengths[i] << ")"; 
    std::cout << std::endl; 

    // start index of each run = exclusive prefix sum of the run lengths 
    thrust::device_vector<int> indexes(num_runs); 
    thrust::exclusive_scan(lengths.begin(), lengths.begin()+num_runs, indexes.begin()); 
    thrust::device_vector<char> foutput(num_runs); 
    thrust::device_vector<int> findexes(num_runs); 
    thrust::device_vector<int> flengths(num_runs); 
    // compaction 1: copy (char, index) pairs whose length (the stencil) is not 1 
    thrust::copy_if(
        thrust::make_zip_iterator(thrust::make_tuple(output.begin(), indexes.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(output.begin()+num_runs, indexes.begin()+num_runs)),
        lengths.begin(),   // stencil
        thrust::make_zip_iterator(thrust::make_tuple(foutput.begin(), findexes.begin())),
        is_not_one());
    // compaction 2: copy the lengths themselves; the returned end gives the filtered count 
    size_t fnum_runs = thrust::copy_if(lengths.begin(), lengths.begin()+num_runs, flengths.begin(), is_not_one()) - flengths.begin(); 
    std::cout << "output: " << std::endl; 
    thrust::copy_n(foutput.begin(), fnum_runs, std::ostream_iterator<char>(std::cout, ",")); 
    std::cout << std::endl << "lengths: " << std::endl; 
    thrust::copy_n(flengths.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ",")); 
    std::cout << std::endl << "indexes: " << std::endl; 
    thrust::copy_n(findexes.begin(), fnum_runs, std::ostream_iterator<int>(std::cout, ",")); 
    std::cout << std::endl; 

    return 0; 

} 
$ nvcc -arch=sm_20 -o t601 t601.cu 
$ ./t601 
input data: 
aaabbbbbcddeeeeeeeeeff 

run-length encoded output: 
(a,3)(b,5)(c,1)(d,2)(e,9)(f,2) 
output: 
a,b,d,e,f, 
lengths: 
3,5,2,9,2, 
indexes: 
0,3,9,11,20, 
$
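The same two steps can be checked on the host without a GPU. The sketch below is a sequential C++17 analog, not Thrust code: std::exclusive_scan (from <numeric>) plays the role of thrust::exclusive_scan, and a single loop compacts the character, length and index arrays together, i.e. the "all three at once" variant mentioned above, so the surviving count is simply the size of the result. FilteredRuns, filter_runs and min_len are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the runs that survive the filter.
struct FilteredRuns {
    std::vector<char> chars;
    std::vector<int> lengths;
    std::vector<int> indexes;
};

// chars/lengths: the (key, run length) arrays produced by the reduce_by_key step.
FilteredRuns filter_runs(const std::vector<char>& chars,
                         const std::vector<int>& lengths, int min_len) {
    assert(chars.size() == lengths.size());
    // Step 1: an exclusive scan of the run lengths gives each run's start index.
    std::vector<int> starts(lengths.size());
    std::exclusive_scan(lengths.begin(), lengths.end(), starts.begin(), 0);
    // Step 2: compact all three arrays in one pass, keeping runs with length >= min_len.
    FilteredRuns out;
    for (std::size_t i = 0; i < lengths.size(); ++i) {
        if (lengths[i] >= min_len) {
            out.chars.push_back(chars[i]);
            out.lengths.push_back(lengths[i]);
            out.indexes.push_back(starts[i]);
        }
    }
    return out;
}
```

With min_len = 2 this keeps exactly the runs that is_not_one keeps, because a run length is always at least 1. Feeding it chars = a,b,c,d,e,f and lengths = 3,5,1,2,9,2 gives a,b,d,e,f / 3,5,2,9,2 / 0,3,9,11,20, the same as the program output above.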

I'm sure this code can be improved, but my purpose is to show you one possible general approach.

In my opinion, stripping the include headers out of sample code does not help future readers. I think it is better to provide complete, compilable code, although in this case it is not a big problem.

There is also Thrust example code for run-length encoding and decoding (run_length_encoding.cu and run_length_decoding.cu in the Thrust examples).
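Decoding reverses the process. One common data-parallel formulation (a general technique, not necessarily line for line what the Thrust example does) is: inclusive-scan the run lengths to get each run's end offset, then, for every output position p, binary-search for the first end offset greater than p; that run is the one p belongs to. The host C++17 sketch below uses std::inclusive_scan and std::upper_bound for those two steps; decode_runs is a hypothetical name. On a GPU the scan and the per-position searches would map onto thrust::inclusive_scan and the vectorized overload of thrust::upper_bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Expand (character, run length) pairs back into the original string.
std::string decode_runs(const std::vector<char>& chars,
                        const std::vector<int>& lengths) {
    assert(chars.size() == lengths.size());
    // ends[k] = one past the last output position covered by run k
    std::vector<int> ends(lengths.size());
    std::inclusive_scan(lengths.begin(), lengths.end(), ends.begin());
    const int total = ends.empty() ? 0 : ends.back();
    std::string out(static_cast<std::size_t>(total), '\0');
    // Each output position is computed independently, which is what
    // makes this formulation easy to parallelize.
    for (int p = 0; p < total; ++p) {
        // the first run whose end offset is greater than p contains position p
        auto it = std::upper_bound(ends.begin(), ends.end(), p);
        out[static_cast<std::size_t>(p)] =
            chars[static_cast<std::size_t>(it - ends.begin())];
    }
    return out;
}
```

For example, chars = a,b,c,d,e,f with lengths = 3,5,1,2,9,2 decodes back to "aaabbbbbcddeeeeeeeeeff".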

Source

2014-11-18 05:45:45
