Generating random numbers within a CUDA kernel in a varying range

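I am trying to generate random numbers inside a CUDA kernel. I want uniformly distributed integers, starting from 1 and going up to at most 8. Each thread should get different random numbers, and the range also varies from thread to thread: the maximum of the range may be as low as 2 in one thread and as high as 8 in another, but never higher than 8. Here is an example of how I want the numbers to be generated: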

In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2 
In thread#2 --> maximum of the range is 6 and so the random number should be between 1 and 6 
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5

and so on...

Any help would be greatly appreciated. Thank you.

Source

2013-08-29 duttasankha

EDIT: I have edited my answer to fix some of the deficiencies pointed out in the other answer (@tudorturcu) and in the comments.

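1. Use CURAND to generate a uniform distribution between 0.0 and 1.0.
2. Then multiply this by the desired range (largest value - smallest value + 0.999999).
3. Then add the offset (+ smallest value).
4. Then truncate to an integer.

Something like this in your device code: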

int idx = threadIdx.x+blockDim.x*blockIdx.x; 
// assume have already set up curand and generated state for each thread... 
// assume ranges vary by thread index 
float myrandf = curand_uniform(&(my_curandstate[idx])); 
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999); 
myrandf += min_rand_int[idx]; 
int myrand = (int)truncf(myrandf);

You should:

#include <math.h>

for truncf.

Here is a fully worked example:

$ cat t527.cu 
#include <stdio.h> 
#include <curand.h> 
#include <curand_kernel.h> 
#include <math.h> 
#include <assert.h> 
#define MIN 2 
#define MAX 7 
#define ITER 10000000 

__global__ void setup_kernel(curandState *state){ 

    int idx = threadIdx.x+blockDim.x*blockIdx.x; 
    curand_init(1234, idx, 0, &state[idx]); 
} 

__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int, unsigned int *result){ 

    int idx = threadIdx.x + blockDim.x*blockIdx.x; 

    int count = 0;
    while (count < n){
        // curand_uniform returns a float in (0.0, 1.0]
        float myrandf = curand_uniform(my_curandstate+idx);
        // scale to the width of the range, then shift so it starts at the minimum
        myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
        myrandf += min_rand_int[idx];
        int myrand = (int)truncf(myrandf);

        assert(myrand <= max_rand_int[idx]);
        assert(myrand >= min_rand_int[idx]);
        // tally the draw in its histogram bin
        result[myrand-min_rand_int[idx]]++;
        count++;
    }
} 

int main(){ 

    curandState *d_state; 
    cudaMalloc(&d_state, sizeof(curandState)); 
    unsigned *d_result, *h_result; 
    unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int; 
    cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned)); 
    h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned)); 
    cudaMalloc(&d_max_rand_int, sizeof(unsigned)); 
    h_max_rand_int = (unsigned *)malloc(sizeof(unsigned)); 
    cudaMalloc(&d_min_rand_int, sizeof(unsigned)); 
    h_min_rand_int = (unsigned *)malloc(sizeof(unsigned)); 
    cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned)); 
    setup_kernel<<<1,1>>>(d_state); 

    *h_max_rand_int = MAX; 
    *h_min_rand_int = MIN; 
    cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice); 
    cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice); 
    generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result); 
    cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost); 
    printf("Bin: Count: \n"); 
    for (int i = MIN; i <= MAX; i++)
        printf("%d %u\n", i, h_result[i-MIN]);

    return 0; 
} 


$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand 
$ cuda-memcheck ./t527 
========= CUDA-MEMCHECK 
Bin: Count: 
2 1665496 
3 1668130 
4 1667644 
5 1667435 
6 1665026 
7 1666269 
========= ERROR SUMMARY: 0 errors 
$
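
The run above exercises a single range (2 to 7) from a single thread. As a rough sanity check of the same scale, offset and truncate arithmetic for the ranges in the question (maximum 2, 6 and 5, with a minimum of 1), here is a minimal host-side C++ sketch. It is not CUDA code: `uniform_open_closed` is a hypothetical stand-in for `curand_uniform` built on `std::mt19937`, and `scale_and_truncate` and `histogram` are illustrative names that do not appear in the answer.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Same scale / offset / truncate mapping as in the kernel above.
int scale_and_truncate(float u, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    float x = u;
    x *= (hi - lo + 0.999999);   // evaluated in double, stored back as float
    x += lo;
    return static_cast<int>(std::trunc(x));
}

// Hypothetical stand-in for curand_uniform: maps a 32-bit draw to (0, 1].
float uniform_open_closed(std::mt19937 &gen) {
    return static_cast<float>((static_cast<double>(gen()) + 1.0) / 4294967296.0);
}

// Histogram of `draws` samples in [lo, hi]; bin k counts the value lo + k.
std::vector<unsigned> histogram(unsigned lo, unsigned hi, unsigned draws, unsigned seed) {
    assert(lo <= hi);
    std::mt19937 gen(seed);
    std::vector<unsigned> bins(hi - lo + 1, 0);
    for (unsigned i = 0; i < draws; ++i) {
        int v = scale_and_truncate(uniform_open_closed(gen), lo, hi);
        bins.at(v - lo)++;       // at() throws if v ever leaves [lo, hi]
    }
    return bins;
}
```

Calling `histogram(1, 2, ...)`, `histogram(1, 6, ...)` and `histogram(1, 5, ...)` with a large number of draws should give bins of roughly equal size, which mirrors what each thread would see with its own maximum.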

Source

2013-08-29 02:48:02

I probably did the same thing. Could you put this into code so that I can compare the two? Thanks again. – duttasankha

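@Robert's example does not generate a perfectly uniform distribution (although every number in the range is generated and every generated number is within the range). Both the smallest and the largest value are chosen with only 0.5 times the probability of each of the other numbers in the range.

At step 2, you should multiply by the number of values in the range: (largest value - smallest value + 0.999999). *

At step 3, the offset should be (+ smallest value) instead of (+ smallest value + 0.5).

Steps 1 and 4 remain the same.

* As @Kamil Czerski pointed out, 1.0 is included in the distribution. Adding 1.0 instead of 0.999999 would occasionally produce a number outside the desired range.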
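
To make the footnote concrete, here is a minimal host-side C++ sketch of steps 2 to 4 that compares the two padding constants at the edge case u = 1.0. The function name `map_to_range` and its `pad` parameter are hypothetical, introduced only for this comparison; in a kernel, `u` would come from `curand_uniform`.

```cpp
#include <cassert>
#include <cmath>

// Steps 2-4: scale u by (hi - lo + pad), add lo, truncate toward zero.
// `pad` is a hypothetical knob used here to compare 0.999999 with 1.0.
int map_to_range(float u, unsigned lo, unsigned hi, double pad) {
    assert(lo <= hi);
    float x = u;
    x *= (hi - lo + pad);   // evaluated in double, stored back as float
    x += lo;
    return static_cast<int>(std::trunc(x));
}
```

With the range 2 to 7 used in the complete example, `map_to_range(1.0f, 2, 7, 0.999999)` gives 7, while `map_to_range(1.0f, 2, 7, 1.0)` gives 8, one past the maximum.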

Source

2014-07-30 10:07:34 tudorturcu

1.0 is [included](http://docs.nvidia.com/cuda/curand/device-api-overview.html#distributions) in curand_uniform. There is a very small chance that you will draw exactly 1.0, which, after multiplying by (largest value - smallest value + 1.0), adding the offset and rounding down, will be out of range. [Here](http://stackoverflow.com/questions/24537112/uniformly-distributed-pseudorandom-integers-inside-cuda-kernel/24537113#24537113) is my version of generating uniform integers, but the basic idea is the same as Robert Crovella's. I use 0.999999 instead of 1 and do exactly as you suggest. –

Thanks for pointing out this error. I find the decision to include 1.0 in the distribution and exclude 0.0 quite odd. I will modify my answer to include the change. Feel free to edit it to include code samples. – tudorturcu
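
Because the interval is open at 0.0 and closed at 1.0, one alternative that nobody in this thread proposed is to round up instead of padding and truncating. The sketch below is host-side C++ for checking the arithmetic only; `ceil_to_range` is a hypothetical name, and it assumes `u` lies in (0, 1] and that the range holds far fewer than 2^24 values, so the count is exact in single precision. The 0.999999 padding relies on the padded product staying below the next integer after rounding to float, which should hold for ranges as small as 1 to 8; rounding up avoids that dependency.

```cpp
#include <cassert>
#include <cmath>

// Maps u in (0, 1] to an integer in [lo, hi] by rounding up.
// Not from the thread: an alternative sketch under the assumptions stated above.
int ceil_to_range(float u, unsigned lo, unsigned hi) {
    assert(lo <= hi);
    assert(u > 0.0f && u <= 1.0f);
    float n = static_cast<float>(hi - lo + 1);    // number of values in the range
    int k = static_cast<int>(std::ceil(u * n));   // k is in 1..n because 0 < u <= 1
    return static_cast<int>(lo) + k - 1;
}
```

Here each value k corresponds to the interval ((k-1)/n, k/n] of `u`, so every integer in the range gets an interval of the same width.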
