Mixing custom memory management and Thrust in CUDA

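In my project, I have implemented a custom memory allocator to avoid unnecessary calls to cudaMalloc once the application has "warmed up". I also use custom kernels for basic array filling, arithmetic operations between arrays, and so on, and I would like to simplify my code by using Thrust and getting rid of these kernels. Every array on the device is created and accessed through raw pointers (for now), and I would like to use device_vector and Thrust's methods on these objects. However, I find myself converting between raw pointers and device_ptr<> all the time, which clutters up my code.

My rather vague question: how would you organize the use of custom memory management, Thrust's array methods, and calls to custom kernels in the most readable way?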

Source

2012-01-25 bbtrb

You can create a custom allocator to use with device_vector. –

@JaredHoberock: I searched the documentation and got nowhere. Could you provide a pointer? – bbtrb

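Just like the standard C++ containers, thrust::device_vector lets you customize how it allocates storage by providing it with your own "allocator". By default, thrust::device_vector's allocator is thrust::device_malloc_allocator, which allocates (deallocates) storage with cudaMalloc (cudaFree) when Thrust's backend system is CUDA.

Occasionally it is desirable to customize the way device_vector allocates memory, as in the OP's case, where storage should be sub-allocated within a single large allocation performed at program initialization. This avoids the overhead that many individual calls to the underlying allocation scheme (here, cudaMalloc) may incur.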
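
As a rough, host-only illustration of the sub-allocation idea, the sketch below reserves one block with std::malloc and carves pieces out of it for std::vector. It does not use Thrust or cudaMalloc, and the names arena, arena_allocator, and arena_demo are hypothetical. A device version would presumably obtain the block with a single cudaMalloc and hand out thrust::device_ptr<T> values instead, but that part is an assumption and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical host-side arena (not a Thrust type): reserves one block up
// front and hands out aligned pieces by advancing an offset.
class arena {
public:
    explicit arena(std::size_t bytes)
        : base_(static_cast<char*>(std::malloc(bytes))), size_(bytes), used_(0) {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~arena() { std::free(base_); }
    arena(const arena&) = delete;
    arena& operator=(const arena&) = delete;

    // Returns `bytes` bytes aligned to `align`, or throws if the block is full.
    void* take(std::size_t bytes, std::size_t align) {
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || bytes > size_ - start) throw std::bad_alloc();
        used_ = start + bytes;
        return base_ + start;
    }

    // Forgets every sub-allocation at once; the block itself is kept.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Minimal C++11 allocator that sub-allocates from an arena.
template <typename T>
struct arena_allocator {
    typedef T value_type;

    explicit arena_allocator(arena& a) : pool(&a) {}
    template <typename U>
    arena_allocator(const arena_allocator<U>& other) : pool(other.pool) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(pool->take(n * sizeof(T), alignof(T)));
    }

    // Individual frees are no-ops in this sketch; see arena::reset().
    void deallocate(T*, std::size_t) {}

    arena* pool;
};

template <typename T, typename U>
bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b) {
    return a.pool == b.pool;
}

template <typename T, typename U>
bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b) {
    return !(a == b);
}

// Fills two vectors from one 1 KiB arena and returns the bytes consumed.
std::size_t arena_demo() {
    arena pool(1024);
    arena_allocator<int> alloc(pool);
    std::vector<int, arena_allocator<int> > a(alloc);
    std::vector<int, arena_allocator<int> > b(alloc);
    a.resize(10, 13);  // storage comes from the arena, not a fresh malloc
    b.resize(20, 7);
    assert(a[9] == 13 && b[19] == 7);
    return pool.used();
}
```

Calling arena_demo() from main exercises both vectors against the same block. Because deallocate does nothing, a vector that grows repeatedly leaves its old buffers behind until reset() is called; a real pool would need to track and reuse freed blocks.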

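A simple way to provide device_vector with a custom allocator is to inherit from device_malloc_allocator. One could in principle write an entire allocator from scratch, but with the inheritance approach only the allocate and deallocate member functions need to be provided. Once the custom allocator is defined, it can be passed to device_vector as its second template parameter. In the example below, we hear from the allocator once on allocation and once on deallocation: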

$ nvcc my_allocator_test.cu -arch=sm_20 -run 
my_allocator::allocate(): Hello, world! 
my_allocator::deallocate(): Hello, world!

Here is the example code, which shows how to provide a custom allocator that prints a message upon allocation and deallocation:

#include <thrust/device_malloc_allocator.h>
#include <thrust/device_vector.h>
#include <iostream>

template<typename T>
struct my_allocator : thrust::device_malloc_allocator<T>
{
    // shorthand for the name of the base class
    typedef thrust::device_malloc_allocator<T> super_t;

    // get access to some of the base class's typedefs

    // note that because we inherited from device_malloc_allocator,
    // pointer is actually thrust::device_ptr<T>
    typedef typename super_t::pointer pointer;

    typedef typename super_t::size_type size_type;

    // customize allocate
    pointer allocate(size_type n)
    {
        std::cout << "my_allocator::allocate(): Hello, world!" << std::endl;

        // defer to the base class to allocate storage for n elements of type T
        // in practice, you'd do something more interesting here
        return super_t::allocate(n);
    }

    // customize deallocate
    void deallocate(pointer p, size_type n)
    {
        std::cout << "my_allocator::deallocate(): Hello, world!" << std::endl;

        // defer to the base class to deallocate n elements of type T at address p
        // in practice, you'd do something more interesting here
        super_t::deallocate(p, n);
    }
};

int main()
{
    // create a device_vector which uses my_allocator
    thrust::device_vector<int, my_allocator<int> > vec;

    // create 10 ints, each initialized to 13
    vec.resize(10, 13);

    return 0;
}

In the output above, my_allocator::allocate() is called once, upon vec.resize(10, 13). my_allocator::deallocate() is invoked once when vec goes out of scope and destroys its elements.

Source

2012-01-25 21:00:38

Thank you for the incredible answer. – bbtrb
