Multiple kernel calls in a for loop

I'm trying to make multiple kernel calls in a for loop. Here is the code I tried:

    id<MTLDevice> device = MTLCreateSystemDefaultDevice();
    NSLog(@"Device: %@", [device name]);

    id<MTLCommandQueue> commandQueue = [device newCommandQueue];

    NSError * ns_error = nil;
    //id<MTLLibrary> defaultLibrary = [device newDefaultLibrary];
    id<MTLLibrary> defaultLibrary = [device newLibraryWithFile:@"/Users/i/tmp/tmp6/s.metallib" error:&ns_error];
    // get function
    id<MTLFunction> newfunc = [defaultLibrary newFunctionWithName:@"sigmoid"];

    // Buffer for storing encoded commands that are sent to GPU
    id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

    // set input and output data
    float tmpbuf[2][1000];
    float outbuf[2][1000];
    float final_out[2][1000];
    for(int i = 0; i < 1000; i++)
    {
        tmpbuf[0][i] = i;
        outbuf[0][i] = 0;
        tmpbuf[1][i] = 10*i;
        outbuf[1][i] = 0;
    }

    int tmp_length = 1000*sizeof(float);
    // get pipeline state
    id<MTLComputePipelineState> cpipeline[2];
    cpipeline[0] = [device newComputePipelineStateWithFunction: newfunc error:&ns_error ];
    cpipeline[1] = [device newComputePipelineStateWithFunction: newfunc error:&ns_error ];
    id<MTLBuffer> inVectorBuffer[2];
    id<MTLBuffer> outVectorBuffer[2];
    // both encoders are created up front, before either one has been ended
    id <MTLComputeCommandEncoder> computeCommandEncoder[2];
    computeCommandEncoder[0] = [commandBuffer computeCommandEncoder];
    computeCommandEncoder[1] = [commandBuffer computeCommandEncoder];

    MTLSize ts= {10, 1, 1};
    MTLSize numThreadgroups = {100, 1, 1};

    for(int k = 0; k < 2; k++)
    {
        inVectorBuffer[k] = [device newBufferWithBytes: tmpbuf[k] length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
        [computeCommandEncoder[k] setBuffer: inVectorBuffer[k] offset: 0 atIndex: 0 ];
        outVectorBuffer[k] = [device newBufferWithBytes: outbuf[k] length: tmp_length options: MTLResourceOptionCPUCacheModeDefault ];
        [computeCommandEncoder[k] setBuffer: outVectorBuffer[k] offset: 0 atIndex: 1 ];

        [computeCommandEncoder[k] setComputePipelineState:cpipeline[k] ];
        [computeCommandEncoder[k] dispatchThreadgroups:numThreadgroups threadsPerThreadgroup:ts];
        [computeCommandEncoder[k] endEncoding ];
    }

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];

It doesn't work correctly. When I run it, the following information is reported:

Can anyone point out the problem? Thanks in advance.


2016-07-19 Pony

You end encoding on every iteration inside the loop, but commit only once, outside the loop. – Marius
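The dispatch in the question launches numThreadgroups.width × ts.width = 100 × 10 = 1000 threads, one per float in each 1000-element buffer. A minimal sketch of that arithmetic, written in plain C so it also compiles unchanged as Objective-C; the helper names are hypothetical, and the round-up is an assumption for element counts that are not a multiple of the threadgroup width.

```c
#include <stddef.h>

/* Hypothetical helpers for a 1-D dispatchThreadgroups:threadsPerThreadgroup:
   call. Plain C, so they compile unchanged as Objective-C. */

/* Threadgroups needed to cover n elements, rounding up. */
static size_t threadgroups_for(size_t n, size_t threads_per_group) {
    return (n + threads_per_group - 1) / threads_per_group;
}

/* Total threads the GPU launches: groups * threads per group (1-D case). */
static size_t threads_launched(size_t groups, size_t threads_per_group) {
    return groups * threads_per_group;
}
```

For 1000 elements and 10 threads per group this gives the question's 100 groups. With 1001 elements it gives 101 groups and 1010 threads, so the kernel would then have to skip threads whose index is at or beyond the element count.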

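Only one command encoder can be active on a command buffer at a time. Instead of creating multiple encoders outside the loop, create the encoder inside the loop and end encoding on it before the next loop iteration.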


2016-07-19 20:40:56 warrenm

Warren was right. One computeCommandEncoder corresponds to one job, and it has to be created inside the loop. – Pony
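A minimal sketch of the approach in the answer: each loop iteration creates an encoder, uses it, and ends it before the next iteration asks for another one, and the command buffer is committed once after the loop. The question's `sigmoid` kernel source was not posted, so the Metal source string below is a hypothetical stand-in (assumed to compute 1 / (1 + e^-x) per element), compiled at runtime with `newLibraryWithSource:options:error:` instead of loading `s.metallib`. `RunEncoderPerIteration` is a hypothetical name, and the sketch assumes ARC on a Mac with a Metal device.

```objc
// Sketch: one compute encoder per loop iteration, ended before the next one
// is created. Assumes ARC (-fobjc-arc) and linking Foundation and Metal.
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>
#include <string.h>

enum { kCount = 1000, kPasses = 2 };

// Hypothetical stand-in for the "sigmoid" kernel in s.metallib (not posted).
static NSString *const kSigmoidSource =
    @"#include <metal_stdlib>\n"
    @"using namespace metal;\n"
    @"kernel void sigmoid(device const float *in  [[buffer(0)]],\n"
    @"                    device float       *out [[buffer(1)]],\n"
    @"                    uint gid [[thread_position_in_grid]]) {\n"
    @"    out[gid] = 1.0f / (1.0f + exp(-in[gid]));\n"
    @"}\n";

// Runs the kernel once per pass; returns 0 on success and copies pass k's
// output into results[k].
static int RunEncoderPerIteration(float results[kPasses][kCount]) {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (device == nil) return 1;
        NSError *error = nil;
        id<MTLLibrary> library =
            [device newLibraryWithSource:kSigmoidSource options:nil error:&error];
        id<MTLFunction> function = [library newFunctionWithName:@"sigmoid"];
        if (function == nil) return 1;
        // Both passes run the same kernel, so one pipeline state is enough.
        id<MTLComputePipelineState> pipeline =
            [device newComputePipelineStateWithFunction:function error:&error];
        if (pipeline == nil) return 1;

        float input[kPasses][kCount];
        for (int i = 0; i < kCount; i++) {
            input[0][i] = i;
            input[1][i] = 10 * i;
        }

        id<MTLCommandQueue> queue = [device newCommandQueue];
        id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
        id<MTLBuffer> outBuffers[kPasses];
        const NSUInteger length = kCount * sizeof(float);
        MTLSize threadsPerGroup = MTLSizeMake(10, 1, 1);
        MTLSize groups = MTLSizeMake(kCount / 10, 1, 1); // 100 * 10 = 1000 threads

        for (int k = 0; k < kPasses; k++) {
            id<MTLBuffer> inBuffer = [device newBufferWithBytes:input[k]
                                                         length:length
                                                        options:MTLResourceStorageModeShared];
            outBuffers[k] = [device newBufferWithLength:length
                                                options:MTLResourceStorageModeShared];
            // Create the encoder inside the loop...
            id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
            [encoder setComputePipelineState:pipeline];
            [encoder setBuffer:inBuffer offset:0 atIndex:0];
            [encoder setBuffer:outBuffers[k] offset:0 atIndex:1];
            [encoder dispatchThreadgroups:groups threadsPerThreadgroup:threadsPerGroup];
            // ...and end it before the next iteration creates another one.
            [encoder endEncoding];
        }

        // One commit covers both passes once every encoder has been ended.
        [commandBuffer commit];
        [commandBuffer waitUntilCompleted];
        if ([commandBuffer status] != MTLCommandBufferStatusCompleted) return 1;

        for (int k = 0; k < kPasses; k++) {
            memcpy(results[k], [outBuffers[k] contents], length);
        }
    }
    return 0;
}
```

Compared with the question's code, the only structural change is where `computeCommandEncoder` is called: at most one encoder exists on the command buffer at any moment. A single commit after the loop, as Marius pointed out, is fine.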

You can also use the same compute command encoder over and over. There's no reason to call endEncoding inside the loop, and then you can create the command encoder outside the loop. –
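The alternative from this comment, sketched under the same assumptions (hypothetical stand-in kernel, ARC, a Mac with a Metal device, hypothetical function name `RunSingleEncoder`): one encoder is created before the loop, each iteration rebinds the buffers and dispatches, and `endEncoding` is called exactly once after the loop.

```objc
// Sketch: a single compute encoder for both passes. Assumes ARC
// (-fobjc-arc) and linking Foundation and Metal.
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>
#include <string.h>

enum { kCount = 1000, kPasses = 2 };

// Hypothetical stand-in for the "sigmoid" kernel in s.metallib (not posted).
static NSString *const kSigmoidSource =
    @"#include <metal_stdlib>\n"
    @"using namespace metal;\n"
    @"kernel void sigmoid(device const float *in  [[buffer(0)]],\n"
    @"                    device float       *out [[buffer(1)]],\n"
    @"                    uint gid [[thread_position_in_grid]]) {\n"
    @"    out[gid] = 1.0f / (1.0f + exp(-in[gid]));\n"
    @"}\n";

// Returns 0 on success and copies pass k's output into results[k].
static int RunSingleEncoder(float results[kPasses][kCount]) {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (device == nil) return 1;
        NSError *error = nil;
        id<MTLLibrary> library =
            [device newLibraryWithSource:kSigmoidSource options:nil error:&error];
        id<MTLFunction> function = [library newFunctionWithName:@"sigmoid"];
        id<MTLComputePipelineState> pipeline = function
            ? [device newComputePipelineStateWithFunction:function error:&error]
            : nil;
        if (pipeline == nil) return 1;

        float input[kPasses][kCount];
        for (int i = 0; i < kCount; i++) {
            input[0][i] = i;
            input[1][i] = 10 * i;
        }

        const NSUInteger length = kCount * sizeof(float);
        id<MTLCommandQueue> queue = [device newCommandQueue];
        id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
        // Created once, outside the loop; the pipeline state is set once too.
        id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
        [encoder setComputePipelineState:pipeline];

        id<MTLBuffer> outBuffers[kPasses];
        for (int k = 0; k < kPasses; k++) {
            id<MTLBuffer> inBuffer = [device newBufferWithBytes:input[k]
                                                         length:length
                                                        options:MTLResourceStorageModeShared];
            outBuffers[k] = [device newBufferWithLength:length
                                                options:MTLResourceStorageModeShared];
            // Rebind this pass's buffers, then dispatch; no endEncoding here.
            [encoder setBuffer:inBuffer offset:0 atIndex:0];
            [encoder setBuffer:outBuffers[k] offset:0 atIndex:1];
            [encoder dispatchThreadgroups:MTLSizeMake(kCount / 10, 1, 1)
                    threadsPerThreadgroup:MTLSizeMake(10, 1, 1)];
        }
        [encoder endEncoding]; // exactly once, after the last dispatch

        [commandBuffer commit];
        [commandBuffer waitUntilCompleted];
        if ([commandBuffer status] != MTLCommandBufferStatusCompleted) return 1;

        for (int k = 0; k < kPasses; k++) {
            memcpy(results[k], [outBuffers[k] contents], length);
        }
    }
    return 0;
}
```

Each dispatch uses the buffers that were bound when it was encoded, so rebinding indices 0 and 1 between dispatches does not affect the earlier pass.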
