이 코드의 속도를 높이는 방법은 무엇입니까? 행렬의 행과 열을 반복하는 미적분

행과 열을 반복하여 행렬에 대한 계산을 수행하는 코드 조각이 있습니다. 수행 된 미적분은 코사인 거리 측정 값이며 인터넷에서 찾은 코드가 있습니다 (지금 링크를 검색 할 수 없음).이 코드의 속도를 높이는 방법은 무엇입니까? 행렬의 행과 열을 반복하는 미적분

10,000 개의 행과 열이있을 수 있습니다. 행렬은 대칭이므로 절반 만 반복하면됩니다. 값은 실수입니다.

문제 : 매우 느립니다 (3-6 시간 소요). 누구든지 개선 사항을 가르쳐 줄 수 있습니까? 고마워!

코드에 대한 참고 사항 : 유연성을 위해 추상 클래스를 사용합니다. 이렇게하면 별도 클래스에 정의 된 코사인 계산을 다른 클래스로 쉽게 대체 할 수 있습니다.

코드 :

이

import Jama.Matrix; 
import java.util.ArrayList; 
import java.util.HashSet; 
import java.util.concurrent.ExecutionException; 

public abstract class AbstractSimilarity { 

    HashSet<Triple<Double, Integer, Integer>> set = new HashSet(); 
    public ArrayList<Thread> listThreads = new ArrayList(); 

    public void transform(Matrix matrixToBeTransformed) throws InterruptedException, 
ExecutionException { 

     int numDocs = termDocumentMatrix.getColumnDimension(); 

     Main.similarityMatrix = new Matrix(numDocs, numDocs); 

     System.out.println("size of the matrix: " + numDocs + "x " + numDocs); 

     //1. iteration through all rows of the matrixToBeTransformed 
     for (int i = numDocs - 1; i >0 ; i--) { 
      System.out.println("matrix treatment... " + ((float) i/(float) numDocs * 100) + "%"); 

      //2. isolates the row i of this matrixToBeTransformed 
      Matrix sourceDocMatrix = matrixToBeTransformed.getMatrix(
        0, matrixToBeTransformed.getRowDimension() - 1, i, i); 



      // 3. Iterates through all columns of the matrixToBeTransformed 
//   for (int j = 0; j < numDocs; j++) { 
//    if (j < i) { 
// 
//     //4. isolates the column j of this matrixToBeTransformed 
//     Matrix targetDocMatrix = matrixToBeTransformed.getMatrix(
//       0, matrixToBeTransformed.getRowDimension() - 1, j, j); 


        //5. computes the similarity between this given row and this given column and writes it in a resultMatrix 
//     Main.resultMatrix.set(i, j, computeSimilarity(sourceDocMatrix, targetDocMatrix)); 
//    } else { 
//     Main.resultMatrix.set(i, j, 0); 

//    } 
// 
//   } 
     }

가 계산을 정의하는 클래스가 수행 할이 :

import Jama.Matrix; 

public class CosineSimilarity extends AbstractSimilarity{ 

    @Override 
    protected double computeSimilarity(Matrix sourceDoc, Matrix targetDoc) { 
    double dotProduct = sourceDoc.arrayTimes(targetDoc).norm1(); 
    double eucledianDist = sourceDoc.normF() * targetDoc.normF(); 
    return dotProduct/eucledianDist; 
    } 

}

출처

2012-03-02 seinecle

숙제 프로젝트입니까? MatLab과 같은 수학 소프트웨어를 사용할 수 없습니까? –

이것은 학계에서 전문적인 프로젝트이며 자바를 사용해야합니다. 자신의 제한에 대한 저의 두려움이 두렵습니다! – seinecle

알고리즘의 어느 부분이 가장 오래 걸리는지 프로파일 링 해 보셨습니까? 연산의 시작/끝 부분에'new Date(). getTime();'을 추가하면 빼기 만하면 좋은 통찰력을 얻을 수 있습니다. – Marcelo

당신은 N^3 알고리즘을 처리 한 것으로 나타났습니다. n^2는 (반) 행렬을 채우기 때문에 발생합니다. 각 요소를 채우는 방법 (점 product/fnorm)에 시간이 걸리기 때문에 n을 다시 계산합니다. 좋은 소식은 계산이 서로 의존하지 않기 때문에 속도를 높이기 위해이를 다중 스레드 할 수 있다는 것입니다.

public class DoCalc extends Thread 
{ 
    public Matrix localM; 
    int startRow; 
    int endRow; 
    public DoCalc(Matrix mArg, int startArg, int endArg) 
    { 
    localM=mArg; 
    startRow=startArg; 
    endRow=endArg; 
    } 

    public void doCalc() 
    { 
    //Pseudo-code 
    for each row from startRow to endRow 
     for each column 0 to size-1 
     result[i][j] = similarityCalculation 
    } 
    public void run() 
    { 
    doCalc(); 
    } 
} 

public void transform(Matrix toBeTransformed) 
{ 
    int numDocs = termDocumentMatrix.getColumnDimension(); 

    Main.similarityMatrix = new Matrix(numDocs, numDocs); 
    Vector<DoCalc> running = new Vector<DoCalc>(); 
    int blockSize = 10; 
    for (int x = 0; x < numDocs-1;x+=blockSize) 
    { 
    DoCalc tempThread = new DoCalc(toBeTransformed,x,(x+blockSize>numDocs-1)?numDocs-1:x+blockSize); 
    tempThread.start(); 
    running.add(tempThread); 
    } 

    for (DoCalc dc : running) 
    dc.join(); 

}

주의 사항 :

이것은 매우 순진 구현입니다. 크기의 배열로 실행하려고하면 1000 개의 스레드가 생성됩니다. blockSize를 사용하거나 스레드 풀링을 조사 할 수 있습니다.

기껏해야 이렇게하면 속도가 4 배나 증가 할 것입니다. 규모 이득을 원한다면 알고리즘을 적절하게 프로파일 링하거나 알고리즘을보다 효율적으로 변경해야합니다. 수행하려는 작업 (매트릭스의 각 요소에 상대적으로 비싼 작업 실행)이 주어지면 후자가 불가능할 수 있습니다.

편집 : 멀티 스레딩은 CPU에 바인딩되어 있고 코어가 상대적으로 유휴 상태에있는 멀티 코어 CPU를 사용하는 경우 속도가 크게 증가합니다.

출처

2012-03-02 16:17:36 Thomas

thx! 고정 된 스레드 풀을 가진 멀티 스레딩 솔루션은 코드 3의 속도를 높입니다. 9400x9400 매트릭스에서 3h가 완료되었습니다. 이제 속도를 높이기위한 솔루션을 조사 중입니다! :-) => http://stackoverflow.com/questions/9550486/tutorials-or-books-on-kernel-programming-for-opencl – seinecle

이 코드의 속도를 높이는 방법은 무엇입니까? 행렬의 행과 열을 반복하는 미적분

답변

관련 문제