Lexicographical 정렬 배열 배열 알고리즘을 사용하여 C#

나는 vb6을 다루기 전에 이미 이런 종류의 질문을했으며 너무 느려서이 작업에 C#을 사용하기로 결정했다. 이제 동일한 코드가 두 배 속도로 실행되지만 여전히 느립니다.Lexicographical 정렬 배열 배열 알고리즘을 사용하여 C#

느린 이유는 모든 열을 검사하는 각 열의 끝에서 사전 식 정렬을 시작하기 때문입니다.

모든 열을 검사하는 첫 번째 열에서 정렬 프로세스를 시작하고 해당 열의 첫 번째 바이트와 가능한 동일한 여러 개의 첫 번째 낮은 바이트를 가진 가장 낮은 행을 검색하고 그룹화하는 경우 두 번째 (다음) 열을 검사하는 두 번째 바이트가 두 번째 바이트가 가장 낮은 바이트인지 확인합니다. 두 바이트가 모두 같으면 다음 열로 이동합니다. 다음 행의 바이트가 다른 위치를 감지하면 열 코드는 첫 번째 바이트에 대해 수행되고 두 번째 최저점을 찾는 단계로 이동합니다. 실제로이 프로세스가 좋은 속도 향상을 얻으려고 노력하는 방법입니다.하지만 불행히도이 정렬 기술과 큰 혼란을 겪었으며 누군가 나를 도와 줬어.

현재 코드는 마지막 열에서 무차별 방식으로 정렬하여 모든 행을 정렬 한 다음 한 열을 왼쪽으로 이동하고 모든 행을 다시 정렬하며 첫 번째 열에 도달 할 때까지 계속 정렬하고 정렬합니다 . 분명한 이유가 없으므로 반복 작업을 수행하기 때문에 속도가 느립니다.

현재 코드를 사용하여 각 행이 올바른 정렬 된 순서를 얻을 때까지 여러 행을 여러 번 정렬해야한다고 말하면 256 열과 256 행에 총 65,536 개의 배열 요소가 있다고 가정합니다. 각 열에 대해 65,536 회 반복을 수행 할 수 있습니다. 그래서 총 256 * 65536 = 16,777,216 번 반복 할 때마다 함수를 호출하고 그 속도가 느린 실제 이유입니다.

나는 이것이 많은 사람들에게 물어볼 만하다. 그러나 누군가 자유 시간을 가지기를 원한다면 이미 이것을했기 때문에 나를 도와 줄 수 있었으면 좋겠다.

여기 제가 지금까지 작업해야하는 코드가 있습니다.

byte[] sortArrayOfArraysLexicoGraphically(ref byte[] data) { 
    byte[] lexicoGraphicalIndexes; 
    long dataSize = data.Length; 
    long squareRootMinusOne; 
    int squareRoot; 
    int row = 0; 
    bool rowSwapped; 
    byte[] tmpRow; 

    squareRoot = (int)Math.Sqrt(dataSize); 
    tmpRow = new byte[squareRoot]; 
    squareRootMinusOne = squareRoot - 1; 
    lexicoGraphicalIndexes = new byte[squareRoot]; 

    for(short column = 0; column < lexicoGraphicalIndexes.Length; column++) { 
     lexicoGraphicalIndexes[column] = (byte)column; 
    } 

    for(long column = squareRootMinusOne; column >= 0; column -= 1) { 
     do { 
      rowSwapped = false; 
      do { 
       if(data[(row * squareRoot) + column] > data[((row + 1) * squareRoot) + column]) { 
        //Swaps a full row in a few copies. 
        //Copies full row to tmpRow 
        Buffer.BlockCopy(data, (row * squareRoot), tmpRow, 0, squareRoot); 
        //Replace first row with second row. 
        Buffer.BlockCopy(data, ((row + 1) * squareRoot), data, (row * squareRoot), squareRoot); 
        //Replace second row with tmpRow 
        Buffer.BlockCopy(tmpRow, 0, data, ((row + 1) * squareRoot), squareRoot); 
        swapBytes(ref lexicoGraphicalIndexes, row, row + 1); 
        rowSwapped = true; 
       } 
       row++; 
      } while (row < squareRootMinusOne); 
      row = 0; 
     } while (rowSwapped != false); 
    } 
    return lexicoGraphicalIndexes; 
} 

public void swapBytes(ref byte[] data, long firstIndex, long secondIndex) { 
    byte tmpFirstByte = data[firstIndex]; 
    data[firstIndex] = data[secondIndex]; 
    data[secondIndex] = tmpFirstByte; 
}

출처

2012-09-02 SSpoke

까지이 긴 몬스터를 작성 종료하지만 .. 몇 가지 테스트 실행을 위해 트릭을 할 것으로 보인다 없습니다 그게 완벽하게 더 많은 테스트가 필요하다면, 나는 더 많은 테스트를했을 때이를 업데이트 할 것입니다.

int[] sortArrayOfArraysLexicoGraphically(ref byte[] data) { 
     int[] lexicoGraphicalIndexes; 
     long dataSize = data.Length; 
     int squareRoot; 
     bool rowSwapped; 

     squareRoot = (int)Math.Sqrt(dataSize); 
     lexicoGraphicalIndexes = new int[squareRoot]; 

     for(int column = 0; column < lexicoGraphicalIndexes.Length; column++) { 
      lexicoGraphicalIndexes[column] = column; 
     } 

     byte currentLowestRowByte = 255; //set to highest to avoid unassigned local variable error. 
     int previousLowestRowByte = -1; //this is only used after the second pass. 
     int lowestRowIndex = -1; //hopefully this won't mess anything up. 
     List<int> lowestRowIndexes = new List<int>(); 
     bool stillSorting = true; 
     int startRow = 0; //which row to start with, as the sorting process gets more sorted this number increases. 
     int startColumn = 0; //first column 

     while(stillSorting) { 
      //Resets 
      lowestRowIndexes.Clear(); 
      startColumn = 0; 
      currentLowestRowByte = 255; 
      lowestRowIndex = -1; 

      //first step finds the lowest row in the first column 
      for(int row = 0; row < squareRoot; row += 1) { 
       if(data[(row * squareRoot) + startColumn] <= currentLowestRowByte && 
        data[(row * squareRoot) + startColumn] > previousLowestRowByte) { 
        currentLowestRowByte = data[(row * squareRoot) + startColumn]; 
        lowestRowIndex = row; 
       } 
      } 

      //Resets for next pass. 
      previousLowestRowByte = currentLowestRowByte; 

      //Check if sorting process is already finished. (No matches found from step 1). 
      if(lowestRowIndex == -1) { 
       stillSorting = false; 
       break; 
      } 

      //second step finds all the similar rows with the current lowestRowByte. 
      for(int row = 0; row < squareRoot; row += 1) { 
       if(data[(row * squareRoot) + startColumn] == currentLowestRowByte) { 
        lowestRowIndexes.Add(row); 
       } 
      } 

      //third step loops through all lowestRowIndexes to find which one comes first, second, third, etc... 
      if(lowestRowIndexes.Count > 1) { 
       //This sorts the same rows, rows*rows amount of times, until they are sorted correctly. 
       rowSwapped = true; 
       while(rowSwapped != false) { 
        rowSwapped = false; 
        for (int row = 0; row < lowestRowIndexes.Count; row++) 
        { 
         if((row+1) >= lowestRowIndexes.Count) 
          break; 
         //Current first row byte checked with Next first row byte in lowestRowIndexes. 
         //If both are equal keep going unto next column until a break is found, if any break. 
         startColumn = 1; 
         while(rowSwapped == false) { 
          //Reached beyond the last column. 
          if(startColumn == squareRoot) 
           break; 

          if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] == data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) 
           startColumn++; 

          if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] < data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) { 
           break; //Sorted already, get out. 
          } else if(data[(lowestRowIndexes[row] * squareRoot) + startColumn] > data[(lowestRowIndexes[row+1] * squareRoot) + startColumn]) { 
           swapBytesRow(ref data, lowestRowIndexes[row], lowestRowIndexes[row+1], squareRoot); 
           swapBytes(ref lexicoGraphicalIndexes, lowestRowIndexes[row], lowestRowIndexes[row+1]); 
           rowSwapped = true; //a swap has occurred. 
           break; 
          } 
         } 
        } 
       } 

       //forth step re-sorts all the pre-sorted lowestRowIndexes into master array, using startRow variable. 
       foreach(int row in lowestRowIndexes) { 

        //First checks if row is already in the proper sorted location. 
        if(row != startRow) { 
         swapBytesRow(ref data, startRow, row, squareRoot); 
         swapBytes(ref lexicoGraphicalIndexes, startRow, row); 
         startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
        } else { 
         startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
        }      
       } 
      } else { 
       //Only one instance of this lowestRowByte existed. so obviously this is the next best sorted match. 
       swapBytesRow(ref data, startRow, lowestRowIndex, squareRoot); 
       swapBytes(ref lexicoGraphicalIndexes, startRow, lowestRowIndex); 
       startRow++; //skip Rows starting from value < startRow as they are perfectly sorted. 
      } 
     } 
     return lexicoGraphicalIndexes; 
    }

public void swapBytes(ref byte[] data, long firstIndex, long secondIndex) { 
     byte tmpFirstByte = data[firstIndex]; 
     data[firstIndex] = data[secondIndex]; 
     data[secondIndex] = tmpFirstByte; 
    }

public void swapBytes(ref int[] data, long firstIndex, long secondIndex) { 
     int tmpFirstByte = data[firstIndex]; 
     data[firstIndex] = data[secondIndex]; 
     data[secondIndex] = tmpFirstByte; 
    }

출처

2012-09-03 00:01:47 SSpoke

코드의 문제점은 O (N * N) 복잡도를 갖는 정렬 알고리즘 (Bubble Sort와 유사)을 사용한다는 것입니다. [QuickSort] (http://en.wikipedia.org/wiki/Quicksort)와 같은 더 나은 정렬 알고리즘을 구현하거나 .Net의 정렬 기능을 사용한 것처럼 사용해야합니다. –

다른 방법으로는 볼 수 없습니다. 공통적으로 첫 번째 바이트가 동일한 행을 정렬 한 다음 첫 번째 행, 두 번째 등을 순서대로 바꿔서 순서대로 정렬합니다. 드문 경우의 주 정렬 코드는 최대 5-7 개의 행을 정렬 할 수 있습니다. 첫 번째 바이트 만 공통으로 포함하는 행은 한 번만 바로 위에 추가됩니다. – SSpoke

morelinq의 바이너리 (http://code.google.com/p/morelinq/)를 다운로드하고 위의 코드를 테스트 할 것을 강력히 권장합니다. 정말 당신의 알고리즘을 잘 이해하고 있고 코드가 똑같은 것을 말할 수는 없지만 적어도 속도의 차이를 볼 수는 있습니다. (추신 : 당신은 입력 배열을 Sqrt (N) 부분으로 나누기 위해 morelinq를 사용할 필요가 없습니다. "사용하기 쉽도록"사용했습니다) –

필자의 정렬 알고리즘은 매우 나쁘다. 최적화 및 기본 linq 사용하지 않고도 수십배의 속도를 얻을 수 있습니다.

크기가 N * N 인 데이터를 사용하여 N = 200으로 테스트했습니다 (아래 코드가 사용자의 것과 정확히 일치하는지 100 % 정확하지는 않지만 적어도 결과를 시도해 볼 수는 있습니다)

List<byte[]> result = data.Batch(N) 
          .OrderBy(b => b, new ArrayComparer()) 
          .Select(b => b.ToArray()) 
          .ToList();

편집

인플레 이스 종류는 더 빨리 할 수 있습니다.

var list = data.Batch(N).Select(x => x.ToArray()).ToList(); 
list.Sort(new ArrayComparer());

public class ArrayComparer : IComparer<IEnumerable<byte>> 
{ 
    public int Compare(IEnumerable<byte> x, IEnumerable<byte> y) 
    { 
     var xenum = x.GetEnumerator(); 
     var yenum = y.GetEnumerator(); 
     while (xenum.MoveNext() && yenum.MoveNext()) 
     { 
      if (xenum.Current != yenum.Current) 
        return xenum.Current - yenum.Current; 
     } 
     return 0; 
    } 
}

PS : Batch는 morelinq에서 확장 방법이다

출처

2012-09-02 10:01:28

나는 자체적으로 포함 된 것을 원했지만 그 사실에 대해서는 언급하지 않았다. 정말로 LinQ와 같은 새로운 기술과 내가 테스트 한 후에 필요로하는 여분의 파일을 좋아하지 않는다. 여전히 작동하지 않았다.'ThrowIfNull','ThrowIfNonPositive'를 이해할 수 없을 것이다. 직접 쓰는 것이 나에게 좋은 4 시간이 걸렸지 만 지금은 더 빨리 작동하는 것 같다. 대답으로 게시 할 것이다. – SSpoke

Lexicographical 정렬 배열 배열 알고리즘을 사용하여 C#

답변

관련 문제