HashMap scalability issue on a multi-threaded, multi-core system

I am running into a scalability issue while reading data from a HashMap. My machine has 32 cores with 2 hyper-threads per core (64 CPUs in total) and 64 GB of RAM. When I read data from the HashMap and perform arithmetic calculations, performance drops from 16 threads onwards, but when I perform only the arithmetic operations it scales as expected.

Please find the test results below.

Reading from the HashMap and performing arithmetic operations:

No. of threads | Time taken (sec)
1 | 85
2 | 93
4 | 124
8 | 147
16 | 644

Performing only arithmetic operations:

No. of threads | Time taken (sec)
1 | 25
2 | 32
4 | 35
8 | 41
16 | 65
32 | 108
40 | 112
64 | 117
100 | 158

Also adding the code block for reference:

import java.util.*;
import java.util.concurrent.*;

public class StringCallable2
{
    // private static final long size = 500000L;
    private static final long size = 1000000L;
    // private final static HashMap<Long, Long> map = new HashMap<Long, Long>();
    // private static long[] array = new long[(int) size];

    // Callable<Long> (not a raw Callable) so the Future types below match call().
    public static class StringGenCallable implements Callable<Long>
    {
        int count;

        public StringGenCallable(int count)
        {
            this.count = count;
        }

        public Long call()
        {
            // Random rand = new Random();
            // System.out.println("Thread " + count + " started test");
            long sum = 20;
            // do a CPU intensive arithmetic operation; no Input Output
            // operations, object creations or floating point arithmetic
            for (long i = 0; i < size; i++)
            {
                // int numNoRange = rand.nextInt((int)(size-1));
                // long numNoRange = i;
                // Long long1 = map.get((long)i);
                // Long long1 = array[(int)i];
                sum = i + 19 * sum;
            }
            // System.out.println("Finished " + count);
            return sum;
        }
    }

    public static void main(String args[])
    {
        try
        {
            System.out.println("Starting");
            // for (long i = 0; i < size; i++)
            // {
            //     array[(int)i] = System.currentTimeMillis();
            //     map.put(i, System.currentTimeMillis());
            // }
            int sizt = Integer.parseInt(args[0]);
            long curtime = System.currentTimeMillis();
            ExecutorService pool = Executors.newFixedThreadPool(sizt);
            Set<Future<Long>> set = new HashSet<Future<Long>>();
            for (int i = 0; i < sizt; i++)
            {
                Callable<Long> callable = new StringGenCallable(i);
                Future<Long> future = pool.submit(callable);
                set.add(future);
            }

            // Wait for every task to finish before stopping the clock.
            for (Future<Long> future : set)
            {
                future.get();
            }

            System.out.println("Number of threads : " + sizt);
            long finishtime = System.currentTimeMillis();
            System.out.println("Total Time Taken : " + (finishtime - curtime) + " ms");
            pool.shutdown();
        }
        catch (Throwable e)
        {
            // Covers Exception and Error alike; just report it.
            e.printStackTrace();
        }
    }
}


2011-11-08 Rahul

Oops. You forgot the question. – aioobe

And what is the question here? It is well known that lock contention hurts scalability. In any case, you could try `ConcurrentHashMap`, which is optimized for multi-threaded use.

If you are using Java 5+, try java.util.concurrent.ConcurrentHashMap; that class is a better fit. – blob

For a multi-threaded application at this level, you should be using ConcurrentHashMap. I would re-engineer the code to reflect that change and then re-examine the performance.
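A minimal sketch of that suggestion, assuming the map is filled once and then only read by the worker threads. The class name, `SIZE`, `buildMap`, `readAll`, and `runWithThreads` are illustrative names that do not appear in the question's code, and the sketch uses Java 8 syntax. Whether this actually helps for a read-only map is something to measure rather than assume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (Java 8+): the question's loop reading a ConcurrentHashMap.
class ConcurrentMapReadSketch {
    static final int SIZE = 100_000; // illustrative, smaller than the question's size

    // Populate the map once, before any reader thread starts.
    static ConcurrentHashMap<Long, Long> buildMap() {
        ConcurrentHashMap<Long, Long> map = new ConcurrentHashMap<>();
        for (long i = 0; i < SIZE; i++) {
            map.put(i, i * 2);
        }
        return map;
    }

    // Same arithmetic as the question, plus one lookup per iteration.
    static long readAll(ConcurrentHashMap<Long, Long> map) {
        long sum = 20;
        for (long i = 0; i < SIZE; i++) {
            sum = map.get(i) + 19 * sum;
        }
        return sum;
    }

    // Runs the read loop on `threads` threads over one shared map.
    static List<Long> runWithThreads(int threads) {
        ConcurrentHashMap<Long, Long> map = buildMap();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> readAll(map);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that task finishes
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithThreads(4));
    }
}
```

Every thread computes the same value here, which makes it easy to check that concurrent reads return consistent results before timing anything.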

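I would also think carefully about how many threads you can use effectively. It is easy to equate "adding threads" with "increasing performance", and that is not the case. You may gain more by limiting the number of threads and by making the currently shared data structures ThreadLocal, which reduces data sharing and the contention and context switching that come with it.

In this example, even assuming this process owns the entire box, running more threads than the machine has hardware threads will make the process run slower, because the work items are purely CPU-bound.

In a real application, the units of work are likely to be much more complex or longer-running than what you have here. Be careful about drawing too many conclusions about your hardware from such trivial per-thread units of work. The point is that thread-management overhead is amplified relative to the work being executed when compared with a more complex workload. With more complex work, the effect of the HashMap lookups may fade and performance may be closer to what you expect.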
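One way to read that advice, as a rough sketch only: cap the pool at the number of hardware threads the JVM reports, and give each worker thread its own private map through a `ThreadLocal`. All names below (`ThreadLocalMapSketch`, `LOCAL_MAP`, `run`) are hypothetical, the sketch assumes Java 8+, and it trades memory for isolation because every worker thread holds a full copy of the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch (Java 8+): bounded pool, one private map per worker thread.
class ThreadLocalMapSketch {
    static final int SIZE = 50_000; // illustrative

    // Each worker thread lazily builds its own copy on first access.
    static final ThreadLocal<HashMap<Long, Long>> LOCAL_MAP =
            ThreadLocal.withInitial(ThreadLocalMapSketch::buildMap);

    static HashMap<Long, Long> buildMap() {
        HashMap<Long, Long> map = new HashMap<>();
        for (long i = 0; i < SIZE; i++) {
            map.put(i, i * 2);
        }
        return map;
    }

    static long readAll(HashMap<Long, Long> map) {
        long sum = 20;
        for (long i = 0; i < SIZE; i++) {
            sum = map.get(i) + 19 * sum;
        }
        return sum;
    }

    // Runs `tasks` units of work on at most availableProcessors() threads.
    static List<Long> run(int tasks) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = Math.max(1, Math.min(tasks, cpus));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                // LOCAL_MAP.get() runs on the worker thread, so no map is shared.
                Callable<Long> task = () -> readAll(LOCAL_MAP.get());
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(8));
    }
}
```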


2011-11-08 12:25:16

In the commented-out code, the map is read-only.

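Judging by the commented-out code, autoboxing looks like a significant overhead. Each map.get((long)i) is likely to allocate a new Long object. Allocation is fast, but not that fast.

This applies whether you have one thread or many. With many threads, however, memory bandwidth may matter more than CPU.

(An implementation of Long.valueOf may return the same Long instance for the same value, most likely for small values. In addition, applying "escape analysis" may remove the Long from the heap.)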
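To make the boxing point concrete, here is a small sketch with hypothetical names (`BoxingSketch` and its methods are not from the question). It relies on two facts: passing a primitive `long` where a `Long` or `Object` is expected is compiled to a `Long.valueOf` call, and `Long.valueOf` is documented to cache values from -128 to 127. What happens outside that range, and whether escape analysis removes the allocation, depends on the JVM.

```java
import java.util.HashMap;

// Hypothetical sketch of where boxing happens in a HashMap<Long, Long> lookup.
class BoxingSketch {
    // Long.valueOf is specified to cache -128..127, so both calls
    // return the same instance for a small value such as 100.
    static boolean smallValuesShareInstance() {
        Long a = Long.valueOf(100L);
        Long b = Long.valueOf(100L);
        return a == b; // identity comparison on purpose
    }

    // map.get(i) with a primitive long is compiled to map.get(Long.valueOf(i)).
    static long sumBoxed(HashMap<Long, Long> map, int n) {
        long sum = 20;
        for (long i = 0; i < n; i++) {
            sum = map.get(i) + 19 * sum; // boxes the key, unboxes the value
        }
        return sum;
    }

    // The same loop over a primitive array needs no Long objects at all.
    static long sumPrimitive(long[] values) {
        long sum = 20;
        for (int i = 0; i < values.length; i++) {
            sum = values[i] + 19 * sum;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000;
        HashMap<Long, Long> map = new HashMap<>();
        long[] values = new long[n];
        for (int i = 0; i < n; i++) {
            values[i] = i * 3L;
            map.put((long) i, values[i]);
        }
        System.out.println("small values share an instance: " + smallValuesShareInstance());
        System.out.println("same result: " + (sumBoxed(map, n) == sumPrimitive(values)));
    }
}
```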


2011-11-08 13:02:54

At first I thought this was because the HashMap case creates an object on every lookup.

However, after testing (see below), I believe the real cause is that it becomes harder to access the cache effectively.

import gnu.trove.TLongLongHashMap; 

import java.util.HashMap; 
import java.util.concurrent.Callable; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.TimeUnit; 

/** 
* @author peter.lawrey 
*/ 
public class HashMapPerfMain { 
    public static final int REPEATS = 10000; 

    public static void main(String... args) throws InterruptedException { 
     int runLength = 10 * 1000; 
     HashMap<Long, Long> hashMap = new HashMap<Long, Long>(); 
     TLongLongHashMap troveMap = new TLongLongHashMap(); 
     long[] array = new long[runLength]; 
     for (long i = 0; i < runLength; i++) { 
      long now = System.nanoTime(); 
      hashMap.put(i, now); 
      troveMap.put(i, now); 
      array[((int) i)] = now; 
     } 

     for (int i = 0; i < 3; i++) { 
      timeHashMap(hashMap); 
      timeTroveMap(troveMap); 
      timeArray(array); 
     } 
    } 

    private static void timeHashMap(final HashMap<Long, Long> map) throws InterruptedException { 
     System.out.printf("%-16s ", map.getClass().getSimpleName()); 
     for (int t = 1; t <= Runtime.getRuntime().availableProcessors(); t *= 2) { 
      long start = System.nanoTime(); 
      ExecutorService es = Executors.newFixedThreadPool(t); 
      for (int i = 0; i < t * REPEATS; i++) 
       es.submit(new Callable<Long>() { 
        @Override 
        public Long call() throws Exception { 
         long sum = 20; 
         for (long key = 0; key < map.size(); key++) 
          sum = sum * 19 + map.get(key); 
         return sum; 
        } 
       }); 
      es.shutdown(); 
      es.awaitTermination(10, TimeUnit.MINUTES); 
      long time = System.nanoTime() - start; 
      System.out.printf("%d | %.3f ", t, time/1e9); 
     } 
     System.out.println(); 
    } 

    private static void timeTroveMap(final TLongLongHashMap map) throws InterruptedException { 
     System.out.printf("%-16s ", map.getClass().getSimpleName()); 
     for (int t = 1; t <= Runtime.getRuntime().availableProcessors(); t *= 2) { 
      long start = System.nanoTime(); 
      ExecutorService es = Executors.newFixedThreadPool(t); 
      for (int i = 0; i < t * REPEATS; i++) 
       es.submit(new Callable<Long>() { 
        @Override 
        public Long call() throws Exception { 
         long sum = 20; 
         for (long key = 0; key < map.size(); key++) 
          sum = sum * 19 + map.get(key); 
         return sum; 
        } 
       }); 
      es.shutdown(); 
      es.awaitTermination(10, TimeUnit.MINUTES); 
      long time = System.nanoTime() - start; 
      System.out.printf("%d | %.3f ", t, time/1e9); 
     } 
     System.out.println(); 
    } 

    private static void timeArray(final long[] array) throws InterruptedException {
     System.out.printf("%-16s ", array.getClass().getSimpleName());
     for (int t = 1; t <= Runtime.getRuntime().availableProcessors(); t *= 2) { 
      long start = System.nanoTime(); 
      ExecutorService es = Executors.newFixedThreadPool(t); 
      for (int i = 0; i < t * REPEATS; i++) 
       es.submit(new Callable<Long>() { 
        @Override 
        public Long call() throws Exception { 
         long sum = 20; 
         for (int key = 0; key < array.length; key++) 
          sum = sum * 19 + array[key]; 
         return sum; 
        } 
       }); 
      es.shutdown(); 
      es.awaitTermination(10, TimeUnit.MINUTES); 
      long time = System.nanoTime() - start; 
      System.out.printf("%d | %.3f ", t, time/1e9); 
     } 
     System.out.println(); 
    } 
}

prints

HashMap          1 | 0.904 2 | 0.863 4 | 0.913 8 | 1.832
TLongLongHashMap 1 | 0.568 2 | 0.566 4 | 0.572 8 | 1.048
long[]           1 | 0.092 2 | 0.091 4 | 0.090 8 | 0.093
HashMap          1 | 0.767 2 | 0.773 4 | 0.912 8 | 1.833
TLongLongHashMap 1 | 0.560 2 | 0.563 4 | 0.570 8 | 1.057
long[]           1 | 0.088 2 | 0.089 4 | 0.090 8 | 0.096
HashMap          1 | 0.758 2 | 0.774 4 | 0.911 8 | 1.828
TLongLongHashMap 1 | 0.565 2 | 0.564 4 | 0.568 8 | 1.056
long[]           1 | 0.088 2 | 0.089 4 | 0.090 8 | 0.093

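The array access is very efficient because it scans memory linearly. HashMaps tend to have their data arranged pseudo-randomly in memory, which puts more of a burden on the caches.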
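The access-pattern effect can be explored in isolation with a sketch like the one below. It is an analogy rather than a measurement of HashMap itself: the same `long[]` is summed once in index order and once through a shuffled index array, which stands in for the pseudo-random layout of map entries. The class and method names are hypothetical, and the timings printed by `main` depend on the machine and on JIT warm-up, so no particular numbers are claimed.

```java
import java.util.Random;

// Hypothetical sketch: linear scan versus pseudo-random access over one array.
class AccessPatternSketch {
    // Returns the indices 0..n-1 in pseudo-random order (Fisher-Yates shuffle).
    static int[] shuffledIndices(int n, long seed) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    // Visits the elements in memory order.
    static long sumSequential(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Visits the same elements, but in the order given by `order`.
    static long sumInOrder(long[] data, int[] order) {
        long sum = 0;
        for (int k = 0; k < order.length; k++) {
            sum += data[order[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 22; // about 32 MB of longs, likely larger than the CPU caches
        long[] data = new long[n];
        for (int i = 0; i < n; i++) {
            data[i] = i;
        }
        int[] order = shuffledIndices(n, 42L);

        long t0 = System.nanoTime();
        long a = sumSequential(data);
        long t1 = System.nanoTime();
        long b = sumInOrder(data, order);
        long t2 = System.nanoTime();

        System.out.printf("sequential: %.3f s, shuffled: %.3f s, same sum: %b%n",
                (t1 - t0) / 1e9, (t2 - t1) / 1e9, a == b);
    }
}
```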


2011-11-08 13:05:41
