사이트 당 3 개의 노드, 노드 당 8GB RAM이 할당 된 Cassandra 1.1.12 클러스터를 사용하고 있습니다. 우리는 주기적으로 노드에서 긴 GC 일시 중지를보고 있으며 이는 우리의 응용 프로그램 실시간 요구 사항을 망치고 있습니다. 우리가 실행하는 시스템은 24GB RAM을 갖춘 8 코어 시스템입니다.Cassandra와 함께 지속되는 GC 문제 - 긴 응용 프로그램 일시 중지
Google은 세계 GC를 중지하는 데 120 초 이상 걸렸습니다.
우리는 여기에
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms8G
-Xmx8G
-Xmn1600M
-XX:+HeapDumpOnOutOfMemoryError
-Xss180k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
JDK 1.7.0_04에서 이러한 플래그와 함께 실행은 상세한 GC 로그는 긴 일시 정지로 이어지는된다
2014-02-23T11:50:19.231-0500: 2119627.980: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 145905916 Max Chunk Size: 3472057 Number of Blocks: 160577 Av. Block Size: 908 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119627.981: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 32269664 bytes, 32269664 total : 1356995K->44040K(1474560K), 4.5270760 secs] 5829345K->4546031K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 4.5295190 secs] [Times: user=24.78 sys=0.07, real=4.52 secs] Heap after GC invocations=82068 (full 2561): par new generation total 1474560K, used 44040K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 4.5524690 seconds {Heap before GC invocations=82068 (full 2561): par new generation total 1474560K, used 1354760K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:51:14.221-0500: 2119682.970: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 144857324 Max Chunk Size: 2423465 Number of Blocks: 160577 Av. Block Size: 902 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119682.971: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 41744088 bytes, 41744088 total : 1354760K->52443K(1474560K), 2.1589280 secs] 5856751K->4582809K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. Block Size: 896 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 2.1613420 secs] [Times: user=12.82 sys=0.04, real=2.16 secs] Heap after GC invocations=82069 (full 2561): par new generation total 1474560K, used 52443K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 2.1719930 seconds {Heap before GC invocations=82069 (full 2561): par new generation total 1474560K, used 1363163K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:52:33.089-0500: 2119761.837: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 143937754 Max Chunk Size: 1505947 Number of Blocks: 160575 Av. Block Size: 896 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119761.839: [ParNew Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 37906760 bytes, 37906760 total : 1363163K->48710K(1474560K), 3.5105890 secs] 5893529K->4611208K(8224768K)After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. Block Size: 889 Tree Height: 146 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 3.5130550 secs] [Times: user=18.81 sys=0.08, real=3.52 secs] Heap after GC invocations=82070 (full 2561): par new generation total 1474560K, used 48710K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 3.5236060 seconds {Heap before GC invocations=82070 (full 2561): par new generation total 1474560K, used 1359430K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000) from space 163840K, 29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000) to space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) 2014-02-23T11:55:59.448-0500: 2119968.196: [GC Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 142756036 Max Chunk Size: 326281 Number of Blocks: 160573 Av. Block Size: 889 Tree Height: 146 Before GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 2119968.198: [ParNew (0: promotion failure size = 131074) (1: promotion failure size = 131074) (2: promotion failure size = 131074) (4: promotion failure size = 131074) (7: promotion failure size = 131074) (promotion failed) Desired survivor size 83886080 bytes, new threshold 1 (max 1) - age 1: 34318480 bytes, 34318480 total : 1359430K->1353373K(1474560K), 1.5971880 secs]2119969.795: [CMSCMS: Large block 0x000000073aa59270 : 4586726K->3600612K(6750208K), 149.1735470 secs] 5921928K->3600612K(8224768K), [CMS Perm : 23966K->23925K(40012K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 403131826 Max Chunk Size: 403131826 Number of Blocks: 1 Av. Block Size: 403131826 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 150.7724630 secs] [Times: user=28.89 sys=11.57, real=150.75 secs] Heap after GC invocations=82071 (full 2562): par new generation total 1474560K, used 0K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000) eden space 1310720K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000) from space 163840K, 0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000) to space 163840K, 0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000) concurrent mark-sweep generation total 6750208K, used 3600612K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 40012K, used 23925K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000) } Total time for which application threads were stopped: 150.7833360 seconds가
나는 또한 설정을 야간 직업을 가지고 그 오전 2시에 인스턴스에 GC를 강제 실행하여 이러한 문제를 완화하고 몇 가지 좋은 결과를 얻었지만 몇 일마다 문제가 계속 발생합니다.
- 추가 세부 사항 NewGen을 2GB (1600M에서)로 변경 한 후 처음 12 시간 이내에 15 초의 일시 중지가 발생했습니다.
은 ... 클래스 히스토그램 덤프 : 추가 팁15772.786: [Class Histogram (before full gc): num #instances #bytes class name ---------------------------------------------- 1: 9743656 526609104 [B 2: 9176097 440452656 java.nio.HeapByteBuffer 3: 8152787 326111480 java.math.BigInteger 4: 8126173 321393760 [I 5: 207997 307212904 [J 6: 8940730 214577520 java.lang.Long 7: 8121743 194921832 org.apache.cassandra.db.DecoratedKey 8: 8121399 129942384 org.apache.cassandra.dht.BigIntegerToken 9: 174374 78049856 [Ljava.lang.Object; 10: 914261 43884528 edu.stanford.ppl.concurrent.SnapTreeMap$Node 11: 1112269 35592608 java.util.concurrent.ConcurrentHashMap$HashEntry 12: 1101827 35258464 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 13: 306955 29467680 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 14: 1216257 29190168 org.apache.cassandra.cache.KeyCacheKey 15: 1111387 26673288 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 16: 427695 20529360 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 17: 417278 16691120 org.apache.cassandra.db.ExpiringColumn 18: 306955 9822560 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 19: 401296 9631104 org.apache.cassandra.db.ISortedColumns$DeletionInfo 20: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 21: 528460 8455360 java.util.concurrent.atomic.AtomicReference 22: 263965 8446880 edu.stanford.ppl.concurrent.SnapTreeMap 23: 342775 8226600 org.apache.cassandra.db.ColumnFamily 24: 263965 6335160 org.apache.cassandra.db.AtomicSortedColumns$Holder 25: 193133 6180256 org.apache.cassandra.db.DeletedColumn 26: 179519 5744608 java.util.ArrayList$Itr 27: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 28: 35607 5025400 29: 35607 4852600 30: 61748 4445856 org.apache.cassandra.io.sstable.SSTableIdentityIterator 31: 264409 4230544 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 52605 4216504 [Ljava.util.HashMap$Entry; 33: 3389 3868680 34: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 35: 107140 3428480 org.apache.cassandra.db.Column 36: 69734 3347232 java.util.TreeMap 37: 131777 3162648 java.util.ArrayList 38: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 39: 51879 2490192 java.util.HashMap 40: 3389 2438184 41: 2978 2284352 42: 55660 1781120 org.apache.cassandra.io.util.DataOutputBuffer 43: 69664 1671936 org.apache.cassandra.db.TreeMapBackedSortedColumns 44: 51697 1654304 org.apache.cassandra.db.ArrayBackedSortedColumns 45: 61842 1485456 [Ljava.nio.ByteBuffer; 46: 61748 1481952 org.apache.cassandra.utils.BytesReadTracker 47: 3533 1462144 48: 55672 1336128 org.apache.cassandra.io.util.FastByteArrayOutputStream 49: 21743 1318848 [C 50: 33109 1268616 [Lorg.apache.cassandra.db.IColumn; 51: 28607 1144280 java.util.HashMap$KeyIterator 52: 17623 1127616 [Ljava.util.Hashtable$Entry; 53: 35003 1120096 java.util.Vector 54: 17335 1109440 com.sun.jmx.remote.util.OrderClassLoaders 55: 17601 844848 java.util.Hashtable 56: 18528 741120 org.apache.cassandra.io.sstable.IndexHelper$IndexInfo 57: 22634 724288 java.lang.String 58: 17514 700560 java.security.ProtectionDomain 59: 19829 634528 java.util.HashMap$Entry 60: 25327 607848 org.apache.cassandra.utils.IntervalTree.Interval 61: 17513 560416 java.security.CodeSource 62: 22026 528624 java.lang.Double 63: 21740 521760 java.util.concurrent.LinkedBlockingDeque$Node 64: 18782 486728 [[J 65: 18660 447840 org.apache.cassandra.utils.BloomFilter 66: 18660 447840 org.apache.cassandra.utils.obs.OpenBitSet 67: 18529 444696 org.apache.cassandra.db.compaction.PrecompactedRow 68: 3717 444688 java.lang.Class 69: 18528 444672 org.apache.cassandra.db.ColumnIndexer$RowHeader 70: 18528 444672 org.apache.cassandra.utils.BloomCalculations$BloomSpecification 71: 24572 393152 java.lang.Object 72: 5753 335568 [S 73: 7326 293040 java.util.TreeMap$Entry 74: 17817 285072 java.util.HashSet 75: 5583 282488 [[I 76: 17514 280224 java.security.ProtectionDomain$Key 77: 17514 280224 [Ljava.security.Principal; 78: 6612 264480 org.apache.cassandra.service.WriteResponseHandler 79: 7862 251584 org.apache.cassandra.db.RowMutation 80: 10083 241992 org.apache.cassandra.db.EchoedRow 81: 6110 195520 org.apache.cassandra.utils.ExpiringMap$CacheableObject 82: 311 179136 83: 2927 163912 org.apache.cassandra.thrift.TBinaryProtocol 84: 6563 157512 org.apache.cassandra.net.Message 85: 6289 150936 org.apache.cassandra.net.Header 86: 6110 146640 org.apache.cassandra.net.CallbackInfo 87: 5996 143904 java.util.concurrent.LinkedBlockingQueue$Node 88: 8004 128064 java.util.HashMap$EntrySet 89: 7885 126160 java.util.TreeMap$Values 90: 7790 124640 java.util.concurrent.atomic.AtomicInteger 91: 5132 123168 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder 92: 6640 106240 org.apache.cassandra.utils.SimpleCondition 93: 6626 106016 java.util.HashMap$Values 94: 830 92960 java.net.SocksSocketImpl 95: 1076 86080 java.lang.reflect.Method 96: 4176 66816 java.lang.Integer 97: 2005 64160 java.lang.ThreadLocal$ThreadLocalMap$Entry 98: 1592 63680 java.lang.ref.SoftReference 99: 1586 63440 com.google.common.collect.SingletonImmutableMap 100: 1743 55776 java.util.TreeMap$ValueIterator 101: 470 48880 java.lang.Thread 102: 1469 47008 java.util.concurrent.locks.AbstractQueuedSynchronizer$Node 103: 1041 41640 java.lang.ref.Finalizer 104: 1002 40080 java.util.LinkedHashMap$Entry 105: 1603 38472 com.google.common.collect.SingletonImmutableSet 106: 1586 38064 com.google.common.collect.ImmutableEntry 107: 435 34800 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 108: 970 31040 java.net.Inet4Address 109: 425 30600 java.lang.reflect.Constructor 110: 396 28512 java.lang.reflect.Field 111: 825 26400 java.net.Socket 112: 1050 25200 java.util.concurrent.atomic.AtomicLong 113: 687 21984 java.util.concurrent.SynchronousQueue$TransferStack$SNode 114: 387 21672 sun.security.provider.MD5 115: 451 21648 java.util.concurrent.ThreadPoolExecutor$Worker 116: 442 21216 java.net.SocketInputStream 117: 525 21000 org.apache.cassandra.thrift.TCustomSocket 118: 514 20560 org.apache.cassandra.db.CounterColumn 119: 639 20448 120: 353 19768 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 121: 762 18288 java.io.FileDescriptor 122: 570 18240 org.apache.thrift.transport.TFramedTransport 123: 939 18200 [Ljava.lang.Class; 124: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 125: 532 17464 [Ljava.lang.String; 126: 527 16864 java.util.Hashtable$Entry 127: 298 16688 org.apache.cassandra.service.ClientState$1 128: 298 16688 org.apache.cassandra.service.ClientState$2 129: 405 16200 org.apache.cassandra.thrift.Column 130: 323 15504 java.net.SocketOutputStream 131: 298 14304 org.apache.cassandra.service.ClientState 132: 585 14040 org.apache.thrift.transport.TMemoryInputTransport 133: 570 13680 org.apache.thrift.TByteArrayOutputStream 134: 284 13632 sun.nio.cs.UTF_8$Encoder 135: 405 12960 org.apache.cassandra.thrift.ColumnOrSuperColumn 136: 324 12960 java.io.BufferedInputStream 137: 530 12720 org.apache.cassandra.utils.EstimatedHistogram 138: 525 12600 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess 139: 388 12416 java.security.MessageDigest$Delegate 140: 129 12384 org.apache.cassandra.io.sstable.SSTableReader 141: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 142: 376 12032 org.apache.cassandra.db.columniterator.SSTableSliceIterator 143: 47 10904 [Z 144: 334 10688 org.apache.cassandra.net.OutboundTcpConnection$Entry 145: 330 10560 java.lang.ref.WeakReference 146: 435 10440 java.lang.ThreadLocal$ThreadLocalMap 147: 426 10224 java.util.BitSet 148: 314 10048 org.apache.cassandra.net.MessageDeliveryTask ... [removed anything taking less than 10K] Total 72302201 2937460496 , 3.1156640 secs] 15775.902: [CMSCMS: Large block 0x0000000714aace48 : 2751863K->2568755K(6340608K), 12.0184460 secs]15787.921: [Class Histogram (after full gc): num #instances #bytes class name ---------------------------------------------- 1: 8644126 451299632 [B 2: 8434549 404858352 java.nio.HeapByteBuffer 3: 7759859 310394360 java.math.BigInteger 4: 10686 293655512 [J 5: 7763265 248560200 [I 6: 8630461 207131064 java.lang.Long 7: 7759615 186230760 org.apache.cassandra.db.DecoratedKey 8: 7759483 124151728 org.apache.cassandra.dht.BigIntegerToken 9: 1825 64173960 [Ljava.lang.Object; 10: 1096613 35091616 java.util.concurrent.ConcurrentHashMap$HashEntry 11: 1092266 34952512 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node 12: 1092266 26214384 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue 13: 1092266 26214384 org.apache.cassandra.cache.KeyCacheKey 14: 532463 25558224 edu.stanford.ppl.concurrent.SnapTreeMap$Node 15: 221414 21255744 edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch 16: 411588 16463520 org.apache.cassandra.db.ExpiringColumn 17: 341705 16401840 edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder 18: 305 8691760 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry; 19: 442919 7086704 java.util.concurrent.atomic.AtomicReference 20: 221414 7085248 edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch 21: 221414 7085248 edu.stanford.ppl.concurrent.SnapTreeMap 22: 221515 5316360 java.util.concurrent.ConcurrentSkipListMap$Node 23: 221418 5314032 org.apache.cassandra.db.ColumnFamily 24: 221418 5314032 org.apache.cassandra.db.ISortedColumns$DeletionInfo 25: 221414 5313936 org.apache.cassandra.db.AtomicSortedColumns$Holder 26: 35593 5023912 27: 35593 4850696 28: 3382 3862240 29: 116355 3723360 org.apache.cassandra.db.DeletedColumn 30: 221414 3542624 org.apache.cassandra.db.AtomicSortedColumns 31: 221414 3542624 edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr 32: 111752 2682048 java.util.concurrent.ConcurrentSkipListMap$Index 33: 3382 2433816 34: 2971 2279488 35: 3533 1462144 36: 11281 891344 [C 37: 3710 443904 java.lang.Class 38: 12509 400288 java.lang.String 39: 23066 369056 java.lang.Object 40: 5742 334480 [S 41: 5417 275792 [[I 42: 8620 206880 java.lang.Double 43: 8596 206304 java.util.concurrent.LinkedBlockingDeque$Node 44: 311 179136 45: 1631 139152 [Ljava.util.HashMap$Entry; 46: 4006 128192 org.apache.cassandra.db.Column 47: 3630 116160 java.util.HashMap$Entry 48: 1066 85280 java.lang.reflect.Method 49: 1310 62880 java.util.HashMap 50: 509 57008 java.net.SocksSocketImpl 51: 1178 47120 java.lang.ref.SoftReference 52: 228 40752 [[J 53: 411 29592 java.lang.reflect.Constructor 54: 703 28120 java.lang.ref.Finalizer 55: 367 26424 java.lang.reflect.Field 56: 248 25792 java.lang.Thread 57: 804 25728 java.lang.ThreadLocal$ThreadLocalMap$Entry 58: 554 22160 java.util.LinkedHashMap$Entry 59: 431 20688 java.net.SocketInputStream 60: 514 20560 org.apache.cassandra.db.CounterColumn 61: 835 20040 java.util.concurrent.atomic.AtomicLong 62: 912 17768 [Ljava.lang.Class; 63: 546 17472 java.util.concurrent.locks.ReentrantLock$NonfairSync 64: 524 17272 [Ljava.lang.String; 65: 535 17120 java.net.Inet4Address 66: 1068 17088 java.lang.Integer 67: 213 17040 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry; 68: 708 16992 java.util.ArrayList 69: 1043 16688 java.util.concurrent.atomic.AtomicInteger 70: 684 16416 java.io.FileDescriptor 71: 506 16192 java.util.Hashtable$Entry 72: 504 16128 java.net.Socket 73: 184 15296 [Ljava.util.Hashtable$Entry; 74: 264 14784 org.apache.cassandra.thrift.TBinaryProtocol 75: 305 12200 java.util.concurrent.ConcurrentHashMap$Segment 76: 506 12144 org.apache.cassandra.utils.EstimatedHistogram 77: 201 11256 org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT 78: 117 11232 org.apache.cassandra.io.sstable.SSTableReader 79: 229 10992 java.util.concurrent.ThreadPoolExecutor$Worker 80: 47 10904 [Z 81: 320 10240 java.lang.ref.WeakReference 82: 319 10208 java.util.Vector 83: 181 10136 sun.security.provider.MD5 ... [removed anything taking up less than 10K] Total 65302653 2582483384 , 2.5709140 secs] 2951753K->2568755K(8183808K), [CMS Perm : 21044K->21000K(35192K)]After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 482780727 Max Chunk Size: 482780727 Number of Blocks: 1 Av. Block Size: 482780727 Tree Height: 1 After GC: Statistics for BinaryTreeDictionary: ------------------------------------ Total Free Space: 0 Max Chunk Size: 0 Number of Blocks: 0 Tree Height: 0 , 17.7063220 secs] [Times: user=16.57 sys=0.05, real=17.70 secs] Heap after GC invocations=1233 (full 21): par new generation total 1843200K, used 0K [0x00000005fae00000, 0x0000000677e00000, 0x0000000677e00000) eden space 1638400K, 0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000065ee00000) from space 204800K, 0% used [0x000000065ee00000, 0x000000065ee00000, 0x000000066b600000) to space 204800K, 0% used [0x000000066b600000, 0x000000066b600000, 0x0000000677e00000) concurrent mark-sweep generation total 6340608K, used 2568755K [0x0000000677e00000, 0x00000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 35192K, used 21000K [0x00000007fae00000, 0x00000007fd05e000, 0x0000000800000000) } Total time for which application threads were stopped: 17.7081220 seconds
이 굉장한 피드백이다. 우리의 작업량은 쓰기가 많습니다 (cfstats에 따르면 2 억 회에 대한 10 억 회의 쓰기). 우리는 수천 개의 여러 칼럼의 CF를 가지고 있지만 거의 읽혀지지 않았습니다. 행 캐시 설정이 없습니다 (JMX에서는 0이 사용됩니다). 우리의 키 캐시는 32MB 정도입니다. 우리는 주문 보존 키를 사용하지 않으므로 48 개의 하드 코딩 된 값이 우리에게 영향을 미치지 않는다고 가정합니다 (md5 키의 크기이기 때문에). 우리가 그 넓은 행을 검색 할 때 우리는 1000 개의 열 배치에서 검색합니다. 그래서 그렇게 될 수 있습니다. 히스토그램을 활성화하고 더 기웃 거리게됩니다. 감사! – Alcanzar
기쁜 마음으로 도와 줬습니다. 그래,이 문제가 절대로 발생해서는 안됩니다. 나는 네가 옳은 길을 가고 있다고 생각한다. 그 넓은 행을 읽는 것이 도움이 될 수 있습니다. 특히 동시 인스턴스가있는 경우. 높은 일관성 수준은 또한이 문제를 일으키는 노드를 더 많이 만듭니다. 그것은 또한 볼 가치가있는 것입니다. 예를 들어, LOCAL_QUORUM에서 읽고 2 노드라면 2 노드가 같은 시간에 일시 중지됩니다. – Arya
우리는이 문제를 완화하기 위해 다음과 같이 변경했습니다. 1) 최대 지속 임계 값을 32로 변경했습니다. 2) NEWSIZE를 2000M으로 증가 시켰습니다. 3) 모든 테이블에서 주요 압축을 강제했습니다. 4) 100 만분의 일괄 처리 크기를 통과 한 일부 코드가 변경되었습니다. 우리의 대형 테이블 중 2 개에서 정기적 인 복구가 실행되지 않아 WAN에서 문제가 발생합니다. 테이블에있는 항목 중 약 66 %가 삭제 표시입니다. 압축 후에 테이블은 크기의 절반보다 작고 삭제 표시는 길게 나옵니다. 해결책을 찾았던이 대답을 받아 들였습니다. 1.2 또는 그 이상으로 업그레이 드를 보는 것 또한 – Alcanzar