Longa incidental jovem coleção de lixo faz uma pausa

Nós ajustamos nosso GC para pausas mínimas de "parar o mundo". As gerações de Perm e Tenured se comportam bem. Young funciona bem na maior parte do tempo, e as pausas normalmente não excedem os 500 ms (nota [Times: user = 0.35 sys = 0.02, real = 0.06 segs]):

{Heap before GC invocations=11603 (full 60):
 par new generation   total 3640320K, used 3325226K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
  eden space 3235840K, 100% used [0x0000000600000000, 0x00000006c5800000, 0x00000006c5800000)
  from space 404480K,  22% used [0x00000006de300000, 0x00000006e3a4a898, 0x00000006f6e00000)
  to   space 404480K,   0% used [0x00000006c5800000, 0x00000006c5800000, 0x00000006de300000)
 concurrent mark-sweep generation total 4147200K, used 1000363K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
 concurrent-mark-sweep perm gen total 196608K, used 133030K [0x00000007f4000000, 0x0000000800000000, 0x0000000800000000)
2012-07-16T14:36:05.641+0200: 427048.412: [GC 427048.412: [ParNew
Desired survivor size 207093760 bytes, new threshold 4 (max 4)
- age   1:    8688880 bytes,    8688880 total
- age   2:   12432832 bytes,   21121712 total
- age   3:   18200488 bytes,   39322200 total
- age   4:   20473336 bytes,   59795536 total
: 3325226K->75635K(3640320K), 0.0559610 secs] 4325590K->1092271K(7787520K), 0.0562630 secs] [Times: user=0.35 sys=0.02, real=0.06 secs] 
Heap after GC invocations=11604 (full 60):
 par new generation   total 3640320K, used 75635K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
  eden space 3235840K,   0% used [0x0000000600000000, 0x0000000600000000, 0x00000006c5800000)
  from space 404480K,  18% used [0x00000006c5800000, 0x00000006ca1dcf40, 0x00000006de300000)
  to   space 404480K,   0% used [0x00000006de300000, 0x00000006de300000, 0x00000006f6e00000)
 concurrent mark-sweep generation total 4147200K, used 1016635K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
 concurrent-mark-sweep perm gen total 196608K, used 133030K [0x00000007f4000000, 0x0000000800000000, 0x0000000800000000)
}

No entanto, às vezes, normalmente uma vez por dia, ficamos excepcionalmente longos. Tempo de coleta de lixo jovem (nota [Horários: usuário = 0,41 sys = 0,01, real = 5,51 segundos]):

{Heap before GC invocations=7884 (full 37):
 par new generation   total 3640320K, used 3304448K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
  eden space 3235840K, 100% used [0x0000000600000000, 0x00000006c5800000, 0x00000006c5800000)
  from space 404480K,  16% used [0x00000006c5800000, 0x00000006c9b00370, 0x00000006de300000)
  to   space 404480K,   0% used [0x00000006de300000, 0x00000006de300000, 0x00000006f6e00000)
 concurrent mark-sweep generation total 4147200K, used 1967225K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
 concurrent-mark-sweep perm gen total 189100K, used 112825K [0x00000007f4000000, 0x00000007ff8ab000, 0x0000000800000000)
2012-07-15T01:23:25.474+0200: 293088.245: [GC 293093.636: [ParNew
Desired survivor size 207093760 bytes, new threshold 4 (max 4)
- age   1:   30210472 bytes,   30210472 total
- age   2:   11614600 bytes,   41825072 total
- age   3:    8591680 bytes,   50416752 total
- age   4:    7779600 bytes,   58196352 total
: 3304448K->73854K(3640320K), 0.1158280 secs] 5271674K->2046454K(7787520K), 0.1181990 secs] [Times: user=0.41 sys=0.01, real=5.51 secs] 
Heap after GC invocations=7885 (full 37):
 par new generation   total 3640320K, used 73854K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
  eden space 3235840K,   0% used [0x0000000600000000, 0x0000000600000000, 0x00000006c5800000)
  from space 404480K,  18% used [0x00000006de300000, 0x00000006e2b1fb40, 0x00000006f6e00000)
  to   space 404480K,   0% used [0x00000006c5800000, 0x00000006c5800000, 0x00000006de300000)
 concurrent mark-sweep generation total 4147200K, used 1972599K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
 concurrent-mark-sweep perm gen total 189100K, used 112825K [0x00000007f4000000, 0x00000007ff8ab000, 0x0000000800000000)
}

Abaixo está a saída relevante do jstat -gccause:

Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC                 
   293083.2  16.96   0.00  96.26  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
   293083.2  16.96   0.00  96.26  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
   293084.2  16.96   0.00  97.69  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
   293085.3  16.96   0.00  98.32  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
   293086.3  16.96   0.00  99.32  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC                 
   293087.3  16.96   0.00  99.55  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      No GC               
   293088.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293089.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293090.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293091.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293092.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293093.3  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293084.2  16.96   0.00 100.00  47.44  59.66   7884  772.326    74   10.893  783.219 unknown GCCause      Allocation Failure  
   293094.3   0.00  18.26   6.23  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC               
   293094.6   0.00  18.26   6.71  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC               
   293095.3   0.00  18.26   6.85  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC               
   293095.6   0.00  18.26   6.92  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC               
   293096.2   0.00  18.26   9.84  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC               
   293096.6   0.00  18.26  10.11  47.56  59.66   7885  772.442    74   10.893  783.334 unknown GCCause      No GC 

"Falha de alocação" aparece como a causa do GC em outros lugares também, mas sempre como uma única entrada. Quando se trata da sequência como essa, ela é associada a uma longa pausa do GC. Analisei as fontes do Oracle JVM e "Falha de alocação" parece uma situação bastante natural: o Young está sem espaço para novos dados e é hora de limpar. Eu verifiquei por qualquer ação intensiva de memória, inesperada no sistema antes da pausa acontecer, mas não encontrei nada suspeito.

Por favor, note que o Tempo de Coleta de Lixo Jovem não aumenta muito durante o tempo de pausa. Minha memória e configuração de GC são as seguintes (configurações de log omitidas):

-Xms6000m 
-Xmn2950m 
-Xmx6000m 
-XX:MaxPermSize=192m 
-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC 
-Dsun.rmi.dgc.client.gcInterval=9223372036854775807 
-Dsun.rmi.dgc.server.gcInterval=9223372036854775807
-XX:CMSInitiatingOccupancyFraction=50 
-XX:+CMSScavengeBeforeRemark 
-XX:+CMSClassUnloadingEnabled

Testado também com montes de 8000m e 12000m. Máquinas:

8-core com 16GB de memória24-core com 50GB de memória

Então, minha pergunta básica é: por que o ParNewGC incidentalmente se comporta dessa maneira?

questionAnswers(1)

yourAnswerToTheQuestion