Hi @Denis, you are right: the other node, 10-83-85-122, did not start. It got stuck in the Ignite startup process and was therefore removed by AWS auto scaling. My question was why it could not join the cluster. Also, restarting the active node brought it back online. Is there any reason it got stuck? I could not get anything from the logs, only that the code after the Ignite start call never ran.

I am facing a critical issue with Ignite on our production servers. We have 2 instances with an 8 GB heap each. Occasionally, because of a long pause or a network issue, one of our instances goes down. This triggers AWS auto scaling to bring up another instance. That is fine, but we have observed that in this state the grid becomes unstable: our new Ignite instances can never join the topology and hang forever, which causes new auto-scaled instances to come up again and again. A workaround is to restart the other instances in the cluster, which makes the nodes connect again. Ideally, in a prod environment this should happen automatically with auto scaling. I have also added a longer failure detection timeout, but that does not solve it completely either, and we still see the problem occasionally.

The logs observed on the new instances that fail to come up are shown below. The Ignite version in use is 2.4, and off-heap caching mode is used for the partitioned caches. Our grid is configured with TCP discovery using an S3 bucket.

I also have a few transactional caches that take locks via tryLock.
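
The actual locking code is not shown in this post. As a minimal sketch of the tryLock pattern meant here, assuming the lock object is a `java.util.concurrent.locks.Lock` (which is what `IgniteCache.lock(key)` returns), the helper below tries to acquire the lock with a timeout and always releases it in `finally`. The class name `ExplicitLockGuard`, the method `runWithLock`, and the `ReentrantLock` stand-in are hypothetical and only for illustration. It may be relevant because the dump further down lists many `ExplicitLockSpan` entries under `ExplicitLockReleaseFuture`.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

final class ExplicitLockGuard {
    /**
     * Runs the action only if the lock is acquired within timeoutMs.
     * The lock is released in finally, even if the action throws.
     * Returns true if the action ran, false if the lock was not acquired.
     */
    static boolean runWithLock(Lock lock, long timeoutMs, Runnable action) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller and give up.
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        // Assumption: ReentrantLock stands in for the Lock returned by cache.lock(key).
        Lock lock = new ReentrantLock();
        boolean ran = runWithLock(lock, 100, () -> System.out.println("in critical section"));
        System.out.println("ran=" + ran);
    }
}
```

With a real cache, the `lock` argument would be obtained per key; whether unreleased locks are what blocks the exchange here is not confirmed by the logs alone.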

evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
    2018-07-18 16:34:10.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
    2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
    2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
    2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], DataStreamerReleaseFuture 
[topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]
    2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - First 10 pending exchange futures [total=0]
    2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last 10 exchange futures (total: 1):
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transactions:
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending explicit locks:
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending cache futures:
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending atomic cache futures:
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending data streamer futures:
    2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transaction deadlock detection futures:
    2018-07-18 16:34:20.547 UTC [FDPS] [grid-nio-worker-tcp-comm-3-#28%fdps%] [INFO ] [,] o.apache.ignite.internal.diagnostic - Exchange future waiting for coordinator response [crd=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0]]
    Remote node information:
    General node info [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=false, discoTopVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], time=12:34:20.537]
    Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=29, minorTopVer=0]]
    Last initialized exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931660254, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], nodeId=ba6aba6c, evt=NODE_JOINED], added=true, 
initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1921954756], init=false, lastVer=GridCacheVersion [topVer=0, order=1531930704443, nodeOrder=0], partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935479, nodeOrder=26], threadId=39726, id=559000, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=221, val=49583853497448469294730566354366524577617095530402283666, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547787212113, nodeOrder=26], threadId=39741, id=603904, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=288, val=49583853499611641578988037213538229804531966271996035234, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935487, 
nodeOrder=26], threadId=39740, id=558993, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=133, val=49583853497448469294730566354417299462040910024459419794, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935323, nodeOrder=26], threadId=39728, id=558949, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1023, val=49583853497448469294730566353278491339963927967496667282, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935470, nodeOrder=26], threadId=39951, id=559009, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=556, val=49583853497448469294730566354226289182541798339977937042, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate 
[nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935497, nodeOrder=26], threadId=39683, id=558982, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=373, val=49583853497448469294730566354541818821461216966893109394, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935339, nodeOrder=26], threadId=39682, id=558941, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=156, val=49583853497448469294730566353353444740780034976328450194, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935358, nodeOrder=26], threadId=39936, id=558921, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=59, val=49583853497448469294730566353578304943228356208982229138, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan 
[topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandida... and 48550 skipped ...ead=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935486, nodeOrder=26], threadId=39894, id=558992, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=488, val=49583853497448469294730566354434224423515514832905306258, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935331, nodeOrder=26], threadId=39893, id=558948, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=570, val=49583853497448469294730566353289371672340459630069022866, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]]]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1531930706210, centralizedAff=false, changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, 
remaining=[ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=325602672]]
    Communication SPI statistics [rmtNode=7d5e83aa-736a-4190-8b64-7261db7382f6]
    Communication SPI recovery descriptors: 
    [key=ConnectionKey [nodeId=7d5e83aa-736a-4190-8b64-7261db7382f6, idx=0, connCnt=0], msgsSent=5, msgsAckedByRmt=0, msgsRcvd=7, lastAcked=0, reserveCnt=1, descIdHash=1972345954]
    Communication SPI clients: 
    [node=7d5e83aa-736a-4190-8b64-7261db7382f6, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=5740, bytesSent=77322, bytesRcvd0=853, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=fdps, finished=false, hashCode=2068348067, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#28%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.87.131:47100, rmtAddr=/10.83.89.183:34664, createTime=1531931330498, closeTime=0, bytesSent=77322, bytesRcvd=5740, bytesSent0=0, bytesRcvd0=853, sndSchedTime=1531931330498, lastSndTime=1531931500547, lastRcvTime=1531931660527, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser@665c2413, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]], super=GridAbstractCommunicationClient [lastUsed=1531931330508, closed=false, connIdx=0]]]
    NIO sessions statistics:
    >> Selector info [idx=3, keysCnt=1, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0]
    Connection info [in=true, rmtAddr=/10.83.89.183:34664, locAddr=/10.83.87.131:47100, msgsSent=5, msgsAckedByRmt=0, descIdHash=1972345954, unackedMsgs=[IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=7, lastAcked=0, descIdHash=1972345954, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0, opQueueSize=0]
    Exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], crd=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=980776600], init=false, lastVer=GridCacheVersion [topVer=0, order=1531931327875, nodeOrder=0], partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=0, centralizedAff=false, changeGlobalStateE=null, done=false, state=null, evtLatch=0, 
remaining=[], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2138568466]]
    Local communication statistics:
    Communication SPI statistics [rmtNode=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9]
    Communication SPI recovery descriptors: 
    [key=ConnectionKey [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, idx=0, connCnt=-1], msgsSent=7, msgsAckedByRmt=0, msgsRcvd=6, lastAcked=0, reserveCnt=1, descIdHash=1891649612]
    Communication SPI clients: 
    Communication SPI clients: 
    [node=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=92833, bytesSent=5698, bytesRcvd0=15539, bytesSent0=853, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=fdps, finished=false, hashCode=2040212682, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#25%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.89.183:34664, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, createTime=1531931330468, closeTime=0, bytesSent=5698, bytesRcvd=92833, bytesSent0=853, bytesRcvd0=15539, sndSchedTime=1531931330468, lastSndTime=1531931660528, lastRcvTime=1531931660538, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser@72024a61, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]], super=GridAbstractCommunicationClient [lastUsed=1531931330468, closed=false, connIdx=0]]]
    NIO sessions statistics:
    >> Selector info [idx=0, keysCnt=1, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853]
    Connection info [in=false, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, locAddr=/10.83.89.183:34664, msgsSent=7, msgsAckedByRmt=0, descIdHash=1891649612, unackedMsgs=[GridDhtPartitionsSingleMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=6, lastAcked=0, descIdHash=1891649612, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853, opQueueSize=0]
    2018-07-18 16:34:29.598 UTC [FDPS] [localhost-startStop-1] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], 
futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
    2018-07-18 16:34:30.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
    2018-07-18 16:34:40.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
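
Each `GridDhtPartitionsExchangeFuture` in the dump above carries a `remaining=[...]` field with the IDs of the nodes the exchange is still waiting on. To make such dumps easier to read, here is a small sketch that extracts those IDs from log text with a regular expression. The class name `ExchangeDumpReader` and the method `remainingNodeIds` are hypothetical helpers, not part of Ignite, and the sample string is shortened from the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExchangeDumpReader {
    // Matches "remaining=[...]" and captures the comma-separated content.
    private static final Pattern REMAINING = Pattern.compile("remaining=\\[([^\\]]*)\\]");

    /** Returns the distinct node IDs found in all remaining=[...] fields, in order of appearance. */
    static List<String> remainingNodeIds(String logText) {
        List<String> ids = new ArrayList<>();
        Matcher m = REMAINING.matcher(logText);
        while (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty() && !ids.contains(trimmed)) {
                    ids.add(trimmed);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample = "done=false, state=SRV, evtLatch=0, "
            + "remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [state=INIT]";
        System.out.println(remainingNodeIds(sample));
    }
}
```

In the dump above, the joining node 7d5e83aa waits on the coordinator a450db0b, while the coordinator's own last initialized exchange (topVer=30) still waits on ba6aba6c.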

Information about the other node, 10-83-85-122

The other joining node never started and was stuck in the Ignite startup phase. Its logs also show neither the node coming up nor IP discovery reaching the cluster. This eventually led to the node being removed by auto scaling.

Transactional errors received

javax.cache.CacheException: Failed to acquire lock for keys (primary node left grid, retry transaction if possible) [keys=[UserKeyCacheObjectImpl [part=281, 
