Uploaded image for project: 'Norbert'
  1. NOR-19

ChannelPool closing channels prematurely

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects versions: None
    • Fix versions: None
    • Components: network
    • Labels:
      None
    • Environment:

      netty 3.5.11
      norbert 0784198286f09bae3d6d5b505da9330ff9c3054d

    • Sprint:

      Description

      We have recently increased our use of Norbert, which means more traffic on our cluster.
      Using default settings on server and client side, we have started to see ClosedChannelException in some cases. We have seen three different failure scenarios, but they all boil down to this:

      In ChannelPool, sendRequest will writeRequestToChannel(), which uses async write. Then it calls checkinChannel. This in turn will check if the channel should be closed or keept using reuseChannel. With default config, a channel is closed 30s after it is created.
      If it has expired, it calls poolEntry.channel.close().

      The problems this causes manifests themselfs in three ways:
      1. Request gets sent, but no response is received since the socket was closed. (most common)
      2. ClosedChannelException on write
      3. ClosedChannelException on close, since write buffer is not empty

      1 is very easy to reproduce, just set clientconfig closeChannelTimeMillis to 1ms and dispatch a few requests.
      Scenario 2 seems a bit trickier, but it happens almost every time my app is newly inited.
      Scenario 3 has only been seen in production, but stacktrace hints this is the problem.

      Solution: Not sure, I'm too unfamiliar with netty internals to get a solution from the top of my head.. But closing it after any pending requests have finished maybe?

      Workaround: set closeChannelTimeMillis to -1 or 0

      This seems to be the cause for NOR-16 Open as well.

      Stack traces:

      For scenario 1 there is no stack trace, the Future just doesn't resolve, since no response ever arrives.

      Scenario 2:

      07 Oct 2013 14:06:49,889 ERROR [New I/O  worker #1:] (com.linkedin.norbert.network.netty.ChannelPool:195) - IO exception for Node(1,localhost:40001,[],true,0x00000001), marking node offline
      07 Oct 2013 14:06:49,898  WARN [New I/O  worker #1:] (com.linkedin.norbert.network.netty.ClientChannelHandler:146) - Caught exception in network layer
      java.nio.channels.ClosedChannelException
      	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
      	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:456)
      	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:579)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:530)
      	at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:342)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processWriteTaskQueue(AbstractNioWorker.java:442)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:331)
      	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:724)
      
      07 Oct 2013 14:06:49,977 ERROR [http-49414-1:] (our.appcode.:66) - This is app logging code reporting unhandled ExecutionException
      …
      Caused by: java.nio.channels.ClosedChannelException
      	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:265)
      	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:456)
      	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:579)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:530)
      	at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:342)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processWriteTaskQueue(AbstractNioWorker.java:442)
      	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:331)
      	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	... 1 more
      

      The async write fails, since the socket has been closed.

      scenario 3

      03 Oct 2013 15:01:56,590 ERROR http-195.20.100.202-8080-exec-3 com.linkedin.norbert.network.netty.ChannelPool - IO exception for Node(2,192.168.200.4:40002,[],true,0x00000001), marking node offline
      03 Oct 2013 15:01:56,600 ERROR http-195.20.100.202-8080-exec-3 our.appcode - This is app logging code reporting unhandled ExecutionException
      Caused by: java.nio.channels.ClosedChannelException
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:792)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:734)
              at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:99)
              at org.jboss.netty.handler.logging.LoggingHandler.handleDownstream(LoggingHandler.java:339)
              at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
              at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
              at org.jboss.netty.channel.Channels.close(Channels.java:820)
              at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
              at com.linkedin.norbert.network.netty.ChannelPool.com$linkedin$norbert$network$netty$ChannelPool$$checkinChannel(ChannelPool.scala:131)
              at com.linkedin.norbert.network.netty.ChannelPool.sendRequest(ChannelPool.scala:98)
              at com.linkedin.norbert.network.netty.NettyClusterIoClientComponent$NettyClusterIoClient.sendMessage(NettyClusterIoClientComponent.scala:41)
              at com.linkedin.norbert.network.common.BaseNetworkClient$class.doSendRequest(BaseNetworkClient.scala:155)
              at com.linkedin.norbert.network.netty.BaseNettyNetworkClient.doSendRequest(NettyNetworkClient.scala:39)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply$mcV$sp(NetworkClient.scala:164)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply(NetworkClient.scala:153)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply(NetworkClient.scala:153)
              at com.linkedin.norbert.network.common.BaseNetworkClient$class.doIfConnected(BaseNetworkClient.scala:162)
              at com.linkedin.norbert.network.netty.BaseNettyNetworkClient.doIfConnected(NettyNetworkClient.scala:39)
              at com.linkedin.norbert.network.client.NetworkClient$class.sendRequest(NetworkClient.scala:153)
              at com.linkedin.norbert.network.netty.NettyNetworkClient.sendRequest(NettyNetworkClient.scala:130)
              at appcode (appcode:419)
      …..
      
      03 Oct 2013 15:01:56,601 WARN New I/O  worker #4 com.linkedin.norbert.network.netty.ClientChannelHandler - Caught exception in network layer
      java.nio.channels.ClosedChannelException
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:792)
              at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:734)
              at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:99)
              at org.jboss.netty.handler.logging.LoggingHandler.handleDownstream(LoggingHandler.java:339)
              at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
              at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
              at org.jboss.netty.channel.Channels.close(Channels.java:820)
              at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
              at com.linkedin.norbert.network.netty.ChannelPool.com$linkedin$norbert$network$netty$ChannelPool$$checkinChannel(ChannelPool.scala:131)
              at com.linkedin.norbert.network.netty.ChannelPool.sendRequest(ChannelPool.scala:98)
              at com.linkedin.norbert.network.netty.NettyClusterIoClientComponent$NettyClusterIoClient.sendMessage(NettyClusterIoClientComponent.scala:41)
              at com.linkedin.norbert.network.common.BaseNetworkClient$class.doSendRequest(BaseNetworkClient.scala:155)
              at com.linkedin.norbert.network.netty.BaseNettyNetworkClient.doSendRequest(NettyNetworkClient.scala:39)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply$mcV$sp(NetworkClient.scala:164)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply(NetworkClient.scala:153)
              at com.linkedin.norbert.network.client.NetworkClient$$anonfun$sendRequest$2.apply(NetworkClient.scala:153)
              at com.linkedin.norbert.network.common.BaseNetworkClient$class.doIfConnected(BaseNetworkClient.scala:162)
              at com.linkedin.norbert.network.netty.BaseNettyNetworkClient.doIfConnected(NettyNetworkClient.scala:39)
              at com.linkedin.norbert.network.client.NetworkClient$class.sendRequest(NetworkClient.scala:153)
              at com.linkedin.norbert.network.netty.NettyNetworkClient.sendRequest(NettyNetworkClient.scala:130)
              at appcode (appcode:419)
      …..
      

      The close fails, since the async write has not yet completed.

        Attachments

          Activity

            People

            • Assignee:
              jhartman Joshua Hartman
              Reporter:
              jstrom Johan Ström
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: