Saturday 12 November 2016

Load testing on a single server


Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
   at System.Threading.Thread.InternalCrossContextCallback(System.Runtime.Remoting.Contexts.Context, IntPtr, Int32, System.Threading.InternalCrossContextDelegate, System.Object[])
   at System.Runtime.Remoting.Channels.CrossContextChannel.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage)
   at System.Runtime.Remoting.Proxies.RemotingProxy.CallProcessMessage(System.Runtime.Remoting.Messaging.IMessageSink, System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Contexts.ArrayWithSize, System.Threading.Thread, System.Runtime.Remoting.Contexts.Context, Boolean)
   at System.Runtime.Remoting.Proxies.RemotingProxy.InternalInvoke(System.Runtime.Remoting.Messaging.IMethodCallMessage, Boolean, Int32)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(System.Runtime.Remoting.Proxies.MessageData ByRef, Int32)
   at .....followed by application exception 

While running my JMeter load tests against a single server (I usually test against a load balancer, but this time I wanted to see how one server coped on its own), I kept getting this strange error, with a 100% error rate in JMeter. The application under test was a WCF service hosted on IIS.

My first instinct was to check whether my recent changes had caused the error. I checked the Event Viewer, which showed that this error had been occurring only very rarely.

A lot of googling turned up nothing, so I decided to look at the IIS 7 configuration.

That is where I found the root cause of the failure: a few IIS settings that needed to be changed.


Configuration:

IIS Queue Length: The default value is 1000. Once 1000 requests are waiting in the application pool's queue, any new request is rejected with a 503 (Service Unavailable) error.

IIS Rapid-Fail Protection: Enabled by default. After 5 worker process failures within 5 minutes (the default thresholds), IIS stops the app pool.
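To make the Rapid-Fail Protection behaviour concrete, here is a minimal Python sketch of a crash counter with a sliding time window. It is only a toy model, not how IIS is implemented: the class name `RapidFailMonitor` and its methods are hypothetical, and it assumes the pool stops on the 5th failure inside a 5-minute window.

```python
from collections import deque


class RapidFailMonitor:
    """Toy model of a crash counter over a sliding time window (not IIS code)."""

    def __init__(self, max_crashes=5, window_seconds=300):
        # Assumed thresholds: 5 failures within 5 minutes.
        self.max_crashes = max_crashes
        self.window_seconds = window_seconds
        self.crash_times = deque()
        self.pool_stopped = False

    def record_crash(self, timestamp):
        """Record a worker process failure; return True once the pool is stopped."""
        if self.pool_stopped:
            return True
        self.crash_times.append(timestamp)
        # Forget failures that have fallen out of the time window.
        while timestamp - self.crash_times[0] > self.window_seconds:
            self.crash_times.popleft()
        if len(self.crash_times) >= self.max_crashes:
            self.pool_stopped = True
        return self.pool_stopped


if __name__ == "__main__":
    burst = RapidFailMonitor()
    # Five failures ten seconds apart: the fifth one stops the pool.
    print([burst.record_crash(t) for t in (0, 10, 20, 30, 40)])
```

Under load, failures arrive close together, which is exactly the case where a counter like this trips. Failures spread two minutes apart would never reach the threshold in this model.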

To sum up: I ran the load test for 'n' users, and the flood of requests overran the default queue length of 1000. That produced errors, and the other setting, "Rapid-Fail Protection", then took the app pool down.

This error occurred only because I load tested a single server whose IIS queue limit was 1000.
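The queue overflow can be illustrated with a small back-of-the-envelope simulation in Python. This is a sketch under made-up assumptions: the function `simulate_burst`, the per-tick arrival counts and the `served_per_tick` rate are all hypothetical, not measurements from my test. It only shows how a bounded queue starts rejecting requests once arrivals outpace what the server can process.

```python
def simulate_burst(arrivals, queue_length=1000, served_per_tick=200):
    """Return (served, rejected) for a list of per-tick arrival counts.

    Hypothetical model: requests that do not fit in the queue are rejected
    (in IIS terms they would receive a 503); the rest are served eventually.
    """
    queued = served = rejected = 0
    for arriving in arrivals:
        free_slots = queue_length - queued
        accepted = min(arriving, free_slots)
        rejected += arriving - accepted  # no room in the queue
        queued += accepted
        done = min(queued, served_per_tick)  # work finished this tick
        queued -= done
        served += done
    served += queued  # drain whatever is still queued after the burst
    return served, rejected


if __name__ == "__main__":
    burst = [1500, 1500, 1500]  # invented load: 1500 new requests per tick
    print(simulate_burst(burst, queue_length=1000))
    print(simulate_burst(burst, queue_length=9000))
```

In this toy model the same burst loses most of its requests with a queue of 1000 and none with a queue of 9000. The price of the larger queue is that requests wait longer before being served.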

I increased the queue length to 9000, disabled Rapid-Fail Protection and ran the same load test. That fixed the problem: I saw an error rate of less than 0.3% for a very large sample, and the error above never occurred again. It turns out it only occurred because I ran the load test against a single server. But it did show me that all the servers had the default queue length of 1000, and that under real load all of them would have come crashing down.
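I made the change through the IIS configuration, but it could also be scripted. The Python sketch below only builds an `appcmd.exe` command line and does not run it. The helper `build_appcmd_args` and the pool name are hypothetical, and the attribute paths (`queueLength`, `failure.rapidFailProtection`) are my assumption based on the app pool settings in applicationHost.config, so check them against your IIS version before using them.

```python
def build_appcmd_args(pool_name, queue_length=9000, rapid_fail=False):
    """Build (but do not execute) an appcmd command line for one app pool.

    Assumption: the attribute paths follow the applicationHost.config
    schema; verify them on your own server before running the command.
    """
    if queue_length < 1:
        raise ValueError("queue_length must be a positive integer")
    return [
        "appcmd.exe",
        "set",
        "apppool",
        f"/apppool.name:{pool_name}",
        f"/queueLength:{queue_length}",
        f"/failure.rapidFailProtection:{str(rapid_fail).lower()}",
    ]


if __name__ == "__main__":
    # "MyWcfPool" is a placeholder name, not the pool from my test.
    print(" ".join(build_appcmd_args("MyWcfPool")))
```

Keeping the command as a list makes it easy to review the exact arguments first and hand them to `subprocess.run` later, once you have confirmed them on a test server.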

So, to conclude, I believe the error above was a memory error. If you have faced the same error, please post below what you did to fix it.
Here are the configurations for IIS 7.0 as suggested by Microsoft. Choose carefully based on what your application needs; not every setting will suit you.

1 comment:

  1. Good find. Was there any considerable increase in process memory when the queue size was increased? Is there any relation between WCF throttling and the IIS queue size? If I host multiple services on multiple app pools in the same IIS instance, how will this change impact the other services/app pools?
