There is not a lot of information on performance tuning ASP.NET 3.5, or any version of ASP.NET for that matter, especially on 64 bit servers. Recently I had the task of making a large scale Sitecore CMS website production ready, with a view to handling around 2,500 concurrent users. There's not much Sitecore documentation out there either, and practically none about performance tuning in a production environment.
Anyway, I like a challenge! So, during a two day load testing session with frugal documentation, I scoured the internet, did some experiments, and had quite a lot of success at squeezing every last drop of performance out of ASP.NET 3.5 on a 64 bit server using a mixture of web.config and machine.config tweaks.
Although tuning is going to be different for every application, the information in this post should be useful to a lot of other ASP.NET 2.0/3.5 production server deployments.
The application was an ASP.NET 3.5 website built using Sitecore 6.0 CMS, populated with approximately 20,000 content pages that were shredded and held in a SQL Server 2005 database. The majority of pages in the website just displayed data direct from the database via the Sitecore API; a few peripheral pages called off to web services; there were only light updates to the database. The only quirk of this solution was that Sitecore has its own caching mechanism independent of the caching in ASP.NET, something that had to be factored in to the performance tuning.
The hardware consisted of one database server with four dual core Opteron processors and 8GB of memory, and three web servers, each with one dual core Opteron processor and 4GB of memory. The database server had 3 x SCSI RAID 5 and 2 x SCSI RAID 1 arrays, and the three web servers each had 2 x SCSI RAID 1.
Both the database server and the three web servers were installed with 64 bit Windows Server 2003. IIS was configured on the web servers to run 64 bit ASP.NET. SQL Server Standard Edition 64 bit was installed on the database server.
Three load injectors were set up to simulate user load, and Windows Performance Monitor ("perfmon") was used to gather information about each test run. Each test run was executed for 45 minutes and ramped up to a specific number of concurrent users. For ease of configuration, testing was done using only one of the web servers, on the assumption that load balancing over three web servers would give three times the throughput (an invalid assumption if the database had been the bottleneck).
Each concurrent user had a journey consisting of ten page "transactions", randomised in a way to exhibit real world usage. Each transaction included the page and any associated media such as stylesheets and images (the simulation respected the client cache control headers sent from the server for these content types).
As a baseline, I just used the default installation of 64 bit Windows Server 2003, IIS and the ASP.NET 2.0 64 bit ISAPI (installed via the aspnet_regiis command line tool in the "Windows\Microsoft.NET\Framework64\v2.X" folder). The necessary databases were backed up and restored to the server.
The first results were run using 200 concurrent users, with an average transaction response time of 24 seconds. This was 24 times slower than the 1 second response time required for peak website usage, which felt pretty daunting.
Performance Tweak #1
The perfmon results from the baseline showed the % CPU topping out at 100, # Induced GC rising from 0 to 8000 (this should remain close to zero) and % time in GC averaging around 20 (this should be as close to zero as possible, but it's obviously going to increase with memory contention).
Changing the web.config from <compilation debug="true"> to <compilation debug="false"> doubled performance. The second results were run again using 200 concurrent users, this time with an average transaction response time of 12 seconds (instead of 24).
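For reference, the change itself is one attribute, and there is also a machine-wide safeguard worth knowing about, the deployment element (machine.config only), which forces debug="false" across every site on a production server. A minimal sketch of both:

```xml
<!-- web.config: never leave debug="true" on a production server -->
<system.web>
  <compilation debug="false" />
</system.web>

<!-- machine.config: retail="true" forces debug="false" server-wide,
     even if an individual web.config still has debug="true" -->
<configuration>
  <system.web>
    <deployment retail="true" />
  </system.web>
</configuration>
```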
Performance Tweak #2
The perfmon results still showed the same resource bottlenecks as the previous run.
Changing the <processModel> section in machine.config was the next step. On my default installation of Windows Server 2003 64 bit, the machine.config didn't have a <processModel> section so I had to add one.
The following settings brought the average transaction response time to 3 seconds (down from 12).
<httpRuntime
  maxRequestLength="16384"
  executionTimeout="600"
  requestLengthDiskThreshold="80"
  minFreeThreads="88"
  minLocalRequestFreeThreads="76"
  appRequestQueueLimit="5000"
  enableKernelOutputCache="true"
  enableVersionHeader="true"
  enable="true"
  shutdownTimeout="90"
  delayNotificationTimeout="5"
  waitChangeNotification="0"
  maxWaitChangeNotification="0"
  enableHeaderChecking="true"
  sendCacheControlHeader="true"
  apartmentThreading="false" />
<processModel
  enable="true"
  maxWorkerThreads="100"
  maxIoThreads="100"
  minWorkerThreads="1"
  minIoThreads="1"
  timeout="Infinite"
  idleTimeout="Infinite"
  requestLimit="Infinite"
  requestQueueLimit="5000"
  restartQueueLimit="10"
  memoryLimit="60"
  webGarden="false"
  userName="machine"
  password="AutoGenerate"
  logLevel="Errors"
  responseDeadlockInterval="00:03:00"
  responseRestartDeadlockInterval="00:03:00"
  serverErrorMessageFile=""
  pingFrequency="Infinite"
  pingTimeout="Infinite"
  maxAppDomains="2000" />
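As an aside, ASP.NET 2.0 and later can apply Microsoft's recommended per-CPU threading values automatically rather than being hand-tuned, via the autoConfig attribute. I tuned the values explicitly here, but the automatic alternative is a one-liner:

```xml
<!-- machine.config: with autoConfig="true" ASP.NET 2.0+ sets
     maxWorkerThreads, maxIoThreads, minFreeThreads,
     minLocalRequestFreeThreads and maxconnection according to
     Microsoft's per-CPU recommendations, so they need not be
     specified by hand -->
<processModel autoConfig="true" />
```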
<setting name="Caching.DefaultDataCacheSize" value="20MB"/> <setting name="Caching.DefaultHtmlCacheSize" value="20MB"/>
Performance Tweak #3
The following changes brought the average transaction response time down to 1 second.
<httpRuntime minFreeThreads="88" minLocalRequestFreeThreads="76" enableKernelOutputCache="false" />
<cache disableMemoryCollection="false" disableExpiration="false" privateBytesLimit="2576980377" percentagePhysicalMemoryUsedLimit="0" privateBytesPollTime="00:00:30" />
<processModel memoryLimit="70" />
(Note that enableKernelOutputCache is an attribute of httpRuntime, not processModel, so that's where the kernel cache is switched off.)
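The privateBytesLimit value above looks opaque, but it is simply 2.4GB expressed in bytes, a ceiling deliberately below the 4GB of physical memory on each web server. A quick sanity check:

```python
# privateBytesLimit is specified in bytes: 2576980377 is 2.4 GB,
# leaving headroom below the 4 GB of RAM on each web server.
GB = 1024 ** 3

private_bytes_limit = int(2.4 * GB)
print(private_bytes_limit)  # 2576980377
```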
<setting name="Caching.DefaultDataCacheSize" value="400MB"/>
<setting name="Caching.DefaultHtmlCacheSize" value="200MB"/>
<setting name="Caching.DefaultPathCacheSize" value="5MB"/>
<setting name="Caching.DefaultRegistryCacheSize" value="5MB"/>
<setting name="Caching.DefaultViewStateCacheSize" value="10MB"/>
<setting name="Caching.DefaultXslCacheSize" value="10MB"/>
<setting name="Caching.FastMediaCacheMapSize" value="10MB"/>
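Those cache sizes are worth tallying against the hardware: together the tweak #3 Sitecore settings reserve roughly 640MB, comfortably inside the 4GB on each web server once the worker process is accounted for. A quick check (the parse_mb helper is illustrative, not a Sitecore API):

```python
# Tally the Sitecore cache sizes configured above to check they
# leave headroom on a 4 GB web server.
def parse_mb(value: str) -> int:
    """Parse a Sitecore size string like '400MB' into megabytes."""
    return int(value.rstrip("MB"))

caches = ["400MB", "200MB", "5MB", "5MB", "10MB", "10MB", "10MB"]
total_mb = sum(parse_mb(c) for c in caches)
print(total_mb)  # 640
```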
New Testing Baseline
Using the settings obtained so far, a new baseline was measured for 400 users (instead of 200). Doubling the number of users from 200 to 400 actually quadrupled the average transaction response time, so the new average transaction response time was 4 seconds for 400 users.
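The effect of that extra load can be reasoned about with Little's Law (N = X × R: concurrent users = throughput × response time). It's a rough model that ignores think time, but it makes the degradation concrete:

```python
# Little's Law: users = throughput * response_time, so
# throughput (transactions/s) = users / response_time.
# A rough model ignoring user think time.
def throughput(users: int, response_time_s: float) -> float:
    return users / response_time_s

x_200 = throughput(200, 1.0)  # tweak #3 result: 200 users at 1 s
x_400 = throughput(400, 4.0)  # new baseline: 400 users at 4 s
print(x_200, x_400)  # 200.0 100.0 -> doubling users halved throughput
```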
Performance Tweak #4
The following enhancements are specific to Sitecore, and brought the average transaction response time from 4 seconds down to 1 second for 400 users, making these changes an 8 times improvement over the 200 user result from performance tweak #3 (twice the users at one quarter of the 400 user baseline response time). This really goes to show how much Sitecore relies on caching for its speed, as the changes are focussed on allocating larger cache sizes.
<setting name="Caching.AccessResultCacheSize" value="10MB"/>
<setting name="Caching.StandardValues.DefaultCacheSize" value="10MB"/>

<database id="web">
  <cacheSizes hint="setting">
    <data>400MB</data>
    <items>100MB</items>
    <paths>10MB</paths>
    <standardValues>10MB</standardValues>
  </cacheSizes>
</database>

<hooks>
  <param desc="Threshold">900MB</param>
</hooks>

<sites>
  <site name="website" htmlCacheSize="100MB" filteredItemsCacheSize="10MB" xslCacheSize="10MB" />
</sites>

<cacheSizes>
  <sites>
    <website>
      <html>100MB</html>
      <registry>0</registry>
      <viewState>0</viewState>
      <xsl>10MB</xsl>
    </website>
  </sites>
</cacheSizes>

<fastCaches>
  <memoryCacheSize>5MB</memoryCacheSize>
</fastCaches>
Simply applying a good runtime configuration, along with some careful tuning, can yield massive performance improvements; the application I was tuning ended up nearly 50 times faster with no code changes (twice the users at one twenty-fourth of the baseline response time). It was a very rewarding but time consuming effort of tweaking and running simulated loads over a period of several days, and hopefully anyone who reads this can enjoy the benefits.
I found the following MSDN article helpful when I was performance tuning (although the title of the MSDN article may not seem directly relevant, the formulas to calculate web.config and machine.config settings based on the number of CPUs and RAM proved invaluable):
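From memory, the per-CPU formulas in that guidance look like the sketch below (the helper function is mine, not from the article); with a single CPU they reproduce the minFreeThreads="88" and minLocalRequestFreeThreads="76" values used in the httpRuntime settings above:

```python
# Widely cited ASP.NET per-CPU tuning formulas (treat as a starting
# point, not gospel), scaled by the number of CPUs.
def recommended_settings(cpus: int) -> dict:
    return {
        "maxWorkerThreads": 100,                  # per CPU (machine.config)
        "maxIoThreads": 100,                      # per CPU (machine.config)
        "maxconnection": 12 * cpus,               # system.net connection limit
        "minFreeThreads": 88 * cpus,              # httpRuntime
        "minLocalRequestFreeThreads": 76 * cpus,  # httpRuntime
    }

print(recommended_settings(2))
```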