MaxClients in Apache and its effect on Tomcat during Full GC

The effect of MaxClients on the system

The operating environment of NHN services has a variety of throttle-valve-type options. These options are important for reliable service operation. Let's see how the MaxClients option in Apache affects the system when full GC occurs in Tomcat.
Most developers know that the "stop-the-world" (STW) phenomenon occurs when GC runs in Java. In particular, Java developers at NHN may have experienced faults caused by GC-related issues in Tomcat. Because the Java Virtual Machine (JVM) manages memory, Java-based systems cannot be free of the STW phenomenon caused by GC.
Several times a day, GC occurs in services you have developed and currently operate. In this situation, even if TTS caused by faults does not occur, services may return unexpected 503 errors to users.

Service Operation Environment

Because of their structural characteristics, Web services are better suited to scale-out than to scale-up. So, generally, physical equipment is configured with Apache * 1 + Tomcat * n according to equipment performance. To keep the description simple, however, this article assumes an environment where Apache * 1 + Tomcat * 1 are installed on one host, as shown in Figure 1 below.

service_opeation_environment_assumed_for_the_article.png

Figure 1: Service Operation Environment Assumed for the Article.
For reference, this article describes options in Apache 2.2.21 (prefork MPM), Tomcat 6.0.35, and JDK 1.6.0_24 in a CentOS 4.72 (32-bit) environment.
The total system memory is 2 GB and the garbage collector is ParallelOldGC. The AdaptiveSizePolicy option is set to true by default and the heap size is set to 600m.
STW and HTTP 503
Let's assume that requests are flowing into Apache at 200 req/s and that more than 10 httpd processes are serving them, although the exact number depends on the response time of the requests. In this situation, assuming that the pause time of a full GC is 1 second, what will happen if full GC occurs in Tomcat?
The first thing that comes to mind is that Tomcat will be paused by full GC and will stop responding to all requests being processed. In this case, what will happen to Apache while Tomcat is paused and requests are not processed?
While Tomcat is paused, requests will continue to flow into Apache at 200 req/s. In general, before full GC occurs, the service can respond to requests quickly with only 10 or so httpd processes. However, because Tomcat is now paused, new httpd processes will continuously be created for the incoming requests, within the range allowed by the MaxClients parameter in the httpd.conf file. As the default value is 256, Apache will keep creating processes for requests arriving at 200 req/s without hitting that limit.
What happens to the newly created httpd processes at this point?
The httpd processes forward requests to Tomcat by using the idle connections in the AJP connection pool managed by the mod_jk module. If there is no idle connection, a process asks to create a new connection. However, because Tomcat is paused, it cannot accept new connections. These connection requests are therefore queued in the backlog queue, up to the backlog size set in the AJP Connector of the server.xml file.
If the number of queued requests exceeds the size of the backlog queue, a "Connection refused" error is returned to Apache, and Apache returns an HTTP 503 error to users.
In the assumed situation, the default backlog size is 100 and requests are flowing in at 200 req/s. Therefore, more than 100 requests will receive the 503 error during the 1-second pause caused by full GC.
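The arithmetic above can be sketched as a small estimate. This is a simplified model, not a description of how mod_jk or Tomcat is implemented: it assumes every request arriving during the pause needs a new AJP connection (idle pooled connections are ignored), and the function name `rejected_during_pause` is hypothetical.

```python
def rejected_during_pause(rate_per_s: int, pause_s: int, backlog: int) -> int:
    """Estimate how many requests receive HTTP 503 while Tomcat is paused.

    Simplified model (an assumption, not taken from mod_jk or Tomcat):
    every request that arrives during the pause needs a new AJP connection,
    the backlog queue holds at most `backlog` of them, and the remainder
    are refused and answered by Apache with 503.
    """
    arrived = rate_per_s * pause_s
    return max(0, arrived - backlog)


# Figures from the article: 200 req/s, 1-second pause, default backlog of 100.
print(rejected_during_pause(200, 1, 100))  # 100
# With a backlog of 200, the same pause produces no refusals in this model.
print(rejected_during_pause(200, 1, 200))  # 0
```

The model only counts refused connections; it says nothing about memory, which is what the rest of the article turns to.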
When full GC is over, the sockets in the backlog queue are accepted by Tomcat and assigned to worker threads, within the range allowed by MaxThreads (defaults to 200), so that the requests can be processed.
MaxClients and backlog
In this situation, which option should be set in order to prevent the 503 error from being returned to users?
First, we need to understand that the backlog value should be enough to accept requests flowing to Tomcat during the pause time in full GC. In other words, it should be set to at least 200 or greater.
Is there a problem with this configuration?
Let's repeat the above situation under the assumption that the backlog setting has been increased to 200. The result is more serious, as described below.
The system memory usage is typically 50%. However, it rapidly increases to almost 100% when full GC occurs, causing a rapid increase of swap memory usage. Moreover, because the pause time of full GC increases from 1 second to 4 or more seconds, the system is down for that time and cannot respond to any requests.
In the first situation, only 100 or more requests received the 503 error. However, after increasing the backlog size to 200, more than 500 requests will be hung for 3 or more seconds and cannot receive responses.
This is a good example of how a more serious problem can arise when you do not precisely understand how settings relate to one another, that is, their combined impact on the system.
Then, why does this phenomenon occur?
The reason lies in the characteristics of the MaxClients option.
A generous MaxClients value is not a problem in itself. The most important thing in setting the MaxClients option is that the total memory usage must be calculated so that it does not exceed 80%, even when as many httpd processes are created as the MaxClients value allows.
The swappiness value of the system is set to 60 (default). As such, when memory usage exceeds 80%, swap will actively occur.
Let's see why this characteristic causes the more serious situation described above.
When requests are flowing in at 200 req/s, Tomcat is paused by full GC, and the backlog is set to 200, approximately 100 more httpd processes can be created in Apache than in the first case. In this situation, when the total memory usage exceeds 80%, the OS will actively use the swap area, and objects subject to GC will be moved from the old area of the JVM to the swap area, since the OS considers them unused for a long period.
Finally, when the swap area is touched during GC, the pause time increases rapidly. As a result, the number of httpd processes keeps increasing, memory usage reaches 100%, and the situation previously described occurs.
The difference between the two cases is only the backlog setting values: 100 vs. 200. Why did this situation occur only for 200?
The reason for the difference is the number of httpd processes created in these configurations. When the backlog is set to 100 and full GC occurs, 100 requests for new connections are created and then queued in the backlog queue. The other requests receive the "Connection refused" error and return the 503 error. Therefore, the total number of httpd processes will be slightly more than 100.
When the value is set to 200, then 200 requests for creating new connections can be accepted. Therefore, the number of total httpd processes will be more than 200 and the value exceeds the threshold that determines the occurrence of memory swap.
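A rough check of this threshold, using the figures the article gives in its calculation section (about 200m for agents, 725m for Tomcat, 4m per httpd process, 2 GB total). The process counts 110 and 210 are illustrative assumptions standing in for "slightly more than 100" and "more than 200", and the helper names are hypothetical.

```python
TOTAL_MB = 2048            # 2 GB of system memory
SWAP_THRESHOLD_PCT = 80    # usage level above which the article expects active swapping
AGENTS_MB = 200            # agent-type programs (article's estimate)
TOMCAT_MB = 725            # Tomcat RES from top (article's figure)
HTTPD_MB = 4               # RES of one httpd process (article's figure)


def memory_usage_pct(httpd_processes: int) -> float:
    """Return estimated memory usage as a percentage of total memory."""
    used_mb = AGENTS_MB + TOMCAT_MB + httpd_processes * HTTPD_MB
    return 100.0 * used_mb / TOTAL_MB


def exceeds_swap_threshold(httpd_processes: int) -> bool:
    """True if the estimated usage is above the 80% threshold."""
    return memory_usage_pct(httpd_processes) > SWAP_THRESHOLD_PCT


# 10: normal operation; 110 and 210: assumed counts for backlog 100 and 200.
for count in (10, 110, 210):
    print(count, round(memory_usage_pct(count), 1), exceeds_swap_threshold(count))
```

Under these assumptions, about 110 processes stay below the threshold while about 210 cross it, which matches the difference between the two backlog settings.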
In short, when the MaxClients option is set without considering memory usage, the number of httpd processes rapidly increases during full GC, causing swap and degrading system performance.
If so, how can we determine the MaxClients value, and what is the threshold for the current system?

Calculation Method of MaxClients Setting
As the total memory of the system is 2 GB, the MaxClients value should be set so that no more than 80% of the memory (1.6 GB) is used in any situation, in order to prevent performance degradation caused by swap. In other words, the 1.6 GB of memory should be shared among Apache, Tomcat, and the agent-type programs that are installed by default.
Let's assume that the agent-type programs, which are installed in the system by default, occupy about 200m of memory. For Tomcat, the heap size set with -Xmx is 600m. Therefore, Tomcat will always occupy 725m (the heap plus Perm Gen and the native heap area) based on the top RES (see the figure below). That leaves roughly 700m of memory for Apache.
full_gc_tomcat_top_screen_of_test_system.png

Figure 2: Top Screen of Test System.
If so, what should the value of MaxClients be with 700m of memory?

It will differ according to the type and number of loaded modules. However, for NHN Web services, which use Apache as a simple proxy, 4m (based on top RES) will be enough for one httpd process (see Figure 2). Therefore, the maximum MaxClients value for 700m should be 175.
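The division above can be written as a small helper. This is only a sketch of the article's arithmetic; the function name `max_clients` is hypothetical, and the per-process size should be measured on your own system.

```python
def max_clients(apache_budget_mb: int, httpd_rss_mb: int) -> int:
    """Largest number of httpd processes that fits in the Apache memory budget."""
    if httpd_rss_mb <= 0:
        raise ValueError("httpd_rss_mb must be positive")
    # Round down: a partial process still needs its full memory.
    return apache_budget_mb // httpd_rss_mb


# Figures from the article: about 700m for Apache, 4m per httpd process.
print(max_clients(700, 4))  # 175
```

If the loaded modules make each httpd process larger, the same budget yields a proportionally smaller limit.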
Conclusion
Reliable service configuration should decrease the system downtime under overload and send successful responses to requests within the allowable range. For Java-based Web services, you must check whether the service has been configured to reliably respond to the STW under full GC.
If the MaxClients option is set to a large value without considering system memory usage, whether to cope with a simple increase in user requests or with DDoS attacks, it loses its function as a throttle valve and causes bigger faults.
In this example, the best way to solve the problem is to expand the memory or add servers, or to set the MaxClients option to 175 (in the above case) so that Apache returns the 503 error only to requests beyond the 175 it can serve concurrently.
The situation in this article occurs within 3 to 5 seconds, so it cannot be checked by most monitoring tools which run at regular sampling intervals.

By Dongsoon Choi, Senior Engineer at Game Service Technical Support Team, NHN Corporation.
