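Background
Our application runs in an AWS cloud-native environment. In the US region we had installed Nacos on EC2 using nacos-server.jar. When we started deploying to a new region, we planned to move every EC2 application to ECS, Nacos included. We followed the default Docker image cluster deployment described on the official Nacos site. The console was reachable and we could create a namespace and configs, but applications failed with the following error when they tried to register: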
2020-11-03 22:52:21.139 ERROR[main]com.alibaba.nacos.client.config.http.ServerHttpAgent.httpGet:122 -no available server
2020-11-03 22:52:21.140 ERROR[main]com.alibaba.nacos.client.config.impl.ClientWorker.getServerConfig:235 -[fixed-internal-myyshop-supplier-nacos-alb-2035283990.ap-northeast-1.elb.amazonaws.com_80-7f06f2ab-ea31-4ed9-853b-f520050140bb] [sub-server] get server config exception, dataId=gateway-server.yml, group=gateway-server, tenant=7f06f2ab-ea31-4ed9-853b-f520050140bb
java.net.ConnectException: no available server
at com.alibaba.nacos.client.config.http.ServerHttpAgent.httpGet(ServerHttpAgent.java:123)
at com.alibaba.nacos.client.config.http.MetricsHttpAgent.httpGet(MetricsHttpAgent.java:48)
at com.alibaba.nacos.client.config.impl.ClientWorker.getServerConfig(ClientWorker.java:230)
at com.alibaba.nacos.client.config.NacosConfigService.getConfigInner(NacosConfigService.java:143)
at com.alibaba.nacos.client.config.NacosConfigService.getConfig(NacosConfigService.java:92)
at com.alibaba.cloud.nacos.client.NacosPropertySourceBuilder.loadNacosData(NacosPropertySourceBuilder.java:85)
at com.alibaba.cloud.nacos.client.NacosPropertySourceBuilder.build(NacosPropertySourceBuilder.java:74)
at com.alibaba.cloud.nacos.client.NacosPropertySourceLocator.loadNacosPropertySource(NacosPropertySourceLocator.java:204)
at com.alibaba.cloud.nacos.client.NacosPropertySourceLocator.loadNacosDataIfPresent(NacosPropertySourceLocator.java:191)
at com.alibaba.cloud.nacos.client.NacosPropertySourceLocator.loadApplicationConfiguration(NacosPropertySourceLocator.java:145)
at com.alibaba.cloud.nacos.client.NacosPropertySourceLocator.locate(NacosPropertySourceLocator.java:103)
at org.springframework.cloud.bootstrap.config.PropertySourceLocator.locateCollection(PropertySourceLocator.java:52)
at org.springframework.cloud.bootstrap.config.PropertySourceLocator.locateCollection(PropertySourceLocator.java:47)
at org.springframework.cloud.bootstrap.config.PropertySourceBootstrapConfiguration.initialize(PropertySourceBootstrapConfiguration.java:98)
at org.springframework.boot.SpringApplication.applyInitializers(SpringApplication.java:626)
at org.springframework.boot.SpringApplication.prepareContext(SpringApplication.java:370)
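On top of that, the Nacos nodes could not register with one another either, which produced a huge volume of log output and frequent GC, and the Nacos service eventually went down.
Searching the web and the official issue tracker turned up no consistent fix. Most answers blamed the version, but downgrading from 1.4.0 to 1.3.0 did not solve the problem either.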
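Solving the Problem
Root cause
After carefully going through the application logs and the Nacos Server logs for a little under an hour, we finally found the cause:
We had deployed Cluster mode with the default configuration, and the default parameter list does not cover a container orchestration environment. Inside the cluster, Nacos picked up the Docker network interface address, so it could not obtain the correct cluster server IP list.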
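To see which addresses a JVM inside the container can choose from, a small diagnostic like the one below can help. It is only a sketch built on the standard java.net.NetworkInterface API. The class name InterfaceLister is made up for illustration, and this is not how Nacos itself selects its address.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: lists the IPv4 addresses visible to the JVM,
// so you can compare them with the IPs the other nodes are expected to reach.
public class InterfaceLister {

    /** Returns "interfaceName -> address" for every non-loopback IPv4 address on an interface that is up. */
    public static List<String> listIpv4Candidates() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue; // skip interfaces that cannot carry cluster traffic
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(nic.getName() + " -> " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : listIpv4Candidates()) {
            System.out.println(line);
        }
    }
}
```

Running this inside the task container and comparing the output with the addresses in the cluster server list shows quickly whether the node would advertise an address that its peers cannot reach.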
Solution
We switched to Docker standalone mode. Data is persisted in MySQL, and an AWS load balancer in front provides high availability, which gives us the effect of a Nacos cluster.
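To check from a client machine that the load balancer is forwarding to a healthy Nacos instance, a probe along these lines could be used. It is a minimal sketch using the JDK 11+ java.net.http client. The class name NacosReadinessProbe and the example host are hypothetical, and the readiness path is an assumption based on the Nacos 1.x console health endpoint, so verify it against the version you run.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical probe: asks a Nacos instance behind the load balancer whether it is ready.
public class NacosReadinessProbe {

    // Assumption: console readiness endpoint in Nacos 1.x; confirm for your version.
    static final String READINESS_PATH = "/nacos/v1/console/health/readiness";

    /** Builds the readiness URI from a base URL, tolerating one trailing slash. */
    public static URI readinessUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + READINESS_PATH);
    }

    /** Returns true only if the endpoint answers with HTTP 200 within the timeouts. */
    public static boolean isReady(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(readinessUri(baseUrl))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // unreachable or connection reset counts as not ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host; pass your own load balancer URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://nacos.example.internal";
        System.out.println(readinessUri(baseUrl) + " ready: " + isReady(baseUrl));
    }
}
```

The same path could also serve as the load balancer's target group health check, so that an unhealthy standalone instance is taken out of rotation.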