Bare Metal Service Load Balancers

AKA "how to set up a bank of haproxy for platforms that don't have load balancers".

Disclaimer:

  • This is a work in progress.
  • A better way to achieve this will probably emerge once discussions on (#260, #561) converge.
  • Backends are pluggable, but Haproxy is the only loadbalancer with a working implementation.
  • I have never deployed haproxy to production, so contributions are welcome (see wishlist for ideas).
  • For fault tolerant load balancing of ingress traffic, you need:
    1. Multiple hosts running load balancers
    2. Multiple A records for each hostname in a DNS service.

This module will not help with the second requirement (DNS).

Overview

Ingress

There are 2 ways to expose a service to ingress traffic in the current kubernetes service model:

  • Create a cloud load balancer.
  • Allocate a port (the same port) on every node in your cluster and proxy ingress traffic through that port to the endpoints.

The service-loadbalancer aims to give you the first option on bare metal, making the second unnecessary for the common case. The replication controller manifest in this directory creates a service-loadbalancer pod on all nodes with the role=loadbalancer label. Each service-loadbalancer pod contains:

  • A load balancer controller that watches the kubernetes api for services and endpoints.
  • A load balancer manifest. This is used to bootstrap the load balancer. The load balancer itself is pluggable, so you can easily swap
    haproxy for something like f5 or pound.
  • A template used to write load balancer rules. This is tied to the loadbalancer used in the manifest, since each one has a different config format.

L7 load balancing of HTTP services: The load balancer controller automatically exposes HTTP services to ingress traffic on all nodes with the role=loadbalancer label. It assumes all services are HTTP unless otherwise instructed. Each HTTP service gets a loadbalancer forwarding rule, such that requests received on http://loadbalancer-node/serviceName:port are balanced between its endpoints according to the algorithm specified in the loadbalancer.json manifest. For example, a service named nginxsvc serving port 80 is reachable at http://loadbalancer-node/nginxsvc. You do not need more than a single loadbalancer pod to balance across all your HTTP services (you can scale the rc to increase capacity).

L4 loadbalancing of TCP services: Since one needs to specify ports at pod creation time (kubernetes doesn't currently support port ranges), a single loadbalancer is tied to a set of preconfigured node ports, and hence a set of TCP services it can expose. The load balancer controller will dynamically add rules for each configured TCP service as it comes into existence. However, each "new" service (one not already listed in the tcpServices section of loadbalancer.json) requires you to open up a new container-host port pair for traffic. You can achieve this by creating a new loadbalancer pod with the targetPort set to the name of your service, and that service specified in the tcpServices map of the new loadbalancer, as sketched below.
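For illustration, here is a minimal sketch of the rc.yaml additions for a hypothetical TCP service named redis on port 6379 (the service name, the port pair, and the flag value are assumptions, not part of the checked-in manifest):

        ports:
        # redis (hypothetical): the host port must be free on every node
        # carrying the role=loadbalancer label
        - containerPort: 6379
          hostPort: 6379
          protocol: TCP
        args:
        - --tcp-services=redis:6379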

Cross-cluster loadbalancing

On cloud providers that offer a private ip range for all instances on a network, you can set up multiple clusters in different availability zones, on the same network, and loadbalance services across these zones. On GCE for example, every instance is a member of a single network. A network performs the same function that a router does: it defines the network range and gateway IP address, handles communication between instances, and serves as a gateway between instances and other networks. On such networks the endpoints of a service in one cluster are visible in all other clusters in the same network, so you can set up an edge loadbalancer that watches the kubernetes master of another cluster for services. Such a deployment allows you to fall back to a different AZ during times of duress or planned downtime (eg: database update).

Examples

Initial cluster state:

$ kubectl get svc --all-namespaces -o yaml  | grep -i "selfLink"
    selfLink: /api/v1/namespaces/default/services/kubernetes
    selfLink: /api/v1/namespaces/default/services/nginxsvc
    selfLink: /api/v1/namespaces/kube-system/services/elasticsearch-logging
    selfLink: /api/v1/namespaces/kube-system/services/kibana-logging
    selfLink: /api/v1/namespaces/kube-system/services/kube-dns
    selfLink: /api/v1/namespaces/kube-system/services/kube-ui
    selfLink: /api/v1/namespaces/kube-system/services/monitoring-grafana
    selfLink: /api/v1/namespaces/kube-system/services/monitoring-heapster
    selfLink: /api/v1/namespaces/kube-system/services/monitoring-influxdb

These are all the cluster addon services in namespace=kube-system, plus the kubernetes and nginxsvc services in the default namespace.

Create a loadbalancer

  • Loadbalancers are created via a ReplicationController.
  • Load balancers will only run on nodes with the role=loadbalancer label.
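This is enforced with a nodeSelector in the pod template; the relevant stanza from rc.yaml (it also appears in the full manifest later in this document) is:

      nodeSelector:
        role: loadbalancer
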
$ kubectl create -f ./rc.yaml
replicationcontrollers/service-loadbalancer
$ kubectl get pods -l app=service-loadbalancer
NAME                         READY     STATUS    RESTARTS   AGE
service-loadbalancer-dapxv   0/2       Pending   0          1m
$ kubectl describe pods -l app=service-loadbalancer
Events:
  FirstSeen                                    From            Reason                  Message
  Tue, 21 Jul 2015 11:19:22 -0700              {scheduler }    failedScheduling        Failed for reason MatchNodeSelector and possibly others

Notice that the pod hasn't started because the scheduler is waiting for you to tell it which nodes to use as a load balancer.

$ kubectl label node e2e-test-beeps-minion-c9up role=loadbalancer
NAME                         LABELS                                                                STATUS
e2e-test-beeps-minion-c9up   kubernetes.io/hostname=e2e-test-beeps-minion-c9up,role=loadbalancer   Ready

Expose services

Let's create 3 services (HTTP, HTTPS and TCP) to test the loadbalancer.

HTTP

You can use the https-nginx example to create some new HTTP/HTTPS services.

$ cd ../../examples/https-nginx
$ make keys secret KEY=/tmp/nginx.key CERT=/tmp/nginx.crt SECRET=/tmp/secret.json
$ kubectl create -f /tmp/secret.json
$ kubectl get secrets
NAME                  TYPE                                  DATA
default-token-vklfs   kubernetes.io/service-account-token   2
nginxsecret           Opaque                                2

Let's introduce a small twist. The nginx-app example exposes the nginx service using NodePort, which means it opens up a random port on every node in your cluster and exposes the service on that port. Delete the type: NodePort line before creating it; a sketch of the result follows.
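For reference, here is a sketch of what the nginxsvc service looks like once the NodePort line is removed (field values follow the https-nginx example and the kubectl output below; your copy may differ slightly):

apiVersion: v1
kind: Service
metadata:
  name: nginxsvc
  labels:
    app: nginx
spec:
  # the "type: NodePort" line was deleted here, so the service-loadbalancer
  # handles ingress instead of a random port on every node
  ports:
  - port: 80
    protocol: TCP
    name: http
  - port: 443
    protocol: TCP
    name: https
  selector:
    app: nginx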

$ kubectl create -f nginx-app.yaml
$ kubectl get svc
NAME         LABELS                                    SELECTOR    IP(S)         PORT(S)
kubernetes   component=apiserver,provider=kubernetes   <none>      10.0.0.1      443/TCP
nginxsvc     app=nginx                                 app=nginx   10.0.79.131   80/TCP
                                                                                 443/TCP
$ curl http://104.197.63.17/nginxsvc

HTTPS

HTTPS services are handled at L4, i.e. the loadbalancer forwards encrypted packets to the service without terminating SSL (see wishlist):

$ curl https://104.197.63.17:8080 -k

A couple of points to note:

  • The nginxsvc is specified in the tcpServices map of the loadbalancer.json manifest.
  • The https service is accessible directly on the host port declared for it in rc.yaml (8080 here, forwarding to the service's 443 port; see the stanza below).
  • You need to ensure there is no collision between these service host ports on the node.
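The relevant port mapping from rc.yaml (the same stanza appears in the cross-cluster manifest later in this document):

        # nginx https
        - containerPort: 443
          hostPort: 8080
          protocol: TCP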

SSL Termination

To terminate SSL for a service you just need to annotate the service with serviceloadbalancer/lb.sslTerm: "true" as seen below. This will cause your service to be served behind /{service-name} or /{service-name}:{port} if not running on port 80. This mimics the standard http functionality.

metadata:
  name: myservice
  annotations:
    serviceloadbalancer/lb.sslTerm: "true"
  labels:
  • Create a secret with one of your favorite tools

  • Add secrets to your loadbalancer pod

        ports:
        # All http services
        - containerPort: 80
          hostPort: 80
          protocol: TCP
        # ssl term
        - containerPort: 443
          hostPort: 443
          protocol: TCP
        # haproxy stats
        - containerPort: 1936
          hostPort: 1936
          protocol: TCP
        resources: {}
        volumeMounts:
        - mountPath: "/ssl"
          name: secret-volume
    volumes:
    - name: secret-volume
      secret:
        secretName: my-secret
       
  • Add your SSL configuration to the loadbalancer pod

      args:
      - --ssl-cert=/ssl/crt.pem
      - --ssl-ca-cert=/ssl/ca.crt
      - --namespace=default
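Putting the pieces together, a minimal sketch of a fully annotated service (the name and port are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: myservice
  annotations:
    # ask the loadbalancer to terminate SSL for this service
    serviceloadbalancer/lb.sslTerm: "true"
spec:
  ports:
  - port: 80
  selector:
    app: myservice

With the secret mounted and the --ssl-cert flags in place, the service should be reachable at https://loadbalancer-node/myservice.
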
Custom ACL

Adding the serviceloadbalancer/lb.aclMatch annotation lets you serve the service on a specific path, although URLs will not be rewritten back to root. The following will make your service available at /test, and your web service will receive URLs with the /test prefix intact (so a request for https://loadbalancer-node/test/foo arrives at the backend as /test/foo):

metadata:
  name: myservice
  annotations:
    serviceloadbalancer/lb.sslTerm: "true"
    serviceloadbalancer/lb.aclMatch: "-i /test"
  labels:

TCP

$ cat mysql-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    name: mysql
spec:
  containers:
  - image: mysql
    name: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      # Use secrets instead of env for passwords
      value: password
    ports:
    - containerPort: 3306
      name: mysql
    volumeMounts:
    # name must match the volume name below
    - name: mysql-storage
      # mount path within the container
      mountPath: /var/lib/mysql
  volumes:
  - name: mysql-storage
    emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: mysql
  name: mysql
spec:
  type: NodePort
  ports:
    # the port that this service should serve on
    - port: 3306
  # label keys and values that must match in order to receive traffic for this service
  selector:
    name: mysql

We'll create the service and access mysql from outside the cluster:

$ kubectl create -f mysql-app.yaml
$ kubectl get svc
NAME         LABELS                                    SELECTOR    IP(S)         PORT(S)
kubernetes   component=apiserver,provider=kubernetes   <none>      10.0.0.1      443/TCP
nginxsvc     app=nginx                                 app=nginx   10.0.79.131   80/TCP
                                                                                 443/TCP
mysql        name=mysql                                name=mysql  10.0.63.72    3306/TCP

$ mysql -u root -ppassword --host 104.197.63.17 --port 3306 -e 'show databases;'
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
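Reaching mysql through the loadbalancer node works because mysql:3306 is listed in the loadbalancer's tcpServices and rc.yaml declares a matching container-host port pair; the relevant stanzas (copied from the manifest later in this document) are:

        # mysql
        - containerPort: 3306
          hostPort: 3306
          protocol: TCP
        args:
        - --tcp-services=mysql:3306,nginxsvc:443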

Cross-namespace loadbalancing

By default, the loadbalancer only listens for services in the default namespace. You can list available namespaces via:

$ kubectl get namespaces
NAME          LABELS    STATUS    AGE
default       <none>    Active    1d
kube-system   <none>    Active    1d

You can tell it to expose services on a different namespace through a command line argument. Currently, each namespace needs a different loadbalancer (see wishlist). Modify the rc.yaml file to supply the namespace argument by adding the following lines to the bottom of the loadbalancer spec:

args:
  - --tcp-services=mysql:3306,nginxsvc:443
  - --namespace=kube-system
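In context, these args belong under the haproxy container in rc.yaml; a minimal sketch (image and container name copied from the manifest shown later in this document):

      containers:
      - image: gcr.io/google_containers/servicelb:0.4
        name: haproxy
        args:
        - --tcp-services=mysql:3306,nginxsvc:443
        - --namespace=kube-system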

Though the loadbalancer can watch services across namespaces, you can't start 2 loadbalancers with the same name in a single namespace. So if you already have a loadbalancer running, either change the name of the rc, or change the namespace in rc.yaml:

$ kubectl create -f rc.yaml
$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       NODE
service-loadbalancer-yofyv   1/1       Running   0          1m        e2e-test-beeps-minion-c9up

$ kubectl get nodes e2e-test-beeps-minion-c9up -o json | grep -i externalip -A 1
                "type": "ExternalIP",
                "address": "104.197.63.17"
$ curl http://104.197.63.17/kube-ui

Cross-cluster loadbalancing

First set up your 2 clusters, and a kubeconfig secret as described in the [sharing clusters example](../../examples/sharing-clusters/README.md). We will create a loadbalancer in our first cluster (US) and have it publish the services from the second cluster (EU). This is the entire modified loadbalancer manifest:

apiVersion: v1
kind: ReplicationController
metadata:
  name: service-loadbalancer
  labels:
    app: service-loadbalancer
    version: v1
spec:
  replicas: 1
  selector:
    app: service-loadbalancer
    version: v1
  template:
    metadata:
      labels:
        app: service-loadbalancer
        version: v1
    spec:
      volumes:
      # token from the eu cluster, must already exist
      # and match the name of the volume used in the container
      - name: eu-config
        secret:
          secretName: kubeconfig
      nodeSelector:
        role: loadbalancer
      containers:
      - image: gcr.io/google_containers/servicelb:0.4
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        name: haproxy
        ports:
        # All http services
        - containerPort: 80
          hostPort: 80
          protocol: TCP
        # nginx https
        - containerPort: 443
          hostPort: 8080
          protocol: TCP
        # mysql
        - containerPort: 3306
          hostPort: 3306
          protocol: TCP
        # haproxy stats
        - containerPort: 1936
          hostPort: 1936
          protocol: TCP
        resources: {}
        args:
        - --tcp-services=mysql:3306,nginxsvc:443
        - --use-kubernetes-cluster-service=false
        # use-kubernetes-cluster-service=false in conjunction with the
        # kube/config will force the service-loadbalancer to watch for
        # services from the eu cluster.
        volumeMounts:
        - mountPath: /.kube
          name: eu-config
        env:
        - name: KUBECONFIG
          value: /.kube/config

Note that it is essentially the same as the rc.yaml checked into the service-loadbalancer directory, except that it consumes the kubeconfig secret and points the KUBECONFIG environment variable at it.

$ kubectl config use-context <us-clustername>
$ kubectl create -f rc.yaml
$ kubectl get pods -o wide
service-loadbalancer-5o2p4   1/1       Running   0          13m       kubernetes-minion-5jtd
$ kubectl get node kubernetes-minion-5jtd -o json | grep -i externalip -A 2
                "type": "ExternalIP",
                "address": "104.197.81.116"
$ curl http://104.197.81.116/nginxsvc
Europe

Advanced features

Troubleshooting:

  • If you can curl or netcat the endpoint from the pod (with kubectl exec) but not from the node, you have not specified hostPort and containerPort.
  • If you can hit the ips from the node but not from your machine outside the cluster, you have not opened firewall rules for the right network.
  • If you can't hit the ips from within the container, either haproxy or the service_loadbalancer process is not running. In that case:
    1. Use ps in the pod
    2. sudo restart haproxy in the pod
    3. cat /etc/haproxy/haproxy.cfg in the pod
    4. try kubectl logs haproxy
    5. run the service_loadbalancer with --dry
  • Check http://<node_ip>:1936 for the stats page. It requires the password used in the template file.
  • Try talking to haproxy on the stats socket directly on the container using kubectl exec, eg: echo "show info" | socat unix-connect:/tmp/haproxy stdio
  • Run the service_loadbalancer with the --syslog flag to append the haproxy log to the pod's stdout, then use kubectl logs to check the status of the services or stats about the traffic.

Wishlist:

  • Allow services to specify their url routes (see openshift routes)

  • Scrape :1936 and scale replica count of the loadbalancer rc from a helper pod (this is basically ELB)

  • Scrape :1936/;csv and autoscale services

  • Better https support. 3 options to handle ssl:

    1. Pass Through: Load balancer drops down to L4 balancing and forwards TCP encrypted packets to destination.
    2. Redirect: All traffic is https. HTTP connections are encrypted using load balancer certs.
    3. Terminate: The load balancer terminates SSL and forwards plain http to the backends.

    Currently you need to trigger TCP loadbalancing (pass through) for your https service by specifying it in loadbalancer.json. Support for the other 2 would be nice.

  • Multinamespace support: Currently the controller only watches a single namespace for services.

  • Support for external services (eg: amazon rds)

  • Dynamically modify loadbalancer.json. Will become unnecessary when we have a loadbalancer resource.

  • Headless services: I just didn't think people would care enough about this.
