High Availability @ Load Balancing Layer: HAProxy / ELB

Architecting high availability at the load balancing layer is one of the important aspects of web scale systems on AWS. There are multiple strategies for achieving it; I am listing some of the designs below.

Pattern 1: Route 53 DNS RR + HAProxy

Route53 is a Managed DNS service provided by Amazon Web Services. Route 53 supports Round robin and weighted algorithms. If the Route53 DNS server has several entries for a given hostname, it will return all of them in a rotating order. This way, various users will see different addresses for the same name and will be able to reach different EC2 instances in LB Tier.

$ host -t a HAProxyTestXYZ.com
HAProxyTestXYZ.com. has address 50.19.82.183 (Primary EIP)

HAProxyTestXYZ.com. has address 23.23.174.254 (Secondary EIP)
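
For illustration, this is roughly how both Elastic IPs could be attached to the same record as a simple multi-value A record set. This sketch assumes the unified AWS CLI is available; the hosted zone ID Z1EXAMPLE is a placeholder:

# Hypothetical sketch: one A record set returning both HAProxy EIPs (zone ID is a placeholder)
aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE --change-batch '{
  "Changes": [{"Action": "CREATE", "ResourceRecordSet": {
    "Name": "HAProxyTestXYZ.com.", "Type": "A", "TTL": 60,
    "ResourceRecords": [{"Value": "50.19.82.183"}, {"Value": "23.23.174.254"}]
  }}]
}'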


For example, if we attach the Elastic IPs of two HAProxy EC2 instances to a record in Route 53, both IPs are returned to the users' browsers by the Route 53 DNS. With round robin configured at the Route 53 level, browser-1 will get EIP-1 (50.19.82.183) of HAProxy-1 as the primary IP and browser-2 will get EIP-2 (23.23.174.254) of HAProxy-2 as the primary IP, on a rotating basis. Browser-1 will contact HAProxy-1, and if HAProxy-1 is not reachable it will contact the secondary EIP, which is HAProxy-2, and so forth. This is an age-old technique generally used by search engines, content servers and other web scale systems for achieving scalability at the LB layer. However, this method by itself does not provide high availability at the LB layer: it requires additional measures to continuously check the status of the HAProxy EC2 LB instances and switch the EIP of a failed instance to another HAProxy EC2 LB. For this reason, this pattern is generally used as a complementary part of a high availability solution, not as the primary one.

For better stability at this layer in AWS, I usually recommend having 2 or more HAProxies distributed across multiple AZs inside the Amazon EC2 region. This way, if one of the HAProxies is down, the website still functions with the help of the other HAProxies, and even if an entire Amazon EC2 AZ is down, the HAProxies in the other AZ can still handle the requests and keep the website active. Some load tests have shown that HAProxy on an m1.large EC2 instance can handle close to ~4500+ HTTP requests/second, so depending upon the number of concurrent requests/sec your application needs, you can attach multiple HAProxy EC2 instances to Route 53.

Now that we have achieved availability horizontally using Route 53 DNS round robin at the HAProxy layer, let us look at the intricacies behind this architecture. Since we now have 2 or more HAProxies, what happens to the contextual web session data that resides in the application servers? The HAProxies need to know in which application server the session data of a user resides, otherwise requests will fail authorization.
There are 2 architecture designs we can follow to solve this contextual problem:
Stateless Application design: This is the recommended and widely used design. The web session data is separated out from the web/app server memory and kept in a common cache store such as MemCacheD, Terracotta etc. Since the session data now lives in a common store, the HAProxies can direct requests to any of the web/app servers attached under them without knowing where the session state is mapped. Whenever a web/app server receives a request from any of the HAProxies, it validates and authorizes the session data against the common store. In the event of any HAProxy or web/app EC2 failure, the website still functions without problems because the other HAProxies and web/app servers can handle the subsequent requests. Thus we achieve availability and scalability at the HAProxy/load balancing layer with this model.
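As a concrete (purely illustrative) sketch of externalizing sessions, a PHP stack on Ubuntu could point its session handler at a shared MemCacheD node. The package names, the php.ini path and the 10.0.0.50 address are assumptions, and other stacks (Tomcat, Django etc.) have their own equivalents:

# Hypothetical sketch for a PHP web/app server; addresses and paths are placeholders.
sudo apt-get install -y memcached php5-memcached
# Point PHP sessions at the shared MemCacheD node instead of local files,
# so any web/app server behind any HAProxy can validate the same session.
sudo tee -a /etc/php5/apache2/php.ini <<'EOF'
session.save_handler = memcached
session.save_path = "10.0.0.50:11211"
EOF
sudo service apache2 restart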
Sticky Application design: Things are usually not as ideal in the real world as we assume. Some applications are still designed to be stateful and store session data, cache data etc. in web/app server memory. We can always recommend that the application teams re-architect to a stateless model, but this suggestion does not always work because of short-term migrations, inter-dependencies etc. So as architects we need to find a way to live with this design and still achieve availability at the load balancing layer. HAProxy uses techniques called "cookie learning" and "cookie insertion" to support stateful applications. HAProxy can be configured to learn the application cookie (for example "JSESSIONID"): when HAProxy receives the user's request, it checks whether it contains this particular cookie with a known value. If not, it directs the request to any web/app EC2 server according to the configured load balancing algorithm. HAProxy then extracts the cookie value from the response and adds it, along with the server's identifier, to a local table. When the user's next request comes in, the load balancer sees the cookie, looks up the table and finds the web/app EC2 server to which it forwards the request. Let me detail this important flow a little:
The HAProxy-1 EC2 instance will receive the client's requests from the browser. If a request does not contain a cookie, it will be forwarded to a valid web/app EC2 instance, Apache-A. In return, if a JSESSIONID cookie is seen, the web/app EC2 instance name (for example "A") will be prefixed into it, followed by a delimiter ('~'), like "JSESSIONID=A~xxx". When the browser requests again with the cookie "JSESSIONID=A~xxx", HAProxy-1 knows that the request must be forwarded to web/app instance Apache-A. The instance name "A" is then stripped from the cookie before the request is sent on to Apache-A.
If web/app EC2 instance Apache-A dies, requests will be sent by the LB to another valid server, web/app EC2 instance Apache-B, and the cookie will be reassigned.
If HAProxy-1 itself dies, requests will be sent to HAProxy-2, which will identify the web/app EC2 instance to forward the request to. This way, even if subsequent requests move from HAProxy-1 to HAProxy-2 because of a HAProxy-1 failure, the requests are still sent to the same web/app instance Apache-A thanks to HAProxy's cookie learning/insertion mechanism.
Sample HAProxy settings to achieve this:
listen webfarm 192.168.1.1:80
       mode http
       balance roundrobin
       cookie JSESSIONID prefix
       option httpclose
       option forwardfor
       option httpchk HEAD /index.html HTTP/1.0
       server Apache-A 192.168.1.11:80 cookie A check
       server Apache-B 192.168.1.12:80 cookie B check
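
To see the cookie prefixing in action, you can inspect the response headers through HAProxy. The address matches the listen line above, the cookie value shown is illustrative, and this assumes the application actually issues a JSESSIONID cookie on this request:

$ curl -sI http://192.168.1.1/ | grep -i '^Set-Cookie'
Set-Cookie: JSESSIONID=A~2B7F...; path=/
# Replaying the cookie pins the next request to server Apache-A:
$ curl -s -o /dev/null -H 'Cookie: JSESSIONID=A~2B7F...' http://192.168.1.1/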

Note: You can also use more sophisticated DNS services like UltraDNS, DNSMadeEasy etc. in this architecture to better control load balancing and traffic direction at the DNS level.

Pattern 2: Route 53 DNS RR + HAProxy in Active-Passive mode

This is an extension of the Route53 DNS RR pattern and everything discussed in the previous pattern still applies to this context. In addition to associating HAProxies horizontally under Route53, we will build availability for every HAProxy vertically as well in this pattern. High Availability is built taking into consideration HAProxy process failure and HAProxy EC2 instance failure.
Two or more HAProxies from multiple AZs are launched and attached to Amazon Elastic IPs. These Elastic IPs are then associated in Route 53 with DNS RR. These HAProxies are now "Active" and ready to handle user requests. For HA, an equivalent set of HAProxies is launched in the respective AZs as "Standby". In the event of an "Active" HAProxy failure, the Standby HAProxy remaps the same Amazon Elastic IP to itself and takes over the subsequent requests from the clients.
[Diagram: two Active HAProxy instances holding Elastic IPs 50.19.82.183 and 23.23.174.254 across two Availability Zones, each with a Standby HAProxy in the same AZ]
In the above diagram, there are 2 HAProxies in "Active" state with Elastic IPs 50.19.82.183 and 23.23.174.254. They are deployed across multiple Availability Zones inside an Amazon EC2 region. Another 2 HAProxies are launched in the respective AZs, but they are kept idle in "Standby" state. In the event of a HAProxy-1 (EIP: 50.19.82.183) failure, the Elastic IP is remapped to Standby HAProxy-3 in the same AZ. The remapping takes ~60 seconds, and HAProxy-3 will handle the subsequent requests directed by the browsers to the 50.19.82.183 IP.
Broadly there are 2 levels of failure in this pattern:
Failure @ HAProxy Process level
Failure @ HAProxy EC2 instance level
Failure @ HAProxy process level: When the HAProxy process on the "Active" server fails, we can detect this using Keepalived and switch the Elastic IP from Active to Standby. We have observed that it takes ~60-120 seconds for the standby to take over; during this time only that particular HAProxy will be unreachable. Keepalived is configured on both the Active and Standby HAProxy EC2 instances. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage the load balanced server pool according to its health, and high availability is achieved through the Virtual Router Redundancy Protocol (VRRP). Since Amazon EC2 currently does not support multicast, we need to configure Keepalived with unicast in this scenario. For more details refer to http://www.keepalived.org/. In the meantime we can manually bring the failed HAProxy process back up and make it the new standby.
Script Name: /etc/keepalived/keepalived.conf
vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                    # 101 on master, 100 on backup
    vrrp_unicast_bind 10.215.31.4   # internal IP address of EC2 instance 01
    vrrp_unicast_peer 10.85.110.252 # internal IP address of EC2 instance 02
    notify_master "/etc/keepalived/vrrp.sh"
    track_script {
        chk_haproxy
    }
}
Script Name: /etc/keepalived/vrrp.sh
#!/bin/bash
# vrrp.sh: invoked by keepalived when this node becomes MASTER
cd /opt/aws/apitools/ec2/bin
# Disassociate the EIP from the failed instance.
./ec2-disassociate-address --aws-access-key XXXXXXX --aws-secret-key XXXXXXX [EIP]
# Map the EIP to the secondary server.
./ec2-associate-address --aws-access-key XXXXXXX --aws-secret-key XXXXXXX [EIP] -i [ec2_instance_id_of_primary_or_secondary]
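If you are on the newer unified AWS CLI instead of the legacy EC2 API tools, the same remap could look roughly like this. The allocation ID and instance ID are placeholders, and for VPC Elastic IPs --allow-reassociation moves the address without an explicit disassociate step:

#!/bin/bash
# Hypothetical equivalent using the unified AWS CLI (IDs are placeholders).
EIP_ALLOC_ID="eipalloc-xxxxxxxx"      # allocation ID of the Elastic IP
STANDBY_INSTANCE_ID="i-xxxxxxxx"      # instance that should take over
# Re-point the EIP at the standby HAProxy, even if it is still attached elsewhere.
aws ec2 associate-address --allocation-id "$EIP_ALLOC_ID" \
    --instance-id "$STANDBY_INSTANCE_ID" --allow-reassociation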
Failure @ HAProxy EC2 instance level: When the Active HAProxy EC2 instance itself fails, we can detect this using Heartbeat and switch the Elastic IP from Active to Standby. Heartbeat connects the two servers and checks the regular "pulse" or "heartbeat" between them; the standby server takes over the work of the "Active" one as soon as it detects an alteration in the heartbeat of the former. We have observed that it takes ~120+ seconds for the standby to take over; during this time only that particular HAProxy will be unreachable. Heartbeat has to be configured on both the Active and Standby HAProxy EC2 instances. Since Amazon EC2 currently does not support multicast, we need to configure Heartbeat with unicast UDP in this scenario. In the meantime we can manually bring the failed HAProxy EC2 instance back up and make it the new standby.
Script Name: /etc/ha.d/ha.cf
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth0 xx.xxx.xxx.xxa #Internal IP of EC2 instance 01
ucast eth0 xx.xxx.xxx.xxb #Internal IP of EC2 instance 02
auto_failback off
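Heartbeat also expects an /etc/ha.d/authkeys file (mode 600) on both nodes so the peers can authenticate each other; a minimal sketch, with an obviously made-up secret:

# Create /etc/ha.d/authkeys on both servers and restrict its permissions.
sudo tee /etc/ha.d/authkeys > /dev/null <<'EOF'
auth 1
1 sha1 ReplaceWithASharedSecret
EOF
sudo chmod 600 /etc/ha.d/authkeys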
Script Name: elastic_ip (create this script on both servers; the instance ID differs per server)
#!/bin/bash
I_ID="[ec2_instance_id]"    # different for each EC2 server
ELASTIC_IP="X.X.X.X"
case $1 in
    start)
        ec2-associate-address --aws-access-key XXXXX --aws-secret-key XXXXX "$ELASTIC_IP" -i "$I_ID" > /dev/null
        echo $0 started
        ;;
    stop)
        ec2-disassociate-address --aws-access-key XXXXX --aws-secret-key XXXXX "$ELASTIC_IP" > /dev/null
        echo $0 stopped
        ;;
    status)
        ec2-describe-addresses --aws-access-key XXXXX --aws-secret-key XXXXX | grep "$ELASTIC_IP" | grep "$I_ID" > /dev/null
        # grep returns true if this IP is mapped to this instance
        [ $? -eq 0 ] && echo $0 OK || echo $0 FAIL
        ;;
esac
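For Heartbeat to actually invoke this script on failover, it typically lives in /etc/ha.d/resource.d/ and is listed in /etc/ha.d/haresources on both nodes; the node name below is an assumption and must match the output of uname -n on the primary:

# Hypothetical wiring of the elastic_ip resource into Heartbeat (node name is a placeholder).
sudo cp elastic_ip /etc/ha.d/resource.d/elastic_ip
sudo chmod +x /etc/ha.d/resource.d/elastic_ip
echo "haproxy-primary elastic_ip" | sudo tee /etc/ha.d/haresources
sudo /etc/init.d/heartbeat restart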
Why do we need this redundancy in the HAProxy layer?
DNS RR with LB cookie insertion alone is not always enough to ensure availability:
Case 1: Imagine you have not automated scaling at the load balancing layer and one of your load balancers goes down. You do not want to be woken up in the middle of the night; it is better to have a standby load balancer automatically replace the failed one, and you can replace the faulty LB manually the next day.
Case 2: You have a gaming site where long running TCP sockets are established from Flash gaming clients to the LB layer, and you have planned the capacity of the front-end load balancers in terms of concurrent connections/sec. Now a couple of your load balancers go down; new connections will be established to the other running LBs, but overall your site will start performing poorly, and chances are connections will be exhausted after a few hours of heavy traffic. It is better to automatically detect and replace the faulty LB EC2 instance with the standby.
Case 3: Some clients cache the IP address of the load balancer, some have long running sticky sessions with the web/app tier, and some hardware devices can only take an IP address to push data into the server infrastructure. Though it is suggested to resolve the IP using DNS, in reality some use cases do not work that way.
Pattern 3: Use ELB
Do not worry about all of the above patterns; just configure Amazon Elastic Load Balancing (ELB). For most use cases, ELB is more than sufficient.
Amazon Elastic Load Balancer can distribute incoming traffic across your Amazon EC2 instances in a single Availability Zone or multiple Availability Zones. Amazon Elastic Load Balancing automatically scales its request handling capacity in response to incoming application traffic. It can handle 20k+ concurrent requests/sec with ease. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored. Any faulty Load balancers in the ELB tier are automatically replaced.
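For completeness, a rough sketch of standing up a classic ELB with the AWS CLI; the name, Availability Zones, health check target and instance IDs below are placeholders:

# Hypothetical classic ELB setup (name, AZs, health check target and instance IDs are placeholders).
aws elb create-load-balancer --load-balancer-name web-elb \
    --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
    --availability-zones us-east-1a us-east-1b
aws elb configure-health-check --load-balancer-name web-elb \
    --health-check Target=HTTP:80/index.html,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
aws elb register-instances-with-load-balancer --load-balancer-name web-elb \
    --instances i-aaaaaaaa i-bbbbbbbb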
Though ELB is more than sufficient for most common use cases on AWS, there are some unique cases which demand the use of load balancers like HAProxy, Nginx or NetScaler in our architecture on the AWS infrastructure.

Setting up keepalived: Load balancing using HAProxy Part 2

In our previous post we set up an HAProxy load balancer to balance the load of our web application between three webservers; here's the diagram of the situation we ended up with:

[Diagram: one HAProxy load balancer in front of WebServer1, WebServer2 and WebServer3]

As we already concluded in the last post, there's still a single point of failure in this setup: if the load balancer dies for some reason, the whole site will be offline. In this post we will add a second load balancer and set up a virtual IP address shared between the load balancers. The setup will look like this:

[Diagram: two load balancers sharing a virtual IP in front of the three webservers]

So our setup now is:
– Three webservers, WebServer1 (192.168.0.1), WebServer2 (192.168.0.2), and WebServer3 (192.168.0.3), each serving the application
– The first load balancer (loadb01, IP: 192.168.0.100)
– The second load balancer (loadb02, IP: 192.168.0.101); configure this one in the same way as we configured the first.

To set up the virtual IP address we will use keepalived:

loadb01$ sudo apt-get install keepalived

Good, keepalived is now installed. Before we proceed with configuring keepalived itself, edit the following file:

loadb01$ sudo vi /etc/sysctl.conf

And add this line to the end of the file:

net.ipv4.ip_nonlocal_bind=1

This option is needed for applications (haproxy in this case) to be able to bind to non-local addresses (IP addresses which do not belong to an interface on the machine). To apply the setting, run the following command:

loadb01$ sudo sysctl -p

Now let’s add the configuration for keepalived, open the file:

loadb01$ sudo vi /etc/keepalived/keepalived.conf

And add the following contents

# Settings for notifications
global_defs {
    notification_email {
        your@emailaddress.com     # Email address for notifications
    }
    notification_email_from loadb01@domain.ext  # The from address for the notifications
    smtp_server 127.0.0.1     # You can specify your own smtp server here
    smtp_connect_timeout 15
}
  
# Define the script used to check if haproxy is still working
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}
  
# Configuration for the virtual interface
vrrp_instance VI_1 {
    interface eth0
    state MASTER        # set this to BACKUP on the other machine
    priority 101        # set this to 100 on the other machine
    virtual_router_id 51
  
    smtp_alert          # Activate email notifications
  
    authentication {
        auth_type AH
        auth_pass myPassw0rd      # Set this to some secret phrase
    }
  
    # The virtual ip address shared between the two loadbalancers
    virtual_ipaddress {
        192.168.0.200
    }
    
    # Use the script above to check if we should fail over
    track_script {
        chk_haproxy
    }
}

And start keepalived:

loadb01$ sudo /etc/init.d/keepalived start
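
Once keepalived is running you can verify that the MASTER actually holds the virtual IP (the exact output format may vary slightly):

loadb01$ ip addr show eth0 | grep 192.168.0.200
    inet 192.168.0.200/32 scope global eth0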

Now the next step is to install and configure keepalived on our second load balancer as well; redo the steps starting from apt-get install keepalived. In the configuration step for keepalived, be sure to change these two settings:

state MASTER        # set this to BACKUP on the other machine
priority 101        # set this to 100 on the other machine

To:

state BACKUP     
priority 100     

That’s it! We have now configured a virtual IP shared between our two load balancers. You can try loading the HAProxy statistics page on the virtual IP address and you should get the statistics for loadb01; then switch off loadb01 and refresh, and the virtual IP address will be taken over by the second load balancer, so you should see the statistics page for that one.
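
A simple way to watch the failover from a client machine is to poll the virtual IP while you stop HAProxy (or keepalived) on loadb01; this is just an illustrative loop:

# Poll the virtual IP once a second; stop haproxy on loadb01 and watch the responses continue.
while true; do
    curl -s -o /dev/null -w "%{http_code}\n" http://192.168.0.200/
    sleep 1
done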

If there’s anything else you’d like us to cover, or if you have any questions please leave a comment!


High availability load balancing using HAProxy PART-1

HAProxy is a free, very fast and reliable solution offering high availability, load balancing and proxying for TCP and HTTP-based applications. It is particularly suited for web sites crawling under very high loads while needing persistence or Layer 7 processing. Supporting tens of thousands of connections is clearly realistic with today's hardware. Its mode of operation makes its integration into existing architectures very easy and riskless, while still offering the possibility not to expose fragile web servers to the Net, such as below:

[Diagram: HAProxy proxying client traffic to the backend web servers]

In this post I will show you how to easily setup load balancing for your web application. Imagine you currently have your application on one webserver called Webserver1.

[Diagram: the application running on a single webserver, WebServer1]

But traffic has grown and you'd like to increase your site's capacity by adding more webservers (WebServer2 and WebServer3), as well as eliminate the single point of failure in your current setup (if WebServer1 has an outage the site will be offline).


In order to spread traffic evenly over your three webservers, we can install an extra server to proxy all the traffic and balance it over the webservers. In this post we will use HAProxy, an open source TCP/HTTP load balancer (see: http://haproxy.1wt.eu/), to do that:

[Diagram: HAProxy load balancer in front of WebServer1, WebServer2 and WebServer3]

So our setup now is:
– Three webservers, WebServer1 (192.168.0.1), WebServer2 (192.168.0.2), and WebServer3 (192.168.0.3), each serving the application
– A new server (loadb01, IP: 192.168.0.100) with Ubuntu installed.

Alright, now let's get to work.

Start by installing haproxy on your loadbalancing machine:

loadb01$ sudo apt-get install haproxy
Now let’s backup the original haproxy configuration file and create a new one with our config which will tell haproxy to listen for incoming http requests on port 80 and balance them between the three webservers:
loadb01$ sudo mv /etc/haproxy/haproxy.cfg /etc/haproxy/backup_haproxy.cfg
loadb01$ sudo vi /etc/haproxy/haproxy.cfg
Paste the following configuration there:
global
        maxconn 4096
        user haproxy
        group haproxy
        daemon
 
defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option  redispatch
        maxconn 2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000
 
listen webcluster *:80
        mode    http
        stats   enable
        stats   auth us3r:passw0rd
        balance roundrobin
        option httpchk HEAD / HTTP/1.0
        option forwardfor
        cookie LSW_WEB insert
        option httpclose
        server web01 192.168.0.1:80 cookie LSW_WEB01 check
        server web02 192.168.0.2:80 cookie LSW_WEB02 check
        server web03 192.168.0.3:80 cookie LSW_WEB03 check
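
Before starting HAProxy it is worth validating the file; the -c flag only checks the configuration syntax and exits:

loadb01$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg
Configuration file is valid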
Enable HAProxy by editing the /etc/default/haproxy file:
loadb01$ sudo nano /etc/default/haproxy
and setting ENABLED to 1
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

Then, start HAProxy:

loadb01$ sudo /etc/init.d/haproxy start

Now open your web browser and browse to http://192.168.0.100/ (or whatever IP you have set for loadb01); you should be served a page from one of the webservers! The load balancing is now working, but let's take a closer look at some of the things we configured in the HAProxy configuration:

listen webcluster *:80

Listen for incoming connections on all interfaces, port 80 (the * can also be replaced with a single IP address).

stats   enable
stats   auth us3r:passw0rd

This enables HAProxy's statistics interface, which you can access by browsing to http://192.168.0.100/haproxy?stats. Log in with the username and password given and you should see a nice statistics report like this:

[Screenshot: HAProxy statistics report]

The cookie LSW_WEB insert line in the configuration enables the use of cookies: when a user reaches the webcluster group, the cookie LSW_WEB will be created and the server id (LSW_WEB01, LSW_WEB02, LSW_WEB03) will be stored in it. For all subsequent requests in the same session, HAProxy will look at the cookie and direct that user to the same webserver (unless it's down).

The last three server lines define the backend webservers which HAProxy will use; you can easily add more lines here as the infrastructure grows.
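
You can see the persistence cookie being set by inspecting the response headers through the load balancer; which server id you get back depends on the backend that answered:

$ curl -sI http://192.168.0.100/ | grep -i '^Set-Cookie'
Set-Cookie: LSW_WEB=LSW_WEB01; path=/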

Alright, the load balancing is working and we are almost there; there's just one thing left to do in this article, and that's fixing your webserver logs on the web01/web02/web03 servers. Requests have now changed from:

user --> webserver

To:

user --> HAProxy --> webserver

You will see the load balancer's IP in the access log on your webservers. To fix this when you are using the Apache webserver, open your /etc/apache2/apache2.conf file and replace this line:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

with:

#LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Then restart/reload Apache and the logging should be fixed; it will now include the IP address sent in the X-Forwarded-For header (this header contains a value representing the client's IP address), which HAProxy adds to all requests to the backend webserver. We enabled that earlier by setting the

option forwardfor

option in the HAProxy configuration.
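
To confirm the change, make a request through the load balancer and then check the access log on the backend that served it; the log path assumes a default Ubuntu Apache layout:

$ curl -s -o /dev/null http://192.168.0.100/
web01$ tail -n 1 /var/log/apache2/access.log
# The first field should now be the client's IP, not 192.168.0.100.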

If there’s anything else you’d like to cover, or if you have any questions please leave a comment!