prosperent brian
07-08-2011, 02:44 PM
Infrastructure is probably one of the most difficult topics to find information on, so hopefully I can share some of my findings and help someone else down the road with the same issues.
Right now, prosperent has 13 dual quad core xeon "application servers" sitting behind a software haproxy loadbalancer on a dedicated server, then backed with a large "image" storage array, and finally a large mysql database server.
Our application servers connect to the large image array via nfs mounts which serve our source code from a single point. This is great for deployment, but gives us a single point of failure (downtime of the image server node).
MySQL used to be another point of issue. We had a single MySQL server with a large raid array for redundancy. We currently have two large MySQL servers in a master/slave configuration so there are always two live copies of all of our data. On top of that, the master server is backed up every 3 hours and the slave hourly. This gives us expansion for reads, but we would have to shard the database to get more write capacity.
This is where cassandra comes in. 99 percent of our write load is analytics. These queries can be pushed to a cassandra cluster that can scale linearly for writes and also handle multi datacenter replication which will be important down the road.
NOW, the changes. The software load balancer (haproxy) that we use now works fine, but a single point of failure doesn't make me happy. It's easy enough to divert traffic to our second load balancer, but there are other issues deeper in the system, so the new infrastructure will keep a global layer 7 hardware load balancer in front of everything. These typically have 5 9's of uptime, so we're looking good already. Sitting behind the hardware load balancer will now be multiple complete clusters in multiple datacenters. Each arm of the cluster will have a haproxy dedicated load balancer at the very front taking requests from the hardware load balancer. The haproxy nodes can then monitor all of the servers under them and shift traffic to different datacenters/clusters based on network latency, and server failure. If an entire node fails, we know at a very high level and can reroute all traffic to the other clusters with no downtime.
Each of these clusters will be complete with application servers, an image server for images, and our nfs mounts, and of course a load balancer to make sure it is all up and running.
Behind the clusters are several large mysql servers in multi master configuration. While you can only write to one node at a time, you can run selects across all of them, and further, if our primary mysql node goes down, we can failover to one of the slaves in under 1 second.
The cassandra clusters will all be built per datacenter and configured to replicate data within the datacenter as well as to each other datacenter. Again, no single point of failure.
Now that we have this mapped out, we start the migration. Luckily, all of the big pieces are already in place. All that is left is setting up the second "image" server which is underway now, then we configure the second loadbalancer and switch dns over to our hardware load balancer. The rest is already in place and running, and the last bits can be completed with zero downtime.
What this means to you? Even if an entire arm of our cluster goes down, we still have capacity left in the other arms to serve up requests with minimal or zero downtime.
Alright, back to linux shell to make this all happen :)
Right now, prosperent has 13 dual quad core xeon "application servers" sitting behind a software haproxy loadbalancer on a dedicated server, then backed with a large "image" storage array, and finally a large mysql database server.
Our application servers connect to the large image array via nfs mounts which serve our source code from a single point. This is great for deployment, but gives us a single point of failure (downtime of the image server node).
MySQL used to be another point of issue. We had a single MySQL server with a large raid array for redundancy. We currently have two large MySQL servers in a master/slave configuration so there are always two live copies of all of our data. On top of that, the master server is backed up every 3 hours and the slave hourly. This gives us expansion for reads, but we would have to shard the database to get more write capacity.
This is where cassandra comes in. 99 percent of our write load is analytics. These queries can be pushed to a cassandra cluster that can scale linearly for writes and also handle multi datacenter replication which will be important down the road.
NOW, the changes. The software load balancer (haproxy) that we use now works fine, but a single point of failure doesn't make me happy. It's easy enough to divert traffic to our second load balancer, but there are other issues deeper in the system, so the new infrastructure will keep a global layer 7 hardware load balancer in front of everything. These typically have 5 9's of uptime, so we're looking good already. Sitting behind the hardware load balancer will now be multiple complete clusters in multiple datacenters. Each arm of the cluster will have a haproxy dedicated load balancer at the very front taking requests from the hardware load balancer. The haproxy nodes can then monitor all of the servers under them and shift traffic to different datacenters/clusters based on network latency, and server failure. If an entire node fails, we know at a very high level and can reroute all traffic to the other clusters with no downtime.
Each of these clusters will be complete with application servers, an image server for images, and our nfs mounts, and of course a load balancer to make sure it is all up and running.
Behind the clusters are several large mysql servers in multi master configuration. While you can only write to one node at a time, you can run selects across all of them, and further, if our primary mysql node goes down, we can failover to one of the slaves in under 1 second.
The cassandra clusters will all be built per datacenter and configured to replicate data within the datacenter as well as to each other datacenter. Again, no single point of failure.
Now that we have this mapped out, we start the migration. Luckily, all of the big pieces are already in place. All that is left is setting up the second "image" server which is underway now, then we configure the second loadbalancer and switch dns over to our hardware load balancer. The rest is already in place and running, and the last bits can be completed with zero downtime.
What this means to you? Even if an entire arm of our cluster goes down, we still have capacity left in the other arms to serve up requests with minimal or zero downtime.
Alright, back to linux shell to make this all happen :)