Upgrading a Cloud Infrastructure With Terraform and Ansible
Upgrading and expanding instances and volumes with Terraform and Ansible in an OpenStack Cloud environment
Upgrading a running cloud infrastructure is a critical task that has to be planned carefully in advance. Before choosing which strategy to follow for the upgrade, we have to ask ourselves some questions:
- Is it okay for us to have some downtime in our service? Can we stop our services, and for how long?
- Is there any data involved that needs to be saved and restored?
- Do we really need to upgrade our infrastructure?
- Is there any security risk in doing so?
- Do we have the necessary resources?
In this article we will deal with the scenario where we can afford some downtime and there is data that needs to be saved and restored.
In our scenario we start with 11 instances running in an OpenStack cloud provider, with the following configuration:
- 3 x m1.medium instances (1 vCPU + 4 GB RAM)
- 6 x m1.large instances (2 vCPUs + 8 GB RAM)
- 2 x m1.xlarge instances (4 vCPUs + 16 GB RAM)
- 6 x 80 GB disks
We have a Docker Swarm platform running on all instances, with services that run real-time processes, 3 Apache Cassandra nodes, 3 Blazegraph nodes, 3 Zookeeper nodes, 3 Kafka nodes, a MongoDB instance and other services.
We want to upgrade our infrastructure to:
- 4 x m1.medium instances (1 vCPU + 4 GB RAM)
- 9 x m1.large instances (2 vCPUs + 8 GB RAM)
- 4 x m1.xlarge instances (4 vCPUs + 16 GB RAM)
- 3 x 3 TB disks
- 1 x 11 TB disk
The Apache Cassandra and Blazegraph data need to be migrated to the new disks, and all services will have to start running again with the minimum downtime possible.
To do so, we will use Terraform to modify the previous configuration with the new requirements and create volume snapshots for migrating our data. Then, with Ansible, we will mount the new volumes with the migrated data and install the Docker Swarm platform again.
Creating volume snapshots
First, we need to back up our data. To do so, we can either use an external service such as Google Cloud Storage or use OpenStack's volume snapshots. In any case, my advice is to stop the services running on the instances that hold the data, to avoid any data corruption during the process.
In our case, we are going to create volume snapshots with OpenStack and note down the snapshot ID, which will be needed later in the Terraform configuration.
As our platform uses Swarm to orchestrate Docker containers, we don't have an option to stop a running container as we do with docker-compose or stand-alone containers. Thus, we will have to remove the running service to stop it:
docker service rm <service-name>
Removing a service may take some time, so it is important to verify that the manager has removed it; once the service is gone, the inspect command reports that it no longer exists:
docker service inspect <service-name>
We should also verify that the service's containers are no longer running on the node where they were deployed:
docker ps
If the service does not appear, then we can be sure that we removed it successfully and we can proceed to create a snapshot.
Once the service has been removed, we can proceed to detach the volume from the instance and create the snapshot. This process can be done manually from the OpenStack Dashboard or CLI. When detaching the volume from the instance, the status of the volume should appear as Available instead of In-use. The snapshot can be created with attached volumes too; however, taking a snapshot of a detached volume is safer and recommended for our purpose.
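With the OpenStack CLI, those steps look roughly as follows; the server, volume and snapshot names are placeholders for your own resources, and the resulting snapshot ID is the value we will reference later in Terraform:
# check the volume name and its current status (In-use / Available)
openstack volume list
# detach the volume from the instance it is attached to
openstack server remove volume <server-name> <volume-name>
# create the snapshot from the detached volume
openstack volume snapshot create --volume <volume-name> <snapshot-name>
# note the ID of the new snapshot
openstack volume snapshot list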
Updating the Terraform files
The Terraform configuration files should be updated to reflect our desired new configuration.
The flavor and the number of instances of each node type can be updated like this, where we map each type of instance to its characteristics and count:
flavor_name = {
  "manager"    = "m1.medium",
  "cassandra"  = "m1.large",
  "blazegraph" = "m1.xlarge",
  "worker"     = "m1.large",
  "hpc-worker" = "m1.xlarge",
  "mongo"      = "m1.medium",
  "zookafka"   = "m1.large"
}
instance_count = {
  "manager"    = 3,
  "cassandra"  = 3,
  "blazegraph" = 1,
  "worker"     = 3,
  "hpc-worker" = 3,
  "mongo"      = 1,
  "zookafka"   = 3
}
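Note that these maps are input values for Terraform variables, referenced later as var.flavor_name and var.instance_count, rather than resources by themselves. As a rough sketch, the variable declarations could look like the following (the descriptions are illustrative, and the values above can then be kept in a terraform.tfvars file):
variable "flavor_name" {
  description = "OpenStack flavor to use for each node type"
  type        = map(string)
}

variable "instance_count" {
  description = "Number of instances to create for each node type"
  type        = map(number)
}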
The image used by the instances can be updated to a newer version. It is important to check the image, since images can become outdated or unavailable. To look for updated and available images:
openstack image list --status active
To update the Terraform file with the desired image, use the image ID instead of the image name to ensure that you always use the same image:
image_id = {
  "Ubuntu20.04LTS" = "7085d64d-f591-4a23-bdfe-dbbd1288afcf"
}
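If you already know the image name, its ID can also be retrieved directly; for example, assuming the image is registered as Ubuntu20.04LTS in your project:
# print only the ID of the given image
openstack image show Ubuntu20.04LTS -c id -f value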
To create new instances, we have to define new resources. In this case, we are creating new hpc-worker instances using the previously mapped values. The count argument is used to create multiple instances according to our previously declared variable instance_count, and the same applies to the other variable mappings (var.flavor_name, var.node_name, and so on). count.index is used to extract the index of each instance [0, 1, 2 … n) and generate a name like hpc-worker-0 for the first instance.
resource "openstack_compute_instance_v2" "hpc-worker" {
count = var.instance_count["hpc-worker"]
name = "${var.node_name["hpc-worker"]}-${count.index}"
image_id = var.image_id[var.image_name["image"]]
flavor_name = var.flavor_name["hpc-worker"]
key_pair = var.key_pub
security_groups = var.security_group
network {
name = var.network
}
metadata = {
ssh_user = var.role_ssh_user["hpc-worker"],
prefer_ipv6 = false,
my_server_role = var.node_name["hpc-worker"],
python_bin = "/usr/bin/python3"
}
}
We then create a 3 TB volume from the previous volume snapshot, so that the data is migrated to the new volume. To do this, we need to indicate the snapshot_id we want to use. If the snapshot is smaller than the new volume, we will need to expand the filesystem later; otherwise our instance will not use the full volume capacity. To attach the volume to the new instance, we need to provide the instance_id of the instance we want to attach the volume to and the volume_id of the volume to attach.
resource "openstack_blockstorage_volume_v3" "volume_cassandra_1" {
name = "${var.volume_name}-cassandra-1"
size = 3000
snapshot_id = "f2714817-f9f8-42f3-aa7c-363d7b887983"
}
resource "openstack_compute_volume_attach_v2" "attach_cassandra_volume_0_to_db_instances" {
instance_id = openstack_compute_instance_v2.cassandra[1].id
volume_id = openstack_blockstorage_volume_v3.volume_cassandra_1.id
}
Upgrading our infrastructure with Terraform
Once we have defined our desired infrastructure, we can start with the deployment:
terraform plan
terraform apply
Mounting and expanding disks and deploying Docker Swarm with Ansible
Finally, on each instance that received a migrated volume, we mount the new disk and grow its filesystem to the full volume size, since the volume was created from a smaller snapshot. The basic steps on each node are:
sudo mkdir -p /mnt/data
sudo mount /dev/sdb /mnt/data
sudo xfs_growfs /mnt/data
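These steps can be automated with Ansible on every affected node. The following playbook is a minimal sketch, assuming the volume is attached as /dev/sdb, is formatted with XFS and should be mounted at /mnt/data; the host group names, device path and mount point are illustrative and should be adapted to your inventory:
# requires the ansible.posix collection for the mount module
- name: Mount and expand the migrated data volumes
  hosts: cassandra:blazegraph
  become: true
  tasks:
    - name: Create the mount point
      ansible.builtin.file:
        path: /mnt/data
        state: directory

    - name: Mount the volume and persist it in /etc/fstab
      ansible.posix.mount:
        path: /mnt/data
        src: /dev/sdb
        fstype: xfs
        state: mounted

    - name: Grow the XFS filesystem to the full size of the new volume
      ansible.builtin.command: xfs_growfs /mnt/data
Once the volumes are mounted and expanded, the Docker Swarm platform and its services can be deployed again on top of the restored data.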