Tag: azure

  • Multi-Cloud Infrastructure as Code?

    I’m going to do an uncomfortable thing today: I was thinking about a problem, and I’m just going to share my thoughts before I research it. Then, I’ll actually do the research and refine things a bit. The goal is to show the thought process and learning.


    Courtesy: Wikimedia Commons

    One of the main selling points of HashiCorp’s Terraform is that it can be used for multi-cloud deployments. The benefits of this type of deployment are significant:

    • If one provider has an outage, you can simply use your infrastructure in a different provider.
    • You’re far more resistant to vendor lock-in. If a provider raises its prices, you aren’t stuck there.

    The problem of vendor lock-in is huge. Wherever I’ve worked, there’s always this pervasive background question: “Well, what if we wanted to go with Google instead?” And the answers have been unsatisfying. Realistically, the answer is sort of: “Well, we could start over and re-design all this infrastructure for the new platform.”

    If you look at production Terraform, it’s going to be full of resources such as aws_s3_bucket, which is definitively tied to one specific cloud provider.

    So how can you have Infrastructure as Code (IaC) for multi-cloud deployments, when all your resources are tied to a specific provider?

    One solution (and the one that HashiCorp probably recommends) would be to abstract your infrastructure into generic modules that translate your intent into each cloud provider’s specific resources.

    The user would specify “I want a storage container that is readable to the public and highly available.” The module would then be able to create such a container in AWS, Azure, GCP, or wherever you needed to deploy it.

    So you’d have a module that looked maybe something like this:

    # Module "multicloud_storage"

    variable "cloud_provider" {
    type = "string"
    }

    resource "aws_s3_bucket" "main_bucket" {
    count = var.cloud_provider == "aws" ? 1 : 0
    ...
    }

    resource "azurerm_storage_blob" "main_bucket" {
    count = var.cloud_provider == "azure" ? 1 : 0
    ...
    }

    Disclaimer: Please don’t use this code. It’s exactly as untested as it looks.

    Note that awkward count field on every block. I think you could probably make such a generic module work, but you’d have to implement the thing in every provider that you wanted to support.
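
    For scale, a caller of this hypothetical module might look something like the following. (The module name, source path, and options are all invented for illustration.)

    # Hypothetical usage of the generic module sketched above.
    # The source path and variable values are placeholders, not a real module.
    module "public_assets" {
      source         = "./modules/multicloud_storage"
      cloud_provider = "azure" # or "aws", "gcp", ...
    }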

    But the configurations for the different providers’ storage systems don’t match up one-to-one. Take, for example, the access tier of your storage: how available the objects are and how quickly they can be accessed. AWS S3 has at least nine storage classes, plus an Intelligent-Tiering option, whereas Azure Blob Storage offers hot, cool, cold, and archive access tiers. In our hypothetical multi-cloud module, we’d probably want to abstract this away from the user. We might do something like this:

    Module        Azure Blob    AWS S3
    Lava          Hot           S3 Express One Zone
    Hot           Hot           S3 Standard
    Cool          Cool          S3 Standard-IA
    Cold          Cold          S3 Glacier Instant Retrieval
    Glacier       Archive       S3 Glacier Flexible Retrieval
    Micro-Kelvin  Archive       S3 Glacier Deep Archive

    This would allow us to offer equivalent storage classes in both providers, though the actual storage tier chosen is somewhat obscured from the user.
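
    Inside the module, that translation could be little more than a lookup table. Here’s a minimal sketch, assuming the made-up tier names from the table above and a couple of provider-specific values:

    # Hypothetical mapping from the module's generic tiers to provider-specific
    # storage classes / access tiers. Only a few tiers shown for brevity.
    variable "tier" {
      type    = string
      default = "hot"
    }

    locals {
      aws_storage_class = {
        hot  = "STANDARD"
        cool = "STANDARD_IA"
        cold = "GLACIER_IR"
      }
      azure_access_tier = {
        hot  = "Hot"
        cool = "Cool"
        cold = "Cold"
      }
    }

    # The AWS value would feed an object's storage_class (or a lifecycle rule),
    # while the Azure value would feed the blob's access_tier.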

    But what about features that exist in one provider and not another? For example, S3 offers Transfer Acceleration to speed up transfers to and from the bucket, whereas Azure Blob Storage seems to rely mainly on parallel uploads for performance.
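
    If the module wanted to expose something like that anyway, it would probably end up as a flag that only means something on one provider. A rough sketch, assuming the module structure above:

    # Hypothetical provider-specific knob: meaningful on AWS, a no-op on Azure.
    variable "enable_transfer_acceleration" {
      type    = bool
      default = false
    }

    resource "aws_s3_bucket_accelerate_configuration" "main_bucket" {
      count  = var.cloud_provider == "aws" && var.enable_transfer_acceleration ? 1 : 0
      bucket = aws_s3_bucket.main_bucket[0].id
      status = "Enabled"
    }

    The flag is at least honest about being AWS-only, but the abstraction is already leaking.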

    Then we get to whole types of resources that exist in one provider but not another. Leaky abstractions. And the juice-to-squeeze ratio of maintaining all of these implementations gets worse for lesser-used or highly provider-specific resource types like QuickSight.

    I’m about to end the rambling, self-reflecting portion of this post and do some actual research. I hope that someone has created modules like this that allow the same infrastructure to work for multi-cloud deployments. My intuition is that it’s too unwieldy.

    Here I go into the Internet. Fingers crossed!


    Hey, it’s me. I’m back, fifteen minutes later.

    I didn’t find a ton. There’s a smattering of tools that claim to handle this.

    For example, Pulumi, an Infrastructure as Code tool and alternative to Terraform, says that it handles multi-cloud deployments natively. I’d be interested in learning more.

    I found several articles offering a guide to multi-cloud Terraform modules. I did not, however, find any well-maintained modules themselves.

    The void feels a little weird to me: there’s obviously a great need for this sort of module. It’s the sort of problem that the open source community has traditionally been good at solving. Like I said before, my intuition is that this is a very difficult (expensive) problem, so maybe the cost just outweighs the demand?

    One Stack Overflow post mentioned that one of the reasons people don’t share Terraform in open source is that it makes it easy to find vulnerabilities in your infrastructure. (But isn’t that supposed to be a strength of open source: to crowdsource the identification and correction of these vulnerabilities?) Anyway, extrapolating a bit: this reluctance to share infrastructure might also be a huge barrier to making such a multi-cloud module.

    If I were going to implement something professionally, I’d do a lot more than fifteen minutes of research. But, gentle reader, it looks bleak out there. Let me know if there’s anything good that I missed.

  • Easier Cloud-to-Cloud Migrations?

    Cloud with a lock. Courtesy of Wikimedia Commons.

    An Empty (Theoretical) Promise

    It’s long been a promise of Infrastructure as Code tools like Terraform that you could theoretically create platform-independent IaC and deploy freely into any cloud environment. I doubt anyone ever really meant that literally, but the reality is that your cloud infrastructure is inevitably going to be tied quite closely to your provider. If you’re using an aws_vpc resource, it’s pretty unlikely that you could easily turn that into its equivalent in another provider.
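
    To make that concrete: the closest Azure analogue to an aws_vpc is an azurerm_virtual_network, and even at this tiny scale the two don’t line up cleanly. (The names, location, and CIDRs below are just placeholders.)

    resource "aws_vpc" "example" {
      cidr_block = "10.0.0.0/16"
    }

    # The rough Azure equivalent needs a resource group and a location,
    # and takes a list of address spaces rather than a single CIDR block.
    resource "azurerm_virtual_network" "example" {
      name                = "example-network"
      location            = "eastus"
      resource_group_name = "example-rg" # normally a reference to an azurerm_resource_group
      address_space       = ["10.0.0.0/16"]
    }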

    And yet, several of the organizations I’ve worked with have been reluctant to tie themselves too closely to one cloud provider or another. The business reality is that vendor lock-in is a huge source of anxiety: if AWS suddenly and drastically raised their prices, or if they for some reason became unavailable, lots and lots of businesses would be in a big pickle!

    The amount of work required to manually transfer an existing system from one provider to another would be nearly as much as creating the system in the first place.

    GenAI as the Solution?

    I ran across this article about StackGen’s Cloud Migration product. The article isn’t long, so go read it.

    Instead of requiring DevOps teams to map out resources manually, the system uses read-only API access to scan existing cloud environments. It automatically identifies resources, maps dependencies, and – perhaps most importantly – maintains security policies during the transition.

    StackGen isn’t new to using generative AI for infrastructure problems, but they have an interesting approach here:

    1. Use read-only APIs to identify resources, including those not already in IaC.
    2. Use generative AI to map those resources, including security policies, compliance policies, and resource dependencies.
    3. Convert those mapped resources into deployment-ready IaC for the destination environment.

    Using a process like this to migrate from provider to provider is interesting, but the one use case that really gets me thinking is the ability to deploy into a multi-cloud environment.

    I’ll be keeping my eyes on this one.