Cloud Native Active Directory — How Hard Could it be?

Aaron Gorka
engineering @ amaysim
Jul 26, 2018


The way we architect applications has evolved over the years. Mainframes, commodity systems, virtual machines and ephemeral cloud resources have each required a paradigm shift to maximise cost efficiency. So, for organisations that use both AWS and Active Directory, you might think it would be nice to take a cloud-native approach to deploying Active Directory. After all, even applications that weren’t designed with a cloud-first mentality can take full advantage of the cloud with enough engineering.

In this post I’ll talk you through the approaches I tried for deploying and managing Active Directory in a cloud-native way. I’ll show you, warts and all, what worked, what didn’t and ultimately where I landed (spoiler alert: it’s not quite where we hoped we would…).

The Requirements

Our network architecture is straightforward. We have some on-site Domain Controllers that connect to our VPCs (private networks) via Direct Connect (dark fibre). We also use Amazon Connect, a “Cloud-Based Contact Centre”. Finally, we have a line-of-business (LOB) application deployed to AWS, which uses LDAP to authenticate users.

The requirements were:

  • Increase availability for workstation authentication with Active Directory
  • Enable SSO for Amazon Connect
  • Provide a highly-available LDAP interface for our LOB application

Building a Cloud-Native Active Directory Stack

With these in mind, we set out to build our stack. By taking a cloud-native mindset, we could achieve the following benefits:

  • Zero or minimal-touch deployments and self-healing via automated configuration scripts. Deploying a new Domain Controller should not require any manual intervention or configuration.
  • Fault tolerance and high availability via Autoscaling Groups. If a Domain Controller dies in the middle of the night due to a fault in the underlying hardware, the Autoscaling Group will replace it with a new one. With multiple Domain Controllers deployed, a single EC2 instance failing will not cause any service disruption.
  • Infrastructure as Code via CloudFormation. This enables versioning and reproducibility of our infrastructure.

The Solution

Our initial solution looked something like this:

  • An Autoscaling Group that launched Windows EC2 instances
  • UserData that installed and configured Active Directory
  • A Lambda that updated a DNS record with the address of each EC2 instance

LDAP Integration Point

Creating an integration point for our LOB app was fairly simple. Using this example almost verbatim (after adding some Serverless configuration and applying the 3 Musketeers pattern to it) gave us a DNS record containing the IP addresses of all the Domain Controllers deployed in AWS. You can view all the code here: https://github.com/amaysim-au/devops-r53
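The repository linked above holds the real implementation. As a rough, hypothetical sketch of the idea (not the devops-r53 code), a Lambda could collect the private IPs of running Domain Controllers from an EC2 `DescribeInstances` response and upsert them as a single multi-value A record. The record name `ldap.aws.contoso.local.` and the 60-second TTL are assumptions; the returned dictionary is shaped for boto3's `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`.

```python
"""Hypothetical sketch: keep one A record pointing at every running DC."""


def running_private_ips(describe_instances_response):
    """Collect private IPs of running instances from a DescribeInstances response."""
    ips = []
    for reservation in describe_instances_response["Reservations"]:
        for instance in reservation["Instances"]:
            # Skip instances that are pending, stopping, terminated, etc.
            if instance["State"]["Name"] != "running":
                continue
            if "PrivateIpAddress" in instance:
                ips.append(instance["PrivateIpAddress"])
    return sorted(ips)


def build_change_batch(record_name, ips, ttl=60):
    """Build a Route 53 ChangeBatch that UPSERTs a multi-value A record."""
    if not ips:
        # Route 53 rejects a record set with no values
        raise ValueError("no Domain Controller IPs to publish")
    return {
        "Comment": "Domain Controllers deployed in AWS",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip} for ip in ips],
                },
            }
        ],
    }


if __name__ == "__main__":
    sample = {
        "Reservations": [
            {"Instances": [
                {"State": {"Name": "running"}, "PrivateIpAddress": "10.0.2.10"},
                {"State": {"Name": "terminated"}, "PrivateIpAddress": "10.0.1.99"},
            ]},
            {"Instances": [{"State": {"Name": "running"}, "PrivateIpAddress": "10.0.1.10"}]},
        ]
    }
    batch = build_change_batch("ldap.aws.contoso.local.", running_private_ips(sample))
    # prints [{'Value': '10.0.1.10'}, {'Value': '10.0.2.10'}]
    print(batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```

Using UPSERT with the full list means each invocation simply rewrites the record to match whatever is currently running, so a replaced instance drops out without any separate delete step.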

Automating Active Directory Setup

Our first hurdle was achieving a zero-touch installation of Active Directory. The process is complex and is not normally automated from start to finish. One of the most challenging parts was figuring out how to reboot the server partway through the UserData script. We had a few ideas, such as using SSM to trigger various actions, using Lambda to configure the instance remotely via WinRM or PowerShell Remoting, or even using a full-blown configuration management system like Chef.

We settled on a here-string (PowerShell’s equivalent of a here-doc), Scheduled Tasks and some conditional logic to determine whether to run configuration after a reboot.

First, we join the server to the domain, retrieving credentials from AWS Parameter Store. At this point, we need to reboot before we can continue configuring the Active Directory role. However, before we do that, we write a second “stage” to disk at C:\script2.ps1, with the content of the script contained in a variable embedded into the UserData:

$script2 = @'
# ... grab some variables ...
# Check if AD is already installed
$Service = Get-Service -DisplayName $ServiceName -ErrorAction SilentlyContinue
If (-Not $Service) {
    # First run after the domain join: install the AD DS role and promote
    # this server to a Domain Controller (it reboots when promotion completes)
    Install-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools
    Import-Module ADDSDeployment
    Install-ADDSDomainController -NoGlobalCatalog:$false -CreateDnsDelegation:$false -Credential:$credentials -CriticalReplicationOnly:$false -DatabasePath "C:\Windows\NTDS" -DomainName "contoso.local" -InstallDns:$true -LogPath "C:\Windows\NTDS" -NoRebootOnCompletion:$false -ReplicationSourceDC:$replicationsource.Value -SiteName "AWS" -SysvolPath "C:\Windows\SYSVOL" -Force:$true -SafeModeAdministratorPassword:$dsrmcred
} Else {
    # Subsequent boots: AD DS is already present, so only report its status
    $ServiceName + " is installed."
    $ServiceName + "'s status is: " + $Service.Status
}
'@
$script2 | Out-File -FilePath "C:\script2.ps1"

We then set this script to be run at boot time:

$script = "C:\script2.ps1"
$argument = "-WindowStyle Hidden -NonInteractive -ExecutionPolicy Unrestricted -File $script"
# Pass the full argument string so the execution policy and -File flags are applied
$action = New-ScheduledTaskAction -Execute 'Powershell.exe' -Argument $argument
# Run at every startup; script2.ps1 itself decides whether there is work to do
$trigger = New-ScheduledTaskTrigger -AtStartup
$principal = New-ScheduledTaskPrincipal -UserId "NT AUTHORITY\SYSTEM" -LogonType ServiceAccount -RunLevel Highest
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "AD" -Description "Scheduled task to install AD DC" -Principal $principal

The result is a two-stage installation which joins the domain, reboots and then configures Active Directory. On subsequent reboots, no action is taken.
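To make the conditional logic explicit, here is a small Python model of the decision each boot makes. It is only an illustration under two assumptions (UserData runs on the first boot only, and the scheduled task runs stage 2 on every later boot); the real work happens in the PowerShell shown earlier, and the names `Action`, `next_action` and `boot_sequence` are hypothetical.

```python
"""Hypothetical model of the two-stage bootstrap's per-boot decision."""
from enum import Enum


class Action(Enum):
    JOIN_DOMAIN_AND_REBOOT = "stage 1: join domain, write script2.ps1, register task, reboot"
    PROMOTE_TO_DC = "stage 2: install AD DS and promote (reboots on completion)"
    NOTHING = "stage 2: AD DS already present, report status only"


def next_action(first_boot, ad_service_present):
    """Decide what a boot should do, mirroring the checks in the PowerShell."""
    if first_boot:
        # UserData (stage 1) runs only on the instance's first boot
        return Action.JOIN_DOMAIN_AND_REBOOT
    if not ad_service_present:
        # The scheduled task runs script2.ps1 at every startup,
        # but it only installs AD when the service is missing
        return Action.PROMOTE_TO_DC
    return Action.NOTHING


def boot_sequence(n_boots):
    """Simulate n consecutive boots of a freshly launched instance."""
    ad_installed = False
    actions = []
    for boot in range(n_boots):
        action = next_action(first_boot=(boot == 0), ad_service_present=ad_installed)
        if action is Action.PROMOTE_TO_DC:
            ad_installed = True
        actions.append(action)
    return actions


if __name__ == "__main__":
    for step in boot_sequence(4):
        print(step.value)
```

Keying stage 2 off the presence of the AD DS service, rather than a counter or marker file, is what makes the task safe to leave registered: every later boot finds the service and falls through to the status report.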

The Connection between Amazon Connect and AD Connector

AD Connector is an AWS Directory Service feature used to connect your on-premises Active Directory to your AWS account. It enables us to provide SSO for Amazon Connect. While setting up AD Connector, one of the parameters you must provide is the IP addresses of your Domain Controllers, which it uses for DNS resolution.

In a cloud-native environment, IP addresses are highly ephemeral and cannot be relied upon to stay the same forever. Ever tried to IP-whitelist a cloud-based SaaS like New Relic?

After looking through the API documentation for AD Connector, I very quickly discovered that there is no way to update this list of DNS servers in-place once it has been configured.
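To see where those addresses end up, here is a sketch of the arguments for the Directory Service `ConnectDirectory` API (boto3's `ds.connect_directory`). The Domain Controller IPs go into `ConnectSettings.CustomerDnsIps` when the connector is created. The helper name, the example VPC and subnet IDs and the service account name are hypothetical.

```python
"""Hypothetical sketch of AD Connector's ConnectDirectory request."""
import ipaddress


def connect_directory_params(domain_name, password, vpc_id, subnet_ids,
                             dc_ips, service_account, size="Small"):
    """Build keyword arguments for boto3's ds.connect_directory()."""
    for ip in dc_ips:
        # Raises ValueError for anything that is not a valid IPv4 address
        ipaddress.IPv4Address(ip)
    return {
        "Name": domain_name,
        "Password": password,
        "Size": size,
        "ConnectSettings": {
            "VpcId": vpc_id,
            "SubnetIds": list(subnet_ids),
            # Per this post, this list could not be updated in place once created
            "CustomerDnsIps": list(dc_ips),
            "CustomerUserName": service_account,
        },
    }
```

With boto3 this would be used as `boto3.client("ds").connect_directory(**params)`. Because the IPs are supplied only here, any Domain Controller whose address changes later silently drops out of AD Connector’s view.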

Not all was lost: AWS does provide tools for applications that require static IP addresses, namely Elastic Network Interfaces (ENIs) and the ability to set private IP addresses statically on EC2 instances. Setting private IP addresses statically is not possible when using Autoscaling Groups, so we set out to automate ENI attachment whenever a Domain Controller is created.

ENI Attachment Automation

Automating this was quite simple using Autoscaling Lifecycle Hooks and AWS Lambda. When a new instance launches, a message is published to SNS, which triggers a Lambda function:

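import json

def handler(event, context):
    # The Autoscaling notification arrives as a JSON string inside the SNS record
    sns_message = json.loads(event['Records'][0]['Sns']['Message'])
    if sns_message['Event'] != "autoscaling:EC2_INSTANCE_LAUNCH":
        return
    instance_id = sns_message['EC2InstanceId']
    instance_name = get_instance_name(instance_id)
    # eni_description and these helpers are defined elsewhere in the full Lambda
    interface_id = get_interface(eni_desc=eni_description)
    attachment = attach_interface(interface_id, instance_id, device_index=1)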

Shortly after launching an instance, an additional IP address shows up on the server. The IP addresses on the ENIs do not change, so AD Connector can happily use the same IP addresses! You can see the full code here:

https://github.com/amaysim-au/devops-eni
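The helpers called in the Lambda snippet are not shown in this post. Below is a hypothetical sketch of what `get_interface` and `attach_interface` might look like using the EC2 `DescribeNetworkInterfaces` and `AttachNetworkInterface` calls. Unlike the snippet, these take the EC2 client as an explicit argument (e.g. `boto3.client("ec2")`) so they can be exercised with a stub, and selecting the ENI by its description is an assumption drawn from the `eni_desc` parameter name, not something confirmed from the repository.

```python
"""Hypothetical sketch of the ENI helpers; not the devops-eni implementation."""


def get_interface(ec2, eni_desc):
    """Return the ID of an unattached ENI whose description matches eni_desc."""
    response = ec2.describe_network_interfaces(
        Filters=[
            {"Name": "description", "Values": [eni_desc]},
            # Only consider ENIs that are not already attached to an instance
            {"Name": "status", "Values": ["available"]},
        ]
    )
    interfaces = response["NetworkInterfaces"]
    if not interfaces:
        raise LookupError(f"no available ENI with description {eni_desc!r}")
    return interfaces[0]["NetworkInterfaceId"]


def attach_interface(ec2, interface_id, instance_id, device_index=1):
    """Attach the ENI as a secondary interface and return the attachment ID."""
    response = ec2.attach_network_interface(
        NetworkInterfaceId=interface_id,
        InstanceId=instance_id,
        # Index 0 is the instance's default interface, so the ENI goes at 1
        DeviceIndex=device_index,
    )
    return response["AttachmentId"]
```

Filtering on `status=available` means a pool of pre-created ENIs naturally hands the free one to the replacement instance once the terminated instance has released it.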

Active Directory and Multi-homing

Hurdle #2: Active Directory does not like multi-homing. It does not even like having multiple IP addresses; it causes problems with replication and synchronisation, and is ultimately more trouble than it’s worth. The ENI automation we implemented adds an additional interface, and the default interface that comes with the EC2 instance is still present.

So, back to the drawing board. We tried a few workarounds:

  • Disabling the default interface from within Windows. This caused the instance health checks to fail, and the instance was immediately terminated.
  • Using a Network Load Balancer to provide a static IP address for DNS requests. This did not work because, at the time of writing, NLB only supported TCP, and AD Connector only makes DNS requests using UDP.
  • Changing the IP address of the instance after it had launched. This was not possible.
  • Creating a new AD Connector each time a new instance was brought up.

One solution we brainstormed was particularly creative. The idea was to create the smallest possible subnet (a /28, which has 16 addresses, 5 of which AWS reserves), and then reserve all but one of the remaining addresses using ENIs. That way, when an instance was terminated and the ASG launched a replacement, the new instance would receive the one free address, so its IP address would stay the same even without an ENI attached.
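The arithmetic behind the /28 idea can be checked with Python’s `ipaddress` module. AWS reserves the first four addresses and the last address of every subnet, which leaves 11 assignable addresses in a /28; placing ENIs on 10 of them leaves exactly one free address for the replacement instance. The function names and the `10.0.0.0/28` example are illustrative.

```python
"""Hypothetical sketch of the "pinned /28 subnet" arithmetic."""
import ipaddress


def aws_usable_addresses(cidr):
    """Addresses EC2 can assign to instances in a VPC subnet.

    AWS reserves the first four addresses (network address, VPC router,
    DNS, future use) and the last address of every subnet.
    """
    addresses = list(ipaddress.ip_network(cidr))
    return addresses[4:-1]


def plan_pinned_subnet(cidr):
    """Return (addresses to hold with placeholder ENIs, the one left free)."""
    usable = aws_usable_addresses(cidr)
    return usable[:-1], usable[-1]


if __name__ == "__main__":
    placeholders, free = plan_pinned_subnet("10.0.0.0/28")
    # prints 11 10 10.0.0.14
    print(len(aws_usable_addresses("10.0.0.0/28")), len(placeholders), free)
```

The scheme only pins one Domain Controller per subnet, so two DCs would need two such subnets, each with 10 placeholder ENIs.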

It was somewhere around this point that we realised that a completely cloud-native Active Directory was probably not going to be possible.

Right Tool for the Job

Designing a service that is maintainable is as important as designing one that is architecturally sound. Maintenance tasks should be accessible to everyone involved in owning a service, so it’s important to balance complex automation against the ability to maintain it.

We were going to extreme measures to bend Active Directory into a cloud-native app, and anyone who wanted to make a minor change would find it very difficult to grok. Would we be better off with a simpler design? With this in mind, we came up with a design that anyone could understand and use:

The solution consists of two CloudFormation stacks. One creates supporting resources and the other creates a standalone EC2 instance. The “instance template” takes an IP address as a parameter, and can be deployed an arbitrary number of times. That way, AD Connector can continue using the same IP addresses. The DNS record for the LDAP interface does not need to change.
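The templates in the repository below are the real thing. As a hypothetical, minimal illustration of the “instance template” idea, the Python below emits a CloudFormation template whose `AWS::EC2::Instance` takes its `PrivateIpAddress` from a stack parameter, plus the `Parameters` list you would pass to `CreateStack` once per Domain Controller. The parameter names, default instance type and example IDs are assumptions, and a real template would also need UserData, security groups and an instance profile.

```python
"""Hypothetical minimal "instance template" with a fixed private IP."""
import json


def dc_instance_template():
    """CloudFormation template for one Domain Controller at a chosen IP."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "PrivateIpAddress": {"Type": "String", "Description": "Fixed IP for this DC"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "InstanceType": {"Type": "String", "Default": "m5.large"},
        },
        "Resources": {
            "DomainController": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": {"Ref": "InstanceType"},
                    "SubnetId": {"Ref": "SubnetId"},
                    # The address AD Connector and the LDAP DNS record point at
                    "PrivateIpAddress": {"Ref": "PrivateIpAddress"},
                },
            }
        },
    }


def stack_parameters(ip, subnet_id, image_id):
    """Parameters for CreateStack; deploy one stack per Domain Controller IP."""
    return [
        {"ParameterKey": "PrivateIpAddress", "ParameterValue": ip},
        {"ParameterKey": "SubnetId", "ParameterValue": subnet_id},
        {"ParameterKey": "ImageId", "ParameterValue": image_id},
    ]


if __name__ == "__main__":
    print(json.dumps(dc_instance_template(), indent=2))
```

Because the IP is a stack parameter rather than something discovered at launch, recreating a lost Domain Controller is just redeploying the same stack with the same value, and the chosen IP must fall inside the subnet passed as `SubnetId`.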

Even if an instance disappears, creation is just a few clicks away in the CloudFormation console.

You can find the CloudFormation templates here:

https://github.com/amaysim-au/cloudformation-active-directory-extension

Summary

As you can probably see, this was a challenging exercise to go through. Although I didn’t quite get to the outcome I’d hoped for, I did learn a lot as part of the process (which I think is important in itself):

  • Active Directory dislikes multi-homing and changing IP addresses, which makes it hard to deploy to AWS in a cloud-native way
  • AD Connector’s DNS IP addresses cannot be updated once it is deployed, so plan around having static IPs
  • When automating something, it’s worth considering whether your solution is elegant and easy to maintain, or if a simpler method will provide better results in the short term

Thanks to Brian Meyers for the PowerShell/CloudFormation and Cormac Donnellan for expertise on Active Directory.

Has any of the above piqued your interest? Does amaysim sound like the sort of place where you think you could make an impact? Do you thrive in organisations where you are empowered to bring change and constant improvement? Why not take a few minutes to learn more about our open roles and opportunities, and if you like what you see, say hi. We’d love to hear from you.

Shout-out to all the lawyers:

The views expressed on this blog post are mine alone and do not necessarily reflect the views of my employer, amaysim Australia Ltd.
