Restoring failed Active Directory Domain Controllers
Just when you think everything is going well, disaster strikes and one or more of your domain controllers goes offline. This can happen because of a hard disk crash, a bad network card, file system corruption or corruption of the Active Directory database. Even if you have good backups on the shelf, that’s only half of the job, because, as you’ll see later in the article, recovering a domain controller is very different from recovering a file server or an application server. Backing up virtual disks (for VMs) or using disk imaging software like Norton Ghost is specifically not supported for domain controller backups. However tempting it might be, recovering a failed domain controller with one of these methods can have catastrophic results for the consistency of the directory as a whole, such as SID rollback, lingering objects and USN rollback, and it is not supported by Microsoft.
There are two options to choose from when restoring your domain controller(s): restore from replication or restore from backup.
Restore from replication works only if you have multiple domain controllers in your environment, so that the rebuilt domain controller can replicate from the existing ones. The recovery is done by promoting a newly installed machine and letting replication copy all of the data to the new DC. Bear in mind that, even though this is an easy process, in large environments with thousands of users it creates a lot of replication traffic, which you need to take into account if that traffic crosses a WAN link. Finally, before you promote the freshly installed server, the remnants of the old domain controller must be removed from Active Directory.
To do that, open Active Directory Users and Computers, locate your failed domain controller and delete its computer object from the Domain Controllers container. The metadata cleanup steps are performed automatically if your domain controllers are running Windows Server 2008 or higher. Under Windows Server 2003 this is a three-step process, which I’m not going to discuss here.
After you click Yes you get a big warning message (worded a little differently in Windows Server 2008):
You are attempting to delete a Domain Controller without running the removal wizard. To properly remove the Domain Controller from the domain, you should run the Remove Roles and Features Wizard in Server Manager, or the Active Directory Domain Services Installation Wizard (DCPromo) for Windows Server 2008 R2 or earlier.
If you are sure this domain controller is permanently offline and you will never restore it from a backup, check the box “Delete this Domain Controller anyway. It is permanently offline and can no longer be removed using the removal wizard”, then click Delete. If the domain controller was also a Global Catalog (GC), you will get another warning message. Click Yes on that message to delete the object from AD.
The last clean-up step is to remove the server object from Active Directory Sites and Services. Locate the domain controller in the console, right-click it and choose Delete, then Yes to confirm.
If you get an error message at this point, it is because the domain controller still has child objects representing it. This usually happens when you open the Active Directory Sites and Services console to delete the domain controller only a couple of minutes after you removed the computer object from AD. Either wait for replication to kick in and delete those objects automatically, or delete them manually and then try to remove the server again.
Deleting them manually is easy: right-click the object, choose Delete, then confirm the action.
After the clean-up, wait a few hours, or even a day in large environments, for replication to do its magic across the forest, then go ahead and rebuild the domain controller. First reinstall the operating system and any other applications you support on your domain controllers, then promote the server as an additional domain controller in your domain, and finally configure the roles the failed domain controller held, such as GC, DNS and FSMO roles.
And that’s it, your domain controller should be back on-line just like it was before. Now let’s look at the second restore option.
Restoring a failed domain controller from backup has two approaches, known as nonauthoritative restore and authoritative restore.
Nonauthoritative restore does not require you to remove any objects from Active Directory. You simply restore the failed domain controller from backup and let it replicate to bring it up to date; its AD database gets overwritten with any changes that occurred after the backup was taken. For branch offices this might generate some traffic, but it all depends on how many changes were made in the forest/domain since your last backup.
To restore a failed domain controller using this method, first reinstall the operating system and any other applications you support on your domain controllers, then go ahead and restore from backup.
[warning]Do not, under any circumstances, delete the computer object from Active Directory or Active Directory Sites and Services, because the domain controller will not function correctly after the restore. Leave the server as a standalone computer (WORKGROUP) and restore from backup that way; you could not join it to the domain under the old name anyway.[/warning]
Depending on what backup product you are using, restore the system state onto the machine. For those that have System Center Data Protection Manager, go to the Recovery section, select the domain controller and the recovery time then choose Recover.
Choose to copy the backup to a network folder and click Next.
Select a destination share where you want to put the backup and continue the wizard using the default settings.
Click Recover to begin the recovery process. Depending on how big your system state is, it can take some time.
DPM does not restore the domain controller; it only exports the backup, and we now need to use that backup to restore the system state of the domain controller. Go to the target server and reboot it into Directory Services Restore Mode (DSRM) using the System Configuration utility: open Run, type msconfig, go to the Boot tab, check the Safe boot check-box and select Active Directory repair. Choose Restart when prompted.
The server should now be in Directory Services Restore Mode.
Open the command prompt and use the syntax below to get the backup version identifier. You need it to tell the restore process which version of the backup to use (if there is more than one).
wbadmin get versions -backupTarget:\\<Server>\<ShareName>
Replace \\<Server>\<ShareName> with the path where the system state backup for the domain controller resides.
Since we are interested in recovering the system state, use the following command:
wbadmin start systemstaterecovery -version:<version identifier> -backupTarget:\\<Server>\<ShareName>
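If you script your restores, it can help to assemble these two command lines in one place so the share path and version identifier are typed only once. The following is a minimal Python sketch under stated assumptions: the function names, the share `\\backupsrv\dcbackups` and the version identifier are hypothetical placeholders, and the code only builds and prints the commands rather than running them.

```python
def wbadmin_get_versions(backup_target: str) -> list:
    """Command line that lists the backup versions stored on a target."""
    return ["wbadmin", "get", "versions", f"-backupTarget:{backup_target}"]


def wbadmin_systemstate_recovery(version: str, backup_target: str) -> list:
    """Command line that restores the system state from one backup version."""
    return [
        "wbadmin", "start", "systemstaterecovery",
        f"-version:{version}",
        f"-backupTarget:{backup_target}",
    ]


if __name__ == "__main__":
    share = r"\\backupsrv\dcbackups"   # hypothetical share, replace with your own
    version = "04/30/2013-19:00"       # hypothetical identifier from "get versions"
    print(" ".join(wbadmin_get_versions(share)))
    print(" ".join(wbadmin_systemstate_recovery(version, share)))
```

Keeping the commands as argument lists makes it straightforward to review them first and hand them to a process runner later, once you have verified them in a lab.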
If you are doing this over the network, do it when there is less activity or after working hours. Another option is to copy the backup folder locally to the server and run the restore from there.
The restore process might take a while, depending on the size of your Active Directory database.
At the end you get a message that you need to reboot the server in order for the restore process to finish. Before doing this, don’t forget to set the boot mode back to normal: open the System Configuration utility again and un-check the Safe boot check-box.
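If you would rather script the boot-mode switch than click through msconfig, bcdedit can set and clear the same safe boot option. This is a minimal Python sketch and not part of the procedure above; the function name is hypothetical, and it only builds the two command lines so you can review them before running them yourself from an elevated prompt.

```python
def dsrm_boot_command(enable: bool) -> list:
    """Build the bcdedit command line that turns the DSRM safe boot option on or off.

    Nothing is executed here; the caller decides whether and how to run it.
    """
    if enable:
        # Same effect as ticking Safe boot > Active Directory repair in msconfig.
        return ["bcdedit", "/set", "safeboot", "dsrepair"]
    # Same effect as un-checking Safe boot: the next boot is a normal one.
    return ["bcdedit", "/deletevalue", "safeboot"]


if __name__ == "__main__":
    print(" ".join(dsrm_boot_command(True)))
    print(" ".join(dsrm_boot_command(False)))
```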
Now you can go ahead and press Y to reboot. If you need to review any logs of the restore process, you can find them in the path shown in the window (C:\Windows\Logs).
The first time you log in you get a message that system state recovery has successfully completed. Press Enter to close the message window.
In a few minutes the replication process will start and the domain controller AD database will get updated with the latest changes. If the failed server had any FSMO roles or was a GC, you can configure the new server to have these roles.
So far I’ve talked about restoring a domain controller by performing a nonauthoritative restore. This was easy stuff, with little impact on the infrastructure, since all we wanted was to get the domain controller back up and running; but there are situations where you need to restore data in Active Directory. This is done with an authoritative restore, which you use in situations like:
– Corruption of objects or the entire directory
– Accidental deletion of an entire subtree
– Accidental deletion of important objects
– Reversing certain object additions or modifications
To give you an example, imagine this situation: you have your AD forest/domain, and backups run at night and save the system state of your domain controllers. In the morning a junior admin deletes an entire OU containing a large number of users, or the AD database simply gets corrupted. If you restore from backup as we did before (nonauthoritative restore), you will have the OU or the AD database back… for a few minutes, until replication kicks in and overwrites everything you restored. If you are lucky you can use the Active Directory Recycle Bin, but let’s say that is not an option because, for example, you can’t upgrade your forest to support it. When you do an authoritative restore, the version number of each restored object is increased (by 100,000 by default, but this can be changed) so that it is higher than the existing version number. After the reboot, the other domain controllers see that their copies have a lower version number than those on the restored domain controller and replicate from it. A lower version number tells those domain controllers that their copy is outdated.
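The version-number mechanism described above can be illustrated with a toy model. This is a minimal Python sketch, not how Active Directory is implemented: the class and function names are hypothetical, the model tracks one version number per object, and it ignores the further tie-breakers that real replication applies when version numbers are equal.

```python
from dataclasses import dataclass

VERSION_BUMP = 100_000  # default increase mentioned in the article


@dataclass
class ObjectCopy:
    """One domain controller's copy of a replicated object (toy model)."""
    dc_name: str
    state: str
    version: int


def authoritative_restore(copy: ObjectCopy, bump: int = VERSION_BUMP) -> ObjectCopy:
    """Return the restored copy with its version number raised by `bump`."""
    return ObjectCopy(copy.dc_name, copy.state, copy.version + bump)


def replication_winner(a: ObjectCopy, b: ObjectCopy) -> ObjectCopy:
    """The copy with the higher version number wins (ties go to `a` in this toy model)."""
    return a if a.version >= b.version else b


if __name__ == "__main__":
    restored = ObjectCopy("DC1", "OU present", 3)  # state taken from the nightly backup
    partner = ObjectCopy("DC2", "OU deleted", 4)   # the deletion made in the morning

    # Nonauthoritative restore: the partner's newer change overwrites the restored data.
    print(replication_winner(restored, partner).state)

    # Authoritative restore: the bumped version makes the restored copy win.
    print(replication_winner(authoritative_restore(restored), partner).state)
```

In the first case the deletion wins because 4 is higher than 3; in the second the restored copy carries version 100003, so the partner accepts it.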
In all of these scenarios you can do a partial authoritative restore to reverse the changes when only things like OUs or individual objects were deleted from AD. If the entire directory gets corrupted, you’ll need to do a complete authoritative restore, which I’m going to discuss next; I will cover the partial authoritative restore in a future article. If you have to perform a complete authoritative restore, the assumption is that something catastrophic has happened on a domain controller and caused some form of global, irreparable Active Directory corruption. Restoring the entire Active Directory database is similar in concept to restoring individual objects, except that you are restoring all of the objects, with the exception of the schema.
[important]The schema cannot be authoritatively restored. In order to roll back the schema, you must rebuild the entire forest, either from scratch or from a set of system state backups that predate the schema modification. In almost any environment there is only a small chance of this being necessary.[/important]
Before you start doing this in a production network, I highly recommend that you test it in a lab environment to make sure every command and procedure works as expected and that you actually have experience with performing restores. Also, make sure to document everything.
To run the restore, you have to be in Directory Services Restore Mode, and you need to have restored the system state from a backup, as described previously.
Once the restore process is done, do not reboot the server; just close or minimize the window, because there is still some work to do: marking the restore as authoritative.
Open a command prompt and run the commands below, pressing ENTER after each one; just make sure to replace the domain name with your own. On a Windows Server 2003 domain controller the command is a bit different: back then, ntdsutil had a restore database option that you could use to restore the entire Active Directory database, but it was removed starting with Windows Server 2008. Now you can use the restore subtree option to mark the entire domain partition as authoritative.
ntdsutil
activate instance ntds
authoritative restore
restore subtree "dc=vkernel,dc=local"
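Replacing the domain name means converting your DNS domain name into its distinguished name form. The following is a minimal Python sketch of that conversion, assuming a plain DNS name with no special characters; the function names are hypothetical, and the second helper only lists the ntdsutil inputs in order, it does not run anything.

```python
def domain_to_dn(dns_name: str) -> str:
    """Convert a DNS domain name such as 'vkernel.local' to 'dc=vkernel,dc=local'."""
    labels = [label for label in dns_name.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join(f"dc={label}" for label in labels)


def authoritative_restore_inputs(dns_name: str) -> list:
    """Lines to type at the ntdsutil prompt, in order, for the whole domain partition."""
    return [
        "activate instance ntds",
        "authoritative restore",
        f'restore subtree "{domain_to_dn(dns_name)}"',
        "quit",  # leave the authoritative restore menu
        "quit",  # leave ntdsutil
    ]


if __name__ == "__main__":
    for line in authoritative_restore_inputs("vkernel.local"):
        print(line)
```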
Click Yes on the message window that pops up to confirm the action.
The restore process takes a few minutes. When it is done, type quit at each prompt until you have exited ntdsutil. Un-check the Safe boot check-box as before, then reboot the server and wait for replication to kick in.
And that’s it. Use the restore from replication option or the nonauthoritative restore whenever possible, since they are safer, and leave the full authoritative restore as a last resort, for when you can’t think of anything else that would revive your domain. And since we are talking about your Active Directory infrastructure, use everything with caution and test in a lab environment first.
17 thoughts on “Restoring failed Active Directory Domain Controllers”
This is a very informative post. I had the same situation and ended up seizing the roles to the 2nd domain controller. I would appreciate it if you could share more of your expertise: is it safe to bring up a new domain controller with the same name and IP as the crashed one? My crashed DC was named dc02 with IP 192.168.100.12. Would it be safe to bring up a new DC with the same name and IP? Please recommend.
It’s bad when this happens, but I would promote a new domain controller with a new name. The IP can be the same as the failed one.
Make sure you clean every record from DNS, AD, Sites and Services for the failed DC.
Let me know how it goes.
Appreciate your quick reply. Yes, I have cleared the DNS records and did a proper cleanup of the failed DC. What if I promote a new DC with the same name, would there be any risk in that? If yes, what could the risk be?
It is not best practice to recover using the same name, except when recovering from backup. Records of the failed DC might remain, and you will get some replication warnings and errors.
Alright thank you very much for sharing your expertise.. I will promote the new DC having same IP and new name, will share with you how it went. Thank you again.
Hi, great article. If I have several DCs with corrupt or deleted AD info and one DC that has a good copy of the AD, can I just run the ntdsutil steps mentioned above to make that DC authoritative and thereby update all the bad DCs via replication? And would that need to be run in DSRM?
Also as an alternative method could I instead simply disable outbound replication on my bad DCs and then replicate from my good DC?
Since you have a good DC, I would just remove the bad ones, reinstall the OS and promote them back, and you are done.
Hi, yes that goes without saying as an option but I wanted to know specifically about the potential solution I mentioned which I think would also be the solution with the least admin effort.
Look forward to your reply.
Restore by promoting new ones. Make sure you change their names.
Thank you, very helpful article
I’m glad you like it.
We have two domain controllers and Exchange in our environment. One of the DCs failed and they restored it using a full image backup of the DC, but it ceased to function well. The DHCP and DNS roles are working well on the restored server, but they can’t log in to the restored server (the image used was several months old). All the file sharing functions etc. are working well, except Exchange. Any clue? They plan to restore the failed DC using the replication method; do they need to delete the old DC from AD Sites and Services? Please advise.
Note: This issue happened at my previous office.
This is why it is important to have an up-to-date backup image. They can’t log in because the backup might be very old, older than the tombstone lifetime. In this case I would just force the decommission of that DC and start from scratch, then let it replicate with the other one that works. Before promoting it to a DC, I recommend you clear the lingering objects.
Let me know how it works.
I am faced with the situation of having to do a non-authoritative restore on my PDC. I have a few questions.
When I create the new server and restore from a DPM system state backup, will it use the same name as before i.e. PDC01 will still be PDC01?
Do I need to seize the FSMO roles from the failed PDC or is that all in the DPM system state backup?
Can you give me any more advice that I might need!
Since you are doing a non-authoritative restore, you don’t have to seize anything. Seizing is used only in case you can’t recover the failed DC. After the restore from backup your DC will have the old name. I highly recommend you do this in a test environment first. Let me know how it works.
Well…thank you 🙂