Dell PowerEdge R715 iSCSI Boot with Server Core 2008 R2

Note: This is written as it comes along and so you'll get to see the failures and hopefully a wrap up with how to make it actually work.

Alright, so here's the project: Get Windows Server Core 2008 R2 booting from iSCSI. The why of it is somewhat simple. I want to be able to iSCSI boot so I can have a set of Hyper-V host servers at my primary location completely configured and perfectly happy. Then I will replicate those LUNs to our offsite SAN hardware. When disaster strikes, I can then just configure those servers at the DR location to boot from the replicated LUNs. (in theory)

Yes, there are easier ways to do this for my scenario, like Citrix Essentials for Hyper-V and other SAN replication software, which would then allow me to just fail over the setup or configure it as a geo-cluster. But the reality is those cost money, and they weren't included in the budget for this project. We got the money for the hardware/OS and that's pretty much it.

Server: Dell PowerEdge R715 - 12 Core AMD - (3) 4-port Broadcom BCM5709C NICs

Storage: Dell EqualLogic SAN Group - 2 PS6000s and 2 PS4000s

(2) Dell PowerConnect 6248 switches.

iSCSI Boot Learning Material:

Broadcom NetXtreme User Guide (Dell) // Original by Broadcom

Dell Instructions to Perform Boot from iSCSI (pages 21-24 & 36-38)

In the BIOS you need to do two things. First, set the boot order to put the Embedded NIC first in the list, followed by DVD, then local storage. Second, enable the embedded NIC (assuming that's what you're using for boot) to allow for iSCSI boot instead of PXE. (UPDATE: SEE BELOW FOR MORE ON THE BIOS CONFIGURATION)

Following these instructions I can get the iSCSI LUN to connect on server boot (still working on the secondary connection which seems to cause iSCSI to not even load):

But as soon as I start the Windows installer and tell it to load the VBD (driver) per the installation instructions (the disk isn't visible in the available drives list), I get a blue screen...

I LOVE WINDOWS. :( The other problem I seem to be running into is that the iSCSI session to the disk drops before I get the drivers loaded. Seems like it's a 5 or 6 minute timeout, which isn't enough time for the Windows install DVD to load and for me to install the drivers before the connection goes kaput.

HRM.... if I load the iSCSI Driver (bxois.inf), then the NDIS Driver - Win2k8 folder (bxnd.inf) and finally the VBD (bxvbd.inf) I don't blue screen. Of course that takes me well past 6 minutes which is when the iSCSI connection drops. Shizzle.

This is interesting - from the EQL logs:

iSCSI session to target '192.168.21.24:3260, iqn.2001-05.com.equallogic:0-8a0906-fa332600a-1fd0000000a4e039-hyperv01-boot' from initiator '192.168.21.33:4428, hyperv01.iscsiboot' was closed. iSCSI initiator connection failure. No response on connection for 6 seconds.

I find that interesting simply because it's consistently 6 minutes after the connection is established that the connection drops with this error. I'll have to pop out the disk later and see if it does that because of something the Windows installer is doing or if it's something the system is doing. But for now, I'll keep trying drivers.
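If you want to pull the pieces out of that EQL message programmatically (say, to line up drop times across several install attempts), here is a rough Python sketch. The regex is written against the single log line quoted above and nothing else, so treat it as an assumption that other EqualLogic messages follow the same shape. The function name and field names are made up for illustration.

```python
import re

# Pattern built from the one log line quoted in this post. Other
# EqualLogic message formats are NOT covered (assumption).
SESSION_CLOSED = re.compile(
    r"iSCSI session to target '(?P<target_addr>[\d.]+:\d+), (?P<target_iqn>[^']+)' "
    r"from initiator '(?P<initiator_addr>[\d.]+:\d+), (?P<initiator_name>[^']+)' "
    r"was closed\. (?P<reason>.+?)\. "
    r"No response on connection for (?P<idle>\d+) (?P<unit>\w+)\."
)


def parse_session_closed(line):
    """Return a dict of fields from a 'session closed' message, or None."""
    match = SESSION_CLOSED.search(line)
    return match.groupdict() if match else None


# The exact message from the EQL log above.
LOG_LINE = (
    "iSCSI session to target '192.168.21.24:3260, "
    "iqn.2001-05.com.equallogic:0-8a0906-fa332600a-1fd0000000a4e039-hyperv01-boot' "
    "from initiator '192.168.21.33:4428, hyperv01.iscsiboot' was closed. "
    "iSCSI initiator connection failure. "
    "No response on connection for 6 seconds."
)

if __name__ == "__main__":
    fields = parse_session_closed(LOG_LINE)
    for name, value in fields.items():
        print(f"{name}: {value}")
```

Note that the message itself reports 6 seconds without a response as the reason for closing the session; the 6 minute figure is how long after login the drop shows up.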

Read this: Main Link --- Configuring Dell PowerEdge 11G Servers Running Windows Server 2008 for iSCSI SAN Boot (direct link). I believe this explains my blue screen issues. In a nutshell, if there's anything wrong in your configuration, you'll get a blue screen when you load the driver. So, I stripped it down to the absolute basics for connections (should have started with the KISS method): I gave access to the volume to the IP address I assigned to the NIC instead of using CHAP like I had planned. I also copied their basic settings and removed everything from the second connection I was working on.

And guess what...

All with no driver installation required. I guess my blue screen issue was that I was even having to load the drivers in the first place. But of course Windows rears its ugly head again and displays the following: (see the error)

GREAAAAAT. Now what? Turns out it was my fault (yet again). In order to load drivers on the prior attempts, I had plugged in my USB thumb drive. Well, that was causing some problems with the Windows installer, I guess, because I removed the drive, rebooted the box, and now the installer is running directly to my SAN LUN.

All is now well for the most part. The biggest issue I have to overcome now is getting the secondary iSCSI connection (fault tolerance for the primary) to actually work. The problem is as soon as I configure a secondary device... iSCSI boot fails to initiate and the system just hangs after the ILO configuration. But Windows is installed.

More to come.... as I expect I'll call Dell on Monday and see what they have to say about the secondary connection issue. But before I do that I'll save some hassle and see about updating all the firmware.

UPDATE: 6/27/11 -- Been on the phone for over 2 hours with Dell storage support team going over my configuration for iSCSI boot. Everything is correct from their perspective, firmware is ok. Now trying to bring a server support specialist on the phone to see if there is something we're missing, because the storage team isn't seeing it. For the record they've been really helpful so far, but no solution yet in sight.

After 4.5 hours.... nada. Same issue. ARG!

UPDATE 6/29/11 -- I am hesitant to use the word resolved since the servers are now so completely out of date that it’s not even funny (and I’m sure support will ask if I ever call in), but I’ve got iSCSI Boot working using a much older BIOS.

iSCSI Boot on the PE R715 will work (with a secondary configured) using BIOS version 1.2.1, which contains the Broadcom NetXtreme II Ethernet Boot Agent v5.2.7 – For reference, the server(s) shipped with BIOS version 1.3.1, which contained the v6.0.11 boot agent. I also tried the latest BIOS 1.5.1 (I forget the boot agent version). Neither the shipping nor the newest version would work.

On another note, I was also able to get past the blue screen / reboot loop issue using the Broadcom NetXtreme I/II Ethernet Drivers v. 14.2.4 A04 – both the 16.0.0 A00 version and the 16.2.0 A01 would blue screen the system during the driver installation and then cause the system to go into a permanent reboot loop. I tested this on both 2008 R2 SP1 Datacenter Core and Full Install with the same results.

Update 7/1/2011 -- One more interesting bit on this – It looks like Broadcom Firmware 5.2.7 (separate from the BIOS Boot Agent 5.2.7) needs to be installed on the NICs as well. I had the correct BIOS version on the other two hosts I’m building but they wouldn’t work. I installed package NETW_FRMW_WIN_R270088 and downgraded the NIC firmware from 6.2.12 and iSCSI Boot started working right away on those other systems.
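Since there are now three separate version numbers to keep straight per host, here is a small Python sketch of a checklist for them. The version data comes straight from the notes above; the dictionary layout and the `check_host` function are hypothetical, just one way to write it down. Anything not listed is reported as untested rather than assumed good or bad.

```python
# Versions as observed in this post on the PE R715. True = worked for me,
# False = failed for me. Layout and names are illustrative only.
KNOWN = {
    # BIOS (bundles the Broadcom boot agent): only 1.2.1 would iSCSI boot
    # with a secondary configured.
    "bios": {"1.2.1": True, "1.3.1": False, "1.5.1": False},
    # NIC firmware, separate from the BIOS boot agent.
    "nic_firmware": {"5.2.7": True, "6.2.12": False},
    # Windows driver package: the 16.x versions blue screened during the
    # driver install and then went into a reboot loop.
    "driver": {"14.2.4": True, "16.0.0": False, "16.2.0": False},
}


def check_host(**versions):
    """Return a list of findings for versions that are bad or untested."""
    findings = []
    for component, version in versions.items():
        result = KNOWN.get(component, {}).get(version)
        if result is None:
            findings.append(f"{component} {version}: untested")
        elif not result:
            findings.append(f"{component} {version}: known bad")
    return findings


if __name__ == "__main__":
    # The combination that ended up working here.
    print(check_host(bios="1.2.1", nic_firmware="5.2.7", driver="14.2.4"))
    # Right BIOS but newer NIC firmware, as on the other two hosts at first.
    print(check_host(bios="1.2.1", nic_firmware="6.2.12", driver="14.2.4"))
```

An empty list means every component matches a version that worked in my testing, nothing more.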

More to come...

Update 8/5/2011 -- After a month of emails back and forth with Dell support, some part replacements and whatnot... Dell can reproduce the issue every time using the current BIOS and NIC firmware. My case is now up with internal engineering, as this looks like a bug in the Broadcom Boot Agent. Sitting back waiting for an official solution now. The temporary workaround is to completely configure the second boot NIC, but do not configure it as the secondary.