Using MRTG to monitor Dell EqualLogic

Alright - if you're here you likely already know what the Dell EqualLogic is and what MRTG is typically used for. You likely also know that there's a (free) download from Dell called EqualLogic SAN Headquarters that can give you a ton of good information about your SAN. It has everything from volume capacity, replica capacity, firmware versions and even I/O information for the group(s) you have. Unfortunately, there's a level of granularity that's missing from SAN HQ. If you want to know which volume or volumes are getting hammered on a daily basis, which ones have higher read or write I/Os - you're basically out of luck. The same goes for using the standard Group Manager application.

Here's an example of what I mean: In this graph you can see that my (mostly pre-production) EQL SAN is for the most part dormant, but you can also see that I have a set of peaks along with a higher level of writes than reads at most times. What you can't see is which volumes are contributing to those peaks and which ones are the most read-heavy or write-heavy. On the EQL side, I've broken out the 4 units we have (2x15k and 2x7.2k) into 2 storage pools with a pair of EQL units per pool. I have a RAID-10 pool on the 15k units and a RAID-50 pool on the 7.2k units. This was done because we need a pretty large amount of bulk storage for files and D2D backup (some may deem it minuscule compared to other businesses), plus a decent amount of storage for SQL and virtual machines. So it made sense to provide a tier of high-capacity, lower-performance storage and a tier of lower-capacity but higher-performance storage. And yes, I am aware that the EQL can handle automagical tiering, but from what I've read so far I don't know how well it would work for me.

With RAID-50, while you gain a lot back in storage (in my case about 8.8 TB per unit, or roughly 17.6 TB in my RAID-50 storage pool), you also take a substantial hit in performance (RAID-10 vs. RAID-5 is not apples to apples, but you get the idea) for random writes like you would expect from VMs, DBs and email. At the same time, RAID-50 by all appearances improves over RAID-10 for sequential read/write operations. So what I really need to find out is which volumes really need to be on that smaller, faster RAID-10 storage pool I've got and which ones I can comfortably leave on the RAID-50.

Back to the graph: SAN HQ provides a nice graph showing me that I've got significantly more writes than reads at most times, but it doesn't tell me which volumes those reads and writes are coming from or at what levels.

Enter this little document from Dell: USING MRTG TO MONITOR I/O TO A PS SERIES GROUP. Using MRTG to collect information from the EQL, you gain insight into the I/O of individual volumes and members. You can determine which specific volumes are the busiest and at what times of the day. You also get to see read/write ratios for specific volumes. All of this is extremely helpful, and I can't imagine why Dell hasn't included it in SAN HQ.

The overall basics for configuration are:

1. Install MRTG on your Windows or Linux box.

2. Configure SNMP access to your EQL group. This can be done either by command line or through the EQL Group Manager > Group Configuration > SNMP > Read-only SNMP Community Name.

3. Download the MIBs from www.equallogic.com > Support > Downloads > Firmware > Downloads Page > MIBs. The two MIB files you need are eqlvolume.mib and eqlmember.mib. Place the MIBs in your MRTG working directory.

4. SSH to one of the EQL group members and run the command: mrtg-config [filename]. If you don't provide a filename it will just put out a file named mrtg.cfg. Note: This will output an MRTG configuration for the entire group so you do not need to do this on every unit.

5. FTP or SCP into that same group member and grab the mrtg.cfg file. Copy the mrtg.cfg file to your MRTG bin folder.

6. Before continuing - make sure you check the mrtg.cfg file and validate that your working directory is correct and the MIBs directory is correct. Also note: if you have a working MRTG configuration already you can copy out everything below "### Global options for volume IO counter" into your current configuration file and then follow the remaining steps.

7. Run the command below from the MRTG\bin directory. Expect it to throw errors on the first couple of runs because the log files are not in place yet. By the third run the files it expects are there and it should no longer error. As long as this window is open you are collecting data, so you might want to look into setting up MRTG as a service.

perl mrtg mrtg.cfg

8. The Dell document will tell you that you can now open server\index.htm and see your graphs starting to populate. That didn't work for me, most likely because I had a different setup than the default. I had to regenerate the index file and copy it into the working directory (run from MRTG\bin):

perl indexmaker mrtg.cfg --output=index.htm

NOTE:

The following actions will require you to regenerate the .cfg and index files:

Adding, deleting or renaming volumes, members or renaming the group.
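If you'd rather not eyeball the paths in step 6 every time you regenerate the file, here is a minimal Python sketch of that check. It assumes the generated mrtg.cfg sets the standard MRTG global keywords WorkDir and LoadMIBs (with LoadMIBs as a comma-separated list of file paths); the function names are my own and not part of MRTG or anything from Dell.

```python
import os
import re


def read_global_options(cfg_text):
    """Collect global 'Keyword: value' lines from an MRTG config.

    Comments, indented continuation lines and per-target lines such as
    'Target[vol1]: ...' are ignored.
    """
    options = {}
    for raw in cfg_text.splitlines():
        if not raw.strip() or raw[0].isspace() or raw.startswith("#"):
            continue
        match = re.match(r"^([A-Za-z]+)\s*:\s*(.*)$", raw)
        if match:
            options[match.group(1).lower()] = match.group(2).strip()
    return options


def check_paths(cfg_text):
    """Return a list of problems with the WorkDir and LoadMIBs settings."""
    options = read_global_options(cfg_text)
    problems = []

    workdir = options.get("workdir")
    if not workdir:
        problems.append("WorkDir is not set")
    elif not os.path.isdir(workdir):
        problems.append("WorkDir does not exist: " + workdir)

    # Assumption: LoadMIBs holds comma-separated paths to the MIB files.
    for mib in options.get("loadmibs", "").split(","):
        mib = mib.strip()
        if mib and not os.path.isfile(mib):
            problems.append("MIB file not found: " + mib)
    return problems
```

To try it, run something like print(check_paths(open("mrtg.cfg").read())) from the folder you launch MRTG from, since relative MIB paths are checked against the current directory. An empty list means both checks passed.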

When all of that is said and done (not including the installation of MRTG, it's really a 2 minute process) you can sit back and wait while your data collects. After a while you'll start getting graphs for I/Os, throughput and latency for both members and volumes.
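The graphs are great for eyeballing, but since the whole point here is figuring out which volumes are the busiest, you can also rank them straight from the .log files MRTG writes. This is a rough Python sketch, not anything from the Dell document. It relies on the standard MRTG log layout: the first line holds the raw counters from the last poll, and every later line is "timestamp avg_in avg_out max_in max_out", newest first. Which of the two values is reads and which is writes depends on the order of the OIDs in each Target line of your mrtg.cfg, so I've left the names neutral. The function names are made up for illustration.

```python
def parse_mrtg_log(text):
    """Parse MRTG .log text into (timestamp, avg_in, avg_out) rows.

    The first line (raw counters from the last poll) is skipped.
    """
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue
        rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
    return rows


def summarize(text, samples=288):
    """Average the newest `samples` rows.

    With 5 minute polling, 288 rows is roughly the last 24 hours.
    Returns (mean_in, mean_out), or None if the log has no data rows.
    """
    rows = parse_mrtg_log(text)[:samples]
    if not rows:
        return None
    mean_in = sum(row[1] for row in rows) / len(rows)
    mean_out = sum(row[2] for row in rows) / len(rows)
    return mean_in, mean_out


def rank_by_out(logs):
    """Sort {name: log_text} by the mean 'out' value, busiest first."""
    summaries = {name: summarize(text) for name, text in logs.items()}
    usable = {name: s for name, s in summaries.items() if s is not None}
    return sorted(usable.items(), key=lambda item: item[1][1], reverse=True)
```

Feed rank_by_out a dictionary of volume name to log file contents (the .log files live in your MRTG working directory) and the volumes with the heaviest second-column traffic come out on top. Swap the sort key to item[1][0] to rank by the first column instead.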

Seems rather handy, no? Hopefully at some point Dell improves SAN HQ and includes this information, but seeing as they posted the MRTG document 4 years ago... I'm not holding my breath.