Author Archive for James Hong


Exchange 2010 DAG quick tips

Thanks to Exchange DAG, having a mail database that can recover from a failure isn't just a dream.

Admins no longer have to rely on complicated shared disk clusters, which avoids the headaches of troubleshooting a failed cluster, setting up a complicated system or even paying for expensive shared disks. (However, if you are running Exchange, the cost of shared disk should be negligible these days.)

Completely off topic, you should be able to buy an HP/3PAR SAN or Dell EqualLogic for less than $50K (AUD).

Now to the point

These points are for Exchange admins who run systems with fewer than 1000 mailboxes. They may even fit 5000, but I would recommend following the best practice guide if your org is that size.

1 MAPI and replication traffic on separate networks?

If you've read TechNet or several blog posts, you will have found that MAPI traffic should be separated from replication traffic. Now the keyword is SHOULD; it is not necessary. In my setup, we've got 10G Ethernet and run on top of hypervisors. With fewer than 1000 users and a 10G access layer network, you can imagine I didn't bother with dual NICs.

A single DAG network is a SUPPORTED config, as is a dual NIC config. It all depends on the MAPI traffic, but frankly, I would MOVE users and create new MBX servers once I get more than 500 users hitting a single MBX server. How big is your basket, and how many eggs do you want to keep in it? Eh?

2 Cross site DAG ?

Continuing from point 1: if you're spread over multiple data centres and generally have a single path over the WAN (thanks to route costs or STP, most traffic goes via a single pipe, not multiple), then again dual NIC was pointless for me.

3 DAG limits

A server can be a member of only a single DAG. A DAG database can be copied to up to 16 DAG members. If you have an even number of DAG members, you must have a file server as the cluster witness server.

4 Alternate Witness WILL NOT take over when main Witness server is down.

Read here; it's a great blog from MS (why can't they make TechNet docs more like this?), explaining how the witness fits into the grand scheme of things.

BTW, if you have an odd number of servers, the witness may not get used. Use the CLUSTER command to verify which quorum mode your cluster is set to. You can find several posts via Google where the cluster quorum was configured without a witness.

5 Cross Site DAG

If the majority of the servers (that's including the witness server) is at a data centre that's no longer reachable, your databases on the local network WILL SHUT DOWN.

eg DC A with two mail servers and a file share as a witness, DC B with two mail servers.

If DC B loses connectivity to DC A and the mail servers in DC B aren't able to communicate with the witness, they will treat the situation as a loss of quorum and the cluster service will dismount the mail databases.

Cross site DAG isn't perfect; this is why some recommend creating multiple DAGs (eg DAG1 for DC A's users, DAG2 for DC B's users).
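The DC A / DC B example comes down to counting votes. Here is a minimal sketch of that arithmetic, assuming each mail server and the file share witness count as one vote each. has_quorum is a hypothetical helper written for illustration, not an Exchange or cluster tool.

```shell
#!/bin/bash
# has_quorum TOTAL_VOTES REACHABLE_VOTES
# Hypothetical helper: succeeds when the reachable votes form a strict majority.
has_quorum() {
    local total="$1" reachable="$2"
    local needed=$(( total / 2 + 1 ))   # strict majority of all votes
    [ "$reachable" -ge "$needed" ]
}

# DC A: 2 mail servers + witness = 3 votes. DC B: 2 mail servers = 2 votes.
if has_quorum 5 3; then
    echo "DC A side: quorum kept"
fi
if ! has_quorum 5 2; then
    echo "DC B side: quorum lost, databases dismount"
fi
```

With five votes in total, three are needed, so the side holding the witness survives the WAN cut and the other side does not.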

Now to troubleshoot

c:\>cluster <dagname> /quorum

If you're monitoring the server's services (like many admins do..) make sure you add the CLUSTER SERVICE as well. If the cluster service crashes, the DAG mail databases will be dismounted along with it. (oh joy..)

I'll try to add a few more later on.


Cleaning up user profiles

Having a large number of users' profiles on the file server creates problems in the long run.
The number of cookies on the share can cause a headache of its own.
Using Citrix User Profile Manager reduces the headache, but it's still not good enough to leave unused cookies lying around.

To clean them up from the file server, I've written some simple scripts.

For Normal single Company

For Citrix Provider (Multi-tenant)


XenServer Idea Dump

1 Don't go over 95% disk usage on a XenServer Storage Repository.

Running systems will stop responding and you will not be able to start/stop servers correctly.

Make sure you MOVE VMs to another SR if storage space is running short.

2 How to find orphaned VDIs

xe vdi-list sr-uuid=<sr id> params=uuid
This command displays the list of registered virtual disks.

eg, xe vdi-list sr-uuid=a6811919-143b-e8f8-7767-17bd1af1e968 params=uuid

Then run ls -alh on the SR mount point:

[root@/var/run/sr-mount/a6811919-143b-e8f8-7767-17bd1af1e968]ls -alh

If there are MORE files than vdi-list shows, the extra file may be orphaned. (Don't delete it yet.)
Run xe vbd-list vdi-uuid=<disk uuid> params=all and check whether the disk is associated with any VM.

If the name-label of the VDI is "base copy" (name-label ( RW): base copy), that disk is the original (parent) of the linked snapshot disks.

See if you can find a vhd-parent entry pointing at it.

See the example below.

uuid ( RO)                    : c7bcc486-86c1-48d6-9645-8d897df72d19

name-label ( RW): Citrix Profiler 5.2 C Drive

sm-config (MRO): vhd-parent: 82a7fbe1-ffb1-445d-a80d-45e355511e2d

uuid ( RO)                    : 82a7fbe1-ffb1-445d-a80d-45e355511e2d

name-label ( RW): base copy

================ example of snapshot VHDs ================

-rw-r--r--  1   96   96 6.6G Apr 12 14:26 82a7fbe1-ffb1-445d-a80d-45e355511e2d.vhd - base image

-rw-r--r--  1   96   96  35K Apr 12 14:26 1803ec76-d3b3-4a24-a805-b7422937ea9b.vhd - DIFF DISK

-rw-r--r--  1   96   96  35K Apr 12 14:26 c7bcc486-86c1-48d6-9645-8d897df72d19.vhd - newly created snapshot active image
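The comparison in point 2 (registered VDIs versus .vhd files on the SR mount point) can be sketched as a small shell function. This is only a rough sketch: find_unregistered_vhds and the one-UUID-per-line input file are assumptions made for illustration. It only lists candidates; check each one with xe vbd-list before touching anything.

```shell
#!/bin/bash
# find_unregistered_vhds REGISTERED_FILE SR_DIR
# Hypothetical helper: prints the UUID of every .vhd file in SR_DIR whose
# UUID does not appear in REGISTERED_FILE (one registered UUID per line).
find_unregistered_vhds() {
    local registered_file="$1" sr_dir="$2" f uuid
    for f in "$sr_dir"/*.vhd; do
        [ -e "$f" ] || continue                  # no .vhd files at all
        uuid="$(basename "$f" .vhd)"
        # -x: whole line must match, -F: treat the UUID as a fixed string
        grep -qxF "$uuid" "$registered_file" || echo "$uuid"
    done
}

# Possible usage on a XenServer host (not run here):
#   SR=a6811919-143b-e8f8-7767-17bd1af1e968
#   xe vdi-list sr-uuid=$SR params=uuid --minimal | tr ',' '\n' > /tmp/registered.txt
#   find_unregistered_vhds /tmp/registered.txt /var/run/sr-mount/$SR
```

Anything it prints is only a suspect. A "base copy" parent is a registered VDI and should not show up, but verify before deleting.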


3 Disable Large Receive Offload on 10G NIC Bonding

Disable the LRO (Large Receive Offload) feature for all 10 Gigabit Ethernet NIC member interfaces of a bond.

Identify the 10 Gigabit NICs:
Open XenCenter and connect to the XenServer. Click on the XenServer and navigate to the NICs tab of the XenServer. Identify the 10 Gigabit NICs either by their speed of “10000 Mbit/s” or their Device name, such as “82599EB 10-Gigabit Network Connection”.
The Linux device name for NIC0 will be eth0, NIC1 will be eth1, and so on.

Edit the /etc/rc.local file of the XenServer and add the following text to the end of the file for each of the above identified 10 Gigabit NICs:
ethtool -K <interface> lro off

If the bond consists of eth2 and eth3, add the following two lines:

ethtool -K eth2 lro off
ethtool -K eth3 lro off

Repeat step 2 above on all XenServers with 10 Gigabit NICs and reboot the servers after the modification of the /etc/rc.local file.
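If you have several hosts to touch, the rc.local edit can be scripted. A minimal sketch, assuming you pass the bond member names yourself; append_lro_off is a hypothetical helper, and it only adds a line when it is not already in the file, so running it twice is harmless.

```shell
#!/bin/bash
# append_lro_off RC_FILE NIC...
# Hypothetical helper: appends "ethtool -K <nic> lro off" to RC_FILE for
# each NIC given, skipping lines that are already present.
append_lro_off() {
    local rc_file="$1" nic line
    shift
    for nic in "$@"; do
        line="ethtool -K $nic lro off"
        grep -qxF -- "$line" "$rc_file" 2>/dev/null || echo "$line" >> "$rc_file"
    done
}

# Possible usage for a bond made of eth2 and eth3 (not run here):
#   append_lro_off /etc/rc.local eth2 eth3
```

The reboot afterwards is still needed, as described above.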

4 Enable PortFast on XenServer connected ports.

1. PortFast allows a switch port running Spanning Tree Protocol (STP) to go directly from blocking to forwarding mode, skipping learning and listening.

PortFast should only be enabled on ports connected to a single host.

Port cannot be a trunk port and port must be in access mode.

Ports used for storage should have PortFast enabled.

Note: It is important that you enable PortFast with caution, and only on ports that do not connect to multi-homed devices such as hubs or switches.

2. Disable Port Security on XenServer connected ports.

Port security prevents multiple MACs from being presented to the same port. In a virtual environment, you see multiple MACs presented from VMs to the same port causing your port to shutdown if you have Port Security enabled.

3. Disable Spanning Tree Protocol on XenServer connected ports.

Spanning Tree Protocol should be disabled if you are using Bonded or teamed NICs in a virtual environment. Because of the nature of Bonds and Nic teaming, Spanning Tree Protocol should be disabled to avoid failover delay issues when using bonding.

4. Disable BPDU guard on XenServer connected ports.

BPDU guard is a protection setting, part of STP, that prevents you from attaching a BPDU-sending network device (such as another switch) to a switch port. When you attach such a device, the port shuts down and has to be re-enabled by an administrator.

A PortFast port should never receive configuration BPDUs.

5 Considerations for IP Addressing in XenServer for Storage and Management Networks

6 XenServer 5.6 FP1 may experience a FREEZE (no hotfix yet, Apr 2011)

1. Log in to the control domain (dom0) on the affected box and run the command below:
echo "NR_DOMAIN0_VCPUS=1" > /etc/sysconfig/unplug-vcpus
2. Reboot the server.
Hosts Become Unresponsive with XenServer 5.6 on Nehalem and Westmere CPUs

6 XS snapshot notes

XenServer: Understanding Snapshots

Deleting a snapshot will not recover the disk space if the storage is connected via iSCSI/FC.

NFS will have a smaller footprint, but the main disk + delta disk will still remain even when there are no more snapshots.

You can export and import the VM to collapse the disks into one and save the space.



7 How to create a virtual router/isolated network on Xenserver



8 vcpu tuning

The following section describes the procedure to modify the default setup with some example commands.

The virtual CPU (vCPU) behaviour can be modified by altering the VCPUs-params parameter of a virtual machine like the following:

vCPU pinning is the term for mapping vCPUs of a VM to specific physical resources.

You can tune a vCPU’s pinning with the following command:

[root@xenserver ~]# xe vm-param-set uuid=<VM UUID> VCPUs-params:mask=1,3,7

The VM from the above example will then run on physical CPUs 1, 3, and 7 only.

The VCPU priority weight parameters can also be modified to grant a specific VM more CPU time than others.

[root@xenserver ~]# xe vm-param-set uuid=<VM UUID> VCPUs-params:weight=512

The VM from the above example with a weight of 512 will get twice as much CPU as a domain with a weight of 256 on a busy XenServer Host where all CPU resources are in use.

Valid weights range from 1 to 65535 and the default is 256.
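To get a feel for what a weight buys you, here is a rough sketch of the proportional arithmetic, assuming a fully busy host where CPU time is split in proportion to the weights. cpu_share_percent is a hypothetical helper for illustration, not an xe command, and it rounds down.

```shell
#!/bin/bash
# cpu_share_percent MY_WEIGHT OTHER_WEIGHT...
# Hypothetical helper: prints this VM's share of contended CPU time as a
# whole percentage (rounded down), given its weight and its competitors'.
cpu_share_percent() {
    local mine="$1" total="$1" w
    shift
    for w in "$@"; do
        total=$(( total + w ))
    done
    echo $(( mine * 100 / total ))
}

cpu_share_percent 512 256   # the weight=512 VM against one default VM
cpu_share_percent 256 512   # the default VM's view of the same host
```

The first call prints 66 and the second 33, which is the "twice as much CPU" ratio described above.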

The CPU cap optionally fixes the maximum amount of CPU a VM can use.

[root@xenserver ~]# xe vm-param-set uuid=<VM UUID> VCPUs-params:cap=80



9 Renaming a volume or changing its description on Dell EqualLogic results in connection failure.


In summary, do not rename volumes and do not change descriptions. Both values must match XenServer's data for the SR to operate correctly.


10 VM not starting up with Error: “Starting VM ‘Name-of-VM’ – This operation cannot be performed because the specified VDI could not be found on the storage substrate”.

A mapped disk image, including DVD/CD, may be missing (the SR is down or the DVD image is missing). Unmount the disk, or find the disconnected SR and repair it.


11 XS 5.6 can be configured to support different STEPPINGS of CPU for XenMotion/Live Migration.

You can simply join the pool using XenCenter; it will perform CPU masking if it is possible to do so, otherwise manual configuration may be required.

12 Can't delete NIC/BOND? – Cannot connect to server error

XenCenter WILL run the matching command on ALL the XenServers.

If any server has a problem, your command WILL fail and leave the pool in an inconsistent state.

If any member of the pool is down, do not perform maintenance tasks in XenCenter. You will cause more damage if you do. REMOVE the dead server if you have to.

13 Pool master is DOWN!

If you lose the pool master, perform pool master recovery ONLY, before running any OTHER command; by running any other XE command you may cause more harm. (Remember, VMs are not affected by the pool master being down.)

  1. Select any running XenServer within the pool that will be promoted. (Each
    member server has a copy of the management database and can take control of
    the pool without issue.)
  2. From the server’s command line, issue the following command:
    xe pool-emergency-transition-to-master
  3. Once the command has completed, recover connections to the other member
    servers using the following command:
    xe pool-recover-slaves
  4. Verify that pool management has been restored by issuing a test command at
    the CLI (xe host-list)
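The steps above can be wrapped so nothing else gets typed in between by accident. A minimal sketch: recover_pool and the XE override variable are hypothetical names used for illustration (the override lets you dry-run with echo), while the xe subcommands are the ones listed in the steps.

```shell
#!/bin/bash
# recover_pool
# Hypothetical wrapper: runs the recovery commands in order on the member
# that is to become master and stops at the first failure.
# XE can be overridden (e.g. XE=echo) for a dry run.
recover_pool() {
    local xe_cmd="${XE:-xe}"
    "$xe_cmd" pool-emergency-transition-to-master || return 1
    "$xe_cmd" pool-recover-slaves || return 1
    "$xe_cmd" host-list          # test command: confirms the pool answers again
}

# Dry run, prints the subcommands instead of running them:
#   XE=echo recover_pool
# Real run on the chosen member (not run here):
#   recover_pool
```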

14 I want to install new driver

assuming rpm is driver.rpm
1 COPY/BACKUP modprobe files EVERYFILES under (you will have to recover them if you wish to REMOVE driver. )
on 5.6 SP2
then use RPM to install
>rpm -ivh driver.rpm

15 I want to remove newly installed driver.

>rpm -e driver (rpm -e takes the installed package name, not the .rpm file name) will remove the binary files; however, it does not remove the modprobe.dep file, and as a result you will find that the OLD driver will not start up.
Find the old line (eg bnx2x) and make sure the path is correctly set.

16 Iperf test is really slow, why ?

The R610's Broadcom 10G card (57711 model) only performs at about 2.3Gb/sec. We opened a support ticket, but only found out that by increasing the thread count to a significantly higher number you get higher throughput (eg 40 threads manage about 6Gb/sec).

Everyone is an architect

Sadly or happily, my work title now includes the word “architect”.

It's great news for some, but honestly, I am puzzled whether I can hold such a title.

I know how real architects (building/design) work: they work for years as one and also study to be one for 3-4 yrs. Their goal is to be an architect and they train to be one. In IT, there is no such goal or training. I started as an IT analyst, moved on to be a system engineer and learned to design/implement along the way. No formal training, no formal qualification.

Should IT engineers (even the term ENGINEER would anger a few real ENGINEERS today) name themselves IT architects after 5-10 yrs of professional life?

Well, who am I kidding, I just want more pay, don't we all, and title = higher grade = more pay =(



rain rain another rain

I've found out that heavy rain causes another water leak under my roof. =(

The heavy rain in Jan 2011 caused grief (I needed to call the insurance to get the water damage repaired), and now this.

I've set my excess to $1000, and I don't think this damage will be that much. I guess I'll find out, eh?





you dont know your own software?

It's a strange world indeed. My company uses software that's written in VB6 and runs within a Citrix environment.

The software vendor decided to call me up asking for help integrating their software into ANOTHER customer's Citrix environment.
Well, as a friendly Citrix dude, I did give him a few pointers, but isn't it rather strange for the vendor to call one customer to seek help with another customer's issue?

It's YOUR own software, ppl. If you don't think it runs on Citrix or don't know how to troubleshoot software integration, LEARN how to do it, seriously.

I really get frustrated when a developer calls an infrastructure engineer (usually me) saying, “I don't know anything about OS, network, blah blah, can you fix it?”
I have spent enough time learning C#, app debugging and networking (switching/routing) so I can help myself integrate/debug the issues I face.

Why is it always someone else's problem when the problem is escalated to the developer????

This is like a trend: devs who don't know SMTP, IIS, DBMS, firewalls, TCP/IP.

Or it must be just me working with horrible developers..


Solutions for Virtualizing Internet Explorer

I received a newsletter email saying that MS has released a white paper on virtualizing IE.

Wow… one from MS?

My assumption was that it had to be about how to use App-V, their flagship application virtualization suite.

Who am I kidding here, the white paper (here) is about using VIRTUAL PC and Terminal Server.

Did they really have to write up a fancy WHITE PAPER for this?
