ZCM 11 – fixing a slow ZCC console

Over the past couple of months I’d been getting numerous reports from our techs that ZENworks Control Center was getting progressively slower, causing problems with:

  • remote control sessions not starting consistently
  • Bundle assignments failing \ taking 5-10 minutes to apply
  • searching for Devices not completing

Oddly ZCC seemed to be running a bit better on my login so I started having a look around for known issues, then stumbled across this:

Ref: https://www.novell.com/support/kb/doc.php?id=7015054

Although the article suggests 11.3.1 (that we’re running) should have the issue, we tried running a few of the tech accounts as Super Administrators, which seemed to help the initial login but didn’t solve the other issues above. I’ve since seen another article elsewhere that suggests 11.3.2 is required to fix the non-Super Administrator issue. However, I’m waiting for ZCM 11 SP4 to make the server and agent upgrade work worthwhile, so I’m holding off on 11.3.2 for now.

Ref: https://forums.novell.com/showthread.php/476553-Slow-login-in-ZCC-for-non-SuperAdministrator-users?p=2316919&mode=linear#post2316919

Server resources

Having only made minimal improvement with the Super Administrator fix above I turned my gaze to the servers themselves in case we were hitting a resource issue somewhere. Running “top” on the Linux primary servers didn’t show any signs of them being under heavy load, plenty of free RAM and given they’re running on an auto-tiering SSD-enabled SAN disk performance isn’t a concern either… onto the database server.

Our ZCM database runs on a dedicated Microsoft SQL Server VM, which gives a few potential pain points to watch out for. We’d already experienced issues in the past with ZPM causing massive growth of log files so it wouldn’t be a surprise if a database problem was the root cause here too.

Our database is 30GB+ so we tried upping the memory to run the whole lot in RAM but that had minimal effect (apart from creating a huge pagefile on the C: drive!) so that was scaled back to 16GB. Multiple vCPUs were already configured so nothing to change there. Disk space and datastore latencies were all looking good as well so no problems on the storage side either.

A closer look at SQL

At this point it was time to drill a bit deeper into the SQL Server itself to see if there was something within the database that could be causing our issues.

Initially I tried running the manual index defragmentation process (on top of our standard SQL Maintenance Plan) that’s referenced on the “Advanced SQL Concepts” support page

Ref: https://www.novell.com/documentation/zenworks113/zen11_cm_deployment_bp/data/b1abnlnh.html#b1abnss2

Various indexes were showing as fragmented but the end result was a marginal speed increase, which could well have been a placebo effect, so no magic bullet here (although it’s good practice to run it as per Novell’s recommendations).

By chance I stumbled back across a tool I’d used in the past called SQL Heartbeat so I decided to pop it on my machine and watch the ZENworks database for a while to see what appeared.

Ref: http://www.sqlsolutions.com/products/sql-server-monitor/sql-monitor.html

The results were almost instant. What I like about the Heartbeat tool is the graphical representation of SQL database process IDs (SPIDs), which makes spotting the problematic one(s) very quick. If you really don’t like 3rd party tools SQL Profiler will probably provide similar results.

A screenshot of what we found is below. Watching the activity on the server, it seemed that a query from one of the primaries was going round and round in circles every 1-2 minutes, causing a huge spike in CPU and disk activity. The server CPU never dropped below 50% and often stayed up in the 80-100% range, no wonder ZCC was running slow!

[Screenshot: SQL Activity Monitor showing high CPU]

[Screenshot: SQL Heartbeat process monitor highlighting SPID 152]

SPID 152 – I choose you…

Looking at SQL Activity Monitor I could see a matching “Expensive Query” which stands out like a sore thumb in terms of the volume of reads, writes and CPU time.

[Screenshot: SQL Activity Monitor, Recent Expensive Queries]

Solution

Initially I tried stopping and restarting the novell-zenserver and novell-zenloader processes on the primary server identified by the SPID. At first this did make the process disappear, but it reappeared a few minutes later. Restarting the affected primary server also had no effect.

We raised an SR with Novell but before we got past the initial “check ZCC status” type troubleshooting steps we had a large power cut that forced some of our VM hosts offline, including the primary above that had the large query associated with it. When everything came back up the database server was back at normal resource usage and the stuck query had disappeared. Definitely goes down as an unconventional fix!

Now ZCC is lightning fast and all the issues we were experiencing disappeared with the stuck query 🙂

Tools for next time

After doing a bit of post-event research there are a few more useful free tools out there I’ll try out next time something like this arises:

sp_WhoIsActive procedure
http://sqlblog.com/files/default.aspx

Idera SQL Check
https://www.idera.com/productssolutions/freetools/sqlcheck

ZCM PXEMenu: TFTP Read File failed

Just a quick post but could prove useful to anyone who heavily customises their ZCM 11 imaging servers:

When I first started working on our imaging system we only had a couple of entries in the boot menu, which was pretty much the same as the Novell out-the-box menu bar one added option for our own image.

As I started customising further we gained more and more options until I got to the point of making a couple of submenus to house the various scenarios I’d built up, including:

  • standard single image (clear existing ISD data)
  • multicast image (master and slave machine options)
  • OOBE image (run the first couple of imaging scripts to pre-install drivers but shut down leaving the machine ready to use just needing a name)
  • various diagnostic options (basic VGA for unsupported chipsets, imaging manual mode etc.)

The error

The other day I went to add another line to try out some new code then to my surprise got a call from one of our technicians saying the PXE boot was broken – sure enough it was:

ProcessPXEMenu: TFTP Read File failed


The fix

Initially I thought I’d made a typo on one of the new lines I’d added, or perhaps forgot to upload a matching config or script file that the menu was calling. Checking back I couldn’t see any errors but did notice in WinSCP that the file was now 76 lines long.

I removed the new line, back to 75 lines total and rebooted… PXE boot worked again!

I then removed an old comment line I didn’t need anymore and replaced it with the new option I tried to add initially and sure enough the PXE boot still worked. Adding the comment line back caused the error again.

It seems that there’s some sort of size restriction on the pxemenu.txt file, whether it’s file size or a 75 line limit I can’t say for sure but definitely one to watch out for if you like to customise your imaging menu.
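
A quick guard against this can be scripted and run before uploading the menu. The sketch below is a minimal check, assuming the limit really is the 75 lines observed above (it could equally be a byte limit). The function name check_pxemenu_lines and its default threshold are made up for illustration.

```shell
# Hypothetical helper: warn if a PXE menu file exceeds a line limit.
# The default of 75 is only the value observed in this post, not a documented limit.
check_pxemenu_lines() {
    menu_file=$1
    max_lines=${2:-75}

    if [ ! -f "$menu_file" ]; then
        echo "File not found: $menu_file" >&2
        return 2
    fi

    # Count newline-terminated lines; reading from stdin makes wc print only the number
    line_count=$(wc -l < "$menu_file")
    line_count=$((line_count))   # normalise any padding some wc builds add

    if [ "$line_count" -gt "$max_lines" ]; then
        echo "WARNING: $menu_file has $line_count lines (limit $max_lines)"
        return 1
    fi
    echo "OK: $menu_file has $line_count lines"
    return 0
}
```

Run it against a local copy of the file, e.g. check_pxemenu_lines pxemenu.txt, and only upload if it reports OK.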

Solving PXE boot problems on ZCM 11

During the last week I’ve been having a look at ZCM 11.3 in preparation for when we upgrade our production zone from 11.2.3a. I wanted to check that imaging was still going to work in the same way as before as well as testing some of our new hardware that doesn’t work with the current PXE drivers.

The test environment makes use of some of our old server and comms kit including some Dell PE2950 servers running ESXi hooked up via Cisco 3750 switches.

The DHCP server was installed on a Windows Server 2012 R2 virtual machine.
I downloaded the ZCM 11.3 appliance, imported it and ran through the setup wizard, all pretty painless so far.

With the zone configured I then tried to PXE boot a client PC but it disappointingly failed with an error:

“PXE-E51 No DHCP or DHCP Proxy Offers received”

In the end a series of fixes were required to get PXE working, not all of them present in the official Novell documentation so I figured it might be useful to pull everything together in one place

Server services

By default the ZCM server doesn’t have the Proxy DHCP service enabled. Without this you’re going nowhere so log onto the server with Putty \ console and type the following

service novell-pbserv start

check it with

service novell-pbserv status

While you’re there it’s also worth setting it to auto-start using chkconfig otherwise it’s an easy step to forget if you reboot the server at some point in the future.
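
As a rough sketch of that auto-start step, assuming the appliance is a SysV-style build where chkconfig manages the init scripts (the wrapper function name is hypothetical, and it needs to run as root):

```shell
# Hypothetical wrapper: make the Proxy DHCP service start at boot.
# Assumes chkconfig is available on the server; run as root.
enable_pbserv_autostart() {
    if ! command -v chkconfig >/dev/null 2>&1; then
        echo "chkconfig not found on this system" >&2
        return 1
    fi
    # Enable the service in its default runlevels, then list its runlevel settings
    chkconfig novell-pbserv on && chkconfig --list novell-pbserv
}
```

Calling enable_pbserv_autostart once after the initial "service novell-pbserv start" should be enough; the listing it prints lets you confirm the service is marked "on" for the runlevels you boot into.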

Firewall

The appliance also ships with the firewall enabled but this seems to block PXE boot (!)
Solution: turn it off using the YaST tool (console onto the GUI of the ZCM server for this)

Ref: https://www.novell.com/support/kb/doc.php?id=7005130

VLAN environment pre-requisites

My dev environment was set up as a series of VLANs. In this scenario make sure you have ip helper-address configured on each VLAN interface. According to the Novell documentation you need two entries, one for your DHCP server’s address and the other for the ZCM server that’s providing the PXE service.

Ref: https://www.novell.com/documentation/zenworks11/zen11_cm_preboot_imaging/data/bve6kpq.html

You also need ip-forward rules set up on your router \ L3 switch

ip forward-protocol udp 67
ip forward-protocol udp 68

Cisco switch port settings

Despite all the fixes above the client device still wouldn’t boot from the network and I was beginning to wonder if it was ever going to work. The missing link was that Portfast needs to be enabled on Cisco switches (might apply in a similar way to other vendors) to ensure the port comes up quickly enough for the PXE service to work.

Ref: https://www.novell.com/support/kb/doc.php?id=3131242

PortFast has been known to have been switched off and this has caused issues on the PXE boot sequence. PXE tends to boot faster and request DHCP faster than the switch can handle.
PortFast has been enabled so that the Switch can start talking to a device without going through the process of waiting for the switch and device to decide what speed they will communicate, by enabling Portfast the switch will open the port and enable packets to flow.
The normal time period for the Switch to open up a port is around 30 seconds, with PortFast enabled the clients can start talking as soon as they are switched on, and in the case of PXE boot services it would not wait for 30 seconds.
 

Troubleshooting tips

The server logs can be useful to help figure out how far along the path the packets are getting (or not) so you know if the problem is on the networking side or the server. To check if your DHCP requests are getting through have a look in

/var/opt/novell/log/novell-proxydhcp.log

and you should see lines like these, where 192.168.0.X is the server VLAN’s IP address.

Received packet on 0.0.0.0:68
Received packet on 192.168.0.X:67 from relay agent 192.168.0.X

You should also be able to see workstation information as they check in to the imaging system, this log file is a little further into the folder tree

/var/opt/novell/log/zenworks/preboot/novell-pbserv.log
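
To save scrolling through the proxy DHCP log by hand, a small filter can pull out just the relayed packets. This is a sketch that matches only the "Received packet on ... from relay agent" wording shown above; the function name is made up for illustration and other log formats are not handled.

```shell
# Hypothetical helper: show proxy DHCP packets that arrived via a relay agent.
# Matches only the "Received packet on ... from relay agent ..." wording quoted above.
show_relayed_packets() {
    log_file=${1:-/var/opt/novell/log/novell-proxydhcp.log}

    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file" >&2
        return 2
    fi

    # grep exits with 1 when nothing matches, i.e. no relayed packets were logged
    grep "Received packet on .* from relay agent" "$log_file"
}
```

If this prints nothing while a client is attempting to PXE boot from another VLAN, the requests are probably not reaching the server, which points back at the helper address and forwarding settings rather than the ZCM services.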

HTA based quick launcher for Adobe CS6

As you may have already seen on here I’m quite a fan of HTAs as the mix of script and HTML formatting can prove rather useful for creating little toolkit-style apps. This time round it’s the turn of the Adobe Creative Suite to get kitted up.

The reason for this came about as part of our attempt to clean up the desktop of our Windows workstations; it’s far too easy to end up with icon overload which isn’t nice to look at and (in my personal view) not that great to use.

When we pushed out our Adobe CS6 Bundle via ZENworks we wondered what to do with the icon that hangs around after installation completes. Then I had an idea: what if we could make something where the user could launch all the CS6 apps from a single shortcut without needing to go back to the Start Menu each time?

Et voila 🙂

[Screenshot: CS6 launcher]

Thanks go to iampxr on DeviantArt for creating an icon pack for Photoshop that I shrunk down and embedded into the HTA using the trusty base64 encoder I’ve used quite a few times now. There’s also a PNG version of the CS6 icons at dAKirby309 also on DeviantArt if you prefer that format.

At that point it’s just a matter of creating a suitable table format, getting the first link right then copy \ pasting the rest and changing image code and paths. A little bit of CSS later and the colour scheme matches the same dark greys used in CS6 programs to give it an authentic look.

Note: this launcher is for an x64 CS6 installation where some applications e.g. Photoshop run from the standard “Program Files” folder and the others from “Program Files (x86)”. If you’re purely 32-bit then change paths for each application as required.

If you want to give it a try grab a copy from SkyDrive – as always I recommend looking through the code to check it does what you want and that paths match your environment.

A warning for those using the Novell Image Explorer

I know this won’t apply to many people reading posts here but for those who do this could save you hours of frustration!

If you’re following the ZCM Imaging Megapost and creating driver packs with Image Explorer as per the instructions here there’s a glaring bug you need to watch out for:

  • you’ve created a ZMG file with some add-on content e.g. drivers, let’s call it DRIVERS.ZMG
  • you decide that you want to make some major changes to the package so you create a new file in Image Explorer and import your new content
  • you click File > Save As and click on the existing DRIVERS.ZMG file with the aim of overwriting it
  • Image Explorer asks if you want to Overwrite the existing file and you say Yes

You’d think this would be fairly simple and work OK… you’d be wrong!

I found the bug while testing a custom imaging process that kept failing with random behaviours that didn’t match the scripts I was working on. In desperation I opened up a file browser window in the middle of the sysprep process to check what was actually on the hard drive and was amazed to find (very) old versions of the scripts running instead of the new ones I believed had been saved using the process above! At least finding out the source of the problem stopped me from wanting to punch a hole in the case of the PC I was working on…

Seems like Image Explorer doesn’t actually complete the save process despite showing up as if it has. The workaround is to always delete the old .ZMG file before saving a new version with the same name. Basically never do this…

[Screenshot: Image Explorer]

Automating Lanschool deployment and setting channel with Powershell

We use Lanschool as our classroom management software and as part of our Windows 7 deployment needed to reinstall the client on all our re-imaged desktops. In the past this was done manually by visiting each room after the OS had been installed. This time round I wanted to try and remove the manual work and find an automated method instead – the installation itself is bread and butter MSI silent install, however setting the correct channel for each room was slightly more challenging.

If you’re reading this you already know that Lanschool uses “channels” to decide which PCs are controlled by the teacher machine in the room. The problem for us was that Lanschool uses purely numeric notation for this whereas our rooms are a mix of letters and numbers.

Having sat down for a while trying to figure out a formula or numbering convention it was soon obvious that method wouldn’t work… back to the drawing board! The next idea I had was to create some sort of lookup table containing all our classroom names and then assign a unique channel for each one, easy enough with Excel auto-fill 😉

After a couple of minutes I had a suitable listing created and saved in CSV format as I figured it would be the easiest format to work with. Now onto the Powershell. Initially I wasn’t sure how to read the file but after a bit of Googling found what I was looking for, the aptly named Import-Csv cmdlet. I specified the header values here rather than in the CSV file itself to keep the source data as simple as possible. The last part of the code grabs the channel number for whatever value resides in the variable named $WorkstationRoom.

$channelvar = Import-Csv C:\setchannel\lanschool.csv -Header Room,Channel | Where-Object {$_.Room -eq $WorkstationRoom } | Select-Object Channel
$channel = $channelvar.channel

Because I base the script around the location of the machine I’d already read this into the $WorkstationRoom variable by grabbing it from a custom location I make in the registry during our naming process while imaging. To read the registry use the Get-ItemProperty cmdlet. Obviously how you get this data will depend on your naming convention and \ or what data you have available on the machine to get the room number from but the example below should give an idea of how it’s done.

$RegWorkstationRoom = Get-ItemProperty -path "HKLM:\Software\HCFHE" -name "WorkstationRoom"
$WorkstationRoom = $RegWorkstationRoom.WorkstationRoom

At one point I wondered if I was going about actually getting a value in a clunky way as I always needed the second line, however it does seem to be the right thing to do. There’s a better explanation of why on this thread (see the post by the user named Graimer)

“First we get an object containing the property we need with Get-ItemProperty and then we get the value of for the property we need from that object. That will return the value of the property as a string.”

Now we have our channel number we need to set it, fortunately Lanschool provide a utility called setchannel.exe that does exactly what it says on the tin! It can be found in the utilities folder of the Lanschool install files. More info on page 29 of the Lanschool manual. Call it from Powershell like this, using the WaitForExit method as it takes a few seconds to process the channel change…

$ChannelCommand = "C:\setchannel\setchannel.exe"
$process = [Diagnostics.Process]::Start($ChannelCommand,$channel)
$process.WaitForExit()

To build some validation into the process I check to see if there actually is a value in the $WorkStationRoom variable before running setchannel. If the variable comes back with a null value I set the channel to an arbitrary value I don’t use elsewhere (in this example 999). I can then use ZCM to query the channel value stored in the registry in HKLM\Software\Lanschool\Channel to find any machines that haven’t got a “proper” value set.

Note: you’ll need to check for this as any machines that get set to the “failed” channel number will all be controlled together, regardless of where they are!

if (!$channel) {
    Write-Host "*** ROOM NAME INVALID ***"
    Write-Host "Quit imaging, check room name and restart"
    Write-Host "channel number set to 999"
    $channel = 999
} else {
    Write-Host $channel
}

To finish off the installation make a quick update to the registry to disable the annoying “would you like to register Lanschool?” dialog box that insists on popping up at every boot.

Use your preferred method of updating registry entries (in our case it was an action in the ZCM Bundle) to change the teacher.exe entry in HKLM\Software\Microsoft\Windows\CurrentVersion\Run to add the IgnoreRegPrompt parameter at the end of the executable path…

C:\Program Files\LanSchool\teacher.exe IgnoreRegPrompt

When done go to the teacher PC, hover over the Lanschool icon in the system tray and you should see the channel number from the input file in the popup panel… and there you have it, zero-touch Lanschool deployment that works a treat 🙂

ZCM 11 imaging megapost – scripted multicasting

Just when I thought I’d finished all the ZCM imaging posts I remembered one more very useful piece I hadn’t covered – how to create an interactive bash script to create a multicast session with minimal effort 🙂

As discussed earlier on in the series I found that ZCM imaging out-the-box didn’t suit our requirements in terms of getting brand new metal out the box and imaged without messing around with MAC addresses or running the risk of hardware rules running riot on existing PCs. As a result it also meant that ZCM server-side multicasting wasn’t going to work for us for a few reasons:

1) can’t use Multicast Imaging Bundles due to ZCM not knowing about the new PCs’ existence
2) technicians don’t have SSH access to the ZCM server to create multicast sessions manually on there
3) our switches don’t route multicast packets at the moment and we didn’t fancy reconfiguring the network to do that right now

At this point talking with my colleagues brought up the fact that they used to multicast rooms manually in the past using one of the other computers in the rooms as a “master” machine then firing out the (sysprepped) image to the rest of the machines. This seemed like the best way forward but only if it could be scripted \ automated.

Of course it can 😀

Set up your scripts

Looking back on the code it’s quite a simple process really, first thing you need is to create two scripts with matching entries in pxemenu.txt etc as covered in post 6 – I named mine win7image2013.mcm.s (master machine) and win7image2013.mc.s (slave machine)

Master machine process

So what does our Master machine script need to do?

  • notify user what will happen next
  • download and apply base image
  • ask user for parameters to create multicast session
  • create session and wait for slaves to join
  • apply add-on images and reboot

First things first, clear the screen and set some colours for later on:

#! /bin/bash
echo -en "\033c"
#DEFINE COLOURS TO BE USED FOR SCRIPT ECHO COMMANDS
RED='\e[0;31m'
YELLOW='\e[1;33m'
GREEN='\e[0;32m'
NC='\e[0m'

After some funky ASCII art and additional code to show machine model and network status (more on that another time) I run a series of echo commands to tell the end-user what happens during the imaging process.
Note: the 20 minute statement was based on our (slow) imaging times that have been since found to be caused by a ZCM bug, click here for a fix that will reduce that time down to under 5 minutes!

echo -e "${NC}This process will set the current machine as the multicast MASTER node"
echo "all other machines will download their image from this machine"
echo
echo "******************************************************************"
echo " First we need to download the Windows 7 base image " 
echo " this will take about 20 minutes on most hardware "
echo " all data on the disk will be wiped during this process "
echo "******************************************************************"
echo

Next we need to run the standard img command to pull down our Windows 7 base image (I run zisedit -c before this but you might not want \ need that so haven’t included it below). We ask the tech to confirm the imaging process one more time before they start; if you don’t want this just use the img command on its own.

read -s -n1 -p "Are you sure you want to image this machine? y/n" confirm
if [ "$confirm" = "y" ] ; then
img rp $PROXYADDR /var/opt/novell/zenworks/content-repo/images/WIN7_BASE.zmg
else
 echo
 echo
 echo -e "${RED}<---- Multicast imaging aborted ---->${NC}"
 echo
 read -p "Press [Enter] key to reboot..."
 reboot -f
fi

Once that’s done we need to ask the user what the name of the multicast session will be and how many slave machines they want to image. Once all the slaves have joined, the session will start automatically.

echo -e "${YELLOW}Type a unique name for this Multicast session" 
echo -e "${YELLOW}---------------------------------------------" 
echo -e "${NC}this will need to be entered on each PC to be imaged when they are booted up"
echo "e.g. use the number of the room you are imaging at the moment"
echo
read session_name
echo
echo -e "${YELLOW}How many PCs do you want to image this session?"
echo -e "${YELLOW}-----------------------------------------------" 
echo -e "${NC}Imaging will not start until all machines have joined the session" 
echo "so don't set this number larger than the number of computers in the room"
echo "imaging will start when one of 3 events occurs"
echo
echo " -> the required number of machines join the session"
echo " -> the user presses the Start Session button"
echo " -> the 3 hour timeout period expires"
echo
echo "Please enter the number of PCs to image then press Enter..."
read numberofclients
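
Neither prompt checks what was typed, so a typo may only show up later when the session is created. A minimal sketch of input checks that could follow each read is shown here. Both function names are hypothetical, and the rule that a session name should contain no spaces is an assumption made to keep the later command line simple, not a documented img restriction.

```shell
# Hypothetical validators for the two values read above.

# True if the argument is a whole number greater than zero
is_positive_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -gt 0 ]
}

# True if the name is non-empty and uses only letters, digits, '-' or '_'
# (assumed rule so the name is safe to pass as a single argument)
is_valid_session_name() {
    case $1 in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
    esac
    return 0
}

# Example use inside the master script:
#   until is_positive_int "$numberofclients"; do
#       read -p "Please enter a whole number: " numberofclients
#   done
```

Looping until the value passes keeps the tech at the prompt rather than dropping them out of the script after the base image has already been applied.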

Finally create the session using the img command with the -master switch and the variables defined above. We set a long timeout value in case someone leaves the room in between starting the slave machines (the -timeout value is measured in minutes).

echo
echo -e "${NC}Ready to create session named ${GREEN}$session_name ${NC}that will image ${GREEN}$numberofclients ${NC}PCs"
read -s -n1 -p "Are you sure you want to create this session? y/n" confirm
if [ "$confirm" = "y" ] ; then
img -session "$session_name" -master -clients="$numberofclients" -timeout=180
img rp $PROXYADDR /var/opt/novell/zenworks/content-repo/images/addon-image/SCRIPTS.zmg
img rp $PROXYADDR /var/opt/novell/zenworks/content-repo/images/addon-image/DRIVERS.zmg
reboot -f
else
 echo
 echo
 echo -e "${RED}<---- Multicast session aborted ---->${NC}"
 echo
 read -p "Press [Enter] key to reboot..."
 reboot -f
fi

Slave machine process

The slave machine is similar to the Master but only needs one question to set up the multicast (name of session to join)

echo
echo "Type the unique name for this Multicast session" 
echo "-----------------------------------------------" 
echo "enter the session name you used when setting up the MASTER machine"
echo "e.g. this may be the number of the room you are imaging at the moment"
read session_name

Then set off the session with the trusty img command
If you use add-on images you’ll need to add those on the lines afterwards

img -session "$session_name" -client
img rp $PROXYADDR /var/opt/novell/zenworks/content-repo/images/addon-image/SCRIPTS.zmg
img rp $PROXYADDR /var/opt/novell/zenworks/content-repo/images/addon-image/DRIVERS.zmg

Multicast speed (or lack of)

In testing we’ve been unable to get our multicast times lower than 30 minutes despite Unicast finishing in 5 minutes, whether this is down to the host machine (i3 \ SSD so doubt it), network (possibly) or ZCM’s imaging engine (possibly) I don’t know but don’t be too disappointed if it doesn’t go like lightning.

In some cases we’ve managed to get away with unicasting as the image downloads so quickly with the progress bar speed fix that machines finish the download process before the next batch have been powered on, PXE booted and loaded the imaging environment.

This wasn’t the case beforehand (when the base image took 20 minutes+ to apply) and in those circumstances our upper limit was about 16-20 machines before the server choked so make sure you check out this thread if you have newer Intel chipsets in your machines!