There is no question whether your PACS will go down; the only questions are when, how often, and how well you prepare for and anticipate it. In other words, how can you minimize the panic factor?
Downtime is generally understood as the period during which a system is non-functional. Note that it does NOT include time during which a system merely slows down while auto-repairing a failure, as can be the case for a disk crash in a redundant RAID configuration or a server failure that is automatically taken over by a mirrored server. Nor does it include scheduled downtime, which is used for software upgrades and maintenance.
According to Mike Cannavo, aka “the PACSman,” a typical RFP for a PACS requires an uptime of the mystical “five nines,” i.e. 99.999 percent, or about 5 minutes of downtime per year. However, I have yet to see a PACS that is down only 5 minutes a year; a more realistic number, according to Mike, is 99.5 percent, which equates to about 44 hours per year, including scheduled downtime. It is also critical to define the level of system performance that constitutes a “downtime”: for example, I would consider image retrieval slowing to one minute a downtime, while others might not.
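As a quick sanity check on those figures, here is a minimal back-of-the-envelope sketch in Python that converts an uptime percentage into the downtime it allows per year; it simply restates the “five nines” and 99.5 percent numbers above:

    # Convert an uptime percentage into the downtime it allows per year.
    HOURS_PER_YEAR = 365.25 * 24  # about 8,766 hours

    def allowed_downtime_hours(uptime_percent):
        return (1 - uptime_percent / 100.0) * HOURS_PER_YEAR

    for uptime in (99.999, 99.5):
        hours = allowed_downtime_hours(uptime)
        print(f"{uptime}% uptime allows {hours:.1f} hours "
              f"({hours * 60:.0f} minutes) of downtime per year")

    # 99.999% uptime allows 0.1 hours (5 minutes) of downtime per year
    # 99.5% uptime allows 43.8 hours (2630 minutes) of downtime per year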
What is a typical amount of downtime? I usually ask my PACS system analyst (SA) students when their system was last down, and the answers range widely: from one student whose system went down once a week (I would not want to be on call for that system) to those who cannot even remember the last time it was down because it was several years ago. Based on their feedback, an unscheduled downtime about once every six months seems to be the norm, and the average outage lasts a couple of hours. Taking Mike’s numbers into account, that is not far off.
Which measures can you implement to take the panic out of a system going down, i.e. an unscheduled downtime?
1.
Have well-defined downtime procedures that are visible, and train all users on how to use them. The procedures depend on the user, so have a little “cheat sheet” at each desk telling them what to do. For example, for a technologist at a modality it might say “select the alternate PACS to send images to”; for a radiologist it might say “select the alternate PACS worklist,” “text the PACS SA,” or “use the web viewer.” And, as mentioned, train the users so they know what to do.
2.
Have a test system. Surprisingly enough, when I did a poll, I found that only about two-thirds of users have a test system in place. Not only should there be a test PACS but also a test worklist provider, voice recognition system, and any other critical component. The test system is used to test updates, including patches, to train users on new features, and, most importantly, to provide “life support” while the main system is undergoing scheduled maintenance or experiencing an unscheduled downtime.
3.
Use mirroring. This is different from having a test system: a mirrored system is a fully functional, operational duplicate of the main system, preferably at a different location. For North Texas, where I am based, that means sufficiently far away that a tornado would not hit both data centers; for southern Louisiana it would mean another state, not subject to the same hurricane or flooding; for California it would mean not on the same fault line, subject to the same earthquake.
4.
Test your downtime backup. How do you know whether your backup solution works? You will have to test it, which is a legal requirement in the state of Texas for all government/state institutions. For example, UT Southwestern in Dallas runs its orders from an external system once a year to show it can be done.
5.
Have an alternate workflow for critical areas. One of my students told me that he burns a CD for all OR cases and sends them up to that location every day, just in case the system goes down. The same can be done on demand for critical cases in the ICU in case the PACS (or the network) is down. Conversely, one could burn CDs in the ER for reading at a stand-alone station in radiology.
6.
Have a dual source for the information. Many hospitals used to have a separate web server that stored a copy of the images in a web-accessible format that could be viewed from any PC in case the PACS was down. Unfortunately, from a redundancy perspective, many of these web servers have gone away as PACS vendors have integrated them into the main archive. The trend toward a separate VNA as an enterprise archive, however, brings back that duplication.
7.
Have more than one access point. In addition to having multiple sources of the information, having multiple access points is just as important, such as the capability to look at images on PCs, tablets, or even a phone, not necessarily with the same quality but good enough for an emergency. This is not unheard of: I know a surgeon who regularly takes a picture of his view station with his phone and shares it with his surgical team.
8.
Reboot and re-initialize on a regular basis. In the early days of Windows implementations there were quite a few hiccups, and I remember that we were able to reduce downtime significantly by rebooting each computer automatically every night at midnight (a minimal scheduling sketch follows this list). Software is sometimes funny: there can be “loose threads,” unreleased blocks of memory, or multiple unnecessary background processes running that impact performance or reliability, all of which are cleaned up by a simple reboot.
9.
Be aware of external factors. One of the most common causes of system downtime is people cutting through cables, or wiring closets shared with plumbing. This is especially common when there are multiple campuses, where someone digging a hole somewhere can take out power or network connectivity. Even air conditioning can bring a system down: just last week a major brand-new facility here in the north Dallas metroplex had to shut down its server room because the A/C had failed. And, for some reason, architects like to put IT in the basement, which is obviously the worst place for flooding and water breaks. Ideally the servers would sit on the top floor of a building, but realistically that is in many cases prime real estate.
10.
Constantly monitor your weak points and critical components. When I visited a PACS SA room not too long ago, I saw a monitor on one of the desktops scrolling what looked like text strings. When I asked, the SA told me these were his RIS feeds containing all the orders for his PACS. He had no clue how to interpret the HL7 order messages, but he knew that as soon as the screen stopped scrolling he had a problem, because orders were no longer coming in. As most of you know, in a typical-size department a one-hour RIS downtime results in a full day of fix-ups at the PACS back end, so he was keen to watch that data stream non-stop (a simple watchdog along these lines is sketched below).
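Two brief illustrations before wrapping up. For the nightly reboots mentioned in point 8, how you schedule them depends entirely on your environment; purely as a hypothetical sketch, registering a Windows scheduled task for a midnight restart from Python could look like the following (the task name and timing are made up, and your IT department will likely have its own tooling and policies for this):

    # Sketch: register a Windows scheduled task that reboots a workstation
    # nightly at midnight. Task name and timings are illustrative only.
    import subprocess

    subprocess.run(
        [
            "schtasks", "/Create",
            "/TN", "PACS Nightly Reboot",   # hypothetical task name
            "/TR", "shutdown /r /t 60",     # restart with a 60-second warning
            "/SC", "DAILY",
            "/ST", "00:00",
            "/RU", "SYSTEM",                # run whether or not a user is logged on
            "/F",                           # overwrite the task if it already exists
        ],
        check=True,
    )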
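And for the feed monitoring in point 10, a watchdog does not have to understand HL7 at all; it only has to notice when messages stop arriving. Here is a minimal sketch, assuming the interface engine appends each received order to a log file; the path, threshold, and alert hook are placeholders, not anything taken from a specific RIS or PACS product:

    # Sketch: watch an HL7 order-feed log and raise an alert when no new
    # messages have arrived for a while.
    import os
    import time

    LOG_PATH = "/var/log/interface_engine/orders.log"  # hypothetical feed log
    MAX_SILENCE_SECONDS = 15 * 60                       # alert after 15 quiet minutes

    def alert(message):
        # Replace with e-mail, paging, or your monitoring system's API.
        print(f"ALERT: {message}")

    last_size = os.path.getsize(LOG_PATH)
    last_change = time.time()

    while True:
        time.sleep(60)
        size = os.path.getsize(LOG_PATH)
        if size != last_size:
            last_size, last_change = size, time.time()
        elif time.time() - last_change > MAX_SILENCE_SECONDS:
            alert("No new HL7 orders seen in the last 15 minutes; check the RIS feed.")
            last_change = time.time()  # avoid repeating the alert every minute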
Being down does not have to result in panic. If proper procedures and methods are in place, everyone knows what to do, and you, as an imaging and IT professional, have time to fix the problem and get the system back up and running. Having the right infrastructure, architecture, and tools is essential as well. But system reliability is also a factor: if your system is down once a week, you might want to look for another vendor.