A quick link to, and extract from, an update posted on Wednesday on ESA's Science Directorate web pages; it gives a nice overview of the 'return to science' situation. References to our very own VMC camera activities are highlighted, and note the very nice comments on teamwork! Click the link to read the full report.
Full report via ESA Science & Technology
While full science operations have now been resumed, a number of tasks remain to be completed. Most important among these is the implementation of an on-board control procedure (OBCP) scheduler. This will enable the spacecraft to operate autonomously for up to a week, compared to the few days that are possible with the current FAST system. Work is also in hand to resume operation of the Visual Monitoring Camera (VMC), the 'Mars webcam'.
Enormous team effort
Completely redesigning the way in which Mars Express is controlled has involved an enormous amount of work for the mission control team at the European Space Operations Centre (ESOC), assisted by their counterparts at the European Space Astronomy Centre (ESAC), PI teams, other ESA experts and partners in industry. Everyone involved with the mission is extremely grateful for their hard work.
Although the 'Express' in Mars Express highlights that the mission was developed in a short time and with a relatively modest budget, the ability to resume full operations after a very serious failure shows that the resulting design is both robust and flexible.
Mars Express has now been restored to full operational capability, and its potential mission lifetime remains unchanged.
"Hang on, lads; I've got a great idea..."
The Italian Job, 1969
Last month, we spoke with several members of the Mars Express team here at ESOC about their nearly completed work to restore, reconfigure and return Mars Express to service.
An interview with Mars Express Spacecraft Operations Engineer Daniel Lakey
Spacecraft Operations Engineer Daniel Lakey sitting beside the SSMM
A black box measuring 30 cm along each edge lies at the centre of the recent trouble with Mars Express.
Daniel Lakey, an engineer working on the mission at ESOC in Darmstadt, looks down at the engineering model of the box sitting on his desk and recalls the seemingly endless night shifts he has had to pull because of its twin aboard Mars Express, orbiting the Red Planet many millions of kilometres away.
In mid-August 2011, Mars Express unexpectedly placed itself into safe mode – think blue screen of death and reboot on a PC – because something went wrong either with the Solid-State Mass Memory (SSMM) housed inside this black box or with the on-board channels it uses to pass data to the spacecraft’s data management system (DMS) computer.
To extend the PC analogy, imagine that the memory chips in your computer (the RAM) or the memory controllers that tell them what to do suffered a fault. The memory might continue to function, apparently normally, but whenever an electronic signal tried to access the faulty unit, the operation would fail and the system would crash. That, in essence, is what happened with Mars Express.
Holiday phone call at 3 AM
"I was on holiday in England when I got the call at three o'clock in the morning. Since I'm the mission's software coordinator, the problem fell within my area of responsibility," says Lakey.
When it switches into safe mode, the spacecraft automatically turns its solar panels to the Sun for maximum power and its antenna to Earth for good communication. This is a sensible response to any untoward situation, but the process uses a significant amount of vital fuel. Every unnecessary safe mode shortens the life of this hugely valuable mission, and while the spacecraft is in safe mode, normal gathering of scientific data stops.
An initial investigation found that the safe modes were being triggered by the DMS computer whenever a batch of commands transferred from the SSMM was interrupted.
The problem: big command batches were being interrupted, triggering a fuel-gobbling safe mode
The SSMM is a large-capacity device that stores large numbers of commands sent by mission controllers and instrument scientists, as well as raw data gathered by the instruments before being radioed back to Earth.
The SSMM then delivers a constant 'stream' of commands to the DMS computer, one at a time. Whenever the stream was interrupted, either by a fault in the SSMM or by some unknown problem with the on-board communication channels, the DMS computer detected the problem and automatically commanded the spacecraft to switch to safe mode.
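To make the failure mode concrete, here is a toy Python model of the original streaming arrangement. It is a sketch under our own assumptions, not the actual Mars Express flight software: the names `ssmm_stream`, `TransientLinkError` and `DmsStreaming` are hypothetical, and the command names are invented for illustration. Commands are executed one at a time as they arrive, so a glitch part-way through leaves an activity half done, and the model's response, like the real DMS computer's, is to fall back to safe mode.

```python
from dataclasses import dataclass, field


class TransientLinkError(Exception):
    """Hypothetical stand-in for a glitch on the SSMM-to-DMS link."""


def ssmm_stream(commands, fail_at=None):
    """Yield commands one at a time; raise at index fail_at to simulate a glitch."""
    for i, cmd in enumerate(commands):
        if i == fail_at:
            raise TransientLinkError(f"stream interrupted before command {i}")
        yield cmd


@dataclass
class DmsStreaming:
    executed: list = field(default_factory=list)
    safe_mode: bool = False

    def run(self, stream):
        try:
            for cmd in stream:
                self.executed.append(cmd)  # each command runs as soon as it arrives
        except TransientLinkError:
            # The rest of the to-do list may be missing, so stop and go to safe mode.
            self.safe_mode = True


dms = DmsStreaming()
dms.run(ssmm_stream(["slew", "hrsc_on", "hrsc_image", "hrsc_off"], fail_at=2))
print(dms.executed, dms.safe_mode)  # ['slew', 'hrsc_on'] True
```

Note that the first two commands of the imaging activity have already run when the interruption hits; that uncertainty about a half-executed to-do list is what the safe-mode response guards against.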
Taking action, but problems persist
At first, the flight control team executed the standard recovery procedures and restarted observations, hoping that Mars Express would function normally again.
But, frustratingly, two more safe modes occurred over the next few weeks, even though the engineers had tried switching on-board systems to back-up communication channels (there is only one SSMM), among many other standard fixes. Nothing in the routine procedures, it seemed, could prevent them.
"We had to find a solution," says Lakey, "otherwise the mission would have soon been over."
By late August, the team had already gone through many night shifts trying to coax the recalcitrant spacecraft into some sort of stable configuration, with little luck.
"But then one day an idea came to mind, while I was standing under the shower," says Lakey, with a laugh. "It occurred to me that, since something was interrupting the flow of commands and triggering the safe mode, the solution might lie in bypassing the checks between the SSMM and the DMS computer, and finding a safe way to ignore problems with the link between the two."
With a little checking, Lakey confirmed that the culprit was, in fact, 'transient communication problems' between the SSMM and the computer. "When the main computer sees this interruption, it interprets it as a serious problem and stops executing its 'To-Do' list of commands, because it doesn't know whether the list is complete," says Lakey.
Fortunately, there is another, back-up memory inside the DMS computer that could store the command stack, but it is far smaller than the SSMM, holding only 117 commands versus more than 3000.
So the engineers set about reconfiguring the spacecraft's systems to transfer commands from the SSMM to the onboard computer's memory in a different way. Rather than a constant stream of commands, one at a time, the commands would be transferred as a discrete block of commands relating to one complete spacecraft activity, just before that activity started.
"I thought we could use a trick: packing the commands into smaller stacks and telling the on-board software to act only when it had received a complete package. This 'all-or-nothing' scheme means we're no longer affected by the SSMM problems. We now have tighter limits on what we can schedule in one go, but that has proven to be acceptable."
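The 'all-or-nothing' scheme can be sketched as a toy Python model. This is a hypothetical illustration, not ESA's implementation: the package format (a declared command count plus a CRC32 checksum from Python's `zlib`) and the names `make_package` and `DmsPackaged` are our assumptions; only the 117-command limit comes from the article. Each package holds the commands for one complete activity, and the receiver executes it only if it arrived intact; otherwise it drops that one activity instead of triggering safe mode.

```python
import zlib
from dataclasses import dataclass, field

DMS_BUFFER_CAPACITY = 117  # back-up memory size quoted in the article


def _checksum(commands):
    """CRC32 over the joined command text (an assumed integrity check)."""
    return zlib.crc32("\n".join(commands).encode())


def make_package(activity, commands):
    """Bundle one complete activity with a declared count and checksum (hypothetical format)."""
    if len(commands) > DMS_BUFFER_CAPACITY:
        raise ValueError(f"{activity}: {len(commands)} commands exceed {DMS_BUFFER_CAPACITY}")
    return {"activity": activity, "count": len(commands),
            "crc": _checksum(commands), "commands": list(commands)}


@dataclass
class DmsPackaged:
    executed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def receive(self, package):
        cmds = package["commands"]
        intact = len(cmds) == package["count"] and _checksum(cmds) == package["crc"]
        if intact:
            self.executed.extend(cmds)  # the whole activity runs...
        else:
            self.rejected.append(package["activity"])  # ...or none of it does


dms = DmsPackaged()
image = ["slew", "hrsc_on", "hrsc_image", "hrsc_off"]
good = make_package("image_1", image)
bad = make_package("image_2", image)
bad["commands"] = bad["commands"][:2]  # simulate a transfer cut short by a link glitch
dms.receive(good)
dms.receive(bad)
print(dms.executed)  # ['slew', 'hrsc_on', 'hrsc_image', 'hrsc_off']
print(dms.rejected)  # ['image_2']
```

The trade-off Lakey describes shows up directly: a truncated transfer costs one activity rather than a fuel-consuming safe mode, but no single activity can exceed the size of the small back-up buffer.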
But would they buy it?
As soon as he could, Lakey presented the idea to his colleagues.
"Perhaps predictably, they reacted with an operations engineer's traditional caution and scepticism. The first answers were, 'No, no, that won't work. No way...' But, after a lot of discussion, they slowly came around to 'Oh wait... maybe we should look at this... it could work'," Lakey recalls.
The solution: make command batches small
With a clear consensus and the approval of Mars Express Spacecraft Operations Manager Michel Denis, the team set to work designing operations procedures that could be implemented using reduced command stacks, working first on a core set of basic on-board activities. This was a huge challenge.
As designed, Mars Express uses thousands of commands; for example, it takes up to 50 separate commands simply to take a single photo of Mars with the HRSC camera. The new, reduced command stacks would prove worthless if engineers couldn't actually fit useful activities into them.
Thus, making the solution work entails a massive amount of reprogramming to drastically reduce the number of commands needed to do anything on board. This work is what has kept the mission operations team on extended hours since November 2011.
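Why the reprogramming mattered can be shown with a little arithmetic. If a single HRSC image needs up to 50 commands, only two images fit in a 117-command stack (2 × 50 = 100, but 3 × 50 = 150). The toy model below assumes, hypothetically, that the low-level steps are stored on board as a named procedure, so the uplinked stack needs only one call per image; `ONBOARD_PROCEDURES`, `RUN_PROC` and the step names are invented for illustration and are not the real Mars Express command set.

```python
# Hypothetical on-board procedure library: each name maps to the low-level
# steps the spacecraft runs itself, so those steps use no command-stack slots.
ONBOARD_PROCEDURES = {
    "HRSC_IMAGE": [f"hrsc_step_{i}" for i in range(50)],  # "up to 50" per the article
}
DMS_BUFFER_CAPACITY = 117  # command-stack size quoted in the article
PROC_PREFIX = "RUN_PROC "  # hypothetical telecommand that starts a stored procedure


def stack_slots(commands):
    """Command-stack slots an uplinked sequence occupies: one per command."""
    return len(commands)


def expand_on_board(commands):
    """Return the low-level steps executed once procedure calls are expanded."""
    steps = []
    for cmd in commands:
        if cmd.startswith(PROC_PREFIX):
            steps.extend(ONBOARD_PROCEDURES[cmd[len(PROC_PREFIX):]])
        else:
            steps.append(cmd)  # ordinary commands pass through unchanged
    return steps


# Old style: every low-level step is uplinked, so three images need 150 slots.
raw = ONBOARD_PROCEDURES["HRSC_IMAGE"] * 3
print(stack_slots(raw), stack_slots(raw) <= DMS_BUFFER_CAPACITY)  # 150 False

# Procedure style: one slot per image, yet the same steps run on board.
calls = [PROC_PREFIX + "HRSC_IMAGE"] * 3
print(stack_slots(calls), expand_on_board(calls) == raw)  # 3 True
```

Moving the detail into stored procedures keeps the uplinked stack small without reducing what the spacecraft actually does, which is the effect the team describes when they say only a few commands are now needed per image.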
Smiles all around
But, to everyone's delight, the solution is working, and the team has substantially finished converting thousands of commands into on-board procedures, which allow Mars Express to be operated much more efficiently than its designers had ever envisaged.
"We are confident that all the Mars Express instruments and systems can be commanded using the reduced command stacks," says Lakey.
"Now, we only need a few commands to capture an image and we can switch on and operate all the instruments at one time," he explains.
"We can proudly say that Mars Express is working properly again and, with luck, the fuel left could last for another ten years."