The First Bug on Mars

In 1971, the USSR sent the first planetary rovers to Mars. They moved on skis, and their task was to probe the surface with a rod (housing a dynamic penetrometer and a radiation densitometer) to find out whether the Martian ground was solid or loose and dusty. The first probe crashed on November 27; the second soft-landed on December 2, but its rover never made it out of the “shell” of the lander, so that attempt didn’t count.

image1

Image from sci-fi film “The Martian”. The main character is carrying the Sojourner rover

Note. This article was originally published in Russian on habrahabr.ru. The original and translated versions are posted on our website with the permission of the author.

25 years later

On July 4, 1997, the U.S. Mars Pathfinder probe landed on Mars, bringing with it the Sojourner rover and the first bug on Mars.

The mission was at risk, but the powerful debugging facilities of the operating system and the professionalism of the programmers back on Earth (they really knew their subject) enabled NASA to fix the bug quickly.

Sojourner

image2

The mission’s cost was relatively small — $265 million.

The rover operated for 83 sols.

The rover’s name, “Sojourner”, is a word used in the Bible, where it means “traveler”. It was selected in an essay contest won by V. Ambroise, a 12-year-old from the U.S. state of Connecticut, and honors the abolitionist and women’s rights activist Sojourner Truth.

image3

Mission results:

  • 2.3 billion bits of information
  • 16,500 images taken by the lander
  • 550 images taken by the rover
  • 15 chemical analyses of rocks and soil
  • plenty of meteorological data
  • food for thought for software testers

Priority inversion

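Priority inversion occurs when a high-priority thread has to wait for a shared resource (such as a mutex) held by a low-priority thread, while a medium-priority thread that does not need the resource preempts the low-priority one. The high-priority thread ends up delayed by work of lower priority, as if the priorities had been swapped.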
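To make the effect concrete, here is a minimal sketch in C: a toy tick-based scheduler with three tasks, one of which (the low-priority one) grabs a shared mutex first. The task set, the tick counts and the function name `high_task_finish_time` are all invented for illustration and have nothing to do with the real Pathfinder task timings. The function reports when the high-priority task finishes, with and without priority inheritance.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MEDIUM, HIGH, NUM_TASKS };

typedef struct {
    int  base_prio;   /* larger number = more urgent */
    int  eff_prio;    /* may be raised temporarily by inheritance */
    int  release;     /* tick at which the task becomes ready */
    int  work;        /* ticks of CPU time still needed */
    bool needs_lock;  /* does the task do its work inside the mutex? */
    bool has_lock;
    bool blocked;     /* waiting for the mutex */
} Task;

/* Highest effective priority task that is released, unfinished, unblocked. */
static int pick(const Task *t, int now)
{
    int best = -1;
    for (int i = 0; i < NUM_TASKS; i++) {
        bool ready = t[i].release <= now && t[i].work > 0 && !t[i].blocked;
        if (ready && (best < 0 || t[i].eff_prio > t[best].eff_prio))
            best = i;
    }
    return best;
}

/* Hypothetical task set; returns the tick at which HIGH completes. */
int high_task_finish_time(bool inherit)
{
    Task t[NUM_TASKS] = {
        [LOW]    = {1, 1, 0, 3, true,  false, false},
        [MEDIUM] = {2, 2, 2, 5, false, false, false},
        [HIGH]   = {3, 3, 1, 2, true,  false, false},
    };
    int owner = -1;  /* task holding the mutex, -1 if it is free */

    for (int now = 0; now < 100; now++) {
        int run;
        /* A task that needs the mutex either takes it or blocks on it. */
        while ((run = pick(t, now)) >= 0 &&
               t[run].needs_lock && !t[run].has_lock) {
            if (owner < 0) {
                owner = run;
                t[run].has_lock = true;
                break;
            }
            t[run].blocked = true;
            /* Priority inheritance: the owner borrows the waiter's priority. */
            if (inherit && t[owner].eff_prio < t[run].eff_prio)
                t[owner].eff_prio = t[run].eff_prio;
        }
        if (run < 0)
            continue;              /* nothing to run this tick */
        if (--t[run].work > 0)
            continue;              /* task still has work left */
        if (t[run].has_lock) {     /* finished: release mutex, wake waiters */
            assert(owner == run);
            owner = -1;
            t[run].has_lock = false;
            t[run].eff_prio = t[run].base_prio;
            for (int i = 0; i < NUM_TASKS; i++)
                t[i].blocked = false;
        }
        if (run == HIGH)
            return now + 1;
    }
    return -1;  /* HIGH never finished within the simulated window */
}
```

With this made-up task set, `high_task_finish_time(false)` gives 10 ticks, because the medium task runs for its full 5 ticks while the high task sits blocked, and `high_task_finish_time(true)` gives 5, because the low task briefly runs at high priority and gets out of the way.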

image4

The lander was carrying a radiation-hardened IBM RISC 6000 Single Chip (RAD6000 SC) 20 MIPS CPU with 128 Mbytes of RAM and 6 Mbytes of EEPROM. The operating system used was VxWorks.

image5

The rover employed a 0.1 MIPS Intel 80C85 CPU with 512 Kbytes of RAM and 176 Kbytes of flash memory.

image7

Three tasks with different priorities waiting around on the 1553 bus.

When collecting meteorological data, the system hung and started to reset repeatedly. The engineers on Earth ran a duplicate of the software and got down to work figuring out what was wrong. After 18 hours of studying detailed logs, they found the cause of the malfunction.

image8

image9

They only had to fix a couple of mutex flags.

How the bug was fixed

No, we did not use the vxWorks shell to change the software (although the shell is usable on the spacecraft). The process of “patching” the software on the spacecraft is a specialized process. It involves sending the differences between what you have onboard and what you want (and have on Earth) to the spacecraft. Custom software on the spacecraft (with a whole bunch of validation) modifies the onboard copy. If you want more info you can send me email.

— Glenn Reeves, team leader of Mars Pathfinder software developer team

Those interested in details were invited to email the software author at glenn.e.reeves@jpl.nasa.gov.
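The process Reeves describes (send only the differences, validate, then modify the onboard copy) can be sketched roughly as follows. This is only an illustration of the idea: the `PatchRecord` layout, the use of a Fletcher-16 checksum and the function names are assumptions made for this example, not the actual JPL patch format or validation scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical patch record: "overwrite `length` bytes at `offset`". */
typedef struct {
    size_t         offset;
    size_t         length;
    const uint8_t *bytes;
} PatchRecord;

/* Fletcher-16 checksum, chosen here only because it is short. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (uint16_t)((a + data[i]) % 255);
        b = (uint16_t)((b + a) % 255);
    }
    return (uint16_t)((b << 8) | a);
}

/*
 * Applies the records to `image` only if:
 *   1. the onboard image matches what the ground expects (sum_before),
 *   2. every record stays inside the image,
 *   3. the patched result matches the image built on Earth (sum_after).
 * On any failure the onboard image is left untouched.
 */
bool apply_patch(uint8_t *image, size_t image_len,
                 const PatchRecord *recs, size_t count,
                 uint16_t sum_before, uint16_t sum_after)
{
    assert(image != NULL && recs != NULL);

    if (fletcher16(image, image_len) != sum_before)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (recs[i].offset > image_len ||
            recs[i].length > image_len - recs[i].offset)
            return false;
    }

    /* Work on a scratch copy so a bad patch cannot corrupt the original. */
    uint8_t *scratch = malloc(image_len ? image_len : 1);
    if (scratch == NULL)
        return false;
    memcpy(scratch, image, image_len);
    for (size_t i = 0; i < count; i++)
        memcpy(scratch + recs[i].offset, recs[i].bytes, recs[i].length);

    bool ok = fletcher16(scratch, image_len) == sum_after;
    if (ok)
        memcpy(image, scratch, image_len);
    free(scratch);
    return ok;
}
```

The point of checking both checksums is that the ground team and the spacecraft must agree on the starting image and on the end result; if either check fails, nothing onboard changes.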

How was the patch uploaded?

According to a widely circulated account of the incident, VxWorks contained a C language interpreter for executing statements on the fly during debugging, and the JPL engineers decided to launch the spacecraft with this feature still enabled. A short C program was uploaded to the spacecraft which, when interpreted, changed the value of the mutex flag for priority inheritance from false to true. No more system resets occurred!
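For readers who have never touched such a flag: in VxWorks, priority inheritance is requested with the `SEM_INVERSION_SAFE` option when a mutex semaphore is created with `semMCreate`. The flight code itself is not public, so the sketch below shows the equivalent switch in portable POSIX threads instead. The helper names `set_priority_inheritance` and `init_inheriting_mutex` are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Turns the "priority inheritance" flag on in a mutex attribute object. */
int set_priority_inheritance(pthread_mutexattr_t *attr)
{
    assert(attr != NULL);
    return pthread_mutexattr_setprotocol(attr, PTHREAD_PRIO_INHERIT);
}

/*
 * Creates a mutex whose owner temporarily inherits the priority of the
 * highest-priority thread waiting for it. Returns 0 on success or a
 * POSIX error number (for example if the platform lacks support).
 */
int init_inheriting_mutex(pthread_mutex_t *m)
{
    assert(m != NULL);

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = set_priority_inheritance(&attr);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Without the `pthread_mutexattr_setprotocol` call the mutex uses the default protocol (`PTHREAD_PRIO_NONE`), which is the POSIX counterpart of leaving the inheritance flag set to false.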

image10

Glenn Reeves, the engineer who found and fixed the bug, with a Mars Pathfinder duplicate in the background

The bug was found in preflight testing on Earth but was given a low priority.

Details

image11

A presentation by a Chinese expert

http://www.slideshare.net/jserv/priority-inversion-30367388

Conclusion

Glenn Reeves is very thankful to the engineers at Wind River for developing an operating system that made remote debugging possible even in an emergency like the one that arose during the mission. Interestingly, the engineering team knew about the bug, but “deadlines” and “priorities” force mission leaders to launch spacecraft while aware of unfixed “weak spots”.

By Aleksey Statsenko
