7.1 Design Applications of Risk Management

 

 

7.1.1 The Space Shuttle Orbiter Control Computers

 

• The space shuttle uses 5 computers for flight control. Four of them run the primary flight control system. The fifth runs a separate flight control program and is used only in the most dire emergencies. The 4 redundant computers operate separately and then compare outputs. The outputs should be identical; when they disagree, the computers can vote the conflicting one out.
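The compare-and-vote step can be sketched in a few lines. This is a simplified illustration, not the shuttle's actual software: it assumes outputs can be compared for exact equality and that a strict majority decides, and the names `vote` and `run_cycle` are hypothetical.

```python
from collections import Counter

def vote(outputs):
    """Return (majority_value, dissenting_ids) for a dict of computer id -> output.

    Raises ValueError when no value is held by a strict majority.
    """
    counts = Counter(outputs.values())
    value, n = counts.most_common(1)[0]
    if n * 2 <= len(outputs):
        raise ValueError("no strict majority among outputs")
    dissenters = sorted(cid for cid, out in outputs.items() if out != value)
    return value, dissenters

def run_cycle(active, outputs):
    """Vote among the active computers; return (command, remaining active ids)."""
    value, dissenters = vote({cid: outputs[cid] for cid in active})
    return value, [cid for cid in active if cid not in dissenters]

# Computer 3 disagrees with the other three and is voted out.
active = [1, 2, 3, 4]
command, active = run_cycle(active, {1: 10, 2: 10, 3: 11, 4: 10})
print(command, active)  # 10 [1, 2, 4]
```

With only two computers left, a disagreement produces no majority, and the sketch raises an error rather than guessing which one is right.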

 

 

7.1.2 A Mobile Service Robot for the Space Station

 

• The figure below depicts the SPDM (Special Purpose Dexterous Manipulator) for the planned space station.

[Figure: robot arm, to be included]

 

 

 

• All discussions in this section are based on the space station manipulator as described in SSP30000.

 

• The basic functions (at PMC) are classified as,

Category 1 - requires tolerance for two consecutive failures in each system - fail safe/fail operational - typically requires 1 prime + 1 redundant + 1 backup

Category 2 - requires tolerance for one failure in each system - failure tolerant - typically requires 1 prime + 1 backup

Category 2S - requires tolerance for one failure in the system - fail operational

 

• Examples of equipment in the different categories are,

• Category 1 - The orbiter is a time critical system

• Category 2 - MBS

• Category 2S - Safety monitoring and emergency control systems

 

• Recall the following hazard levels, also consider the control requirements,

 

 

• For the manipulator (SSRMS) hazards include,

• Criticality 1

- payload released without command

- possible collision

- payload cannot be released

- orbiter stuck to space station via SSRMS

- orbiter collides with space station because of failed capture (docking with SSRMS).

- motion of arm without command

- possible collisions

- no motion in arm in response to command

- orbiter stuck to space station via SSRMS.

 

• Dealing with failures,

- Criticality 1

- all functions must be safed within 250 ms of occurrence of fault

- Criticality 2

- report the failure as it occurs

- side effects are

- can’t report critical failure

- can’t safe a system

- can’t implement alternate operation
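The two response rules above can be written down as a small check. This is only a sketch: the function names are hypothetical, timestamps are assumed to be milliseconds on a common clock, and treating exactly 250 ms as acceptable is an assumption.

```python
SAFING_DEADLINE_MS = 250  # Criticality 1 limit quoted in the notes

def required_response(criticality):
    """Return the response the notes attach to a failure criticality level."""
    if criticality == 1:
        return "safe all functions within %d ms" % SAFING_DEADLINE_MS
    if criticality == 2:
        return "report the failure"
    raise ValueError("unknown criticality: %r" % (criticality,))

def safing_deadline_met(fault_ms, safed_ms):
    """True when safing completed no later than the deadline after the fault.

    Both arguments are timestamps in milliseconds on the same clock.
    """
    elapsed = safed_ms - fault_ms
    if elapsed < 0:
        raise ValueError("safing cannot precede the fault")
    return elapsed <= SAFING_DEADLINE_MS

print(safing_deadline_met(1000, 1180))  # True
print(safing_deadline_met(1000, 1300))  # False
```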

 

• Isolation - we want to estimate the percentage of failures that are prevented from reaching a specific module. Typical values are,

95% isolated through ORU

90% isolated by online BITs (Built-In Tests)

5% maximum false error indication rate
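One way to use these figures is to compare them against counts from a test campaign. The sketch below reads the percentages as targets on measured fractions. The function name, the argument names, and the definition of the false indication rate as false indications divided by all indications are assumptions, and the sample counts are made up for illustration.

```python
ORU_ISOLATION_TARGET = 0.95       # fraction of failures isolated through ORU
BIT_ISOLATION_TARGET = 0.90       # fraction of failures isolated by online BITs
MAX_FALSE_INDICATION_RATE = 0.05  # upper bound on false error indications

def isolation_report(failures, isolated_oru, isolated_bit,
                     indications, false_indications):
    """Return {metric: (measured_fraction, meets_target)} from raw counts."""
    if failures <= 0 or indications <= 0:
        raise ValueError("failures and indications must be positive")
    oru = isolated_oru / failures
    bit = isolated_bit / failures
    false_rate = false_indications / indications
    return {
        "oru": (oru, oru >= ORU_ISOLATION_TARGET),
        "online_bit": (bit, bit >= BIT_ISOLATION_TARGET),
        "false_indication": (false_rate, false_rate <= MAX_FALSE_INDICATION_RATE),
    }

# Illustrative counts only: 200 injected failures, 210 error indications.
report = isolation_report(200, 192, 178, 210, 10)
for metric, (value, ok) in report.items():
    print(metric, round(value, 3), "ok" if ok else "below target")
```

In this made-up sample the online BIT fraction is 178/200 = 0.89, so it misses the 90% target while the other two measures pass.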

 

• MSS Failure Tolerance Concept [Brimley]

 

 

• Failure Detection and Isolation Coverage Scheme [Brimley]

 

 

• MSS Failure Management Functional Interfaces [Brimley]

 

 

• Layered defense approach for Detection of Sensor Data Failures [Brimley]

 

 

 

 

• Failure tolerance

- fault tolerance

- single failure tolerant

- two failure tolerant for orbiter

- provide drive (EVA) for joint and LEE latch mechanisms

 

• Reconfigurations

- alternate data path/transmission

- reconfiguration time less than 271 seconds

 

• The purpose of these measures

- when a failure occurs, the software and hardware engineers must know what their systems are to do. Stating these requirements is the best way to get everyone to agree.

 

• Handling an operational failure of a computational unit may include,

- invoking off-line BIT checks with error checking algorithms

- operator visual inspections via cameras, etc.

- analysis of the unit's memory through data dumps, etc.

- ground support failure isolation analysis

- exercising equipment with known algorithms

 

• Note: in the case of SSRMS the operator may use EVA units to move the arm away from contact.

 

• Operators may always elect to replace failed units if spares are available.

 

• A diagram of the MSS Failure Management Concept is shown below, [Brimley]. This depicts a scheme for dealing with faults once they are detected. Some of the acronyms used are,

FD - Failure Detection

FI - Failure Isolation

C&W - Caution & Warning

CRIT - Failure Criticality

EVA - Extra Vehicular Activity

BIT - Built-In-Test

 

 

• There is also a scheme for determining when a system has erred. It is a bottom-up approach: the checks for errors are made in the specific modules, and the error reports are then propagated up to the high-level software/hardware. The diagram below depicts the system used in the SSRMS.
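The bottom-up idea can be illustrated with a small module tree. This is a generic sketch, not the SSRMS design: the `Module` class, the module names, and the error message are all hypothetical. Each module runs its own check, and reports travel upward with the path of the module that raised them.

```python
class Module:
    """A node in a hypothetical module hierarchy with an optional local check."""

    def __init__(self, name, check=None, children=()):
        self.name = name
        self.check = check            # callable returning an error string or None
        self.children = list(children)

    def collect_errors(self):
        """Gather reports from lower-level modules first, then add the local one."""
        reports = []
        for child in self.children:
            for path, msg in child.collect_errors():
                reports.append((self.name + "/" + path, msg))
        if self.check is not None:
            msg = self.check()
            if msg is not None:
                reports.append((self.name, msg))
        return reports

# A joint detects a problem locally; the top level sees it with its full path.
joint = Module("joint1", check=lambda: "encoder out of range")
lee = Module("lee", check=lambda: None)
arm = Module("arm", children=[joint, lee])
top = Module("mss", children=[arm])
print(top.collect_errors())  # [('mss/arm/joint1', 'encoder out of range')]
```

Keeping the checks in the low-level modules means the high-level software only has to decide what to do with a report, not how to detect the fault.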