22. Vision Systems


• Vision systems are suited to applications where simpler sensors do not work.


• Typical components in a modern vision system.



22.1 Applications


• An example of a common vision system application is given below. The basic operation involves a belt that carries pop (soda) bottles along. As each bottle passes an optical sensor, the sensor triggers the vision system to capture an image and compare it to stored images of acceptable bottles (with no foreign objects or cracks). If the bottle differs from the acceptable images by more than an allowable margin, a piston is fired to eject the bottle. (Note: without a separate sensor at the piston, the firing must be timed from the trigger.) Here a PLC is used as the controller - a common industrial solution. All of this equipment is available off-the-shelf ($10K-$20K). In this case the object lighting, backgrounds and contrast would be very important.




22.2 Lighting and Scenes


• There are certain features that are considered important in images,

- boundary edges

- surface texture/pattern

- colors

- etc


• Boundary edges are used when trying to determine object identity/location/orientation. This requires a high contrast between object and background so that the edges are obvious.


• Surface texture/pattern can be used to verify various features, for example - are numbered buttons in a telephone keypad in the correct positions? Some visually significant features must be present.


• Lighting,

- multiple light sources can reduce shadows (structured lighting).

- back lighting with luminescent screens can provide good contrast.

- lighting positions can reduce specular reflections (light diffusers help).

- artificial light sources provide the repeatability required by vision systems, which is not possible with natural light sources.


22.3 Cameras


• Cameras use available light from a scene.


• The light passes through a lens that focuses the beams on a plane inside the camera. The lens can be moved toward/away from this plane to keep the image in focus as the scene moves towards/away from the camera.


• An iris may also be used to mechanically reduce the amount of light when the intensity is too high.


• The plane inside the camera that the light is focussed on can read the light a number of ways, but basically the camera scans the plane in a raster pattern.


• An electron gun video camera is shown below. The tube works like a standard CRT: the electron beam is generated by heating a cathode to eject electrons, and a potential applied between the anode and cathode accelerates the electrons off of the cathode. The focussing/deflecting coils can focus the beam using a similar potential change, or deflect it using a differential potential. The significant effect occurs at the front of the tube, where the beam is scanned over the face. Where the beam is incident, electrons jump between the plates in proportion to the light intensity at that point. The scanning occurs in a raster pattern, covering many lines left to right, top to bottom. The pattern is repeated a number of times per second - the typical refresh rate is on the order of 30Hz.



• Charge Coupled Device (CCD) - This is a newer solid state video capture technique. An array of cells is laid out on a semiconductor chip. A grid-like array of conductors and insulators is used to move a collection of charge through the device. As the charge moves, it sweeps across the picture. As photons strike the semiconductor they knock an electron out of orbit, creating a negative and a positive charge (an electron-hole pair). The positive charges are then accumulated to determine light intensity. The mechanism for a single scan line is seen below.





• Color video cameras simply use colored filters to screen light before it strikes a pixel. For an RGB scan, the image is scanned three times, once for each color.



22.4 Frame Grabber


• A simple frame grabber is pictured below,



• These items can be purchased for reasonable prices, and will become standard computer components in the near future.



22.5 Image PreProcessing


• Images are basically sets of pixels, and are often less than perfect representations of the scene. By preprocessing, some unwanted variations/noise can be reduced, and desired features enhanced.


• Some sources of image variation/noise,

- electronic noise - this can be reduced by designing for a higher Signal to Noise Ratio (SNR).

- lighting variations cause inconsistent lighting across an image.

- equipment defects - these cause artifacts that are always present, such as stripes, or pixels stuck off or on.



22.6 Filtering


• Filtering techniques can be applied,

- thresholding

- Laplace filtering

- Fourier filters

- convolution

- histograms

- neighborhood averaging
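As one example from this list, neighborhood averaging replaces each pixel with the mean of its 3x3 neighborhood, which smooths out single-pixel noise. A minimal sketch, assuming a grayscale image stored as a list of lists of integers:

```python
def average_3x3(image):
    """Replace each pixel with the integer mean of its 3x3 neighborhood.

    Pixels outside the image are simply excluded from the average,
    so border pixels average over fewer neighbors.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        total += image[nr][nc]
                        count += 1
            out[r][c] = total // count
    return out

# A single bright noise pixel is spread out and attenuated.
print(average_3x3([[0, 0, 0], [0, 9, 0], [0, 0, 0]]))
```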



22.6.1 Thresholding


• Thresholding basically sets a transition value. If a pixel is above the threshold, it is switched fully on, if it is below, it is turned fully off.
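A minimal sketch of this operation in Python, assuming an 8-bit grayscale image stored as a list of lists (the threshold value of 100 matches the practice problem later in these notes):

```python
def threshold(image, level):
    """Return a binary image: 1 where a pixel is at or above level, else 0."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in image]

image = [
    [ 20, 150, 200],
    [ 90, 120,  30],
    [180,  60, 110],
]
print(threshold(image, 100))
# [[0, 1, 1], [0, 1, 0], [1, 0, 1]]
```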




22.7 Edge Detection


• An image (already filtered) can be checked to find a sharp edge between the foreground and background intensities.


• Let’s assume that the image below has been prefiltered into foreground (1) and background (0). An edge detection step is then performed.
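One simple way to perform this step (a sketch, not necessarily the exact method in the figure): mark any foreground pixel that has at least one background 4-neighbour, or lies on the image border, as an edge pixel.

```python
def edges(binary):
    """Mark foreground (1) pixels that touch the background (4-connectivity)."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                # Outside the image counts as background.
                if not (0 <= nr < rows and 0 <= nc < cols) or binary[nr][nc] == 0:
                    out[r][c] = 1
                    break
    return out

block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Only the 8 border pixels of the 3x3 block are marked; the center is not.
print(edges(block))
```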




22.8 Segmentation


• An image can be broken into regions that can then be used for later calculations. In effect this method looks for distinct self-contained regions, and uses region numbers instead of pixel intensities.



• A simple segmentation algorithm might be,

1. Threshold the image to have values of 1 and 0.

2. Create a segmented image and fill it with zeros (set the segment number variable to one).

3. Scan the old image left to right, top to bottom.

4. If a pixel value of 1 is found, and the corresponding pixel is 0 in the segmented image, do a flood fill from that pixel onto the new image using the segment number variable.

5. Increment the segment number and continue the scan from step 3.

6. Scan the segmented image left to right, top to bottom.

7. If a background pixel is found that is fully contained within a segment (a hole), flood fill it with a new segment number as in steps 4 and 5.
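The raster-scan and flood-fill steps above can be sketched in Python. This is a minimal version covering the foreground labelling (steps 1-5), assuming a pre-thresholded binary image and 4-connected regions; hole detection (steps 6-7) is omitted:

```python
from collections import deque

def segment(binary):
    """Label 4-connected foreground regions with numbers 1, 2, ..."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]          # segmented image of zeros
    next_label = 1                                      # segment number variable
    for r in range(rows):                               # raster scan
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                # Flood fill this region with the current segment number.
                queue = deque([(r, c)])
                labels[r][c] = next_label
                while queue:
                    cr, cc = queue.popleft()
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
                next_label += 1                         # then continue scanning
    return labels

# Two separate columns of pixels become segments 1 and 2.
print(segment([[1, 0, 1], [1, 0, 1], [0, 0, 1]]))
```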



22.8.1 Segment Mass Properties


• When objects are rotated in the vision plane it may become difficult to use simple measures to tell them apart. At this point global attributes, such as perimeter lengths, length/width ratios, or areas can be used.


• The centroid of a mass can be determined with the expression for the x direction (y is identical)



• Area is simply the sum of all pixels in the segment,



• Perimeter is the number of pixels that can be counted around the outside of an object.



• Compactness can be a measure of mass distribution,



• Another measure of mass distribution is thickness,
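The measures above can be sketched in code for a binary segment image. Since the notes' equations are not reproduced here, common textbook definitions are assumed: area as the pixel count, centroid as the mean pixel coordinate, perimeter as the count of foreground pixels touching the background, and compactness as perimeter squared over area (thickness is omitted because its definition is not shown):

```python
def mass_properties(binary):
    """Compute (area, centroid, perimeter, compactness) of the 1-pixels."""
    rows, cols = len(binary), len(binary[0])
    pixels = [(r, c) for r, row in enumerate(binary)
              for c, v in enumerate(row) if v == 1]
    area = len(pixels)
    cx = sum(c for _, c in pixels) / area     # centroid, x direction
    cy = sum(r for r, _ in pixels) / area     # centroid, y (identical form)

    def on_boundary(r, c):
        # A pixel is on the perimeter if any 4-neighbor is background
        # or lies outside the image.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or binary[nr][nc] == 0:
                return True
        return False

    perimeter = sum(1 for r, c in pixels if on_boundary(r, c))
    compactness = perimeter ** 2 / area       # assumed definition
    return area, (cx, cy), perimeter, compactness

# A solid 3x3 block: area 9, centroid at its middle, perimeter 8.
print(mass_properties([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))
```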




22.9 Recognition


22.9.1 Form Fitting


• It can sometimes help to relate a shape to some other geometric primitive using compactness, perimeter, area, etc.

- ellipse

- square

- circle

- rectangle



22.9.2 Decision Trees


• In the event that a very limited number of parts is considered, a decision tree can be used. The tree should start with the most significant features first, then eventually make decisions on the least significant. Typical factors considered are,

- area

- hole area

- perimeter

- maximum, minimum and average radius

- compactness


• An example of a decision tree is given below. (Note: this can be easily implemented with if-then rules or Boolean equations)
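As the note suggests, such a tree reduces directly to if-then rules. A hypothetical sketch - the part names and cutoff values here are invented for illustration, and real cutoffs would be measured from sample parts:

```python
def classify(area, hole_area, compactness):
    """Decide a part type, testing the most significant feature first."""
    if area > 50:                # large parts first
        if hole_area > 0:
            return "washer"
        return "plate"
    if compactness < 14:         # near-circular (a circle gives ~4*pi = 12.6)
        return "disk"
    return "bracket"

print(classify(area=60, hole_area=5, compactness=13))   # washer
print(classify(area=10, hole_area=0, compactness=20))   # bracket
```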




Bar Codes


• Bar codes are a common way to encode numbers, and sometimes letters.


• The code is sequential left to right, and is characterized by bars and spaces of varied widths. The bar widths correspond to numerical digits, which are then mapped to ASCII characters.


• To remain noise resistant, unused codes are left in the numerical sequence. If any scanned value is one of the unused codes, the scan is determined to be invalid.


• There are different encoding schemes.

Code 39/Codabar - these use bars of two different widths for binary encoding

Code 128 - this uses proportional bar widths to encode a range of values

UPC (Universal Product Code) -

EAN (European Article Numbering) -


• The example below shows how a number is encoded with a bar code.




22.10 Practice Problems


1. Consider a circle and an ellipse that might be viewed by a vision system. The circle has a 4” radius, whereas the ellipse has a minor and major radius of 2” and 4”. Compare the two shapes using form factors (compactness and thickness) and show how they differ.
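A sketch of the comparison, assuming compactness is defined as perimeter squared over area (the notes' exact formula is not reproduced here, so the absolute numbers depend on that assumption; thickness is omitted for the same reason). The ellipse perimeter uses Ramanujan's approximation:

```python
import math

def compactness(perimeter, area):
    # Assumed definition: a circle gives the minimum value, 4*pi.
    return perimeter ** 2 / area

# Circle, radius 4"
r = 4.0
circle_c = compactness(2 * math.pi * r, math.pi * r ** 2)   # exactly 4*pi

# Ellipse, semi-axes 4" and 2" (Ramanujan's perimeter approximation)
a, b = 4.0, 2.0
ellipse_p = math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
ellipse_c = compactness(ellipse_p, math.pi * a * b)

# The ellipse is less compact (larger value) than the circle.
print(round(circle_c, 2), round(ellipse_c, 2))   # 12.57 14.94
```

Note that with this definition compactness is scale-invariant, so it separates the two shapes regardless of how large they appear in the image.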



2. Describe image resolution in vision systems.


ans. Resolution of a video image describes the number of rows and columns of pixels in a video image. A higher resolution means that there are more rows of pixels in the images, and therefore we can distinguish smaller details.


3. An image has been captured from a video camera, and stored in the matrix below.



a) Use a threshold of 100 to filter the image.



b) Perform an edge detection on the thresholded image.



c) Segment the image into distinct regions.



d) Calculate the compactness and thickness for the region above the threshold.



e) Calculate form factors including perimeter, area, centroid, compactness and minimum and maximum thickness.


4. We have four part shapes (as listed below) that will be arriving on a conveyor. We want to develop a decision tree for the vision system to tell them apart. We also need to find their centroids relative to the top left of the image so that a robot may pick them up.

Isosceles triangle 6” per side

Rectangle 2” by 8”

Triangle with side lengths 8”, 5” and 4”

Circle 5” Radius