Home > CSc-387 Parallel Processing

CHAPTER 11 : IMAGE PROCESSING ALGORITHMS

========================================

11.1. Low-level Image Processing

--------------------------------

- Gray scale: 0 (black) - 255 (white), pixels are represented in 8 bits

- Color images: (R)ed, (G)reen, (B)lue 8 bits each. 24 bits total

or pixel value may be an index to a look-up table which stores colors

- assume that the stored image uses a coordinate system as shown in:

(* T Figure 11.1 *)

- Low-level image operations: embarrassingly parallel

e.g. noise reduction, edge detection, object labelling, template matching

Computational Requirements: 1024x1024 pixels ==> 2^20 (~10^6) = 1 MBytes

--------------------------

suppose each pixel must be operated once, using a computer with

speed = 10^{-8} sec/oper, it takes 10 ms to process 2^20 pixels.

In real time, we need to process 60-85 frames/second (1 frame in 12-16 ms).

Typically, many high-complexity operations must be performed on the

image. Therefore, computational requirements are very high and often

special purpose hardware is used.

POINT PROCESSING:

-----------------

embarrassingly parallel.

THRESHOLDING: if (Xi < threshold) Xi=0; else Xi=1

CONTRAST STRETCHING: map the interval (x_l, x_h) =====> (X_L, X_H)

X_L ............................ X_H

\ \ /

\ \ /

\ \ /

\ \ /

x_l ....x_i.... x_h

x_i = (x_i - x_l)*[(X_H - X_L)/(x_h - x_l)] + X_L

GREY LEVEL REDUCTION: reduce the number of bits to represent the pixels

in order to reduce storage requirements

11.3 HISTOGRAM

============== a global operation (all pixels are used)

mainly used before a thresholding operation

(* T Figure 11.2 *)

for(i=0; i < height_max; i++)

for(j=0; j < width_max; j++) hist[p[i][j]]++ ;

To do it in parallel, processors locally update their histogram arrays

and then perform a Vector Global Sum (global sum for each vector entry).

or send it to a master processor for a global update.

11.4 SMOOTHING, SHARPENING, and NOISE REDUCTION

===============================================

- local operations which require the value of neighboring pixels

- Smoothing suppresses large fluctuations in intensity by reducing the

high frequency content

- Sharpening accentuates the transitions, enhancing the detail. Two ways:

reduce low-freq. content or accentuate changes through differentiation

- Noise reduction: suppresses noise through averaging and median filtering

Smoothing:

----------

new value for x4 (x4') is obtained by taking the average of 9 pixels

in the 3x3 region around x4:

(* T Figure 11.3 *)

new value for x4 can also be obtained by finding the MEDIAN of the

pixels. This obviously requires sorting 9 values. Efficient

approximate sorting strategies (i.e. shearsort) can be employed.

Parallel Code: suppose one processor is assigned for each pixel.

-------------- Then, each PE perfroms 4 communications with

its L, R, U, and D neighbors to get neighboring pixel values:

(* T Figure 11.4 *) (* T Figure 11.5 *)

After the values are obtained, new pixel values can be computed.

This is an embarrassingly parallel application and it is suitable

for synchronous computation in lock-step mode (SIMD type).

Weighted Masks

--------------

Operations described so far can be generalized in the form of a

"weighed mask operation" as shown in:

(* T Figure 11.7 *)

For example, mask for computing the mean is: (* T Figure 11.8 *)

Mask for noise reduction: (* T Figure 11.9 *)

A High-pass sharpening filter mask: (* T Figure 11.10 *)

===================

11.5 EDGE DETECTION

===================

Edge Detection using differentiation: (* T Figure 11.11 *)

Two important parameters to determine: (* T Figure 11.12 *)

Gradient (magnitude) = Df = SQRT[(df/dx)^2 + (df/dy)^2]

Gradient Direction = Theta(x,y) = Tan^{-1}[(df/dy) / (df/dx)]

- Gradient (magnitude) can be approximated to |df/dx| + |df/dy|

Edge Detection Masks

=====================

For discrete functions, the derivatives are approximated by

differences:

df/dx = (x5 - x3) df/dy = (x7 - x1)

Gradient = Df = |x5 - x3| + |x7 - x1|

Prewitt Operator:

-----------------

Better results can be obtained by using more pixel values.

For example:

(* T Figure 11.13 *)

df/dy = (x6 - x0) + (x7 - x1) + (x8 - x2)

df/dx = (x2 - x0) + (x5 - x3) + (x8 - x6)

Df = |df/dy| + |df/dx|

Sobel Operator:

--------------- (* T Figure 11.14 *)

df/dy = (x6 + 2x7 + x8) - (x0 + 2x1 + x2)

df/dx = (x2 + 2x5 + x8) - (x0 + 2x3 + x6)

Edge Detection Example (with Sobel operator): (* T Figure 11.15 *)

Laplace Operator: second-order derivative which can be approximated to:

-----------------

(* T Figure 11.16 *) (* T Figure 11.17 *)

D^2 f = 4.x4 - (x1+x3+x5+x7)

Notice that this computation will yield to zero for a constant change

in gray level across the mask.

- Usually, a threshold is applied after the edge operators to get

either black or white pixels:

(* T Figure 11.18 *) shows the result of Laplace Operator.

11.6 THE HOUGH TRANSFORM

========================

- Edge detection described so far identifies the pixels which might be

on an edge, but it does not tell much about how they connect to each

other to make an edge. Hough transform tries to do exactly that.

Line equation: y = ax + b b = -xa + y

- A single pixel at (x1, y1) has an infinite number of lines that

could pass through it.

- If a and b are discretized, a finite set of lines will exist.

(* T Figure 11.19 *)

- There will be a common point (S, T) in the parameter space (a-b)

into which all the points that lie on a specific line in the x-y

space will map. These pixels may also map onto other points in (a-b)

space, but (S, T) will get the most hits.

If we keep a count of these hits, then those (a, b) values with most

points mapped onto will be the best candidates for edges in the image.

- However, in order to make this method to work with

vertical/horizontal lines, we need to rewrite the line

equation in polar coordinates:

y = ax + b ============> (polar) r = xcos(A) + ysin(A)

where r is the perpendicular distance from the origin to the line,

and (A) is the angle between r and x-axis.

(* T Figure 11.20 *)

Implementation:

---------------

- r is always positive, (A) is between 0 - 360 degrees

- The parameter space is divided into small rectangular regions.

r is divided into increments of 5 and (A) increments of 10 degrees.

An accumulator is provided for each rectangular region:

(* T Figure 11.22 *)

- Accumulators of those regions that a pixel maps into are incremented.

Therefore, the computation time will depend upon the number of regions.

- If there are n pixels and k discrete values of (A) are tried,

then complexity is O(kn)

- To reduce the computational complexity, we might make k smaller.

If only the Gradient Angle at that pixel is used, then k=1.

In this case, time complexity for n pixels is O(n).

SEQUENTIAL CODE

---------------

for(x=0; x < Xmax; x++)

for(y=0; y < Ymax; y++) {

sobel(x, y, dx, dy);

magnitude = grad_mag(dx, dy);

if(magnitude > treshold) {

Theta = grad_dir(dx, dy); /* atan2() function */

Theta = Theta_quantize(Theta);

r = x * cos(Theta) + y * sin(Theta);

r = r_quantize(r);

acc[r][Theta]++; /* increment accumulator */

append(r, Theta, x, y); /* append point to line */

}

}

- In the end, the most likely lines are those with the most pixels,

and accumulators with local maxima are chosen.

PARALLEL CODE

-------------

Given

P: number of processors N*N : Image size

M1: number of discrete intervals used for r

M2: number of discrete intervals used for Theta

If [ (M1 > M2) and (P < N) and (P < M1) ]

Then a PARALLEL ALGORITHM with time complexity O(N^2/P + M1*M2*logM1)

can be developed to perform Hough Transform on the given image:

- Computation for each accumulator is independent of the others.

Image could be sliced into strips and assigned onto processors.

Each PE locally updates the accumulators corresponding to the pixels

in its own region. When done, accumulators must be merged.

Creative solutions could be adapted for the final merge step.

- One simple solution is that a Master PE could collect all the

results and add them up to form a Global Accumulator Matrix.

- Or PEs may use a Binary Tree approach for merging the local

matrices. This takes O(M1*M2*logP) where M1 and M2 are the number

of discrete intervals used for r and (Theta) respectively.

Hence M1*M2 is the size of the Accumulator Matrix.

After the matrices are merged, entries must be analyzed to find the

local maximas. This can also be done through sorting which takes

O(M1*M2*logM1*M2) = O(M1*M2*(logM1 + logM2))

- How about a special mergeANDsort operation in which processors try

to sort their accumulator values? Whenever they come across the

same accumulator index, they reduce them into one value, and

participate in the sort as one value instead of two separate values.

- Indeed, this can be a nice term project for those who are interested!

11.7.2 DISCRETE FOURIER TRANSFORM (DFT)

=======================================

Given N real discrete values x0, x1, ..., x_(N-1), DFT is computed as:

X_k = 1/N * SUM_{j=0}^{N-1} xj * e^{-2*PI*ijk/N} for k = 0,1,...N-1

Inverse DFT is:

x_k = SUM_{j=0}^{N-1} Xj * e^{2*PI*ijk/N} for k = 0,1,...N-1

- X_k s are complex values whereas xk's are real.

- Note: by definition, e^{iA} = cosA + i*sinA

- Sometimes the DFT is given without the scale factor 1/N

ONE APPLICATION: Earlier, Frequency Filtering was achieved with

weighted masks, which can be described by the CONVOLUTION operation:

h(j, k) = g(j, k) * f(j, k)

where g(j, k) describes the weighted mask and f(j, k) describes the image.

The CONVOLUTION of two functions can be obtained by taking the Fourier

transforms of each function (H(j,k), G(j,k), F(j,k)) and multiplying them:

H(j, k) = G(j, k) * F(j, k) (* T Figure 11.24 *)

- Doing it this way does not reduce the time complexity but working in

Frequency domain has other advantages, e.g. eliminating Very High

and Very Low frequencies, etc.

IMPLEMENTATION of DFT:

======================

- Sequential Time Complexity for computing DFT of N values: O(N^2)

w = e^{-2*PI*i/N};

for (k=0; k < N; k++) /* for every point */

X[k] = 0;

for (j=0; j < N; j++) /* compute summation */

X[k] = X[k} + w^{j*k} * x[j] ;

- Note that the summation step requires complex number arithmetic.

Since w^{(j-1)*k}*w^k = w^{j*k}, the code could be rewritten as:

w = e^{-2*PI*i/N};

for (k=0; k < N; k++) { /* for every point */

X[k] = 0; a = 1;

for (j=0; j < N; j++) { /* compute summation */

X[k] = X[k} + a * x[j] ;

a = a * w^k ;

}

}

PARALLEL IMPLEMENTATION of DFT:

--------------------------------

- N DFTs and N processors. Each PE computes one DFT. Pi computes Xi

- Each PE needs a copy of N discrete values x0, x1, ..., x_(N-1)

- Therefore, Tpar = O(N) with N processors

- If P << N, then each PE computes N/P DFTs.

FAST FOURIER TRANSFORM (FFT)

============================

- FFT is a fast method of obtaining DFT

- has a time complexity of O(NlogN) instead of O(N^2)

- In the following, it will be assumed that N is a power of 2.

- It can be shown that, DFT for Xk can be rewritten as:

Xk = 1/2 [ X_even + w^k * X_odd ] for k = 0,1, ..., N-1

X_{k+N/2} = 1/2 [ X_even + w^{k+N/2} * X_odd ]

= 1/2 [ X_even - w^k * X_odd ]

(because w^{N/2} = -1)

where X_even is the N/2 point DFT of the numbers with even indices

(x0, x2, x4, .....) and X_odd is the N/2 point DFT of the numbers

with odd indices (x1, x3, x5, .....).

(* T Figure 11.28 *)

- Using the above formula in a recursive manner, we can further

represent X_even and X_odd in terms of 2 summations each.

This yields a recursive divide-and-conquer approach to computing FFT:

(* T Figure 11.29 *)

(* T Figure 11.30 *)

(* T Figure 11.31 *)

SEQUENTAL IMPLEMENTATION of FFT:

--------------------------------

There are logN phases in the computation of FFT (see Figure 11.31)

each phase taking O(N) time. Therefore,

Tseq = O(NlogN)

PARALLEL IMPLEMENTATION of FFT:

--------------------------------

- N FFTs and N processors. Each PE computes one FFT.

- Pi computes Xi

- Unlike the DFT algorithm, PEs do not need to access all of the N

discrete values x0, x1, ..., x_(N-1) at once. At each step PEs

need only two values to compute (one stored locally another one

received from the neighbor processor which differs in only one bit.

- Processors run a Binary Exchange Algorithm in which they interact

(send/receive) with those processors who differ in exacly one

bit. At each phase the differing bit changes moving from left to

right.

If P=N then Tpar = O(logN) (* Linear speedup *)

- If P << N, then each PE computes N/P FFTs. After logP steps, they

can find the data they need locally. In this case,

Tpar = O(NlogN/P) (* Linear speedup *)

- PA 551, Professor Stipak
- cache:v3fyc5rlasij:www.edu.cn - Google Search
- cache:vrifemoq2-wj:www.edu.cn - Google Search
- It is forty years since Ken Kesey's One Flew Over the Cuckoo's Nest first came out
- BAB I PELATIHAN PENELITIAN TINDAKAN KELAS BAGI GURU SMP DALAM PEMBELAJARAN MENGGUNAKAN PENDEKATAN CTL Oleh Roswita Lumban Tobing, M.Hum Dwiyanto Djoko Prano
- cache:o6CfBK-TZPsJ.gslk?8835168:www.edu.cn - Google Search
- cache:ncn8up6c-eij:www.edu.cn - Google Search
- MASTER IN INTERNATIONAL ECONOMIC LAW AND POLICY (IELPO)
- THE DEVELOPMENT OF NEURAL NETWORK
- Università di Messina
- Prolonged suckling period in organic piglet production – Effects on selected immunological parameters
- cache:nozcfh-ysfqj:www.edu.cn - Google Search
- A Quantitative Analysis of Critical Factors Encompassing Economic Growth
- Recursion
- Complex seismic trace analysis and its application to time-lapse seismic surveys
- CX3002
- cache:y1OCyEyrIJ:www.edu.cn - Google Search
- China Net Assessment
- aa aa ab ad ae ag ah ai al am an ar as at aw ax ay ba be bi bo by da de do ed ef eh el em en er es et ex fa go ha he hi hm ho id if in is it jo ka la li
- Introduction
- Test #1

- refactoryepilepsy
- ARIZONA
- biocarrier
- Soil plant ecological system
- Cache
- adametz
- video quality
- inducing differentiation
- Buku ips kelas 8
- Similar solutions
- Ultrasound.html?1587723
- cementoblast
- Upconversion
- biocompatibility
- LY294002
- Public key factor
- ppt.html%3fthread-apsodx
- luciferase
- Fuzzy System
- Monitoring and control .
- complicity

All Rights Reserved Powered by Free Document Search and Download

Copyright © 2011This site does not host pdf,doc,ppt,xls,rtf,txt files all document are the property of their respective owners. complaint#downhi.com