2011-11-11


Chapter 1. Introduction

Introduction to neural networks

1.1 What is a Neural Network?

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.
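To make the analogy concrete, here is a minimal sketch of a single artificial neuron in Python: the weights play the role of synaptic connections, and learning adjusts them in response to errors. The sigmoid activation, learning rate, and update rule are illustrative choices, not the only possibilities.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of inputs
    (the 'synaptic connections') passed through a sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

def train_step(inputs, weights, bias, target, lr=0.1):
    """One learning-by-example step: nudge each weight in proportion
    to the output error, analogous to adjusting a synapse."""
    output = neuron(inputs, weights, bias)
    error = target - output
    new_weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + lr * error
```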

1.2 Why use neural networks?

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an “expert” in the category of information it has been given to analyze. This expert can then be used to provide projections given new situations of interest and answer “what if” questions.

Other advantages include:

Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.

Self-Organization: An ANN can create its own organization or representation of the information it receives during learning time.

Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.

Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

1.3 Neural networks versus conventional computers

Neural networks take a different approach to problem solving than that of conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don’t exactly know how to do.

Neural networks process information in a way similar to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.

On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small, unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.

Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency.

Neural networks do not perform miracles. But if used sensibly they can produce some amazing results.

2.1 Co-Evolution of Neural Networks for Control of Pursuit & Evasion

The following MPEG movie sequences illustrate behavior generated by dynamical recurrent neural network controllers co-evolved for pursuit and evasion capabilities. From an initial population of random network designs, successful designs in each generation are selected for reproduction with recombination, mutation, and gene duplication. Selection is based on measures of how well each controller performs in a number of pursuit-evasion contests. In each contest a pursuer controller and an evader controller are pitted against each other, controlling simple “visually guided” 2-dimensional autonomous virtual agents. Both the pursuer and the evader have limited amounts of energy, which is used up in movement, so they have to evolve to move economically. Each contest results in a time-series of position and orientation data for the two agents.

These time-series are then fed into a custom 3-D movie generator. It is important to note that, although the chase behaviors are genuine data, the 3D structures, surface physics, and shading are all purely for illustrative effect.
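As a rough illustration of the evolutionary loop described above (selection, recombination, and mutation; gene duplication is omitted for brevity), a sketch in Python might look like the following. Population handling, rates, and the elitism scheme are assumptions for illustration, not the authors' actual setup.

```python
import random

def mutate(genome, rate=0.05):
    """Point mutation: jitter each weight with probability `rate`."""
    return [g + random.gauss(0, 0.1) if random.random() < rate else g
            for g in genome]

def recombine(a, b):
    """One-point crossover between two parent genomes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def next_generation(population, fitness, elite=2):
    """Select controllers by their pursuit-evasion contest scores,
    then reproduce with recombination and mutation."""
    ranked = [g for _, g in sorted(zip(fitness, population),
                                   key=lambda p: p[0], reverse=True)]
    children = ranked[:elite]  # keep the best designs unchanged
    while len(children) < len(population):
        mum, dad = random.sample(ranked[:len(ranked) // 2], 2)
        children.append(mutate(recombine(mum, dad)))
    return children
```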

1. The pursuer is not very good at pursuing, and the evader is not very good at evading.

2. Pursuer chases evader, but soon runs out of energy, allowing the evader to escape.

3. Pursuer chases evader, but uses up all its energy just before the evader runs out of energy.

4. After a couple of close shaves, the pursuer finally catches the evader.

2.2 Learning the Distribution of Object Trajectories for Event Recognition

This research work is about the modeling of object behaviors using detailed, learnt statistical models. The techniques being developed will allow models of characteristic object behaviors to be learnt from the continuous observation of long image sequences. It is hoped that these models of characteristic behaviors will have a number of uses, particularly in automated surveillance and event recognition, allowing the surveillance problem to be approached from a lower level, without the need for high-level scene/behavioral knowledge. Other possible uses include the random generation of realistic looking object behavior for use in Virtual Reality, and long-term prediction of object behaviors to aid occlusion reasoning in object tracking.

1. The model is learnt in an unsupervised manner by tracking objects over long image sequences, and is based on a combination of a neural network implementing Vector Quantization and a type of neuron with short-term memory capabilities (a minimal sketch of this combination follows below).

Learning mode

2. Models of the trajectories of pedestrians have been generated and used to assess the typicality of new trajectories (allowing the identification of ‘incidents of interest’ within the scene), predict future object trajectories, and randomly generate new trajectories.

Predict mode
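A minimal sketch of how the two ingredients of point 1 might combine: competitive-learning Vector Quantization builds a codebook of prototype positions, and a leaky memory trace over the prototypes stands in for the short-term-memory neurons. Codebook size, learning rate, and decay are illustrative assumptions.

```python
import numpy as np

def learn_codebook(points, k=64, lr=0.05, epochs=10, seed=0):
    """Competitive-learning Vector Quantization: each observed point
    pulls its nearest prototype towards itself."""
    rng = np.random.default_rng(seed)
    codebook = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(epochs):
        for p in points:
            winner = np.argmin(((codebook - p) ** 2).sum(axis=1))
            codebook[winner] += lr * (p - codebook[winner])
    return codebook

def trajectory_trace(trajectory, codebook, decay=0.9):
    """Leaky activation over prototypes: a crude stand-in for
    short-term-memory neurons summarising where a trajectory has been,
    which can then be compared against learnt typical traces."""
    trace = np.zeros(len(codebook))
    for p in trajectory:
        winner = np.argmin(((codebook - p) ** 2).sum(axis=1))
        trace = decay * trace
        trace[winner] += 1.0
    return trace
```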

2.3 Radiosity for Virtual Reality Systems (ROVER)

The synthesis of actual and computer-generated photo-realistic images has been the aim of artists and graphic designers for many decades. Some of the most realistic images (see Graphics Gallery – simulated steel mill) were generated using radiosity techniques. Unlike ray tracing, radiosity models the actual interaction between the lights and the environment. In photo-realistic Virtual Reality (VR) environments, the need for quick feedback based on user actions is crucial. It is generally recognised that the traditional implementation of radiosity is computationally very expensive and therefore not feasible for use in VR systems, where practical data sets are of huge complexity. In the original thesis, we introduce two new methods and several hybrid techniques to the radiosity research community on using radiosity in VR applications.
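For background, classical radiosity solves, for every surface patch i, the system B_i = E_i + ρ_i Σ_j F_ij B_j (radiosity equals emission plus reflected gathered light). A minimal Jacobi-style gathering iteration, assuming the form-factor matrix F has already been computed (the step whose cost makes traditional radiosity impractical for VR), is sketched below.

```python
import numpy as np

def solve_radiosity(emission, reflectance, form_factors, iters=50):
    """Iterate B = E + rho * (F @ B) until it settles.
    form_factors[i, j]: fraction of energy leaving patch i that
    reaches patch j (precomputed; the expensive part)."""
    radiosity = emission.copy()
    for _ in range(iters):
        radiosity = emission + reflectance * (form_factors @ radiosity)
    return radiosity
```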

In the left column, flyby, walkthrough, and virtual space demonstrations are first introduced. On the right, we showcase one of the two novel methods that were proposed using Neural Network technology.

Introduction to Flyby, Walkthrough and Virtual Space

Flyby

3D Walkthrough

Virtual Space

(A) ROVER Learning from Examples

Sequence 1

Sequence 5

Sequence 8

(B) ROVER Modeling

(C) ROVER Prediction

2.4 Autonomous Walker & Swimming Eel

(A) The research in this area involves combining biology, mechanical engineering, and information technology to develop the techniques necessary to build a dynamically stable legged vehicle controlled by a neural network. This would incorporate command signals, sensory feedback, and reflex circuitry in order to produce the desired movement.

Walker

(B) Simulation of the swimming lamprey (eel-like sea creature), driven by a neural network.

Swimming Lamprey
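Lamprey swimming is commonly modeled with a chain of coupled neural oscillators (a central pattern generator) whose head-to-tail phase lag produces the traveling body wave. Below is a toy sketch of such a chain; the Kuramoto-style coupling and all constants are illustrative assumptions, not the actual controller.

```python
import math

def cpg_step(phases, freq=1.0, k=2.0, lag=0.4, dt=0.01):
    """One Euler step of a phase-oscillator chain: each segment is
    pulled towards a fixed lag behind its forward neighbour, so the
    sinusoidal muscle drive travels from head to tail."""
    stepped = []
    for i, p in enumerate(phases):
        dp = 2 * math.pi * freq
        if i > 0:  # couple to the segment ahead
            dp += k * math.sin(phases[i - 1] - p - lag)
        stepped.append(p + dp * dt)
    return stepped

# Muscle activation per body segment: [math.sin(p) for p in phases]
```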

2.5 RoboCup: Robot World Cup

The RoboCup Competition pits robots (real and virtual) against each other in a simulated soccer tournament. The aim of the RoboCup competition is to foster an interdisciplinary approach to robotics and agent-based AI by presenting a domain that requires large-scale cooperation and coordination in a dynamic, noisy, complex environment.

RoboCup has three different leagues to date. The Small and Middle-Size Leagues involve physical robots; the Simulation League is for virtual, synthetic teams. This work focuses on building softbots for the Simulation League.

Machine Learning for Robocup involves:

The training of a player in the process of deciding whether (a) to dribble the ball, (b) to pass it on to another team-mate, or (c) to shoot into the net.

The training of the goalkeeper in the process of intelligently guessing how the ball is going to be kicked by the opponents. Complexities arise when one opponent decides to pass the ball to another player instead of attempting a score.

Evolution of a co-operative and perhaps unpredictable team.

Common AI methods used are variants of Neural Networks and Genetic Algorithms.
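As one hedged illustration of the player's three-way decision, a small feed-forward network scoring dribble/pass/shoot from hand-picked state features might look like the sketch below; the feature set, layer sizes, and random weights are assumptions (a real controller would be trained, e.g. by a genetic algorithm).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(8, 4)), np.zeros(8)  # hidden layer
W2, b2 = 0.1 * rng.normal(size=(3, 8)), np.zeros(3)  # dribble/pass/shoot

def decide(features):
    """features (illustrative): [dist_to_goal, dist_to_nearest_opponent,
    dist_to_nearest_teammate, angle_to_goal]."""
    x = np.asarray(features, dtype=float)
    hidden = np.tanh(W1 @ x + b1)
    scores = W2 @ hidden + b2
    probs = np.exp(scores - scores.max())  # softmax over the 3 actions
    probs /= probs.sum()
    return ["dribble", "pass", "shoot"][int(np.argmax(probs))], probs
```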

KRDL Soccer Softbots (3.1 MB, AVI)

2.6 Using HMMs for Audio-to-Visual Conversion

One emerging application which exploits the correlation between audio and video is speech-driven facial animation. The goal of speech-driven facial animation is to synthesize realistic video sequences from acoustic speech. Much of the previous research has implemented this audio-to-visual conversion strategy with existing techniques such as vector quantization and neural networks. Here, they examine how this conversion process can be accomplished with hidden Markov models (HMMs).
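One simple way to realize such a conversion, sketched here under assumptions (discretized acoustic symbols, known model parameters), is to Viterbi-decode the most likely hidden state path and let each state index a stored visual parameter set:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a sequence of discrete acoustic
    symbols. start_p[s], trans_p[s, s2], emit_p[s, o] are probabilities."""
    T, n = len(obs), len(start_p)
    logp = np.full((T, n), -np.inf)
    back = np.zeros((T, n), dtype=int)
    logp[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n):
            cand = logp[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(cand))
            logp[t, s] = cand[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(logp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Each decoded state can then index a stored visual parameter set
# (e.g. mouth height and width) that drives the synthesized face.
```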

(A) Tracking Demo: The parabolic contour is fit to each frame of the video sequence using a modified deformable template algorithm. The height between the two contours and the width between the corners of the mouth can be extracted from the templates to form our visual parameter sets.

Tracking

(B) Morphing Demo: Another important piece of the speech-driven facial animation system is a visual synthesis module. Here we are attempting to synthesize the word “wow” from a single image. Each frame in the video sequence is morphed from the first frame shown below. The parameters used to morph these images were obtained by hand.

Morphing

2.7 Artificial Life: Galapagos

Galapagos is a fantastic and dangerous place where up and down have no meaning, where rivers of iridescent acid and high-energy laser mines are beautiful but deadly artifacts of some other time. Through spatially twisted puzzles and bewildering cyber-landscapes, the artificial creature called Mendel struggles to survive, and you must help him.

Mendel is a synthetic organism that can sense infrared radiation and tactile stimulus. His mind is an advanced adaptive controller featuring Non-stationary Entropic Reduction Mapping — a new form of artificial life technology developed by Anark. He can learn like your dog, he can adapt to hostile environments like a cockroach, but he can’t solve the puzzles that prevent his escape from Galapagos.

Galapagos features rich, 3D texture-mapped worlds, with continuous-motion graphics and 6 degrees of freedom. Dramatic camera movement and incredible lighting effects make your passage through Galapagos breathtaking. Explosions and other chilling effects will make you fear for your synthetic friend. Active panning 3D stereo sound will draw you into the exotic worlds of Galapagos.

Galapagos

2.8 Speechreading (Lipreading)

As part of the research program Neuroinformatik, the IPVR is developing a neural speechreading system as part of a user interface for a workstation. The three main parts of the system are a face tracker (done by Marco Sommerau), lip modeling and speech processing (done by Michael Vogt), and the development and application of SNNS for neural network training (done by Günter Mamier).

Automatic speechreading is based on robust lip image analysis. In this approach, no special illumination or lip make-up is used. The analysis is based on true-color video images. The system allows for real-time tracking and storage of the lip region and robust off-line lip model matching. The proposed model is based on cubic outline curves. A neural classifier detects the visibility of teeth edges and other attributes. At this stage of the approach, the edge between the closed lips is automatically modeled if applicable, based on a neural network’s decision.

To achieve high flexibility during lip-model development, a model description language has been defined and implemented. The language allows the definition of edge models (in general) based on knots and edge functions. Inner model forces stabilize the overall model shape. User-defined image processing functions may be applied along the model edges. These functions and the inner forces contribute to an overall energy function. Adaptation of the model is done by gradient descent or simulated-annealing-like algorithms. The figure shows one configuration of the lip model, consisting of an upper lip edge and a lower lip edge. The model edges are defined by Bezier functions. Outer control knots stabilize the position of the corners of the mouth.
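Schematically, the adaptation loop might be rendered as numerical gradient descent on the overall energy; the energy function itself (inner forces plus image terms along the Bezier edges) is left abstract here, and the step size and iteration count are illustrative.

```python
def adapt_model(knots, energy, lr=0.5, iters=100, eps=1e-3):
    """Gradient descent on the overall model energy. `knots` is a flat
    list of control-knot coordinates; `energy(knots)` is assumed to sum
    the inner model forces and the image terms along the model edges."""
    knots = list(knots)
    for _ in range(iters):
        grad = []
        for i in range(len(knots)):
            bumped = knots[:]
            bumped[i] += eps  # finite-difference partial derivative
            grad.append((energy(bumped) - energy(knots)) / eps)
        knots = [k - lr * g for k, g in zip(knots, grad)]
    return knots
```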

Fig 2.8.1 The model interpreter enables continuous measurement of model knot positions and color blends along model edges during adaptation to an utterance. The resulting parameters may be used for speech recognition tasks in further steps.

Lipread

2.9 Detection and Tracking of Moving Targets

The moving target detection and tracking methods here are “track before detect” methods. They correlate sensor data across time and location, based on the nature of actual tracks. The track statistics are “learned” based on artificial neural network (ANN) training with prior real or simulated data. Effects of different clutter backgrounds are partially compensated based on space-time-adaptive processing of the sensor inputs, and further compensated based on the ANN training. Specific processing structures are adapted to the target track statistics and sensor characteristics of interest. Fusion of data over multiple wavelengths and sensors is also supported.

Compared to conventional fixed matched filter techniques, these methods have been shown to reduce false alarm rates by up to a factor of 1000 based on simulated SBIRS data for very weak ICBM targets against cloud and nuclear backgrounds, with photon, quantization, and thermal noise, and sensor jitter included. Examples of the backgrounds, and processing results, are given below.

The methods are designed to overcome the weaknesses of other advanced track-before-detect methods, such as 3+-D (space, time, etc.) matched filtering, dynamic programming (DP), and multi-hypothesis tracking (MHT). Loosely speaking, 3+-D matched filtering requires too many filters in practice for long-term track correlation; DP cannot realistically exploit the non-Markovian nature of real tracks, and strong targets mask out weak targets; and MHT cannot support the low pre-detection thresholds required for very weak targets in high clutter. They have developed and tested versions of the above (and other) methods in their research, as well as Kalman-filter probabilistic data association (KF/PDA) methods, which they use for post-detection tracking.

Space-time-adaptive methods are used to deal with correlated, non-stationary, non-Gaussian clutter, followed by a multi-stage filter sequence and soft-thresholding units that combine current and prior sensor data, plus feedback of prior outputs, to estimate the probability of target presence. The details are optimized by adaptive “training” over very large data sets, and special methods are used to maximize the efficiency of this training.
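A toy rendering of that final stage: a recurrent soft-thresholding unit that fuses the current filtered frame with its own previous output, so persistent weak targets accumulate evidence while transient clutter decays. The coefficients below are illustrative placeholders, not trained values.

```python
import numpy as np

def target_presence(frames, w_now=2.5, w_prev=1.5, bias=-3.0):
    """Per-pixel recurrent soft-threshold:
    p_t = sigmoid(w_now * frame_t + w_prev * p_{t-1} + bias)."""
    prob = np.zeros_like(frames[0], dtype=float)
    for frame in frames:
        prob = 1.0 / (1.0 + np.exp(-(w_now * frame + w_prev * prob + bias)))
    return prob
```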

Figure 2.9 (a) Raw input backgrounds with weak targets included; (b) detected target sequence at the ANN processing output, post-detection tracking not included.

Video Clip

2.10 Real-time Target Identification for Security Applications

The system localizes and tracks people’s faces as they move through a scene. It integrates the following techniques:

Motion detection

Tracking people based upon motion

Tracking faces using an appearance model

Faces are tracked robustly by integrating motion and model-based tracking.
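The motion-detection stage can be approximated by simple frame differencing; a minimal sketch (the threshold value is an assumption), with the appearance-model face tracker then searching only in, or weighted towards, the flagged regions:

```python
import numpy as np

def moving_regions(prev_frame, frame, threshold=25):
    """Frame differencing: flag pixels whose grey-level change exceeds
    a threshold, a crude stand-in for the motion-detection stage."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return diff > threshold
```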

(A) Tracking in low resolution and poor lighting conditions

Jon

(B) Tracking two people simultaneously: lock is maintained on the faces despite unreliable motion-based body tracking.

Double Tracking

2.11 Facial Animation

Facial animations were created using hierarchical B-splines as the underlying surface representation. Neural networks could be used to learn the variations in facial expressions for animated sequences.

The (mask) model was created in SoftImage, and is an early prototype for the character “Mouse” in the YTV/ABC television series “ReBoot” (they do not use hierarchical splines for ReBoot!). The original standard bicubic B-spline surface was imported into the “Dragon” editor and a hierarchy automatically constructed. The surface was attached to a jaw to allow it to open and close the mouth. Groups of control vertices were then moved around to create various facial expressions. Three of these expressions were chosen as key shapes, the spline surface was exported back to SoftImage, and the key shapes were interpolated to create the final animation.
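Interpolating between key shapes amounts to blending control-vertex positions; a minimal linear blend, where the key-shape arrays and weights are illustrative:

```python
import numpy as np

def blend_shapes(key_shapes, weights):
    """Blend control vertices between key expressions.
    key_shapes: (n_keys, n_vertices, 3); weights are normalized."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), np.asarray(key_shapes), axes=1)

# e.g. animating from `neutral` to `smile` over 30 frames:
# for t in np.linspace(0.0, 1.0, 30):
#     vertices = blend_shapes([neutral, smile], [1.0 - t, t])
```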

Mask

Haida

2.12 Artificial Life for Graphics, Animation, Multimedia, and Virtual Reality

