Cosmos's Academic Space
January 4

Strike a Pose: Tracking People by Finding Stylized Poses

Pictorial Structure; CVPR05
 
Initialize by detecting stylized poses (lateral walking), then learn the appearance of limbs.
 
The configuration is estimated using the pictorial structure recognition method (dynamic programming).
 
Tracking
 
website:
January 3

Shape Contexts

 
Defines the shape context and uses it for matching and correspondence problems.
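To make the descriptor concrete, here is a minimal sketch of a shape context as I understand it: for one reference point, a log-polar histogram of where the other contour points lie. The bin counts (5 radial by 12 angular, giving 60 bins), the radius limits, the clamping of out-of-range radii, and the function names `shape_context` and `chi2_cost` are my own assumptions for illustration, not taken from the paper's code.

```python
import math


def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points as seen from points[index]."""
    # Mean pairwise distance, used to normalize radii (scale invariance).
    pair_dists = [
        math.hypot(p[0] - q[0], p[1] - q[1])
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    mean_dist = sum(pair_dists) / len(pair_dists)

    px, py = points[index]
    log_span = math.log(r_max) - math.log(r_min)
    hist = [0] * (n_r * n_theta)
    for k, (x, y) in enumerate(points):
        if k == index:
            continue
        r = math.hypot(x - px, y - py) / mean_dist
        if r == 0.0:
            continue  # coincident point, no direction defined
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        # Radial bin on a log scale, clamped into the valid range (assumption).
        r_bin = int((math.log(r) - math.log(r_min)) / log_span * n_r)
        r_bin = min(max(r_bin, 0), n_r - 1)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin * n_theta + t_bin] += 1
    return hist


def chi2_cost(h1, h2):
    """Chi-square distance between two normalized histograms (matching cost)."""
    n1, n2 = sum(h1), sum(h2)
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / n1, b / n2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
h0 = shape_context(square, 0)
h2 = shape_context(square, 2)
```

For correspondence, each point on one shape would be paired with the point on the other shape that has the lowest `chi2_cost`, typically through an assignment algorithm.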
 

Recovering 3D Human pose from monocular Images

Triggs, PAMI 05
 
Uses a 60-D shape context as the descriptor of the silhouette and reduces the dimension of the problem by PCA, so that each silhouette is encoded as a 100-D vector in the end. This vector is then mapped to (labeled with) 55-D joint angle data; the mapping uses ground truth data, since the images are created by computer graphics. The labeled data are used to train the learning machine (RVM).
 
 Model based vs Model free

We consider the problem of estimating and tracking 3D configurations of complex articulated objects from monocular images, e.g. for applications requiring 3D human body pose and hand gesture analysis. There are two main schools of thought on this. Model-based approaches presuppose an explicitly known parametric body model, and estimate the pose either by directly inverting the kinematics (which has many possible solutions and requires known image positions for each body part) [28], or by numerically optimizing some form of model-image correspondence metric over the pose variables, using a forward rendering model to predict the images (which is expensive and requires a good initialization, and the problem always has many local minima [25]). An important subcase is model-based tracking, which focuses on tracking the pose estimate from one time step to the next starting from a known initialization, based on an approximate dynamical model [9, 23]. In contrast, learning based approaches try to avoid the need for explicit initialization and accurate 3D modelling and rendering, and to capitalize on the fact that the set of typical human poses is far smaller than the set of kinematically possible ones, by estimating (learning) a model that directly recovers pose estimates from observable image quantities. In particular, example based methods explicitly store a set of training examples whose 3D poses are known, and estimate pose by searching for training image(s) similar to the given input image, and interpolating from their poses [5,18,22,27].

In this paper we take a learning based approach,

Modeling People and Human Interaction

Overview

This is the second workshop of a series on "Modeling People". This new edition is extended in scope to include models of Human ("Social") Interactions. Given the increasing capacity to acquire perceptual data with cameras and other sensors, and the number of applications, there is growing interest in several areas related to people modeling, ranging from the recognition of human dynamic behaviour, studied in both the computer vision and graphics communities, to the detection of human communication patterns in social settings, relevant to vision and other fields.

The aim of this workshop is to provide a focus for computer vision research into the modelling and understanding of people, their actions, and their interactions with other people. The intention is to bring together leading researchers in vision, graphics, and cognitive sciences to assess the current state-of-the-art, identify known problems and generate ideas for future progress towards automatic modelling and interpretation of people's movements, actions, and interactions. Given the scope of ICCV, we expect the participation of computer vision researchers investigating these areas. However, given the multimodal nature of human interaction, we also encourage the participation of researchers in related fields (audio and speech, linguistics, behavioral science, and wearable computing) who can contribute their views.

 

We invite manuscript submissions on the following topics:

  • Photo-realistic 3D modelling of people
  • Vision-based human motion capture
  • Kinematic, dynamic and statistical models of human motion
  • Human motion synthesis and retargeting of motion capture
  • Modeling human action for animation, recognition and imitation
  • Detecting people in images and videos
  • Recognition of non-verbal cues in conversations: expressions, body postures, and gestures.
  • Detection and tracking of focus-of-attention in conversations.
  • Analysis of speaker turn patterns.
  • Models that link language and non-verbal cues.
  • Recognition of social interaction patterns.
  • Analysis of the dynamic properties of social rapport.
  • Mining social networks from multi-sensor data


 

January 2

Inference of 3D posture by classification of 3D human body shape

by Cohen et al., ICCV 03
 
Uses a tomography-like method to get a scanned histogram, then uses an SVM to classify the posture.
December 30

Real-time human stress monitoring system

CVPR 2005; multi-sensing Bayesian network
December 22

Background Subtraction

This is my first experiment using OpenCV; the main purpose is to get familiar with its basic data structures and functions.
 
In OpenCV, the basic operations, such as capturing images from a camera and displaying them in a window, are very easy to use.
 
My first attempt used the method described in the manual, but the cvAcc function did not seem to work because of the data format; I am not yet very familiar with the data structures. Then I tried to implement the algorithm myself, but when calculating sum_square the data overflowed. Solving this requires converting the data to a wider type, which also needs knowledge of the data structures, so I gave up in the end.
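A small worked example of the overflow: squaring an 8-bit pixel value no longer fits in 8 bits, so the result wraps unless it is stored in a wider type. The sketch below uses Python's `ctypes` integer types purely to imitate fixed-width storage; it is not OpenCV code. As far as I know, the accumulator image passed to cvAcc in the C API is expected to be 32-bit or 64-bit floating point, which may well be the data-format issue, but I have not verified that against my setup.

```python
import ctypes

pixel = 200                  # an 8-bit grey value
square = pixel * pixel       # 40000, which needs 16 bits

# Stored back into 8 bits the value wraps modulo 256.
wrapped = ctypes.c_uint8(square).value
# Stored in 32 bits the value is kept exactly.
widened = ctypes.c_uint32(square).value

# Number of frames whose squared pixels can be summed in 32 bits
# in the worst case (every pixel at 255) without overflowing.
max_frames_u32 = (2**32 - 1) // (255 * 255)

print(wrapped, widened, max_frames_u32)
```

Accumulating in a float image avoids the problem altogether, at the cost of more memory per pixel.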
 
Finally, I chose the simplest way: calculate the difference between the current image and the average image, and threshold it with a fixed criterion. The average can be seen as a moving-average window in the time domain, and the window length is a crucial parameter. If the length is too large, the average image will contain a faint 'light contour' of the moving object, which causes false detections. On the other hand, if the length is too small (say 1 or 2), differences caused by illumination changes severely affect the result. Detecting moving objects or subtracting the background this way can be seen as time-domain filtering; if we combine spatial information with it and use spatio-temporal filtering, we arrive at the idea of the motion template.
 
Tomorrow, I will try to implement some image processing algorithms.
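The moving-average scheme can be sketched in a few lines. This is a plain-Python illustration on tiny grey-level frames stored as nested lists, not the OpenCV program I actually ran; the function name `detect_foreground` and the default `window` and `threshold` values are made up for the example.

```python
from collections import deque


def detect_foreground(frames, window=5, threshold=30):
    """Return one boolean mask per frame: True where the pixel differs
    from the mean of the previous `window` frames by more than `threshold`."""
    history = deque(maxlen=window)   # the sliding time window
    masks = []
    for frame in frames:
        if history:
            n = len(history)
            mask = []
            for r, row in enumerate(frame):
                mask_row = []
                for c, pixel in enumerate(row):
                    mean = sum(past[r][c] for past in history) / n
                    mask_row.append(abs(pixel - mean) > threshold)
                mask.append(mask_row)
        else:
            # No background estimate yet for the very first frame.
            mask = [[False] * len(row) for row in frame]
        masks.append(mask)
        history.append(frame)
    return masks


background = [[10, 10], [10, 10]]
with_object = [[10, 200], [10, 10]]
masks = detect_foreground([background] * 4 + [with_object])
print(masks[-1])
```

Note that a moving object stays in `history` for `window` frames, so it leaks into the average; that is the 'light contour' effect described above.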
 

Dynamic Models of Human Motion

Wren and Pentland  IEEE FG 98
 
This paper presents a framework for the capture, modeling and reconstruction of 3D human motion from two 2-D viewpoints.
 
The human body is represented by blob features; these features are captured from low-level pixel statistics using a Gaussian distribution. The motion is estimated using a dynamic model (state q, dq/dt, parameters p). Prediction uses a Kalman filter, and the difference between the prediction and the real motion (called the innovation sequence) is used as the input sequence to an HMM for recognition.
 
The pixel labeling (Gamma ij) is based on the observation (Oij), the prior probability P(Oij | model k), and P(Oij | v*), where v is the tracked feature (say, a hand) and v* is the tracked feature's projection onto 2D space.
 
Gamma ij = arg max (over k) [ P(Oij | model k) P(Oij | v*) ].
 
This gives feedback from the motion prediction to the observation, thus closing the loop.
November 22
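To see what an innovation sequence looks like, here is a minimal sketch of a Kalman filter with a 1-D constant-velocity model (state: position and velocity, measurement: position only). This is far simpler than the paper's articulated body model; the noise values `q` and `r`, the initial covariance, and the function name `kalman_innovations` are assumptions chosen for illustration.

```python
def kalman_innovations(measurements, dt=1.0, q=1e-3, r=0.25):
    """Track a 1-D position and return the innovation (measurement minus
    prediction) at every step after the first."""
    x, v = measurements[0], 0.0        # state estimate: position, velocity
    pxx, pxv, pvv = 1.0, 0.0, 1.0      # symmetric 2x2 covariance entries
    innovations = []
    for z in measurements[1:]:
        # Predict: x <- x + dt*v, P <- F P F^T + q*I with F = [[1, dt], [0, 1]].
        x = x + dt * v
        pxx, pxv, pvv = (
            pxx + 2 * dt * pxv + dt * dt * pvv + q,
            pxv + dt * pvv,
            pvv + q,
        )
        # Innovation and its variance (only the position is observed).
        nu = z - x
        s = pxx + r
        # Kalman gain and state update.
        kx, kv = pxx / s, pxv / s
        x += kx * nu
        v += kv * nu
        # Covariance update: P <- (I - K H) P.
        pxx, pxv, pvv = (1 - kx) * pxx, (1 - kx) * pxv, pvv - kv * pxv
        innovations.append(nu)
    return innovations


zs = [float(k) for k in range(20)]     # steady motion, one unit per step
innov = kalman_innovations(zs)
print(innov[0], innov[-1])
```

While the motion follows the model the innovations shrink toward zero; a deviation from the predicted motion shows up as a burst in the sequence, which is the kind of signal that could be fed to an HMM.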

Primitives for human motion: a dynamical approach


Del Vecchio et al.

Studies the primitives of human motion (called movemes) using tools from dynamical systems theory and system identification theory. Some concepts, such as dynamical independence, are defined.

This paper also addresses segmentation and classification: decomposing a continuous trajectory into its component movemes. Classification uses a Fisher classifier.

Experiment: 2D motion of mouse drawing and mouse reaching.

A brief summary of papers this week

The papers I read this week cover different levels of human motion analysis.
 
Two papers are about low-level processing (or early vision). They use spatio-temporal filtering and statistical methods to offer low-level recognition of motion models. The advantage is that no prior knowledge is required, so the method can be very general; the disadvantage is that it cannot describe articulated human motion, so it can only be used for preprocessing.
 
Another two papers are about human motion modelling. They use geometric models of the human body (or of other objects made of several parts), and statistical methods are also used in the model learning process. They give good pose estimates and can handle the occlusion problem.
 
"Real-time estimation of human body posture from monocular thermal images" introduces a model-free way of estimating human pose by detecting some "significant points". This method is simple but the result can be noisy, so it cannot be used for articulated motion analysis.
 
Two papers are about high-level human behavior analysis:
 

"Primitives for human motion: a dynamical approach" uses dynamical systems theory to model primitives of human motion. Defining and describing primitives of human motion is a key issue on the way to high-level human behavior analysis. However, is the dynamical system approach a good idea, given that human motion is so complicated?

 

"Automatic Analysis of Multimodal Group Actions in Meetings"

It incorporates social psychology studies into the analysis of human behavior. However, the method is very specific and cannot be generalized to other applications.

 
 
 
November 21

Spatiotemporal Energy Models for the Perception of Motion

Key words: spatio-temporal energy, early vision
 
This is a classic paper on the spatio-temporal analysis of signals. Treating motion as a pattern in (x, y, t) space, it uses oriented spatio-temporal filtering (or filter banks) and then estimates the motion energy of different motion models (motion in different directions); the motion is detected by deciding which motion model is dominant.
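A minimal sketch of the motion-energy idea, reduced to one spatial dimension plus time: a quadrature pair of Gabor filters oriented in (x, t) is applied to a drifting grating, and the squared responses are summed to give a phase-independent energy. The frequency, speed, window size and Gaussian width are arbitrary values I picked so that the filter matches the stimulus; the function names are hypothetical.

```python
import math


def drifting_grating(x, t, f=0.125, speed=1.0):
    """Sinusoidal pattern moving to the right at `speed` pixels per frame."""
    return math.cos(2 * math.pi * f * (x - speed * t))


def motion_energy(stimulus, direction, f=0.125, speed=1.0, sigma=3.0, half=8):
    """Energy of a quadrature pair of (x, t)-oriented Gabor filters.
    direction = +1 tunes the pair to rightward motion, -1 to leftward."""
    even = odd = 0.0
    for t in range(-half, half + 1):
        for x in range(-half, half + 1):
            envelope = math.exp(-(x * x + t * t) / (2 * sigma * sigma))
            phase = 2 * math.pi * f * (x - direction * speed * t)
            s = stimulus(x, t)
            even += s * envelope * math.cos(phase)
            odd += s * envelope * math.sin(phase)
    # Summing the squared even and odd responses removes the phase dependence.
    return even * even + odd * odd


right = motion_energy(drifting_grating, +1)
left = motion_energy(drifting_grating, -1)
dominant = "right" if right > left else "left"
print(right, left, dominant)
```

Comparing the energies of the two opponent channels is the "decide which motion model is dominant" step; a full filter bank would cover more directions and speeds.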
 

Probabilistic Recognition of Activity Using Local Appearance

http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=784616

O. Chomat et al., CVPR 1999

 

In this paper space-time filters are used to generate a 12-D histogram; each dimension of the histogram is a motion energy measure for one subband. Bayes' rule is then used to estimate the probability of each pixel belonging to a particular motion model (this is called a local decision). The decision about the human action is based on the average of the local decisions; a more reliable recognition method might use the local decisions as input to an HMM.

Comment: The method in this paper can be used in the preprocessing stage of a computer vision system: it can generate areas of interest (areas of motion) and recognize low-level motion models. It can also be used to detect the motion of multiple objects. No prior knowledge is used in this method.
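The local-decision step can be sketched as follows. Each pixel's filter response is assumed here to be already quantized to a single histogram cell index (a big simplification of the 12-D histogram), and the likelihood tables, the priors and the activity names are invented for the example.

```python
def classify_activity(pixel_cells, likelihood, prior):
    """Apply Bayes' rule per pixel, then average the posteriors.

    pixel_cells: histogram cell index observed at each pixel
    likelihood:  likelihood[activity][cell] = P(cell | activity)
    prior:       prior[activity] = P(activity)
    """
    totals = {a: 0.0 for a in prior}
    used = 0
    for cell in pixel_cells:
        joint = {a: likelihood[a].get(cell, 0.0) * prior[a] for a in prior}
        evidence = sum(joint.values())
        if evidence == 0.0:
            continue                    # cell never seen in training
        for a in prior:
            totals[a] += joint[a] / evidence   # local decision P(a | cell)
        used += 1
    average = {a: totals[a] / used for a in prior}
    return max(average, key=average.get), average


likelihood = {
    "walk": {0: 0.7, 1: 0.3},
    "wave": {0: 0.2, 1: 0.8},
}
prior = {"walk": 0.5, "wave": 0.5}
label, average = classify_activity([0, 0, 1, 0], likelihood, prior)
print(label, average)
```

Feeding the per-frame averages into an HMM instead of taking a single maximum would add the temporal modelling suggested above.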

 

 

November 18

Active Contour

Classic algorithm!!! Cited more than 3000 times.
 

Automatic Analysis of Multimodal Group Actions in Meetings

 
Iain McCowan et al., TPAMI 05
 
Key words: group human behavior analysis
 
Summary:
This paper's work is similar to the idea of the "vivid room" at IROS a few years ago. It uses multiple sensors (basically audio and visual) to detect some predefined 'events'.
 
The framework extracts audio-visual features, then segments and recognizes events; the actions are modeled as different HMMs based on the events.
 
It incorporates some knowledge from the study of social psychology.
 
 
Comment:
This is typical work on this kind of topic. The vision and sensing part is relatively easy and the environment is constrained. Features and events are not difficult to detect and recognize, and the high-level modeling and recognition is based on HMMs (or variations of the HMM). It is very application specific, not general. The framework is always the same; the difficulty is how to fit your application problem into this framework, i.e. how to define the actions (or intentions, or interactions, etc.) and how to define the primitives (or events) so that they fit.

Real-time estimation of human body posture from monocular thermal images

S. Iwasawa et al., CVPR 97

Key words: distance transform, significant points, center of gravity, posture estimation

Brief Summary:

In this paper they use the distance transform to estimate the center of gravity (CoG) and the orientation of the upper body, and heuristic methods to find the extremities of the body (top of the head, tip of the foot, tip of the hand). The other body parts (significant points) are then estimated from these results. The method is relatively simple, so it can run in real time. The data come from an IR camera, so there is no background noise and the human is easy to segment.

The CoG estimate is based on the distance from each pixel to the nearest boundary (called dij). Estimating the orientation of the upper body is a little more complicated: they use a fitted Gaussian distribution as a mask to get the distance gij. The estimation of joints is based on a genetic learning algorithm.

There is some later work based on this distance transform and significant points method. It can be used for coarse motion estimation, but the difficulty may be finding the boundary of the human body, which is not easy in the real world (without an IR camera); it may also suffer from occlusion problems.
November 17
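A rough sketch of how a distance transform can give a center estimate for a silhouette. It uses a two-pass city-block distance to the nearest background pixel and then takes the distance-weighted centroid; whether the paper uses exactly this metric and weighting I am not sure, so treat both as assumptions. The function names are hypothetical.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel (two-pass algorithm). mask holds 0/1 values."""
    h, w = len(mask), len(mask[0])
    far = h + w                          # larger than any possible distance
    d = [[far if mask[r][c] else 0 for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the top and the left.
    for r in range(h):
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and the right.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


def weighted_center(d):
    """Centroid of the silhouette with each pixel weighted by its distance.
    Assumes the silhouette is not empty."""
    total = sum(sum(row) for row in d)
    row_c = sum(r * v for r, row in enumerate(d) for v in row) / total
    col_c = sum(c * v for row in d for c, v in enumerate(row)) / total
    return row_c, col_c


silhouette = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(silhouette)
center = weighted_center(dist)
print(dist[2][2], center)
```

Weighting by distance pulls the estimate toward thick regions such as the torso and away from thin limbs, which is why it is more stable than a plain centroid of the silhouette.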

Pictorial Structures for Object Recognition

 

Pedro F. Felzenszwalb et al., IJCV

 

Key words: Human body model, human motion, probabilistic human model

 

Brief Summary:

Part-based modeling and recognition of objects (e.g. face, human body).

The object model (e.g. of the human body) is the pictorial structure model introduced by Fischler and Elschlager.

Suppose I is the image, θ is the pictorial model defined below, and L is the configuration (location) of the parts. Our goal is to maximize p(L|I, θ).

The model has the form θ = (u, E, c), where u = {u1, u2, …} are the appearance parameters of the parts (the features depend on the parts), E is the set of edges indicating which parts are connected, and c = {cij | (vi, vj) in E} are the connection parameters between parts.

The spatial relation between parts is modeled by the probability p(li, lj | cij), which is usually a joint Gaussian distribution for locations and a von Mises distribution for angles.

The process first estimates the model parameters from the training data (I1, …, In) and (L1, …, Ln), and then uses the model to recognize parts and objects in a test video sequence.
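Because the parts are connected in a tree, maximizing p(L|I, θ) can be done by dynamic programming from the leaves to the root. The sketch below works with costs (negative log probabilities), puts every part on a small 1-D grid of locations, and uses a quadratic deformation cost, which corresponds to a Gaussian spatial prior. All numbers, the fixed offset of one cell between connected parts, and the function name `best_configuration` are assumptions for illustration; it also uses the plain quadratic-time minimization rather than any speed-up from the paper.

```python
def best_configuration(match_cost, parent, deform):
    """Minimize sum_i match_cost[i][l_i] + sum_edges deform(i, l_i, l_parent).

    match_cost[i][l]: appearance cost of placing part i at location l
    parent[i]:        index of the parent part, None for the root
    deform(i, l, lp): spatial cost of part i at l given its parent at lp
    """
    n, n_loc = len(match_cost), len(match_cost[0])
    children = [[] for _ in range(n)]
    root = None
    for i, p in enumerate(parent):
        if p is None:
            root = i
        else:
            children[p].append(i)
    # Preorder from the root: every parent appears before its children.
    order, stack = [], [root]
    while stack:
        i = stack.pop()
        order.append(i)
        stack.extend(children[i])

    subtree = [None] * n   # subtree[i][l]: best cost of i's subtree with i at l
    message = [None] * n   # message[i][lp]: best cost of i's subtree given parent at lp
    choice = [None] * n    # choice[i][lp]: location of i achieving message[i][lp]
    for i in reversed(order):            # children are processed before parents
        subtree[i] = [
            match_cost[i][l] + sum(message[c][l] for c in children[i])
            for l in range(n_loc)
        ]
        if parent[i] is not None:
            message[i], choice[i] = [], []
            for lp in range(n_loc):
                costs = [subtree[i][l] + deform(i, l, lp) for l in range(n_loc)]
                best = min(range(n_loc), key=costs.__getitem__)
                message[i].append(costs[best])
                choice[i].append(best)

    # Backtrack from the root to read off the optimal location of every part.
    locations = [None] * n
    locations[root] = min(range(n_loc), key=subtree[root].__getitem__)
    for i in order[1:]:
        locations[i] = choice[i][locations[parent[i]]]
    return locations, subtree[root][locations[root]]


# Three parts in a chain (e.g. torso -> upper arm -> lower arm), five locations.
match_cost = [
    [5, 5, 0, 5, 5],   # torso looks best at location 2
    [0, 5, 5, 1, 5],   # upper arm looks best at 0, nearly as good at 3
    [5, 5, 5, 5, 0],   # lower arm looks best at 4
]
parent = [None, 0, 1]


def deform(i, l, lp):
    return (l - lp - 1) ** 2   # each part prefers to sit one cell after its parent


locations, cost = best_configuration(match_cost, parent, deform)
print(locations, cost)
```

In the example the upper arm is pulled away from its best-looking location because the spatial prior outweighs the small appearance difference, which is exactly the trade-off the model is meant to capture.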

 

Unsupervised Learning of Human Motion

Yang Song et al.; for more detail see her PhD thesis.

 

Key words: human motion, human model in vision, probabilistic model, LK feature points

 

Brief Summary:

An unsupervised learning algorithm is presented that can obtain a probabilistic model of a human (or of other objects composed of a collection of parts) from unlabeled training data.

The data are feature points from the LK algorithm; they contain points from the object and also from background clutter.

The human motion is modeled by the joint pdf of the position and velocity of a collection of body parts.

First, from the unlabeled feature data, a learning algorithm chooses the useful features that correspond to body parts and establishes their correspondence across frames. (Labeling and correspondence process)

Second, a probabilistic independence structure and the parameters of the pdf are estimated. (Occlusion is tolerable in this model.)

Preface

This is a personal academic space, which will cover interesting topics, papers, projects, etc. related to my current research interests, which are indicated in the title of my space.
 
In its first stage, this blog will include material from my daily reading and some short summaries; this is mainly for my personal convenience.
 
In the future, I hope it will become a platform to introduce my work and research interests.
 
I hope the blogs and links here can be a resource for people who are interested in academic work and are still in the first stage of their research (just like I am).
 
Hope you enjoy it!
 
Joe
 