Cosmos's Academic Space
January 4
Strike a Pose: Tracking People by Finding Stylized Poses
Pictorial structure; CVPR 05
Initialize by detecting stylized poses (lateral walking), then learn the appearance of the limbs.
Configuration using the pictorial structure recognition method (dynamic programming)
Tracking
website:
Recovering 3D Human Pose from Monocular Images
Triggs, PAMI 05
A 60-D shape context is used as the descriptor of the silhouette. The dimension of the problem is reduced by PCA, so that each silhouette is encoded as a 100-D vector in the end. This vector is then mapped to (labeled with) 55-D joint angle data; the mapping uses ground truth data, and the images are created by computer graphics. The labeled data is used to train the learning machine (RVM).
Model based vs Model free
We consider the problem of estimating and tracking 3D configurations of complex articulated objects from monocular images, e.g. for applications requiring 3D human body pose and hand gesture analysis. There are two main schools of thought on this. Model-based approaches presuppose an explicitly known parametric body model, and estimate the pose either by directly inverting the kinematics (which has many possible solutions and requires known image positions for each body part) [28], or by numerically optimizing some form of model-image correspondence metric over the pose variables, using a forward rendering model to predict the images (which is expensive and requires a good initialization, and the problem always has many local minima [25]). An important subcase is model-based tracking, which focuses on tracking the pose estimate from one time step to the next starting from a known initialization, based on an approximate dynamical model [9, 23]. In contrast, learning based approaches try to avoid the need for explicit initialization and accurate 3D modelling and rendering, and to capitalize on the fact that the set of typical human poses is far smaller than the set of kinematically possible ones, by estimating (learning) a model that directly recovers pose estimates from observable image quantities. In particular, example based methods explicitly store a set of training examples whose 3D poses are known, and estimate pose by searching for training image(s) similar to the given input image, and interpolating from their poses [5,18,22,27]. In this paper we take a learning based approach, ...
Modeling People and Human Interaction
Overview
This is the second workshop of a series on "Modeling People". This new edition is extended in scope to include models of Human ("Social") Interactions.
Given the increasing capacity to acquire perceptual data with cameras and other sensors, and the number of applications, there is growing interest in several areas related to people modeling, ranging from the recognition of human dynamic behaviour, studied in both the computer vision and graphics communities, to the detection of human communication patterns in social settings, relevant to vision and other fields. The aim of this workshop is to provide a focus for computer vision research into the modelling and understanding of people, their actions, and their interactions with other people. The intention is to bring together leading researchers in vision, graphics, and cognitive sciences to assess the current state-of-the-art, identify known problems and generate ideas for future progress towards automatic modelling and interpretation of people's movements, actions, and interactions. Given the scope of ICCV, we expect the participation of computer vision researchers investigating these areas. However, given the multimodal nature of human interaction, we also encourage the participation of researchers in related fields (audio and speech, linguistics, behavioral science, and wearable computing) who can contribute their views.
We invite manuscript submissions on the following topics:
January 2
Inference of 3D posture by classification of 3D human body shape
by Cohen et al., ICCV 03
A tomography-like method is used to obtain a scanned histogram, and an SVM is used to classify the posture.
December 22
Background Subtraction
This is a first experiment using OpenCV; the main purpose is to get familiar with the basic data structures and functions in OpenCV.
In OpenCV, the basic operations, such as capturing an image from a camera or displaying an image in a window, are very easy to use.
My first attempt used the method described in the manual. However, the cvAcc function did not seem to work, probably because of the data format; I am still not very familiar with the data structures. Then I tried to implement the algorithm myself, but when calculating sum_square I found that the data overflows. Solving this requires converting the data to a wider type, which also needs knowledge of the data structures, so I gave up in the end.
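The overflow is easy to reproduce: the square of an 8-bit pixel value already needs up to 16 bits, and a sum of squares over many frames needs more. The following is a minimal sketch in plain Python, not OpenCV code. `ctypes.c_uint8` is used only to imitate 8-bit storage, and the function names and sample values are made up for illustration.

```python
import ctypes

def sum_squares_8bit(pixels):
    """Accumulate squares in an 8-bit cell, as a narrow image buffer would."""
    acc = ctypes.c_uint8(0)
    for p in pixels:
        # c_uint8 keeps only the low 8 bits, so the running sum wraps around.
        acc = ctypes.c_uint8(acc.value + p * p)
    return acc.value

def sum_squares_wide(pixels):
    """Accumulate squares in a wide floating point accumulator."""
    acc = 0.0
    for p in pixels:
        acc += float(p) * float(p)
    return acc

def mean_and_variance(pixels):
    """Mean and variance of one pixel over time, from sum and sum of squares."""
    n = len(pixels)
    mean = sum(float(p) for p in pixels) / n
    variance = sum_squares_wide(pixels) / n - mean * mean
    return mean, variance

samples = [200, 202, 198, 200]  # hypothetical: one pixel observed over four frames
narrow = sum_squares_8bit(samples)   # wrapped, meaningless value
wide = sum_squares_wide(samples)     # correct sum of squares
mean, variance = mean_and_variance(samples)
```

The narrow accumulator returns a small wrapped number, while the wide one gives a usable variance. In an image library the same idea means converting the frames to a floating point or 32-bit image before accumulating.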
Finally, I chose the simplest way: calculate the difference between the current image and the average image, and threshold it with a fixed criterion. The average can be seen as a moving average window in the time domain, and the window length is a crucial parameter. If the length is too large, the average image will contain a faint 'light contour' of the moving object, which causes false detections. On the other hand, if the length is too small (say 1 or 2), differences caused by illumination changes will severely affect the result. Detecting moving objects or subtracting the background in this way can be seen as time-domain filtering; combining spatial information with it gives spatio-temporal filtering, which is the idea behind motion templates.
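The moving-average scheme above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: frames are small nested lists of grey values, and the names `make_detector`, `window_length` and `threshold` are hypothetical, not taken from OpenCV.

```python
from collections import deque

def make_detector(window_length, threshold):
    """Return a function that flags pixels differing from the moving average."""
    history = deque(maxlen=window_length)  # the last `window_length` frames

    def detect(frame):
        if not history:
            # No background estimate yet: report nothing and remember the frame.
            history.append(frame)
            return [[False] * len(row) for row in frame]
        n = len(history)
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                average = sum(past[y][x] for past in history) / n
                mask_row.append(abs(value - average) > threshold)
            mask.append(mask_row)
        history.append(frame)
        return mask

    return detect

detect = make_detector(window_length=3, threshold=30)
background = [[10, 10, 10], [10, 10, 10]]
with_object = [[10, 200, 10], [10, 10, 10]]
detect(background)
detect(background)
mask = detect(with_object)   # the bright pixel is flagged
ghost = detect(background)   # the same pixel is flagged again, although it is empty
```

The last call shows the false detection mentioned above: the object has been absorbed into the average, so the plain background now differs from it at that pixel.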
Tomorrow, I will try to implement some image processing algorithms.
Dynamic Models of Human Motion
Wren and Pentland, IEEE FG 98
This paper presents a framework for the capture, modeling and reconstruction of 3D human motion from two 2-D viewpoints.
The human body is represented by blob features, which are captured from low-level pixel statistics using a Gaussian distribution. The motion is estimated using a dynamic model (state q, dq/dt, parameter p). Prediction uses a Kalman filter; the difference between the prediction and the real motion (called the innovation sequence) is used as the input sequence to an HMM for recognition.
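To make the innovation sequence concrete, here is a minimal sketch of a Kalman filter with a constant-velocity model on a single 1-D coordinate. This is an assumption made for illustration; the dynamic model in the paper is richer. The noise values and the function name are hypothetical.

```python
def kalman_innovations(measurements, dt=1.0, process_noise=0.01, meas_noise=1.0):
    """Constant-velocity Kalman filter on 1-D positions; returns the innovations."""
    pos, vel = 0.0, 0.0                  # state estimate (position and its derivative)
    a, b, c, d = 10.0, 0.0, 0.0, 10.0    # covariance P = [[a, b], [c, d]]
    innovations = []
    for z in measurements:
        # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + vel * dt
        a, b, c, d = (a + dt * (b + c) + dt * dt * d + process_noise,
                      b + dt * d,
                      c + dt * d,
                      d + process_noise)
        # Innovation: measured position minus predicted position.
        nu = z - pos
        innovations.append(nu)
        # Update: gain K = P' H^T / S with H = [1, 0], then P = (I - K H) P'.
        s = a + meas_noise
        k_pos, k_vel = a / s, c / s
        pos += k_pos * nu
        vel += k_vel * nu
        a, b, c, d = ((1.0 - k_pos) * a,
                      (1.0 - k_pos) * b,
                      c - k_vel * a,
                      d - k_vel * b)
    return innovations

track = [float(t) for t in range(1, 21)]  # hypothetical target moving at constant speed
innovations = kalman_innovations(track)
```

For motion that matches the model the innovations shrink as the filter locks on; motion the model does not predict leaves larger innovations, which is what makes the sequence informative as input to a recognizer.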
The pixel labeling (Gamma ij) is based on the observation (Oij) and the probabilities P(Oij | model k) and P(Oij | v*), where v is the tracked feature (say, a hand) and v* is the projection of the tracked feature onto 2D space.
Gamma ij = arg max (over k) [ P(Oij | model k) P(Oij | v*) ].
This gives feedback from the motion prediction to the observation, thus closing the loop.
November 22
Primitives for human motion: a dynamical approach
Del Vecchio et al.
This paper studies the primitives of human motion (called movemes) using tools from dynamical systems theory and system identification theory. Some concepts, such as dynamical independence, are defined. The paper also addresses segmentation and classification: decomposing a continuous trajectory into its component movemes. Classification uses a Fisher classifier. Experiment: 2D motion of mouse drawing and mouse reaching.
A brief summary of papers this week
The papers I read this week cover different levels of human motion analysis.
Two papers are about low-level processing (or early vision), using spatio-temporal filtering and statistical methods to offer low-level recognition of motion models. The advantage is that no prior knowledge is required, so the methods are very general; the disadvantage is that they cannot describe articulated human motion, so they can only be used in preprocessing.
Another two papers are about human motion modelling. They use geometric models to model the human body (or other objects made of several parts); statistical methods are also used in the model learning process. They give good results for pose estimation and can handle the occlusion problem.
"Real-time estimation of human body posture from monocular thermal images" introduces a model-free way to estimate human pose by detecting some "significant points". This method is simple but the result can be noisy, so it cannot be used in articulated motion analysis.
Two papers are about high-level human behavior analysis:
"Primitives for human motion: a dynamical approach" uses dynamical systems theory to model primitives of human motion. Defining and describing primitives for human motion is a key issue for high-level human behavior analysis. However, is the dynamical systems approach a good idea, given that human motion is so complicated?
"Automatic Analysis of Multimodal Group Actions in Meetings" incorporates social psychology studies into the analysis of human behavior. However, the method is very specific and cannot generalize to other applications.
November 21
Spatiotemporal Energy Models for the Perception of Motion
Key word: spatio-temporal energy, early vision
This is a classic paper on the spatio-temporal analysis of signals. Considering motion as a pattern in (x,y,t) space, it applies oriented spatio-temporal filters (or filter banks) and then estimates the motion energy of each motion model (motion in different directions). The motion model is detected by deciding which one is dominant.
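As a rough illustration of the idea, the sketch below computes motion energy for a drifting grating with one space dimension and time, using a quadrature pair of Gabor-like filters per direction. The filter shape, frequencies, window size and names are assumptions chosen for the example; this is not the filter bank of the paper.

```python
import math

def stimulus(x, t, fx=0.125, ft=0.125):
    """A grating whose crests satisfy fx*x = ft*t, so it drifts to the right."""
    return math.cos(2.0 * math.pi * (fx * x - ft * t))

def motion_energy(fx, ft, sigma=4.0, radius=8):
    """Energy of a quadrature pair of oriented space-time filters."""
    even = odd = 0.0
    for x in range(-radius, radius + 1):
        for t in range(-radius, radius + 1):
            envelope = math.exp(-(x * x + t * t) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * (fx * x - ft * t)
            even += stimulus(x, t) * envelope * math.cos(phase)
            odd += stimulus(x, t) * envelope * math.sin(phase)
    # Squaring and summing the pair makes the response independent of phase.
    return even * even + odd * odd

energies = {
    "rightward": motion_energy(fx=0.125, ft=0.125),
    "leftward": motion_energy(fx=0.125, ft=-0.125),
}
dominant = max(energies, key=energies.get)
```

The filter oriented along the stimulus trajectory in (x,t) collects far more energy than the opposite one, so picking the largest energy recovers the direction of motion.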
Probabilistic Recognition of Activity Using Local Appearance
http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=784616
O. Chomat et al., CVPR 1999
In this paper, space-time filters are used to generate a 12-D histogram, where each dimension is a motion energy measure for a subband. Bayes' rule is then used to estimate the probability of each pixel belonging to a particular motion model (called a local decision). The decision about the human action is based on the average of the local decisions; a more reliable recognition method might use the local decisions as input to an HMM.
Comment: The method in this paper can be used in the preprocessing stage of a computer vision system. It can generate areas of interest (areas of motion) and recognize low-level motion models, and it can also be used to detect the motion of multiple objects. No prior knowledge is used in this method.
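The local decision and its averaging can be sketched as follows. The per-pixel likelihoods would come from the 12-D histograms in the paper; here they are made-up numbers, and the action names and function names are hypothetical.

```python
def local_posteriors(likelihoods, priors):
    """Bayes' rule for one pixel: P(action | w) from P(w | action) and P(action)."""
    joint = {a: likelihoods[a] * priors[a] for a in priors}
    total = sum(joint.values())
    return {a: joint[a] / total for a in joint}

def classify(pixel_likelihoods, priors):
    """Average the local decisions over all pixels and pick the best action."""
    averaged = {a: 0.0 for a in priors}
    for likelihoods in pixel_likelihoods:
        posterior = local_posteriors(likelihoods, priors)
        for a in priors:
            averaged[a] += posterior[a] / len(pixel_likelihoods)
    best = max(averaged, key=averaged.get)
    return best, averaged

priors = {"walk": 0.5, "wave": 0.5}
pixel_likelihoods = [            # hypothetical P(w | action) at three pixels
    {"walk": 0.8, "wave": 0.2},
    {"walk": 0.6, "wave": 0.4},
    {"walk": 0.3, "wave": 0.7},
]
best, averaged = classify(pixel_likelihoods, priors)
```

Averaging lets a majority of pixels outvote a few misleading ones, but it ignores temporal order, which is why feeding the local decisions to an HMM is suggested above as a more reliable alternative.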
November 18
Active Contour
Automatic Analysis of Multimodal Group Actions in Meetings
Iain McCowan et al., TPAMI 05
Key word: group human behavior analysis
Summary:
This paper's work is similar to the idea of the "vivid room" presented at IROS a few years ago. It uses multiple sensors (basically audio and visual) to detect some predefined 'events'.
The framework extracts audio-visual features, then segments and recognizes events; the actions are modeled as different HMMs based on the events.
It incorporates some knowledge from the study of social psychology.
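Recognition with one HMM per action usually means scoring the observed event sequence under each model and keeping the most likely one. Below is a minimal sketch with the forward algorithm. The two actions, the event names and all probabilities are invented for illustration and do not come from the paper.

```python
def forward_likelihood(observations, start, trans, emit):
    """P(observations | HMM), computed with the forward algorithm."""
    states = list(start)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Hypothetical two-state models for two group actions.
models = {
    "presentation": {
        "start": {"talk": 1.0, "pause": 0.0},
        "trans": {"talk": {"talk": 0.9, "pause": 0.1},
                  "pause": {"talk": 0.5, "pause": 0.5}},
        "emit": {"talk": {"single_voice": 0.9, "many_voices": 0.1},
                 "pause": {"single_voice": 0.5, "many_voices": 0.5}},
    },
    "discussion": {
        "start": {"talk": 0.5, "pause": 0.5},
        "trans": {"talk": {"talk": 0.7, "pause": 0.3},
                  "pause": {"talk": 0.5, "pause": 0.5}},
        "emit": {"talk": {"single_voice": 0.3, "many_voices": 0.7},
                 "pause": {"single_voice": 0.5, "many_voices": 0.5}},
    },
}

def recognize(events):
    """Return the action whose HMM gives the event sequence the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(events, **models[name]))

action_a = recognize(["single_voice"] * 4)
action_b = recognize(["many_voices"] * 4)
```

The hard part, as the comment below notes, is not this scoring step but deciding which events and actions to define in the first place.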
Comment:
It is typical work on this kind of topic: the vision and sensing part is relatively easy, and the environment is constrained. Features and events are not difficult to detect and recognize, and the high-level modeling and recognition is based on HMMs (or variations of HMMs). It is very application specific, not general. The framework is always the same; the difficulty is how to fit your application problem into this framework, i.e. how to define the actions (or intentions, or interactions, etc.) and how to define the primitives (or events).
Real-time estimation of human body posture from monocular thermal images
S. Iwasawa et al., CVPR 97
Key words: distance transform, significant points, center of gravity, posture estimation
Brief Summary
In this paper they introduce a distance transformation to estimate the center of gravity (CoG) and the orientation of the upper body, and use heuristic methods to find the ends of the body (top of head, tip of foot, tip of hand). Other body parts (significant points) are then estimated based on the above estimations. The method is relatively simple, so it can achieve real-time processing.
The data comes from an IR camera, so there is no background noise and the human is easy to segment out. The distance transformation used to estimate the CoG is based on the distance from each pixel to the nearest boundary (called dij).
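A small sketch may help with the distance transform step. It assumes a binary silhouette mask that contains at least one background pixel, uses the city-block distance, and takes the CoG as the centroid weighted by dij. That weighting rule is my assumption for illustration, since these notes do not record the exact formula used in the paper.

```python
from collections import deque

def distance_transform(mask):
    """City-block distance from every pixel to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:           # background pixels are the sources
                dist[i][j] = 0
                queue.append((i, j))
    while queue:                         # breadth-first search outward
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                queue.append((ni, nj))
    return dist

def center_of_gravity(mask):
    """Centroid of the silhouette, weighting each pixel by its distance d_ij."""
    dist = distance_transform(mask)
    total = sum(sum(row) for row in dist)
    ci = sum(i * d for i, row in enumerate(dist) for d in row) / total
    cj = sum(j * d for row in dist for j, d in enumerate(row)) / total
    return ci, cj

mask = [                                 # hypothetical 3x3 silhouette
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(mask)
cog = center_of_gravity(mask)
```

Weighting by dij pulls the estimate toward the thick part of the body (the torso) and away from thin limbs, which is one plausible reason for using the distance transform rather than a plain centroid.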
The estimation of the orientation of the upper body is a little more complicated. They use a fitted Gaussian distribution as a mask to get the distance gij.
The estimation of the joints is based on a genetic learning algorithm.
There is some later work based on this distance transform and significant points method. It can be used for coarse motion estimation, but the difficulty may be finding the boundary of the human body, which is not easy in the real world (without an IR camera), and it may also suffer from the occlusion problem.
November 17
Pictorial Structures for Object Recognition
Pedro F. Felzenszwalb et al., IJCV
Key words: Human body model, human motion, probabilistic human model
Brief Summary: Part-based modeling and recognition of objects (e.g. face, human body). The model of an object (e.g. the human body) is called a pictorial structure model, introduced by Fischler and Elschlager. Suppose I is the image, θ is the pictorial model defined below, and L is the configuration (location) of the parts. The goal is to maximize p(L | I, θ). The model has the form θ = (u, E, c), where u = {u1, u2, …} are the appearance parameters of the parts (the features depend on the parts), E is the set of edges indicating which parts are connected, and c = {cij | (vi, vj) ∈ E} are the connection parameters between parts. The spatial relation between parts is modeled by the probability p(li, lj | cij), which is usually a joint Gaussian distribution for location and a von Mises distribution for angles. The process first estimates the model parameters from the training data (I1, …, In) and (L1, …, Ln), and then uses the model to recognize parts and objects in a test video sequence.
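Maximizing p(L | I, θ) over a tree of parts can be done with dynamic programming. The sketch below is a simplified illustration: parts form a chain instead of a general tree, locations are 1-D indices, the deformation cost is the negative log of a Gaussian on the offset between neighbouring parts, and the search is brute force. All names and numbers are hypothetical.

```python
def best_configuration(match_costs, offsets, sigma):
    """Minimize the sum of match costs and deformation costs along a chain of parts.

    match_costs[i][l] is the cost of placing part i at location l (lower is better).
    offsets[i] is the expected displacement from part i to part i + 1.
    """
    n_parts = len(match_costs)
    n_locs = len(match_costs[0])

    def deform(i, a, b):
        # Negative log of a Gaussian on the relative position, up to a constant.
        return (b - a - offsets[i]) ** 2 / (2.0 * sigma ** 2)

    cost = list(match_costs[0])   # best cost of parts 0..i with part i at each location
    back = []                     # back pointers for recovering the configuration
    for i in range(1, n_parts):
        new_cost, pointers = [], []
        for b in range(n_locs):
            a_best = min(range(n_locs), key=lambda a: cost[a] + deform(i - 1, a, b))
            new_cost.append(cost[a_best] + deform(i - 1, a_best, b) + match_costs[i][b])
            pointers.append(a_best)
        cost = new_cost
        back.append(pointers)

    loc = min(range(n_locs), key=lambda l: cost[l])
    total = cost[loc]
    config = [loc]
    for pointers in reversed(back):
        loc = pointers[loc]
        config.append(loc)
    config.reverse()
    return config, total

match_costs = [
    [5, 0, 5, 5, 5],   # part 0 clearly matches at location 1
    [5, 5, 0, 5, 5],   # part 1 clearly matches at location 2
    [0, 5, 5, 1, 5],   # part 2 looks best at 0, but 3 fits the spatial model
]
config, total = best_configuration(match_costs, offsets=[1, 1], sigma=1.0)
```

In the example the last part is placed at location 3 rather than at its best appearance match, because the spatial model penalizes the implausible jump. The brute-force inner minimization costs O(h²) per part for h locations; it is written this way only for clarity.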
Unsupervised Learning of Human Motion
Yang Song et al.; for more detail see her PhD thesis.
Key word: human motion, human model in vision, probabilistic model, LK feature points
Brief Summary: An unsupervised learning algorithm is presented that can obtain a probabilistic model of a human (or another object composed of a collection of parts) from unlabeled training data. The data are feature points from the LK algorithm, which include points from the object as well as background clutter. Human motion is modeled by the joint pdf of the positions and velocities of a collection of body parts. First, from the unlabeled feature data, a learning algorithm chooses the useful features that correspond to body parts and establishes their correspondence across frames (the labeling and correspondence process). Second, a probabilistic independence structure and the parameters of the pdf are estimated. (Occlusion is tolerated in this model.)
Preface
This is a personal academic space, which will cover interesting topics, papers, projects, etc. related to my current research interests, which are indicated in the title of my space.
In its first stage, this blog will include material from my daily reading and some short summaries, mainly for my personal convenience.
In the future, I hope it will become a platform to introduce my work and research interests.
I hope the blogs and links here can be a resource for people who are interested in academic work and still in the first stage of their research (just like I am).
Hope you enjoy it!
Joe