|
[1]
|
Ronald Ortner.
Online regret bounds for markov decision processes with deterministic
transitions.
In Yoav Freund, László Györfi, and György Turán,
editors, Proceedings of the 19th International Conference on Algorithmic
Learning Theory, volume 5254 of Lecture Notes in Artificial
Intelligence, pages 123-137, Budapest, Hungary, 2008. Springer.
[ DOI |
Pdf ]
We consider an upper confidence bound algorithm for
Markov decision processes (MDPs) with deterministic
transitions. For this algorithm we derive upper
bounds on the online regret (with respect to an
(epsilon-)optimal policy) that are logarithmic in
the number of steps taken. These bounds also match
known asymptotic bounds for the general MDP
setting. We also present corresponding lower
bounds. As an application, multi-armed bandits with
switching cost are considered.
Keywords: reinforcement learning, Markov decision processes,
multi-armed bandit problem
|
|
[2]
|
Arto Klami, Craig Saunders, Teófilo de Campos, and Samuel Kaski.
Can relevance of images be inferred from eye movements?
In MIR '08: Proceedings of the 1st ACM International Conference
on Multimedia Information Retrieval, pages 134-140, New York, NY, USA,
2008. ACM.
[ DOI |
Pdf ]
Query formulation and efficient navigation through
data to reach relevant results are undoubtedly major
challenges for image or video retrieval. Queries of
good quality are typically not available and the
search process needs to rely on relevance feedback
given by the user, which makes the search process
iterative. Giving explicit relevance feedback is
laborious, not always easy, and may even be
impossible in ubiquitous computing scenarios. A
central question then is: Is it possible to replace
or complement scarce explicit feedback with implicit
feedback inferred from various sensors not
specifically designed for the task? In this paper,
we present preliminary results on inferring the
relevance of images based on implicit feedback about
users' attention, measured using an eye tracking
device. It is shown that, in reasonably controlled
setups at least, already fairly simple features and
classifiers are capable of detecting the relevance
based on eye movements alone, without using any
explicit feedback.
Keywords: eye movements, image retrieval, implicit relevance feedback
|
|
[3]
|
He Zhang, Markus Koskela, and Jorma Laaksonen.
Report on forms of enriched relevance feedback.
Technical Report TKK-ICS-R10, Helsinki University of Technology,
Department of Information and Computer Science, Espoo, Finland, November
2008.
[ Pdf ]
This report presents a literature survey conducted
to review the current state of the art in research
concerning the use of eye movement measurements and
other non-conventional and implicit relevance
feedback modalities in content-based image and
information retrieval. We define and elaborate on
the concept of enriched relevance feedback and study
its applicability in the intersection of the
aforementioned research areas.
Keywords: enriched relevance feedback, implicit relevance
feedback, eye movement measurements, content-based
image retrieval
|
|
[4]
|
Ville Viitaniemi and Jorma Laaksonen.
Evaluation of pointer click relevance feedback in PicSOM.
Technical Report TKK-ICS-R11, Helsinki University of Technology,
Department of Information and Computer Science, Espoo, Finland, November
2008.
[ Pdf ]
This report presents the results of a series of
experiments where knowledge of the most relevant
part of images is given as additional information to
a content-based image retrieval system. The most
relevant parts have been identified by
search-task-dependent pointer clicks on the
images. As such they provide a rudimentary form of
explicit enriched relevance feedback and to some
extent mimic genuine implicit eye movement
measurements which are essential ingredients of the
PinView project.
Keywords: content-based image retrieval, enriched relevance
feedback, automatic image segmentation
|
|
[5]
|
Markus Koskela and Jorma Laaksonen.
Specification of information interfaces in PinView.
Technical Report TKK-ICS-R12, Helsinki University of Technology,
Department of Information and Computer Science, Espoo, Finland, November
2008.
[ Pdf ]
This report defines the information interfaces for
the PinView project to facilitate the planned
research of the project. Successful collaborative
research between the multiple project sites requires
that the individual efforts can directly support
each other. The report contains definitions for the
used file system structure, for various file
formats, and for data transfer between the project
sites. The report will be updated regularly during
the project.
Keywords: information interfaces, file formats, data transfer,
content-based image retrieval
|
|
[6]
|
Jorma Laaksonen.
Definition of enriched relevance feedback in PicSOM.
Technical Report TKK-ICS-R13, Helsinki University of Technology,
Department of Information and Computer Science, Espoo, Finland, November
2008.
[ Pdf ]
This report defines and implements communication
principles and data formats for transferring
enriched relevance feedback to the PicSOM
content-based image retrieval system used in the
PinView project. The modalities of enriched
relevance feedback include recorded eye movements,
pointer and keyboard events and audio including
speech. The communication is based on the AJAX
technology, where the client and server exchange XML
formatted content by using the XMLHttpRequest
method.
Keywords: content-based image retrieval, enriched relevance
feedback, file formats, data transfer, AJAX,
XMLHttpRequest
|
|
[7]
|
Alex P. Leung and Peter Auer.
An efficient search algorithm for content-based image retrieval with
user feedback.
In Proceedings of the First International Workshop on Video
Mining (VM'08) in association with IEEE International Conference on Data
Mining (ICDM 2008), pages 884-890, Pisa, Italy, December 2008.
[ DOI |
Pdf ]
We propose a probabilistic model for the relevance
feedback of users looking for target images. This
model takes into account user errors and user
uncertainty about distinguishing similarly relevant
images. Based on this model, we have developed an
algorithm, which selects images to be presented to
the user for further relevance feedback until a
satisfactory image is found. In each query session,
the algorithm maintains weights on the images in the
database which reflect the assumed relevance of the
images. Relevance feedback is used to modify these
weights. As a second ingredient, the algorithm uses
a minimax principle to select images for
presentation to the user: any response of the user
will provide significant information about his
query, such that relatively few feedback rounds are
sufficient to find a satisfactory image. We have
implemented this algorithm and have conducted
experiments on both simulated data and real data
which show promising results.
Keywords: content-based image retrieval, search algorithms,
relevance feedback
|
|
[8]
|
Peter Auer, Thomas Jaksch, and Ronald Ortner.
Near-optimal regret bounds for reinforcement learning.
In NIPS08, Advances in Neural Information Processing Systems,
pages 89-96, Vancouver, Canada, December 2008. MIT Press.
[ Pdf ]
For undiscounted reinforcement learning in Markov decision
processes (MDPs) we consider the total regret of a learning algorithm with
respect to an optimal policy. In order to describe the transition
structure of an MDP we propose a new parameter: An MDP has diameter D if
for any pair of states s, s' there is a policy which moves from s to s'
in at most D steps (on average). We present a reinforcement learning
algorithm with total regret O(DS sqrt(AT)) after T steps for any
unknown MDP with S states, A actions per state, and diameter D. This
bound holds with high probability. We also present a corresponding
lower bound of Omega(sqrt(DSAT)) on the total regret of any learning
algorithm.
|
|
[9]
|
Tom Diethe, Zakria Hussain, David R. Hardoon, and John Shawe-Taylor.
Matching pursuit kernel Fisher discriminant analysis.
In International conference on Artificial Intelligence and
Statistics (AISTATS), 2009.
[ Pdf ]
We derive a novel sparse version of Kernel Fisher
Discriminant Analysis (KFDA) using an approach based
on Matching Pursuit (MP). We call this algorithm
Matching Pursuit Kernel Fisher Discriminant Analysis
(MPKFDA). We provide generalisation error bounds
analogous to those constructed for the Robust
Minimax algorithm together with a sample compression
bounding technique. We present experimental results
on real world datasets, which show that MPKFDA is
competitive with the KFDA and the SVM on UCI
datasets, and additional experiments that show that
the MPKFDA on average outperforms KFDA and SVM in
extremely high dimensional settings.
|
|
[10]
|
Mats Sjöberg and Jorma Laaksonen.
Optimal combination of SOM search in best-matching units and map
neighborhood.
In Proceedings of 7th International Workshop on Self-Organizing
Maps (WSOM 2009), volume 5629 of Lecture Notes in Computer Science,
pages 281-289, St. Augustine, Florida, USA, 2009. Springer.
[ Pdf ]
The distribution of a class of objects, such as images
depicting a specific topic, can be studied by observing the best-matching
units (BMUs) of the objects' feature vectors on a Self-Organizing Map
(SOM). When the BMU “hits” on the map are summed up, the class
distribution may be seen as a two-dimensional histogram or discrete
probability density. Due to the SOM's topology preserving property, one
is motivated to smooth the value field and spread out the values
spatially to neighboring units, from where one may expect to find further
similar objects. In this paper we study the impact of using more map
units than just the single BMU of each feature vector in modeling the
class distribution. We demonstrate that by varying the number of units
selected in this way and varying the width of the spatial convolution one
can find an optimal combination which maximizes the class detection
performance.
|
|
[11]
|
Ville Viitaniemi and Jorma Laaksonen.
Spatial extensions to bag of visual words.
In Proceedings of ACM International Conference on Image and
Video Retrieval (CIVR 2009), Santorini, Greece, July 2009.
[ Pdf ]
The Bag of Visual Words (BoV) paradigm has successfully been
applied to image content analysis tasks such as image classification and
object detection. The basic BoV approach overlooks spatial descriptor
distribution within images. Here we describe spatial extensions to BoV
and experimentally compare them in the VOC2007 benchmark image category
detection task. In particular, we compare two ways for tiling images
geometrically: soft tiling approach-proposed here-and the traditional
hard tiling technique. The experiments also address two methods of
fusing information from several tilings of the images: post-classifier
fusion and fusion on the level of a SVM kernel.
The experiments confirm that the performance of a BoV system can be
greatly enhanced by taking the descriptors' spatial distribution into
account. The soft tiling technique performs well even with a single
tiling mask, whereas multi-mask fusion is necessary for good category
detection performance in case of hard tiling. The evaluated fusion
mechanisms performed approximately equally well.
|
|
[12]
|
Tom Diethe and Zakria Hussain.
Kernel polytope faces pursuit.
In European Conference in Machine Learning (ECML), 2009.
[ Pdf ]
Polytope Faces Pursuit (PFP) is a greedy algorithm
that approximates the sparse solutions recovered by
L1 regularised least-squares (Lasso) [4, 10] in a
similar vein to (Orthogonal) Matching Pursuit (OMP)
[16]. The algorithm is based on the geometry of the
polar polytope where at each step a basis function
is chosen by finding the maximal vertex using a
path-following method. The algorithmic complexity is
of a similar order to OMP whilst being able to solve
problems known to be hard for (O)MP. Matching
Pursuit was extended to build kernel-based solutions
to machine learning problems, resulting in the
sparse regression algorithm, Kernel Matching Pursuit
(KMP) [17]. We develop a new algorithm to build
sparse kernel-based solutions using PFP, which we
call Kernel Polytope Faces Pursuit (KPFP). We show
the usefulness of this algorithm by providing a
generalisation error bound [7] that takes into
account a natural regression loss and experimental
results on several benchmark datasets.
Keywords: Polytope Faces Pursuit, Orthogonal Matching Pursuit,
Pseudodimension, Sample Compression Bounds,
Regression, Kernel methods
|
|
[13]
|
Ivan Markovsky, Diana Sima, and Sabine Van Huffel.
Total least squares methods.
Wiley Interdisciplinary Reviews: Computational Statistics,
2(2):212-217, 2010.
[ Pdf ]
Recent advances in total least squares approaches
for solving various errors-in-variables modeling
problems are reviewed, with emphasis on the
following generalizations: - the use of weighted
norms as a measure of the data perturbation size,
capturing prior knowledge about uncertainty in the
data; - the addition of constraints on the
perturbation to preserve the structure of the data
matrix, motivated by structured data matrices
occurring in signal and image processing, systems
and control, and computer algebra; - the use of
regularization in the problem formulation, aiming at
stabilizing the solution by decreasing the effect
due to intrinsic ill-conditioning of certain
problems.
|
|
[14]
|
Julian McAuley, Teófilo de Campos, Gabriela Csurka, and Florent Perronnin.
Hierarchical image-region labeling via structured learning.
In Proceedings of The 20th British Machine Vision Conference
(BMVC 2009), London, UK, September 2009.
[ Pdf ]
We present a graphical model which encodes a series
of hierarchical constraints for classifying image
regions at multiple scales. We show that inference
in this model can be performed efficiently and
exactly, rendering it amenable to structured
learning. Rather than using feature vectors derived
from images themselves, our model is parametrised
using the outputs of a series of first-order
classifiers. Thus our model learns which classifiers
are useful at different scales, and also the
relationships between classifiers at different
scales. We present promising results on the VOC2007
and VOC2008 datasets.
Keywords: Structured learning, image segmentation
|
|
[15]
|
Kitsuchart Pasupa, Arto Klami, Craig J. Saunders, Teófilo de Campos, and
Samuel Kaski.
Can relevance of images be inferred from eye movements?
In 15th European Conference on Eye Movements (ECEM'2009),
page 50, Southamton, UK, August 2009.
[ Pdf ]
Searching for images from a large collection is a
difficult task for automated algorithms. Many
current techniques rely on items which have been
manually `tagged' with descriptors. This situation
is not ideal, as it is difficult to formulate the
initial query, and navigate the large number of hits
returned. In order to present relevant images to the
user, many systems rely on an explicit feedback
mechanism. A machine learning algorithm can be used
to present a new set of relevant images to the user
- thus increasing hit rates. In this work we use
eye movements to assist a user when performing such
a task, and ask this basic question: “Is it
possible to replace or complement scarce explicit
feedback with implicit feedback inferred from
various sensors not specifically designed for the
task?” We give initial results on a range of tasks
and experiments which extend those presented in the
Multimedia Information Retrieval conference
(MIR'08). In reasonably controlled setups, fairly
simple eye movements' features in conjunction with
machine learning techniques are capable of judging
the relevance of an image based on eye movements
alone, without using any explicit feedback -
therefore potentially assisting the user in a task.
|
|
[16]
|
László Kozma, Arto Klami, and Samuel Kaski.
GaZIR: Gaze-based zooming interface for image retrieval.
In Eleventh International Conference on Multimodal Interfaces
(ICMI-MLMI), Cambridge, Massachusetts, USA, 2009.
[ DOI |
Pdf ]
We introduce GaZIR, a gaze-based interface for
browsing and searching for images. The system
computes on-line predictions of relevance of images
based on implicit feedback, and when the user zooms
in, the images predicted to be the most relevant are
brought out. The key novelty is that the relevance
feedback is inferred from implicit cues obtained in
real-time from the gaze pattern, using an estimator
learned during a separate training phase. The
natural zooming interface can be connected to
anycontent-based information retrieval engine
operating on user feedback. We show with experiments
on one engine that there is sufficient amount of
information in the gazepatterns to make the
estimated relevance feedback a viable choice to
complement or even replace explicit feedback by
pointing-and-clicking.
|
|
[17]
|
Martin Antenreiter, Ronald Ortner, and Peter Auer.
Combining classifiers for improved multilabel image classification.
In Learning from Multi-label Data, MLD Workshop at ECML 2009,
pages 16-27, Bled, Slovenia, 2009.
[ Pdf ]
We propose a stacking-like method for multilabel
image classification. Our approach combines the
output of binary base learners, which use different
features for image description, in a simple and
straightforward way: The confidence values of the
base learners are fed into a support vector machine
(SVM) in order to improve prediction
accuracy. Experiments on the datasets of the Pascal
Visual Object Classes challenges (VOC) of 2006
and 2007 show that our method significantly improves
over the performance of the base learners. Our
approach also works better than more naive
approaches for combining features or classifiers.
|
|
[18]
|
Peter Auer and Alex P. Leung.
Relevance feedback models for content-based image retrieval.
In Multimedia Analysis, Processing and Communications, pages
59-79. Springer, 2009.
[ DOI |
Pdf ]
We investigate models for content-based image retrieval with
relevance feedback, in particular focusing on the exploration-exploitation
dilemma. We propose quantitative models for the user behavior and
investigate implications of these models. Three search algorithms for
efficient searches based on the user models are proposed and evaluated.
In the first model a user queries a database for the most (or a
sufficiently) relevant image. The user gives feedback to the system
by selecting the most relevant image from a number of images presented by
the system. In the second model we consider a filtering task where
relevant images should be extracted from a database and presented to the
user. The feedback of the user is a binary classification of each
presented image as relevant or irrelevant. While these models are
related, they differ significantly in the kind of feedback provided by
the user. This requires very different mechanisms to trade off
exploration (finding out what the user wants) and exploitation (serving
images which the system believes relevant for the user).
|
|
[19]
|
Kitsuchart Pasupa, Craig Saunders, Sandor Szedmak, Arto Klami, Samuel Kaski,
and Steve Gunn.
Learning to rank images from eye movements.
In Proceedings of the 12th IEEE International Conference on
Computer Vision, (ICCV'2009), Workshops on Human-Computer Interaction,
(HCI'2009), pages 2009-2016, Kyoto, Japan, October 2009.
[ DOI |
Pdf ]
Combining multiple information sources can improve
the accuracy of search in information
retrieval. This paper presents a new image search
strategy which combines image features together with
implicit feedback from users' eye movements, using
them to rank images. In order to better deal with
larger data sets, we present a perceptron
formulation of the Ranking Support Vector Machine
algorithm. We present initial results on inferring
the rank of images presented in a page based on
simple image features and implicit feedback of
users. The results show that the perceptron
algorithm improves the results, and that fusing eye
movements and image histograms gives better rankings
to images than either of these features alone.
|
|
[20]
|
Ronald Ortner.
Exploiting similarity information in reinforcement learning -
similarity models for multi-armed bandits and mdps.
In Proceedings of the 2nd International Conference on Agents and
Artificial Intelligence (ICAART 2010), pages 203-210, Valencia, Spain,
2010. INSTICC Press.
[ Pdf ]
This paper considers reinforcement learning problems
with additional similarity information. We start
with the simple setting of multi-armed bandits in
which the learner knows for each arm its color,
where it is assumed that arms of the same color have
close mean rewards. An algorithm is presented that
shows that this color information can be used to
improve the dependency of online regret bounds on
the number of arms. Further, we discuss to what
extent this approach can be extended to the more
general case of Markov decision processes. For the
simplest case where the same color for actions means
similar rewards and identical transition
probabilities, an algorithm and a corresponding
online regret bound are given. For the general case
where transition probabilities of same-colored
actions imply only close but not necessarily
identical transition probabilities we give upper and
lower bounds on the error by action aggregation with
respect to the color information. These bounds also
imply that the general case is far more difficult to
handle.
|
|
[21]
|
Ivan Markovsky.
Matrix centering and low-rank approximation.
Technical report, University of Southampton, School of Electronics
and Computer Science, 2009.
[ Pdf ]
Low-rank approximation, better known in the machine
learning literature as principal component analysis,
is a data modeling tool. Data centering, on the
other hand, is a common preprocessing step. This
paper studies the combination of low-rank
approximation with data centering. In the case of no
weights and no constraints, the common preprocessing
practice of mean subtraction leads to optimal
results and therefore need not be changed. The
formal proof of this fact, however, reveals that
preprocessing by mean subtraction is not the only
way to optimally preprocess the data. In the case of
weighted and/or constrained low-rank approximation
problems, the common preprocessing practice leads to
suboptimal results and therefore should be
changed. The paper shows, how classical solution
methods for weighted and structured low-rank
approximation can be modified for doing optimal
preprocessing at the same time as low-rank
approximation.
|
|
[22]
|
Kitsuchart Pasupa, Sandor Szedmak, and David Hardoon.
Image ranking with eye movements.
In Proceedings of the 23rd Annual Conference on Neural
Information Processing Systems (NIPS'2009), Workshop on Advances in Ranking,
pages 37-42, Whistler, Canada, December 2009.
[ Pdf ]
In order to help users navigate an image search
system, one could provide explicit rank information
on a set of images. These rankings are learnt so to
present a new set of relevant images. Although,
requiring explicit information may not be feasible
in some cases, we consider the setting where the
user provides implicit feedback, eye movements, to
assist in such a task. This paper explores the idea
of implicitly incorporating eye movement features in
an image ranking task. Previous work had
demonstrated that combining eye movement and image
features improved the retrieval accuracy. Despite
promising results the proposed approach is
unrealistic as no eye movements are given a-priori
for new images. We propose a novel search approach
which combines image together with eye movements
features in a tensor Ranking Support Vector Machine,
and show that by extracting the individual
source-specific weight vectors we are able to
construct a new image-based semantic space which
outperforms in retrieval accuracy.
|
|
[23]
|
David Hardoon and Kitsuchart Pasupa.
Image ranking with implicit feedback from eye movements.
In Proceedings of the 6th Biennial Symposium on Eye Tracking
Research & Applications (ETRA'2010), Austin, USA, 2010.
[ Pdf ]
In order to help users navigate an image search
system, one could provide explicit information on a
small set of images as to which of them are relevant
or not to their task. These rankings are learned in
order to present a user with a new set of images
that are relevant to their task. Requiring such
explicit information may not be feasible in a number
of cases, we consider the setting where the user
provides implicit feedback, eye movements, to assist
when performing such a task. This paper explores the
idea of implicitly incorporating eye movement
features in an image ranking task where only images
are available during testing. Previous work had
demonstrated that combining eye movement and image
features improved on the retrieval accuracy when
compared to using each of the sources
independently. Despite these encouraging results the
proposed approach is unrealistic as no eye movements
will be presented a-priori for new images (i.e. only
after the ranked images are presented would one be
able to measure a user's eye movements on
them). We propose a novel search methodology which
combines image features together with implicit
feedback from users' eye movements in a tensor
ranking Support Vector Machine and show that it is
possible to extract the individual source-specific
weight vectors. Furthermore, we demonstrate that the
decomposed image weight vector is able to construct
a new image-based semantic space that outperforms
the retrieval accuracy than when solely using the
image-features.
|
|
[24]
|
Ronald Ortner.
Online regret bounds for Markov decision processes with
deterministic transitions.
Theoretical Computer Science, 411(29-30):2684-2695, 2010.
[ DOI |
Pdf ]
We consider an upper confidence bound algorithm for
learning in Markov decision processes with
deterministic transitions. For this algorithm we
derive upper bounds on the online regret with
respect to an (epsilon-)optimal policy that are
logarithmic in the number of steps taken. We also
present a corresponding lower bound. As an
application, multi-armed bandits with switching cost
are considered.
|
|
[25]
|
Ivan Markovsky.
Algorithms and literate programs for weighted low-rank approximation
with missing data.
Technical report, University of Southampton, School of Electronics
and Computer Science, 2009.
[ Pdf ]
Linear models identification from data with missing
values is posed as a weighted low-rank approximation
problem with weights related to the missing values
equal to zero. Alternating projections and variable
projections methods for solving the resulting
problem are outlined and implemented in a literate
programming style, using Matlab/Octave's scripting
language. The methods are evaluated on synthetic
data and real data from the MovieLens data sets.
|
|
[26]
|
Zakria Hussain, Kitsuchart Pasupa, and John Shawe-Taylor.
Learning relevant eye movement feature spaces across users.
In Proceedings of the 6th Biennial Symposium on Eye Tracking
Research & Applications (ETRA'2010), Austin, USA, March 2010.
[ DOI |
Pdf ]
In this paper we predict the relevance of images
based on a low-dimensional feature space found using
several users' eye movements. Each user is given an
image-based search task, during which their eye
movements are extracted using a Tobii eye
tracker. The users also provide us with explicit
feedback regarding the relevance of images. We
demonstrate that by using a greedy Nystrom algorithm
on the eye movement features of different users, we
can find a suitable low-dimensional feature space
for learning. We validate the suitability of this
feature space by projecting the eye movement
features of a new user into this space, training an
online learning algorithm using these features, and
showing that the number of mistakes (regret over
time) made in predicting relevant images is lower
than when using the original eye movement
features. We also plot Recall-Precision and ROC
curves, and use a sign test to verify the
statistical significance of our results.
|
|
[27]
|
Ivan Markovsky.
Algorithms and literate programs for weighted low-rank approximation
with missing data.
In A. Iske, E. Georgoulis, and J. Levesley, editors, Springer
Proceedings in Mathematics, volume 3, pages 255-273. Springer, 2011.
[ DOI |
Pdf ]
Linear models identification from data with missing
values is posed as a weighted low-rank approximation
problem with weights related to the missing values
equal to zero. Alternating projections and variable
projections methods for solving the resulting
problem are outlined and implemented in a literate
programming style, using Matlab/Octave's scripting
language. The methods are evaluated on synthetic
data and real data from the MovieLens data sets.
|
|
[28]
|
Julian J. McAuley, Teófilo de Campos, and Tibério S. Caetano.
Unified graph matching in euclidean spaces.
In Conference on Computer Vision and Pattern Recognition
(CVPR), pages 1871-1878, San Francisco, June 2010. IEEE.
[ DOI |
Pdf ]
Graph matching is a classical problem in pattern
recognition with many applications, particularly
when the graphs are embedded in Euclidean spaces, as
is often the case for computer vision. There are
several variants of the matching problem, concerned
with isometries, isomorphisms, homeomorphisms, and
node attributes; different approaches exist for each
variant. We show how structured estimation methods
from machine learning can be used to combine such
variants into a single version of graph matching. In
this paradigm, the extent to which our datasets
reveal isometries, isomorphisms, homeomorphisms, and
other properties is automatically accounted for in
the learning process so that any such specific
qualification of graph matching loses meaning. We
present experiments with real computer vision data
showing the leverage of this unified formulation.
|
|
[29]
|
Florent Perronnin, Yan Liu, Jorge Sanchez, and Herve Poirier.
Large-scale image retrieval with compressed fisher vectors.
In Conference on Computer Vision and Pattern Recognition
(CVPR), pages 3384-3391, San Francisco, CA, USA, June 2010. IEEE.
[ DOI ]
The problem of large-scale image search has been
traditionally addressed with the bag-of-visual-words
(BOV). In this article, we propose to use as an
alternative the Fisher kernel framework. We first
show why the Fisher representation is well-suited to
the retrieval problem: it describes an image by what
makes it different from other images. One drawback
of the Fisher vector is that it is high-dimensional
and, as opposed to the BOV, it is dense. The
resulting memory and computational costs do not make
Fisher vectors directly amenable to large-scale
retrieval. Therefore, we compress Fisher vectors to
reduce their memory footprint and speed-up the
retrieval. We compare three binarization approaches:
a simple approach devised for this representation
and two standard compression techniques. We show on
two publicly available datasets that compressed
Fisher vectors perform very well using as little as
a few hundreds of bits per image, and significantly
better than a very recent compressed BOV approach.
|
|
[30]
|
Thomas Jaksch, Ronald Ortner, and Peter Auer.
Near-optimal regret bounds for reinforcement learning.
The Journal of Machine Learning Research, 11:1563-1600, April
2010.
[ Pdf ]
For undiscounted reinforcement learning in Markov
decision processes (MDPs) we consider the total
regret of a learning algorithm with respect to an
optimal policy. In order to describe the transition
structure of an MDP we propose a new parameter: An
MDP has diameter D if for any pair of states s,s'
there is a policy which moves from s to s' in at
most D steps (on average). We present a
reinforcement learning algorithm with total regret
Õ(DS√AT) after T steps for any
unknown MDP with S states, A actions per state, and
diameter D. A corresponding lower bound of Ω(√DSAT)
on the total regret of any learning algorithm is
given as well. These results are complemented by a
sample complexity bound on the number of suboptimal
steps taken by our algorithm. This bound can be used
to achieve a (gap-dependent) regret bound that is
logarithmic in T. Finally, we also consider a
setting where the MDP is allowed to change a fixed
number of l times. We present a modification of our
algorithm that is able to deal with this setting and
show a regret bound of Õ(l^1/3 T^2/3DS √A).
|
|
[31]
|
Muayed Al-Huseiny, Sasan Mahmoodi, and Mark Nixon.
Gait learning-based regenerative model: a level set approach.
In 20th International Conference on Pattern Recognition, ICPR,
2010.
[ DOI |
Pdf ]
We propose a learning method for gait synthesis from
a sequence of shapes(frames) with the ability to
extrapolate to novel data. It involves the
application of PCA, first to reduce the data
dimensionality to certain features, and second to
model corresponding features derived from the
training gait cycles as a Gaussian
distribution. This approach transforms a non
Gaussian shape deformation problem into a Gaussian
one by considering features of entire gait cycles as
vectors in a Gaussian space. We show that these
features which we formulate as continuous functions
can be modeled by PCA. We also use this model to
in-between (generate intermediate unknown) shapes in
the training cycle. Furthermore, this paper
demonstrates that the derived features can be used
in the identification of pedestrians.
Keywords: Computer Vision, Gait Analysis, Level Sets, PCA,
Statistical Models
|
|
[32]
|
Ivan Markovsky.
Bibliography on total least squares and related methods.
Statistics and Its Interface, 3(3):329-334, 2010.
[ Url |
Pdf ]
The class of total least squares methods has been
growing since the basic total least squares method
was proposed by Golub and Van Loan in the
70's. Efficient and robust computational algorithms
were developed and properties of the resulting
estimators were established in the
errors-in-variables setting. At the same time the
developed methods were applied in diverse areas,
leading to broad literature on the subject. This
paper collects the main references and guides the
reader in finding details about the total least
squares methods and their applications. In addition,
the paper comments on similarities and differences
between the total least squares and the singular
spectrum analysis methods.
|
|
[33]
|
Peter Auer and Ronald Ortner.
Ucb revisited: improved regret bounds for the stochastic multi-armed
bandit problem.
Periodica Mathematica Hungarica, 61(1-2):55-65, 2010.
[ DOI |
Pdf ]
In the stochastic multi-armed bandit problem we
consider a modification of the UCB algorithm of Auer
et al. [4]. For this modified algorithm we give an
improved bound on the regret with respect to the
optimal reward. While for the original UCB algorithm
the regret in K-armed bandits after T trials is
bounded by const· (Klog(T))/(Δ),
where Δ measures the distance between a
suboptimal arm and the optimal arm, for the modified
UCB algorithm we show an upper bound on the regret
of const · (Klog(TΔ2 ))/(Δ).
Keywords: multi-armed bandit problem, regret
|
|
[34]
|
Peter Auer, Zakria Hussain, Samuel Kaski, Arto Klami, Jussi Kujala, Jorma
Laaksonen, Alex P. Leung, Kitsuchart Pasupa, and John Shawe-Taylor.
Pinview: Implicit feedback in content-based image retrieval.
In ICML Workshop on Reinforcement Learning and Search in Very
Large Spaces, Haifa, Israel, June 2010.
[ Pdf ]
This paper describes Pinview, a content-based image
retrieval system that exploits implicit relevance
feedback during a search session. The goal is to
retrieve interesting images and the relevance
feedback could be eye movements or clicks on the
images. Pinview contains several novel methods that
infer the intent of the user. From relevance
feedback and visual features of images Pinview
learns a similarity metric between images which
depends on the current interests of the user. It
then retrieves images with a specialized
reinforcement learning algorithm that balances the
tradeoff between exploring new images and exploiting
the already inferred interests of the user. In
practise, Pinview is integrated to the content-based
image retrieval system PicSOM, so it is possible to
apply it to realworld image databases. Preliminary
experiments show that eye movements provide a rich
input modality from which it is possible to learn
the interests of the user.
|
|
[35]
|
Arto Klami.
Inferring task-relevant image regions from gaze data.
In IEEE International Workshop on Machine Learning for Signal
Processing, MLSP, Kittilä, Finland, 2010.
[ DOI |
Pdf ]
A number of studies have recently used eye movements
of a user inspecting the content as implicit
relevance feedback for proactive retrieval
systems. Typically binary feedback for images or
text paragraphs is inferred from the gaze
pattern. We seek to make such feedback richer for
image retrieval, by inferring which parts of the
image the user found relevant. For this purpose, we
present a novel Bayesian mixture model for inferring
possible target regions directly from gaze data
alone, and show how the relevance of those regions
can then be inferred using a simple classifier that
is independent of the content or the task.
|
|
[36]
|
Zakria Hussain, Alex P. Leung, Kitsuchart Pasupa, David R. Hardoon, Peter Auer,
and John Shawe-Taylor.
Exploration-exploitation of eye movement enriched multiple feature
spaces for content-based image retrieval.
In European Conference on Machine Learning and Principles and
Practice of Knowledge Discovery in Databases, ECMLPKDD, pages 554-569.
Springer, 2010.
[ DOI |
Pdf ]
In content-based image retrieval (CBIR) with
relevance feedback we would like to retrieve
relevant images based on their content features and
the feedback given by users. In this paper we view
CBIR as an Exploration-Exploitation problem and
apply a kernel version of the LinRel algorithm to
solve it. By using multiple feature extraction
methods and utilising the feedback given by users,
we adopt a strategy of multiple kernel learning to
find a relevant feature space for the kernel LinRel
algorithm. We call this algorithm
LinRelMKL. Furthermore, when we have access to eye
movement data of users viewing images we can enrich
our (multiple) feature spaces by using a tensor
kernel SVM. When learning in this enriched space we
show that we can significantly improve the search
results over the LinRel and LinRelMKL
algorithms. Our results suggest that the use of
exploration-exploitation with multiple feature
spaces is an efficient way of constructing CBIR
systems, and that when eye movement features are
available, they should be used to help improve
CBIR.
Keywords: content-based image retrieval, LinRel, images, eye
movements, multiple kernel learning, tensor kernel
SVM
|
|
[37]
|
Peter Auer, Zakria Hussain, Samuel Kaski, Arto Klami, Jussi Kujala, Jorma
Laaksonen, Alex P. Leung, Kitsuchart Pasupa, and John Shawe-Taylor.
Pinview: Implicit feedback in content-based image retrieval.
In Workshop on Applications of Pattern Analysis, WAPA, pages
51-57, 2010.
[ Pdf ]
This paper describes Pinview, a content-based image
retrieval system that exploits implicit relevance
feedback during a search session. Pinview contains
several novel methods that infer the intent of the
user. From relevance feedback, such as eye movements
or clicks, and visual features of images Pinview
learns a similarity metric between images which
depends on the current interests of the user. It
then retrieves images with a specialized
reinforcement learning algorithm that balances the
tradeoff between exploring new images and exploiting
the already inferred interests of the user. In
practise, we have integrated Pinview to the
content-based image retrieval system PicSOM, in
order to apply it to real- world image
databases. Preliminary experiments show that eye
movements provide a rich input modality from which
it is possible to learn the interests of the user.
|
|
[38]
|
Sandor Szedmak, Yizhao Ni, and Steve R. Gunn.
Maximum margin learning with incomplete data: Learning networks
instead of tables.
In Workshop on Applications of Pattern Analysis, WAPA, 2010.
[ Pdf ]
In this paper we address the problem of predicting
when the available data is incomplete. We show that
changing the generally accepted table-wise view of
the sample items into a graph representable one
allows us to solve these kind of problems in a very
concise way by using the well known convex,
one-class classification based, optimisation
framework. The use of the one-class formulation in
the learning phase and in the prediction as well
makes the entire procedure highly consistent. The
graph representation can express the complex
interdependencies among the data sources. The
underlying optimisation problem can be transformed
into a on-line algorithm, e.g. a perceptron type
one, and in this way it can deal with data sets of
million items. This framework covers and encompasses
supervised, semi-supervised and some unsupervised
learning problems. Furthermore, the data sources can
be chosen as not only simple binary variables or
vectors but text documents, images or even graphs
with complex internal structures.
|
|
[39]
|
Teemu Ruokolainen, Tanel Alumäe, and Marcus Dobrinkat.
Using dependency grammar features in whole sentence maximum entropy
language model for speech recognition.
In Fourth International Conference Baltic HLT, 2010.
[ DOI |
Pdf ]
In automatic speech recognition, the standard choice
for a language model is the well-known n-gram
model. The n-grams are used to predict the
probability of a word given its n-1 preceding
words. However, the n-gram model is not able to
explicitly learn grammatical relations of the
sentence. In the present work, in order to augment
the n-gram model with grammatical features, we apply
the Whole Sentence Maximum Entropy framework. The
grammatical features are head-modifier relations
between pairs of words, together with the labels of
the relationships, obtained with the dependency
grammar. We evaluate the model in a large vocabulary
speech recognition task with Wall Street Journal
speech corpus. The results show a substantial
improvement in both test set perplexity and word
error rate.
|
|
[40]
|
Muayed Al-Huseiny, Sasan Mahmoodi, and Mark Nixon.
Robust rigid shape registration method using a level set formulation.
In 6th International Symposium on Visual Computing, volume
6454, pages 252-261, Las Vegas, Nevada, USA, 2010. Springer.
[ DOI |
Pdf ]
This paper presents a fast algorithm for robust
registration of shapes implicitly represented by
signed distance functions(SDFs). The proposed
algorithm aims to recover the transformation
parameters(scaling, rotation, and translation) by
minimizing the dissimilarity between two shapes. To
achieve a robust and fast algorithm, linear
orthogonal transformations are employed to minimize
the dissimilarity measures. The algorithm is applied
to various shape registration problems, to address
issues such as topological invariance, shape
complexity, and convergence speed and stability. The
outcomes are compared with other state-of-the-art
shape registration algorithms to show the advantages
of the new technique.
|
|
[41]
|
Sasan Mahmoodi.
Anisotropic diffusion for noise removal of band pass signals.
Signal Processing, 91(5):1298-1307, 2011.
[ DOI |
Pdf ]
A noise removal method for band pass signals based
on the anisotropic diffusion algorithm originally
put forward by Perona and Malik is proposed in this
paper. The anisotropic smoothing algorithm proposed
here is for band pass signals modulated with a
constant carrier frequency. A partial differential
equation to smooth band pass noisy signals is
derived. The propagator of this differential
equation is also analytically calculated in this
paper. An appropriate linear operator is then
considered here for such band pass signals to form
an anisotropic diffusion algorithm. The algorithm
proposed here demonstrates better performance for
band pass noisy signals containing discontinuities
in comparison with the traditional Perona–Malik (PM)
algorithm and is robust in the presence of excessive
noise with SNR less than unity.
Keywords: Noise removal; Band pass signals; Perona–Malik
algorithm; Anisotropic diffusion
|
|
[42]
|
Sasan Mahmoodi.
Scale invariant filtering design and analysis for edge detection.
Proc.of the Royal Society A: Mathematical, Physical, and
Engineering Sciences, 467(2130):1719-1738, 2010.
[ DOI |
Pdf ]
Existing edge detection filters work well on
straight edges but make significant errors near
sharp corners by producing rounded corners. This is
due to the fact that the edge maps produced by these
filters are scale variant. We enhance Canny’s
optimality criteria to incorporate detection
performance near corners as an explicit design
objective. The resulting optimal filter, termed
‘Bessel integral filter’, can be derived
analytically and exhibits superior performance over
recent alternatives, both in terms of numerical
accuracy and experimental fidelity. A noise-free
localization index is also derived here to account
for the detection accuracy of discontinuities
forming sharp corners in the absence of noise. We
prove here that edges detected by the filters that
are not optimal with respect to this noise-free
localization index are scale variant. However, the
Bessel integral filter proposed here is optimal with
respect to the noise-free localization index and
therefore it is a scale-invariant filter.
Keywords: edge detection, Canny-like criteria, Bessel integral
filter, scale-space smoothing, two-dimensional
filtering design, scale-invariant filters
|
|
[43]
|
Sasan Mahmoodi and Steve Gunn.
Scale space smoothing, image feature extraction and bessel filters.
In 17th Scandinavian Conference on Image Analysis, Ystad,
Sweden, May 2011. Springer.
[ Url |
Pdf ]
The Green function of Mumford-Shah functional in the
absence of discontinuities is known to be a modified
Bessel function of the second kind and zero
degree. Such a Bessel function is regularized here
and used as a filter for feature extraction. It is
demonstrated in this paper that a Bessel filter does
not follow the scale space smoothing property of
bounded linear filters such as Gaussian filters. The
features extracted by the Bessel filter are
therefore scale invariant. Edges, blobs, and
junctions are features considered here to show that
the extracted features remain unchanged by varying
the scale of a Bessel filter. The scale invariance
property of Bessel filters for edges is analytically
proved here. We conjecture that Bessel filters also
enjoy this scale invariance property for other kinds
of features.
|
|
[44]
|
Sasan Mahmoodi and Steve Gunn.
Snake based unsupervised texture segmentation using gaussian markov
random field models.
In 18th IEEE International Conference on Image Processing,
Brussels, Belgium, September 2011. IEEE.
[ Url |
Pdf ]
A functional for unsupervised texture segmentation
is investigated in this paper. An auto-normal model
based on Markov Random Fields is employed to model
textures. The functional investigated here is
optimized with respect to the model parameters and
the evolving contour to simultaneously estimate
model parameters and find the boundaries between
textures. Experimental results applied on the
textures of the Brodatz album demonstrate the
superior performance and the higher speed of
convergence of this algorithm in comparison with a
traditional stochastic algorithm in the literature.
|
|
[45]
|
He Zhang, Teemu Ruokolainen, Jorma Laaksonen, Christina Hochleitner, and Rudolf
Traunmüller.
Gaze- and speech-enhanced content-based image retrieval in image
tagging.
In International Conference on Artificial Neural Networks,
ICANN. Springer, 2011.
[ Pdf ]
We describe a setup and experiments where users are
checking and correcting image tags given by an
automatic tagging system. We study how much the
application of a content-based image retrieval
(CBIR) method speeds up the process of finding and
correcting the erroneously-tagged images. We also
analyze the use of implicit relevance feedback from
the user's gaze tracking patterns as a method for
boosting up the CBIR performance. Finally, we use
automatic speech recognition for giving the correct
tags for those images that were wrongly tagged. The
experiments show a large variance in the tagging
task performance, which we believe is primarily
caused by the users' subjectivity in image contents
as well as their varying familiarity with the gaze
tracking and speech recognition setups. The results
suggest potentials for gaze and/or speech enhanced
CBIR method in image tagging, at least for some
users.
Keywords: content-based image retrieval, automatic image
tagging, gaze tracking, speech recognition
|
|
[46]
|
Zakria Hussain and John Shawe-Taylor.
Improved loss bounds for multiple kernel learning.
In International Conference on Artificial Intelligence and
Statistics, AISTATS, volume 15, Fort Lauderdale, FL, USA, 2011.
[ Pdf ]
We propose two new generalization error bounds for
multiple kernel learning (MKL). First, using the
bound of Srebro and Ben-David (2006) as a starting
point, we derive a new version which uses a simple
counting argument for the choice of kernels in order
to generate a tighter bound when 1-norm reg-
ularization (sparsity) is imposed in the ker- nel
learning problem. The second bound is a Rademacher
complexity bound which is ad- ditive in the
(logarithmic) kernel complexity and margin
term. This dependence is supe- rior to all
previously published Rademacher bounds for learning
a convex combination of kernels, including the
recent bound of Cortes et al. (2010), which exhibits
a multiplicative interaction. We illustrate the
tightness of our bounds with simulations.
|