FP7   EU

Publications

[1] Ronald Ortner. Online regret bounds for markov decision processes with deterministic transitions. In Yoav Freund, László Györfi, and György Turán, editors, Proceedings of the 19th International Conference on Algorithmic Learning Theory, volume 5254 of Lecture Notes in Artificial Intelligence, pages 123-137, Budapest, Hungary, 2008. Springer. [ DOI | Pdf ]
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (epsilon-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, multi-armed bandits with switching cost are considered.

Keywords: reinforcement learning, Markov decision processes, multi-armed bandit problem
[2] Arto Klami, Craig Saunders, Teófilo de Campos, and Samuel Kaski. Can relevance of images be inferred from eye movements? In MIR '08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, pages 134-140, New York, NY, USA, 2008. ACM. [ DOI | Pdf ]
Query formulation and efficient navigation through data to reach relevant results are undoubtedly major challenges for image or video retrieval. Queries of good quality are typically not available and the search process needs to rely on relevance feedback given by the user, which makes the search process iterative. Giving explicit relevance feedback is laborious, not always easy, and may even be impossible in ubiquitous computing scenarios. A central question then is: Is it possible to replace or complement scarce explicit feedback with implicit feedback inferred from various sensors not specifically designed for the task? In this paper, we present preliminary results on inferring the relevance of images based on implicit feedback about users' attention, measured using an eye tracking device. It is shown that, in reasonably controlled setups at least, already fairly simple features and classifiers are capable of detecting the relevance based on eye movements alone, without using any explicit feedback.

Keywords: eye movements, image retrieval, implicit relevance feedback
[3] He Zhang, Markus Koskela, and Jorma Laaksonen. Report on forms of enriched relevance feedback. Technical Report TKK-ICS-R10, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report presents a literature survey conducted to review the current state of the art in research concerning the use of eye movement measurements and other non-conventional and implicit relevance feedback modalities in content-based image and information retrieval. We define and elaborate on the concept of enriched relevance feedback and study its applicability in the intersection of the aforementioned research areas.

Keywords: enriched relevance feedback, implicit relevance feedback, eye movement measurements, content-based image retrieval
[4] Ville Viitaniemi and Jorma Laaksonen. Evaluation of pointer click relevance feedback in PicSOM. Technical Report TKK-ICS-R11, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report presents the results of a series of experiments where knowledge of the most relevant part of images is given as additional information to a content-based image retrieval system. The most relevant parts have been identified by search-task-dependent pointer clicks on the images. As such they provide a rudimentary form of explicit enriched relevance feedback and to some extent mimic genuine implicit eye movement measurements which are essential ingredients of the PinView project.

Keywords: content-based image retrieval, enriched relevance feedback, automatic image segmentation
[5] Markus Koskela and Jorma Laaksonen. Specification of information interfaces in PinView. Technical Report TKK-ICS-R12, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report defines the information interfaces for the PinView project to facilitate the planned research of the project. Successful collaborative research between the multiple project sites requires that the individual efforts can directly support each other. The report contains definitions for the used file system structure, for various file formats, and for data transfer between the project sites. The report will be updated regularly during the project.

Keywords: information interfaces, file formats, data transfer, content-based image retrieval
[6] Jorma Laaksonen. Definition of enriched relevance feedback in PicSOM. Technical Report TKK-ICS-R13, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report defines and implements communication principles and data formats for transferring enriched relevance feedback to the PicSOM content-based image retrieval system used in the PinView project. The modalities of enriched relevance feedback include recorded eye movements, pointer and keyboard events and audio including speech. The communication is based on the AJAX technology, where the client and server exchange XML formatted content by using the XMLHttpRequest method.

Keywords: content-based image retrieval, enriched relevance feedback, file formats, data transfer, AJAX, XMLHttpRequest
[7] Alex P. Leung and Peter Auer. An efficient search algorithm for content-based image retrieval with user feedback. In Proceedings of the First International Workshop on Video Mining (VM'08) in association with IEEE International Conference on Data Mining (ICDM 2008), pages 884-890, Pisa, Italy, December 2008. [ DOI | Pdf ]
We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session, the algorithm maintains weights on the images in the database which reflect the assumed relevance of the images. Relevance feedback is used to modify these weights. As a second ingredient, the algorithm uses a minimax principle to select images for presentation to the user: any response of the user will provide significant information about his query, such that relatively few feedback rounds are sufficient to find a satisfactory image. We have implemented this algorithm and have conducted experiments on both simulated data and real data which show promising results.

Keywords: content-based image retrieval, search algorithms, relevance feedback
[8] Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. In NIPS08, Advances in Neural Information Processing Systems, pages 89-96, Vancouver, Canada, December 2008. MIT Press. [ Pdf ]
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret O(DS sqrt(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. This bound holds with high probability. We also present a corresponding lower bound of Omega(sqrt(DSAT)) on the total regret of any learning algorithm.

[9] Tom Diethe, Zakria Hussain, David R. Hardoon, and John Shawe-Taylor. Matching pursuit kernel Fisher discriminant analysis. In International conference on Artificial Intelligence and Statistics (AISTATS), 2009. [ Pdf ]
We derive a novel sparse version of Kernel Fisher Discriminant Analysis (KFDA) using an approach based on Matching Pursuit (MP). We call this algorithm Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA). We provide generalisation error bounds analogous to those constructed for the Robust Minimax algorithm together with a sample compression bounding technique. We present experimental results on real world datasets, which show that MPKFDA is competitive with the KFDA and the SVM on UCI datasets, and additional experiments that show that the MPKFDA on average outperforms KFDA and SVM in extremely high dimensional settings.

[10] Mats Sjöberg and Jorma Laaksonen. Optimal combination of SOM search in best-matching units and map neighborhood. In Proceedings of 7th International Workshop on Self-Organizing Maps (WSOM 2009), volume 5629 of Lecture Notes in Computer Science, pages 281-289, St. Augustine, Florida, USA, 2009. Springer. [ Pdf ]
The distribution of a class of objects, such as images depicting a specific topic, can be studied by observing the best-matching units (BMUs) of the objects' feature vectors on a Self-Organizing Map (SOM). When the BMU “hits” on the map are summed up, the class distribution may be seen as a two-dimensional histogram or discrete probability density. Due to the SOM's topology preserving property, one is motivated to smooth the value field and spread out the values spatially to neighboring units, from where one may expect to find further similar objects. In this paper we study the impact of using more map units than just the single BMU of each feature vector in modeling the class distribution. We demonstrate that by varying the number of units selected in this way and varying the width of the spatial convolution one can find an optimal combination which maximizes the class detection performance.

[11] Ville Viitaniemi and Jorma Laaksonen. Spatial extensions to bag of visual words. In Proceedings of ACM International Conference on Image and Video Retrieval (CIVR 2009), Santorini, Greece, July 2009. [ Pdf ]
The Bag of Visual Words (BoV) paradigm has successfully been applied to image content analysis tasks such as image classification and object detection. The basic BoV approach overlooks spatial descriptor distribution within images. Here we describe spatial extensions to BoV and experimentally compare them in the VOC2007 benchmark image category detection task. In particular, we compare two ways for tiling images geometrically: soft tiling approach-proposed here-and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of the images: post-classifier fusion and fusion on the level of a SVM kernel.

The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the descriptors' spatial distribution into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance in case of hard tiling. The evaluated fusion mechanisms performed approximately equally well.

[12] Tom Diethe and Zakria Hussain. Kernel polytope faces pursuit. In European Conference in Machine Learning (ECML), 2009. [ Pdf ]
Polytope Faces Pursuit (PFP) is a greedy algorithm that approximates the sparse solutions recovered by L1 regularised least-squares (Lasso) [4, 10] in a similar vein to (Orthogonal) Matching Pursuit (OMP) [16]. The algorithm is based on the geometry of the polar polytope where at each step a basis function is chosen by finding the maximal vertex using a path-following method. The algorithmic complexity is of a similar order to OMP whilst being able to solve problems known to be hard for (O)MP. Matching Pursuit was extended to build kernel-based solutions to machine learning problems, resulting in the sparse regression algorithm, Kernel Matching Pursuit (KMP) [17]. We develop a new algorithm to build sparse kernel-based solutions using PFP, which we call Kernel Polytope Faces Pursuit (KPFP). We show the usefulness of this algorithm by providing a generalisation error bound [7] that takes into account a natural regression loss and experimental results on several benchmark datasets.

Keywords: Polytope Faces Pursuit, Orthogonal Matching Pursuit, Pseudodimension, Sample Compression Bounds, Regression, Kernel methods
[13] Ivan Markovsky, Diana Sima, and Sabine Van Huffel. Total least squares methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2(2):212-217, 2010. [ Pdf ]
Recent advances in total least squares approaches for solving various errors-in-variables modeling problems are reviewed, with emphasis on the following generalizations: - the use of weighted norms as a measure of the data perturbation size, capturing prior knowledge about uncertainty in the data; - the addition of constraints on the perturbation to preserve the structure of the data matrix, motivated by structured data matrices occurring in signal and image processing, systems and control, and computer algebra; - the use of regularization in the problem formulation, aiming at stabilizing the solution by decreasing the effect due to intrinsic ill-conditioning of certain problems.

[14] Julian McAuley, Teófilo de Campos, Gabriela Csurka, and Florent Perronnin. Hierarchical image-region labeling via structured learning. In Proceedings of The 20th British Machine Vision Conference (BMVC 2009), London, UK, September 2009. [ Pdf ]
We present a graphical model which encodes a series of hierarchical constraints for classifying image regions at multiple scales. We show that inference in this model can be performed efficiently and exactly, rendering it amenable to structured learning. Rather than using feature vectors derived from images themselves, our model is parametrised using the outputs of a series of first-order classifiers. Thus our model learns which classifiers are useful at different scales, and also the relationships between classifiers at different scales. We present promising results on the VOC2007 and VOC2008 datasets.

Keywords: Structured learning, image segmentation
[15] Kitsuchart Pasupa, Arto Klami, Craig J. Saunders, Teófilo de Campos, and Samuel Kaski. Can relevance of images be inferred from eye movements? In 15th European Conference on Eye Movements (ECEM'2009), page 50, Southamton, UK, August 2009. [ Pdf ]
Searching for images from a large collection is a difficult task for automated algorithms. Many current techniques rely on items which have been manually `tagged' with descriptors. This situation is not ideal, as it is difficult to formulate the initial query, and navigate the large number of hits returned. In order to present relevant images to the user, many systems rely on an explicit feedback mechanism. A machine learning algorithm can be used to present a new set of relevant images to the user - thus increasing hit rates. In this work we use eye movements to assist a user when performing such a task, and ask this basic question: “Is it possible to replace or complement scarce explicit feedback with implicit feedback inferred from various sensors not specifically designed for the task?” We give initial results on a range of tasks and experiments which extend those presented in the Multimedia Information Retrieval conference (MIR'08). In reasonably controlled setups, fairly simple eye movements' features in conjunction with machine learning techniques are capable of judging the relevance of an image based on eye movements alone, without using any explicit feedback - therefore potentially assisting the user in a task.

[16] László Kozma, Arto Klami, and Samuel Kaski. GaZIR: Gaze-based zooming interface for image retrieval. In Eleventh International Conference on Multimodal Interfaces (ICMI-MLMI), Cambridge, Massachusetts, USA, 2009. [ DOI | Pdf ]
We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to anycontent-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gazepatterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking.

[17] Martin Antenreiter, Ronald Ortner, and Peter Auer. Combining classifiers for improved multilabel image classification. In Learning from Multi-label Data, MLD Workshop at ECML 2009, pages 16-27, Bled, Slovenia, 2009. [ Pdf ]
We propose a stacking-like method for multilabel image classification. Our approach combines the output of binary base learners, which use different features for image description, in a simple and straightforward way: The confidence values of the base learners are fed into a support vector machine (SVM) in order to improve prediction accuracy. Experiments on the datasets of the Pascal Visual Object Classes challenges (VOC) of 2006 and 2007 show that our method significantly improves over the performance of the base learners. Our approach also works better than more naive approaches for combining features or classifiers.

[18] Peter Auer and Alex P. Leung. Relevance feedback models for content-based image retrieval. In Multimedia Analysis, Processing and Communications, pages 59-79. Springer, 2009. [ DOI | Pdf ]
We investigate models for content-based image retrieval with relevance feedback, in particular focusing on the exploration-exploitation dilemma. We propose quantitative models for the user behavior and investigate implications of these models. Three search algorithms for efficient searches based on the user models are proposed and evaluated. In the first model a user queries a database for the most (or a sufficiently) relevant image. The user gives feedback to the system by selecting the most relevant image from a number of images presented by the system. In the second model we consider a filtering task where relevant images should be extracted from a database and presented to the user. The feedback of the user is a binary classification of each presented image as relevant or irrelevant. While these models are related, they differ significantly in the kind of feedback provided by the user. This requires very different mechanisms to trade off exploration (finding out what the user wants) and exploitation (serving images which the system believes relevant for the user).

[19] Kitsuchart Pasupa, Craig Saunders, Sandor Szedmak, Arto Klami, Samuel Kaski, and Steve Gunn. Learning to rank images from eye movements. In Proceedings of the 12th IEEE International Conference on Computer Vision, (ICCV'2009), Workshops on Human-Computer Interaction, (HCI'2009), pages 2009-2016, Kyoto, Japan, October 2009. [ DOI | Pdf ]
Combining multiple information sources can improve the accuracy of search in information retrieval. This paper presents a new image search strategy which combines image features together with implicit feedback from users' eye movements, using them to rank images. In order to better deal with larger data sets, we present a perceptron formulation of the Ranking Support Vector Machine algorithm. We present initial results on inferring the rank of images presented in a page based on simple image features and implicit feedback of users. The results show that the perceptron algorithm improves the results, and that fusing eye movements and image histograms gives better rankings to images than either of these features alone.

[20] Ronald Ortner. Exploiting similarity information in reinforcement learning - similarity models for multi-armed bandits and mdps. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (ICAART 2010), pages 203-210, Valencia, Spain, 2010. INSTICC Press. [ Pdf ]
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, where it is assumed that arms of the same color have close mean rewards. An algorithm is presented that shows that this color information can be used to improve the dependency of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case where the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case where transition probabilities of same-colored actions imply only close but not necessarily identical transition probabilities we give upper and lower bounds on the error by action aggregation with respect to the color information. These bounds also imply that the general case is far more difficult to handle.

[21] Ivan Markovsky. Matrix centering and low-rank approximation. Technical report, University of Southampton, School of Electronics and Computer Science, 2009. [ Pdf ]
Low-rank approximation, better known in the machine learning literature as principal component analysis, is a data modeling tool. Data centering, on the other hand, is a common preprocessing step. This paper studies the combination of low-rank approximation with data centering. In the case of no weights and no constraints, the common preprocessing practice of mean subtraction leads to optimal results and therefore need not be changed. The formal proof of this fact, however, reveals that preprocessing by mean subtraction is not the only way to optimally preprocess the data. In the case of weighted and/or constrained low-rank approximation problems, the common preprocessing practice leads to suboptimal results and therefore should be changed. The paper shows, how classical solution methods for weighted and structured low-rank approximation can be modified for doing optimal preprocessing at the same time as low-rank approximation.

[22] Kitsuchart Pasupa, Sandor Szedmak, and David Hardoon. Image ranking with eye movements. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS'2009), Workshop on Advances in Ranking, pages 37-42, Whistler, Canada, December 2009. [ Pdf ]
In order to help users navigate an image search system, one could provide explicit rank information on a set of images. These rankings are learnt so to present a new set of relevant images. Although, requiring explicit information may not be feasible in some cases, we consider the setting where the user provides implicit feedback, eye movements, to assist in such a task. This paper explores the idea of implicitly incorporating eye movement features in an image ranking task. Previous work had demonstrated that combining eye movement and image features improved the retrieval accuracy. Despite promising results the proposed approach is unrealistic as no eye movements are given a-priori for new images. We propose a novel search approach which combines image together with eye movements features in a tensor Ranking Support Vector Machine, and show that by extracting the individual source-specific weight vectors we are able to construct a new image-based semantic space which outperforms in retrieval accuracy.

[23] David Hardoon and Kitsuchart Pasupa. Image ranking with implicit feedback from eye movements. In Proceedings of the 6th Biennial Symposium on Eye Tracking Research & Applications (ETRA'2010), Austin, USA, 2010. [ Pdf ]
In order to help users navigate an image search system, one could provide explicit information on a small set of images as to which of them are relevant or not to their task. These rankings are learned in order to present a user with a new set of images that are relevant to their task. Requiring such explicit information may not be feasible in a number of cases, we consider the setting where the user provides implicit feedback, eye movements, to assist when performing such a task. This paper explores the idea of implicitly incorporating eye movement features in an image ranking task where only images are available during testing. Previous work had demonstrated that combining eye movement and image features improved on the retrieval accuracy when compared to using each of the sources independently. Despite these encouraging results the proposed approach is unrealistic as no eye movements will be presented a-priori for new images (i.e. only after the ranked images are presented would one be able to measure a user's eye movements on them). We propose a novel search methodology which combines image features together with implicit feedback from users' eye movements in a tensor ranking Support Vector Machine and show that it is possible to extract the individual source-specific weight vectors. Furthermore, we demonstrate that the decomposed image weight vector is able to construct a new image-based semantic space that outperforms the retrieval accuracy than when solely using the image-features.

[24] Ronald Ortner. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 411(29-30):2684-2695, 2010. [ DOI | Pdf ]
We consider an upper confidence bound algorithm for learning in Markov decision processes with deterministic transitions. For this algorithm we derive upper bounds on the online regret with respect to an (epsilon-)optimal policy that are logarithmic in the number of steps taken. We also present a corresponding lower bound. As an application, multi-armed bandits with switching cost are considered.

[25] Ivan Markovsky. Algorithms and literate programs for weighted low-rank approximation with missing data. Technical report, University of Southampton, School of Electronics and Computer Science, 2009. [ Pdf ]
Linear models identification from data with missing values is posed as a weighted low-rank approximation problem with weights related to the missing values equal to zero. Alternating projections and variable projections methods for solving the resulting problem are outlined and implemented in a literate programming style, using Matlab/Octave's scripting language. The methods are evaluated on synthetic data and real data from the MovieLens data sets.

[26] Zakria Hussain, Kitsuchart Pasupa, and John Shawe-Taylor. Learning relevant eye movement feature spaces across users. In Proceedings of the 6th Biennial Symposium on Eye Tracking Research & Applications (ETRA'2010), Austin, USA, March 2010. [ DOI | Pdf ]
In this paper we predict the relevance of images based on a low-dimensional feature space found using several users' eye movements. Each user is given an image-based search task, during which their eye movements are extracted using a Tobii eye tracker. The users also provide us with explicit feedback regarding the relevance of images. We demonstrate that by using a greedy Nystrom algorithm on the eye movement features of different users, we can find a suitable low-dimensional feature space for learning. We validate the suitability of this feature space by projecting the eye movement features of a new user into this space, training an online learning algorithm using these features, and showing that the number of mistakes (regret over time) made in predicting relevant images is lower than when using the original eye movement features. We also plot Recall-Precision and ROC curves, and use a sign test to verify the statistical significance of our results.

[27] Ivan Markovsky. Algorithms and literate programs for weighted low-rank approximation with missing data. In A. Iske, E. Georgoulis, and J. Levesley, editors, Springer Proceedings in Mathematics, volume 3, pages 255-273. Springer, 2011. [ DOI | Pdf ]
Linear models identification from data with missing values is posed as a weighted low-rank approximation problem with weights related to the missing values equal to zero. Alternating projections and variable projections methods for solving the resulting problem are outlined and implemented in a literate programming style, using Matlab/Octave's scripting language. The methods are evaluated on synthetic data and real data from the MovieLens data sets.

[28] Julian J. McAuley, Teófilo de Campos, and Tibério S. Caetano. Unified graph matching in euclidean spaces. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 1871-1878, San Francisco, June 2010. IEEE. [ DOI | Pdf ]
Graph matching is a classical problem in pattern recognition with many applications, particularly when the graphs are embedded in Euclidean spaces, as is often the case for computer vision. There are several variants of the matching problem, concerned with isometries, isomorphisms, homeomorphisms, and node attributes; different approaches exist for each variant. We show how structured estimation methods from machine learning can be used to combine such variants into a single version of graph matching. In this paradigm, the extent to which our datasets reveal isometries, isomorphisms, homeomorphisms, and other properties is automatically accounted for in the learning process so that any such specific qualification of graph matching loses meaning. We present experiments with real computer vision data showing the leverage of this unified formulation.

[29] Florent Perronnin, Yan Liu, Jorge Sanchez, and Herve Poirier. Large-scale image retrieval with compressed fisher vectors. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 3384-3391, San Francisco, CA, USA, June 2010. IEEE. [ DOI ]
The problem of large-scale image search has been traditionally addressed with the bag-of-visual-words (BOV). In this article, we propose to use as an alternative the Fisher kernel framework. We first show why the Fisher representation is well-suited to the retrieval problem: it describes an image by what makes it different from other images. One drawback of the Fisher vector is that it is high-dimensional and, as opposed to the BOV, it is dense. The resulting memory and computational costs do not make Fisher vectors directly amenable to large-scale retrieval. Therefore, we compress Fisher vectors to reduce their memory footprint and speed-up the retrieval. We compare three binarization approaches: a simple approach devised for this representation and two standard compression techniques. We show on two publicly available datasets that compressed Fisher vectors perform very well using as little as a few hundreds of bits per image, and significantly better than a very recent compressed BOV approach.

[30] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 11:1563-1600, April 2010. [ Pdf ]
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s,s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DSAT) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of Ω(√DSAT) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T. Finally, we also consider a setting where the MDP is allowed to change a fixed number of l times. We present a modification of our algorithm that is able to deal with this setting and show a regret bound of Õ(l^1/3 T^2/3DS √A).

[31] Muayed Al-Huseiny, Sasan Mahmoodi, and Mark Nixon. Gait learning-based regenerative model: a level set approach. In 20th International Conference on Pattern Recognition, ICPR, 2010. [ DOI | Pdf ]
We propose a learning method for gait synthesis from a sequence of shapes(frames) with the ability to extrapolate to novel data. It involves the application of PCA, first to reduce the data dimensionality to certain features, and second to model corresponding features derived from the training gait cycles as a Gaussian distribution. This approach transforms a non Gaussian shape deformation problem into a Gaussian one by considering features of entire gait cycles as vectors in a Gaussian space. We show that these features which we formulate as continuous functions can be modeled by PCA. We also use this model to in-between (generate intermediate unknown) shapes in the training cycle. Furthermore, this paper demonstrates that the derived features can be used in the identification of pedestrians.

Keywords: Computer Vision, Gait Analysis, Level Sets, PCA, Statistical Models
[32] Ivan Markovsky. Bibliography on total least squares and related methods. Statistics and Its Interface, 3(3):329-334, 2010. [ Url | Pdf ]
The class of total least squares methods has been growing since the basic total least squares method was proposed by Golub and Van Loan in the 70's. Efficient and robust computational algorithms were developed and properties of the resulting estimators were established in the errors-in-variables setting. At the same time the developed methods were applied in diverse areas, leading to broad literature on the subject. This paper collects the main references and guides the reader in finding details about the total least squares methods and their applications. In addition, the paper comments on similarities and differences between the total least squares and the singular spectrum analysis methods.

[33] Peter Auer and Ronald Ortner. Ucb revisited: improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55-65, 2010. [ DOI | Pdf ]
In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const· (Klog(T))/(Δ), where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · (Klog(TΔ2 ))/(Δ).

Keywords: multi-armed bandit problem, regret
[34] Peter Auer, Zakria Hussain, Samuel Kaski, Arto Klami, Jussi Kujala, Jorma Laaksonen, Alex P. Leung, Kitsuchart Pasupa, and John Shawe-Taylor. Pinview: Implicit feedback in content-based image retrieval. In ICML Workshop on Reinforcement Learning and Search in Very Large Spaces, Haifa, Israel, June 2010. [ Pdf ]
This paper describes Pinview, a content-based image retrieval system that exploits implicit relevance feedback during a search session. The goal is to retrieve interesting images and the relevance feedback could be eye movements or clicks on the images. Pinview contains several novel methods that infer the intent of the user. From relevance feedback and visual features of images Pinview learns a similarity metric between images which depends on the current interests of the user. It then retrieves images with a specialized reinforcement learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user. In practise, Pinview is integrated to the content-based image retrieval system PicSOM, so it is possible to apply it to realworld image databases. Preliminary experiments show that eye movements provide a rich input modality from which it is possible to learn the interests of the user.

[35] Arto Klami. Inferring task-relevant image regions from gaze data. In IEEE International Workshop on Machine Learning for Signal Processing, MLSP, Kittilä, Finland, 2010. [ DOI | Pdf ]
A number of studies have recently used eye movements of a user inspecting the content as implicit relevance feedback for proactive retrieval systems. Typically binary feedback for images or text paragraphs is inferred from the gaze pattern. We seek to make such feedback richer for image retrieval, by inferring which parts of the image the user found relevant. For this purpose, we present a novel Bayesian mixture model for inferring possible target regions directly from gaze data alone, and show how the relevance of those regions can then be inferred using a simple classifier that is independent of the content or the task.

[36] Zakria Hussain, Alex P. Leung, Kitsuchart Pasupa, David R. Hardoon, Peter Auer, and John Shawe-Taylor. Exploration-exploitation of eye movement enriched multiple feature spaces for content-based image retrieval. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECMLPKDD, pages 554-569. Springer, 2010. [ DOI | Pdf ]
In content-based image retrieval (CBIR) with relevance feedback we would like to retrieve relevant images based on their content features and the feedback given by users. In this paper we view CBIR as an Exploration-Exploitation problem and apply a kernel version of the LinRel algorithm to solve it. By using multiple feature extraction methods and utilising the feedback given by users, we adopt a strategy of multiple kernel learning to find a relevant feature space for the kernel LinRel algorithm. We call this algorithm LinRelMKL. Furthermore, when we have access to eye movement data of users viewing images we can enrich our (multiple) feature spaces by using a tensor kernel SVM. When learning in this enriched space we show that we can significantly improve the search results over the LinRel and LinRelMKL algorithms. Our results suggest that the use of exploration-exploitation with multiple feature spaces is an efficient way of constructing CBIR systems, and that when eye movement features are available, they should be used to help improve CBIR.

Keywords: content-based image retrieval, LinRel, images, eye movements, multiple kernel learning, tensor kernel SVM
[37] Peter Auer, Zakria Hussain, Samuel Kaski, Arto Klami, Jussi Kujala, Jorma Laaksonen, Alex P. Leung, Kitsuchart Pasupa, and John Shawe-Taylor. Pinview: Implicit feedback in content-based image retrieval. In Workshop on Applications of Pattern Analysis, WAPA, pages 51-57, 2010. [ Pdf ]
This paper describes Pinview, a content-based image retrieval system that exploits implicit relevance feedback during a search session. Pinview contains several novel methods that infer the intent of the user. From relevance feedback, such as eye movements or clicks, and visual features of images Pinview learns a similarity metric between images which depends on the current interests of the user. It then retrieves images with a specialized reinforcement learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user. In practise, we have integrated Pinview to the content-based image retrieval system PicSOM, in order to apply it to real- world image databases. Preliminary experiments show that eye movements provide a rich input modality from which it is possible to learn the interests of the user.

[38] Sandor Szedmak, Yizhao Ni, and Steve R. Gunn. Maximum margin learning with incomplete data: Learning networks instead of tables. In Workshop on Applications of Pattern Analysis, WAPA, 2010. [ Pdf ]
In this paper we address the problem of predicting when the available data is incomplete. We show that changing the generally accepted table-wise view of the sample items into a graph representable one allows us to solve these kind of problems in a very concise way by using the well known convex, one-class classification based, optimisation framework. The use of the one-class formulation in the learning phase and in the prediction as well makes the entire procedure highly consistent. The graph representation can express the complex interdependencies among the data sources. The underlying optimisation problem can be transformed into a on-line algorithm, e.g. a perceptron type one, and in this way it can deal with data sets of million items. This framework covers and encompasses supervised, semi-supervised and some unsupervised learning problems. Furthermore, the data sources can be chosen as not only simple binary variables or vectors but text documents, images or even graphs with complex internal structures.

[39] Teemu Ruokolainen, Tanel Alumäe, and Marcus Dobrinkat. Using dependency grammar features in whole sentence maximum entropy language model for speech recognition. In Fourth International Conference Baltic HLT, 2010. [ DOI | Pdf ]
In automatic speech recognition, the standard choice for a language model is the well-known n-gram model. The n-grams are used to predict the probability of a word given its n-1 preceding words. However, the n-gram model is not able to explicitly learn grammatical relations of the sentence. In the present work, in order to augment the n-gram model with grammatical features, we apply the Whole Sentence Maximum Entropy framework. The grammatical features are head-modifier relations between pairs of words, together with the labels of the relationships, obtained with the dependency grammar. We evaluate the model in a large vocabulary speech recognition task with Wall Street Journal speech corpus. The results show a substantial improvement in both test set perplexity and word error rate.

[40] Muayed Al-Huseiny, Sasan Mahmoodi, and Mark Nixon. Robust rigid shape registration method using a level set formulation. In 6th International Symposium on Visual Computing, volume 6454, pages 252-261, Las Vegas, Nevada, USA, 2010. Springer. [ DOI | Pdf ]
This paper presents a fast algorithm for robust registration of shapes implicitly represented by signed distance functions(SDFs). The proposed algorithm aims to recover the transformation parameters(scaling, rotation, and translation) by minimizing the dissimilarity between two shapes. To achieve a robust and fast algorithm, linear orthogonal transformations are employed to minimize the dissimilarity measures. The algorithm is applied to various shape registration problems, to address issues such as topological invariance, shape complexity, and convergence speed and stability. The outcomes are compared with other state-of-the-art shape registration algorithms to show the advantages of the new technique.

[41] Sasan Mahmoodi. Anisotropic diffusion for noise removal of band pass signals. Signal Processing, 91(5):1298-1307, 2011. [ DOI | Pdf ]
A noise removal method for band pass signals based on the anisotropic diffusion algorithm originally put forward by Perona and Malik is proposed in this paper. The anisotropic smoothing algorithm proposed here is for band pass signals modulated with a constant carrier frequency. A partial differential equation to smooth band pass noisy signals is derived. The propagator of this differential equation is also analytically calculated in this paper. An appropriate linear operator is then considered here for such band pass signals to form an anisotropic diffusion algorithm. The algorithm proposed here demonstrates better performance for band pass noisy signals containing discontinuities in comparison with the traditional Perona–Malik (PM) algorithm and is robust in the presence of excessive noise with SNR less than unity.

Keywords: Noise removal; Band pass signals; Perona–Malik algorithm; Anisotropic diffusion
[42] Sasan Mahmoodi. Scale invariant filtering design and analysis for edge detection. Proc.of the Royal Society A: Mathematical, Physical, and Engineering Sciences, 467(2130):1719-1738, 2010. [ DOI | Pdf ]
Existing edge detection filters work well on straight edges but make significant errors near sharp corners by producing rounded corners. This is due to the fact that the edge maps produced by these filters are scale variant. We enhance Canny’s optimality criteria to incorporate detection performance near corners as an explicit design objective. The resulting optimal filter, termed ‘Bessel integral filter’, can be derived analytically and exhibits superior performance over recent alternatives, both in terms of numerical accuracy and experimental fidelity. A noise-free localization index is also derived here to account for the detection accuracy of discontinuities forming sharp corners in the absence of noise. We prove here that edges detected by the filters that are not optimal with respect to this noise-free localization index are scale variant. However, the Bessel integral filter proposed here is optimal with respect to the noise-free localization index and therefore it is a scale-invariant filter.

Keywords: edge detection, Canny-like criteria, Bessel integral filter, scale-space smoothing, two-dimensional filtering design, scale-invariant filters
[43] Sasan Mahmoodi and Steve Gunn. Scale space smoothing, image feature extraction and bessel filters. In 17th Scandinavian Conference on Image Analysis, Ystad, Sweden, May 2011. Springer. [ Url | Pdf ]
The Green function of Mumford-Shah functional in the absence of discontinuities is known to be a modified Bessel function of the second kind and zero degree. Such a Bessel function is regularized here and used as a filter for feature extraction. It is demonstrated in this paper that a Bessel filter does not follow the scale space smoothing property of bounded linear filters such as Gaussian filters. The features extracted by the Bessel filter are therefore scale invariant. Edges, blobs, and junctions are features considered here to show that the extracted features remain unchanged by varying the scale of a Bessel filter. The scale invariance property of Bessel filters for edges is analytically proved here. We conjecture that Bessel filters also enjoy this scale invariance property for other kinds of features.

[44] Sasan Mahmoodi and Steve Gunn. Snake based unsupervised texture segmentation using gaussian markov random field models. In 18th IEEE International Conference on Image Processing, Brussels, Belgium, September 2011. IEEE. [ Url | Pdf ]
A functional for unsupervised texture segmentation is investigated in this paper. An auto-normal model based on Markov Random Fields is employed to model textures. The functional investigated here is optimized with respect to the model parameters and the evolving contour to simultaneously estimate model parameters and find the boundaries between textures. Experimental results applied on the textures of the Brodatz album demonstrate the superior performance and the higher speed of convergence of this algorithm in comparison with a traditional stochastic algorithm in the literature.

[45] He Zhang, Teemu Ruokolainen, Jorma Laaksonen, Christina Hochleitner, and Rudolf Traunmüller. Gaze- and speech-enhanced content-based image retrieval in image tagging. In International Conference on Artificial Neural Networks, ICANN. Springer, 2011. [ Pdf ]
We describe a setup and experiments where users are checking and correcting image tags given by an automatic tagging system. We study how much the application of a content-based image retrieval (CBIR) method speeds up the process of finding and correcting the erroneously-tagged images. We also analyze the use of implicit relevance feedback from the user's gaze tracking patterns as a method for boosting up the CBIR performance. Finally, we use automatic speech recognition for giving the correct tags for those images that were wrongly tagged. The experiments show a large variance in the tagging task performance, which we believe is primarily caused by the users' subjectivity in image contents as well as their varying familiarity with the gaze tracking and speech recognition setups. The results suggest potentials for gaze and/or speech enhanced CBIR method in image tagging, at least for some users.

Keywords: content-based image retrieval, automatic image tagging, gaze tracking, speech recognition
[46] Zakria Hussain and John Shawe-Taylor. Improved loss bounds for multiple kernel learning. In International Conference on Artificial Intelligence and Statistics, AISTATS, volume 15, Fort Lauderdale, FL, USA, 2011. [ Pdf ]
We propose two new generalization error bounds for multiple kernel learning (MKL). First, using the bound of Srebro and Ben-David (2006) as a starting point, we derive a new version which uses a simple counting argument for the choice of kernels in order to generate a tighter bound when 1-norm reg- ularization (sparsity) is imposed in the ker- nel learning problem. The second bound is a Rademacher complexity bound which is ad- ditive in the (logarithmic) kernel complexity and margin term. This dependence is supe- rior to all previously published Rademacher bounds for learning a convex combination of kernels, including the recent bound of Cortes et al. (2010), which exhibits a multiplicative interaction. We illustrate the tightness of our bounds with simulations.