FP7   EU

Publications

[1] Ronald Ortner. Online regret bounds for markov decision processes with deterministic transitions. In Yoav Freund, László Györfi, and György Turán, editors, Proceedings of the 19th International Conference on Algorithmic Learning Theory, volume 5254 of Lecture Notes in Artificial Intelligence, pages 123-137. Springer, 2008. [ Pdf ]
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (epsilon-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an application, multi-armed bandits with switching cost are considered.

Keywords: reinforcement learning, Markov decision processes, multi-armed bandit problem
[2] Arto Klami, Craig Saunders, Teófilo de Campos, and Samuel Kaski. Can relevance of images be inferred from eye movements? In MIR '08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, pages 134-140, New York, NY, USA, 2008. ACM. [ Pdf ]
Query formulation and efficient navigation through data to reach relevant results are undoubtedly major challenges for image or video retrieval. Queries of good quality are typically not available and the search process needs to rely on relevance feedback given by the user, which makes the search process iterative. Giving explicit relevance feedback is laborious, not always easy, and may even be impossible in ubiquitous computing scenarios. A central question then is: Is it possible to replace or complement scarce explicit feedback with implicit feedback inferred from various sensors not specifically designed for the task? In this paper, we present preliminary results on inferring the relevance of images based on implicit feedback about users' attention, measured using an eye tracking device. It is shown that, in reasonably controlled setups at least, already fairly simple features and classifiers are capable of detecting the relevance based on eye movements alone, without using any explicit feedback.

Keywords: eye movements, image retrieval, implicit relevance feedback
[3] He Zhang, Markus Koskela, and Jorma Laaksonen. Report on forms of enriched relevance feedback. Technical Report TKK-ICS-R10, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report presents a literature survey conducted to review the current state of the art in research concerning the use of eye movement measurements and other non-conventional and implicit relevance feedback modalities in content-based image and information retrieval. We define and elaborate on the concept of enriched relevance feedback and study its applicability in the intersection of the aforementioned research areas.

Keywords: enriched relevance feedback, implicit relevance feedback, eye movement measurements, content-based image retrieval
[4] Ville Viitaniemi and Jorma Laaksonen. Evaluation of pointer click relevance feedback in PicSOM. Technical Report TKK-ICS-R11, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report presents the results of a series of experiments where knowledge of the most relevant part of images is given as additional information to a content-based image retrieval system. The most relevant parts have been identified by search-task-dependent pointer clicks on the images. As such they provide a rudimentary form of explicit enriched relevance feedback and to some extent mimic genuine implicit eye movement measurements which are essential ingredients of the PinView project.

Keywords: content-based image retrieval, enriched relevance feedback, automatic image segmentation
[5] Markus Koskela and Jorma Laaksonen. Specification of information interfaces in PinView. Technical Report TKK-ICS-R12, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report defines the information interfaces for the PinView project to facilitate the planned research of the project. Successful collaborative research between the multiple project sites requires that the individual efforts can directly support each other. The report contains definitions for the used file system structure, for various file formats, and for data transfer between the project sites. The report will be updated regularly during the project.

Keywords: information interfaces, file formats, data transfer, content-based image retrieval
[6] Jorma Laaksonen. Definition of enriched relevance feedback in PicSOM. Technical Report TKK-ICS-R13, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland, November 2008. [ Pdf ]
This report defines and implements communication principles and data formats for transferring enriched relevance feedback to the PicSOM content-based image retrieval system used in the PinView project. The modalities of enriched relevance feedback include recorded eye movements, pointer and keyboard events and audio including speech. The communication is based on the AJAX technology, where the client and server exchange XML formatted content by using the XMLHttpRequest method.

Keywords: content-based image retrieval, enriched relevance feedback, file formats, data transfer, AJAX, XMLHttpRequest
[7] Alex P. Leung and Peter Auer. An efficient search algorithm for content-based image retrieval with user feedback. In Proceedings of the First International Workshop on Video Mining (VM'08) in association with IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, December 2008. [ Pdf ]
We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session, the algorithm maintains weights on the images in the database which reflect the assumed relevance of the images. Relevance feedback is used to modify these weights. As a second ingredient, the algorithm uses a minimax principle to select images for presentation to the user: any response of the user will provide significant information about his query, such that relatively few feedback rounds are sufficient to find a satisfactory image. We have implemented this algorithm and have conducted experiments on both simulated data and real data which show promising results.

Keywords: content-based image retrieval, search algorithms, relevance feedback
[8] Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. In NIPS08, Advances in Neural Information Processing Systems, pages 89-96, Vancouver, Canada, December 2008. MIT Press. [ Pdf ]
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret O(DS sqrt(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. This bound holds with high probability. We also present a corresponding lower bound of Omega(sqrt(DSAT)) on the total regret of any learning algorithm.

[9] Tom Diethe, Zakria Hussain, David R. Hardoon, and John Shawe-Taylor. Matching pursuit kernel Fisher discriminant analysis. In International conference on Artificial Intelligence and Statistics (AISTATS), 2009. [ Pdf ]
We derive a novel sparse version of Kernel Fisher Discriminant Analysis (KFDA) using an approach based on Matching Pursuit (MP). We call this algorithm Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA). We provide generalisation error bounds analogous to those constructed for the Robust Minimax algorithm together with a sample compression bounding technique. We present experimental results on real world datasets, which show that MPKFDA is competitive with the KFDA and the SVM on UCI datasets, and additional experiments that show that the MPKFDA on average outperforms KFDA and SVM in extremely high dimensional settings.

[10] Mats Sjöberg and Jorma Laaksonen. Optimal combination of SOM search in best-matching units and map neighborhood. In Proceedings of 7th International Workshop on Self-Organizing Maps (WSOM 2009), volume 5629 of Lecture Notes in Computer Science, pages 281-289, St. Augustine, Florida, USA, 2009. Springer. [ Pdf ]
The distribution of a class of objects, such as images depicting a specific topic, can be studied by observing the best-matching units (BMUs) of the objects' feature vectors on a Self-Organizing Map (SOM). When the BMU “hits” on the map are summed up, the class distribution may be seen as a two-dimensional histogram or discrete probability density. Due to the SOM's topology preserving property, one is motivated to smooth the value field and spread out the values spatially to neighboring units, from where one may expect to find further similar objects. In this paper we study the impact of using more map units than just the single BMU of each feature vector in modeling the class distribution. We demonstrate that by varying the number of units selected in this way and varying the width of the spatial convolution one can find an optimal combination which maximizes the class detection performance.

[11] Ville Viitaniemi and Jorma Laaksonen. Spatial extensions to bag of visual words. In Proceedings of ACM International Conference on Image and Video Retrieval (CIVR 2009), Santorini, Greece, July 2009. [ Pdf ]
The Bag of Visual Words (BoV) paradigm has successfully been applied to image content analysis tasks such as image classification and object detection. The basic BoV approach overlooks spatial descriptor distribution within images. Here we describe spatial extensions to BoV and experimentally compare them in the VOC2007 benchmark image category detection task. In particular, we compare two ways for tiling images geometrically: soft tiling approach-proposed here-and the traditional hard tiling technique. The experiments also address two methods of fusing information from several tilings of the images: post-classifier fusion and fusion on the level of a SVM kernel.

The experiments confirm that the performance of a BoV system can be greatly enhanced by taking the descriptors' spatial distribution into account. The soft tiling technique performs well even with a single tiling mask, whereas multi-mask fusion is necessary for good category detection performance in case of hard tiling. The evaluated fusion mechanisms performed approximately equally well.

[12] Tom Diethe and Zakria Hussain. Kernel polytope faces pursuit. In European Conference in Machine Learning (ECML), 2009. [ Pdf ]
Polytope Faces Pursuit (PFP) is a greedy algorithm that approximates the sparse solutions recovered by L1 regularised least-squares (Lasso) [4, 10] in a similar vein to (Orthogonal) Matching Pursuit (OMP) [16]. The algorithm is based on the geometry of the polar polytope where at each step a basis function is chosen by finding the maximal vertex using a path-following method. The algorithmic complexity is of a similar order to OMP whilst being able to solve problems known to be hard for (O)MP. Matching Pursuit was extended to build kernel-based solutions to machine learning problems, resulting in the sparse regression algorithm, Kernel Matching Pursuit (KMP) [17]. We develop a new algorithm to build sparse kernel-based solutions using PFP, which we call Kernel Polytope Faces Pursuit (KPFP). We show the usefulness of this algorithm by providing a generalisation error bound [7] that takes into account a natural regression loss and experimental results on several benchmark datasets.

Keywords: Polytope Faces Pursuit, Orthogonal Matching Pursuit, Pseudodimension, Sample Compression Bounds, Regression, Kernel methods
[13] Ivan Markovsky, Diana Sima, and Sabine Van Huffel. Total least squares methods. Wiley Interdisciplinary Reviews: Computational Statistics, (2):212-217, 2010. [ Pdf ]
Recent advances in total least squares approaches for solving various errors-in-variables modeling problems are reviewed, with emphasis on the following generalizations: - the use of weighted norms as a measure of the data perturbation size, capturing prior knowledge about uncertainty in the data; - the addition of constraints on the perturbation to preserve the structure of the data matrix, motivated by structured data matrices occurring in signal and image processing, systems and control, and computer algebra; - the use of regularization in the problem formulation, aiming at stabilizing the solution by decreasing the effect due to intrinsic ill-conditioning of certain problems.

[14] Julian McAuley, Teófilo de Campos, Gabriela Csurka, and Florent Perronnin. Hierarchical image-region labeling via structured learning. In Proceedings of The 20th British Machine Vision Conference (BMVC 2009), London, UK, September 2009. [ Pdf ]
We present a graphical model which encodes a series of hierarchical constraints for classifying image regions at multiple scales. We show that inference in this model can be performed efficiently and exactly, rendering it amenable to structured learning. Rather than using feature vectors derived from images themselves, our model is parametrised using the outputs of a series of first-order classifiers. Thus our model learns which classifiers are useful at different scales, and also the relationships between classifiers at different scales. We present promising results on the VOC2007 and VOC2008 datasets.

Keywords: Structured learning, image segmentation
[15] Kitsuchart Pasupa, Arto Klami, Craig J. Saunders, Teófilo de Campos, and Samuel Kaski. Can relevance of images be inferred from eye movements? In 15th European Conference on Eye Movements (ECEM'2009), page 50, Southamton, UK, August 2009. [ Pdf ]
Searching for images from a large collection is a difficult task for automated algorithms. Many current techniques rely on items which have been manually `tagged' with descriptors. This situation is not ideal, as it is difficult to formulate the initial query, and navigate the large number of hits returned. In order to present relevant images to the user, many systems rely on an explicit feedback mechanism. A machine learning algorithm can be used to present a new set of relevant images to the user - thus increasing hit rates. In this work we use eye movements to assist a user when performing such a task, and ask this basic question: “Is it possible to replace or complement scarce explicit feedback with implicit feedback inferred from various sensors not specifically designed for the task?” We give initial results on a range of tasks and experiments which extend those presented in the Multimedia Information Retrieval conference (MIR'08). In reasonably controlled setups, fairly simple eye movements' features in conjunction with machine learning techniques are capable of judging the relevance of an image based on eye movements alone, without using any explicit feedback - therefore potentially assisting the user in a task.

[16] László Kozma, Arto Klami, and Samuel Kaski. GaZIR: Gaze-based zooming interface for image retrieval. In Eleventh International Conference on Multimodal Interfaces (ICMI-MLMI), Cambridge, Massachusetts, USA, 2009. [ Pdf ]
We introduce GaZIR, a gaze-based interface for browsing and searching for images. The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. The key novelty is that the relevance feedback is inferred from implicit cues obtained in real-time from the gaze pattern, using an estimator learned during a separate training phase. The natural zooming interface can be connected to anycontent-based information retrieval engine operating on user feedback. We show with experiments on one engine that there is sufficient amount of information in the gazepatterns to make the estimated relevance feedback a viable choice to complement or even replace explicit feedback by pointing-and-clicking.

[17] Martin Antenreiter, Ronald Ortner, and Peter Auer. Combining classifiers for improved multilabel image classification. In Learning from Multi-label Data, MLD Workshop at ECML 2009, pages 16-27, 2009. [ Pdf ]
We propose a stacking-like method for multilabel image classification. Our approach combines the output of binary base learners, which use different features for image description, in a simple and straightforward way: The confidence values of the base learners are fed into a support vector machine (SVM) in order to improve prediction accuracy. Experiments on the datasets of the Pascal Visual Object Classes challenges (VOC) of 2006 and 2007 show that our method significantly improves over the performance of the base learners. Our approach also works better than more naive approaches for combining features or classifiers.

[18] Peter Auer and Alex P. Leung. Relevance feedback models for content-based image retrieval. In Multimedia Analysis, Processing and Communications. Springer, 2009. [ Pdf ]
We investigate models for content-based image retrieval with relevance feedback, in particular focusing on the exploration-exploitation dilemma. We propose quantitative models for the user behavior and investigate implications of these models. Three search algorithms for efficient searches based on the user models are proposed and evaluated. In the first model a user queries a database for the most (or a sufficiently) relevant image. The user gives feedback to the system by selecting the most relevant image from a number of images presented by the system. In the second model we consider a filtering task where relevant images should be extracted from a database and presented to the user. The feedback of the user is a binary classification of each presented image as relevant or irrelevant. While these models are related, they differ significantly in the kind of feedback provided by the user. This requires very different mechanisms to trade off exploration (finding out what the user wants) and exploitation (serving images which the system believes relevant for the user).

[19] Kitsuchart Pasupa, Craig Saunders, Sandor Szedmak, Arto Klami, Samuel Kaski, and Steve Gunn. Learning to rank images from eye movements. In Proceedings of the 12th IEEE International Conference on Computer Vision, (ICCV'2009), Workshops on Human-Computer Interaction, (HCI'2009), Kyoto, Japan, October 2009. [ Pdf ]
Combining multiple information sources can improve the accuracy of search in information retrieval. This paper presents a new image search strategy which combines image features together with implicit feedback from users' eye movements, using them to rank images. In order to better deal with larger data sets, we present a perceptron formulation of the Ranking Support Vector Machine algorithm. We present initial results on inferring the rank of images presented in a page based on simple image features and implicit feedback of users. The results show that the perceptron algorithm improves the results, and that fusing eye movements and image histograms gives better rankings to images than either of these features alone.

[20] Ronald Ortner. Exploiting similarity information in reinforcement learning - similarity models for multi-armed bandits and mdps. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (ICAART 2010), pages 203-210, 2010. [ Pdf ]
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, where it is assumed that arms of the same color have close mean rewards. An algorithm is presented that shows that this color information can be used to improve the dependency of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case where the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case where transition probabilities of same-colored actions imply only close but not necessarily identical transition probabilities we give upper and lower bounds on the error by action aggregation with respect to the color information. These bounds also imply that the general case is far more difficult to handle.

[21] Ivan Markovsky. Matrix centering and low-rank approximation. Technical report, University of Southampton, School of Electronics and Computer Science, 2009. [ Pdf ]
Low-rank approximation, better known in the machine learning literature as principal component analysis, is a data modeling tool. Data centering, on the other hand, is a common preprocessing step. This paper studies the combination of low-rank approximation with data centering. In the case of no weights and no constraints, the common preprocessing practice of mean subtraction leads to optimal results and therefore need not be changed. The formal proof of this fact, however, reveals that preprocessing by mean subtraction is not the only way to optimally preprocess the data. In the case of weighted and/or constrained low-rank approximation problems, the common preprocessing practice leads to suboptimal results and therefore should be changed. The paper shows, how classical solution methods for weighted and structured low-rank approximation can be modified for doing optimal preprocessing at the same time as low-rank approximation.

[22] Kitsuchart Pasupa, Sandor Szedmak, and David Hardoon. Image ranking with eye movements. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS'2009), Workshop on Advances in Ranking, pages 37-42, Whistler, Canada, December 2009. [ Pdf ]
In order to help users navigate an image search system, one could provide explicit rank information on a set of images. These rankings are learnt so to present a new set of relevant images. Although, requiring explicit information may not be feasible in some cases, we consider the setting where the user provides implicit feedback, eye movements, to assist in such a task. This paper explores the idea of implicitly incorporating eye movement features in an image ranking task. Previous work had demonstrated that combining eye movement and image features improved the retrieval accuracy. Despite promising results the proposed approach is unrealistic as no eye movements are given a-priori for new images. We propose a novel search approach which combines image together with eye movements features in a tensor Ranking Support Vector Machine, and show that by extracting the individual source-specific weight vectors we are able to construct a new image-based semantic space which outperforms in retrieval accuracy.

[23] David Hardoon and Kitsuchart Pasupa. Image ranking with implicit feedback from eye movements. In Proceedings of the 6th Biennial Symposium on Eye Tracking Research & Applications (ETRA'2010), Austin, USA, 2010. [ Pdf ]
In order to help users navigate an image search system, one could provide explicit information on a small set of images as to which of them are relevant or not to their task. These rankings are learned in order to present a user with a new set of images that are relevant to their task. Requiring such explicit information may not be feasible in a number of cases, we consider the setting where the user provides implicit feedback, eye movements, to assist when performing such a task. This paper explores the idea of implicitly incorporating eye movement features in an image ranking task where only images are available during testing. Previous work had demonstrated that combining eye movement and image features improved on the retrieval accuracy when compared to using each of the sources independently. Despite these encouraging results the proposed approach is unrealistic as no eye movements will be presented a-priori for new images (i.e. only after the ranked images are presented would one be able to measure a user's eye movements on them). We propose a novel search methodology which combines image features together with implicit feedback from users' eye movements in a tensor ranking Support Vector Machine and show that it is possible to extract the individual source-specific weight vectors. Furthermore, we demonstrate that the decomposed image weight vector is able to construct a new image-based semantic space that outperforms the retrieval accuracy than when solely using the image-features.

[24] Ronald Ortner. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 2010. [ Pdf ]
We consider an upper confidence bound algorithm for learning in Markov decision processes with deterministic transitions. For this algorithm we derive upper bounds on the online regret with respect to an (epsilon-)optimal policy that are logarithmic in the number of steps taken. We also present a corresponding lower bound. As an application, multi-armed bandits with switching cost are considered.

[25] Ivan Markovsky. Algorithms and literate programs for weighted low-rank approximation with missing data. Technical report, University of Southampton, School of Electronics and Computer Science, 2009. [ Pdf ]
Linear models identification from data with missing values is posed as a weighted low-rank approximation problem with weights related to the missing values equal to zero. Alternating projections and variable projections methods for solving the resulting problem are outlined and implemented in a literate programming style, using Matlab/Octave's scripting language. The methods are evaluated on synthetic data and real data from the MovieLens data sets.

[26] Zakria Hussain, Kitsuchart Pasupa, and John Shawe-Taylor. Learning relevant eye movement feature spaces across users. In Proceedings of the 6th Biennial Symposium on Eye Tracking Research & Applications (ETRA'2010), Austin, USA, March 2010. [ Pdf ]
In this paper we predict the relevance of images based on a low-dimensional feature space found using several users' eye movements. Each user is given an image-based search task, during which their eye movements are extracted using a Tobii eye tracker. The users also provide us with explicit feedback regarding the relevance of images. We demonstrate that by using a greedy Nystrom algorithm on the eye movement features of different users, we can find a suitable low-dimensional feature space for learning. We validate the suitability of this feature space by projecting the eye movement features of a new user into this space, training an online learning algorithm using these features, and showing that the number of mistakes (regret over time) made in predicting relevant images is lower than when using the original eye movement features. We also plot Recall-Precision and ROC curves, and use a sign test to verify the statistical significance of our results.

[27] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. The Journal of Machine Learning Research, 11:1563-1600, April 2010. [ Pdf ]
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s,s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS√AT) after T steps for any unknown MDP with S states, A actions per state, and diameter D. A corresponding lower bound of Ω(√DSAT) on the total regret of any learning algorithm is given as well. These results are complemented by a sample complexity bound on the number of suboptimal steps taken by our algorithm. This bound can be used to achieve a (gap-dependent) regret bound that is logarithmic in T. Finally, we also consider a setting where the MDP is allowed to change a fixed number of l times. We present a modification of our algorithm that is able to deal with this setting and show a regret bound of Õ(l^1/3 T^2/3DS √A).