2024

Anjith George, Sébastien Marcel. Heterogeneous Face Recognition Using Domain Invariant Units. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024. doi:10.1109/ICASSP48485.2024.10447481 | URL
Abstract: Heterogeneous Face Recognition (HFR) aims to expand the applicability of Face Recognition (FR) systems to challenging scenarios, enabling the matching of face images across different domains, such as matching thermal images to visible spectra. However, the development of HFR systems is challenging because of the significant domain gap between modalities and the limited availability of large-scale paired multi-channel data. In this work, we leverage a pretrained face recognition model as a teacher network to learn domain-invariant network layers called Domain-Invariant Units (DIU) to reduce the domain gap. The proposed DIU can be trained effectively in a contrastive distillation framework, even with a limited amount of paired training data. This approach has the potential to enhance pretrained models, making them more adaptable to a wider range of variations in data. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods.
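As an illustration of the contrastive distillation idea, a minimal PyTorch sketch is given below; the InfoNCE-style loss, the temperature value, and the training-step shape are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    # Hypothetical setup: `teacher` is the frozen pretrained FR network and
    # `student` is a copy whose early layers (the DIU) are the only trainable part.
    def diu_contrastive_step(teacher, student, vis_batch, other_batch, optimizer, temperature=0.1):
        """One step of contrastive distillation on paired cross-modal images."""
        with torch.no_grad():
            t_emb = F.normalize(teacher(vis_batch), dim=1)      # anchor embeddings
        s_emb = F.normalize(student(other_batch), dim=1)        # student embeddings
        logits = s_emb @ t_emb.t() / temperature                # pairwise similarities
        labels = torch.arange(len(s_emb), device=s_emb.device)  # i-th row matches i-th pair
        loss = F.cross_entropy(logits, labels)                  # pull pairs together, push others apart
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()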
Anjith George, Sébastien Marcel. Modality Agnostic Heterogeneous Face Recognition with Switch Style Modulators. arXiv.org, 2024. doi:10.48550/arXiv.2407.08640 | URL
Abstract: Heterogeneous Face Recognition (HFR) systems aim to enhance the capability of face recognition in challenging cross-modal authentication scenarios. However, the significant domain gap between the source and target modalities poses a considerable challenge for cross-domain matching. Existing literature primarily focuses on developing HFR approaches for specific pairs of face modalities, necessitating the explicit training of models for each source-target combination. In this work, we introduce a novel framework designed to train a modality-agnostic HFR method capable of handling multiple modalities during inference, all without explicit knowledge of the target modality labels. We achieve this by implementing a computationally efficient automatic routing mechanism called Switch Style Modulation Blocks (SSMB) that trains various domain expert modulators which transform the feature maps adaptively, reducing the domain gap. Our proposed SSMB can be trained end-to-end and seamlessly integrated into pre-trained face recognition models, transforming them into modality-agnostic HFR models. We have performed extensive evaluations on HFR benchmark datasets to demonstrate its effectiveness. The source code and protocols will be made publicly available.
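For illustration, such a routed modulation block could be sketched in PyTorch as follows; the number of experts, the 1x1-convolution experts, and the residual connection are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class SwitchStyleModulator(nn.Module):
        """Sketch of an automatically routed modulation block: a light router
        produces soft weights over domain-expert modulators that transform the
        incoming feature map."""
        def __init__(self, channels: int, num_experts: int = 4):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_experts)]
            )
            self.router = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, num_experts), nn.Softmax(dim=1),
            )

        def forward(self, x):
            weights = self.router(x)                                 # (B, E) routing scores
            outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C, H, W)
            mixed = (weights[:, :, None, None, None] * outs).sum(dim=1)
            return x + mixed                                         # residual modulation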
A. Unnervik, Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel. Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks. arXiv.org, 2024. doi:10.48550/arXiv.2402.18718 | URL
Abstract: Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper we propose to use model pairs on open-set classification tasks for detecting backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can be an indicator for the presence of a backdoor, even when the models are of different architectures and have been trained independently on different datasets. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
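The core operation described here is simple enough to sketch; the least-squares fit and cosine score below are an illustrative reading of the abstract, not the released code.

    import numpy as np

    def fit_translation(probe_emb: np.ndarray, ref_emb: np.ndarray) -> np.ndarray:
        """Least-squares linear map W with probe_emb @ W ~ ref_emb, fitted on
        embeddings of the same clean images from both models."""
        W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
        return W

    def similarity_score(probe_vec: np.ndarray, ref_vec: np.ndarray, W: np.ndarray) -> float:
        """Cosine similarity after translating the probe embedding; an unusually
        low score for a given input can indicate backdoor behaviour."""
        p = probe_vec @ W
        return float(p @ ref_vec / (np.linalg.norm(p) * np.linalg.norm(ref_vec)))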
Hatef Otroshi-Shahreza, Christophe Ecabert, Anjith George, A. Unnervik, Sébastien Marcel, Nicolò Di Domenico, Guido Borghi, Davide Maltoni, Fadi Boutros, Julia Vogel, N. Damer, Ángela Sánchez-Pérez, Enrique Mas-Candela, Jorge Calvo-Zaragoza, Bernardo Biesseck, Pedro Vidal, Roger Granada, David Menotti, Ivan Deandres-Tame, Simone Maurizio La Cava, S. Concas, Pietro Melzi, Rubén Tolosana, R. Vera-Rodríguez, Gianpaolo Perelli, G. Orrú, G. L. Marcialis, Julian Fiérrez. SDFR: Synthetic Data for Face Recognition Competition. IEEE International Conference on Automatic Face & Gesture Recognition, 2024. doi:10.1109/FG59268.2024.10581946 | URL
Abstract: Large-scale face recognition datasets are collected by crawling the Internet without individuals' consent, raising legal, ethical, and privacy concerns. With the recent advances in generative models, several works have proposed generating synthetic face recognition datasets to mitigate the concerns with web-crawled face recognition datasets. This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024) and established to investigate the use of synthetic data for training face recognition models. The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones. In the first task, the face recognition backbone was fixed and the dataset size was limited, while the second task provided almost complete freedom on the model backbone, the dataset, and the training pipeline. The submitted models were trained on existing as well as new synthetic datasets and used clever methods to improve training with synthetic data. The submissions were evaluated and ranked on a diverse set of seven benchmarking datasets. The paper gives an overview of the submitted face recognition models and reports the achieved performance compared to baseline models trained on real and synthetic datasets. Furthermore, the evaluation of submissions is extended to bias assessment across different demographic groups. Lastly, an outlook on the current state of the research in training face recognition models using synthetic data is presented, and existing problems as well as potential future directions are discussed.
Pietro Melzi, Rubén Tolosana, R. Vera-Rodríguez, Minchul Kim, C. Rathgeb, Xiaoming Liu, Ivan Deandres-Tame, A. Morales, Julian Fiérrez, J. Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ, David Menotti, A. Unnervik, Anjith George, Christophe Ecabert, Hatef Otroshi-Shahreza, Parsa Rahimi, Sébastien Marcel, Ioannis Sarridis, C. Koutlis, Georgia Baltsou, Symeon Papadopoulos, Christos Diou, Nicolò Di Domenico, Guido Borghi, Lorenzo Pellegrini, Enrique Mas-Candela, Ángela Sánchez-Pérez, A. Atzori, Fadi Boutros, N. Damer, G. Fenu, M. Marras. FRCSyn-onGoing: Benchmarking and comprehensive evaluation of real and synthetic data to improve face recognition systems. Information Fusion, 2024. doi:10.1016/j.inffus.2024.102322 | URL
Abstract: No abstract available.
Ivan Deandres-Tame, Rubén Tolosana, Pietro Melzi, R. Vera-Rodríguez, Minchul Kim, C. Rathgeb, Xiaoming Liu, A. Morales, Julian Fiérrez, J. Ortega-Garcia, Zhizhou Zhong, Y. Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, A.P. Grigorev, Denis Timoshenko, K. Asfaw, C. Low, Hao Liu, Chuyi Wang, Qing Zuo, Zhixiang He, Hatef Otroshi-Shahreza, Anjith George, A. Unnervik, Parsa Rahimi, Sébastien Marcel, Pedro C. Neto, Marco Huber, J. Kolf, N. Damer, Fadi Boutros, Jaime S. Cardoso, Ana F. Sequeira, A. Atzori, G. Fenu, M. Marras, Vitomir Štruc, Jiang Yu, Zhangjie Li, Jichun Li, Weisong Zhao, Zhen Lei, Xiangyu Zhu, Xiao-Yu Zhang, Bernardo Biesseck, Pedro Vidal, Luiz Coelho, Roger Granada, David Menotti. Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data. arXiv.org, 2024. doi:10.48550/arXiv.2404.10378 | URL
Abstract: No abstract available.
Pavel Korshunov, Anjith George, Gökhan Özbulak, Sébastien Marcel. Vulnerability of Face age Verification to Replay Attacks. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024. doi:10.1109/ICASSP48485.2024.10447255 | URL
Abstract: Presentation attacks on biometric systems have long created significant security risks. The increasing adoption of age verification systems, which ensure that only age-appropriate content is consumed online, raises the question of the vulnerability of such systems to replay presentation attacks. In this paper, we analyze the vulnerability of face age verification to simple replay attacks and assess whether presentation attack detection (PAD) systems created for biometrics can be effective at detecting similar attacks on age verification. We used three types of attacks, captured with iPhone 12, Galaxy S9, and Huawei Mate 30 phones from an iPad Pro that replayed images from the commonly used UTKFace dataset of faces with true age labels. We evaluated four state-of-the-art face age verification algorithms, including simple classification, distribution-based, regression via classification, and adaptive distribution approaches. We show that these algorithms are vulnerable to the attacks, since the accuracy of age verification on replayed images differs by only a couple of percentage points from that on the original images, which means an age verification system cannot distinguish attacks from bona fide images. Using two state-of-the-art presentation attack detection systems, DeepPixBiS and CDCN, trained to detect similar attacks on biometrics, we demonstrate that they struggle to detect both the types of attacks possible in an age verification scenario and the types of bona fide images commonly used. These results highlight the need for the development of attack detection systems specific to age verification before age verification can become practical.
Anjith George, Sébastien Marcel. From Modalities to Styles: Rethinking the Domain Gap in Heterogeneous Face Recognition. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2024. doi:10.1109/TBIOM.2024.3365350 | URL
Abstract: Heterogeneous Face Recognition (HFR) focuses on matching faces from different domains, for instance, thermal to visible images, making Face Recognition (FR) systems more versatile for challenging scenarios. However, the domain gap between these domains and the limited large-scale datasets in the target HFR modalities make it challenging to develop robust HFR models from scratch. In our work, we view different modalities as distinct styles and propose a method to modulate feature maps of the target modality to address the domain gap. We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing FR networks, turning them into HFR-ready systems. The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap. Our method enables end-to-end training using a small set of paired samples. We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.
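For illustration, a conditional modulation block along these lines might look as follows in PyTorch; the gating-by-modality-flag design and the learned affine parameters are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class CAIMSketch(nn.Module):
        """Instance-normalizes intermediate feature maps and re-styles them with
        learned scale/shift parameters, applied only to inputs from the alternate
        (non-visible) modality."""
        def __init__(self, channels: int):
            super().__init__()
            self.norm = nn.InstanceNorm2d(channels, affine=False)
            self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

        def forward(self, x, is_target_modality: bool):
            if not is_target_modality:
                return x                    # visible-spectrum inputs pass through
            return self.gamma * self.norm(x) + self.beta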

2023

Pietro Melzi, Rubén Tolosana, R. Vera-Rodríguez, Minchul Kim, C. Rathgeb, Xiaoming Liu, Ivan Deandres-Tame, A. Morales, Julian Fiérrez, J. Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, R. Granada, Guilherme P. Fickel, Gustavo Fuhr, D. Menotti, A. Unnervik, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Parsa Rahimi, Sébastien Marcel, Ioannis Sarridis, C. Koutlis, Georgia Baltsou, Symeon Papadopoulos, Christos Diou, Nicolò Di Domenico, Guido Borghi, Lorenzo Pellegrini, Enrique Mas-Candela, Ángela Sánchez-Pérez, A. Atzori, Fadi Boutros, N. Damer, G. Fenu, M. Marras. FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 2023. doi:10.1109/WACVW60836.2024.00100 | URL
Abstract: Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be addressed. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.
Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, S. Marcel. EdgeFace: Efficient Face Recognition Model for Edge Devices. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2023. doi:10.1109/TBIOM.2024.3352164 | URL
Abstract: In this paper, we present EdgeFace - a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt. By effectively combining the strengths of both CNN and Transformer models with a low-rank linear layer, EdgeFace achieves excellent face recognition performance optimized for edge devices. The proposed EdgeFace network not only maintains low computational costs and compact storage, but also achieves high face recognition accuracy, making it suitable for deployment on edge devices. The proposed EdgeFace model achieved the top ranking among models with fewer than 2M parameters in the IJCB 2023 Efficient Face Recognition Competition. Extensive experiments on challenging benchmark face datasets demonstrate the effectiveness and efficiency of EdgeFace in comparison to state-of-the-art lightweight models and deep face recognition models. Our EdgeFace model with 1.77M parameters achieves state-of-the-art results on LFW (99.73%), IJB-B (92.67%), and IJB-C (94.85%), outperforming other efficient models with larger computational complexities. The code to replicate the experiments will be made available publicly.
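The low-rank linear idea can be sketched as follows; the rank value and layer sizes are illustrative assumptions.

    import torch.nn as nn

    class LowRankLinear(nn.Module):
        """Replaces a d_in x d_out weight matrix with two thin factors of rank r,
        cutting the parameter count from d_in*d_out to roughly r*(d_in + d_out)."""
        def __init__(self, d_in: int, d_out: int, rank: int = 64):
            super().__init__()
            self.down = nn.Linear(d_in, rank, bias=False)
            self.up = nn.Linear(rank, d_out, bias=True)

        def forward(self, x):
            return self.up(self.down(x))

For example, a hypothetical 960-to-512 linear layer holds about 0.49M weights, while a rank-64 factorization needs about 64 x (960 + 512) ≈ 0.09M.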
Ž. Emeršič, T. Ohki, M. Akasaka, T. Arakawa, S. Maeda, M. Okano, Y. Sato, Anjith George, S. Marcel, I. I. Ganapathi, S. S. Ali, S. Javed, N. Werghi, S. G. Işık, E. Sarıtaş, H. K. Ekenel, V. Hudovernik, J. Kolf, F. Boutros, N. Damer, Grishma Sharma, A. Kamboj, A. Nigam, D. Jain, G. Cámara-Chávez, P. Peer, V. Štruc. The Unconstrained Ear Recognition Challenge 2023: Maximizing Performance and Minimizing Bias. 2023 IEEE International Joint Conference on Biometrics (IJCB), 2023. doi:10.1109/IJCB57857.2023.10449062 | URL
Abstract: The paper provides a summary of the 2023 Unconstrained Ear Recognition Challenge (UERC), a benchmarking effort focused on ear recognition from images acquired in uncontrolled environments. The objective of the challenge was to evaluate the effectiveness of current ear recognition techniques on a challenging ear dataset while analyzing the techniques from two distinct aspects, i.e., verification performance and bias with respect to specific demographic factors, i.e., gender and ethnicity. Seven research groups participated in the challenge and submitted seven distinct recognition approaches that ranged from descriptor-based methods and deep-learning models to ensemble techniques that relied on multiple data representations to maximize performance and minimize bias. A comprehensive investigation into the performance of the submitted models is presented, as well as an in-depth analysis of bias and associated performance differentials due to differences in gender and ethnicity. The results of the challenge suggest that a wide variety of models (e.g., transformers, convolutional neural networks, ensemble models) is capable of achieving competitive recognition results, but also that all of the models still exhibit considerable performance differentials with respect to both gender and ethnicity. To promote further development of unbiased and effective ear recognition models, the starter kit of UERC 2023, together with the baseline model and the training and test data, is made available from: http://ears.fri.uni-lj.si/
J. Kolf, Fadi Boutros, Jurek Elliesen, Markus Theuerkauf, N. Damer, Mohamad Alansari, Oussama Abdul Hay, Sara Alansari, S. Javed, N. Werghi, Klemen Grm, Vitomir Štruc, F. Alonso-Fernandez, Kevin Hernandez Diaz, J. Bigun, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Ketan Kotwal, S. Marcel, Iurii Medvedev, Bo-Hao Jin, D. Nunes, Ahmad Hassanpour, Pankaj Khatiwada, A. Toor, Bian Yang. EFaR 2023: Efficient Face Recognition Competition. 2023 IEEE International Joint Conference on Biometrics (IJCB), 2023. doi:10.1109/IJCB57857.2023.10448917 | URL
Abstract: This paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. To drive further development of efficient face recognition models, the submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and the model size. The evaluation of submissions is extended to bias, cross-quality, and large-scale recognition benchmarks. Overall, the paper gives an overview of the achieved performance values of the submitted solutions as well as a diverse set of baselines. The submitted solutions use small, efficient network architectures to reduce the computational cost, and some solutions additionally apply model quantization. An outlook on possible techniques that are underrepresented in current solutions is given as well.
Anjith George, S. Marcel. Bridging the Gap: Heterogeneous Face Recognition with Conditional Adaptive Instance Modulation. arXiv.org, 2023. doi:10.48550/arXiv.2307.07032 | URL
Abstract: Heterogeneous Face Recognition (HFR) aims to match face images across different domains, such as thermal and visible spectra, expanding the applicability of Face Recognition (FR) systems to challenging scenarios. However, the domain gap and limited availability of large-scale datasets in the target domain make training robust and invariant HFR models from scratch difficult. In this work, we treat different modalities as distinct styles and propose a framework to adapt feature maps, bridging the domain gap. We introduce a novel Conditional Adaptive Instance Modulation (CAIM) module that can be integrated into pre-trained FR networks, transforming them into HFR networks. The CAIM block modulates intermediate feature maps to adapt the style of the target modality, effectively bridging the domain gap. Our proposed method allows for end-to-end training with a minimal number of paired samples. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.
Hatef Otroshi Shahreza, Anjith George, S. Marcel. SynthDistill: Face Recognition with Knowledge Distillation from Synthetic Data. 2023 IEEE International Joint Conference on Biometrics (IJCB), 2023. doi:10.1109/IJCB57857.2023.10448642 | URL
Abstract: State-of-the-art face recognition networks are often computationally expensive and cannot be used for mobile applications. Training lightweight face recognition models also requires large identity-labeled datasets. Meanwhile, there are privacy and ethical concerns with collecting and using large face recognition datasets. While generating synthetic datasets for training face recognition models is an alternative option, it is challenging to generate synthetic data with sufficient intra-class variations. In addition, there is still a considerable gap between the performance of models trained on real and synthetic data. In this paper, we propose a new framework (named SynthDistill) to train lightweight face recognition models by distilling the knowledge of a pretrained teacher face recognition model using synthetic data. We use a pretrained face generator network to generate synthetic face images and use the synthesized images to learn a lightweight student network. We use synthetic face images without identity labels, mitigating the problems in the intra-class variation generation of synthetic datasets. Instead, we propose a novel dynamic sampling strategy from the intermediate latent space of the face generator network to include new variations of the challenging images while further exploring new face images in the training batch. The results on five different face recognition datasets demonstrate the superiority of our lightweight model compared to models trained on previous synthetic datasets, achieving a verification accuracy of 99.52% on the LFW dataset with a lightweight network. The results also show that our proposed framework significantly reduces the gap between training with real and synthetic data. The source code for replicating the experiments is publicly released.
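A minimal sketch of such a distillation loop is shown below, assuming a generator that maps latents to face images and a cosine distillation loss (both illustrative); the paper's dynamic latent sampling strategy is omitted.

    import torch
    import torch.nn.functional as F

    def distill_step(generator, teacher, student, optimizer, batch_size=64, latent_dim=512):
        """One identity-label-free distillation step on freshly synthesized faces."""
        z = torch.randn(batch_size, latent_dim)          # sample random latents
        with torch.no_grad():
            faces = generator(z)                         # synthetic faces, no labels
            target = F.normalize(teacher(faces), dim=1)  # frozen teacher embeddings
        pred = F.normalize(student(faces), dim=1)
        loss = 1.0 - (pred * target).sum(dim=1).mean()   # cosine distillation loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()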

2022

M. Ibsen, C. Rathgeb, Fabian Brechtel, Ruben Klepp, K. Pöppelmann, Anjith George, S. Marcel, C. Busch. Attacking Face Recognition With T-Shirts: Database, Vulnerability Assessment, and Detection. IEEE Access, 2022. doi:10.1109/ACCESS.2023.3282780 | URL
Abstract: Face recognition systems are widely deployed for biometric authentication. Despite this, it is well-known that, without any safeguards, face recognition systems are highly vulnerable to presentation attacks. In response to this security issue, several promising methods for detecting presentation attacks have been proposed which show high performance on existing benchmarks. However, an ongoing challenge is the generalization of presentation attack detection methods to unseen and new attack types. To this end, we propose a new T-shirt Face Presentation Attack (TFPA) database of 1,608 T-shirt attacks using 100 unique presentation attack instruments. In an extensive evaluation, we show that this type of attack can compromise the security of face recognition systems and that some state-of-the-art attack detection mechanisms trained on popular benchmarks fail to robustly generalize to the new attacks. Further, we propose three new methods for detecting T-shirt attack images: one relies on the statistical differences between depth maps of bona fide images and T-shirt attacks, one is an anomaly detection approach trained on features extracted only from bona fide RGB images, and one is a fusion approach which achieves competitive detection performance.
Anjith George, S. Marcel. Robust Face Presentation Attack Detection with Multi-Channel Neural Networks. 2022. DOI not available | URL
Abstract: Vulnerability against presentation attacks remains a challenging issue limiting the reliable use of face recognition systems. Though several methods have been proposed in the literature for the detection of presentation attacks, the majority of these methods fail in generalizing to unseen attacks and environments. Since the quality of attack instruments keeps getting better, the difference between bonafide and attack samples is diminishing, making it harder to distinguish them using the visible spectrum alone. In this context, multi-channel presentation attack detection methods have been proposed as a solution to secure face recognition systems. Even with multiple channels, special care needs to be taken to ensure that the model generalizes well in challenging scenarios. In this chapter, we present three different strategies to use multi-channel information for presentation attack detection. Specifically, we present different architecture choices for fusion, along with ad-hoc loss functions, as opposed to the standard classification objective. We conduct an extensive set of experiments on the HQ-WMCA dataset, which contains a wide variety of attacks and sensing channels together with challenging unseen attack evaluation protocols. We make the protocols, source codes, and data publicly available to enable further extensions of the work.
Anjith George, A. Mohammadi, S. Marcel. Prepended Domain Transformer: Heterogeneous Face Recognition Without Bells and Whistles. IEEE Transactions on Information Forensics and Security, 2022. doi:10.1109/TIFS.2022.3217738 | URL
Abstract: Heterogeneous Face Recognition (HFR) refers to matching face images captured in different domains, such as thermal to visible (VIS) images, sketches to visible images, near-infrared to visible, and so on. This is particularly useful in matching visible spectrum images to images captured from other modalities. Though highly useful, HFR is challenging because of the domain gap between the source and target domain. Often, large-scale paired heterogeneous face image datasets are absent, preventing training models specifically for the heterogeneous task. In this work, we propose a surprisingly simple, yet very effective, method for matching face images across different sensing modalities. The core idea of the proposed approach is to add a novel neural network block called Prepended Domain Transformer (PDT) in front of a pre-trained face recognition (FR) model to address the domain gap. Retraining this new block with a few paired samples in a contrastive learning setup was enough to achieve state-of-the-art performance in many HFR benchmarks. The PDT blocks can be retrained for several source-target combinations using the proposed general framework. The proposed approach is architecture agnostic, meaning PDT blocks can be added to any pre-trained FR model. Further, the approach is modular and the new block can be trained with a minimal set of paired samples, making it much easier for practical deployment. The source code and protocols will be made available publicly.
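The prepend-and-freeze idea can be sketched as follows; the two-convolution block and the residual connection are assumptions, not the paper's exact PDT layout.

    import torch.nn as nn

    class PrependedDomainTransformer(nn.Module):
        """A small trainable block placed in front of a frozen pretrained FR model;
        only the block is updated (e.g., contrastively on paired samples)."""
        def __init__(self, fr_model: nn.Module, channels: int = 3):
            super().__init__()
            self.pdt = nn.Sequential(
                nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, channels, kernel_size=3, padding=1),
            )
            self.fr_model = fr_model
            for p in self.fr_model.parameters():   # keep the FR model frozen
                p.requires_grad = False

        def forward(self, x):
            return self.fr_model(x + self.pdt(x))  # residual domain correction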
Anjith George, David Geissbuhler, S. Marcel. A Comprehensive Evaluation on Multi-channel Biometric Face Presentation Attack Detection. arXiv.org, 2022. DOI not available | URL
Abstract: The vulnerability against presentation attacks is a crucial problem undermining the wide deployment of face recognition systems. Though presentation attack detection (PAD) systems try to address this problem, the lack of generalization and robustness continues to be a major concern. Several works have shown that using multi-channel PAD systems could alleviate this vulnerability and result in more robust systems. However, there is a wide selection of channels available for a PAD system, such as RGB, Near Infrared, Shortwave Infrared, Depth, and Thermal sensors. Using many sensors increases the cost of the system; therefore, an understanding of the performance of different sensors against a wide variety of attacks is necessary when selecting the modalities. In this work, we perform a comprehensive study to understand the effectiveness of various imaging modalities for PAD. The studies are performed on a multi-channel PAD dataset, collected with 14 different sensing modalities considering a wide range of 2D, 3D, and partial attacks. We used a multi-channel convolutional network-based architecture, which uses pixel-wise binary supervision. The model has been evaluated with different combinations of channels and different image qualities on a variety of challenging known and unknown attack protocols. The results reveal interesting trends and can act as pointers for sensor selection for safety-critical presentation attack detection systems. The source codes and protocols to reproduce the results are made available publicly, making it possible to extend this work to other architectures.

2021

Anjith George, S. Marcel. Multi-channel Face Presentation Attack Detection Using Deep Learning. Advances in Computer Vision and Pattern Recognition, 2021. doi:10.1007/978-3-030-74697-1_13 | URL
Abstract: No abstract available.
Anjith George, S. Marcel. Cross Modal Focal Loss for RGBD Face Anti-Spoofing. Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00779 | URL
Abstract: Automatic methods for detecting presentation attacks are essential to ensure the reliable use of facial recognition technology. Most of the methods available in the literature for presentation attack detection (PAD) fail to generalize to unseen attacks. In recent years, multi-channel methods have been proposed to improve the robustness of PAD systems. Often, only a limited amount of data is available for additional channels, which limits the effectiveness of these methods. In this work, we present a new framework for PAD that uses RGB and depth channels together with a novel loss function. The new architecture uses complementary information from the two modalities while reducing the impact of overfitting. Essentially, a cross-modal focal loss function is proposed to modulate the loss contribution of each channel as a function of the confidence of the individual channels. Extensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed approach.
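A simplified sketch of the modulation idea is given below; the exact weighting function in the paper differs, so this only illustrates how one channel's confidence can down-weight the other channel's loss.

    import torch
    import torch.nn.functional as F

    def cross_modal_focal_loss(p_rgb, p_depth, target, gamma=2.0):
        """p_rgb, p_depth: per-sample bonafide probabilities from the two branches;
        target: float 0/1 labels. Each branch's BCE is damped when the other
        branch is already confident about the correct class."""
        bce_rgb = F.binary_cross_entropy(p_rgb, target, reduction="none")
        bce_depth = F.binary_cross_entropy(p_depth, target, reduction="none")
        conf_rgb = torch.where(target > 0.5, p_rgb, 1 - p_rgb)       # confidence in truth
        conf_depth = torch.where(target > 0.5, p_depth, 1 - p_depth)
        loss = ((1 - conf_depth) ** gamma) * bce_rgb + ((1 - conf_rgb) ** gamma) * bce_depth
        return loss.mean()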
Sandip Purnapatra, Nic Smalt, Keivan Bahmani, Priyanka Das, David Yambay, A. Mohammadi, Anjith George, T. Bourlai, S. Marcel, S. Schuckers, Meiling Fang, N. Damer, Fadi Boutros, Arjan Kuijper, Alperen Kantarci, Basar Demir, Zafer Yildiz, Zabi Ghafoory, Hasan Dertli, H. K. Ekenel, Son Vu, V. Christophides, Liang Dashuang, Zhang Guanghao, Hao Zhanlong, Liu Junfu, Jin Yufeng, Samo Liu, Samuel Huang, Salieri Kuei, Jag Mohan Singh, Raghavendra Ramachandra. Face Liveness Detection Competition (LivDet-Face) - 2021. 2021 IEEE International Joint Conference on Biometrics (IJCB), 2021. doi:10.1109/IJCB52358.2021.9484359 | URL
Abstract: Liveness Detection (LivDet)-Face is an international competition series open to academia and industry. The competition's objective is to assess and report the state of the art in liveness / Presentation Attack Detection (PAD) for face recognition. Impersonation and presentation of false samples to the sensors can be classified as presentation attacks, and the ability of the sensors to detect such attempts is known as PAD. LivDet-Face 2021 is the first edition of the face liveness competition. This competition serves as an important benchmark in face presentation attack detection, offering (a) an independent assessment of the current state of the art in face PAD, and (b) a common evaluation protocol, availability of Presentation Attack Instruments (PAI), and a live face image dataset through the Biometric Evaluation and Testing (BEAT) platform. After the competition closes, it can be easily followed by researchers on a platform in which participants can compare their solutions against the LivDet-Face winners.

2020

Anjith George, S. Marcel. Can Your Face Detector Do Anti-spoofing? Face Presentation Attack Detection with a Multi-Channel Face Detector. arXiv.org, 2020. DOI not available | URL
Abstract: In a typical face recognition pipeline, the task of the face detector is to localize the face region. However, the face detector localizes regions that look like a face, irrespective of the liveness of the face, which makes the entire system susceptible to presentation attacks. In this work, we try to reformulate the task of the face detector to detect real faces, thus eliminating the threat of presentation attacks. While this task could be challenging with visible spectrum images alone, we leverage the multi-channel information available from off-the-shelf devices (such as color, depth, and infrared channels) to design a multi-channel face detector. The proposed system can be used as a live-face detector, obviating the need for a separate presentation attack detection module and making the system reliable in practice without any additional computational overhead. The main idea is to leverage a single-stage object detection framework with a joint representation obtained from different channels for the PAD task. We have evaluated our approach on the multi-channel WMCA dataset containing a wide variety of attacks to show the effectiveness of the proposed framework.
Anjith George, S. Marcel. On the Effectiveness of Vision Transformers for Zero-shot Face Anti-Spoofing. 2021 IEEE International Joint Conference on Biometrics (IJCB), 2020. doi:10.1109/IJCB52358.2021.9484333 | URL
Abstract: The vulnerability of face recognition systems to presentation attacks has limited their application in security-critical scenarios. Automatic methods of detecting such malicious attempts are essential for the safe use of facial recognition technology. Although various methods have been suggested for detecting such attacks, most of them over-fit the training set and fail to generalize to unseen attacks and environments. In this work, we use transfer learning from the vision transformer model for the zero-shot anti-spoofing task. The effectiveness of the proposed approach is demonstrated through experiments on publicly available datasets. The proposed approach outperforms the state-of-the-art methods in the zero-shot protocols of the HQ-WMCA and SiW-M datasets by a large margin. Besides, the model achieves a significant boost in cross-database performance as well.
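Assuming the timm library and a standard ViT checkpoint (the paper's exact backbone and fine-tuning scheme may differ), the transfer-learning setup can be sketched as:

    import timm

    # Pretrained ViT with a fresh 2-class (bonafide vs. attack) head.
    model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

    # One possible choice: freeze the transformer body, fine-tune only the head.
    for name, param in model.named_parameters():
        if "head" not in name:
            param.requires_grad = False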
Anjith George, S. Marcel. Learning One Class Representations for Face Presentation Attack Detection Using Multi-Channel Convolutional Neural Networks. IEEE Transactions on Information Forensics and Security, 2020. doi:10.1109/TIFS.2020.3013214 | URL
Abstract: Face recognition has evolved as a widely used biometric modality. However, its vulnerability against presentation attacks poses a significant security threat. Though presentation attack detection (PAD) methods try to address this issue, they often fail in generalizing to unseen attacks. In this work, we propose a new framework for PAD using a one-class classifier, where the representation used is learned with a Multi-Channel Convolutional Neural Network (MCCNN). A novel loss function is introduced, which forces the network to learn a compact embedding for bonafide class while being far from the representation of attacks. A one-class Gaussian Mixture Model is used on top of these embeddings for the PAD task. The proposed framework introduces a novel approach to learn a robust PAD system from bonafide and available (known) attack classes. This is particularly important as collecting bonafide data and simpler attacks are much easier than collecting a wide variety of expensive attacks. The proposed system is evaluated on the publicly available WMCA multi-channel face PAD database, which contains a wide variety of 2D and 3D attacks. Further, we have performed experiments with MLFP and SiW-M datasets using RGB channels only. Superior performance in unseen attack protocols shows the effectiveness of the proposed approach. Software, data, and protocols to reproduce the results are made available publicly.
G. Heusch, Anjith George, David Geissbuhler, Z. Mostaani, S. Marcel. Deep Models and Shortwave Infrared Information to Detect Face Presentation Attacks. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020. doi:10.1109/TBIOM.2020.3010312 | URL
Abstract: This paper addresses the problem of face presentation attack detection using different image modalities. In particular, the usage of short wave infrared (SWIR) imaging is considered. Face presentation attack detection is performed using recent models based on Convolutional Neural Networks, using only carefully selected SWIR image differences as input. Conducted experiments show superior performance over similar models acting on either color images or on a combination of different modalities (visible, NIR, thermal and depth), as well as over an SVM-based classifier acting on SWIR image differences. Experiments have been carried out on a new public and freely available database containing a wide variety of attacks. Video sequences have been recorded with several sensors, resulting in 14 different streams in the visible, NIR, SWIR and thermal spectra, as well as depth data. The best proposed approach is able to almost perfectly detect all impersonation attacks while ensuring low bonafide classification errors. On the other hand, the obtained results show that obfuscation attacks are more difficult to detect. We hope that the proposed database will foster research on this challenging problem. Finally, all the code and instructions to reproduce the presented experiments are made available to the research community.
G. Heusch, Anjith George, David Geissbühler, Z. Mostaani, S. Marcel. High-Quality Wide Multi-Channel Attack (HQ-WMCA). 2020. doi:10.34777/0T0B-EZ97 | URL
Abstract: No abstract available.
Z. Mostaani, Anjith George, G. Heusch, David Geissbuhler, S. Marcel. The High-Quality Wide Multi-Channel Attack (HQ-WMCA) database. arXiv.org, 2020. DOI not available | URL
Abstract: The High-Quality Wide Multi-Channel Attack database (HQ-WMCA) extends the previous Wide Multi-Channel Attack database (WMCA) with more channels, including color, depth, thermal, infrared (spectra), and short-wave infrared (spectra), and also a wider variety of attacks.

2019

Anjith George, S. Marcel. Deep Pixel-wise Binary Supervision for Face Presentation Attack Detection. International Conference on Biometrics, 2019. doi:10.1109/ICB45273.2019.8987370 | URL
Abstract: Face recognition has evolved as a prominent biometric authentication modality. However, vulnerability to presentation attacks curtails its reliable deployment. Automatic detection of presentation attacks is essential for the secure use of face recognition technology in unattended scenarios. In this work, we introduce a Convolutional Neural Network (CNN) based framework for presentation attack detection, with deep pixel-wise supervision. The framework uses only frame-level information, making it suitable for deployment in smart devices with minimal computational and time overhead. We demonstrate the effectiveness of the proposed approach on public datasets for both intra- as well as cross-dataset experiments. The proposed approach achieves an HTER of 0% on the Replay Mobile dataset and an ACER of 0.42% in Protocol-1 of the OULU dataset, outperforming state-of-the-art methods.
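As a loss, pixel-wise binary supervision reduces to supervising every cell of a low-resolution output map with the frame label; the map resolution and weighting factor below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def pixelwise_binary_loss(pixel_map, binary_logit, label, lam=0.5):
        """pixel_map: (B, 1, 14, 14) sigmoid outputs; binary_logit: (B, 1);
        label: (B,) with 1 = bonafide, 0 = attack."""
        target_map = label.view(-1, 1, 1, 1).expand_as(pixel_map).float()
        loss_pix = F.binary_cross_entropy(pixel_map, target_map)
        loss_bin = F.binary_cross_entropy_with_logits(binary_logit.squeeze(1), label.float())
        return lam * loss_pix + (1 - lam) * loss_bin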
Anjith George, Z. Mostaani, David Geissbuhler, Olegs Nikisins, André Anjos, S. Marcel. Biometric Face Presentation Attack Detection With Multi-Channel Convolutional Neural Network. IEEE Transactions on Information Forensics and Security, 2019. doi:10.1109/TIFS.2019.2916652 | URL
Abstract: Face recognition is a mainstream biometric authentication method. However, the vulnerability to presentation attacks (a.k.a. spoofing) limits its usability in unsupervised applications. Even though there are many methods available for tackling presentation attacks (PA), most of them fail to detect sophisticated attacks such as silicone masks. As the quality of presentation attack instruments improves over time, achieving reliable PA detection with visual spectra alone remains very challenging. We argue that analysis in multiple channels might help to address this issue. In this context, we propose a multi-channel Convolutional Neural Network-based approach for presentation attack detection (PAD). We also introduce the new Wide Multi-Channel presentation Attack (WMCA) database for face PAD which contains a wide variety of 2D and 3D presentation attacks for both impersonation and obfuscation attacks. Data from different channels such as color, depth, near-infrared, and thermal are available to advance the research in face PAD. The proposed method was compared with feature-based approaches and found to outperform the baselines achieving an ACER of 0.3% on the introduced dataset. The database and the software to reproduce the results are made available publicly.
Anjith George. Image based Eye Gaze Tracking and its Applications. arXiv.org, 2019. DOI not available | URL
Abstract: Eye movements play a vital role in perceiving the world. Eye gaze can give a direct indication of the user's point of attention, which can be useful in improving human-computer interaction. Gaze estimation in a non-intrusive manner can make human-computer interaction more natural. Eye tracking can be used for several applications such as fatigue detection, biometric authentication, disease diagnosis, activity recognition, alertness level estimation, gaze-contingent display, human-computer interaction, etc. Even though eye-tracking technology has been around for many decades, it has not found much use in consumer applications. The main reasons are the high cost of eye tracking hardware and the lack of consumer-level applications. In this work, we attempt to address these two issues. In the first part of this work, image-based algorithms are developed for gaze tracking, which include a new two-stage iris centre localization algorithm. We have developed a new algorithm which works in challenging conditions such as motion blur, glint, and varying illumination levels. A person-independent gaze direction classification framework using a convolutional neural network is also developed, which eliminates the requirement of user-specific calibration. In the second part of this work, we have developed two applications which can benefit from eye tracking data. A new framework for biometric identification based on eye movement parameters is developed. A framework for activity recognition, using gaze data from a head-mounted eye tracker, is also developed. The information from gaze data, ego-motion, and visual features is integrated to classify the activities.
Olegs Nikisins, Anjith George, S. Marcel. Domain Adaptation in Multi-Channel Autoencoder based Features for Robust Face Anti-Spoofing. International Conference on Biometrics, 2019. doi:10.1109/ICB45273.2019.8987247 | URL
Abstract: While the performance of face recognition systems has improved significantly in the last decade, they have been shown to be highly vulnerable to presentation attacks (spoofing). Most of the research in the field of face presentation attack detection (PAD) has focused on boosting the performance of the systems within a single database. Face PAD datasets are usually captured with RGB cameras and have a very limited number of both bona-fide samples and presentation attack instruments. Training face PAD systems on such data leads to poor performance, even in the closed-set scenario, especially when sophisticated attacks are involved. We explore two paths to boost the performance of the face PAD system against challenging attacks. First, by using multi-channel (RGB, Depth and NIR) data, which is still easily accessible in a number of mass production devices. Second, we develop a novel Autoencoders + MLP based face PAD algorithm. Moreover, instead of collecting more data for training the proposed deep architecture, a domain adaptation technique is proposed, transferring the knowledge of facial appearance from the RGB to the multi-channel domain. We also demonstrate that learning the features of individual facial regions is more discriminative than features learned from an entire face. The proposed system is tested on a very recent publicly available multi-channel PAD database with a wide variety of presentation attacks.

2018

Anjith George, A. Routray. Recognition of Activities from Eye Gaze and Egocentric Video. arXiv.org, 2018. DOI not available | URL
Abstract: This paper presents a framework for the recognition of human activity from egocentric video and eye tracking data obtained from a head-mounted eye tracker. Three channels of information, namely eye movement, ego-motion, and visual features, are combined for the classification of activities. Image features were extracted using a pre-trained convolutional neural network. Eye and ego-motion are quantized, and the windowed histograms are used as the features. The combination of features achieves better accuracy for activity classification than the individual features.
Anjith George, A. Routray. ESCaF: Pupil Centre Localization Algorithm with Candidate Filtering. arXiv.org, 2018. DOI not available | URL
Abstract: Algorithms for accurate localization of the pupil centre are essential for gaze tracking in real-world conditions. Most algorithms fail in real-world conditions like illumination variations, contact lenses, glasses, eye makeup, motion blur, noise, etc. We propose a new algorithm which improves the detection rate in real-world conditions. The proposed algorithm uses both edge as well as intensity information, along with a candidate filtering approach, to identify the best pupil candidate. A simple tracking scheme has also been added, which improves the processing speed. The algorithm has been evaluated on the Labelled Pupils in the Wild (LPW) dataset, the largest in its class, which contains real-world conditions. The proposed algorithm outperformed state-of-the-art algorithms while achieving real-time performance.

2017

Anwesha Sengupta, A. Dasgupta, Aritra Chaudhuri, Anjith George, A. Routray, Rajlakshmi Guha. A Multimodal System for Assessing Alertness Levels Due to Cognitive Loading. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2017. doi:10.1109/TNSRE.2017.2672080 | URL
Abstract: This paper proposes a scheme for assessing the alertness levels of an individual using simultaneous acquisition of multimodal physiological signals and fusing the information into a single metric for the quantification of alertness. The system takes electroencephalogram, high-speed image sequence, and speech data as inputs. Certain parameters are computed from each of these measures as indicators of alertness, and a metric is proposed using a fusion of the parameters for indicating the alertness level of an individual at an instant. The scheme has been validated experimentally using standard neuropsychological tests, such as the Visual Response Test (VRT), Auditory Response Test (ART), a Letter Counting (LC) task, and the Stroop Test. The tests are used both as cognitive tasks to induce mental fatigue as well as tools to gauge the present degree of alertness of the subject. Correlation between the measures has been studied, and the experimental variables have been statistically analyzed using measures such as multivariate linear regression and analysis of variance. Correspondence of the trends obtained from biomarkers and neuropsychological measures validates the usability of the proposed metric.
Žiga Emeršič, Dejan Štepec, V. Štruc, P. Peer, Anjith George, Adil Ahmad, E. Omar, T. Boult, Reza Safdari, Yuxiang Zhou, S. Zafeiriou, Dogucan Yaman, Fevziye Irem Eyiokur, H. K. Ekenel. The unconstrained ear recognition challenge. 2017 IEEE International Joint Conference on Biometrics (IJCB), 2017. doi:10.1109/BTAS.2017.8272761 | URL
Abstract: In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.

2016

Anjith George, A. Routray. Real-time eye gaze direction classification using convolutional neural network. International Conference on Signal Processing and Communications, 2016. doi:10.1109/SPCOM.2016.7746701 | URL
Abstract: Estimation of eye gaze direction is useful in various human-computer interaction tasks. Knowledge of gaze direction can give valuable information regarding a user's point of attention. Certain patterns of eye movements known as eye accessing cues are reported to be related to the cognitive processes in the human brain. We propose a real-time framework for the classification of eye gaze direction and estimation of eye accessing cues. In the first stage, the algorithm detects faces using a modified version of the Viola-Jones algorithm. A rough eye region is obtained using geometric relations and facial landmarks. The eye region obtained is used in the subsequent stage to classify the eye gaze direction. A convolutional neural network is employed in this work for the classification of eye gaze direction. The proposed algorithm was tested on the Eye Chimera database and found to outperform state-of-the-art methods. The computational complexity of the algorithm is very low in the testing phase. The algorithm achieved an average frame rate of 24 fps in a desktop environment.
Anjith George, A. Routray. Fast and accurate algorithm for eye localisation for gaze tracking in low-resolution images. IET Computer Vision, 2016. doi:10.1049/iet-cvi.2015.0316 | URL
Abstract: Iris centre (IC) localisation in low-resolution visible images is a challenging problem in the computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This study proposes an efficient method for determining the IC in low-resolution images in the visible spectrum. Even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm is proposed for IC localisation. The proposed method uses the geometrical characteristics of the eye. In the first stage, a fast convolution-based approach is used for obtaining the coarse location of the IC. The IC location is further refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated on public databases such as BioID and Gi4E and is found to outperform the state-of-the-art methods.
A. Morales, Julian Fierrez, M. Gomez-Barrero, J. Ortega-Garcia, Roberto Daza, John V. Monaco, J. Filho, J. Canuto, Anjith George. KBOC: Keystroke biometrics OnGoing competition. 2016 IEEE 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), 2016. doi:10.1109/BTAS.2016.7791180 | URL
Abstract: This paper presents the first Keystroke Biometrics Ongoing evaluation platform and a Competition (KBOC) organized to promote reproducible research and establish a baseline in person authentication using keystroke biometrics. The ongoing evaluation tool has been developed using the BEAT platform and includes keystroke sequences (fixed-text) from 300 users acquired in 4 different sessions. In addition, the results of a parallel offline competition based on the same data and evaluation protocol are presented. The results reported have achieved EERs as low as 5.32%, which represent a challenging baseline for keystroke recognition technologies to be evaluated on the new publicly available KBOC benchmark.
Anwesha Sengupta, Anjith George, A. Dasgupta, Aritra Chaudhuri, Bibek Kabi, A. Routray. Alertness Monitoring System for Vehicle Drivers using Physiological Signals. 2016. doi:10.4018/978-1-5225-0084-1.CH013 | URL
Abstract: The present chapter deals with the development of a robust real-time embedded system which can detect the level of drowsiness in automotive and locomotive drivers based on ocular images and speech signals of the driver. The system has been cross-validated using Electroencephalogram (EEG) as well as psychomotor response tests. A ratio based on eyelid closure rates, called PERcentage of eyelid CLOSure (PERCLOS), computed using Principal Component Analysis (PCA) and a Support Vector Machine (SVM), is employed to determine the state of drowsiness. Besides, the voiced-to-unvoiced speech ratio has also been used. Source localization and synchronization of EEG signals have been employed for the detection of various brain states during different stages of fatigue and for cross-validating the algorithms based on image and speech data. The synchronization has been represented in terms of a complex network, and the parameters of the network have been used to trace the change in fatigue of sleep-deprived subjects. In addition, subjective feedback has also been obtained.
Anjith George, A. Routray. A score level fusion method for eye movement biometrics. Pattern Recognition Letters, 2016. doi:10.1016/j.patrec.2015.11.020 | URL
Abstract: No abstract available.

2015

A. Dasgupta, Anjith George, S. Happy, A. Routray. A Vision Based System for Monitoring the Loss of Attention in Automotive Drivers. arXiv.org, 2015. DOI not available | URL
Abstract: In this paper, a real-time vision-based system is proposed to monitor driver fatigue. The system is built on the Raspberry Pi, using the Raspbian operating system and the OpenCV library for computer vision; it is programmed in C++, with Python used for GPIO programming of the Raspberry Pi development board. The facial features are detected by a Haar cascade classifier based object detection algorithm. The eye regions are detected using functions in the OpenCV library and tracked using a template matching method. Vision-based driver fatigue detection is a natural, non-intrusive, and convenient technique to monitor a driver's vigilance. This work studies driver drowsiness detection on the OpenCV computer vision platform, which is open source and was originally developed by Intel.
S. Happy, Anjith George, A. Routray. A real time facial expression classification system using Local Binary Patterns. International Conference on Intelligent Human Computer Interaction, 2015. doi:10.1109/IHCI.2012.6481802 | URL
Abstract: Facial expression analysis is one of the popular fields of research in human computer interaction (HCI). It has several applications in next generation user interfaces, human emotion analysis, and behavior and cognitive modeling. In this paper, a facial expression classification algorithm is proposed which uses a Haar classifier for face detection, Local Binary Patterns (LBP) histograms of different block sizes of a face image as feature vectors, and classifies various facial expressions using Principal Component Analysis (PCA). The algorithm is implemented in real time for expression classification since its computational complexity is small. A customizable approach is proposed for facial expression analysis, since the various expressions and the intensity of expressions vary from person to person. The system uses grayscale frontal face images of a person to classify six basic emotions, namely happiness, sadness, disgust, fear, surprise, and anger.
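The feature extraction step can be sketched with scikit-image; the block grid and LBP parameters are illustrative choices.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_block_histogram(gray_face: np.ndarray, grid=(4, 4), n_points=8, radius=1):
        """Compute a uniform-LBP image, split it into blocks, and concatenate the
        per-block histograms into one feature vector (e.g., for a PCA classifier)."""
        lbp = local_binary_pattern(gray_face, n_points, radius, method="uniform")
        n_bins = n_points + 2                     # distinct uniform-LBP codes
        h_step = gray_face.shape[0] // grid[0]
        w_step = gray_face.shape[1] // grid[1]
        feats = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                block = lbp[i * h_step:(i + 1) * h_step, j * w_step:(j + 1) * w_step]
                hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
                feats.append(hist)
        return np.concatenate(feats)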
A. Dasgupta, Anshit Mandloi, Anjith George, A. Routray. An improved algorithm for eye corner detection. International Conference on Signal Processing and Communications, 2015. doi:10.1109/SPCOM.2016.7746627 | URL
Abstract: In this paper, a modified algorithm for the detection of nasal and temporal eye corners is presented. The algorithm is a modification of the Santos and Proença method. In the first step, we detect the face and the eyes using classifiers based on Haar-like features. We then segment out the sclera from the detected eye region. From the segmented sclera, we segment out an approximate eyelid contour. Eye corner candidates are obtained using the Harris and Stephens corner detector. We introduce a post-pruning of the eye corner candidates to finally locate the eye corners. The algorithm has been tested on the Yale and JAFFE databases as well as on our own database.
A. Dasgupta, Anjith George, S. Happy, A. Routray. An On-board Video Database of Human Drivers. arXiv.org, 2015. DOI not available | URL
Abstract: Detection of fatigue due to drowsiness or loss of attention in human drivers is an evolving area of research. Several algorithms have been implemented to detect the level of fatigue in human drivers by capturing videos of facial image sequences and extracting facial features such as eye closure rates, eye gaze, head nodding, blink frequency, etc. However, the availability of standard video databases to validate such algorithms is insufficient. This paper discusses the creation of such a database under on-board conditions during the day as well as at night. Passive Near Infra-Red (NIR) illumination has been used for illuminating the face during night driving, since prolonged exposure to active Infra-Red lighting may lead to many health issues. The database contains videos of 30 subjects under actual driving conditions. Variation is ensured, as the database contains different head orientations, facial expressions, facial occlusions, and illumination variation. This new database can be a very valuable resource for the development and evaluation of algorithms for video-based detection of driver fatigue.
Anjith George, A. Routray. Design and Implementation of Real-time Algorithms for Eye Tracking and PERCLOS Measurement for on board Estimation of Alertness of Drivers. arXiv.org, 2015. DOI not available | URL
Abstract: The alertness level of drivers can be estimated with the use of computer vision based methods. The level of fatigue can be found from the value of PERCLOS, the ratio of closed-eye frames to the total frames processed. The main objective of the thesis is the design and implementation of real-time algorithms for the measurement of PERCLOS. In this work we have developed a real-time system which is able to process the video on board and to raise an alarm when the driver is drowsy. For accurate estimation of PERCLOS, the frame rate should be greater than 4 fps and the accuracy should be greater than 90%. For daytime eye detection we have used two approaches: a Haar classifier based method and a Principal Component Analysis (PCA) based method. During nighttime, active Near Infra-Red (NIR) illumination is used, and a Local Binary Pattern (LBP) histogram based method is used for eye detection. The accuracy of the algorithms was found to be more than 90% at frame rates above 5 fps, which is suitable for the application.
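PERCLOS itself is a simple ratio, and maintaining it over a sliding window of processed frames can be sketched in a few lines. The eye-state classifier is assumed to exist and is not shown; the window length and alarm threshold below are illustrative.

# Minimal sketch: PERCLOS as the fraction of closed-eye frames in a sliding window.
from collections import deque

class Perclos:
    """Maintain PERCLOS over the last `window` processed frames."""
    def __init__(self, window=900):  # e.g. roughly 3 minutes at 5 fps
        self.states = deque(maxlen=window)

    def update(self, eyes_closed: bool) -> float:
        self.states.append(eyes_closed)
        return sum(self.states) / len(self.states)

# Usage: raise an alarm when PERCLOS exceeds a fatigue threshold.
# if perclos.update(eye_closed_in_frame) > 0.15: raise_alarm()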
Anjith George, A. Dasgupta, A. Routray. A Framework for Fast Face and Eye Detection. arXiv.org, 2015. DOI not available | URL
Abstract: Face detection is an essential step in many computer vision applications like surveillance, tracking, medical analysis, and facial expression analysis. Several approaches have been proposed for face detection. Among them, the Haar-like features based method is robust, yet it has some limitations. With some simple modifications to the algorithm, however, its performance can be made faster and more robust. The present work speeds up the original algorithm by down-sampling the frames and analyzes the effect of different scale factors. It also discusses the detection of tilted faces using an affine transformation of the input image.
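The two modifications described, down-sampling before detection and affine rotation for tilted faces, can be sketched as follows with OpenCV; the down-sampling factor and rotation angles are illustrative assumptions.

# Minimal sketch: cascade detection on down-sampled and rotated frames.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_downsampled(gray, factor=2):
    """Run the cascade on a down-sampled frame and scale detections back up."""
    small = cv2.resize(gray, (gray.shape[1] // factor, gray.shape[0] // factor))
    return [(x * factor, y * factor, w * factor, h * factor)
            for (x, y, w, h) in face_cascade.detectMultiScale(small)]

def detect_tilted(gray, angles=(-30, -15, 15, 30)):
    """Rotate the frame with an affine transform so tilted faces become upright."""
    h, w = gray.shape
    for angle in angles:
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        rotated = cv2.warpAffine(gray, M, (w, h))
        faces = face_cascade.detectMultiScale(rotated)
        if len(faces) > 0:
            return angle, faces  # detections are in the rotated frame's coordinates
    return None, []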
A. Dasgupta, Bibek Kabi, Anjith George, S. Happy, A. Routray. A Drowsiness Detection Scheme Based on Fusion of Voice and Vision Cues. arXiv.org, 2015. DOI not available | URL
Abstract: Drowsiness level detection of an individual is very important in many safety-critical applications such as driving. There are several invasive and contact-based methods, such as the use of blood biochemistry or brain signals, which can estimate the level of drowsiness very accurately. However, these methods are difficult to implement in practical scenarios, as they cause discomfort to the user. This paper presents a combined voice- and vision-based drowsiness detection system well suited to detecting the drowsiness level of an automotive driver. Being non-contact methods, vision- and voice-based detection have the advantage of practical feasibility. These methods have been cross-validated using brain signals.

2013

A. Dasgupta, Anjith George, S. Happy, A. Routray, Tara Shanker. An on-board vision based system for drowsiness detection in automotive drivers. International Journal of Advances in Engineering Sciences and Applied Mathematics, 2013. doi:10.1007/s12572-013-0086-2 | URL
Abstract: No abstract available.
A. Dasgupta, Anjith George, S. Happy, A. Routray. A Vision-Based System for Monitoring the Loss of Attention in Automotive Drivers. IEEE transactions on intelligent transportation systems (Print), 2013. doi:10.1109/TITS.2013.2271052 | URL
Abstract: Onboard monitoring of the alertness level of an automotive driver has been a challenging research problem in transportation safety and management. In this paper, we propose a robust real-time embedded platform to monitor the loss of attention of the driver during day and night driving conditions. The percentage of eye closure has been used to indicate the alertness level. In this approach, the face is detected using Haar-like features and is tracked using a Kalman filter. The eyes are detected using principal component analysis during daytime and using block local-binary-pattern features during nighttime. Finally, the eye state is classified as open or closed using support vector machines. In-plane and off-plane rotations of the driver's face have been compensated using affine transformation and perspective transformation, respectively. Compensation for illumination variation is carried out using bi-histogram equalization. The algorithm has been cross-validated using brain signals and, finally, has been implemented on a single-board computer that has an Intel Atom processor with a 1.66-GHz clock, 1 GB of random access memory, x86 architecture, and a Windows Embedded XP operating system. The system is found to be robust under actual driving conditions.
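Of the pipeline components listed, the bi-histogram equalization step is compact enough to sketch: the histogram is split at the mean intensity and each half is equalized into its own output sub-range, which compensates illumination while roughly preserving mean brightness. The implementation below is a generic sketch of this technique, not the paper's exact code.

# Minimal sketch: bi-histogram equalization of a uint8 grayscale image.
import numpy as np

def bbhe(gray):
    """Equalize the [0, mean] and (mean, 255] halves of the histogram separately."""
    m = int(gray.mean())
    out = np.empty_like(gray)
    for mask, lo, hi in ((gray <= m, 0, m), (gray > m, m + 1, 255)):
        vals = gray[mask]
        if vals.size == 0:
            continue
        hist, _ = np.histogram(vals, bins=hi - lo + 1, range=(lo, hi + 1))
        cdf = np.cumsum(hist) / vals.size
        lut = (lo + cdf * (hi - lo)).astype(np.uint8)  # map each half's CDF to its sub-range
        out[mask] = lut[vals - lo]
    return out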

2012

S. Happy, A. Dasgupta, Anjith George, A. Routray. A video database of human faces under near Infra-Red illumination for human computer interaction applications. International Conference on Intelligent Human Computer Interaction, 2012. doi:10.1109/IHCI.2012.6481868 | URL
Abstract: Human Computer Interaction (HCI) is an evolving area of research for coherent communication between computers and human beings. Some of the important applications of HCI reported in the literature are face detection, face pose estimation, face tracking and eye gaze estimation. Development of algorithms for these applications is an active field of research. However, standard databases to validate such algorithms are scarce. This paper discusses the creation of such a database, created under Near Infra-Red (NIR) illumination. NIR illumination has gained popularity for night-mode applications, since prolonged exposure to active Infra-Red (IR) lighting may lead to health issues. The database contains NIR videos of 60 subjects in different head orientations and with different facial expressions, facial occlusions and illumination variation. This new database can be a very valuable resource for the development and evaluation of algorithms for face detection, eye detection, head tracking, eye gaze tracking, etc. under NIR lighting.