Sign Languages (SL) are the main medium of communication for the deaf community. However, only deaf people themselves, their families, and interpreters (less than 1% of the total population) learn them. Consequently, improving communication with hearing people is a fundamental need for the deaf community. To this end, this project covers several studies of Deep Learning techniques applied to improving communication and quality of life for deaf people.
This study addresses the challenge of improving communication between the deaf and hearing communities by exploring different Sign Language Recognition (SLR) techniques. Due to privacy issues and the need for validation by interpreters, creating large-scale sign language (SL) datasets is difficult. This issue is addressed by presenting a new Spanish isolated sign language recognition dataset, CALSE-1000, consisting of 5000 videos covering 1000 glosses, with various signers and scenarios. The study also proposes using different computer vision techniques, such as face swapping and affine transformations, to augment the SL dataset and improve the accuracy of an I3D model trained on it.
In this research, a rule-based system called ruLSE is presented to generate synthetic Spanish Sign Language datasets. To test the usefulness of these datasets, experiments were performed with two state-of-the-art transformer-based models, MarianMT and Transformer-STMC.
As a first step, a mobile application named SignUS was developed to collect videos from deaf people, signers, and interpreters in order to obtain a semi-labeled set of videos (the user records a predetermined phrase, but without segmenting each gloss). Although approximately 20 deaf associations were contacted, we did not attract enough users to build a new dataset. However, the application is easily adaptable to recording the deaf person and supplying the video to a translation model that returns the corresponding text.
In addition, a master's degree final project associated with our working group is being carried out in relation to this study.
The project was supported by FEDER/Junta de Andalucía – PAIDI 2020 / Proyecto P20_01213, which consists of the following members:
- Juan Antonio Álvarez García (PI)
- Miguel Ángel Martínez del Amor (research team)
- Fernando Sancho Caparrini (research team)
- Luis Miguel Soria Morillo (research team)
- Álvaro Arcos García (collaborator team)
- Diego Cabrera Mendieta (collaborator team)
- Javier de la Rosa Pérez (collaborator team)
- Marina Perea Trigo (collaborator team)
- Macarena Vilches Suárez (collaborator team)
- José Luis Salazar González (staff)
- José Antonio Rodríguez Gallego (staff)
- José Morera Figueroa (staff)
The research associated with this project was also supported by grants from NVIDIA and used an A100 GPU donated through the NVIDIA Hardware Grant awarded to our colleague Miguel Ángel Martínez del Amor.
We would like to especially thank the following team members for their work:
- Macarena Vilches Suárez
- Celia Botella López
- José Antonio Rodríguez Gallego
- José Morera Figueroa
- José Luis Salazar González