Impact of the CIFAR Datasets on the development of deep learning

Nicolas Sacchetti 2024-05-13

On May 23, 2023, 4POINT0 was pleased to welcome Aldo Geuna, professor in the department of social sciences at the University of Turin and CIFAR Fellow, for the Anchor 4POINT0 webinar « The Making of a New Technoscience: The Impact of the CIFAR on databases ».


In this webinar, Aldo Geuna takes us through the project he carried out with his academic colleagues Daniel Souza and Jeff Rodrigues to assess the 2009 CIFAR databases (named after the Canadian Institute for Advanced Research), highlighting their significant contribution to the development of deep learning.

The team employed a mixed-methodology approach, incorporating interviews, surveys, and both bibliometric and econometric analyses.

The CIFAR datasets, a noteworthy Canadian contribution to the field developed by Alex Krizhevsky and his colleagues at the University of Toronto, are renowned in the computer vision arena for training and testing machine learning algorithms, particularly convolutional neural networks (CNNs). They come in two principal versions, CIFAR-10 and CIFAR-100, with their significance notably highlighted in « Learning Multiple Layers of Features from Tiny Images » by Alex Krizhevsky (2009).
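For readers who have never opened the files, a minimal Python sketch of how a single CIFAR-10 image is stored may help. It assumes the layout documented for the Python version of the dataset: each 32x32 color image is one flat row of 3,072 byte values, with the 1,024 red values first, then the green, then the blue, each color plane stored row by row. The helper name `pixel_at` and the synthetic all-red row are illustrative only; they are not part of the dataset or of the study discussed here.

```python
# CIFAR-10 images are 32x32 pixels with 3 color channels (red, green, blue).
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
PLANE = HEIGHT * WIDTH            # 1,024 values per color plane
ROW_LENGTH = PLANE * CHANNELS     # 3,072 values per image


def pixel_at(row, y, x):
    """Return the (r, g, b) values at pixel (y, x) of one flat image row.

    Assumes the row stores the full red plane, then green, then blue,
    each plane in row-major order (hypothetical helper for illustration).
    """
    if len(row) != ROW_LENGTH:
        raise ValueError(f"expected {ROW_LENGTH} values, got {len(row)}")
    offset = y * WIDTH + x
    return tuple(row[channel * PLANE + offset] for channel in range(CHANNELS))


# Synthetic stand-in for a real image: red plane at 255, green and blue at 0.
fake_row = bytes([255]) * PLANE + bytes(2 * PLANE)

print(pixel_at(fake_row, 0, 0))    # (255, 0, 0)
print(pixel_at(fake_row, 31, 31))  # (255, 0, 0)
```

In practice a numerical library would reshape the whole row at once; the plain indexing above is only meant to make the storage order visible.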

The research findings underscored CIFAR-10’s crucial role in the foundational developments leading to the deep learning revolution. Contrary to the researchers’ initial expectations, the dataset continues to be highly relevant in scientific research and plays a key role in technological advancements.

Furthermore, the study revealed an unexpected aspect: CIFAR-10’s significant impact on education in the fields of deep learning, machine learning, and computer vision, underscoring its value as a teaching resource.

Geoffrey E. Hinton


Professor Geuna mentioned Geoffrey Hinton as a key inspiration for conducting this study. Often hailed as the godfather of deep learning for his pivotal contributions to AI, particularly in developing deep learning techniques, Hinton has voiced concerns over the potential for AI to become overly powerful and uncontrollable. He warns of the ethical, security, and existential risks such advancements could pose to humanity.

How did CIFAR datasets contribute to the development of deep learning?

This specific research question is at the forefront of the study. « Without deep learning we wouldn’t be here, » says Professor Geuna, adding that deep learning is now the dominant paradigm in AI.

He also puts forward that CIFAR datasets have been crucial in advancing research and innovation in deep learning, partly due to their availability as an open-access resource. Their use has enabled the testing and improvement of various deep learning techniques and models, thus significantly contributing to the evolution of this field.

Results of the Interviews

Aldo Geuna presents an overview of the study’s interview findings, highlighting three key insights. The first major point is the concept of « Bridging the Gaps. » Geuna explains how the CIFAR-10 dataset (University of Toronto, 2009) not only progressed beyond the MNIST dataset (AT&T Bell Laboratories, 1998) in terms of data complexity and diversity, but also remained more manageable than ImageNet (Princeton University, 2009). This balance positioned CIFAR-10 as a crucial intermediary in the evolution of deep learning, with its specific attributes playing a significant role.

Secondly, the simplicity of CIFAR-10 is emphasized. Geuna points out that the dataset’s straightforward design facilitated the testing and iteration of various neural network architectures without the need for excessive computational resources.
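To give a rough sense of the scale behind this middle position, the sketch below compares the raw input size of the three datasets. The MNIST and CIFAR-10 figures are the published ones (70,000 grayscale 28x28 images and 60,000 color 32x32 images). The ImageNet row rests on assumptions that do not come from the webinar: it uses the ILSVRC-2012 training subset (1,281,167 images) and the common practice of cropping images to 224x224, since ImageNet images actually vary in size. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageDataset:
    name: str
    height: int
    width: int
    channels: int
    num_images: int

    @property
    def values_per_image(self) -> int:
        """Number of raw pixel values a model sees for one image."""
        return self.height * self.width * self.channels

    @property
    def raw_megabytes(self) -> float:
        """Uncompressed size at one byte per pixel value."""
        return self.values_per_image * self.num_images / 1e6


DATASETS = [
    ImageDataset("MNIST", 28, 28, 1, 70_000),
    ImageDataset("CIFAR-10", 32, 32, 3, 60_000),
    # Assumption: ILSVRC-2012 training subset, images cropped to 224x224.
    ImageDataset("ImageNet (assumed 224x224 crops)", 224, 224, 3, 1_281_167),
]

for dataset in DATASETS:
    print(
        f"{dataset.name}: {dataset.values_per_image:,} values per image, "
        f"about {dataset.raw_megabytes:,.0f} MB uncompressed"
    )
```

Under these assumptions a CIFAR-10 image carries roughly four times as many values as an MNIST digit, while an ImageNet crop carries about 49 times as many as a CIFAR-10 image, which is consistent with the point that CIFAR-10 could be explored without heavy computational resources.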

Lastly, Geuna underscores the pedagogical significance of CIFAR-10, attributing its persistent relevance and utility in deep learning to its educational value. The dataset’s success and its continued role as a foundational tool in the field attest to its importance. He explains that training has become an essential mechanism by which this specific dataset has risen to prominence, enabling users to effectively learn how to navigate and utilize such structures.

Survey Results

[Figure: Importance of CIFAR databases (CIFAR-10 and CIFAR-100) for the development of deep learning and computer vision]

[Figure: Comparing CIFAR-10 and CIFAR-100 to other databases]

[Figure: Pedagogical Perspectives of CIFAR-10]

Summary Survey Results

To summarize the survey results of his study, Aldo Geuna emphasizes the CIFAR databases’ significant contribution to the fields of deep learning and computer vision. According to his survey, 76% of respondents believe the CIFAR databases have been important for these fields’ development, with over 40% considering them extremely important.

Geuna further discusses the importance of open access and data characteristics. Notably, 90% of participants identified data availability as a crucial factor in choosing the CIFAR-10 database. Furthermore, 87% highlighted its comparability (benchmarking), while the quality of labeling and the number of images were also deemed important by 72% and 66%, respectively.

Regarding the pedagogical aspect, the relevance of CIFAR databases for training is underscored. The data indicate that these databases are widely used in graduate school education and continue to play a significant role in teaching, as evidenced by responses to 190 open and closed questions.

Summary Econometric Analysis

Discussing the scientific impact of his study, Aldo Geuna notes the significant influence of CIFAR-10, particularly in its early years. The dataset received 226% (more than double) the number of citations on average compared to other datasets. The research team also found evidence, though weaker, of its ongoing influence, which is not as pronounced as that of ImageNet (highlighted by the ImageNet competition and the resolution of scientific problems in 2014 serving as benchmarks).
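As a small worked illustration of what the 226% figure means, the sketch below reads it as a ratio of 2.26 to the comparison group, which matches the « more than double » gloss. The baseline of 50 citations is purely hypothetical and chosen only to make the arithmetic visible; the study's actual citation counts are not given in this summary.

```python
def scale_by_percent(baseline: float, percent_of_baseline: float) -> float:
    """Return the value that is `percent_of_baseline` percent of `baseline`."""
    return baseline * percent_of_baseline / 100.0


# Hypothetical average citation count for comparable datasets (not from the study).
baseline_citations = 50.0

cifar10_citations = scale_by_percent(baseline_citations, 226.0)
ratio = cifar10_citations / baseline_citations
percent_increase = (ratio - 1.0) * 100.0

print(cifar10_citations)          # 113.0
print(round(ratio, 2))            # 2.26
print(round(percent_increase))    # 126
```

Read this way, 226% of the baseline corresponds to 126% more citations than the comparison group, not 226% more.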

Regarding the technological impact, Geuna emphasizes that CIFAR-10’s impact surpasses that of ImageNet and remains pertinent in recent years. This sustained relevance is primarily attributed to citations of the initial seminal papers that have significantly contributed to the deep learning revolution.

Direct and Indirect Effect

The research team discovered that CIFAR-10 played a crucial role in the developments leading to the deep learning revolution and continues to influence the field’s trajectory. The direct effect was especially significant during the initial years of deep learning and computer vision techniques’ evolution. Furthermore, the dataset has had a notable impact on technological development throughout the decade under review.

As for the indirect effect, Aldo Geuna notes that CIFAR-10 has been, and remains, extensively utilized in the training of computer scientists specializing in deep learning and machine learning. This widespread use in education underlines the dataset’s lasting importance beyond its initial technological contributions.

This content was updated on 2024-05-13 at 14:02.