Tools to Undertake Data Analysis

by Nicolas Sacchetti

"Transforming new scientific discoveries into concrete applications for the benefit of society as a whole" is what IVADO, a Québec-based institute in the field of digital intelligence (DI), is dedicated to. Jean-François Connolly is an IVADO advisor in DI. He presents some tools for undertaking data analysis projects in the humanities and social sciences.

The talk was given by videoconference during the P4IE Conference on Policies, Processes and Practices for Performance of Innovation Ecosystems, held May 11 to 13, 2021 and presented by the 4POINT0 Partnership for Organisation of Innovation and New Technologies.

Jean-François Connolly, Digital Intelligence Advisor at IVADO

DI is not artificial intelligence (AI). IVADO defines DI as: "A set of tools and methodologies that combine data collection and exploitation with the design and use of models and algorithms to facilitate, enrich and support decision-making."

Understanding: Business Intelligence

The goal is to understand the firm's data: to understand the world and put it in context using the data that describes it. For researchers, a dashboard makes it possible to see variations and to follow a situation in real time.

 

https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html

Data can be grouped into clusters, each identified by a different colour. Dimensionality reduction consists of identifying redundant information and removing it; in the virtual universe, the number of dimensions runs into the hundreds or more. To analyze clusters of data, J-F Connolly suggests principal component analysis (PCA) software that works with Microsoft Excel.
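To make the idea of removing redundant information concrete, here is a minimal PCA sketch in Python. It is not the Excel-based tool mentioned in the talk. It assumes NumPy is installed, and the function name `pca` and the generated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def pca(data, n_components=2):
    """Project the rows of `data` onto their top principal components."""
    centered = data - data.mean(axis=0)  # PCA works on zero-mean columns
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Hypothetical data: 200 points in 5 dimensions, where the last three columns
# are almost exact combinations of the first two (redundant information).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 3))
data = np.hstack([base, base @ mix + 0.01 * rng.normal(size=(200, 3))])

projected, explained = pca(data, n_components=2)
print(projected.shape)   # (200, 2)
print(explained.sum())   # share of the variance kept by two components
```

Because three of the five columns carry almost no new information, two components should keep nearly all of the variance in this made-up example.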

 

https://arxiv.org/pdf/1804.00079.pdf

In addition, he mentions the t-SNE algorithm (t-distributed stochastic neighbor embedding). "This is a nonlinear method for representing a set of points from a high-dimensional space in a two- or three-dimensional space. The data can then be visualized with a point cloud." An open-source implementation is available in scikit-learn (scikit-learn.org).
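As a rough illustration, the scikit-learn implementation can be called as follows. This sketch assumes scikit-learn and NumPy are installed. The data (three groups of points in 50 dimensions) and the perplexity value are hypothetical choices, not settings from the talk.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical data: three groups of 30 points each in a 50-dimensional space.
centers = rng.normal(scale=10.0, size=(3, 50))
points = np.vstack([c + rng.normal(size=(30, 50)) for c in centers])

# perplexity must be smaller than the number of points (90 here).
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(points)
print(embedding.shape)   # (90, 2)
```

Each row of `embedding` is a two-dimensional position that can be drawn as a point cloud with any plotting tool.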

Nelson's 8 Rules

Mr. Connolly invites us to remember that simple rules, and the analysis that can be drawn from them, do not require AI. He cites Lloyd S. Nelson, who updated the Western Electric rules (WECO rules) in 1984. The rules are intended to detect out-of-control conditions. "These rules do a very good job of tracking industrial operations," says Connolly. (source: Wikipedia)

Rule 1: A point is more than 3 standard deviations from the mean. It is out of control.
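A minimal sketch of Rule 1 in Python, assuming the mean and standard deviation are estimated from a baseline period that is known to be in control. The function name `rule1` and all the numbers are hypothetical.

```python
from statistics import mean, stdev

def rule1(values, center, sigma):
    """Return the indexes of points more than 3 standard deviations from the mean."""
    return [i for i, v in enumerate(values) if abs(v - center) > 3 * sigma]

# Hypothetical in-control baseline used to estimate the mean and the spread.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
center = mean(baseline)   # about 10.0
sigma = stdev(baseline)   # about 0.2

readings = [10.1, 9.9, 10.9, 10.0, 9.3]
print(rule1(readings, center, sigma))   # [2, 4]
```

Estimating the limits on a separate baseline is a design choice: an extreme point included in the estimate would inflate the standard deviation and could hide itself.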

 

 

 

Rule 2: Nine or more points in a row are on the same side of the mean. Some extended bias exists.
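Rule 2 can be sketched as a simple run counter. The function name `rule2` and the sample values are hypothetical, and the mean is assumed to be known (0 here).

```python
def rule2(values, center, run_length=9):
    """Return the indexes of points that complete a run of at least
    `run_length` consecutive points on the same side of the mean."""
    flagged = []
    run, side = 0, 0
    for i, v in enumerate(values):
        current = (v > center) - (v < center)  # +1 above, -1 below, 0 on the mean
        if current != 0 and current == side:
            run += 1
        else:
            run = 1 if current != 0 else 0     # a point on the mean breaks the run
        side = current
        if run >= run_length:
            flagged.append(i)
    return flagged

values = [1, 2, 1, 3, 2, 1, 2, 1, 1, 2, -1]
print(rule2(values, center=0))   # [8, 9]
```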

 

 

 

Rule 3: Six or more points in a row are continuously increasing (or decreasing), describing a trend.
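One possible way to check Rule 3 is to count consecutive moves in the same direction: six points in a trend correspond to five successive increases (or decreases). The function name `rule3` and the data are hypothetical.

```python
def rule3(values, run_length=6):
    """Return the indexes of points that complete a run of at least
    `run_length` points that keep increasing or keep decreasing."""
    flagged = []
    steps, direction = 0, 0   # steps = consecutive moves in the same direction
    for i in range(1, len(values)):
        current = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if current != 0 and current == direction:
            steps += 1
        else:
            steps = 1 if current != 0 else 0   # a flat step breaks the trend
        direction = current
        if steps >= run_length - 1:            # n points in a row means n - 1 steps
            flagged.append(i)
    return flagged

values = [5, 4, 5, 6, 7, 8, 9, 10, 9]
print(rule3(values))   # [6, 7]
```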

 

 

 

Rule 4: Fourteen or more points in a row alternate in direction. Such oscillation is beyond noise. The rule only concerns directionality. The position of the mean and the size of the standard deviation are irrelevant.
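The sketch below shows one reading of Rule 4: fourteen points alternate when each of the thirteen moves between them reverses the previous one. That interpretation, the function name `rule4` and the data are assumptions. As the rule states, neither the mean nor the standard deviation is needed.

```python
def rule4(values, run_length=14):
    """Return the indexes of points that complete a run of at least
    `run_length` points alternating up and down."""
    flagged = []
    steps, previous = 0, 0    # steps = consecutive moves that each reverse the last
    for i in range(1, len(values)):
        current = (values[i] > values[i - 1]) - (values[i] < values[i - 1])
        if current == 0:
            steps = 0                 # a flat step breaks the oscillation
        elif current == -previous:
            steps += 1                # direction reversed: oscillation continues
        else:
            steps = 1                 # same direction twice: start counting again
        previous = current
        if steps >= run_length - 1:   # n points in a row means n - 1 steps
            flagged.append(i)
    return flagged

zigzag = [0, 1] * 7            # 14 points going up, down, up, ...
print(rule4(zigzag))           # [13]
print(rule4(list(range(14))))  # []
```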

 

 

Rule 5: Two (or three) out of three points in a row are more than 2 standard deviations from the mean in the same direction. There is a moderate tendency for the samples to be out of control. The side of the mean for the third point is unspecified.

 

 

Rule 6: Four (or five) out of five points in a row are more than one standard deviation from the mean in the same direction. There is a strong tendency for samples to be slightly out of control. The side of the mean for the fifth point is unspecified.
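Rules 5 and 6 share the same shape: at least k of n consecutive points lie beyond a given number of standard deviations, on the same side of the mean. A single helper can therefore cover both. The names `k_of_n_beyond`, `rule5` and `rule6` are hypothetical, and the readings are assumed to be standardized (mean 0, standard deviation 1).

```python
def k_of_n_beyond(values, center, sigma, k, n, threshold):
    """Return the indexes that end a window of n points in which at least k lie
    more than `threshold` standard deviations from the mean on the same side."""
    flagged = []
    for end in range(n - 1, len(values)):
        window = values[end - n + 1 : end + 1]
        above = sum(v > center + threshold * sigma for v in window)
        below = sum(v < center - threshold * sigma for v in window)
        if above >= k or below >= k:
            flagged.append(end)
    return flagged

def rule5(values, center, sigma):
    # Two (or three) out of three points beyond 2 standard deviations.
    return k_of_n_beyond(values, center, sigma, k=2, n=3, threshold=2)

def rule6(values, center, sigma):
    # Four (or five) out of five points beyond 1 standard deviation.
    return k_of_n_beyond(values, center, sigma, k=4, n=5, threshold=1)

readings = [0.2, 2.5, -0.3, 2.1, 0.1, 1.2, 1.5, 1.1, 1.3]
print(rule5(readings, center=0, sigma=1))   # [3]
print(rule6(readings, center=0, sigma=1))   # [7, 8]
```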

 

 

Rule 7: Fifteen points in a row are all within one standard deviation of the mean, on either side of it. Such a narrow spread is suspicious: greater variation would normally be expected.
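A short sketch of Rule 7, again with hypothetical names and standardized data (mean 0, standard deviation 1):

```python
def rule7(values, center, sigma, run_length=15):
    """Return the indexes of points that complete a run of at least
    `run_length` points all within 1 standard deviation of the mean."""
    flagged = []
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if abs(v - center) < sigma else 0
        if run >= run_length:
            flagged.append(i)
    return flagged

quiet = [0.1, -0.2, 0.3, -0.1, 0.0] * 3       # 15 points hugging the mean
print(rule7(quiet, center=0, sigma=1))        # [14]

# One wider point in the middle restarts the count, so nothing is flagged.
interrupted = quiet[:7] + [1.5] + quiet[7:]
print(rule7(interrupted, center=0, sigma=1))  # []
```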

 

 

Rule 8: Eight points in a row are all more than one standard deviation from the mean, with points on both sides of it. Jumping from above the mean to below it while skipping the first standard deviation band is rarely random.
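Rule 8 combines two conditions, which the following sketch checks on a sliding window of eight points. The function name `rule8` and the data are hypothetical, with the mean assumed to be 0 and the standard deviation 1.

```python
def rule8(values, center, sigma, run_length=8):
    """Return the indexes that end a window of `run_length` points where none is
    within 1 standard deviation of the mean and both sides are represented."""
    flagged = []
    for end in range(run_length - 1, len(values)):
        window = values[end - run_length + 1 : end + 1]
        all_outside = all(abs(v - center) > sigma for v in window)
        both_sides = (any(v > center for v in window)
                      and any(v < center for v in window))
        if all_outside and both_sides:
            flagged.append(end)
    return flagged

jumpy = [1.5, -1.4, 1.6, 1.3, -1.7, -1.2, 1.8, -1.5]
print(rule8(jumpy, center=0, sigma=1))       # [7]

# Eight distant points on one side only do not trigger this rule.
print(rule8([1.5] * 8, center=0, sigma=1))   # []
```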

 

Predicting: Machine Learning

The goal is to make predictions based on probabilities. "While people talk about AI, it's really about machine learning," says Connolly. Not all data are equal. Labelled data is more valuable, but also more expensive. Labelling identifies what each piece of data represents, so that DI can classify it, or "do a reduction of the data to represent a phenomenon by a simplifying law," that is, a mathematical regression.

https://en.wikipedia.org/wiki/Linear_regression

Linear regression is one of the basic mathematical models used in machine learning. It seeks to establish a straight-line relationship between an explained variable, y, and one or more explanatory variables, x. It is then possible to predict unknown values of y from known values of x.
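For a single explanatory variable, the straight line can be fitted by ordinary least squares in a few lines of Python. This is a minimal sketch. The names `fit_line` and `predict` and the data points are hypothetical, and the points were chosen to lie exactly on y = 1 + 2x so that the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]     # known explanatory values
ys = [3, 5, 7, 9, 11]    # observed explained values
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predict y for a value of x that was not observed."""
    return intercept + slope * x

print(slope, intercept)  # 2.0 1.0
print(predict(6))        # 13.0
```

With real data the points do not fall exactly on a line, and the fitted line is the one that minimizes the sum of squared vertical distances to the points.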

This is what machine learning analytics does: "Analytics: the research technique of analyzing metadata using algorithms, specialized tools, or AI systems for the purpose of obtaining actionable or decision-making information." In other words, it predicts.

When discussing the consolidation of learning, Connolly recommends the documentary AlphaGo, directed by Greg Kohs. "Go is the most complex game ever devised by humans. Beating a professional Go player has been an incredible long-standing challenge for AI research." (from the AlphaGo documentary)

Learning AI is like Maslow's pyramid, with AI at the top. The foundation is knowing how to collect data, then how to move and curate it so that it can be explored and transformed. Next come aggregation and labelling, which involve the dashboard, and finally optimization and deep learning.

To identify opportunities to use machine learning, he suggests taking a process and breaking it down into predictions. He gives the example of a chatbot, a program that attempts to converse with a user. Its process is broken down into tasks: I- Have a conversation. II- Understand the context. III- Describe the situation. IV- Recommend the appropriate resource.

Each task is then formulated as a prediction. The AI Canvas is useful for thinking about how AI could help with business decision-making. It maps what surrounds a prediction: judgment, action, outcome, input, training and feedback. (Harvard Business Review - Ajay Agrawal et al.)
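One way to work with the canvas is to treat it as a simple record with one field per box: prediction, judgment, action, outcome, input, training and feedback. The sketch below fills it in for the chatbot task "Recommend the appropriate resource". The class name `AICanvas` and every entry are hypothetical illustrations, not content from the talk or from the Harvard Business Review article.

```python
from dataclasses import dataclass, fields

@dataclass
class AICanvas:
    prediction: str  # what do we need to know to make the decision?
    judgment: str    # how do we value the different outcomes and errors?
    action: str      # what are we trying to do?
    outcome: str     # how do we measure that the task succeeded?
    input: str       # what data is needed to run the prediction?
    training: str    # what data is needed to train the model?
    feedback: str    # how are outcomes used to improve the model?

# Hypothetical entries for the task "Recommend the appropriate resource".
canvas = AICanvas(
    prediction="Which resource is most likely to help this user?",
    judgment="A wrong referral costs more than asking one more question.",
    action="Show the user one recommended resource.",
    outcome="The user reports that the resource was useful.",
    input="The conversation so far and the described situation.",
    training="Past conversations paired with the resource that helped.",
    feedback="User ratings collected after each recommendation.",
)

for field in fields(canvas):
    print(f"{field.name}: {getattr(canvas, field.name)}")
```

Filling in one such record per task makes it easier to see which tasks have the data needed to support a prediction.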

Readings

Jean-François Connolly recommends these readings on digital intelligence.

This content has been updated on 2022-09-27 at 23 h 38 min.