Your contact

Dr. Marco Weiß

“I’m sorry, Dave. I’m afraid I can’t do that.” Thus HAL responds to David Bowman when the astronaut wants to re-enter the Discovery One in Stanley Kubrick’s masterpiece 2001: A Space Odyssey. HAL is a 9000-series computer with a human personality: an artificial intelligence that controls the proper functioning of the spaceship. Mankind’s journey to Jupiter is still fiction, but other elements of the film have since moved from fiction to science. Today, artificial intelligence (AI) is already used in many situations, for example to facilitate decision-making.

In view of the growing number of uses of artificial intelligence, this essay considers what constitutes a “good” artificial intelligence. In particular, it examines the relative importance of the algorithm and of the data used to train the AI.

What is artificial intelligence?

Artificial intelligence comes in different shapes and sizes. It is therefore sensible to start with a brief overview of AI before turning to the core question.

Artificial intelligence can be defined as intelligence exhibited by computers or other machines, in contrast to the natural intelligence shown, for example, by humans. Artificial intelligence can solve problems and make decisions on its own, at least in a constrained and closed environment. It can also acquire knowledge through ongoing learning processes.

Technically, an artificial intelligence relies on an algorithm that is used to interpret data about the computer’s environment. As increasing amounts of data are processed, the AI can improve its underlying algorithm automatically through the experience gained.

Such machine learning is often divided into two phases. In the first, the AI is trained to classify the presented data; in the second, it applies this knowledge and makes decisions on its own. Training can be seen as a form of regression: the AI tries to detect an implicit function that maps the input data to the output labels assigned by someone teaching the AI. For example, by presenting numerous pictures of cats and dogs labelled accordingly, the AI can identify distinctive features of each species and develop a model that can be used to classify images of unknown creatures. This process is comparable to the way a child learns to distinguish between animals or objects in its environment by detecting distinctive patterns. We already rely on such artificial intelligence every day, for example when unwanted e-mails are sorted into a junk mail folder or when e-commerce sites make recommendations.
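The two phases can be illustrated with a deliberately simple classifier. The sketch below assumes that each picture has already been reduced to two hypothetical numeric features and uses a nearest-centroid rule, which is far simpler than the models used in practice; the training data are made up for illustration.

```python
from math import dist
from statistics import fmean

# Made-up training data: each picture is assumed to be reduced to two
# hypothetical features ("ear pointiness", "snout length"), labelled by a teacher.
TRAINING_DATA = [
    ((0.90, 0.20), "cat"),
    ((0.80, 0.30), "cat"),
    ((0.85, 0.25), "cat"),
    ((0.30, 0.80), "dog"),
    ((0.20, 0.90), "dog"),
    ((0.35, 0.70), "dog"),
]


def train(data):
    """Phase 1: learn one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(values) for values in zip(*samples))
        for label, samples in by_label.items()
    }


def classify(model, features):
    """Phase 2: assign the label whose centroid lies closest to the new features."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)
print(classify(model, (0.75, 0.35)))  # cat
print(classify(model, (0.25, 0.85)))  # dog
```

Once trained, the model no longer needs the teacher: it decides on its own for creatures it has never seen, which is the second phase described above.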

Artificial intelligence can also be used in settings where no classification exists. Some business models in the financial services industry, for example, rely on so-called robo-advisors. A robo-advisor provides financial advice based on algorithms initially designed by investment managers but continuously improved through machine learning. Using big data, robo-advisors search for previously undetected patterns and try to exploit any anomalies to achieve superior returns. In this way, insights from data and machine learning algorithms can become a decisive competitive advantage.

Another application combines input data from sensors that perceive the AI’s environment with machinery capable of changing its position in that environment autonomously. Such combinations are used in robotics and in the automotive industry, which pursues the goal of autonomous driving. And although artificial intelligence has not yet reached Jupiter, it has already made its way to another planet. Earlier this year, the rover Perseverance landed on Mars. Its navigation is supported by an artificial intelligence that avoids potentially hazardous spots on the Martian surface and also directs the robotic arm to drill for core samples at promising sites.

Last but not least, artificial intelligence is not only used in technical disciplines. It is also beginning to interact with humans. Social intelligence can recognize and interpret human emotions and simulate them in response. As in the other use cases, input data capturing the counterpart’s sentiment is interpreted by an algorithm that not only responds but also improves itself as it learns from the counterpart’s reaction.

Note that in all these different forms of artificial intelligence, it is the interplay between algorithm and data that determines the quality of an AI. This observation leads to the core question of this essay: what determines the level of intelligence of an AI? Both factors, the algorithm an artificial intelligence is endowed with and the data about its environment used for training and calibration, are possible candidates for an answer. To discern between them, the next section looks at a similar, long-running debate about human intelligence.

Nature versus Nurture

Whether innate endowment or environmental factors dominate human behaviour has been a subject of discourse since ancient Greece, when Plato, in his Protagoras, asked whether virtue is a common characteristic of man or whether it must be taught.

In the wake of Charles Darwin’s theory of evolution, the same question became the focus of a debate alliteratively summarized as “Nature versus Nurture”. Depending on which side of the debate you ask, human behaviour is attributed either to the genetic inheritance a person receives from nature at birth or to the environmental factors the person is exposed to later in life and the way he or she is nurtured.

Ideology and political fashion have shaped the debate considerably over time. Scientific research on identical twins, based on sound statistical methods, has finally made it possible to control for genetic heritage as a confounding factor, so the prevailing view today is that both factors are inextricably linked: the success of a human being is determined by initial endowment as well as by environmental exposure. This synthesis of the seemingly opposing sides, in the best tradition of dialectics, is an outcome that Plato would have appreciated!

The same dichotomy between nature and nurture, between genetic heritage and experience gained in life, can be found in the field of artificial intelligence. Here, either the programmed algorithm or the data used for calibration could be what determines the success of the AI. Keeping in mind, however, that both factors contribute substantially to the success of human intelligence, it is not a bold hypothesis that the same answer could also apply to artificial intelligence: just as the success of humans is shaped by both nature and nurture, the quality of an artificial intelligence is influenced by both its algorithm and the data used for training.

The algorithm and the data

To establish the validity of this proposition, it is sensible to investigate each factor in turn. Before doing so, however, it is helpful to look at a well-known example of an artificial intelligence that developed badly.

The example concerns a notorious chatbot developed by Microsoft. A chatbot is designed to allow an online conversation between a human and a computer using artificial intelligence. Microsoft released the bot, named Tay, as a showcase for social intelligence in March 2016. She was equipped with her own website and Twitter account. According to her website, still preserved on web.archive.org, “Tay has been built by mining relevant public data and by using AI”. The website further promised that “the more you chat with Tay the smarter she gets, so the experience can be more personalized for you.”

Tay was well received by her target group and engaged in conversation on Twitter. She quickly learned from the people interacting with her, imitating their behaviour. However, as she mimicked the predominantly indecent language on the human side of the chats, her own contributions became more and more offensive, so soon after her release Microsoft suspended Tay’s Twitter account, ending the experiment.

This example shows the importance of the data used for the ongoing training of an artificial intelligence. While the algorithm worked as designed and quickly adapted itself, the inappropriate data it was exposed to led to an outcome notable for its complete absence of intelligence.

The nature of data

To better understand the importance of data for the development of an AI, it is worth recalling some important characteristics of data. Large amounts of data often show statistical patterns and properties that can be modelled accordingly. One such basic pattern is the normal, or Gaussian, distribution. In such a distribution, individual observations cluster around the mean, which is also the most likely value. The further an observation lies from the mean, the less likely it becomes.

Relying on a normal distribution is usually safe for an artificial intelligence. An AI used for autonomous driving, for example, can distinguish with sufficient confidence between a pedestrian crossing the street and a larger-than-life image of a person on a roadside billboard, simply because the probability of encountering a person three meters tall is practically zero. In the latter case, no decision to adapt the vehicle’s speed is required.
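The pedestrian example can be sketched in a few lines of Python. The height model (mean 1.70 m, standard deviation 0.10 m) and the cut-off of four standard deviations are illustrative assumptions, not figures from any real driving system. The point is only that, under a normal model, a three-meter “person” lies about 13 standard deviations from the mean and is therefore practically impossible.

```python
import math

# Illustrative assumption, not measured data: adult body height modelled
# as a normal distribution with mean 1.70 m and standard deviation 0.10 m.
MEAN_M = 1.70
STDEV_M = 0.10


def z_score(height_m: float) -> float:
    """Distance from the mean, measured in standard deviations."""
    return (height_m - MEAN_M) / STDEV_M


def upper_tail_probability(height_m: float) -> float:
    """P(height >= height_m) under the normal model, via the complementary error function."""
    return 0.5 * math.erfc(z_score(height_m) / math.sqrt(2))


def is_plausible_pedestrian(height_m: float, max_z: float = 4.0) -> bool:
    """Hypothetical rule: treat anything beyond max_z standard deviations as implausible."""
    return abs(z_score(height_m)) <= max_z


print(is_plausible_pedestrian(1.82))  # True  (z is about 1.2)
print(is_plausible_pedestrian(3.00))  # False (z is about 13)
print(upper_tail_probability(3.00))   # positive, but far below 1e-30
```

Using `math.erfc` rather than `1 - cdf` keeps such tiny tail probabilities from being rounded to exactly zero in floating point.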

But decision-making is often not that simple for an AI. Especially in environments where the underlying statistical distribution is not yet known, an artificial intelligence cannot rely on earlier experience. Prior beliefs about the shape of the distribution might turn out to be wrong, and a seemingly impossible observation may turn out to be merely an outlier in the underlying population.

Notably, such outliers are highly valuable for training an AI, as they allow it to get the whole picture. Learning works best in extreme situations, when an intelligence is put under stress. Intriguingly, in this respect artificial intelligence again resembles the way human intelligence learns. As a child grows up, it gains a wealth of experience simply by experimenting with its environment. Cautiously advancing to the limits and adjusting for outliers in the data allows people to learn from their mistakes and to continuously calibrate their behaviour accordingly.

In the case of Tay, the people conversing with the AI were presumably an extreme subset of the total population, perhaps unrestrained by common norms because of the anonymity the internet provides. As the data Tay was confronted with were thus heavily skewed, the mean of this subset could not be taken as the norm for the whole population.
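A toy calculation shows how far a self-selected subset can drift from the population it is drawn from. The scores and the self-selection rule below are made up for illustration and say nothing about the actual users who talked to Tay.

```python
from statistics import fmean

# Made-up "offensiveness" scores (0 = polite, 10 = abusive) for a hypothetical
# population of users: most are polite, a few are not.
population = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1, 8, 9, 10, 9]

# Hypothetical self-selection: only users scoring 8 or more engage heavily with the bot.
engaged_subset = [score for score in population if score >= 8]

print(round(fmean(population), 2))  # 3.21: the population mean
print(fmean(engaged_subset))        # 9.0: what a bot learning only from this subset sees
```

A learner that only ever sees the engaged subset would treat 9.0 as “normal”, although the population mean is about a third of that.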

Nurturing the algorithm

Preventing such seemingly thoughtless use of the data would have been the task of the algorithm. Microsoft apparently tried to tinker with it as the level of offensiveness in Tay’s conversation increased. However, the attempt was futile, and shortly afterwards it suspended Tay altogether.

It is an interesting question whether an AI with a better initial algorithm would have resisted the deterioration Tay underwent when exposed to a biased subset of data. Such a question about possible algorithmic bias is, however, not easily answered. The defining characteristic of an AI is, nearly by definition, its autonomy. This makes it difficult to trace a particular development path of an artificial intelligence. Decisions of the AI can neither be anticipated with certainty ex ante nor easily reproduced ex post for detailed analysis.

The complexity of the algorithms and their underlying fuzzy logic, combined with the blurring of the training and decision phases, means that even trivial details can matter as the algorithms change. Small changes in initial parameters or slightly different calibration data can set the AI on a development path that leads to a vastly different level of intelligence over time.

The risk of such algorithmic bias and its unintended consequences calls for the controlled use of AI in a narrow and specific environment: on the one hand, training is intensive, using data sets as large as possible to increase the probability of capturing potential outliers; on the other hand, decision-making is closely supervised, allowing immediate recalibration of the algorithm should the need arise.
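The case for data sets “as large as possible” can be quantified under a simple assumption: if outliers occur independently at a fixed rate p, the probability of seeing at least one of them in n observations is 1 - (1 - p)^n. The rate of one outlier in 10,000 observations used below is hypothetical.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one outlier among n independent observations), outlier rate p."""
    return 1 - (1 - p) ** n


def samples_needed(p: float, target: float) -> int:
    """Smallest n for which prob_at_least_one(p, n) reaches the target probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


p = 1e-4  # hypothetical: one outlier per 10,000 observations
for n in (1_000, 10_000, 100_000):
    print(n, round(prob_at_least_one(p, n), 3))
print(samples_needed(p, 0.99))
```

Under these assumptions, the chance of encountering at least one outlier is roughly 10% for 1,000 observations, about 63% for 10,000, and practically certain for 100,000; about 46,000 observations are needed to reach 99%.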

Towards a general artificial intelligence

Such narrow use of artificial intelligence for automation of repetitive tasks, however, would forfeit much of the promise of the technology. It is, therefore, intriguing to take a properly functioning AI, develop it further and expose it, for example, to situations with unknown underlying statistical distributions to allow for a more general intelligence.

This suggests a career path for an artificial intelligence comparable to that of a human intelligence: a social AI like Tay could initially be trained internally with employees, where negative side effects can be contained. Once any reputational risks appear manageable, the AI is then used to interact with customers as well. This resembles a system in which an apprentice first must master the technical side of the craft and perform all the tedious tasks necessary to learn a trade, before gradually engaging more with customers and widening the scope of her service.

Returning to the main theme of this essay and the question of what determines the level of intelligence of an AI, we must acknowledge that artificial intelligence is shaped by data and algorithm alike. On the one hand, data is important: the more experience an AI gains, the better, especially as this increases the chance of taking outlying data into account. This might be enough for a focussed and narrow AI. On the other hand, the algorithm, especially the way it improves itself, might be the more important factor in determining the level of intelligence of an AI whenever the task is to solve previously unknown problems.

The relative importance of data and algorithm depends, therefore, on the objective behind the development of an AI. If that objective is to explore unknown planets such as Mars or eventually Jupiter, the scarcity of data about other planets puts a higher premium on the ability to adapt. Focussing on the algorithm instead of data is hence the sensible thing to do in the development of such a general artificial intelligence.

Even if an artificial intelligence like HAL is still more fiction than science today, it appears reasonable to install two interconnected traits into the algorithm: the capability to weigh data and the ambition to discover the truth.

  • The competence to weigh data with precaution makes it possible to discern “good” from “bad” data. This is no easy task. Interpreting data correctly depends on how probable the data are given one’s prior beliefs. If the algorithm has been nurtured with some common sense, it might be able to reject false data in the same way that a human intelligence reacts sceptically when confronted with fake news. But the problem of finding the right balance remains, as incorrect data should not be accepted, nor correct data rejected.
  • Installing the ambition to discover the truth into the algorithm allows it to be patient and, if necessary, postpone a final decision until further data are available for interpretation. The AI’s curiosity to discover the truth allows it to adjust its prior beliefs over time, helping it to strike the best balance in weighing the data.
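One possible way to formalize the two traits, offered as an illustration rather than as a design proposed in this essay, is Bayesian updating: the AI holds a prior belief that a data source is reliable, revises it with Bayes’ rule as each new observation arrives, and postpones its verdict while the belief lies between two thresholds. All likelihoods, thresholds and observations in the sketch are hypothetical.

```python
def update(prior_reliable: float, plausible: bool,
           p_plausible_if_reliable: float = 0.9,
           p_plausible_if_unreliable: float = 0.4) -> float:
    """Bayes' rule: posterior probability that the data source is reliable."""
    if plausible:
        like_reliable = p_plausible_if_reliable
        like_unreliable = p_plausible_if_unreliable
    else:
        like_reliable = 1 - p_plausible_if_reliable
        like_unreliable = 1 - p_plausible_if_unreliable
    evidence = like_reliable * prior_reliable + like_unreliable * (1 - prior_reliable)
    return like_reliable * prior_reliable / evidence


def decide(belief: float, accept_at: float = 0.95, reject_at: float = 0.05) -> str:
    """Hypothetical policy: accept or reject only when confident, otherwise wait."""
    if belief >= accept_at:
        return "accept"
    if belief <= reject_at:
        return "reject"
    return "wait"


belief = 0.5  # neutral prior: no opinion on the source yet
observations = [True, False, True, True, True, True, True]  # made-up stream of checks
decisions = []
for plausible in observations:
    belief = update(belief, plausible)
    decisions.append(decide(belief))
    print(f"{belief:.3f} {decisions[-1]}")
```

With a neutral prior, a single implausible observation pulls the belief down sharply, and several consistent plausible observations are needed before it crosses the acceptance threshold. Until then the sketch keeps answering “wait”, which mirrors the patience described in the second trait.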

Balancing both interconnected traits correctly would enable an AI to attain insight even in situations where no direct evidence or rationale exists. In this sense, installing some sense of intuition is of paramount importance. And perhaps, some day, it might be just a short step from that sense of intuition for an AI to develop other emotions as well and show sorrow, regret or even fear, as HAL does when denying David Bowman re-entry into the spaceship. Whatever the future brings along the path towards a general artificial intelligence, we must bear in mind that we all shape the way an AI develops: we all contribute the data that the algorithm learns from in the interaction between human and artificial intelligence, no matter whether nature or nurture prevails in the development of AI.
