The Internet of Things (IoT) is already a reality in multiple application areas. Smart sensors used in smart cities, autonomous driving, home and building automation (HABA), industrial applications and more face challenging requirements in large interconnected networks.
A significant increase in data transmission and required bandwidth can be expected, leading to an overload of the communication infrastructure.
To mitigate this, the use of artificial intelligence (AI) in smart sensors can significantly reduce the amount of data exchange within the networks. Philipp Jantscher from ams AG explains.
Artificial Intelligence (AI) is not a new topic at all. In fact, the idea, and the name AI, appeared as early as the 1950s, when people started to develop computer programs to play simple games like checkers. One milestone known to many people was the launch of the computer program ELIZA, built in 1966 by Joseph Weizenbaum. The program was able to hold a dialog in written English: it posed questions, the user provided an answer, and ELIZA continued with another question related to the user's response.
Figure 1: Example screenshot of the ELIZA terminal screen
The most prominent AI technique is the neural network. Neural networks were first used in 1957, when Frank Rosenblatt invented the perceptron. Today's neurons in neural networks still use a very similar principle. However, the capabilities of a single neuron were rather limited, and it took until the early 1970s before scientists realized that multiple layers of such perceptrons could overcome these limitations. The final breakthrough for multi-layer perceptrons was the application of the backpropagation algorithm to learn the weights of multi-layer networks. An article in Nature in 1986 by Rumelhart et al.[Rum] marked the breakthrough of neural networks. From that moment, many scientists and engineers were drawn into the neural network hype.
In the 1990s and early 2000s, the method was applied to almost every kind of problem, and the number of research publications around AI, and particularly neural networks, increased significantly. Nevertheless, once all the magic behind neural networks was understood, they became just one of many classification techniques. Due to their very demanding training effort, neural networks faced significantly reduced interest in the second half of the 2000s.
Reinvestigating neural networks with respect to their operating principles caused the second hype, which is still ongoing. With far more computational power at hand and a large number of people involved, Google demonstrated that a trained neural network could beat the best Go players.
Types of AI
Over the last decades, different AI techniques have emerged. In fact, whether a certain technique belongs to AI is not black and white. Many simple classification techniques like principal component analysis (PCA) also use training data, yet they are not classified as AI. Four very prominent techniques are outlined in the subsequent sections. Each of them has many variants, and the overview given in this article does not claim to be complete.
Fuzzy Logic extends classic logic, based on false/true or 0/1, by introducing states that lie in between true and false, like “a little bit” or “mostly”. Based on such fuzzy attributes it is possible to define rules for problems. For example, one of the rules to control a heater could be: “If it is a little bit cold, increase the water temperature a little bit”. Such rules express the way humans think very well, which is why Fuzzy Logic is often considered an AI technique.
In real applications, large sets of such fuzzy rules are applied, for instance to control problems. Fuzzy Logic provides algorithms for all the classical operators like AND, OR and NOT to work on fuzzy attributes. With these operators, it is possible to infer a decision from a set of fuzzy rules.
A set of fuzzy rules has the advantage of being easily read, interpreted and maintained by humans.
Figure 2 illustrates the fuzzy states of a heater control. The states are “Cold”, “Warm”, and “Hot”. As can be seen in the figure, the three states overlap, and some temperatures belong to two states at the same time. In fact, each temperature belongs to a given state with a defined degree of membership.
Figure 2: Example for fuzzy states of a heater control. The arrows denote the state values at the indicated temperature
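The membership curves of Figure 2 can be sketched with simple triangular membership functions. The temperature ranges below are hypothetical, chosen only to illustrate how one temperature can belong to two states at once:

```python
def tri(x, a, b, c):
    # Triangular membership: rises from a to a peak at b, falls back to zero at c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical temperature ranges (degrees C) for the three heater states.
def cold(t): return tri(t, -10.0, 5.0, 18.0)
def warm(t): return tri(t, 12.0, 21.0, 28.0)
def hot(t):  return tri(t, 24.0, 35.0, 50.0)

# At 15 C the temperature is partly "cold" and partly "warm" at the same time.
t = 15.0
membership = {"cold": cold(t), "warm": warm(t), "hot": hot(t)}
```

The fuzzy versions of the classical operators then work directly on these membership values, e.g. AND as the minimum and OR as the maximum of two memberships.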
Genetic Algorithms apply the basic principles of biological evolution to optimization problems in engineering. The principles of combination, mutation and selection are applied to find an optimum set of parameters for a high dimensional problem.
Assuming a large set of parameters (more than 20) with a given fitness function, it is in practice not possible to determine analytically the set of parameters that maximizes the fitness function.
Genetic Algorithms tackle this problem in the following way. First, a population of random parameter sets is generated. For each set in this population, the fitness function is calculated. Then the next generation is derived from the previous generation by applying the following principles:
- Two sets selected from the previous population (parents) are combined to form each set of the next generation (children). The selection is random, but sets with a higher score on the fitness function have a higher probability of being chosen as parents
- Based on a mutation rate, random parameters of random sets of the new generation are modified by a small percentage
Each set of the next generation is then evaluated using the fitness function. If one set turns out to be good enough, the genetic algorithm stops; otherwise a new generation is created as described above.
Figure 3 depicts this cycle.
Figure 3: The genetic algorithm cycle
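The cycle described above can be sketched in a few lines of Python. All parameters here (population size, mutation rate, the toy fitness function) are illustrative choices, not values from the article:

```python
import random

def genetic_optimize(fitness, n_params, pop_size=30, mutation_rate=0.1,
                     generations=100, good_enough=-1e-3):
    # 1. Start with a population of random parameter sets.
    pop = [[random.uniform(-1, 1) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        if max(scores) >= good_enough:   # stop when a set is good enough
            break
        # Fitness-proportional parent selection (scores shifted to be positive).
        lo = min(scores)
        weights = [s - lo + 1e-9 for s in scores]
        children = []
        for _ in range(pop_size):
            p1, p2 = random.choices(pop, weights=weights, k=2)
            # Combination: each parameter comes from one of the two parents.
            child = [random.choice(pair) for pair in zip(p1, p2)]
            # Mutation: perturb random parameters by a small percentage.
            child = [g * (1 + random.uniform(-0.05, 0.05))
                     if random.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness function: its maximum is at the all-zero parameter set.
best = genetic_optimize(lambda p: -sum(x * x for x in p), n_params=5)
```

A real fitness function would of course evaluate an engineering objective rather than this toy expression.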
It has been shown that for many high-dimensional optimization problems a Genetic Algorithm is able to find a global optimum where conventional optimization algorithms fail because they get stuck in a local optimum.
Genetic Programming takes Genetic Algorithms a step further by applying the same principles to actual source code of programs. The sets are replaced by sequences of program code and the fitness function is the result of executing the actual code.
Very often, the generated program code does not execute at all. It has been demonstrated however that such a procedure can indeed generate working source code for problems like finding an exit in a maze.
Neural networks mimic the behavior of the human brain by implementing neurons. A neuron takes input from many other neurons, performs a weighted sum, and finally limits the output to a defined range. The impact of a specific input depends on the weight associated with that input. These weights resemble the function of synapses in the human brain to a certain extent.
The weights of the connections are determined by applying inputs together with the desired outputs. This, again, is very similar to the way humans teach their kids to tell the difference between a dog and a cat.
Figure 4: Example of a neural network architecture
The main components of a neural network architecture are the input nodes, where the input data is applied. The second set of components are the hidden layers, which process the inputs by applying weights to them; the weighted inputs are then transferred to the inputs of the next layer. Finally, the output nodes assign a score to each possible classification of the input set as a result.
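A minimal sketch of this forward pass, using a sigmoid as the squashing function and arbitrary placeholder weights rather than trained values:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, then limit the output to (0, 1)
    # with a sigmoid squashing function.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_rows, biases):
    # One layer: every neuron sees all inputs of the previous layer.
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Tiny network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
# The weight values here are arbitrary placeholders, not trained values.
x = [0.5, -1.0, 2.0]
hidden = layer(x, [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]], [0.0, 0.1])
output = layer(hidden, [[1.2, -0.7]], [0.05])
```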
IoT sensor solutions today are mostly responsible for data acquisition only. The raw data needs to be extracted from the sensor and transmitted to another, more computationally capable device within the network. Depending on the use case, this device could be either an embedded system or a server in the cloud. The receiving end collects the raw data and performs pre-processing in order to produce relevant results. Frequently the raw data of the IoT device needs to be processed using artificial intelligence, as in speech recognition for example. The number of IoT devices, and especially the demand for artificial intelligence, is expected to increase dramatically over the coming years as sensor solutions become more complex.
However, the growing number of connected IoT devices that rely on cloud solutions to compute meaningful results leads to problems in several areas. The first issue is the latency between acquiring the raw data and receiving the evaluated information. It is not possible to build a real-time system, since the data needs to be sent over the network, processed on a server and then interpreted again by the local device. This leads to the second problem: increasing network traffic, which reduces the reliability of network connections. Servers need to handle more and more requests from IoT devices and could thus be overwhelmed in the future.
A major advantage of neural networks is their ability to extract and store the essential knowledge of a large data set in a fixed, typically much smaller, set of weights. The amount of data used to train a neural network can be vast; in particular, for high-dimensional problems the data set needs to scale exponentially to maintain a certain case coverage. The training algorithm extracts those features from the data which will efficiently classify unseen input data. As the number of weights is fixed, the amount of storage does not correlate with the size of the training data set. Of course, if the network is too small it will not deliver good accuracy, but once a proper size has been found, the amount of training data no longer affects the size of the network, nor the execution speed of the network. This is another reason why, in IoT applications, a local network can outperform a cloud solution: the cloud solution may store vast amounts of reference data, but its response time degrades quickly with the amount of reference data stored.
By definition, IoT nodes are connected to a network, and very likely to the Internet. Nevertheless, it can be very desirable to have local intelligence, so that processing of raw data can happen on the sensor or in the IoT node instead of requiring communication over the network. The most important reason for such a strategy is reducing the energy consumed by network communication traffic.
Major companies such as embedded microprocessor manufacturers have already realized that cloud-based services have to be adapted. One of the consequences is the introduction of new embedded microprocessor cores capable of machine learning tasks. In the future, the trend of processing data in the cloud will shift further back towards local, on-device processing. This enables more complex sensor solutions involving sensor fusion or pattern recognition, for which local intelligence in the IoT device is needed. Sensor solutions will become truly smart, as they already deliver finalized, meaningful data.
Figure 5 represents this paradigm shift from cloud-based solutions to local intelligence.
Figure 5: Comparison of cloud and local intelligence architectures
However, computing elaborate AI solutions within an IoT device requires new solutions that meet power, speed and size constraints. To achieve this, the trend is shifting towards integrated circuits optimized for machine learning. This type of processing is commonly referred to as edge AI.
In sensors for IoT applications, which are very frequently mobile or at least maintenance-free, the most prominent constraint is power consumption.
This leads to a system design that minimizes the amount of data to be transferred via the communication channel, as sending and receiving data, particularly wirelessly, is always very expensive in terms of power budget. Thus, the goal is to process all raw data locally and transmit only meaningful data to the network.
For local processing, neural networks are a great option, as their power consumption can be controlled well. First, the right architecture (recurrent versus non-recurrent) and the right topology (number of layers and neurons per layer) must be chosen. This is far from trivial and requires experience in the field. Second, the bit resolution of the weights becomes important: whether a standard float type is used or an optimized solution with just 4 bits per weight can be found contributes significantly to memory size and therefore to power consumption.
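As a sketch of the idea, symmetric linear quantization maps each float weight onto one of the 16 levels of a signed 4-bit integer. The scheme and values below are illustrative, not the one used in any actual chip:

```python
def quantize_4bit(weights):
    # Symmetric linear quantization: map floats to integers in [-8, 7]
    # (the 16 levels of a signed 4-bit value) using a shared scale factor.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original float weights.
    return [v * scale for v in q]

w = [0.82, -0.11, 0.05, -0.64, 0.33]
q, scale = quantize_4bit(w)          # 4 bits per weight instead of 32
w_approx = dequantize(q, scale)      # small reconstruction error
```

The memory saving is a factor of eight versus 32-bit floats, at the cost of a reconstruction error bounded by half the scale factor per weight.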
Gas Sensor Physics
The sensor system used as a test case for AI in sensors is a metal-oxide (MOX) based gas sensor. The sensor works on the principle of a chemiresistor[Kor]. In the presence of reducing (e.g. CO, H2, CH4) and/or oxidizing (e.g. O3, NOx, Cl2) gases, the detector layer changes its resistivity. This can in turn be detected via a metal electrode sitting underneath the detector layer. The main problem of such a configuration is the indiscriminate response to all sorts of gases. Therefore, the sensor is thermally cycled (by means of a microhotplate), which causes it to react with a resistance change that has a unique signature and thus significantly increases the selectivity of gas detection.
Figure 6: Structural concept of a chemiresistive gas sensor
Another approach is to combine different MOX sensor layers to discriminate further between the different gas types.
A closed physical model explaining the behavior of chemiresistors would depend on too many parameters. A non-exhaustive list of these parameters includes thickness, grain size, porosity, grain faceting, agglomeration, film texture, surface geometry, sensor geometry, surface disordering, bulk stoichiometry, grain network, active surface area and the size of the necks of the sensor layer. Together with the thermal cycling profile, such a model would be too complex and is currently simply not available.
Therefore, such systems form an ideal case to apply modern AI methods.
Gas sensing is an especially potent application for AI. The problem to be solved is predicting the concentrations of gases, with the resistances of multiple MOX pastes as the inputs.
To solve the task, the behavior of the MOX pastes when exposed to various gases at different concentrations has been recorded. From this data, a dataset consisting of features (the temporal resistance trend of each paste) and labels (the gas that was present) has been created.
This kind of data is especially well suited to supervised learning. In supervised learning, the neural network is given many samples, each consisting of features and a label. The network then learns to associate features with labels in an iterative learning process: it is exposed to every sample multiple times, and in every iteration its prediction is nudged in the direction of the correct label by adjusting its weights.
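This nudging step can be sketched for a single neuron trained by gradient descent on a toy dataset. All values here (features, labels, learning rate) are illustrative:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(weights, bias, features, label, lr=0.5):
    # Forward pass: predict, then nudge the weights toward the label
    # by following the gradient of the squared error.
    pred = sigmoid(sum(x * w for x, w in zip(features, weights)) + bias)
    grad = (pred - label) * pred * (1.0 - pred)
    weights = [w - lr * grad * x for w, x in zip(weights, features)]
    bias = bias - lr * grad
    return weights, bias

# Toy dataset: label 1 when the first feature dominates (illustrative only).
samples = [([1.0, 0.1], 1.0), ([0.1, 1.0], 0.0),
           ([0.9, 0.2], 1.0), ([0.2, 0.8], 0.0)]
w, b = [0.0, 0.0], 0.0
for _ in range(200):                 # expose the network to every sample
    for x, y in samples:             # multiple times
        w, b = train_step(w, b, x, y)
```

A real network repeats the same principle, only with many neurons and with the gradient propagated backwards through all layers.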
Architecture and Solution Approach
A neural network is defined by its architecture. The architecture is usually influenced by the dataset at hand. In our case, the dataset has a temporal structure, so a recurrent neural network is a good fit. Recurrent neural networks process the features in multiple steps and keep information about previous steps in an internal state.
The architecture also has to be adapted to the IoT constraints already mentioned. The neural network should be as small as possible to minimize power consumption. We use one hidden layer with 47 neurons. The weights are quantized to 4 bits, which further reduces power consumption. On top of this, the network is implemented in analog circuitry to make it even more efficient.
The network was first tested in a pure software environment using TensorFlow (https://www.tensorflow.org/). This allowed rapid adjustment of the architecture to make sure it could solve the task properly before actually building it.
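The recurrent processing idea can be sketched in plain Python. This is a conceptual toy with per-neuron (diagonal) recurrence and placeholder weights, not the actual TensorFlow model:

```python
import math

def rnn_step(state, x, w_in, w_rec, bias):
    # One recurrent step: the new state mixes the current input with
    # the previous state, so earlier time steps influence later ones.
    return [math.tanh(wi * x + wr * s + b)
            for wi, wr, s, b in zip(w_in, w_rec, state, bias)]

def run_rnn(resistance_trace, n_hidden=4):
    # Placeholder weights; a real network would learn these and would
    # use more hidden neurons (the article's network uses 47).
    w_in = [0.5, -0.3, 0.8, 0.1][:n_hidden]
    w_rec = [0.9, 0.7, -0.5, 0.2][:n_hidden]
    bias = [0.0] * n_hidden
    state = [0.0] * n_hidden
    for x in resistance_trace:       # process the features step by step
        state = rnn_step(state, x, w_in, w_rec, bias)
    return state                     # final state summarizes the sequence

final = run_rnn([0.2, 0.5, 0.9, 0.4])
```

A classification layer would then map this final state to gas predictions.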
Evaluating a machine learning classifier is not trivial, and there are multiple ways to measure performance. One of the most popular is the Receiver Operating Characteristic (ROC). The ROC is a curve with the false positive rate on one axis and the true positive rate on the other. The area under the curve (AUC) should be as high as possible; it measures how well the classifier separates the positive and negative samples.
Figure 7: ROC curve of false positive rate versus true positive rate for the classified gas data
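The AUC can be computed directly from its probabilistic interpretation. The scores below are made-up illustration data:

```python
def roc_auc(labels, scores):
    # The AUC equals the probability that a randomly chosen positive
    # sample receives a higher score than a randomly chosen negative one
    # (ties count as half).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative classifier scores for six samples (1 = gas present, 0 = absent).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)        # perfect separation would give 1.0
```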
Another interesting metric is the mean absolute error (MAE). The MAE averages the absolute error of the prediction over all samples. We have been looking at this metric over time to get a sense of how many time steps the network needs to achieve a good prediction.
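A sketch of the MAE tracked over time steps, with made-up concentration values:

```python
def mae(predictions, targets):
    # Mean absolute error averaged over all samples.
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical concentration predictions (ppm) after each time step,
# for the same three samples; the targets stay fixed.
targets = [10.0, 25.0, 40.0]
preds_per_step = [[4.0, 15.0, 30.0],    # early step: large error
                  [8.0, 22.0, 37.0],
                  [9.5, 24.5, 39.0]]    # later step: error has shrunk
mae_over_time = [mae(p, targets) for p in preds_per_step]
```

The step at which this curve flattens out indicates how many time steps the network needs before its prediction is reliable.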