Advantages of tree architectures compared to convolutional networks: a performance study

by Brenden Burgess


Traditionally, methods for training deep learning (DL) solutions have their roots in the principles of the human brain. In this view, neurons are represented as nodes connected to one another, and the strength of these connections changes as the neurons interact. Deep neural networks consist of three or more layers of nodes, including the input and output layers. However, the two learning scenarios differ significantly. First, effective DL architectures require dozens of hidden layers, currently growing into the hundreds, while brain dynamics involve only a few layers.

Second, deep learning architectures generally include many hidden layers, the majority of them convolutional. These convolutional layers look for specific patterns or symmetries in small sections of the input data. As these operations are repeated in the following hidden layers, they help identify the larger features that define the class of the input. Similar processes have been observed in our visual cortex, but approximately convolutional connections have mainly been confirmed from the retinal input to the first hidden layer.
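
To make this concrete, here is a small illustration (with assumed layer sizes, not taken from the study) of how a 5 × 5 convolutional filter only "sees" a small patch of the image, and how stacking a second convolutional layer lets each output respond to a larger region:

```python
# Illustrative only: a single 5x5 filter responds to a 5x5 patch of the image,
# while two stacked 5x5 convolutions give each output a 9x9 receptive field.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 32, 32)          # one RGB image, 32x32 pixels
layer1 = nn.Conv2d(3, 6, kernel_size=5)    # each output value depends on a 5x5 patch
layer2 = nn.Conv2d(6, 16, kernel_size=5)   # each output now depends on a 9x9 patch

features = layer2(torch.relu(layer1(image)))
print(layer1(image).shape)  # torch.Size([1, 6, 28, 28])
print(features.shape)       # torch.Size([1, 16, 24, 24])
```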

Another problematic aspect of deep learning is that backpropagation, the technique crucial to how neural networks are trained, has no biological analogue. This method adjusts the weights of the neurons so that they become better suited to solving the task. During training, we feed the network an input and compare its output with what we expect, using an error function to measure the difference.

Then we start updating the weights of the neurons to reduce this error. To do so, we consider each path between the input and the output of the network and determine how much each weight along that path contributes to the overall error. We use this information to correct the weights.
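
A minimal sketch of this procedure (a toy network, not the study's model), using PyTorch's automatic differentiation: the forward pass produces an output, an error function measures the gap to the target, and the gradient of that error is used to nudge every weight:

```python
# Toy example of backpropagation: forward pass, error function, gradient step.
import torch

torch.manual_seed(0)
x = torch.randn(1, 4)           # toy input
target = torch.tensor([[1.0]])  # expected output

w1 = torch.randn(4, 8, requires_grad=True)  # first-layer weights
w2 = torch.randn(8, 1, requires_grad=True)  # second-layer weights

lr = 0.1
for step in range(100):
    hidden = torch.relu(x @ w1)             # forward pass through the hidden layer
    output = hidden @ w2                    # network output
    loss = (output - target).pow(2).mean()  # error function measuring the difference

    loss.backward()                         # backpropagation: gradient of the error w.r.t. every weight
    with torch.no_grad():
        w1 -= lr * w1.grad                  # adjust each weight to reduce the error
        w2 -= lr * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
```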

The convolutional and fully connected layers of the network play a crucial role in this process, and they are particularly efficient thanks to parallel computation on graphics processing units (GPUs). However, it should be noted that this method has no analogue in biology and differs from the way the human brain processes information.

Thus, although deep learning is powerful and effective, it is an algorithm developed purely for machine learning and does not imitate the process of biological learning.

Researchers at Bar-Ilan University in Israel asked whether it is possible to develop a more efficient form of artificial intelligence using a tree-like architecture. In this architecture, each weight has only one route to the output unit. Their hypothesis is that such an approach can achieve higher classification accuracy than more complex learning architectures that use more layers and filters. The study is published in the journal Scientific Reports.

The core of this study explores whether learning in a tree-shaped architecture, inspired by dendritic trees, can achieve results as good as those typically obtained with more structured architectures involving multiple fully connected and convolutional layers.

Figure 1

The study presents a learning approach based on tree-shaped architectures, where each weight is connected to the output unit by a single route, as shown in Figure 1 (C, D). This approach is a step closer to implementing biological learning realistically, given recent findings that dendrites (parts of neurons) and their immediate branches can change, enhancing the strength and expressiveness of the signals that pass through them.

Here, it is shown that the performance of the proposed Tree-3 architecture, which has only three hidden layers, surpasses the success rates achievable with LeNet-5 on the CIFAR-10 database.

In Figure 1 (A), the convolutional architectures of LeNet-5 and Tree-3 are considered. The LeNet-5 convolutional network for the CIFAR-10 database takes 32 × 32 RGB input images belonging to 10 output labels. The first layer consists of six (5 × 5) convolutional filters, followed by (2 × 2) max pooling. The second layer consists of 16 (5 × 5) convolutional filters, and layers 3 to 5 are three fully connected layers of sizes 400, 120 and 84, connected to 10 output units.
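
For reference, here is a sketch of this LeNet-5 variant in PyTorch, following the layer sizes quoted above (activation functions and the second pooling stage are assumptions where the article does not specify them):

```python
# Sketch of a LeNet-5 variant for CIFAR-10 with the layer sizes described above.
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),   # six 5x5 filters  -> 6 x 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 2x2 max pooling  -> 6 x 14 x 14
            nn.Conv2d(6, 16, kernel_size=5),  # sixteen 5x5 filters -> 16 x 10 x 10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 16 x 5 x 5 = 400 units
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(400, 120),  # fully connected layers of sizes 400, 120 and 84
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),  # 10 output units
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Sanity check with a batch of 32x32 RGB images:
# print(LeNet5()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```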

In Figure 1 (B), the dotted red line indicates the scheme of routes that influence the weight updates belonging to the first layer of panel (A) during error backpropagation. A single weight is connected to one of the output units by many routes (red dotted lines), whose number can exceed one million. It is important to note that all the weights of the first layer amount to 6 × (5 × 5) weights, belonging to the six convolutional filters, as shown in Figure 1 (C).

The Tree-3 architecture consists of M = 16 branches. The first layer of each branch consists of K (5 × 5) filters (K = 6 or 15) for each of the three RGB channels. Each channel is convolved with its own set of K filters, resulting in 3 × K different filters. The convolutional filters are the same for all M branches. The first layer ends with max pooling over non-overlapping (2 × 2) squares, leaving (14 × 14) output units for each filter. The second layer consists of a tree-like (non-overlapping) sampling of (2 × 2 × 7) units of the K filters for each RGB color in each branch, resulting in 21 output signals (7 × 3) per branch. The third layer fully connects the 21 × M branch outputs of layer 2 to the 10 output units. The ReLU activation function is used for online learning, while Sigmoid is used for offline learning.
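
The following is a loose, simplified sketch of such a branched, tree-style classifier, not a faithful reproduction of the paper's Tree-3: the grouped convolution (each RGB channel with its own K filters), the (2 × 2) pooling, the 21 signals per branch and the single fully connected readout follow the description above, while the way branches are carved out of the feature maps (a 4 × 4 grid, one cell per branch) and the per-branch readout are illustrative assumptions:

```python
# Rough, simplified tree-style classifier; layer routing and sizes beyond the
# article's description are assumptions, not the authors' exact Tree-3.
import torch
import torch.nn as nn

class TreeSketch(nn.Module):
    def __init__(self, k: int = 6, branches: int = 16, num_classes: int = 10):
        super().__init__()
        # Each RGB channel is convolved with its own K filters (groups=3 -> 3*K maps),
        # and these filters are shared by all branches.
        self.conv = nn.Conv2d(3, 3 * k, kernel_size=5, groups=3)
        self.pool = nn.MaxPool2d(2)             # non-overlapping 2x2 pooling -> 14x14 maps
        self.to_grid = nn.AdaptiveMaxPool2d(4)  # assumption: 4x4 grid, one cell per branch
        # Per-branch readout producing 21 signals per branch (assumed form);
        # branches are mixed only in the final fully connected output layer.
        self.branch_fc = nn.ModuleList(nn.Linear(3 * k, 21) for _ in range(branches))
        self.out = nn.Linear(21 * branches, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        maps = self.pool(torch.relu(self.conv(x)))  # (N, 3K, 14, 14)
        grid = self.to_grid(maps).flatten(2)        # (N, 3K, 16): one column per branch
        feats = [torch.relu(fc(grid[:, :, b])) for b, fc in enumerate(self.branch_fc)]
        return self.out(torch.cat(feats, dim=1))    # (N, 10)

# print(TreeSketch()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```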

In Figure 1 (D), the dotted black line marks the single route connecting an updated weight in the first layer, as illustrated in Figure 1 (C), to the output unit during error backpropagation.

To solve the classification task, the researchers applied the cross-entropy cost function and used the stochastic gradient descent algorithm to minimize it. To fine-tune the model, optimal hyperparameters such as the learning rate, the momentum constant and the weight decay coefficient were found. To evaluate the model, several validation datasets of 10,000 random examples, similar in size to the test dataset, were used. Average results were reported, along with the standard deviation of the average success measures. The Nesterov method and L2 regularization were applied in the study.

The hyperparameters for offline learning, including η (the learning rate), μ (the momentum constant) and α (the L2 regularization coefficient), were optimized over 200 epochs of training. The hyperparameters for online learning were optimized using three different dataset sizes.
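
A minimal sketch of this training setup, with placeholder hyperparameter values rather than the paper's tuned ones (the model refers to the sketches above):

```python
# Cross-entropy loss minimized by SGD with Nesterov momentum (mu) and
# L2 regularization via weight decay (alpha); values below are placeholders.
import torch
import torch.nn as nn

model = LeNet5()                   # the LeNet-5 sketch above; TreeSketch() works the same way
criterion = nn.CrossEntropyLoss()  # cross-entropy cost function
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # eta: learning rate (placeholder)
    momentum=0.9,       # mu: momentum constant (placeholder)
    weight_decay=5e-4,  # alpha: L2 regularization (placeholder)
    nesterov=True,      # Nesterov method
)

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:  # batches of CIFAR-10 images and labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()            # backpropagate the error
        optimizer.step()           # update the weights

# Offline learning in the paper ran for 200 epochs, e.g.:
# for epoch in range(200):
#     train_one_epoch(train_loader)
```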

The experiment demonstrates an effective approach to training a tree-shaped architecture in which each weight is connected to the output unit by a single route. This approximates biological learning and shows that deep learning can be carried out with highly simplified dendritic trees of one or a few neurons. It is important to note that adding a convolutional layer at the input helps preserve the tree-shaped structure and improves success rates compared with architectures without convolution.

Although the computational complexity of LeNet-5 is significantly higher than that of the Tree-3 architecture at comparable success rates, an efficient implementation of the tree approach requires new hardware. It is also expected that training tree-shaped architectures will minimize the probability of exploding gradients, one of the main challenges in deep learning. Introducing parallel branches in place of the second convolutional layer of LeNet-5 improved success measures while maintaining the tree-shaped structure. Further investigation is warranted to explore the potential of large-scale tree architectures, with more branches and filters, to compete with state-of-the-art CIFAR-10 success rates. This experiment, which used LeNet-5 as a starting point, underlines the potential advantages of dendritic learning and its computational capacities.
