# Fooling Neural Networks with Rotation

In a previous blog post, we looked at the unintended effects of feeding random noise, an input unfamiliar to both the networks and to us as humans, to a group of pre-trained neural networks. In this post, we take the opposite approach: we feed the networks an object that is familiar both to them and to us, one they classify with a high degree of confidence, and see how they react when that familiar object is presented in unusual positions.

There is an academic paper that covers this topic in excruciating detail, but from a practical perspective the problem is simple enough that the visual examples and associated output in this post demonstrate the issue without requiring the paper. I will say that the main image from the paper is worth a few laughs, and I refer to it in some of my presentations on AI security.

For this article, we’ll be reusing the same code from the previous blog post, just replacing the noise examples with a new and specific set of images. You can find the code from the previous post here.

# The Test

We’ll feed the images through the same set of pre-trained networks from PyTorch’s torchvision that we used in the previous blog post.

• vgg16
• resnet18
• alexnet
• densenet
• inception

Most of the work is done by the image transforms and the `multi_predict` function from the previous code example, since it was well suited for classifying a single object through multiple networks.

```python
import torch
import torchvision.transforms as transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def xform_image(image):
    # Standard ImageNet preprocessing: resize, center-crop, tensor, normalize
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])

    new_image = transform(image).to(device)
    new_image = new_image.unsqueeze(0)  # add a batch dimension

    return new_image
```
```python
def multi_predict(image_xform):
    # Run the same transformed image through every network
    result = {}

    result["vgg16"] = vgg16(image_xform)
    result["resnet18"] = resnet18(image_xform)
    result["alexnet"] = alexnet(image_xform)
    result["densenet"] = densenet(image_xform)
    result["inception"] = inception(image_xform)

    return result
```

In this example, we’ll use an image of a hat. There’s nothing special about this image. Some people may argue that it’s not a cowboy hat, but all of the pre-trained neural networks we used would disagree. This image is familiar both to the networks and to us as humans.

As we can see, this image of a hat is easily classified by all of these networks. But even though every network reaches this classification easily, strange things begin to happen when you re-orient the object.

With this vertical orientation, only one of the networks still classifies the object as a hat, even though the image is identical in every other respect, just rotated to a vertical position. Let’s try another position.

With the hat flipped upside down, the classifications shuffle once again, and just like the previous example, alexnet is the only network still classifying it as a hat. Let’s get a bit trickier and shift the perspective of the hat.

That’s interesting: now all of the networks classify the hat as something other than a hat. As humans, we understand that a hat is still a hat no matter how you orient it, flip it, or view it from a different angle, but the neural networks we used don’t understand this.

# Takeaway

The takeaway here is that whether you are a security professional or a developer, you should expect the unexpected. We shouldn’t assume that just because a network or implementation performs well under certain circumstances, it will generalize. The real world often doesn’t resemble the ideal conditions of a test environment.

If your goal is merely building a fun hat detector, then sure, the stakes of getting things wrong are pretty low. But what happens when the use case is something more critical? Far too often, datasets suffer from problems stemming from a number of different factors: the geographic regions of the world the system was trained in, and countless other issues that are rarely considered during the initial training process.

Unintended health and safety consequences can follow from these unexpected perspectives of objects. Does a drone have the same perspective on a school bus as you would from a car? The conditions under which objects are encountered in the real world don’t necessarily resemble the way they were presented in the training environment.

Here is one final example to drive this point home. Imagine a drone flying overhead to identify an accident and direct emergency services, but instead finds a cannon.

# Conclusion

Far too often, we as humans tend to think of AI and its associated disciplines as being highly accurate, but that’s just not the case. These systems are wrong all of the time, especially when they are confronted with something new, strange, or maybe just not in the position a system is expecting. Whether you are a security professional, data scientist, or developer, we all need to prepare for this eventuality, test for it, and understand the impacts of when systems are confronted with strange inputs.

## References

Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects (Alcorn et al., CVPR 2019)