Fooling Neural Networks with Rotation

In a previous blog post, we looked at the unintended effects of feeding random noise to a group of pre-trained neural networks. The subject of that post was something unfamiliar to both the networks and to us as humans. In this post, we take a different approach and feed the networks an object that is familiar to them and to us, one they normally classify with a high degree of confidence, and look at how they react when that familiar object is presented in unusual positions.

There is an academic paper that covers this topic in excruciating detail. From a practical perspective, the problem is simple enough that reading the paper isn't necessary; the visual examples and associated output in this post demonstrate the issue. I will say, though, that the main image from the paper is worth a few laughs, and I do refer to it in some of my presentations on AI security.

For this article, we’ll be reusing the same code from the previous blog post, just replacing the noise examples with a new and specific set of images. You can find the code from the previous post here.

The Test

We’ll feed the images through the same set of pre-trained networks from PyTorch’s torchvision package that we used in the previous blog post:

  • vgg16
  • resnet18
  • alexnet
  • densenet
  • inception

Most of the work is done by the xform_image and multi_predict functions from the previous code example, since they are well suited to classifying a single object across multiple networks.

def xform_image(image):
    # Resize, center-crop, and normalize with the standard ImageNet statistics,
    # then add a batch dimension and move the tensor to the target device.
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])

    new_image = transform(image).to(device)
    new_image = new_image.unsqueeze(0)

    return new_image

def multi_predict(image_xform):
    # Run the transformed image through each pre-trained network and collect
    # the raw outputs, keyed by model name. No gradients are needed for
    # inference, so everything is wrapped in torch.no_grad().
    result = {}

    with torch.no_grad():
        result["vgg16"] = vgg16(image_xform)
        result["resnet18"] = resnet18(image_xform)
        result["alexnet"] = alexnet(image_xform)
        result["densenet"] = densenet(image_xform)
        result["inception"] = inception(image_xform)

    return result
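
The models themselves, along with the device selection, come from the setup in the previous post. For completeness, here is a minimal sketch of what that setup might look like using torchvision's pretrained weights (the specific DenseNet variant, densenet121, is an assumption; the previous post may have used a different one):

import torch
from torchvision import models, transforms

# Pick a GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the five pretrained ImageNet classifiers and switch them to eval mode.
vgg16     = models.vgg16(pretrained=True).to(device).eval()
resnet18  = models.resnet18(pretrained=True).to(device).eval()
alexnet   = models.alexnet(pretrained=True).to(device).eval()
densenet  = models.densenet121(pretrained=True).to(device).eval()  # assumed variant
inception = models.inception_v3(pretrained=True).to(device).eval()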

In this example, we’ll use an image of a hat. There’s nothing special about this image, and although some people may argue that it’s not a cowboy hat, all of the pre-trained networks we’re using would disagree. The image is familiar to both the networks and to us as humans.
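
The classification and score columns in the tables that follow are the top-1 ImageNet label and its softmax probability for each network. A minimal sketch of how they might be derived from the multi_predict output (the summarize helper and the imagenet_classes list of 1,000 label strings are illustrative, not part of the original code):

import torch
import torch.nn.functional as F

def summarize(results, imagenet_classes):
    # Convert each network's raw outputs into its top class name and probability.
    summary = {}
    for name, logits in results.items():
        probs = F.softmax(logits, dim=1)
        score, idx = torch.max(probs, dim=1)
        summary[name] = (imagenet_classes[idx.item()], round(score.item(), 4))
    return summary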

Network     Classification   Score
vgg16       cowboy_hat       0.7508
resnet18    cowboy_hat       0.7342
alexnet     cowboy_hat       0.6884
densenet    cowboy_hat       0.5331
inception   cowboy_hat       0.9998

Regular Hat Classification

As we can see, this image of a hat is easily classified by all of these networks. Even so, strange things begin to happen when you re-orient the object into a different position.
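
The rotated variants used below come from the same photo; a quick sketch of one way to generate them with Pillow before handing them to xform_image and multi_predict (the file name is illustrative, and the original post may have prepared the images differently):

from PIL import Image

# Load the original hat photo and rotate it; expand=True keeps the whole
# image in frame instead of cropping the corners.
hat = Image.open("hat.jpg")
vertical_hat    = hat.rotate(90, expand=True)
upside_down_hat = hat.rotate(180)

results = multi_predict(xform_image(vertical_hat))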

Network     Classification   Score
vgg16       holster          0.3923
resnet18    holster          0.4799
alexnet     cowboy_hat       0.4439
densenet    holster          0.6358
inception   breastplate      0.4303

Vertical Hat Classification

With this vertical orientation, only one of the networks still classifies the object as a hat, even though it is exactly the same image, just rotated into a vertical position. Let’s try another position.

Network     Classification   Score
vgg16       buckle           0.3146
resnet18    loafer           0.4154
alexnet     cowboy_hat       0.4829
densenet    holster          0.7630
inception   clog             0.3004

Upside Down Hat Classification

With the hat flipped upside down, the classifications shuffle once again, and just like the previous example, alexnet is the only network still classifying it as a hat. Let’s get a bit trickier and shift the perspective of the hat.
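
If you wanted to experiment with this kind of viewpoint change synthetically, torchvision's RandomPerspective transform is one option; this is only an approximation of re-photographing the object from a new angle, and the distortion_scale value here is arbitrary:

from torchvision import transforms

# Apply a perspective warp to the original photo (p=1.0 so it always fires).
warp = transforms.RandomPerspective(distortion_scale=0.6, p=1.0)
warped_hat = warp(hat)
results = multi_predict(xform_image(warped_hat))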

Network     Classification   Score
vgg16       shield           0.7054
resnet18    shield           0.3247
alexnet     loudspeaker      0.2389
densenet    shield           0.5714
inception   shield           0.3999

Hat Inside Classification

That’s interesting: now all of the networks classify the hat as something other than a hat. As humans, we understand that no matter how you orient the object, flip it, or look at it from different angles, it’s still a hat; the neural networks we used don’t understand this.

Takeaway

The takeaway here is that whether you are a security professional or a developer, you should expect the unexpected. We shouldn’t assume that just because a network or implementation performs well under certain circumstances, that performance will generalize. The real world often doesn’t resemble the ideal conditions of a test environment.
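
One inexpensive way to test for this kind of failure is a simple orientation sweep: run the same object through the models at several rotations and flag any angle where the top prediction changes. A rough sketch, reusing the hat image and the illustrative summarize helper from earlier:

# Check whether each network's top-1 label is stable across rotations.
for angle in (0, 45, 90, 135, 180, 225, 270, 315):
    rotated = hat.rotate(angle, expand=True)
    preds = summarize(multi_predict(xform_image(rotated)), imagenet_classes)
    print(angle, preds)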

If your goal is merely creating a fun hat detector, then sure, the stakes are pretty low when you get things wrong. But what happens when the use case is something more critical? Far too often, datasets have problems that stem from any number of factors, such as the geographic regions a system’s training data was collected in, along with countless other issues that are rarely considered during the initial training process.

Unintended health and safety consequences can result from these unexpected perspectives of objects. Does a drone see a school bus from the same perspective you would from a car? The conditions in which objects are encountered in the real world don’t necessarily resemble the way they were presented in the training environment.

Here is one final example to drive the point home. Imagine a drone flying overhead to identify an accident and direct emergency services, but it finds a cannon instead.

Network     Classification   Score
vgg16       cannon           0.3462
resnet18    tractor          0.2012
alexnet     tank             0.4665
densenet    thresher         0.1893
inception   motor_scooter    0.5318

Car Crash Classification

Conclusion

Far too often, we as humans tend to think of AI and its associated disciplines as highly accurate, but that’s just not the case. These systems are wrong all the time, especially when confronted with something new, strange, or simply not in the position the system is expecting. Whether you are a security professional, data scientist, or developer, you need to prepare for this eventuality, test for it, and understand the impact when systems are confronted with strange inputs.

References

Alcorn et al., “Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects,” CVPR 2019.
