Fooling Neural Networks with Rotation

In a previous blog post, we looked at the unintended effects of feeding random noise to a group of pre-trained neural networks. The subject of that post was something unfamiliar to both the networks and to us as humans. In this post, we take a different approach and feed the networks an object that is familiar to them and to us, one they normally classify with a high degree of confidence, and look at how they react when that familiar object is presented in unusual positions.

There is an academic paper that covers this topic in excruciating detail. From a practical perspective, the problem is simple enough that reading the paper isn't necessary; the visual examples and associated output in this post demonstrate the issue. I will say, though, that the main image from the paper is worth a few laughs, and I do refer to it in some of my presentations on AI security.

For this article, we’ll be reusing the same code from the previous blog post, just replacing the noise examples with a new and specific set of images. You can find the code from the previous post here.

The Test

We’ll feed the images through the same set of pre-trained networks from PyTorch’s torchvision package that we used in the previous blog post:

  • vgg16
  • resnet18
  • alexnet
  • densenet
  • inception

Most of the work is done by the xform_image and multi_predict functions from the previous code example, since they are well suited to classifying a single object across multiple networks.

def xform_image(image):
    # Resize, center-crop, and normalize with the standard ImageNet statistics,
    # then add a batch dimension and move the tensor to the target device.
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])

    new_image = transform(image).to(device)
    new_image = new_image.unsqueeze(0)

    return new_image

def multi_predict(image_xform):
    # Run the transformed image through each pre-trained network and collect
    # the raw outputs, keyed by model name. No gradients are needed for
    # inference, so everything is wrapped in torch.no_grad().
    result = {}

    with torch.no_grad():
        result["vgg16"] = vgg16(image_xform)
        result["resnet18"] = resnet18(image_xform)
        result["alexnet"] = alexnet(image_xform)
        result["densenet"] = densenet(image_xform)
        result["inception"] = inception(image_xform)

    return result
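
The models themselves, along with the device selection, come from the setup in the previous post. For completeness, here is a minimal sketch of what that setup might look like using torchvision's pretrained weights (the specific DenseNet variant, densenet121, is an assumption; the previous post may have used a different one):

import torch
from torchvision import models, transforms

# Pick a GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the five pretrained ImageNet classifiers and switch them to eval mode.
vgg16     = models.vgg16(pretrained=True).to(device).eval()
resnet18  = models.resnet18(pretrained=True).to(device).eval()
alexnet   = models.alexnet(pretrained=True).to(device).eval()
densenet  = models.densenet121(pretrained=True).to(device).eval()  # assumed variant
inception = models.inception_v3(pretrained=True).to(device).eval()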

In this example, we’ll use an image of a hat. There’s nothing special about this image, and although some people may argue that it’s not a cowboy hat, all of the pre-trained networks we’re using would disagree. The image is familiar to both the networks and to us as humans.
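
The classification and score columns in the tables that follow are the top-1 ImageNet label and its softmax probability for each network. A minimal sketch of how they might be derived from the multi_predict output (the summarize helper and the imagenet_classes list of 1,000 label strings are illustrative, not part of the original code):

import torch
import torch.nn.functional as F

def summarize(results, imagenet_classes):
    # Convert each network's raw outputs into its top class name and probability.
    summary = {}
    for name, logits in results.items():
        probs = F.softmax(logits, dim=1)
        score, idx = torch.max(probs, dim=1)
        summary[name] = (imagenet_classes[idx.item()], round(score.item(), 4))
    return summary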

Network     Classification   Score
vgg16       cowboy_hat       0.7508
resnet18    cowboy_hat       0.7342
alexnet     cowboy_hat       0.6884
densenet    cowboy_hat       0.5331
inception   cowboy_hat       0.9998

Regular Hat Classification

As we can see, this image of a hat is easily classified by all of these networks. Even so, strange things begin to happen when you re-orient the object into a different position.
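
The rotated variants used below come from the same photo; a quick sketch of one way to generate them with Pillow before handing them to xform_image and multi_predict (the file name is illustrative, and the original post may have prepared the images differently):

from PIL import Image

# Load the original hat photo and rotate it; expand=True keeps the whole
# image in frame instead of cropping the corners.
hat = Image.open("hat.jpg")
vertical_hat    = hat.rotate(90, expand=True)
upside_down_hat = hat.rotate(180)

results = multi_predict(xform_image(vertical_hat))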

Network     Classification   Score
vgg16       holster          0.3923
resnet18    holster          0.4799
alexnet     cowboy_hat       0.4439
densenet    holster          0.6358
inception   breastplate      0.4303

Vertical Hat Classification

With this vertical orientation, only one of the networks still classifies the object as a hat, even though it is exactly the same image, just rotated into a vertical position. Let’s try another position.

Network     Classification   Score
vgg16       buckle           0.3146
resnet18    loafer           0.4154
alexnet     cowboy_hat       0.4829
densenet    holster          0.7630
inception   clog             0.3004

Upside Down Hat Classification

With the hat flipped upside down, the classifications shuffle once again, and just like the previous example, alexnet is the only network still classifying it as a hat. Let’s get a bit trickier and shift the perspective of the hat.
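
If you wanted to experiment with this kind of viewpoint change synthetically, torchvision's RandomPerspective transform is one option; this is only an approximation of re-photographing the object from a new angle, and the distortion_scale value here is arbitrary:

from torchvision import transforms

# Apply a perspective warp to the original photo (p=1.0 so it always fires).
warp = transforms.RandomPerspective(distortion_scale=0.6, p=1.0)
warped_hat = warp(hat)
results = multi_predict(xform_image(warped_hat))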

Network     Classification   Score
vgg16       shield           0.7054
resnet18    shield           0.3247
alexnet     loudspeaker      0.2389
densenet    shield           0.5714
inception   shield           0.3999

Hat Inside Classification

That’s interesting: now all of the networks classify the hat as something other than a hat. As humans, we understand that no matter how you orient the object, flip it, or look at it from different angles, it’s still a hat; the neural networks we used don’t understand this.

Takeaway

The takeaway here is that whether you are a security professional or a developer, you should expect the unexpected. We shouldn’t assume that just because a network or implementation performs well under certain circumstances, that performance will generalize. The real world often doesn’t resemble the ideal conditions of a test environment.
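
One inexpensive way to test for this kind of failure is a simple orientation sweep: run the same object through the models at several rotations and flag any angle where the top prediction changes. A rough sketch, reusing the hat image and the illustrative summarize helper from earlier:

# Check whether each network's top-1 label is stable across rotations.
for angle in (0, 45, 90, 135, 180, 225, 270, 315):
    rotated = hat.rotate(angle, expand=True)
    preds = summarize(multi_predict(xform_image(rotated)), imagenet_classes)
    print(angle, preds)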

If your goal is merely creating a fun hat detector, then sure, the stakes are pretty low when you get things wrong. But what happens when the use case is something more critical? Far too often, datasets have problems that stem from any number of factors, such as the geographic regions a system’s training data was collected in, along with countless other issues that are rarely considered during the initial training process.

Unintended health and safety consequences can result from these unexpected perspectives of objects. Does a drone see a school bus from the same perspective you would from a car? The conditions in which objects are encountered in the real world don’t necessarily resemble the way they were presented in the training environment.

Here is one final example to drive the point home. Imagine a drone flying overhead to identify an accident and direct emergency services, but it finds a cannon instead.

Network     Classification   Score
vgg16       cannon           0.3462
resnet18    tractor          0.2012
alexnet     tank             0.4665
densenet    thresher         0.1893
inception   motor_scooter    0.5318

Car Crash Classification

Conclusion

Far too often, we as humans tend to think of AI and its associated disciplines as highly accurate, but that’s just not the case. These systems are wrong all the time, especially when confronted with something new, strange, or simply not in the position the system is expecting. Whether you are a security professional, data scientist, or developer, you need to prepare for this eventuality, test for it, and understand the impact when systems are confronted with strange inputs.

References

Alcorn et al., “Strike (with) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects,” CVPR 2019.
