Fooling Neural Networks with Noise

I’m fascinated by system failures and oddities, especially when they happen while systems are operating under normal conditions. A while back I saw a slide by Ian Goodfellow that made me laugh. It showed random noise fed into a pre-trained network, and the network dutifully classified the noise as a known object. That raised a few questions. Will different pre-trained networks see the same thing? Different things? What percentage of the time? What’s the highest level of confidence a network will assign to the random noise being a particular object? And on top of these questions, what is the neural network actually “seeing?”

This post and the associated code are the result of my curiosity about these unexpected results. Fortunately, PyTorch makes experiments like this easy. To visualize why the network classifies objects a certain way, we’ll use Captum, a model interpretability framework. The full code can be seen in the following GitHub repository.

Relevance

Before we begin, you may wonder why any of this is relevant. In many cases, developers aren’t building models from scratch. They reach for frameworks and pre-trained networks obtained from a model zoo as a starting point. This saves time, since you don’t need to collect the data and do the legwork of the initial training. However, it also means that unexpected problems have a way of cropping up in strange places. Depending on the model’s use and function, this could have both security and safety impacts.

Pre-trained Models

Pre-trained models are easy to instantiate and allow you to quickly send them data for classification. With these models, you don’t have to specify the model definition or perform the training yourself; this is done ahead of time, meaning they are ready for use as soon as you have instantiated them. The pre-trained models in the Torchvision library were trained on the ImageNet dataset, which consists of 1,000 categories. The thing to remember here is that this training was for a single object per image, not a complex image with multiple objects, which makes for some interesting results, but that’s a topic for another time. Pulling in a pre-trained model from PyTorch’s Torchvision library is easy: it’s a matter of importing the selected model with the pretrained parameter set to True. I also set the model to evaluation mode, since there won’t be any training happening during these tests, just inference.

To start with, I have a line of code that sets the device to cuda or cpu depending on whether a GPU is available. A GPU isn’t necessary for these simple tests, but since I have one in my machine, I use it.

device = "cuda" if torch.cuda.is_available() else "cpu"
import torchvision.models as models
vgg16 = models.vgg16(pretrained=True)
vgg16.eval()
vgg16.to(device)

A list of the Torchvision pre-trained models is available here. For my tests, I didn’t want to use all of the pre-trained networks, because that would get excessive. I selected the following five networks.

  • vgg16
  • resnet18
  • alexnet
  • densenet
  • inception

I didn’t employ any particular methodology in selecting these networks. VGG16 and Inception are used quite a bit in examples, and the five architectures are all different from one another, which was the biggest factor.
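For reference, here’s a minimal sketch of how all five could be instantiated in one place. The post doesn’t name exact variants for the last two, so the densenet121 and inception_v3 choices below are assumptions; any of the other sizes would work the same way.

import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# densenet121 and inception_v3 are assumed variants, not confirmed above
pretrained_models = {
    "vgg16": models.vgg16(pretrained=True),
    "resnet18": models.resnet18(pretrained=True),
    "alexnet": models.alexnet(pretrained=True),
    "densenet": models.densenet121(pretrained=True),
    "inception": models.inception_v3(pretrained=True),
}

# Inference only: switch every network to evaluation mode and move it to the device
for model in pretrained_models.values():
    model.eval()
    model.to(device)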

Generating Images with Noise

We need a way to automatically generate images of noise that we can feed to the neural networks. For this task, I used a combination of the NumPy and PIL libraries and wrote a small function that returns an image containing random noise.

import numpy as np
from PIL import Image

def gen_image():
    # Sample standard-normal noise, scale it by 255, and cast to uint8 so
    # PIL can build an RGB image from the resulting 256x256x3 array
    image = (np.random.standard_normal([256, 256, 3]) * 255).astype(np.uint8)
    im = Image.fromarray(image)

    return im

The result of this function is an image like the one below.

Transforming Images

Next we have to perform image transformations on our noise, convert it to a tensor, and normalize it. The following code is intended to work not only on our random noise, but also on any other image we might want to feed into our pre-trained networks (hence the Resize and CenterCrop values).

from torchvision import transforms

def xform_image(image):
    # Standard ImageNet preprocessing: resize, center crop, tensor conversion,
    # and normalization with the ImageNet channel means and standard deviations
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])

    new_image = transform(image).to(device)
    new_image = new_image.unsqueeze_(0)  # add a batch dimension

    return new_image
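As a quick usage example, chaining the two helpers produces the image_xform tensor that the next section assumes:

# Generate a noise image and preprocess it into a normalized, batched tensor
noise_image = gen_image()
image_xform = xform_image(noise_image)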

Getting Predictions

Once we have our transformed images, it’s easy to get predictions from an instantiated model. In this case, we assume the return value of our xform_image function is called image_xform. In the code I used for testing, I broke this up into two different functions, but for simplicity, I’ll lump it together here. Basically, we feed the transformed image into the network, run the result through a softmax function, and use the topk function to retrieve the score and predicted label ID of the single top result.

import torch.nn.functional as F

with torch.no_grad():
    vgg16_res = vgg16(image_xform)
    vgg16_output = F.softmax(vgg16_res, dim=1)  # convert logits to probabilities
    vgg16score, pred_label_idx = torch.topk(vgg16_output, 1)
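The topk call only returns an index into the 1,000 ImageNet classes. To turn that into a class name like the ones in the results below, one option is to look the index up in a label list; here’s a minimal sketch, assuming a hypothetical imagenet_classes.txt file with one class name per line in index order:

# Hypothetical labels file: one ImageNet class name per line, in index order
with open("imagenet_classes.txt") as f:
    imagenet_labels = [line.strip() for line in f]

# Map the predicted index to its name and pull the confidence out of the tensor
print(imagenet_labels[pred_label_idx.item()], vgg16score.item())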

Results

So, now that we have a basic idea of how to generate these images of noise and feed them to a pre-trained network, what were the results? For this test, I decided to generate 1,000 images of noise, run them through the five selected pre-trained networks, and put the results in a Pandas DataFrame for quick analysis. The results were pretty interesting and a bit unexpected.
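The test loop looked roughly like the sketch below. It reuses the pretrained_models dict and imagenet_labels list from the earlier sketches, so treat those names as assumptions rather than the exact harness from the repository; describe() produces the summary statistics in the table, and value_counts() produces the per-network label counts shown further down.

import pandas as pd
import torch
import torch.nn.functional as F

scores = {name: [] for name in pretrained_models}
labels = {name: [] for name in pretrained_models}

# Classify 1,000 images of random noise with each of the five networks
for _ in range(1000):
    image_xform = xform_image(gen_image())
    with torch.no_grad():
        for name, model in pretrained_models.items():
            output = F.softmax(model(image_xform), dim=1)
            score, label_idx = torch.topk(output, 1)
            scores[name].append(score.item())
            labels[name].append(imagenet_labels[label_idx.item()])

df = pd.DataFrame(scores)
print(df.describe())                                  # summary statistics below
print(pd.Series(labels["resnet18"]).value_counts())   # label counts per network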

           vgg16     resnet18      alexnet     densenet    inception
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000
mean      0.226978     0.328249     0.147289     0.409413     0.020204
std       0.067972     0.071808     0.038628     0.148315     0.016490
min       0.074922     0.127953     0.061019     0.139161     0.005963
25%       0.178240     0.278830     0.120568     0.291042     0.011641
50%       0.223623     0.324111     0.143090     0.387705     0.015880
75%       0.270547     0.373325     0.171139     0.511357     0.022519
max       0.438011     0.580559     0.328568     0.868025     0.198698
Noise results with different pre-trained networks

As you can see from the results, some of these networks predicted that the noise was something with a pretty high level of confidence; both resnet18 and densenet had max values over 50%. This is all well and good, but what are these networks seeing in the noise? Interestingly enough, they weren’t seeing the same things.

Vgg16 Results:
stole         978
jellyfish      14
coral_reef      7
poncho          1

Resnet18 Results:
jellyfish    1000

Alexnet Results:
poncho     942
dishrag     58

Densenet Results:
chainlink_fence    893
window_screen       37
chain_mail          33
doormat             20
tile_roof           16
space_heater         1

Inception Results:
switch               155
magpie               123
jigsaw_puzzle        102
pillow                85
jean                  83
indigo_bunting        81
birdhouse             69
honeycomb             32
poncho                26
carton                25
mousetrap             24
sarong                18
corn                  16
chain_mail            16
vacuum                12
window_screen         12
cardigan              11
American_egret         9
broccoli               9
wallet                 8
sandbar                7
bee_eater              5
ice_lolly              5
dishwasher             5
hammerhead             5
chainlink_fence        4
apiary                 4
nail                   4
rain_barrel            4
ashcan                 3
jersey                 3
bib                    3
little_blue_heron      3
cockroach              3
envelope               2
stingray               2
shower_curtain         2
apron                  2
starfish               2
miniskirt              1
mitten                 1
Italian_greyhound      1
matchstick             1
binder                 1
loudspeaker            1
bucket                 1
ear                    1
shoe_shop              1
handkerchief           1
tray                   1
walking_stick          1
sweatshirt             1
dishrag                1
centipede              1
kimono                 1

All of these networks saw something different. Resnet18 called the noise a jellyfish every single time, while Inception, on the other hand, had very low confidence in any of its predictions but saw far more distinct objects than any of the other networks.

Just for fun, I decided to see what kind of caption Microsoft would add to the image of noise near the beginning of this blog post. For this test, I went with the path of least friction and used the Office 365 version of PowerPoint. The result is interesting because, unlike the ImageNet models, which try to classify a single object, PowerPoint tries to identify multiple objects to create an accurate description for the caption.

The result does not disappoint. To me, it looks like the image of noise was interpreted as a circus.

Perspective

This leads us to another question: what is the neural network seeing that makes it think noise is an object? For this, we can use a tool focused on model interpretability to get an idea of what the network is “seeing”. Captum is a model interpretability framework for PyTorch. I didn’t do anything too fancy here and just used the code provided in the tutorials section of the Captum website for my examples. I did add the internal_batch_size parameter with a value of 50, because without batching I ran out of GPU memory rather quickly.

For these visualizations, I used two gradient-based attribution methods and an occlusion-based attribution method. With these visualizations, we are trying to see what was important to the classifier in an attempt to “see” what the network sees. I used my pre-trained resnet18 model here, but you can change the code to use any of the other pre-trained models.

Before we get to an image of noise, I used an image of a daisy as a visual reference and demonstration, since its features are easy to identify.

from matplotlib.colors import LinearSegmentedColormap
from captum.attr import IntegratedGradients, NoiseTunnel, Occlusion
from captum.attr import visualization as viz

# resnet18 is assumed to be instantiated and in eval mode, just like vgg16 earlier
result = resnet18(image_xform)
result = F.softmax(result, dim=1)
score, pred_label_idx = torch.topk(result, 1)

# Integrated Gradients, batched with internal_batch_size to avoid exhausting GPU memory
integrated_gradients = IntegratedGradients(resnet18)
attributions_ig = integrated_gradients.attribute(image_xform, target=pred_label_idx, 
                                                 internal_batch_size=50, n_steps=200)

default_cmap = LinearSegmentedColormap.from_list('custom blue', 
                                                 [(0, '#ffffff'),
                                                  (0.25, '#000000'),
                                                  (1, '#000000')], N=256)

_ = viz.visualize_image_attr(np.transpose(attributions_ig.squeeze().cpu().detach().numpy(), (1,2,0)),
                             np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1,2,0)),
                             method='heat_map',
                             cmap=default_cmap,
                             show_colorbar=True,
                             sign='positive',
                             outlier_perc=1)

# SmoothGrad via NoiseTunnel, layered on top of Integrated Gradients
noise_tunnel = NoiseTunnel(integrated_gradients)

attributions_ig_nt = noise_tunnel.attribute(image_xform, n_samples=10, nt_type='smoothgrad_sq', target=pred_label_idx, internal_batch_size=50)
_ = viz.visualize_image_attr_multiple(np.transpose(attributions_ig_nt.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      cmap=default_cmap,
                                      show_colorbar=True)

# Occlusion-based attribution: slide a masking window across the image
occlusion = Occlusion(resnet18)

attributions_occ = occlusion.attribute(image_xform,
                                       strides = (3, 8, 8),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3,15, 15),
                                       baselines=0)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1,2,0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2,
                                     )

Noise Visualization

Now that we’ve seen the visualizations generated with a daisy, it’s time to look at what they look like with our random noise.

For reference, we are using the resnet18 pre-trained network, and for this particular image it is 40% certain that it is a jellyfish. I won’t repeat the code here, but the code for the visualizations is the same as above.

As we can see from these visualizations, as humans it’s still not clear why the network thought this was a jellyfish. There are areas where the network placed more importance, but they aren’t nearly as defined as what we saw in the daisy example. Unlike a daisy, jellyfish are amorphous and vary in their levels of transparency.

You might be wondering what these visualizations would look like on an actual image of a jellyfish. With the code I’ve provided in the GitHub repository, it would be easy to see and compare.

Conclusion

This post shows how easy it can be to fool neural networks by feeding them unexpected inputs. To these networks’ credit, they served their purpose and returned a result as best they could. We can also see from the results that merely filtering out low-confidence predictions may not be a valid countermeasure, since some of the predictions had reasonably high confidence. When we build these systems into applications, we need to be mindful of how easily they fail so that such failures don’t catch us off guard; after all, feeding systems strange and unexpected input is something security professionals have been doing for quite some time.
