I’m fascinated by system failures and oddities, especially when systems are operating under normal conditions. A while back I saw a slide by Ian Goodfellow that made me laugh: random noise was fed into a pre-trained network, and the network dutifully did its job, classifying the noise as a known object. That raised a few questions. Will different pre-trained networks see the same thing? Different things? What percentage of the time? How confident can a network be that random noise is a particular object? And on top of all that, what is the neural network actually “seeing”?
This post and the associated code are the result of my curiosity about these unexpected results. Fortunately, PyTorch makes experiments like this easy. To visualize why the networks classify the noise the way they do, we’ll use Captum, a model interpretability framework. The full code can be seen in the following GitHub repository.
Relevance
Before we begin, you may wonder why any of this is relevant. In many cases, developers aren’t building models from scratch. They reach for frameworks and pre-trained networks obtained from a model zoo as a starting point. This saves time, since you don’t need to collect the data and do the legwork of the initial training. However, it also means that unexpected problems have a way of cropping up in strange places. Depending on the model’s use and function, this can have both security and safety impacts.
Pre-trained Models
Pre-trained models are easy to instantiate, and you can quickly send them data for classification. With these models, you don’t have to write the model definition or perform the training yourself; that is done for you ahead of time, so they are ready for use as soon as you instantiate them. The pre-trained models in the Torchvision library were trained on the ImageNet dataset, which consists of 1000 categories. The thing to remember here is that this training was for a single object in an image, not a complex scene with multiple objects, which makes for some interesting results, but that’s a topic for another time. Pulling in a pre-trained model from PyTorch’s Torchvision library is easy: it’s a matter of importing the selected model with the pretrained parameter set to True. I also set the model to evaluation mode, since there won’t be any training happening during these tests, just inference.
To start, I have a line of code that sets the device to cuda or cpu depending on whether a GPU is available. A GPU isn’t necessary for these simple tests, but since I have one in my machine, I use it.
device = "cuda" if torch.cuda.is_available() else "cpu"
import torchvision.models as models
vgg16 = models.vgg16(pretrained=True)
vgg16.eval()
vgg16.to(device)
A list of the Torchvision pre-trained models is available here. For my tests, I didn’t want to use all of the pre-trained networks, because that would get excessive, so I selected the following five:
- vgg16
- resnet18
- alexnet
- densenet
- inception
I didn’t employ any particular methodology in selecting these networks. VGG16 and Inception show up quite a bit in examples, and the five architectures are all different from one another, which was the biggest factor.
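Since all five networks get the same treatment, it can be handy to instantiate them in one place. This is a minimal sketch under a couple of assumptions: the post doesn’t say exactly which densenet or inception variants were used, so densenet121 and inception_v3 are my guesses, and the model_fns / pretrained_models names are mine, not from the original repository.

import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Short name -> torchvision constructor (densenet121 and inception_v3 are assumptions).
model_fns = {
    "vgg16": models.vgg16,
    "resnet18": models.resnet18,
    "alexnet": models.alexnet,
    "densenet": models.densenet121,
    "inception": models.inception_v3,
}

# Instantiate each network with pre-trained ImageNet weights, in evaluation mode, on the chosen device.
pretrained_models = {name: fn(pretrained=True).eval().to(device) for name, fn in model_fns.items()}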
Generating Images with Noise
We need a way to automatically generate images of noise that we can feed to the neural networks. For this task, I used a combination of the NumPy and PIL libraries and wrote a small function that returns an image containing random noise.
import numpy as np
from PIL import Image

def gen_image():
    # Fill a 256x256 RGB array with random values and convert it to a PIL image.
    image = (np.random.standard_normal([256, 256, 3]) * 255).astype(np.uint8)
    im = Image.fromarray(image)
    return im
The result of this function is an image like the one below.

Transforming Images
Next, we have to perform image transformations on our noise, convert it to a tensor, and normalize it. The following code is intended to work not only on our random noise but also on any other image we might want to feed into our pre-trained networks (hence the Resize and CenterCrop values).
from torchvision import transforms

def xform_image(image):
    # Resize, center-crop, convert to a tensor, and normalize with the ImageNet statistics.
    transform = transforms.Compose([transforms.Resize(256),
                                    transforms.CenterCrop(224),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224, 0.225])])
    new_image = transform(image).to(device)
    # Add a batch dimension so the tensor has shape [1, 3, 224, 224].
    new_image = new_image.unsqueeze_(0)
    return new_image
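As a quick usage check, the two helpers chain together like this (just a sketch using the functions defined above):

# Generate a noise image and turn it into a normalized, batched tensor.
noise_image = gen_image()
image_xform = xform_image(noise_image)
print(image_xform.shape)  # torch.Size([1, 3, 224, 224])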
Getting Predictions
Once we have our transformed images, it’s easy to get predictions from an instantiated model. In this case, we assume the return value of our xform_image function is called image_xform. In the code I used for testing, I broke this up into two different functions, but for simplicity I’ll lump it together here. Basically, we feed the transformed image into the network, run the result through a softmax function, and use the topk function to retrieve the score and predicted label ID of the single top result.
import torch.nn.functional as F

with torch.no_grad():
    vgg16_res = vgg16(image_xform)
    # Convert raw logits into probabilities, then take the single top class and its score.
    vgg16_output = F.softmax(vgg16_res, dim=1)
    vgg16score, pred_label_idx = torch.topk(vgg16_output, 1)
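The class names reported later come from mapping the predicted index back to the ImageNet label list. The sketch below assumes a local imagenet_classes.txt file with one label per line; the file name and loading approach are my assumption, not something shown in the original post.

# Load the 1000 ImageNet class names, one per line (the file path is an assumption).
with open("imagenet_classes.txt") as f:
    imagenet_labels = [line.strip() for line in f]

# Map the predicted index to a human-readable label.
print(f"{imagenet_labels[pred_label_idx.item()]}: {vgg16score.item():.4f}")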
Results
So, now that we have a basic idea of how to generate these images of noise and feed them to a pre-trained network, what were the results? For this test, I decided to generate 1000 images of noise, run them through the five selected pre-trained networks, and put the results in a Pandas dataframe for quick analysis (a rough sketch of that loop is shown below, followed by summary statistics of the top-1 confidence scores). The results were pretty interesting and a bit unexpected.
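Here is a rough sketch of what that loop might look like, assuming the gen_image and xform_image helpers above, plus the pretrained_models dictionary and imagenet_labels list from my earlier sketches; the column names and the top1 helper are my own, and the original repository organizes this differently.

import pandas as pd
import torch
import torch.nn.functional as F

def top1(model, image_xform):
    # Return the top-1 confidence and predicted label index for a single image.
    with torch.no_grad():
        output = F.softmax(model(image_xform), dim=1)
    score, label_idx = torch.topk(output, 1)
    return score.item(), label_idx.item()

rows = []
for _ in range(1000):
    image_xform = xform_image(gen_image())
    row = {}
    for name, model in pretrained_models.items():
        score, label_idx = top1(model, image_xform)
        row[name] = score                                  # top-1 confidence
        row[name + "_label"] = imagenet_labels[label_idx]  # predicted class name
    rows.append(row)

df = pd.DataFrame(rows)
print(df[["vgg16", "resnet18", "alexnet", "densenet", "inception"]].describe())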
| | vgg16 | resnet18 | alexnet | densenet | inception |
|---|---|---|---|---|---|
| count | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 | 1000.000000 |
| mean | 0.226978 | 0.328249 | 0.147289 | 0.409413 | 0.020204 |
| std | 0.067972 | 0.071808 | 0.038628 | 0.148315 | 0.016490 |
| min | 0.074922 | 0.127953 | 0.061019 | 0.139161 | 0.005963 |
| 25% | 0.178240 | 0.278830 | 0.120568 | 0.291042 | 0.011641 |
| 50% | 0.223623 | 0.324111 | 0.143090 | 0.387705 | 0.015880 |
| 75% | 0.270547 | 0.373325 | 0.171139 | 0.511357 | 0.022519 |
| max | 0.438011 | 0.580559 | 0.328568 | 0.868025 | 0.198698 |
As you can see from the results, some of these networks predicted that the noise was something with a fairly high level of confidence; both resnet18 and densenet had max values over 50%. This is all well and good, but what are these networks seeing in the noise? Interestingly enough, they weren’t seeing the same things.
Vgg16 Results: stole 978, jellyfish 14, coral_reef 7, poncho 1
Resnet18 Results: jellyfish 1000
Alexnet Results: poncho 942, dishrag 58
Densenet Results: chainlink_fence 893, window_screen 37, chain_mail 33, doormat 20, tile_roof 16, space_heater 1
Inception Results: switch 155, magpie 123, jigsaw_puzzle 102, pillow 85, jean 83, indigo_bunting 81, birdhouse 69, honeycomb 32, poncho 26, carton 25, mousetrap 24, sarong 18, corn 16, chain_mail 16, vacuum 12, window_screen 12, cardigan 11, American_egret 9, broccoli 9, wallet 8, sandbar 7, bee_eater 5, ice_lolly 5, dishwasher 5, hammerhead 5, chainlink_fence 4, apiary 4, nail 4, rain_barrel 4, ashcan 3, jersey 3, bib 3, little_blue_heron 3, cockroach 3, envelope 2, stingray 2, shower_curtain 2, apron 2, starfish 2, miniskirt 1, mitten 1, Italian_greyhound 1, matchstick 1, binder 1, loudspeaker 1, bucket 1, ear 1, shoe_shop 1, handkerchief 1, tray 1, walking_stick 1, sweatshirt 1, dishrag 1, centipede 1, kimono 1
All of these networks saw something different. Resnet18 called the noise a jellyfish 100% of the time, while Inception, on the other hand, had very low confidence in any of its predictions but saw far more distinct objects than any of the other networks.
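If the predicted class names are kept in the dataframe as in the sketch above, the per-network tallies can be pulled out with value_counts (the column names again being my assumption):

# Count how often each network assigned each ImageNet class to the 1000 noise images.
for name in ["vgg16", "resnet18", "alexnet", "densenet", "inception"]:
    print(f"{name} results:")
    print(df[name + "_label"].value_counts())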
Just for fun, I decided to see what kind of caption Microsoft would add to the image of noise near the beginning of this blog post. For this test, I went with the path of least friction and used the Office 365 version of PowerPoint. The result is interesting because, unlike the ImageNet models, which try to classify a single object, PowerPoint tries to identify multiple objects in order to produce an accurate caption.

The result does not disappoint. To me, it looks like the image of noise was interpreted as a circus.
Perspective
This leads us to another question: what is the neural network seeing that makes it think noise is an object? For this, we can use a tool focused on model interpretability to give us an idea of what the network is “seeing”. Captum is a model interpretability framework for PyTorch. I didn’t do anything too fancy here; I just used the code provided in the tutorials section of the website for my examples. I did add the internal_batch_size parameter with a value of 50, because without batching I ran out of memory on my GPU rather quickly.
For these visualizations, I used two gradient-based attribution methods and one occlusion-based attribution method. With these visualizations, we are trying to see what was important to the classifier, in an attempt to “see” what the network sees. I used my pre-trained resnet18 model here, but you can change the code to use any of the other pre-trained models.
Before we get to an image of noise, I used an image of a daisy as a visual reference, since its features are easy to identify.
from captum.attr import IntegratedGradients
from captum.attr import visualization as viz
from matplotlib.colors import LinearSegmentedColormap

# Classify the image and keep the top predicted label for attribution.
result = resnet18(image_xform)
result = F.softmax(result, dim=1)
score, pred_label_idx = torch.topk(result, 1)

# Attribute the prediction to the input pixels with Integrated Gradients.
integrated_gradients = IntegratedGradients(resnet18)
attributions_ig = integrated_gradients.attribute(image_xform, target=pred_label_idx,
                                                 internal_batch_size=50, n_steps=200)

default_cmap = LinearSegmentedColormap.from_list('custom blue',
                                                 [(0, '#ffffff'),
                                                  (0.25, '#000000'),
                                                  (1, '#000000')], N=256)

_ = viz.visualize_image_attr(np.transpose(attributions_ig.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                             np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                             method='heat_map',
                             cmap=default_cmap,
                             show_colorbar=True,
                             sign='positive',
                             outlier_perc=1)

from captum.attr import NoiseTunnel

# Smooth the attributions by averaging Integrated Gradients over noisy copies of the input.
noise_tunnel = NoiseTunnel(integrated_gradients)
attributions_ig_nt = noise_tunnel.attribute(image_xform, n_samples=10, nt_type='smoothgrad_sq',
                                            target=pred_label_idx, internal_batch_size=50)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_ig_nt.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      cmap=default_cmap,
                                      show_colorbar=True)

from captum.attr import Occlusion

# Occlude patches of the image and measure how the prediction score changes.
occlusion = Occlusion(resnet18)
attributions_occ = occlusion.attribute(image_xform,
                                       strides=(3, 8, 8),
                                       target=pred_label_idx,
                                       sliding_window_shapes=(3, 15, 15),
                                       baselines=0)

_ = viz.visualize_image_attr_multiple(np.transpose(attributions_occ.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      np.transpose(image_xform.squeeze().cpu().detach().numpy(), (1, 2, 0)),
                                      ["original_image", "heat_map"],
                                      ["all", "positive"],
                                      show_colorbar=True,
                                      outlier_perc=2)

Noise Visualization
Now that we’ve seen these visualizations generated for a daisy, it’s time to look at what they show for our random noise.
For reference, we are using the resnet18 pre-trained network, and for this particular image it is 40% certain it is a jellyfish. I won’t repeat the code here; the code for the visualizations is the same as above.




As we can see from these visualizations, it’s still not clear to us as humans why the network thought this was a jellyfish. There are areas where the network placed more importance, but they are nowhere near as defined as what we saw in the daisy example. Then again, unlike a daisy, jellyfish are amorphous and vary in their levels of transparency.
You might be wondering what these visualizations would look like on an actual image of a jellyfish. With the code I’ve provided in the GitHub repository, it’s easy to see and compare.
Conclusion
From this post, it’s easy to see how readily neural networks can be fooled by feeding them unexpected inputs. To these networks’ credit, they served their purpose and returned a result as best they could. We can also see from the results that merely filtering out low-confidence predictions may not be a valid countermeasure, since some of the predictions had reasonably high confidence. When these models are implemented in applications, we need to be mindful of how easily they fail, so that someone feeding our systems strange and unexpected input does not catch us off guard; feeding systems strange input is something security professionals have been doing for quite some time.