This is a guest post by Aleksandr Mylnikov, who did his semester project under JP Aumasson during his master’s at EPFL, co-supervised by Prof. Arjen Lenstra. This post summarizes part of his work. Thanks, Alex!
This part-time research project started in February 2017 and finished in mid-June 2017. The goal was to understand WhatsApp’s network architecture as well as undocumented internals of the Android application, by doing basic network traffic analysis and reverse engineering using open-source tools.
Throughout the project we observed changes in the network infrastructure used by WhatsApp. As of April 2017, it looked like this:
In particular, our observations suggest that:
- Most of the WhatsApp messaging infrastructure is hosted on IBM-owned SoftLayer.
- Connections to exchange servers (e-servers) use the Noise Pipes suite of protocols for transport encryption. This notably protects the (e2ee) content of text messages and service commands from interception. According to our experiments, pinning is properly enforced.
- The API server, used for registration and user authorization at the start of the application, works via HTTPS, sometimes with an insecure TLS configuration. The API regularly changes.
- Static content servers store encrypted media content for a long period of time. As of April 2017, only small media files (under 16MB) were uploaded to static content servers, and receivers downloaded them directly from the same server.
- A Facebook CDN server was used (as of April 2017) to store larger media content (over 16MB).
- Google Firebase is used for delivery of instant notifications to end devices and to fetch updates from exchange servers. Google could therefore directly learn information on WhatsApp user activity.
We found that, as long as small media files are stored directly on static content servers, basic traffic analysis on a small or medium-sized network can reveal who’s chatting with whom within that network. In June 2017, however, we observed that all media content was stored on the Facebook CDN, which makes such traffic analysis harder.
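To give an idea of the kind of correlation such traffic analysis could rely on, here is a minimal Python sketch. It is not the method used in the project: it assumes an observer on the local network who can log, for each device, the direction, time, and approximate size of transfers with a static content server, and who pairs each upload with a later download of similar size by another device. The `Transfer` record, the `pair_transfers` function, the size tolerance, and the time window are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transfer:
    client: str       # local address of the device (hypothetical label)
    direction: str    # "up" for an upload, "down" for a download
    timestamp: float  # seconds since the start of the capture
    size: int         # observed payload size in bytes


def pair_transfers(
    transfers: List[Transfer],
    size_tolerance: int = 256,   # assumed slack for protocol overhead
    max_delay: float = 60.0,     # assumed upload-to-download window, seconds
) -> List[Tuple[str, str]]:
    """Return (sender, receiver) guesses based on size and timing only."""
    uploads = [t for t in transfers if t.direction == "up"]
    downloads = [t for t in transfers if t.direction == "down"]
    pairs = []
    for up in uploads:
        for down in downloads:
            delay = down.timestamp - up.timestamp
            same_size = abs(down.size - up.size) <= size_tolerance
            if down.client != up.client and 0 <= delay <= max_delay and same_size:
                pairs.append((up.client, down.client))
    return pairs


# Made-up capture: one upload, one matching download, one unrelated download.
capture = [
    Transfer("10.0.0.2", "up", 0.0, 482_113),
    Transfer("10.0.0.7", "down", 3.5, 482_160),
    Transfer("10.0.0.9", "down", 4.0, 91_004),
]
print(pair_transfers(capture))  # [('10.0.0.2', '10.0.0.7')]
```

On a busy network a heuristic like this would likely produce false positives, so it should be read as an illustration of the principle rather than a working attack.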
Android App Internals
We chose the Android app rather than the iOS one because it’s much easier to “reverse engineer”, by decompiling bytecode to Java.
Native libraries include the following, whose purpose is obvious from the name except for libwhatsapp: libcurve25519.so, libpl_droidsonroids_gif_surface.so, libvlc.so, libwhatsapp.so, libpl_droidsonroids_gif.so, libresample.so, libwa_dalvik.so. As the names suggest, these are used for performance-critical operations such as elliptic-curve arithmetic or audio/video encoding and decoding. We observed that exploit mitigations (e.g. stack canaries and FORTIFY_SOURCE) are not enabled consistently across libraries.
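One rough way to spot-check these two mitigations in a shared library is to look for the symbols they usually leave behind: code built with stack protection references `__stack_chk_fail`, and fortified builds call wrappers named like `__memcpy_chk`. The Python sketch below applies that heuristic to raw bytes. It is only an approximation (dedicated tools inspect the ELF symbol tables properly), it is not necessarily how the project checked, and the `check_mitigations` helper and the sample byte strings are made up for illustration.

```python
import re
from typing import Dict

# Symbol referenced by code compiled with stack canaries enabled.
STACK_CHK = b"__stack_chk_fail"
# Fortified libc wrappers are named like __memcpy_chk, __strcpy_chk, ...
# The trailing \b keeps __stack_chk_fail from matching this pattern.
FORTIFY_RE = re.compile(rb"__[a-z0-9]+_chk\b")


def check_mitigations(elf_bytes: bytes) -> Dict[str, bool]:
    """Heuristically report whether mitigation-related symbols are present."""
    return {
        "stack_canary": STACK_CHK in elf_bytes,
        "fortify_source": FORTIFY_RE.search(elf_bytes) is not None,
    }


# Synthetic stand-ins for symbol string tables, not real library contents.
hardened = b"\x00__stack_chk_fail\x00__memcpy_chk\x00memcpy\x00"
plain = b"\x00memcpy\x00strcpy\x00"

print(check_mitigations(hardened))  # {'stack_canary': True, 'fortify_source': True}
print(check_mitigations(plain))     # {'stack_canary': False, 'fortify_source': False}
```

For a real library extracted from the APK, the same function can be fed the file contents, for example `check_mitigations(open(path, "rb").read())`, where `path` points to one of the `.so` files.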
The Signal protocol is not implemented natively, however. Instead, the app uses a slightly modified version of libsignal-protocol-java, which according to our review behaves similarly to the open-source (GPL) version. We could also verify the implementation of Noise Pipes for transport-level encryption.
The app requests tons of permissions and features, some of which are surprising, such as android.hardware.bluetooth, android.permission.BLUETOOTH, or android.permission.SEND_SMS. WhatsApp could therefore surreptitiously send out SMS messages, a potential way to bypass the e2ee of messages, though we did not find any malicious use of these permissions.
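Readers who want to audit such declarations themselves can list them from a decoded `AndroidManifest.xml` (the manifest inside an APK is stored as binary XML, so it first has to be decoded with a reverse-engineering tool). The following Python sketch assumes the decoded manifest is available as text. The `list_declarations` helper and the sample manifest, including its `com.example.app` package name, are hypothetical and not taken from the WhatsApp APK.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace used for android:* attributes in manifest files.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = f"{{{ANDROID_NS}}}name"


def list_declarations(manifest_xml: str) -> Dict[str, List[str]]:
    """Collect the names of <uses-permission> and <uses-feature> entries."""
    root = ET.fromstring(manifest_xml)
    return {
        "permissions": [e.get(NAME_ATTR, "") for e in root.iter("uses-permission")],
        "features": [e.get(NAME_ATTR, "") for e in root.iter("uses-feature")],
    }


# Made-up manifest fragment for illustration only.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.BLUETOOTH"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-feature android:name="android.hardware.bluetooth"/>
</manifest>"""

result = list_declarations(SAMPLE_MANIFEST)
print(result["permissions"])
print(result["features"])
```

Separating the two lists is a deliberate choice: names under `android.permission.` are permissions, while names under `android.hardware.` are declared as features.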
The app includes hard-coded IP addresses of e-servers, to be used when DNS servers are unreachable or unavailable. However, it may still attempt to connect to a fake e-server if, for example, a malicious local DNS resolver returns an incorrect address. We did not find hard-coded data for infrastructure servers other than e-servers.
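The behavior described above can be sketched in a few lines of Python. This is an illustration of the general fallback pattern under our own assumptions, not the app's actual logic: the `candidate_addresses` function, the two stub resolvers, the host name, and the addresses (taken from ranges reserved for documentation) are all hypothetical.

```python
import socket
from typing import Callable, List

# Placeholder addresses from a documentation range, NOT real e-server IPs.
FALLBACK_ADDRESSES = ["192.0.2.10", "192.0.2.11"]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a host name to IPv4 addresses with the system resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def candidate_addresses(
    hostname: str,
    resolve: Callable[[str], List[str]] = system_resolver,
    fallback: List[str] = FALLBACK_ADDRESSES,
) -> List[str]:
    """Prefer DNS answers; use the hard-coded list only if DNS gives nothing."""
    try:
        addresses = resolve(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        addresses = []
    return addresses if addresses else list(fallback)


def failing_resolver(hostname: str) -> List[str]:
    raise socket.gaierror("DNS unreachable")  # simulates an outage


def lying_resolver(hostname: str) -> List[str]:
    return ["203.0.113.66"]  # simulates a malicious local resolver


print(candidate_addresses("e.example.invalid", resolve=failing_resolver))
# ['192.0.2.10', '192.0.2.11']
print(candidate_addresses("e.example.invalid", resolve=lying_resolver))
# ['203.0.113.66']
```

The second call shows why a hard-coded list alone does not stop a lying resolver: the fallback is only consulted when DNS returns nothing. Whether a connection to the wrong address can succeed then depends on the server being authenticated, for example through the pinning mentioned earlier.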
Media files are by default automatically saved to the public partition (e.g. the SD card) of the device. All applications can therefore read the received files. Other applications such as Wire, Signal, or even Telegram only save a picture to the “gallery” on demand.
In this small project we only scratched the surface of WhatsApp’s security and, as mentioned, observations that are valid today may not be so tomorrow. Nonetheless, we found that WhatsApp’s crypto and security protocols look sound, but that user activity information goes not only through Facebook-owned infrastructure (obviously) but also through IBM and Google systems. Security–convenience tradeoffs tend to favor the latter, which probably makes sense given the WhatsApp user base but isn’t optimal for more security-savvy users. The code base is complex and the app requires many permissions.