Fingerprinting MySQL with scannerl

This blog post walks through the implementation of a fingerprinting module for scannerl that identifies the version of MySQL running on remote servers. MySQL is a very popular open-source relational database management system. We released the module and ran a full scan of the IPv4 address space, with very interesting results shown at the end of this post.

Scannerl is modular, which makes it easy to implement new modules for specific fingerprinting needs. Modules have to be implemented in Erlang. For complete documentation on how to implement your own fingerprinting modules, see the scannerl wiki page.

The goal here is to identify the version of MySQL running on a remote host. Some documentation is available to help understand how the MySQL protocol works.

As mentioned in the documentation, once the TCP handshake on the MySQL port (3306 by default) is established, MySQL sends a greeting message containing various information about the server, including the version string. So let’s get started …
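Outside of scannerl, this "server speaks first" behaviour can be observed with a few lines of plain Erlang. The following is only a minimal sketch: the module name `greeting_grab` and the function `grab/3` are made up for illustration, and a single `recv` may return only part of the greeting on a slow link.

```erlang
-module(greeting_grab).
-export([grab/3]).

%% Connect to Host:Port, send nothing, and return the first bytes the
%% server sends on its own (for MySQL, the greeting packet).
%% Returns {ok, Binary} or {error, Reason}.
grab(Host, Port, Timeout) ->
  case gen_tcp:connect(Host, Port, [binary, {active, false}], Timeout) of
    {ok, Sock} ->
      %% Length 0 means "whatever is available"; wait at most Timeout ms.
      Res = gen_tcp:recv(Sock, 0, Timeout),
      gen_tcp:close(Sock),
      Res;
    {error, _Reason} = Err ->
      Err
  end.
```

For example, `greeting_grab:grab("127.0.0.1", 3306, 3000)` against a local MySQL instance should return `{ok, Bin}` where `Bin` starts with the packet header described later in this post.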

Templates are provided in the scannerl repository to get started. The TCP fingerprinting skeleton is the template used to implement this module.

A few fields need to be filled in before getting into coding. The module directive should reflect the name of the source file without its .erl extension, here fp_mysql_greeting.

-module(fp_mysql_greeting).
-define(TIMEOUT, 3000). % milliseconds
-define(PORT, 3306). % mysql port
-define(MAXPKT, 1). % only greeting packet is needed
-define(DESCRIPTION, "TCP/3306: Mysql version identification").

Fields specific to the module, such as the port and the description, also need to be completed. The DESCRIPTION definition above is displayed when modules are listed with scannerl using the -l switch. The MAXPKT field tells scannerl that we expect a single packet in return and that any additional packets must be ignored. This speeds up the process when the remote service is not the expected one or is sending a large quantity of data over multiple packets. By setting it to 1 we can stop the exchange more quickly and dedicate the freed resources to fingerprinting other hosts.

Next, the callback function needs to be implemented. This is the central part of the fingerprinting module: it tells scannerl what to do next when fingerprinting a specific host. For additional information on the callback_next_step function and its usage, see this wiki page.

callback_next_step(Args) when Args#args.moddata == undefined ->
  {continue, Args#args.maxpkt, "", true};
callback_next_step(Args) when Args#args.packetrcv < 1 ->
  {result, {{error, up}, timeout}};
callback_next_step(Args) ->
  {result, parse_header(Args, Args#args.datarcv)}.

The first part of the function is called when scannerl finishes the TCP handshake and looks for instructions on what to do next. Here we tell it to continue and send nothing to the remote host (an empty payload, ""), since we expect the MySQL service to send us the greeting message first. The returned tuple allows for user data as its last element. This is useful when specific information needs to be stored until the next call to callback_next_step, where it is available through the record field Args#args.moddata. For this module no data is needed, so the boolean true is set there as a placeholder in order to avoid entering this first part on the next call to callback_next_step.

Scannerl won’t send anything to the remote host and will wait for the number of milliseconds defined in TIMEOUT. If no packet is received within TIMEOUT, the second part of the function is called and matches, since Args#args.packetrcv, which contains the number of packets received, is 0. This makes it possible to identify unresponsive hosts and return an error, here timeout. For more information on the format of the result, see this page.

If the remote service sends data, the third part of the function matches. Here the data received (available in Args#args.datarcv) needs to be parsed to determine whether we are dealing with valid MySQL data. The function parse_header is called with the received payload.

parse_header(Args, <<
    Pldlen:24/little, % 3 bytes payload length
    _:8,              % 1 byte sequence id or packet nb
    Pld/binary        % content
  >>) ->
  case byte_size(Pld) == Pldlen of
    true ->
      parse_content(Args, Pld);
    false ->
      {{error, up}, unexpected_data}
  end;
parse_header(_Args, _) ->
  {{error, up}, unexpected_data}.

The first part of this function will try to match the received payload against a specific layout. If it doesn’t match (and thus the data received is not what was expected) the second part will match and will return an error of type unexpected_data.

The format of a MySQL packet is defined on this page. Every MySQL packet starts with a header made of 3 bytes containing the payload length, followed by 1 byte of sequence id (sometimes also called the packet number). The payload length is bound to the variable Pldlen (3 bytes of 8 bits = 24 bits) in little endian, while the sequence id is ignored. The rest of the data is stored in the variable Pld. The version string is expected to be found in this data.

In order to make sure the data is valid, the size of the rest of the payload (stored in Pld) is compared to the value provided in the Pldlen variable. If that doesn’t match, an error is returned. If it does match, then the content is parsed in the function parse_content.
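To make the header layout concrete, here is a small standalone sketch, independent of scannerl, that splits a hand-built packet into its sequence id and payload. The module name `mysql_hdr_demo`, its functions, and the sample bytes are made up for illustration.

```erlang
-module(mysql_hdr_demo).
-export([split_packet/1, sample/0]).

%% Split a raw MySQL packet into {SequenceId, Payload}.
%% Returns the atom unexpected_data if the header is too short or the
%% announced length does not match the number of payload bytes present.
split_packet(<<Len:24/little, Seq:8, Pld/binary>>)
    when byte_size(Pld) =:= Len ->
  {Seq, Pld};
split_packet(_) ->
  unexpected_data.

%% Hand-built sample: length 5 (bytes 5,0,0 in little endian),
%% sequence id 0, then 5 bytes of payload.
sample() ->
  <<5, 0, 0, 0, "hello">>.
```

Calling `mysql_hdr_demo:split_packet(mysql_hdr_demo:sample())` yields the sequence id and the payload, while a packet announcing a length of 6 with the same 5 bytes is rejected.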

-define(PROTO, 16#0a). % only interested in protocol version 10
parse_content(_Args, <<
    ?PROTO:8,   % protocol version
    Rest/binary % null-terminated version string and remaining fields
  >>) ->
  case find_null(Rest, 1) of
    error ->
      {{error, up}, unexpected_data};
    {Version, _Bin} ->
      {{ok, result}, [binary_to_list(Version)]}
  end;
parse_content(_Args, _) ->
  {{error, up}, unexpected_data}.

Here we are only interested in version 10 of the MySQL protocol (0x0a in hex). The first part of the parse_content function therefore only matches if the first byte (8 bits) equals that value. The rest of the binary should begin with the server version as a null-terminated string. The helper function below is used to split the version from the rest of the binary data.

find_null(Bin, Pos) ->
  case Bin of
    <<Start:Pos/binary, 0, Rest/binary>> ->
      % found
      {Start, Rest};
    _ when byte_size(Bin) =< Pos ->
      % not found (no bytes left to inspect, also covers an empty binary)
      error;
    <<_:Pos/binary, _/binary>>=B ->
      % go forward
      find_null(B, Pos+1)
  end.

The find_null function will go through the binary data until a null byte is found. It will then return a tuple made of the part of the binary containing the string as its first element and the rest of the binary data as the second. If no null byte is found, the function will return an error.
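As a side note, the same split can be written with the standard library function binary:split/2, which splits at the first occurrence of a pattern. This is an alternative sketch, not the code used in the module; the module name `split_null_demo` and the function `split_at_null/1` are hypothetical.

```erlang
-module(split_null_demo).
-export([split_at_null/1]).

%% Split a binary at the first null byte.
%% Returns {BeforeNull, AfterNull}, or error when no null byte is present.
split_at_null(Bin) when is_binary(Bin) ->
  case binary:split(Bin, <<0>>) of
    [Str, Rest] -> {Str, Rest};  % null byte found, it is dropped
    [_NoNull]   -> error         % binary returned unchanged: no null byte
  end.
```

One difference in behaviour: find_null started at position 1 requires at least one byte before the null, whereas this variant also accepts an empty string.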

The version is then converted to a string with Erlang’s built-in function binary_to_list. This is what is returned to scannerl and printed in case of success.

The entire module is available on github in scannerl’s repository under

We ran a fingerprinting scan using this module on the entire IPv4 address space to identify the versions of MySQL in use. For this we used a cluster of 30 VMs and leveraged scannerl’s ability to scale and distribute its work over multiple hosts. The list of open TCP/3306 ports contained 7’763’490 targets. The scan took 10 minutes and 33 seconds. Below is a short overview of the results.

The fingerprinting scan returned 2.5 million MySQL servers. In comparison, Shodan has only 1M MySQL records.

The top 10 versions seen are:

  • 4.50% 5.1.73
  • 2.59% 5.7.19
  • 2.45% 5.5.5
  • 1.51% 5.6.36
  • 1.45% 5.5.58
  • 1.26% 5.6.38
  • 1.22% 5.5.51
  • 1.20% 5.7.20
  • 0.78% 5.5.56
  • 0.74% 5.6.37

The oldest MySQL version found was 3.21.26, which was released before 1999! (Update, anyone?)

It is also interesting to see that MySQL on Ubuntu includes the version of the distribution in the returned string. Based on the results we gathered, 7.7% of the hosts (192’953) were running some version of Ubuntu.

Here are the top 10 Ubuntu versions identified:

  • 75629 14.04.1
  • 63871 16.04.1
  • 18827 12.04.1
  • 12370 <no-version-provided>
  • 6495 10.04.1
  • 3747 14.04.2
  • 2451 12.04.2
  • 1175 17.04.1
  • 1090 15.10.1
  • 1070 16.04.2

Ubuntu versions as old as 6.06.2 (released in 2006) were found.
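The kind of post-processing behind the Ubuntu figures can be sketched with Erlang's re module. The snippet below is only an illustration and not necessarily how the numbers above were produced: the module name `ubuntu_ver_demo` is hypothetical, and it assumes version strings of the form "5.5.47-0ubuntu0.14.04.1".

```erlang
-module(ubuntu_ver_demo).
-export([ubuntu_version/1]).

%% Extract the distribution part from a MySQL version string.
%% Assumed format (illustration only): "5.5.47-0ubuntu0.14.04.1".
%% Returns {ok, "14.04.1"}, no_version when "ubuntu" appears without a
%% recognizable distribution version, or not_ubuntu otherwise.
ubuntu_version(Version) ->
  Pattern = "ubuntu\\d+\\.(\\d+\\.\\d+(?:\\.\\d+)?)",
  case re:run(Version, Pattern, [caseless, {capture, [1], list}]) of
    {match, [Dist]} ->
      {ok, Dist};
    nomatch ->
      case re:run(Version, "ubuntu", [caseless]) of
        {match, _} -> no_version;
        nomatch    -> not_ubuntu
      end
  end.
```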

Some results also contained the string MariaDB, which is a fork of MySQL: 12% of the hosts (300’619) returned a version string containing MariaDB.

As shown in this blog post, implementing a new module for scannerl is quite easy and quickly gives concrete results, even on very large target lists. By leveraging scannerl’s ability to distribute the work, we fingerprinted more than 7 million targets in 10 minutes. In this example we limited ourselves to identifying the version of the service, but the parsing could easily be extended to retrieve additional information from the greeting packet (server capabilities, salt, …).
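As an illustration of such an extension, here is a minimal sketch that reads a few more fields following the version string. It assumes the protocol version 10 handshake layout from the MySQL protocol documentation (4-byte connection id, 8 bytes of salt, 1 filler byte, 2 bytes of lower capability flags); the module name `greeting_demo`, the map keys, and the sample bytes are made up for illustration.

```erlang
-module(greeting_demo).
-export([parse_greeting/1, sample/0]).

%% Parse the start of a protocol-10 greeting payload (packet header
%% already removed). Field layout assumed from the MySQL documentation.
parse_greeting(<<16#0a:8, Rest/binary>>) ->
  case binary:split(Rest, <<0>>) of
    [Version, <<ConnId:32/little,   % connection (thread) id
                Salt1:8/binary,     % first 8 bytes of the salt
                _Filler:8,          % filler byte
                CapsLow:16/little,  % lower 16 bits of capability flags
                _/binary>>] ->
      {ok, #{version => binary_to_list(Version),
             connection_id => ConnId,
             salt_part1 => Salt1,
             capabilities_low => CapsLow}};
    _ ->
      {error, unexpected_data}
  end;
parse_greeting(_) ->
  {error, unexpected_data}.

%% Hand-built sample payload, not captured from a real server.
sample() ->
  <<16#0a, "5.7.19", 0, 42:32/little, "abcdefgh", 0, 16#f7ff:16/little>>.
```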

Scannerl is modular and it’s fast, go try it ;-)
