Preface

This search interface is designed for experienced developers who already have in-depth knowledge of Nmap's probe system and probe formats, are familiar with low-level file format encodings, and have solid regular expression skills (in particular PCRE). Although most parts are protocol-agnostic, this project focuses in part on the telnet protocol and internet scans of port 23, because the protocol structure, the implementations and the console-oriented, human-readable nature of telnet make it well suited to fingerprinting and rule development. Prior knowledge of the protocol, its applications and particularly the role of its binary (non-ASCII) 3-byte command structure is therefore assumed.

SSH search examples

As an introduction to the search interface and its output structure, consider the following simplistic regular expression rule:

^SSH

Anchored at the start by the ^ operator, this rule matches all devices whose responses start with the string SSH, as required by the SSH protocol. It is therefore a reasonable prefix for a rule fingerprinting a (more specific) SSH protocol version or device.
When the expression is entered in the search interface, 203 existing Nmap rules sharing this common prefix are found at the time of this writing. They are analyzed and presented in a dynamic list which is refreshed whenever the search expression changes. This scrollable presentation deliberately focuses on the regex text to allow quick visual comparison of expression structures, and meta-information is grouped to the right of the screen. The ℹ symbol gives quick access to meta-information imported from the database and is independent of other data. The two bar graphs indicate the predicted number of possible unique responses (left) and the exact number of responses found by full regex matching (right), based on the loaded scan data, currently 100,000 host responses from port 23, deduplicated to about 15,000 unique responses.
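The relationship between the two bar graphs can be illustrated with a minimal sketch. How the interface actually computes its prediction is not documented here, so the sketch simply assumes that the left bar counts unique responses starting with the literal prefix and the right bar counts unique responses matched by the full expression. The function name `count_matches` and the sample responses are hypothetical.

```python
import re

def count_matches(responses, literal_prefix, pattern):
    """Return (prefix_hits, full_hits) over the unique responses.

    Assumption: the 'predicted' bar is a literal prefix comparison,
    the 'exact' bar is a full regex match.
    """
    unique = set(responses)  # deduplicate the raw scan responses
    regex = re.compile(pattern, re.DOTALL)
    prefix_hits = sum(1 for r in unique if r.startswith(literal_prefix))
    full_hits = sum(1 for r in unique if regex.match(r))
    return prefix_hits, full_hits

# Hypothetical sample data, not taken from the real scan.
responses = [
    b"SSH-2.0-OpenSSH_7.4\r\n",
    b"SSH-2.0-OpenSSH_7.4\r\n",        # duplicate, counted only once
    b"SSH-2.0-dropbear_2016.74\r\n",
    b"\xff\xfb\x01\xff\xfb\x03login: ",
]

prefix_hits, full_hits = count_matches(
    responses, b"SSH", rb"^SSH-2\.0-OpenSSH_"
)
print(prefix_hits, full_hits)
```

The prefix count is always an upper bound for the full-match count, which is why a large gap between the two bars hints at a rule that may be narrower than intended.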

SSH's default port is port 22, so it is not surprising that only 89 unique responses match the ^SSH prefix, as administrators have to deliberately change the configuration to use port 23. Due to the brevity of the "static" regex prefix, most existing rules share a prefix with those 89 responses, as seen from the first graph bar. Only a few fingerprints score actual matches, which are displayed in the second bar. One example is a rule matching OpenSSH, which is reasonable given its prevalence on modern hosts.
A notable occurrence with 15 unique responses is a probe identified in the metadata as a Kojoney SSH honeypot. It is plausible, however, that intrusion detection systems such as honeypots are disproportionately represented in unusual port configurations, as they are aimed at port scans such as the one used to capture the raw response data.

In such a narrow search, only a few likely probes (or none) from the database are shown in the probe reference section, as it is based on literal prefix search. The two sections beneath the probe database view are the exact matches, with the matching part indicated in bold face (middle section), and the general superset of responses found using the fixed regex prefix, which is also indicated in bold face (bottom section).
All sections can optionally be hidden using the option buttons in the search bar, e.g. when they are unnecessary or visually overwhelming on broad searches. Like the existing Nmap probes, the real-world responses also carry meta-information about their occurrence. There is also a special overlay designed to simulate console output of printable characters for human-readable parts. For SSH, this is only marginally helpful, but other formats benefit more from this feature.
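As a rough illustration of such a console overlay, the following sketch renders a raw response the way a simple terminal might show it. It is not the overlay's actual implementation: the function name `console_view`, the placeholder character and the handling of carriage returns are assumptions made for this example.

```python
def console_view(response: bytes, placeholder: str = ".") -> str:
    """Render printable ASCII as-is and hide everything else.

    Assumptions: line feeds start a new line, carriage returns are
    dropped, and any other non-printable byte becomes a placeholder.
    """
    out = []
    for byte in response:
        if byte == 0x0A:            # line feed: keep the line break
            out.append("\n")
        elif byte == 0x0D:          # carriage return: not displayed
            continue
        elif 0x20 <= byte < 0x7F:   # printable ASCII range
            out.append(chr(byte))
        else:                       # binary data such as telnet commands
            out.append(placeholder)
    return "".join(out)

print(console_view(b"\xff\xfb\x01\r\nUser:"))
```

For a telnet response the binary command bytes collapse into placeholders while the login prompt remains readable, which is the part a human would see on a console.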

Telnet search examples

As noted in the introduction, telnet exhibits a special 3-byte command system interlaced with ASCII characters. Consider the following rule:

^\xff\xfb\x01\xff\xfb\x03\r\n\((FSM\w+)\) \r\nUser:

Anchored at the start, it begins with the two sequences \xff\xfb\x01 and \xff\xfb\x03, which contain protocol handshake information, and then continues with two lines of text, displaying a dynamic section starting with FSM and a login prompt. This matches the meta-information of the probe well, which identifies this as a Netgear router telnetd service.
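To make the handshake bytes more tangible, the following sketch splits the leading 3-byte telnet commands from the text that follows. The command and option codes are those of the telnet RFCs (854, 857, 858, 1091). The function name `split_negotiation` and the sample response are hypothetical, and only the four basic negotiation commands are handled.

```python
IAC = 0xFF  # "Interpret As Command", first byte of every telnet command
COMMANDS = {0xFB: "WILL", 0xFC: "WONT", 0xFD: "DO", 0xFE: "DONT"}
OPTIONS = {0x01: "ECHO", 0x03: "SUPPRESS-GO-AHEAD", 0x18: "TERMINAL-TYPE"}

def split_negotiation(data: bytes):
    """Return (decoded leading commands, remaining payload)."""
    commands = []
    i = 0
    # Consume IAC <command> <option> triplets at the start of the response.
    while i + 3 <= len(data) and data[i] == IAC and data[i + 1] in COMMANDS:
        option = OPTIONS.get(data[i + 2], f"option {data[i + 2]}")
        commands.append(f"{COMMANDS[data[i + 1]]} {option}")
        i += 3
    return commands, data[i:]

# Hypothetical response built from the prefix of the rule discussed above.
commands, text = split_negotiation(b"\xff\xfb\x01\xff\xfb\x03\r\nUser:")
print(commands)
print(text)
```

Read this way, the rule prefix states that the server will echo input and will suppress go-ahead signals, a combination that is common for interactive login prompts.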
Using the search interface, three responses are found, which give the device identifiers FSM7328S, FSM7352S and FSM726E; these can be confirmed to be models of professional Netgear L2 switch equipment. It is noteworthy that the use of a capturing group for the device number section has enormous advantages in flexibility, as it groups multiple fingerprints and allows later, as yet unreleased models to be matched. However, probe developers often have no substantial information about which responses the rule was originally intended to match, as this is not documented within the database. (In some cases, there are manual comments indicating a specific device by name, and those are presented in the interface as well.)
Recognizing a rule's intended response targets and being able to compare them to the actual matching behavior is therefore one of the main applications of this software prototype, since this is necessary to correct flaws of current expressions in terms of false negative and false positive results as well as rule collisions.

The following is an example of an under-performing rule:

^\xff\xfb\x01\xff\xfb\x03\xff\xfd\x18\r\n(TA \w+)\r\n\n\n\ruser:

It is designed to match Adtran TotalAccess routers and analogue telephony systems, but matches only two models, namely TA 608 and TA 600R. This low count, visible in the occurrence bar to the right, suggests that the fingerprint is too narrow. Looking at the available responses, it can be inferred that later revisions like TA 608 Gen3 and other models such as 612 and 624 are not matched because the capturing group is specified too narrowly. Including a space character within the inner matching group raises the number of matched responses to 8. A deeper look at the available data reveals a large number of individual responses that include office and factory names and are still unmatched. It is highly likely that this is a custom location string entered during device configuration, which was not set on the other devices but is relevant to, e.g., a network administrator scanning with Nmap for legacy devices on the local network or VPN. Once a generic matching section for this location string is included, 350 different responses are matched, up from the original set of two, without any false positives visible in the available data. Final new rule definition:

^\xff\xfb\x01\xff\xfb\x03\xff\xfd\x18\r\n(.*)(TA [\w ]+)\r\n\n\n\ruser:
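The effect of the two broadening steps can be reproduced with a short sketch. The three responses below are hypothetical and only mimic the structure described above; in particular, placing the location string directly in front of the model name is an assumption. Python's `re` module stands in for PCRE, and Nmap's match flags are not modelled.

```python
import re

PREFIX = rb"^\xff\xfb\x01\xff\xfb\x03\xff\xfd\x18\r\n"
SUFFIX = rb"\r\n\n\n\ruser:"
RULES = {
    "original": PREFIX + rb"(TA \w+)" + SUFFIX,
    "with space": PREFIX + rb"(TA [\w ]+)" + SUFFIX,
    "final": PREFIX + rb"(.*)(TA [\w ]+)" + SUFFIX,
}

HEAD = b"\xff\xfb\x01\xff\xfb\x03\xff\xfd\x18\r\n"
TAIL = b"\r\n\n\n\ruser:"
# Hypothetical responses, not taken from the scan data.
SAMPLES = [
    HEAD + b"TA 608" + TAIL,              # plain model name
    HEAD + b"TA 608 Gen3" + TAIL,         # later revision, contains a space
    HEAD + b"Main Office TA 624" + TAIL,  # assumed location string in front
]

def count_hits(pattern: bytes) -> int:
    """Count the sample responses matched by the given rule."""
    regex = re.compile(pattern)
    return sum(1 for sample in SAMPLES if regex.match(sample))

hits = {name: count_hits(pattern) for name, pattern in RULES.items()}
print(hits)
```

Each step matches a strict superset of the previous one, which mirrors the progression from 2 to 8 to 350 matched responses on the real data.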

Another example: when working from the following prefix:

^\xff\xfb\x01\xff\xfb\x03\r\n\r\n\[

there is one common rule, described as a Proxim Tsunami telnetd, which matches 21 of 52 potential hits:

^\xff\xfb\x01\xff\xfb\x03\r\n\r\n\[([\w.-]+)\]> Please enter password:

Clearly, the response data shows two distinct patterns that differ in their ending, one of which ends in Please enter password: and is the pattern the fingerprint in question is meant to match. After deliberately broadening the search, 38 instead of 21 hosts exhibiting the right format and suffix can be found, 17 of which were previously unmatched. This points to a mismatch in the fingerprinting rule: several devices identifying themselves as Tsunami MP.11 5054-R (and similar types) were among those excluded from the match, although the creator's comment in the fingerprint database (available in the probe meta-info) specifically states that the regular expression was designed to match one of those devices.
Upon closer inspection, the character class [\w.-]+ at the core of the capturing group seems to be responsible for this misbehavior. It is possible that the original creator of the match assumed the wide-matching dot operator . to work inside a [] character class in the same way as outside of one. This is not the case, as can be seen in the PCRE documentation: inside a character class, the dot matches only a literal dot. A possible substitution could be:

^\xff\xfb\x01\xff\xfb\x03\r\n\r\n\[([\S\s]+)\]> Please enter password: $

which matches all 38 intended response targets and also adds a suffix anchor.
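The difference between the two capturing groups can be checked with a small sketch. Both sample responses are hypothetical; the second assumes that the excluded devices report a name containing spaces, which the text suggests but does not state explicitly. Python's `re` module is used here in place of PCRE and treats the dot inside a character class the same way, as a literal dot.

```python
import re

NARROW = rb"^\xff\xfb\x01\xff\xfb\x03\r\n\r\n\[([\w.-]+)\]> Please enter password:"
BROAD = rb"^\xff\xfb\x01\xff\xfb\x03\r\n\r\n\[([\S\s]+)\]> Please enter password: $"

HEAD = b"\xff\xfb\x01\xff\xfb\x03\r\n\r\n"
# Hypothetical responses, not taken from the scan data.
simple_name = HEAD + b"[ap-01.example]> Please enter password: "
spaced_name = HEAD + b"[Tsunami MP.11 5054-R]> Please enter password: "

def extract(pattern: bytes, response: bytes):
    """Return the captured device name, or None if the rule does not match."""
    match = re.match(pattern, response)
    return match.group(1) if match else None

print(extract(NARROW, simple_name))  # word characters, dots and hyphens only
print(extract(NARROW, spaced_name))  # the space is outside [\w.-]
print(extract(BROAD, spaced_name))   # [\S\s] accepts any byte
```

The narrow class accepts word characters, literal dots and hyphens only, so any name containing a space falls through, while the broadened class captures it.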