Tuesday, April 12, 2016

Wireshark and AWK


In the world of technology, guessing (or blaming) games as to the source of a problem seem to be a time-honored pastime.  While I’m not against the occasional bout, life’s too short to constantly repeat a loop of “guess-and-be-wrong” before stumbling upon the right solution, or giving up to find another game to play.

Seeing is believing, and in the computer world a “Sniffer” trace, a capture of every bit of data flowing on a network, is a great way to see.  A real “Sniffer” is an expensive device, deployed by a network guru, and is overkill for most diagnostic data capture.  Enter Wireshark, a free (I love free) software package for Windows, Mac and Linux that captures every bit of data entering or exiting the machine it’s running on.  In most cases you crank it up, recreate your problem, stop the capture and scroll through a few thousand packets, looking for something out of place.  You can also filter those packets down to just the ones you’re interested in, for example DNS look-ups, reducing your search to a manageable few.

But what do you do when you need to capture millions of packets over an hour or more, looking for a needle in a haystack?  One solution I’ve found very effective is to employ a small AWK formatting program.  AWK is a sleek programming language, named after its creators, Aho, Weinberger and Kernighan.  I’ll demonstrate one example of how I used this combination to help find one of those needles, including the AWK source code.

The problem involved trying to eliminate the printing of an unused label from an application with limited source code and even less documentation.  On the positive side, the application keeps just about everything in a SQL Server database.  I was hoping that capturing and analysing the database traffic between the app server and database server would reveal clues on how the label printing application worked.

To start, I ran a Wireshark trace on the app server, capturing all packets without any filters.  I stopped the trace after a label was printed and exported all packets to a text file using “File...Export Packet Dissections…”.  There are a number of options on what to export, but only the Packet Header and Packet Bytes are needed, so I made sure only those two selection boxes were checked.  The resulting text file has many lines per packet, which makes it too tedious to scan for clues and nearly impossible to search with Find commands.  Having one long line for each packet is much more useful.  Enter the AWK code.

I use the GNU version of AWK, which can be downloaded from: http://gnuwin32.sourceforge.net/packages/gawk.htm

Besides AWK being a simple programming language that is very good at handling strings, I also find a single, stand-alone executable (gawk.exe) much easier to deploy, with no Windows installation, DLLs or configuration files.  I keep the executable in the same directory as all the input files I use and output files I create, C:\GAWK, which avoids the tedium of having to repeatedly spell out directory paths.  To run the formatting program (code included below), open a Command Prompt, navigate to the C:\GAWK directory and enter the following command, replacing “input file” and “output file” with the appropriate file names.

< input file gawk.exe -f printpdml.awk > output file

The “< input file” throws the file into the input stream.  “-f printpdml.awk” tells “gawk.exe” which file contains the program code and “> output file” sends all the program’s print output to the named file.

The resulting file contains the desired one line per packet, and while I’m not claiming it’s perfect, it’s 90% of the way there.  In the case of the unwanted label, I was able to quickly find where the label printed, then backed up to find a Stored Procedure that looked appropriate. Searching that procedure led to another procedure that inserted one database record for each label.  Commenting out the unwanted insert resolved the issue.
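Once the file is down to one line per packet, AWK can do the searching as well.  The following is only a minimal sketch and was not part of the original work.  It assumes a Unix-like shell with any awk on the path; on Windows, the awk code between the single quotes can be saved to a file and run with gawk.exe -f.  The sample lines, the file names sample_output.txt and matches.txt, and the procedure name usp_PrintLabel are all made up for illustration.

```shell
# Hypothetical sample of the one-line-per-packet output (made-up values).
cat > sample_output.txt <<'EOF'
101 2016-04-12:10:15:01.100 10.0.0.5:50123 10.0.0.9:1433 SELECT name FROM labels
102 2016-04-12:10:15:01.200 10.0.0.5:50123 10.0.0.9:1433 EXEC usp_PrintLabel
103 2016-04-12:10:15:01.300 10.0.0.9:1433 10.0.0.5:50123 OK
EOF

# Print the packet number and time of every line that mentions the keyword,
# ignoring case.  index() does a plain substring search, so the keyword is
# not treated as a regular expression.
awk -v kw="exec" 'index(tolower($0), tolower(kw)) { print $1, $2 }' sample_output.txt > matches.txt
cat matches.txt
```

For the sample above only packet 102 is listed.  The packet number then tells you where to look in the full file, or back in Wireshark.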

Not bad for less than an hour’s time and using two free programs.


Source Code for “printpdml.awk”

# This awk program formats the Wireshark text export into a smaller, more readable format,
# with one line per packet.
#
BEGIN {
  line = "";
}
# A blank line (fewer than two fields) ends a packet, so print the single,
# consolidated line collected by the code below.
{
if (NF < 2) {
  if (length(line) > 0) {
     print packet " " time " " source " " destination " " line;
     line = "";
     }
  }
#
# Get the information from the Packet Header line.
#
# This code assumes that the first field is the packet number, the second and third are the
# date/time, the fourth and fifth are the source IP/port, and the sixth and seventh are the
# destination IP/port.  Most importantly, it assumes a "2" in column 9 of any line represents a
# header line.  These may need to be adjusted depending on the exact format of your export.
#
if (substr($0,9,1) == "2") {
  packet = $1;
  time = $2 ":" $3;
  source = $4 ":" $5;
  destination = $6 ":" $7;
  next;
  }
#
# Ignore the first three lines and the first six bytes of the fourth of the Packet Bytes
# lines (54 bytes in all, typically the Ethernet, IP and TCP headers), which contain
# unneeded network header information.
#
if (substr($0,1,4) == "0000") {next;}
if (substr($0,1,4) == "0010") {next;}
if (substr($0,1,4) == "0020") {next;}
if (substr($0,1,4) == "0030") {i=8;} else {i=2;}
#
# Get the information from the Packet Bytes lines.
#
# Ignoring binary zeroes ("00") reduces the line size and makes
# seeing and finding things much easier.
#
while (i < 18) {
  if (substr($0,7+((i-2)*3),2) != "00") {
     line = line substr($0,i+55,1);
     }
  i++;
  }
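}
#
# Print the last packet in case the file does not end with a blank line.
#
END {
  if (length(line) > 0) {
     print packet " " time " " source " " destination " " line;
     }
}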
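
The column arithmetic in the byte loop is easier to see on a single line.  What follows is a small sketch under the same layout assumption the program makes: the offset in columns 1-4, the first hex byte at column 7 and the first ASCII character at column 57.  The sample line is made up, and the file names sample_bytes.txt and decoded.txt are hypothetical.  It assumes a Unix-like shell; on Windows the awk code between the single quotes can be saved to a file and run with gawk.exe -f.

```shell
# One made-up Packet Bytes line: the text "EXEC usp" with a zero byte
# after each character.
cat > sample_bytes.txt <<'EOF'
0030  45 00 58 00 45 00 43 00 20 00 75 00 73 00 70 00   E.X.E.C. .u.s.p.
EOF

awk '{
  text = ""
  for (k = 0; k < 16; k++) {
    # Hex byte k starts at column 7 + 3*k; its ASCII character is at column 57 + k.
    if (substr($0, 7 + 3 * k, 2) != "00") {
      text = text substr($0, 57 + k, 1)
    }
  }
  print text
}' sample_bytes.txt > decoded.txt
cat decoded.txt
```

Unlike the program above, this sketch decodes all 16 bytes of the line instead of skipping the network header bytes.  Dropping the "00" bytes turns "E.X.E.C. .u.s.p." into "EXEC usp", which is why the consolidated lines are so much easier to search.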

