Using AWK in BASH scripts with variables

One of the primary servers for my porn blocking service generates a continuous log file with entries for all of my users. On average this log file grows by about 100MB per day. For whatever reason, the software that I am using does not have good log rotation built in. A while back I created a simple script that would run via CRON at 12:01 every morning. It would take the previous day’s entries from the log file and put them into their own separate file with a time stamp, then remove those entries from the main file. Maybe not the best way to get it done, but it worked and taught me plenty of awk and sed magic along the way.

A few months back during a server migration I lost that script. Since then I have been manually rotating these logs once a week or so. It’s been kind of a pain. Here is what I typically end up doing:

$ awk '$1 == "2019-09-15"' /var/log/guardrails/access.log > /var/log/guardrails/archive/2019/09/15.log
$ awk '$1 == "2019-09-16"' /var/log/guardrails/access.log > /var/log/guardrails/archive/2019/09/16.log
$ awk '$1 == "2019-09-17"' /var/log/guardrails/access.log > /var/log/guardrails/archive/2019/09/17.log
$ cp /var/log/guardrails/archive/2019/09/17.log /var/log/guardrails/access.log

As you can see, that gets pretty tedious. I decided to simplify it by creating a basic BASH script where I can pass in some date codes and let BASH do the rest. It turned out to be a little trickier than I thought, but with some searching on Stack Overflow I finally found something that works. Let’s break down what I am trying to do first. Here is a snippet from one of the log files. I have removed all user information.

2019-09-17 00:00:03.217010,userid,allow,https://web-production.lime.bike/api/rider/v1/views/in_trip
2019-09-17 00:00:04.098843,userid,allow,https://web-production.lime.bike/api/rider/v1/views/in_trip_map
2019-09-17 00:00:04.849048,userid,block-invisible,http://www.googleadservices.com:443
2019-09-17 00:00:08.097627,userid,allow,https://web-production.lime.bike/api/rider/v1/views/in_trip
2019-09-17 00:00:11.408182,userid,ssl-bump,//outlookmobile-office365-tas.msedge.net
2019-09-17 00:00:11.411641,userid,allow,http://outlook.office365.com:443
2019-09-17 00:00:11.513710,userid,allow,http://app.adjust.com:443
2019-09-17 00:00:11.608798,userid,ssl-bump,//acompli.helpshift.com
2019-09-17 00:00:11.609426,userid,ssl-bump,//login.microsoftonline.com

As you can see from the entries above, every line starts with a timestamp. Since I just want to split these into date-specific files, I only care about the first column. The awk command that I would use to filter out all entries for 2019-09-17 is this:

$ awk '$1 == "2019-09-17"' /var/log/guardrails/access.log
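Before breaking that command down, here is a quick way to see how awk splits one of these log lines into columns. It is only a sketch: the variable names sample, first and fields are made up for the demonstration and are not part of the real setup.

```shell
# One sample line from the log excerpt above (hypothetical variable name).
sample='2019-09-17 00:00:04.849048,userid,block-invisible,http://www.googleadservices.com:443'

# awk splits on whitespace by default, so the date is column 1 and
# everything after the space (time, user, action, URL) is column 2.
first=$(printf '%s\n' "$sample" | awk '{ print $1 }')
fields=$(printf '%s\n' "$sample" | awk '{ print NF }')

echo "first column: $first"
echo "number of columns: $fields"
```

The commas do not count as separators here, which is why comparing $1 against a bare date is enough to pick out a day.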

In the filter command above, awk treats $1 as the first column. By default awk uses whitespace (spaces and tabs) to separate columns, so in these log lines the first column is the date and everything after the space is the second column. In plain English, the command says “Go through the file /var/log/guardrails/access.log, check the first column of every line, and print the lines where it equals 2019-09-17”. This brings me to the challenging part. I wanted to put this into a script that I could call by simply passing in a month and a day. I needed to figure out how to pass those variables to awk. I started by creating a script called simply rotatelogs.sh.

$ nano rotatelogs.sh

Then I start off with the basics

#!/bin/bash
# $1 is the month and $2 is the day passed on the command line.
DATESTRING="2019-$1-$2"
echo "$DATESTRING"

Now what the heck does DATESTRING="2019-$1-$2" mean? Well, I’m glad you asked. Let’s imagine we ran the following command:

$ bash rotatelogs.sh 09 11
2019-09-11

What happened is that BASH took the first thing we passed in (09 in this case) and made it available as $1. It then took the second thing and made it available as $2. These are called arguments. We could pass as many arguments to our script as we want. Now what happens if we don’t pass in a second argument?

$ bash rotatelogs.sh 09
2019-09-

You see that our timestamp would be incomplete. Now let’s think through our script some more. Essentially we want something like this:

$ awk '$1 == "2019-$1-$2"' /var/log/guardrails/access.log > /var/log/guardrails/archive/2019/$1/$2.log

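Unfortunately that will not work. Inside the single quotes BASH does not expand $1 and $2, so awk receives them untouched, and awk does not expand variables inside a double-quoted string either. The comparison ends up looking for a first column that is literally 2019-$1-$2. On top of that, $1 and $2 mean different things in the two languages: script arguments in BASH, columns in awk. This is where awk variables come in handy. Let’s get back to the script and I will go ahead and fill in the rest. Once I understood the concept it was pretty simple.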

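#!/bin/bash
# Build the date to match from the month ($1) and day ($2) arguments.
DATESTRING="2019-$1-$2"
echo "$DATESTRING"
# Pass the BASH variable into awk as the awk variable "date".
awk -v date="$DATESTRING" '$1 == date' /var/log/guardrails/access.log > "/var/log/guardrails/archive/2019/$1/$2.log"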
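To see the difference without touching the real log, the sketch below runs both versions against a throwaway file. The names tmplog, month and day are placeholders for this demonstration only; in rotatelogs.sh the values come from $1 and $2.

```shell
# Build a tiny throwaway log with two different dates.
tmplog=$(mktemp)
printf '%s\n' \
  '2019-09-16 23:59:59.000001,userid,allow,http://app.adjust.com:443' \
  '2019-09-17 00:00:03.217010,userid,allow,http://app.adjust.com:443' > "$tmplog"

month=09
day=17

# Attempt 1: inside single quotes the shell expands nothing, so awk
# compares the first column against the literal text 2019-$1-$2.
quoted=$(awk '$1 == "2019-$1-$2"' "$tmplog")

# Attempt 2: -v copies the shell value into the awk variable "date".
with_v=$(awk -v date="2019-$month-$day" '$1 == date' "$tmplog")

echo "single-quoted attempt matched: ${quoted:-nothing}"
echo "-v version matched: $with_v"

rm -f "$tmplog"
```

The first attempt matches nothing, while the -v version prints only the 2019-09-17 line.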

As you can see, we are adding the -v flag to awk. This tells awk that what comes next is a variable assignment. In this case we want to be able to use our $DATESTRING variable inside our awk command, so we assign it to the awk variable date. Now we just need to modify our awk command slightly, and you can see from the example above that it becomes awk -v date="$DATESTRING" '$1 == date'. This tells awk to print all entries where the first column equals whatever date string we pass in. I run this script like this:

$ ./rotatelogs.sh 09 11

That’s much cleaner than what I was doing before. And if I need to do multiple days at once, I can simply run it in a for loop like so:

$ for i in {10..17}; do ./rotatelogs.sh 09 "$i"; done
2019-09-10
2019-09-11
2019-09-12
2019-09-13
2019-09-14
2019-09-15
2019-09-16
2019-09-17

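Note in the example above that {10..17} is special in BASH. It is called brace expansion: BASH expands it to the list 10 11 12 13 14 15 16 17 before the loop starts, and i takes each value in turn. For single-digit days, {01..09} keeps the leading zero in BASH 4 and later, which matters because the dates in the log are zero-padded. My next step would be to modify my rotatelogs.sh script so that it can be run via CRON once a day. For this post I just wanted to demonstrate how to use a BASH variable in awk.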
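For the CRON step, one possible direction is to wrap the awk call in a function that checks its arguments and to let CRON feed it yesterday’s date. This is a rough sketch rather than a finished script. It assumes GNU date (for -d yesterday), and the function name rotate_day and the LOG and ARCHIVE variables are made-up names so the paths can be overridden for testing.

```shell
# Hypothetical defaults; both can be overridden from the environment.
LOG="${LOG:-/var/log/guardrails/access.log}"
ARCHIVE="${ARCHIVE:-/var/log/guardrails/archive}"

# rotate_day YEAR MONTH DAY
# Copies that day's entries from LOG into ARCHIVE/YEAR/MONTH/DAY.log.
rotate_day() {
    if [ "$#" -ne 3 ]; then
        echo "usage: rotate_day YEAR MONTH DAY" >&2
        return 1
    fi
    datestring="$1-$2-$3"
    # Create the year/month directory if it does not exist yet.
    mkdir -p "$ARCHIVE/$1/$2" || return 1
    awk -v date="$datestring" '$1 == date' "$LOG" > "$ARCHIVE/$1/$2/$3.log"
}

# Possible call from a CRON-driven script (GNU date assumed):
# rotate_day $(date -d yesterday '+%Y %m %d')
```

Like the script above, this only copies the matching entries into the archive; trimming them out of the main log would still be a separate step.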
