  • Stars: 185
  • Rank: 208,271 (Top 5%)
  • Language: Ruby
  • License: MIT License
  • Created: over 5 years ago
  • Updated: over 1 year ago


Repository Details

Source code repo for Ben Porter (FreedomBen)'s free course on Awk (originally a talk at Linux Fest Northwest 2019 and 2020)

Awk: Hack the Planet['s text]!

This is a free "course" on Awk. It started out as a talk at Linux Fest Northwest that was repeated due to popularity (apologies still to those who weren't able to attend because the room was full), but has since moved to a video series. This GitHub repo contains the "challenges" portion of the course, where you write Awk code to answer a question.

About Awk: Hack the Planet['s text]!

Ready to learn Awk? This is the place! An hour or two from now, you'll be able to read and write moderately complex Awk scripts!

Course Progression (recommended order):

  1. Watch video: Awk: Hack the Planet['s text]! - Part 1
  2. Attempt the challenges*: See "The Challenges" section of README.md on GitHub
  3. Watch solution video: Awk: Hack the Planet['s text]! - Part 2

*Note: If you click on the *.awk files in the repo, you'll be seeing my solution! To avoid spoilers, make sure to follow "The Challenges" section below. It will describe the scenario and then provide the questions.

Course Description

Awk has been around almost forever, yet so many today are unaware of its power and elegance. It is an amazingly powerful tool that is its own Turing complete programming language. Awk is so powerful that it can be used to create entire services (that process text). But, there's a lot of ignorance out there regarding Awk, and ignorance breeds fear. Come take the Awk red pill like that guy in the documentary "The Matrix" did. Awk can be a ton of fun! Let's make text processing fun again!

We start out by discussing what Awk is, and briefly reviewing the history of Awk. We'll then go over some examples of cool things we can do to whet our appetites. Then we'll go over the syntax and rules of the Awk language. Then we'll see real examples of Awk in action by doing some amazing text processing using only Awk.

Throughout the process, there will be lots of examples that you can run and test yourself (if you want to). Some text files will be provided so you can quickly and easily reproduce the results locally in real time. Source code is available on GitHub: https://github.com/FreedomBen/awk-hack-the-planet

By the end of this presentation, you will be ready to start using Awk to solve real world problems. You will be comfortable reading and understanding Awk programs and will be ready to slice and dice like a classic *nix hacker.

When you are ready to take on the challenges, you can find the source at: https://github.com/FreedomBen/awk-hack-the-planet

You can also see my solutions (with in depth explanations) in Part 2: https://youtu.be/4UGLsRYDfo8

Videos

Here are links to the videos:

Slides

And here are links to the slides:

Slides (2023 update) and source code to go along with Ben Porter's "Awk: Hack the planet['s text]!" video. (This started out as a talk at Linux Fest Northwest in 2019 and 2020)

Slides from previous (2020) version

Original Videos from Linux Fest Northwest 2020:

  • Part 1 (Presentation) - This is the presentation or lecture explaining Awk syntax and functions
  • Part 2 (Exercises) - This includes me explaining all of the answers to the challenges in the repo

If you want to contact me:

The challenges

The Scenario

The boss has given us a tsv file full of payroll data, and she would like us to run some analysis on it. We recently learned about awk and its amazing processing power, and have decided this is an awesome chance to use our new skillz!

You should primarily use awk, but you can (and should) combine with other tools (like sort, uniq) when it makes sense. Don't use grep or sed tho since awk can handle the same scenarios (and you are trying to learn awk after all) :-)
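As a rough illustration of that advice, here is a minimal sketch of awk standing in for grep and sed, and of awk handing a column to sort and uniq. The sample file and its two columns are made up for this example and are not the layout of payroll.tsv, so nothing here answers a challenge.

```shell
# Hypothetical sample data (NOT the real payroll.tsv layout): name<TAB>city
sample="$(mktemp)"
printf 'Name\tCity\n' > "$sample"
printf 'Ada\tBoise\nLinus\tPortland\nGrace\tBoise\n' >> "$sample"

# Instead of grep: a regex pattern selects lines; the default action prints them
grep_like="$(awk -F'\t' '$2 ~ /^Boise$/' "$sample")"

# Instead of sed: sub() edits the record in place, then 1 (always true) prints it
sed_like="$(awk '{ sub(/Boise/, "BOI") } 1' "$sample")"

# awk extracts one column (skipping the header); sort and uniq do the counting
counts="$(awk -F'\t' 'NR > 1 { print $2 }' "$sample" | sort | uniq -c)"

printf '%s\n' "$grep_like" "$sed_like" "$counts"
rm -f "$sample"
```

Everything above sticks to POSIX awk features, so it should behave the same under gawk, mawk, and other implementations. The padding in front of the uniq -c counts can differ between systems.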

To begin you can either clone this repo (recommended):

git clone https://github.com/FreedomBen/awk-hack-the-planet.git

or directly download the payroll.tsv file:

wget 'https://raw.githubusercontent.com/FreedomBen/awk-hack-the-planet/master/payroll.tsv'

The payroll file is [payroll.tsv](https://github.com/FreedomBen/awk-hack-the-planet/blob/master/payroll.tsv). If you would like to randomize it, you can generate a new one with the provided ruby script:

# This script will write to a file "payroll.tsv" in the working directory
./generate-payroll.rb

There are many different solutions. The ones presented are just mine. Many of them could be optimized and refactored to be more elegant, but my goal was simplicity, readability, and ease of learning. To run my solutions (and check my output against yours), use awk -f <file> payroll.tsv, replacing <file> with the solution file for the problem you are trying to run. For example, to run my solution for problem 1:

awk -f 01.awk payroll.tsv
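If you have not written a standalone Awk script file before, the sketch below shows roughly what awk -f expects. The script name example.awk, the file sample.tsv, and its columns are all hypothetical; this is not one of the repo's solution files and it does not use the payroll data.

```shell
workdir="$(mktemp -d)"

# Hypothetical script file (not one of the repo's *.awk solutions)
cat > "$workdir/example.awk" <<'EOF'
BEGIN { FS = "\t" }          # split fields on tabs, as a TSV file needs
NR == 1 { next }             # skip the header row
{ total += $2 }              # add up the second column
END { print "rows:", NR - 1, "total:", total }
EOF

# Hypothetical data: item<TAB>quantity
printf 'Item\tQty\nbolt\t3\nnut\t4\n' > "$workdir/sample.tsv"

# Same invocation shape as above: awk -f <script> <input file>
result="$(awk -f "$workdir/example.awk" "$workdir/sample.tsv")"
echo "$result"    # prints: rows: 2 total: 7
rm -rf "$workdir"
```

Setting FS in a BEGIN block keeps the field separator inside the script, so the command line stays as short as the one above.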

Some solutions are bash scripts (you can tell by the file extension .sh), in which case just run them like normal:

./09-awk.sh

Best of luck!

Challenges (Questions to answer about our payroll data using awk to analyze)

Easy (one-liners)

  1. How much money per hour does the janitor make?
  2. What is the name of the CEO? Format like "LastName, FirstName".
  3. Which employees were hired on April 16, 1993? (Print the list)
  4. Which employee works in the Springfield office?

A little harder

  1. How many mechanical engineers work here?
  2. How many people from the Portwood family work here?
  3. Are there any employees with identical first & last names? (IOW, the first name is the same as the last name. e.g. Linus Torvalds is not identical, Johnson Johnson is identical)

Gotta think a bit

  1. Print each column header, along with the column number. E.g. The LastName column is the second column, so print "2 - LastName"
  2. How much money per hour does the Seattle office cost to run? (IOW, how much total per hour does it cost to pay all employees who work out of the Seattle office)
  3. How many engineers (of any type) work here?
  4. Who is the highest paid employee?
  5. Who worked the most hours this week?

Awk proficient

  1. Anonymize the data by removing the first two columns. Print all remaining columns
  2. Our client is complaining about the anonymized data from the previous question. They say it is too hard to read. They would like you to add line numbers to the beginning of each line in the output.
  3. How many different office locations does the company have?
  4. What is the average (mean) wage of all employees? What about the median (extra credit)?
  5. Are there any duplicate entries? (Same names appearing on payroll more than once)
  6. Who was the first employee hired?

Solutions

My solutions are in the *.awk files in this repository. Feel free to use them for hints. You can run them with:

awk -f <file>.awk payroll.tsv

They are also detailed in the Slides at the end of the deck.

If you need some discouragement or demotivation and/or want to learn an extra tidbit:

Good news everyone! The boss just sent me another message, and she says that if you get these questions solved then you'll get a huge raise! 10x higher than what she's paying you right now. Only, the computer is broken so you have to use awk to calculate it. This is an opportunity to explore a cool feature of the awk CLI! We can pre-set variables and pass them in. There are other cool flags too. Check out awk --help. But note that only the POSIX options are portable, so avoid the GNU options if you need compatibility.

To calculate your new salary, update the CURRENT_SALARY variable below to what she is paying you now. Here's mine:

awk -v CURRENT_SALARY=0 'BEGIN { print "Your new salary is: " 10 * CURRENT_SALARY }'
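The same -v flag is handy in less silly situations too. Here is a minimal sketch, assuming a made-up two-column file (again, not the payroll.tsv layout), that passes a shell variable into an awk program as a threshold. The names limit and min_hours are invented for the example.

```shell
# Hypothetical data: name<TAB>hours
sample="$(mktemp)"
printf 'Name\tHours\nAda\t38\nLinus\t45\nGrace\t41\n' > "$sample"

# -v assigns the variable before the program starts, so the value from the
# shell never has to be spliced into the quoted awk program text.
limit=40
over="$(awk -F'\t' -v min_hours="$limit" 'NR > 1 && $2 > min_hours { print $1 }' "$sample")"

echo "$over"    # prints Linus and Grace, one per line
rm -f "$sample"
```

Both -F and -v are POSIX options, so this stays on the portable side of the line mentioned above.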

More Repositories

  1. rtl8188ce-linux-driver (C, 476 stars): This modified version of the RealTek WiFi driver fixes some issues with RealTek cards on Linux.
  2. dory (Ruby, 152 stars): Your development proxy for docker
  3. screen-for-OSX (C, 46 stars): A version of screen that supports vertical splitting, ready for building on OSX
  4. clipgrab (C++, 38 stars): Customized version of ClipGrab (http://clipgrab.org/)
  5. canvas-development-tools (Shell, 36 stars): Some handy scripts that I use to make life better while working on Canvas by Instructure
  6. handy-bash-scripts (Shell, 19 stars): Collection of some handy Bash scripts for doing common tasks
  7. slackbot_frd (Ruby, 10 stars): The source code for the slackbot_frd gem
  8. outcomes-import-tool (Go, 6 stars): A simple command line client to facilitate the scheduling of outcomes imports
  9. metals (Shell, 6 stars): Drop-in mTLS solution for any HTTP service running in a Pod!
  10. findref (Go, 5 stars): findref is a grep-like program that helps you find strings in files using regexes
  11. malan (Elixir, 5 stars): An "authentication" service which you can add to your eco-system and use via API, or you can fork and use as the basis of a new Phoenix application
  12. docker-janitor (Ruby, 5 stars)
  13. angelbot (Ruby, 4 stars): Angelbot!
  14. terminator (Python, 4 stars): This is a branch of the official Terminator release 0.97
  15. nexy (Ruby, 4 stars): Source code for nexy, the Simple Nexus Slack bot
  16. arch-linux-install-scripts (Shell, 3 stars): Contains a series of Bash scripts to automate the setup of an Arch Linux system
  17. slack_proxy3 (Elixir, 3 stars)
  18. slack_proxy (Elixir, 2 stars): A simple proxy for posting messages to slack
  19. openshift-install-checklists (2 stars): Some handy checklists to use when installing OpenShift or OKD
  20. nginx-docker (Dockerfile, 2 stars): Adds autoindex to the official nginx image
  21. gimme-dat-canvas (Shell, 2 stars): A stupid-easy Canvas Appliance, dockerized no less
  22. 2048-docker (C, 2 stars)
  23. digall (Shell, 2 stars)
  24. lds-quotes (HTML, 2 stars): Get inspirational LDS quotes in your terminal!
  25. privatebin-setup (Shell, 1 star): Handy scripts to help set up privatebin on a VPS
  26. metals-example (Shell, 1 star): Example HTTP Service that uses MeTaLS for drop-in mTLS
  27. kdbook (Ruby, 1 star): Source code for the kdbook gem
  28. rafflebot (Ruby, 1 star): This is a raffle bot built on top of the slackbot_frd gem
  29. privatebin-nginx-proxy (Shell, 1 star)
  30. tls-cert-generator (Ruby, 1 star)
  31. rtlwifi-freedom-tool (Python, 1 star): This tool allows for simplified configuration and troubleshooting of RealTek cards
  32. http-echo-server-docker (Dockerfile, 1 star)
  33. basic-ocp-demo (Shell, 1 star): Sample app that can be easily deployed to OpenShift either as a training exercise or demonstration.
  34. infa719-game-tracker (Python, 1 star): The repository housing the code for the web app developed by Team Farfan-Quiroz, Anderson, and Porter
  35. hash-dot-evil (Ruby, 1 star): A version of the hash-dot gem that warns you about exposed AWS creds
  36. aescrypt (C, 1 star): This repository houses the source code for AES Crypt. This is not an official AES Crypt repository. Please visit http://www.aescrypt.com/ for more information.