📰 Grep with Head Line

Posted on April 23, 2022 by Myoungjin Jeon
Tags: fish, shell, script, function, util, grep, perl, python

when I tried to find a process, I normally use ps with grep command.

sh> ps aux | grep fish
myoungj+     695  0.0  0.0  88596  7000 tty2     S+   09:17   0:00 -/usr/bin/fish -c /usr/bin/gnome-session -l 
myoungj+    2490  0.0  0.1 164660 10140 pts/1    Ss   09:21   0:00 -fish
myoungj+    2665  0.0  0.1 172848 10076 pts/2    Ss+  09:24   0:00 -fish
myoungj+    2781  0.0  0.1 172724  9712 pts/0    Ss+  09:27   0:00 -fish
myoungj+    3024  0.0  0.1 164528  9552 pts/3    Ss+  09:32   0:00 -fish
myoungj+    4709  0.0  0.0   9136  2692 pts/5    S+   10:00   0:00 grep --color=auto fish

Headline is helpful

However, I found that no head line sometimes makes me wondering what those information actually means. i.e: I’d though it would be nicer if I could see the head line of ps command along with search results.

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
myoungj+     695  0.0  0.0  88596  7000 tty2     S+   09:17   0:00 -/usr/bin/fish -c /usr/bin/gnome-session -l 
myoungj+    2490  0.0  0.1 164660 10140 pts/1    Ss   09:21   0:00 -fish
.. snip ..

Quick Solution for single use

awk or sed could be useful in this category if you don’t need any other feature from grep.

sh> ps aux | awk 'NR == 1 || /fish/ { print; }'

But I think grep is more powerful tool.

In bash, it looks straight forward for me.

bash> ps aux | { read line; echo "$line"; grep 'fish'; }

or using sub-shell.

bash> ps aux | ( read line; echo "$line"; grep 'fish'; )

or in fish shell (little longer)

fish> ps aux | begin read -l line; echo "$line"; grep 'fish'; end

Still, I think bash is better than fish in one-liner command.

More Serious Approach

But those one-liners are not very friendly. IMHO, all the programmes, at least, provide us of simple usage. So I decided to go little deeper.

Fish shell solution

The recent file is on my github: hgrep.fish

The basic options are below:

#!/usr/bin/env fish

set -l PROG = 'hgrep.fish'
# ref: https://fishshell.com/docs/current/cmds/argparse.html#cmd-argparse
set -l options 'C/context=' 'h/help'

function usage -S -d "basic usage for $PROG"
    echo \
"Usage: $PROG [-C|--context context] <SEARCH> [<INPUT PATH>]"
end

# parse args here
argparse $options -- $argv

set -l argc (count $argv)
# note: processed arguments are removed from $argv
if test $argc -ne 1 -a $argc -ne 2
    usage
    exit 0
end

set -l search_string $argv[1] # first argument
set -l input_path /dev/stdin

if test $argc -gt 1
    # <INPUT PATH> is specified
    set input_path $argv[-1]
end

echo $input_path

set -l grep_options -i

if set -q _flag_context
    set --append grep_options '-C' $_flag_context
end

set --append grep_options $search_string

begin
    # print head first
    read -l line
    echo "$line"

    # let 'grep' do the rest
    exec grep $grep_options

end < $input_path

begin .. end < $input_path pattern is used before when I made fish-pandoc-any-to-markdown. So I found this version a bit easier than others.

Perl Solution

My perl solution was made very long time ago. I’m happy to see that it is still working. Basic routine is the same, except it has one more options. –nohead which is not neccessary. I think I just wanted to chceck the how the OptArgs is working at that time.

I realized today that the routine in fish shell is also applicable.

  1. read one line from input and print to stdout
  2. exec to grep with option

Nevertheless, I believed that it is worth to learn!

parsing options in perl

And thanks to OptArgs module, I could handle option handy and in a more structural approach. (However, I think this is little heavier than python’s argparse.)

The recent file is on my github: hgrep.pl

#!/usr/bin/env perl
# -*- Mode: cperl; cperl-indent-level:4; tab-width: 8; indent-tabs-mode: nil -*-
# -*- coding: utf-8 -*-
# vim: set tabstop=8 expandtab:

use strict; use warnings;
use feature qw(switch);
use OptArgs; # https://metacpan.org/dist/OptArgs/view/bin/optargs

my @grep_options = qw(-i);

for ( {'TERM'} ) {
    if (  =~ /dumb/ ) { }
    default { push @grep_options, "--color=auto" }
}

# ref: https://metacpan.org/pod/OptArgs
## option parts ...
opt context =>
  ( isa => 'Num',
    alias => 'C',
    default => 3,
    comment => 'print NUM lines of output context' );

opt help =>
  ( isa => 'Bool',
    alias => 'h',
    comment => 'print a help message and exit',
    ishelp => 1 );

# argument parts ...
arg search =>
  ( isa => 'Str',
    required => 1,
    comment => 'string to search from file' );

arg file_name =>
  ( isa => 'Str',
    default => '-', # default input from stdin
    comment => 'the file which we search from' );

# parsing options via optargs function!
my $opts = optargs;

And now processing the parsed arguments and open a file (or stdin)

if ( defined $opts->{'context'} and $opts->{'context'} > 0 ) {
    push @grep_options, '-C', $opts->{'context'};
}
my $fh;

if ( $opts->{'file_name'} ne '-' ) {
    open my $fh, "<$opts->{file_name}",
        or die "Can't open `$opts->{file_name}'";
}
else {
    # http://perldoc.perl.org/functions/open.html
    open( $fh, "<&=",*STDIN );
}

if ( not $opts->{nohead} ) {
    my $head = <$fh>;
    # FIXME: colourising ....
    print "$head";
}

my $to_gh;

requirement for system programming

And when I try to go further, I found that I need little more system programming underneath, which shell normally does for me.

To communicate with grep function, we need to open a pipe via open function.

my $grep_pid = open( $to_gh, '|-' );
if ( not defined $grep_pid ) {
    die "Can't fork: ";
}

|- means creating a pipe, and fork implicitly at the same time and now we have two processes, when the parent writing into new handle $to_gh, the child will read from the stdin.

In terms of shell script, it looks like below at the moment.

sh>  parent_perl <some options ...> | child_perl

i.e. parent_perl and child_perl now communicate with piple(|) and the child_perl process will be replaced with grep process via exec.

There is a simple way to we are in the parent_perl process or child_perl process, which is checking the $grep_pid value.

if ( $grep_pid ) {
    # if grep_pid is not zero, this is parent_perl (parent process)
    # which handle both file handles.
    while ( <$fh> ) { print $to_gh ; }

    close  for $to_gh, $fh;

    # parent process have to wait any children processes finsished.
    waitpid $grep_pid, 0;
}
else {
    # otherwise, this is child_perl (child process)
    close $fh; # not used in child process
    exec 'grep', @grep_options, $opts->{'search'};
}

exit 0;

and last exec 'grep' ... will replace its own perl process to grep process. no process could not be created without a parent.

I found that it is worth trying to understand basic system programming in perl, However shell script will be much easier to handle it.

Python Solution (as a newbie)

How about python? I think the same logic could be applied in python as well. However, I didn’t get chance to write down a python script yet. so, I didn’t make any function and write it as simple as possible. BTW, I only have python version 3.

credit:

I go through similar pattern as I did in perl you can find the recent file on my github: hgrep.py

#!/usr/bin/env python3

import os, sys
import argparse

# handle options first
parser = argparse.ArgumentParser()#prog="hgrep.py")
parser.add_argument( "-C", "--context",
                     nargs = 1,
                     type = int,
                     dest = "context",
                     required = False,
                     help="print NUM lines of output context" )

parser.add_argument( "search",
                     # upper case in the help message
                     metavar = "<SEARCH>",
                     help = "string to search from <file_path>" )

parser.add_argument( "file_path",
                     # upper case in the help message
                     metavar = "[<FILE PATH>]",
                     default = '-',
                     help = "<file_path> to search" )

# case insenstive search
grep_options = [ '-i' ]

# highligting
if os.environ['TERM'].lower != 'dumb':
    grep_options.append( "--color=auto" )

I found argparse module cannot handle optional positional argument. optional opsitional argument is natural in grep. So I’d like to keep that behaviour.

# argparse cannot handle optional argument
# WORKAROUND:
argv = sys.argv[1::]
if len(argv) == 0:
    print( "{prog}: No argument given".format(prog= sys.argv[0] ),
           file = sys.stderr )
    parser.print_help()
    exit( 1 )

if len(argv) == 1:
    # user ommit input file path
    # default : - (stdin)
    argv.append( '-' )

args = parser.parse_args( argv )

# check more grep options
if args.context is not None and args.context > 0:
    grep_options.extend( [ '-C', args.context ] )

grep_options.append( args.search )

I don’t really know about python, but I guess I took the very low-level pipe() function in python.

# and let's go for plumbing
# file descriptors r,w for reading and writing
r, w = os.pipe()

if args.file_path == "-":
    # from stdin
    file_to_read = sys.stdin
else:
    # or open file path to read
    if os.path.isfile( args.file_path ):
        file_to_read = open( args.file_path, "r" )
    else:


        print( "A file path:({fp}) is not readable"
               .format( fp=args.file_path )
               , file = sys.stderr )
        exit( 2 )

# read head first and print into stdout directly
print( file_to_read.readline() , file = sys.stdout, flush = True )

# fork() will create a child process
# and we can distinguish which one is parent process by checking
# return value
grep_pid = os.fork()

if grep_pid:
    # parent process

    # to communicate with to a child process
    # writing file descriptor will be used
    os.close(r)
    os.dup2( w, sys.stdout.fileno() )

    for line in file_to_read:
        print( line )

    # It is good practice to close all the file open
    os.close( w )

    # safely waiting for children processes
    os.waitpid( grep_pid,
                os.WNOHANG # if child process status not available: no wait
               )

else:
    # child process
    os.dup2( r, sys.stdin.fileno() )

    # child process only requires 'r' as stdin
    # and stdout so it is better to close r,w here.
    os.closerange( r, w )
    os.execvp( 'grep', grep_options )

exit(0)

Where I found difficulty

os.dup2 is essential to communicate with the grep in child process as grep only care about stdin here, but there is no way to inform the child that parent is going to newly open file descriptors (r,w). So we should kindly re-bind the new file descriptor to stdin

TBH, I spent too much time on this because lacks of my knowledge about system programming.

and os.waitpid requires os.WNOHANG option value, I thought it will be 0, which is actually not. so my programme was on hang after grep had finished its job.

Wrapping Up

pipe and shell’s power

parsing option is easier with modules

And also I tried to add option and test them.

Suggestion after post

sh> ps aux | head-with get-one-line --tail-with grep -i /fish/
    # or in fish-pandoc-any-to-markdown
sh> cat some.org | head-with retrieve-metadata --tail-with pandoc -t markdown > some.md

Well… but not for today. maybe after I get more chance to use the similar patterns!

Thank you for reading! and Happy coding!