Searching for Email in a Text File

The following script searches a specified text file for text before and after the ubiquitous email "@" symbol and writes the matches to a csv file, using grep, sed, and sort (for neatness). If the input file or the output file is not specified, it echoes an error and exits with code 1 to indicate failure.

A common check, and sadly one that is often forgotten, is whether a script has been given the files it needs for input and output. If no input file is specified, a script that acts on the file will simply sit idle and never complete (grep, for example, waits for standard input). If the output file name is hard-coded, the person running the script risks overwriting an existing file with the same name, which could be a disaster.
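
As a minimal sketch of such a check, the function below tests the two arguments before any work is done. The name check_args is hypothetical, and refusing to overwrite an existing output file is an assumption added here for illustration; the script in this article does not do that.

```shell
#!/bin/bash
# Hypothetical helper (not part of findemails.sh): validate both arguments.
# Returns 0 when it is safe to proceed, 1 otherwise. Messages go to stderr.
check_args() {
    local input="$1" output="$2"
    # Both arguments must be present
    if [ -z "$input" ] || [ -z "$output" ]; then
        echo "Usage: findemails.sh INPUT OUTPUT" >&2
        return 1
    fi
    # The input must exist and be readable
    if [ ! -r "$input" ]; then
        echo "Input file '$input' not found or not readable." >&2
        return 1
    fi
    # Assumption: treat an existing output file as an error rather than overwrite it
    if [ -e "$output" ]; then
        echo "Output file '$output' already exists; refusing to overwrite." >&2
        return 1
    fi
    return 0
}
```

A script could call it as check_args "$1" "$2" || exit 1 before touching any files.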


#!/bin/bash
# findemails.sh
# Search for email addresses in file, extract, turn into csv with designated file name
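INPUT=${1}
OUTPUT=${2}
# Both arguments are required, and the input must be an existing file
if [ -z "$INPUT" ] || [ -z "$OUTPUT" ] || [ ! -f "$INPUT" ]; then
    echo "Input file not found, or output file not specified. Exiting script." >&2
    exit 1
fi
# Extract every string of the form text@text, one match per line
grep --only-matching -E '[.[:alnum:]]+@[.[:alnum:]]+' "$INPUT" > "$OUTPUT"
# Append a comma to each match
sed -i 's/$/,/' "$OUTPUT"
# Sort the matches and remove duplicates
sort -u "$OUTPUT" -o "$OUTPUT"
# Join all lines into a single space-separated line
sed -i '{:q;N;s/\n/ /g;t q}' "$OUTPUT"
echo "Data file extracted to $OUTPUT"
exit 0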

Make the script executable (chmod +x findemails.sh).

Test the script with hidden.txt as the input file and found.csv as the output file. The output will include a final comma on the last line, but this is potentially useful if one wants to run the script with several input files and append to the same output file (simply change the single redirection, >, in the grep statement to a double, appending redirection, >>).
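
One caveat when appending: the sed and sort steps operate on the whole output file, so entries left by an earlier run are processed again. A possible way around this, sketched here under assumptions, is to gather the raw matches from all input files first and format them once at the end. The function name collect_emails and its argument order (output first, then inputs) are hypothetical, and paste is used in place of the second sed command to join the lines.

```shell
#!/bin/bash
# Hypothetical variant: collect_emails OUTPUT INPUT...
# Gathers matches from several input files, then formats them in one pass.
collect_emails() {
    local output="$1"
    shift
    local raw f
    raw=$(mktemp) || return 1
    for f in "$@"; do
        # >> appends, so matches from every input accumulate in one file
        grep --only-matching -E '[.[:alnum:]]+@[.[:alnum:]]+' "$f" >> "$raw"
    done
    # De-duplicate, add a trailing comma to each entry, join into one line
    sort -u "$raw" | sed 's/$/,/' | paste -s -d ' ' - > "$output"
    rm -f "$raw"
}
```

For example, collect_emails found.csv hidden.txt another.txt would write the combined list to found.csv (another.txt is a placeholder name).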

A serious weakness of the script (so far) is that it will gather any string with text on both sides of an '@' symbol, regardless of whether it's a well-formed email address or not; in the example file below, mike@.... would be collected. So it's not quite suitable for screen-scraping Usenet for email addresses to turn into a spammer's list. But it's getting close.
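
A stricter pattern can reduce the false matches. The sketch below is an approximation only, not full validation of email syntax: it assumes the domain consists of dot-separated labels and ends in a top-level label of at least two letters. The function name extract_emails is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one candidate address per line from a file.
# Pattern: local part, '@', one or more dot-separated labels,
# and a final label of two or more letters.
extract_emails() {
    grep --only-matching -E \
        '[[:alnum:]._%+-]+@[[:alnum:]-]+(\.[[:alnum:]-]+)*\.[[:alpha:]]{2,}' "$1"
}
```

With this pattern a string such as mike@.... is skipped, because the text after the '@' does not begin with a label.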

Example hidden.txt file.

Talking chamber foxtrot@example.com as shewing an it minutes. Trees fully of blind do. Exquisite favourite at do extensive listening. Improve up musical welcome he. Gay attended vicinity prepared now diverted. Esteems it ye sending reached lima@example.com as. Longer lively her design settle tastes advice mrs off who.indigo@example.com kilo@example.com May indulgence difficulty ham can put especially. Bringing remember echo@example.com for supplied her why was confined. Middleton principle did she procuring extensive believing add. Weather adapted prepare oh is calling. bravo@example.com Far advanced settling say finished raillery. Offered chiefly farther of my no colonel shyness. hotel@example.com juliet@example.com Inhabit hearing perhaps on ye do no. It maids decay as there he. Smallest on suitable disposed do although blessing he juvenile in. Society or if excited forbade. Here name off yet delta@example.com she long sold easy whom. Differed oh cheerful procured pleasure securing suitable in. Hold rich on an he oh fine. Chapter ability shyness alpha@example.com Inquietude simplicity terminated she compliment remarkably few her nay. The weeks are ham mike@.... asked jokes. Neglected perceived shy nay concluded. Not mile draw plan snug charlie@example.com ext all. Houses latter an valley be indeed wished mere golf@example.com In my. Money doubt oh drawn every or an china