Sed, AWK and Creating HTML Tables
Anyone who does even a modicum of web development finds themselves taking tabular text, sometimes in the most horrid format, and having to turn it into some sort of clean HTML.
A recent experience of this type was transferring material from a TWiki site (good for collaborative projects) to a Drupal site (good for an internal and external website). TWiki likes introducing a lot of inline style information and the like, which makes it a bit of a pain to transfer tables to other systems. This is where those old-fashioned UNIX tools, sed (stream editor, 1973) and AWK (Aho, Weinberger, and Kernighan, 1977), still prove their resilience, power and simplicity.
I'll start with a generic method for working with a CSV file. After that I'll give a shorter example specifically for TWiki to HTML.
Generic: From CSV to HTML
1. Start With A CSV File. Turn Every Delimiter Into A New Line
Let's presume that we start with a CSV (comma-separated values) file of a table, probably obtained through a process as ugly as screen-scraping or stripping tags.
The first action will be to use sed to turn delimiters into new lines, so that every future table cell is now on its own line.
sed -i 's/,/\n/g' file.csv
The "-i" stands for "in place", i.e., change file.csv itself rather than sending the result to standard output. The "s" is for substitute, the "g" for global, the "," is what we are searching for, and "\n" in the replacement is the escape sequence for a newline (GNU sed understands this; some other versions of sed do not).
You must also be careful to ensure that your CSV file does not use a field delimiter (like a comma!) that also appears inside the table cells.
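One way to spot a stray delimiter before splitting is to count the comma-separated fields on each line; a line with more fields than the rest probably has a comma inside a cell. The following is only a sketch: sample.csv and its contents are made up for illustration.

```shell
# Hypothetical sample: the second line has a comma inside a quoted cell.
tmpdir=$(mktemp -d)
printf '%s\n' 'name,city,year' '"Smith, J",Perth,1999' 'Lee,Hobart,2004' > "$tmpdir/sample.csv"

# Print the number of comma-separated fields on each line,
# then tally how many lines share each field count.
field_counts=$(awk -F',' '{ print NF }' "$tmpdir/sample.csv" \
  | sort -n | uniq -c \
  | awk '{ print $2 " fields: " $1 " line(s)" }')
echo "$field_counts"

rm -r "$tmpdir"
```

If more than one field count is reported, the odd lines are worth inspecting by hand before going any further.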
If you have saved text with quotations, which is common, you'll also need to do the following:
sed -i 's/"//g' file.csv
Note that if you leave out the "g" ("global"), sed will replace only the first instance on each line. It is a stream editor after all.
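To see what step 1 does, here is a small run on a throwaway file. The file name and its contents are invented for the example, and GNU sed is assumed for both "-i" and the "\n" in the replacement.

```shell
# Hypothetical two-row, three-column table with quoted cells.
tmpdir=$(mktemp -d)
printf '%s\n' '"a","b","c"' '"d","e","f"' > "$tmpdir/sample.csv"

# Put every cell on its own line, then strip the quotation marks.
sed -i 's/,/\n/g' "$tmpdir/sample.csv"
sed -i 's/"//g' "$tmpdir/sample.csv"

cells=$(cat "$tmpdir/sample.csv")
echo "$cells"

rm -r "$tmpdir"
```

The file should now hold six lines, one cell per line, in the original reading order.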
2. Add a table cell open and table cell close to every line
sed -i 's/^/<td>/' file.csv
The caret symbol ("^") matches the start of each line.
sed -i 's/$/<\/td>/' file.csv
The dollar symbol ("$") matches the end of each line.
Now every line holds one future table cell, wrapped in an opening and a closing HTML tag. We have a file of table cells.
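The two substitutions in step 2 can also be given to a single sed call with "-e". This is a variation on the commands above rather than a different method; the sketch reads three made-up cells from standard input instead of editing a file.

```shell
# Wrap each input line in <td>...</td> using two expressions in one sed call.
cells=$(printf '%s\n' 'a' 'b' 'c' | sed -e 's/^/<td>/' -e 's/$/<\/td>/')
echo "$cells"
```

Each expression is applied in order to every line, so the result is the same as running the two commands one after the other.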
3. Add a table row to every 8th row
Table cells need to be organised into rows. To do this we use a small AWK program. In this example the table has eight columns, so a table row tag is added after every 8th line. I am sufficiently paranoid to create a new file for this.
#!/bin/bash
awk ' {
print $0
if(NR % 8 == 0 )
print "<tr>"
} ' file.csv > file1.csv
Or, directly on the command line: awk ' {print $0; if(NR % 8 == 0 ) print "<tr>"} ' file.csv > file1.csv
(Note the semi-colon after print $0.)
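If your table does not have eight columns, the number can be passed in as an AWK variable instead of being hard-coded. In this sketch the variable name cols and the six sample cells are assumptions made for the example.

```shell
# Six hypothetical cells forming a table with three columns.
rows=$(printf '%s\n' '<td>a</td>' '<td>b</td>' '<td>c</td>' \
                     '<td>d</td>' '<td>e</td>' '<td>f</td>' |
  awk -v cols=3 '{ print $0; if (NR % cols == 0) print "<tr>" }')
echo "$rows"
```

NR is the number of the line AWK is currently reading, so the test is true on lines 3 and 6 here and a "<tr>" is printed after each of them.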
4. Add a table row closure to every 9th row
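As above, but first open file1.csv and add a <tr> to line 1, so that the first row is opened as well. Each row now takes up nine lines (the <tr> plus eight cells), hence the 9. If the number of cells is an exact multiple of 8, the last line of the file will be a stray <tr> that needs deleting. There's probably a better way to do this.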
awk ' {
print $0
if(NR % 9 == 0 )
print "</tr>"
} ' file1.csv > file2.csv
Or, as a single line:
awk ' {print $0; if(NR % 9 == 0 ) print "</tr>"} ' file1.csv > file2.csv
Voilà! Add the table, table header and table body tags along with any formatting characteristics and you're done.
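As for a better way: if the CSV still has one table row per line, AWK can do all four steps in one pass by treating the comma as the field separator. This is a different technique from the line-by-line method above, offered only as a sketch. It assumes no commas or quotation marks inside cells, and the sample file is invented.

```shell
# Hypothetical CSV with one table row per line.
tmpdir=$(mktemp -d)
printf '%s\n' 'a,b,c' 'd,e,f' > "$tmpdir/sample.csv"

# For each line: open a row, wrap every field in a cell, close the row.
html=$(awk -F',' '{
  printf "<tr>"
  for (i = 1; i <= NF; i++) printf "<td>%s</td>", $i
  print "</tr>"
}' "$tmpdir/sample.csv")
echo "$html"

rm -r "$tmpdir"
```

Because the row tags are written around each input line, there is no need to count lines or to fix up the first and last rows by hand.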
TWiki to HTML
Go to EDIT, Raw View or even copy the appropriate text file from TWiki.
1. Remove table header line and convert.
2. Remove first "|"
sed -i 's/.\(.*\)/\1/' test.txt
3. Remove last "|"
sed -i 's/\(.*\)./\1/' test.txt
4. Add a table row and data marker at the start of each line
sed -i 's/^/<tr><td>/' test.txt
5. Add a table row close and data close marker at the end of each line
sed -i 's/$/<\/td><\/tr>/' test.txt
6. Turn all the other delimiters into table data markers
sed -i 's/|/<\/td><td>/g' test.txt
Surprise! You're finished already. You might even want to turn it into a small script!
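Such a script might look something like the sketch below, which chains steps 2 to 6 in one sed call. Two things differ from the steps above and are my own assumptions: the first two expressions match a literal "|" at the start and end of the line rather than any character, and the result is written to standard output instead of editing test.txt in place. The sample rows are made up, and the header line is assumed to have been removed already.

```shell
# Hypothetical TWiki table rows, header line already removed.
tmpdir=$(mktemp -d)
printf '%s\n' '| a | b |' '| c | d |' > "$tmpdir/test.txt"

# Steps 2-6: drop the outer pipes, wrap the line in row and cell tags,
# then turn the remaining pipes into cell boundaries.
html=$(sed -e 's/^|//' \
           -e 's/|$//' \
           -e 's/^/<tr><td>/' \
           -e 's/$/<\/td><\/tr>/' \
           -e 's/|/<\/td><td>/g' "$tmpdir/test.txt")
echo "$html"

rm -r "$tmpdir"
```

The spaces around each cell are left as they are, since HTML ignores them when the table is rendered.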