Lately I’ve been working with a lot of data files laid out in fixed rows and columns, and I’ve been finding myself doing the following over and over:
Getting the row count of a file,
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.gamma
3183 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.beta
200 lda_out/final.beta
And getting the column count of the same files,
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.gamma | awk '{ print NF }'
200
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.beta | awk '{ print NF }'
5568
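Incidentally, awk can do both counts in a single pass, since NR holds the row count at the end and NF gives the field count of the first row. A minimal sketch (sample.dat is a hypothetical test file, standing in for the real data; this assumes every row has the same number of columns, which holds for these fixed-shape files):

```shell
# Create a tiny 2-row, 3-column test file
printf '1 2 3\n4 5 6\n' > sample.dat

# One pass: remember the field count of the first row,
# then print total rows X columns at end of input
awk 'NR == 1 { cols = NF } END { print NR, "X", cols }' sample.dat
# prints: 2 X 3
```

This avoids reading the file twice, which matters once the files get large.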
I would do this for dozens of files, and eventually decided to wrap it up in a simple shell function,
function datsize {
    if [ -e "$1" ]; then
        rows=$(wc -l < "$1")
        cols=$(head -1 "$1" | awk '{ print NF }')
        echo "$rows X $cols $1"
    else
        return 1
    fi
}
Simple, and so much nicer,
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.gamma
3183 X 200 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.beta
200 X 5568 lda_out/final.beta
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-theta.dat
3183 X 200 ctr_out/final-theta.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-U.dat
2011 X 200 ctr_out/final-U.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-V.dat
3183 X 200 ctr_out/final-V.dat
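Since the whole point is checking dozens of files, one possible extension is to loop over all arguments so a glob works in a single call. A hedged sketch (my variant, not the function above; same row/column logic, just applied per file):

```shell
# Variant of datsize that accepts any number of files,
# e.g.: datsize lda_out/* ctr_out/*
function datsize {
    local f rows cols
    for f in "$@"; do
        if [ -e "$f" ]; then
            rows=$(wc -l < "$f")                      # line count
            cols=$(head -1 "$f" | awk '{ print NF }') # fields in first row
            echo "$rows X $cols $f"
        else
            echo "datsize: $f: no such file" >&2
            return 1
        fi
    done
}
```

With this, `datsize lda_out/final.gamma lda_out/final.beta` prints one `rows X cols filename` line per file, and a missing file stops the loop with a nonzero exit status.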