Lately I’ve been working with lots of data files laid out as fixed rows and columns, and I keep finding myself doing the following:
Getting the row count of a file,
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.gamma
3183 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) wc -l lda_out/final.beta
200 lda_out/final.beta
And getting the column count of the same files,
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.gamma | awk '{ print NF }'
200
twarnock@laptop:/var/data/ctm :) head -1 lda_out/final.beta | awk '{ print NF }'
5568
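Both commands assume whitespace-separated columns, since awk splits fields on runs of spaces and tabs by default. For files with another delimiter, something along these lines should work. This is only a sketch: the colcount name and its optional delimiter argument are made up for illustration.

```shell
# Hypothetical helper: count the fields on the first line of a file.
# Usage: colcount FILE [DELIM]
colcount() {
    local file=$1 delim=${2:-}
    if [ -n "$delim" ]; then
        # -F sets awk's field separator, e.g. ',' for simple CSV
        # (this does not handle quoted fields that contain the delimiter)
        awk -F"$delim" 'NR == 1 { print NF; exit }' "$file"
    else
        # Default: split on runs of spaces and tabs
        awk 'NR == 1 { print NF; exit }' "$file"
    fi
}
```

Doing the whole thing in awk also avoids the extra head process, since awk exits after reading the first line.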
After doing this for dozens of files, I eventually wrapped both steps in a simple shell function,
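function datsize {
    # Print "ROWS X COLS FILE" for a whitespace-delimited data file.
    if [ -f "$1" ]; then
        local rows cols
        rows=$(wc -l < "$1")                      # line count
        cols=$(head -1 "$1" | awk '{ print NF }') # fields on the first line
        echo "$rows X $cols $1"
    else
        echo "datsize: $1: no such file" >&2
        return 1
    fi
}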
Simple, and so much nicer,
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.gamma
3183 X 200 lda_out/final.gamma
twarnock@laptop:/var/data/ctm :) datsize lda_out/final.beta
200 X 5568 lda_out/final.beta
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-theta.dat
3183 X 200 ctr_out/final-theta.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-U.dat
2011 X 200 ctr_out/final-U.dat
twarnock@laptop:/var/data/ctm :) datsize ctr_out/final-V.dat
3183 X 200 ctr_out/final-V.dat
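Since I usually check several files in a row, a natural next step would be a variant that accepts any number of files and also confirms that every row has the same width as the first. The following is a rough sketch rather than something from my actual setup; the datsize_all name and the wording of the warning are invented here.

```shell
# Hypothetical variant of datsize: takes several files and checks every row.
datsize_all() {
    local f status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "datsize_all: $f: no such file" >&2
            status=1
            continue
        fi
        # One awk pass: remember the first row's width, count rows that differ.
        awk -v name="$f" '
            NR == 1    { cols = NF }
            NF != cols { ragged++ }
            END {
                printf "%d X %d %s", NR, cols, name
                if (ragged) printf " (%d rows differ in width)", ragged
                print ""
            }' "$f"
    done
    return "$status"
}
```

It would be called as datsize_all lda_out/final.* ctr_out/*.dat. One difference from the wc -l approach: awk's NR also counts a final line that lacks a trailing newline, so the row count can be one higher for such files. The trade-off is that awk now reads each file in full instead of only the first line.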