Create a directory with multiple sub-directories
mkdir -p output_folder/{raw_data,temp_data,result,log_folder}
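To check the result (the listing shown is what you should expect):
ls output_folder
log_folder  raw_data  result  temp_data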
Go to a specific line number using vi
vi +14 filename
This will open the file at line 14
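Related: once inside vi, typing :14 followed by Enter jumps to line 14, and vi +/pattern filename opens the file at the first line matching pattern.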
Split a large file into multiple smaller files
split -l 500 input_filename output_prefix
If there are 2000 lines in the file, this splits it into 4 files of 500 lines each. The second argument is a prefix, not a filename: the pieces are named output_prefixaa, output_prefixab, output_prefixac, and output_prefixad.
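With GNU split you can instead ask for a fixed number of pieces; the -n flag is a GNU extension, and the l/ prefix keeps lines intact:
split -n l/4 input_filename output_prefix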
Find duplicate lines in a file
sort filename | uniq -D
uniq only compares adjacent lines, so sort the file first; -D prints every occurrence of duplicated lines (use -d to print just one copy per duplicate).
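A quick demonstration (the printf input is just sample data):
printf 'a\nb\na\n' | sort | uniq -D
a
a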
Compress and decompress files
compress: tar -zcvf compressed.tar.gz *.csv
extract: tar -xzvf compressed.tar.gz
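To inspect an archive without extracting it:
list: tar -tzvf compressed.tar.gz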
Change the value of a specific column based on a condition (if/else)
awk '{print $1, (($2 < 0) ? 0 : $2), $3}' input_file.txt > output_file.txt
This prints the second column as 0 if it is negative, otherwise unchanged.
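For example, with inline sample data:
printf 'a -3 x\na 3 x\n' | awk '{print $1, (($2 < 0) ? 0 : $2), $3}'
a 0 x
a 3 x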
Add a new column with a unique id
awk 'BEGIN{FS=OFS="\t"} {print $1, $2, $3, "id-"NR, $4}' filename > output_filename
FS must be set in a BEGIN block so it applies to the first line too. NR is the current row number, so each row gets a unique id (id-1, id-2, and so on).
Find common lines between two files
comm -12 <(sort file1) <(sort file2)
comm requires sorted input; -1 and -2 suppress the lines unique to file1 and file2, leaving only the lines common to both.
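A quick demonstration with inline sample data:
comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
b
c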
Replace an empty column with a certain value
awk 'BEGIN{FS=OFS=","} {if ($7=="") $7="changed"; print}' input_file.txt > modified_input.txt
Here, the 7th column is set to the value "changed" if it is empty; otherwise the original value is kept.
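For example, in the comma-separated line a,b,c,d,e,f,,h the 7th field is empty, so the output becomes a,b,c,d,e,f,changed,h.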
Getting all the rows where two columns have the same value
awk '($4==$5)' input_file.txt > output_file.txt
Filtering rows based on a certain column value
awk '($4 < 5)' input_file.txt > output_file.txt
Filtering using multiple conditions
awk '($4 < 5 && $4 > 0)' input_file.txt > output_file.txt
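These awk conditions compose freely; for example, to keep rows where the 4th column is in range and the 1st column equals a particular label ("sample" below is just an illustrative placeholder):
awk '($4 < 5 && $4 > 0 && $1 == "sample")' input_file.txt > output_file.txt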
Getting the mean value of a specific column
awk '{total += $4} END {print total/NR}' input_file.txt
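Note that NR counts every line, including any header. If your file has a header row (an assumption about your layout), skip it and divide by the number of data lines instead:
awk 'NR > 1 {total += $4; n++} END {print total/n}' input_file.txt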
Getting unique rows based on a multiple-column comparison
awk -F"\t" '!seen[$1,$2,$3,$4]++' input_file.txt > output_file.txt
Removing the first row without opening the file
sed '1d' input_file.txt > output_file.txt
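sed can also edit the file in place; BSD/macOS sed needs an explicit (possibly empty) backup suffix:
GNU: sed -i '1d' input_file.txt
BSD/macOS: sed -i '' '1d' input_file.txt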
Changing values in a file
sed 's/-/\t/g' input_file.txt > output_file.txt
This replaces every - in the file with a tab.
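\t in the replacement is a GNU sed extension; with BSD/macOS sed, one option in bash is to pass a literal tab via ANSI-C quoting:
sed 's/-/'$'\t''/g' input_file.txt > output_file.txt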
Sometimes there are carriage returns (\r, ^M) in a file, often when it was created on Windows; these can mess up processing. You can remove them with:
sed 's/\r//g' input_file.txt > output_file.txt
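An equivalent using tr:
tr -d '\r' < input_file.txt > output_file.txt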