Remove duplicate lines from file using awk

Removing duplicate lines from a file is a common requirement; support guys know the pain well 🙂

Let's look at the quickest way to achieve this.


Image of Remove Duplicate Lines Requirement


Solution :

awk '!x[$0]++' file1.txt

Explanation :

  • x[$0]: looks up the key $0 (the current line) in the associative array x. If the key does not exist, awk creates it with an empty value.
  • x[$0]++: increments the value of x[$0] and returns the old value as the value of the expression. If x[$0] does not exist yet, the expression returns 0 and x[$0] becomes 1 (the ++ operator always yields a numeric value).
  • !x[$0]++: negates the value of the expression. If x[$0]++ returns 0 (the first time a line is seen), the whole expression evaluates to true, so awk performs its default action, print $0. Otherwise the expression evaluates to false and awk does nothing, so repeated lines are skipped.
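
To see the steps above in action, here is a small sketch. The file names (sample.txt, short.txt, long.txt) and the sample contents are made up for illustration. The second awk command is a spelled-out version of the same logic, written only to show what the one-liner does.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
printf '%s\n' apple banana apple cherry banana > sample.txt

# The one-liner: print a line only the first time it is seen.
awk '!x[$0]++' sample.txt > short.txt

# The same logic written out step by step.
awk '{
    if (x[$0] == 0)   # count is still 0, so this line has not been seen yet
        print $0
    x[$0]++           # record that the line has now been seen
}' sample.txt > long.txt

cat short.txt
# apple
# banana
# cherry
```

Note that awk writes the result to standard output and leaves the input file untouched, and that the first occurrence of each line is kept in its original order. To replace the original, redirect to a new file and then rename it.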


Image of Remove Duplicate Lines Solution

