I have a list of 50,000 IDs in a flat file and need to remove any duplicates. Is there an efficient or recommended algorithm for this?



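You can use the command-line sort program to order and filter the list of IDs. With the -u option it outputs only one copy of each line, so the result is sorted and free of duplicates. It is a very efficient program and scales well too.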

sort -u ids.txt > filteredIds.txt


Read the file line by line into a dictionary (in PHP, an associative array keyed by ID), discarding duplicates. Once everything has been read, write the unique IDs out to a new file.
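
A minimal PHP sketch of this approach, assuming one ID per line. The function name dedupeFile and its parameters are hypothetical, and skipping blank lines is an added assumption rather than part of the answer above:

```php
<?php
// Hypothetical helper: reads $inPath line by line, remembers each ID as an
// array key, then writes the unique IDs to $outPath in first-seen order.
function dedupeFile(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Could not open $inPath");
    }

    $seen = [];  // the "dictionary": ID => true
    while (($line = fgets($in)) !== false) {
        $id = rtrim($line, "\r\n");
        if ($id === '') {
            continue;  // assumption: blank lines are not IDs
        }
        $seen[$id] = true;  // assigning an existing key again changes nothing
    }
    fclose($in);

    // All lines are read; write the keys out, one per line.
    file_put_contents($outPath, implode("\n", array_keys($seen)) . "\n");
    return count($seen);  // number of unique IDs written
}
```

Because only the unique IDs are held in memory, this should stay comfortable for a 50,000-line file.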


I did some experiments once, and the fastest solution I could get in PHP was to sort the items and manually remove all the duplicates.
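
One way the sort-then-remove idea might look, as a sketch only: the helper name sortAndDedupe is hypothetical, it assumes the IDs are already loaded as an array of strings, and it is not necessarily the code used in those experiments.

```php
<?php
// Hypothetical helper: sorts the IDs so duplicates become adjacent, then
// keeps an item only when it differs from the previously kept one.
function sortAndDedupe(array $ids): array
{
    sort($ids, SORT_STRING);  // works on a copy; the caller's array is untouched
    $unique = [];
    $previous = null;
    foreach ($ids as $id) {
        if ($id !== $previous) {
            $unique[] = $id;
            $previous = $id;
        }
    }
    return $unique;
}

$result = sortAndDedupe(['b7', 'a1', 'b7', 'c3', 'a1']);
// $result is ['a1', 'b7', 'c3']
```

Note that, like sort -u, this returns the IDs in sorted order rather than in their original order.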

If performance isn’t that much of an issue for you (which I suspect, since 50,000 is not that many), then you can simply use array_unique().


I guess that if you have a large enough memory allowance, you can put all these IDs in an array, using each ID as its own key:

$array[$id] = $id;

This automatically weeds out the dupes, because an array can hold each key only once.


You can do this with a few built-in PHP functions.


How does it work?

  • Read the file using the function file,
    which returns an array.
  • Get rid of the duplicate lines using
    array_unique.
  • implode those unique lines with “\n”
    to get a string.
  • Write the string back to the file
    using file_put_contents.
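
The steps above can be sketched roughly as follows. The wrapper function removeDuplicateLines is hypothetical, and passing FILE_IGNORE_NEW_LINES to file is an assumption made here so that each array element comes back without its trailing line break:

```php
<?php
// Hypothetical helper wrapping the four steps; assumes one ID per line.
function removeDuplicateLines(string $path): void
{
    // FILE_IGNORE_NEW_LINES strips the newline from each element, so the
    // implode() below does not produce doubled line breaks.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }

    $unique = array_unique($lines);  // keeps the first occurrence of each line
    file_put_contents($path, implode("\n", $unique));
}
```

This overwrites the original file in place, so it may be worth testing on a copy first.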

This solution assumes that you’ve got one ID per line in the flat file.


You can do it with an array and array_unique. This example assumes your IDs are separated by line breaks; if that’s not the case, just change the delimiter passed to explode and implode.

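// Read the whole file into one string.
$file = file_get_contents('/path/to/file.txt');
// Split into IDs, drop the duplicates, and join them back together.
$array = explode("\n", $file);
$array = array_unique($array);
$file = implode("\n", $array);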


If you can just explode the contents of the file on a comma (or any other delimiter), then array_unique will produce the least (and cleanest) code. Otherwise, if you are parsing the file yourself, going with $array[$id] = $id is the fastest and cleanest solution.


If you can use a terminal (or native Unix execution), the easiest way is the following (assuming that there is nothing else in the file):

sort < ids.txt | uniq > filteredIds.txt

