c++ – STL Set: insert two million ordered numbers in the most efficient manner – Education Career Blog

For the following mass-insert, because the inputs are ordered, are there any (slight) optimizations?

set<int> primes;
for ( int i = 2; i <= 2000000; i++ ) {
    primes.insert(i);
}
// then follows Sieve of Eratosthenes algorithm

New improvement, twice as fast:

set<int> primes;
for ( int i = 2; i <= 2000000; i++ ) {
    primes.insert(primes.end(), i);
}
// then follows Sieve of Eratosthenes algorithm

,

From http://www.cplusplus.com/reference/stl/set/set/

vector<int> numbers;
for (int i=2; i<=2000000; ++i)
    numbers.push_back(i);

set<int> primes(numbers.begin(), numbers.end());

That should have O(n) complexity, where as yours has O(n*log(n)) complexity.

The referenced link says that when you use the iterator based constructor, if the iterator is sorted, then you get linear. However, I’m not seeing anything explicit on howto explicitly inform the constructor that it’s a sorted iterator.


For something cleaner, I’d rather use boost’s counting iterator though. That shortens this to:

set<int> primes(
    boost::counting_iterator<int>(0),
    boost::counting_iterator<int>(200000));

,

There is a overloaded version of insert method available which takes an iterator as the first parameter. This iterator is used as the hint i.e. the comparison will start from this iterator. So if you pass the proper iterator as the hint, then you should have a very efficient insert operation on the set. See here for details. I believe in your case, numbers.end() -1 will be a good option.

,

If you are starting with the full range of candidate ints, I would not use a set at all, I would use a vector<bool>, init them all to false, and mark the targets (primes) as true as they are located.

vector<bool> candidates(200000);

You can index this using the current candidate int – 1 (candidate range = 1..200000). Then iterate using find to locate the true values, converting to int by using the offset versus begin().

  • Total number of memory allocations – 1.
  • Total memory usage 25000 bytes (vector<bool> is a special case, see comment from @Greg Rogers) vs >= 800000 for set<int>, discounting BTree overhead.
  • All element access is via pointer arithmetic.

,

Here’s a few:

  1. implement a custom stl allocator that does the reservations for memory upfront.

  2. If you use the vector solution, call reserve.

  3. If you’re just going to to implement the sieve use (or implement) a boost counting iterator and only store the results in a vector, which calls reserve.

Leave a Comment