# c++ – STL Set: insert two million ordered numbers in the most efficient manner – Education Career Blog

For the following mass-insert, because the inputs are ordered, are there any (slight) optimizations?

``````set<int> primes;
for ( int i = 2; i <= 2000000; i++ ) {
primes.insert(i);
}
// then follows Sieve of Eratosthenes algorithm
``````

New improvement, twice as fast:

``````set<int> primes;
for ( int i = 2; i <= 2000000; i++ ) {
primes.insert(primes.end(), i);
}
// then follows Sieve of Eratosthenes algorithm
``````

,

``````vector<int> numbers;
for (int i=2; i<=2000000; ++i)
numbers.push_back(i);

set<int> primes(numbers.begin(), numbers.end());
``````

That should have O(n) complexity, where as yours has O(n*log(n)) complexity.

The referenced link says that when you use the iterator based constructor, if the iterator is sorted, then you get linear. However, I’m not seeing anything explicit on howto explicitly inform the constructor that it’s a sorted iterator.

For something cleaner, I’d rather use boost’s counting iterator though. That shortens this to:

``````set<int> primes(
boost::counting_iterator<int>(0),
boost::counting_iterator<int>(200000));
``````

,

There is a overloaded version of `insert` method available which takes an iterator as the first parameter. This iterator is used as the hint i.e. the comparison will start from this iterator. So if you pass the proper iterator as the hint, then you should have a very efficient insert operation on the set. See here for details. I believe in your case, `numbers.end() -1` will be a good option.

,

If you are starting with the full range of candidate `int`s, I would not use a `set` at all, I would use a `vector<bool>`, init them all to `false`, and mark the targets (primes) as `true` as they are located.

``````vector<bool> candidates(200000);
``````

You can index this using the current candidate `int` – 1 (candidate range = 1..200000). Then iterate using `find` to locate the true values, converting to `int` by using the offset versus `begin()`.

• Total number of memory allocations – 1.
• Total memory usage 25000 bytes (`vector<bool>` is a special case, see comment from @Greg Rogers) vs >= 800000 for `set<int>`, discounting BTree overhead.
• All element access is via pointer arithmetic.

,

Here’s a few:

1. implement a custom stl allocator that does the reservations for memory upfront.

2. If you use the vector solution, call reserve.

3. If you’re just going to to implement the sieve use (or implement) a boost counting iterator and only store the results in a vector, which calls reserve.