An erase/remove idiom gotcha

I was investigating a crash in my application recently. Despite working in the production environment, I was able to launch the debug version in a debugger (yay). Soon enough, I got a promising SIGSEGV in the expression implementing the erase-remove idiom:

vec.erase(std::remove_if(vec.begin(), vec.end(),
          [&orig](auto const& msg){
    return orig.id == msg.id;
}));

Can you spot the bug already?

If not: worry not, apparently this part of the standard library wasn’t designed to be readable or error-resilient.

First of all: remove rearranges the passed-to sequence, moving the elements fitting the predicate at the end of said sequence. The return value is the new end iterator for this range — past the last item that did not fit the removal criteria. An example:

vector<int> foo = {1,2,3,4,5};
 
// remove even numbers
auto it = remove_if(foo.begin(), foo.end(),
                    [](int a){ return a%2 == 0; });

foo can now contain {1,3,5,2,4} (actually, elements past the new logical end of the container may have unspecified values, as allowed by MoveAssignable concept). it points to an element past the last valid one (3); so, in this example, *it == 4. If no element has been removed, nothing is changed and the actual end iterator is returned.

That’s about it for remove/remove_if. Now, the problem with std::vector‘s or most others standard containers’ erase is that it has two overloads:

iterator erase(const_iterator pos); // 1)
iterator erase(const_iterator first, const_iterator last); // 2)

1) removes element pointed-to by the iterator. 2) removes a sequence of elements minus last (as is usual in the standard library).

Do you see the bug now?

The original code calls the first overload of erase, erasing a single element. Unfortunately, if no element matched remove_if‘s predicate, the original end is returned and the code is equivalent to:

vec.erase(vec.end());

It is undefined behaviour.

Edit: as pointed out by a commenter, in the opposite case of more than one element being “removed” by the remove function, only one of them will be actually removed from the container. In my case, id’s were supposed to be unique, so that problem slipped my mind.

The fix is simple: either check if remove returns the end iterator or use the second overload of the function.

After my fix, my function looked more or less like the following:

auto removeIt = std::remove_if(vec.begin(), vec.end(),
                               [&orig](auto const& msg){
    return orig.id == msg.id;
});
 
vec.erase(removeIt, vec.end());

This prompted me to write a helper function template to prevent this from ever happening again:

template<typename Container, typename Predicate>
void erase_if(Container& c, Predicate p)
{
    using std::begin;
    using std::end;
    using std::remove_if;
 
    auto realEnd = end(c);
    auto removedIt = remove_if(begin(c), realEnd, p);
 
    c.erase(removedIt, realEnd);
}

That being said, the real problem here is that it’s pretty easy to make such a mistake, and not as easy to spot it (especially for a newbie). As a C++ programmer, I’ve come to rely on my compiler (and occasional static analysis) to point out my mistakes and refuse to compile erroneous code and all those tools have failed me here.

Hopefully, this will change in C++17 and we’ll finally be able to use single function to easily remove elements from a range.

17 thoughts on “An erase/remove idiom gotcha

  1. I’ve spotted another bug with this code. If there are N elements meeting orig.id == msg.id criteria you’ll end up with with N elements removed but with only 1 element erased. Maybe you eliminate that possibility before calling the code mentioned in your post, but it’s fragile anyway. Anyway, I agree, it’s not that obvious and that idiom is clumsy.

    1. In this case, the id was unique throughout the system, so that wasn’t a problem, but you’re right, it’s an additional difference one should watch out for.

  2. Couldn’t we just use the first listing passing vec.end() as second argument to vec.erase() ?

  3. I don’t understand why you use the overloaded version at all. The erase-remove idiom works on iterator ranges, not single elements.

    Proper erase-remove looks like:

    v.erase(remove_if(v.begin(), END), END)

    1. It was a mistake, a typo of sorts. Have you never passed incorrect parameters to a function? In my experience, that does happen quite often, but usually the compiler/library won’t let you do this. Here, it compiled but the results were completely unexpected — except that it worked as intended for the most common case (1 element found and deleted).

  4. If id is unique throughout the system this approach seems to be an overkill. Wouldn’t it be better to use:

    auto it = std::find_if(...)
    if(it)
    {
        std::iter_swap(it, v.end() -1);
        v.erase(v.end() -1);
    }

    ?

    1. Using std::find with the single-argument overload of erase is possible, but there are two problems with your solution:

      1) iterators do not convert to bool. You’d have to use if(it != v.end())
      2) this isn’t a deal-breaker, but if you want to keep your data sorted, this won’t work (simple find and erase will, though)

      That being said, I find it easier on eyes to omit the conditional. Plus, it has the additional advantage of erasing more matches in the erroneous situation. While this isn’t beautiful, it is practical.

  5. (see if I can beat the formatting issues – where’s the help link or preview button?)

    Really, all the requirements are met if you just use `std::set<Type, Type::CompareById>` instead of `std::vector<Type>`.

    That way, you keep it ordered, have the uniqueness guarantees and you can simply call


    container.erase(key);

    See http://en.cppreference.com/w/cpp/container/set/erase

    1. I feel really stupid now. That’s a good idea, thanks. That being said, the point about the interface being easy to misuse stands.

      preview: Soon™

    1. You’re right. For now, I am using

      template<typename Container, typename Predicate>
      void erase_if(Container& c, Predicate p)
      {
      	using std::begin;
      	using std::end;
      	using std::remove_if;
       
      	auto realEnd = end(c);
      	auto removedIt = remove_if(begin(c), realEnd, p);
       
      	c.erase(removedIt, realEnd);
      }

      but it doesn’t play well with i.e. std::map

  6. @krzaq but still your implementation of erase_if has UB(?)

    When removeIt will be end then erase.(end,end) is UB. This is what we can read in documentation :(

    1. Can you point the documentation saying this? If you look at cppreference for std::vector, you can read that for the two parameter overload

      The iterator first does not need to be dereferenceable if first==last: erasing an empty range is a no-op.

  7. I was refering to this: http://www.cplusplus.com/reference/vector/vector/erase/
    Removes from the vector either a single element (position) or a range of elements ([first,last)).
    As you can see they write” [ first” (included)

    But yeah you are right probably cppreference is better knowledge base ;)
    Thx a lot!

    Other thoughts:
    Also see sites like cppreference.com and cplusplus.com, which are not authoritative but also can be used for general unofficial reference to answer basic questions about C++.
    From: https://isocpp.org/std/the-standard

    1. Interesting. If you look at the best source – the standard draft you’ll not see anything explicit about empty ranges. That being said, it still denotes [a,b) range, which for a==b is an empty range. I maintain my position that no dereference/UB happens, as deleting an empty range should be a no-op, and there are no elements to be dereferenced.

Leave a Reply to Shafik Yaghmour Cancel reply

Your email address will not be published.