Question: push_back/emplace_back a shallow copy of an object into another vector

Answers 4
Added at 2017-01-05 07:01
Question

Say I have the following code

class Car {
    public:
        string color;
        string name;
        Car(string c, string n): color(c), name(n){}
};

int main() {
    vector<Car> collection1;
    vector<Car> collection2;
    collection1.emplace_back("black", "Ford");
    collection1.emplace_back("white", "BMW");
    collection1.emplace_back("yellow", "Audi");

    //Question comes here
    collection2.push_back(collection1[0]);

}

Now I believe this makes a deep copy of collection1[0]. I have tried using collection2.emplace_back(move(collection1[0])), but then the data fields of collection1[0] would be gone. I just want this "black Ford" to exist in both vectors, so that changes made to this particular object through either vector are reflected in both.

I am guessing that for a vector of real objects, the elements take up actual memory, so an element of collection1 must be independent of any element of collection2. I think the easiest way is to make collection1 and collection2 vectors of pointers that point into the same vector of Car. But is there any way to make the above code work without using vectors of pointers? I ultimately want to return both collections to the calling function, so making vectors of pointers is meaningless.

In short, I want to simulate the List.append() method in python.

collection1 = [Car("black", "Ford"),Car("white", "BMW"),Car("yellow", "Audi")]
collection2 = []
collection2.append(collection1[0])
collection2[0].color = "blue"  # This affects collection1 as well
Answers
Answer #1, added: 2017-01-05 08:01

Since you mentioned you don't like pointers, you can use references. Vectors can't store references directly (because references aren't copyable and assignable), but std::reference_wrapper wraps a reference in a copyable and assignable object.

std::reference_wrapper is a class template that wraps a reference in a copyable, assignable object. It is frequently used as a mechanism to store references inside standard containers (like std::vector) which cannot normally hold references.

source: http://en.cppreference.com/w/cpp/utility/functional/reference_wrapper

vector<Car> collection1;
collection1.emplace_back("black", "Ford");
collection1.emplace_back("white", "BMW");
collection1.emplace_back("yellow", "Audi");

vector<std::reference_wrapper<Car>> collection2{collection1.begin(),
                                                collection1.end()};

This way, collection2 refers to the same objects as collection1 does. For example:

collection1[0].name = "frogatto!";
std::cout << collection2[0].get().name;
// prints 'frogatto!'

Important:

Note that this approach is discouraged: inserting into or removing from collection1 can invalidate the references held in collection2, so you need another entity that manages insertion and removal on collection1 and takes appropriate action on collection2. @Serge Ballesta's answer is better than mine. Use std::shared_ptr. Try to love and embrace pointers :)

Answer #2, added: 2017-01-05 08:01

In C++, standard collections actually contain the objects, while in other languages like Python or Java they contain references (or pointers) to objects that are stored elsewhere. But as C++ does not include garbage collection, the lifetime of the objects must be explicitly managed elsewhere.

The consequence of that design is that, to allow the same object to be used in two different collections, you must use collections of pointers or references (beware, C++ does not directly allow collections of references; however, std::reference_wrapper was created for that).

Depending on your use case, you could either use raw pointers (if the lifetime of the actual objects is already managed), or use smart pointers (here, std::shared_ptr) that internally manage a reference count to ensure that the object is automatically destroyed when the last shared_ptr is destroyed. This is not far from Python's references to objects, provided you are aware that the destruction of the last shared_ptr will actually destroy the object(*). Said differently, do not keep any other pointers or references to it if you do not want it to become dangling.

Alternatively, if the collections are not symmetric—that is, if one will actually contain all the objects, while the other will only contain references to objects from the former one—references will be your best bet, and the second collection could be a std::vector<std::reference_wrapper<Car>>.


Addition per MvG comment.

There is a possibly annoying difference between Python objects and C++ shared_ptr. Python has a full garbage collector that is clever enough to detect circular references and destroys the cycle once there are no external references. Example:

>>> b = ['x']
>>> a = ['y']
>>> b.append(a)
>>> a.append(b)
>>> a
['y', ['x', [...]]]
>>> b
['x', ['y', [...]]]

a contains a ref to b which contains a ref to a...

If a is deleted (or goes out of scope) b will still contain the full chain

>>> del a
>>> b
['x', ['y', [...]]]

but if both a and b are deleted (or go out of scope), the gc will detect that there are no more external refs and will destroy everything.

Unfortunately, std::shared_ptr only uses local reference counting. If you manage to build a cycle of C++ objects with it, each object holds a ref to the other, and they will never be deleted even after they go out of scope, which leads to a memory leak. An example of that:

struct Node {
    int val;
    std::shared_ptr<Node> next;
};

auto a = std::make_shared<Node>();  // ref count 1
auto b = std::make_shared<Node>();
a->next = b;
b->next = a; // ref count 2!

Hell has come here: even when a and b both go out of scope, each ref count will still be one and the shared pointers will never delete their objects, which is what would normally have happened without the circular reference. The programmer must explicitly deal with that and break the cycle (or prevent it from forming). For example, b->next.reset(); before b goes out of scope would be enough.

Answer #3, added: 2017-01-05 09:01

This is similar to, but different from, what others have said about using std::reference_wrapper<T>. Someone has also mentioned smart pointers in the comments below your question; the only difference here is that I take it a step further by creating a template wrapper class. Here is the code. It should do what you are looking for, except that it works on the heap instead of using references.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

class Car {
public:
    std::string color;
    std::string name;
    Car(){}  // Added Default Constructor to be safe.
    Car( std::string colorIn, std::string nameIn ) : color( colorIn ), name( nameIn ){}
};

template<class T>
class Wrapper {
public:
    std::shared_ptr<T> ptr;

    explicit Wrapper( T obj ) {
        ptr = std::make_shared<T>( T( obj ) );
    }

    ~Wrapper() {
        ptr.reset();
    }
};

int main () {

    std::vector<Wrapper<Car>> collection1;
    std::vector<Wrapper<Car>> collection2;

    collection1.emplace_back( Car("black", "Ford") );
    collection1.emplace_back( Car("white", "BMW") );
    collection1.emplace_back( Car("yellow", "Audi") );

    collection2.push_back( collection1[0] );

    std::cout << collection2[0].ptr->color << " " << collection2[0].ptr->name << std::endl;

    collection2[0].ptr->color = std::string( "green" );
    collection2[0].ptr->name  = std::string( "Gremlin" );

    std::cout << collection1[0].ptr->color << " " << collection1[0].ptr->name << std::endl;

    return 0;
}

As you may have noticed in the code, I changed the fields of the first object in collection2 and then printed the fields of the first object in collection1, and they had changed. What happens in one collection happens in the other because both share the same memory through std::shared_ptr<T>. The only reason I put it in a wrapper is so that the constructor allocates the new object for you, so you don't have to do this every time. You also don't have to worry about cleaning up the memory, because std::shared_ptr<T>'s destructor does that for you; the call to reset() in Wrapper's destructor is only there to be safe and is not required.

To make this a little cleaner or more readable you can do this instead:

typedef Wrapper<Car> car;

std::vector<car> collection1;
std::vector<car> collection2;

// rest is same

And it will do the same for you.

Now, if you don't want to use pointers or the heap, you could write your own template wrapper for references, similar to std::reference_wrapper<T>, that is very simple to use. Here is an example:

template<class T>
class Wrapper2 {
public:
    T& t;
    explicit Wrapper2( T& obj ) : t(obj) {} 
};

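Then in your source you would do the same as above. It appeared to work for me, but see the EDIT below: the wrappers end up holding references to temporaries, which is Undefined Behavior.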

typedef Wrapper2<Car> car2;
std::vector<car2> coll1;
std::vector<car2> coll2;

coll1.emplace_back( Car( "black", "Ford" ) );
coll1.emplace_back( Car( "white", "BMW" ) );
coll1.emplace_back( Car( "yellow", "Audi" ) );

coll2.push_back( coll1[0] );

std::cout << coll2[0].t.color << " " << coll2[0].t.name << std::endl;

coll2[0].t.color = std::string( "brown" );
coll2[0].t.name  = std::string( "Nova" );

std::cout << coll1[0].t.color << " " << coll1[0].t.name << std::endl;

And by modifying coll2's first indexed object's fields, coll1's first indexed object's fields are being changed as well.

EDIT

@Caleth asked me this in the comments:

What benefit is Wrapper over just shared_ptr here? (and Wrapper2 over reference_wrapper)

Without this wrapper look at this code here:

class Blob {
public:
    int blah;
    Blob() : blah(0) {}
    explicit Blob( int blahIn ) : blah( blahIn ) {}
};


void someFunc( ... ) {
    std::vector<std::shared_ptr<Blob>> blobs;        
    blobs.push_back( std::make_shared<Blob>( Blob( 1 ) ) );
    blobs.push_back( std::make_shared<Blob>( Blob( 2 ) ) );
    blobs.push_back( std::make_shared<Blob>( Blob( 3 ) ) );
}

Yes, it is readable, but it takes a lot of repetitious typing. Now with the wrapper:

void someFunc( ... ) {
    typedef Wrapper<Blob> blob;
    std::vector<blob> blobs;
    // emplace_back is required here because Wrapper's constructor is explicit
    blobs.emplace_back( Blob( 1 ) );
    blobs.emplace_back( Blob( 2 ) );
    blobs.emplace_back( Blob( 3 ) );
}

Now as for the wrapper around a plain reference, try doing this:

void someFunc( ... ) {
    std::vector<int&> ints; // Won't Work     
}

However, by creating a class template that stores a reference to an object of type T, you can now do this:

void someFunc( ... ) {
    typedef Wrapper2<Blob> blob;
    std::vector<blob> blobs;

    blobs.push_back( Blob( 1 ) );
    blobs.push_back( Blob( 2 ) );

    // then lets create a second container
    std::vector<blob> blobs2;
    // Push one of the reference objects in container 1 into container two
    blobs2.push_back( blobs[0] );
    // Now blobs2[0] contains the same referenced object as blobs[0]
    // blobs[0].t.blah = 1, blobs[1].t.blah = 2 and blobs2[0].t.blah = 1
    // lets change blobs2[0].t.blah value
    blobs2[0].t.blah = 4;

    // Now blobs[0].t.blah also = 4.
}

You couldn't do this before with references in std::vector<T> unless you used std::reference_wrapper<T>, which does basically the same thing but is more convoluted. So for simple objects, having your own wrappers can come in handy.

EDIT - I overlooked something and didn't catch it when working in my IDE, because everything compiled, built and ran successfully. It has come to my attention that the OP of this question should completely disregard my second wrapper, since it can lead to Undefined Behavior. You can still use the first wrapper around the smart pointer or, if you need storable references, definitely use std::some_container<std::reference_wrapper<T>> as others have already pointed out. I will leave the existing code above as a historical reference for others to learn from. I thank those who pointed out the Undefined Behavior. For those who don't know, please take into consideration that I have no formal training, am 100% self-taught and am still learning. You can also refer to a question I asked concerning references and undefined behavior: undefined behavior of references on stack

Conclusion

Trying to use references to the same objects in multiple containers can be a bad idea, because adding to or removing from either container can leave dangling references and thus lead to Undefined Behavior. So the proper or safer choice is to use std::shared_ptr<T> to achieve the functionality that you want.

There is nothing wrong with using references, but special care and design are needed, especially regarding the lifetime of the referenced objects. If an object is moved and the reference is accessed afterwards, this will lead to problems; if you know the lifetime of the object and that it will not be moved or destroyed, then accessing the reference is not a problem. I'd still suggest using std::shared_ptr or std::reference_wrapper.

Answer #4, added: 2017-01-05 11:01

The short answer: you cannot fully emulate Python in C++.

Unlike Python variables, C++ variables are real objects, not mere references to objects. Copying a Python reference does nothing to the underlying object and hence is always shallow (moreover, Python uses type erasure to allow its variables to reference any possible object).

In C++, this very same design can also be achieved. Since an object must stay alive as long as any reference to it still exists, but be removed (and any memory freed) as soon as the last reference goes out of scope, such objects are shared. The C++ way to deal with shared objects is std::shared_ptr<T>. See below for the reason why this must be a pointer-like object rather than a reference-like object (such as Python variables).

Thus, using C++ in the C++ way, your code would be

std::vector<std::shared_ptr<Car>> collection1, collection2;

collection1.push_back(std::make_shared<Car>("black", "Ford"));
collection1.push_back(std::make_shared<Car>("white", "BMW"));
collection1.push_back(std::make_shared<Car>("yellow", "Audi"));

collection2.push_back(collection1[0]);
collection2[0]->color = "blue";
std::cout<<collection1[0]->color;   // "blue"

std::shared_ptr<T> behaves like a pointer, but this is very similar to a reference (which differs from a pointer in its syntax, but is typically implemented the same way).


Note that it is impossible to design in C++ a corresponding shared_reference<T> with the same functionality as Python variables, i.e. like std::shared_ptr<T> but using . instead of -> and with the guarantee of a valid object (no null/empty reference/pointer). The reason is that the . operator cannot be overloaded. For example, given

template<typename T>
struct shared_reference
{
  template<typename...Args>
  shared_reference(Args&&...args)
  : ptr(std::make_shared<T>(std::forward<Args>(args)...)) {}
private:
  std::shared_ptr<T> ptr;
};

then there is no way we can make code like

shared_reference<Car> car;
car.color = "blue";

work. This is simply how C++ is. It means that for indirection you should use pointers.
