MongoDB: find duplicate entries across two or more fields

Question

You can easily identify the duplicates by running the following aggregation pipeline operation:

db.collection.aggregate([
    { 
        "$group": { 
            "_id": { "foreing": "$foreing", "value": "$value" }, 
            "uniqueIds": { "$addToSet": "$_id" },
            "count": { "$sum": 1 } 
        }
    }, 
    { "$match": { "count": { "$gt": 1 } } }
])

The $group stage groups the documents by their foreing and value fields. For each group, the $addToSet operator collects the _id values of the grouped documents into the uniqueIds array; because it builds a set, each _id appears in it only once. The $sum operator counts the documents in each group, and the next pipeline stage uses that count.

In the second pipeline stage, the $match operator keeps only the groups with a count greater than 1. The groups it discards, those with a count of 1, are foreing and value pairs that occur only once in the collection.
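To make the two stages concrete, here is a rough in-memory sketch in plain JavaScript that mimics what $group and $match do. It does not talk to MongoDB; the sample documents and the helper names groupByPair and keepDuplicates are made up for illustration.

```javascript
// In-memory sketch of the two pipeline stages; the sample documents are made up.
const docs = [
  { _id: 1, foreing: "a", value: 10 },
  { _id: 2, foreing: "a", value: 10 },
  { _id: 3, foreing: "a", value: 20 },
  { _id: 4, foreing: "b", value: 10 },
  { _id: 5, foreing: "a", value: 10 },
];

// Stage 1 ($group): one bucket per distinct (foreing, value) pair.
function groupByPair(documents) {
  const groups = new Map();
  for (const doc of documents) {
    const key = JSON.stringify([doc.foreing, doc.value]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { foreing: doc.foreing, value: doc.value },
        uniqueIds: [],
        count: 0,
      });
    }
    const group = groups.get(key);
    // Like $addToSet: only add an _id that is not already in the array.
    if (!group.uniqueIds.includes(doc._id)) {
      group.uniqueIds.push(doc._id);
    }
    group.count += 1; // Like { "$sum": 1 }
  }
  return [...groups.values()];
}

// Stage 2 ($match): keep only the groups that contain more than one document.
function keepDuplicates(groups) {
  return groups.filter((group) => group.count > 1);
}

const duplicates = keepDuplicates(groupByPair(docs));
console.log(JSON.stringify(duplicates));
// prints [{"_id":{"foreing":"a","value":10},"uniqueIds":[1,2,5],"count":3}]
```

Note that MongoDB does not guarantee the order of elements in an array built by $addToSet, so the order of uniqueIds in a real result may differ from this sketch.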

The remaining documents are the groups whose foreing and value pair occurs more than once in the collection; the uniqueIds array of each one lists the _id of every document that shares that pair.
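Since the same pattern works for more than two fields, the pipeline can be built from a list of field names. The helper below, buildDuplicatePipeline, is a hypothetical name and only a sketch; it assumes plain top-level field names (no dots) and just constructs the pipeline array without running it.

```javascript
// Hypothetical helper (not part of MongoDB): builds the duplicate-finding
// pipeline for any list of top-level field names.
function buildDuplicatePipeline(fields) {
  const groupKey = {};
  for (const field of fields) {
    // Each key of the group _id refers to the document field of the same name.
    groupKey[field] = "$" + field;
  }
  return [
    {
      $group: {
        _id: groupKey,
        uniqueIds: { $addToSet: "$_id" },
        count: { $sum: 1 },
      },
    },
    { $match: { count: { $gt: 1 } } },
  ];
}

const pipeline = buildDuplicatePipeline(["foreing", "value"]);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell the result would be passed as: db.collection.aggregate(pipeline)
```

Called with ["foreing", "value"], it produces the same two stages as the pipeline above; adding a third field name extends the group key accordingly.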
