SOQL ‘Distinct’ Keyword

It doesn’t exist. Sad but true.

In SQL there is a clause that ensures a duplicate free result set is returned by any query you issue e.g. You want to retrieve a duplicate free list of first names from the table clients. The column holds the following values: Alice, Bob, Alice. In Oracle SQL you would use the following query (and it would return the names Alice and Bob each listed once).

SELECT DISTINCT firstname
FROM clients;

Unfortunately SOQL doesn’t have this capability although there is an Apex trick that can be employed as a work-around.

Let’s imagine you have a similar situation as the one above except that now you wish to retrieve the distinct lastnames of contacts. The code would be something like this:

public static List<String> getDistinctLastnames(List<String> duplicates){

    List<String> distinctLastnames = new List<String>();

    for(String lastname: duplicates){

    Boolean found = false;

    for(Integer i=0; i< distinctLastnames.size(); i++){

        if(lastname.equalsIgnoreCase(distinctLastnames[i])){ //Check if current lastname has been added yet

            found=true;

            break;

        }

    }

    if(!found) distinctLastnames.add(lastname);

    }

    return distinctLastnames;

}

You might be thinking, ‘Wes you silly chap, why not simply use a Set to store the lastnames?’. Well my friend a Set would consider Strings of different cases to be unique and thus ‘Wes’ and ‘wes’ would be stored as two values. The code above ensures that lastnames that have different cases are considered to be the same value. With some dynamic Apex and perhaps a tweak to the method signature you could make the above method generic, and use it within an utility class thereby having this functionality available whenever necessary. In fact I recommend it.

Of course the situation becomes slightly more complex should you require to distinct two or more fields at a time. Let’s say you now want distinct fullnames (a combination of firstname and lastname) for all contacts. In this case I would suggest using a Map<Id,String> where the String value would be a concatenation of the firstname and lastname and the key the Id of the object. The solution in this case would be slightly less ‘clean’ but at the time of writing I don’t think there’s any cleaner alternative.

27 thoughts on “SOQL ‘Distinct’ Keyword”

  1. How would this perform on a very large list? I’m guessing this algorithm is non-linear in execution times, i.e. double the list size and you quadruple the execution time.
    Wouldn’t keeping the output list in alphabetical order and using a binary-chop type of finding a match perform better with large lists? (like the bubble-sort algorithm is optimal for sorting)
    If the max list size in SF is small then I guess it isn’t an issue. Also if the time-cost of adding an item to the middle of a large list increases with the size of the list then it may be a non-starter.

    Thoughts anyone?

    Reply
    • Interesting.. but we’d still have to parse the entire list to ensure we didn’t miss any values, in this example at least. The advantage of a bubble-sort algorithm is realised because at least sometimes you won’t need to look through the entire list of artifacts to find your special one. Does this make sense? Or have a misunderstood your proposition?

      Reply
      • Agreed, you would have to parse ‘duplicates’ entirely. However if you maintained ‘distinctLastnames’ in alphabetical order you would be able to use a binary-chop lookup to determine if the lastname was already in the list. e.g. for a list size of 1000 you would be able to determine if your lastname was present in a maximum of 10 lookups. In your code you would average 500 lookups for the same result.

        e.g. current lastname=’Smith’ and ‘distinctLastnames’ now has 1000 lastnames in it.

        1) I could just go through ‘distinctLastnames’ 1 by 1 until I found a match or exhausted the list. On average this would take 500 comparisons.

        2) If ‘distinctLastnames’ is maintained in ascending order I could check ‘distinctLastnames’ entry 500, if it was < 'Smith' then I would check entry 750, if it was ‘Smith’ then I would look at entry 812, and so on until I had a match or ran out of a range to divide. By this method I would know within 10 comparisons *every time*. If not present I also have the index to insert the lastname into ‘distinctLastnames’

        distinctLastnames.add(834,lastname)

        Does that make sense?

        Reply
        • It does indeed. A marvelous suggestion. The ‘searching’ part could definitely do with some tinkering and I suppose could be swapped out quite easy with a more efficient search algorithm. Nice work!

          Reply
  2. Lets say we create a set of Strings. In the set, before adding any value, we convert it into Uppercase. So all my values in the set would be in Uppercase. Similarly before comparing a new string, I would convert it to Uppercase and then compare.
    I guess using this, the algorithm can be made linear.

    Reply
  3. My situation is a little different …

    List opps = new List();
    for(Opportunity opp: [select name, Account.name ID limit 10000]){
    //here I want to only add an opp if this account is //not in my list.
    // ideally opps.contains(account.Name);
    opps.add(opp);
    return opps;
    }

    —-

    How do I accomplish this without using another for loop, (O(n^2)) worst case is 10K * 10K records …

    Reply
    • You could maintain a set of the account names (or better ids) e.g.

      // This is all pseudocode I’m assuming
      Set accs = new Set();
      List opps = new List();
      for(Opportunity opp: [select name, Account.name ID limit 10000]){
      if(!accs.contains(account.name) { // or if not contains account.id
      opps.add(opp);
      accs.add(Account.name);
      }
      }
      return opps;

      Reply
  4. does anyone have a good example in vb?

    I am pulling data from an api into a list and the query language does not have a distinct command, so I will have to do this after the list has been created..any ideas would be helpful (in vb please)

    Reply
  5. does anyone have a good example in vb?

    I am pulling data from an api into a list and the query language does not have a distinct command, so I will have to do this after the list has been created..any ideas would be helpful (in vb please)

    Reply
  6. Just a food for thought – Can’t we put it this way:

    List lastnames = new List();
    for(AggregatedResult ar : [select lastname, count(Id) from Contact group by lastname])
    {
    lastnames.add(string.valueof(ar.get(‘lastname’)));
    }

    I think this way can eliminate the bulk of the processing in apex if not the number or rows fetched by query – just a food for thought.

    Mustafa

    Reply
  7. Just a food for thought – Can’t we put it this way:

    List lastnames = new List();
    for(AggregatedResult ar : [select lastname, count(Id) from Contact group by lastname])
    {
    lastnames.add(string.valueof(ar.get(‘lastname’)));
    }

    I think this way can eliminate the bulk of the processing in apex if not the number or rows fetched by query – just a food for thought.

    Mustafa

    Reply
  8. I know this is a very very VERY old post 🙂
    However, I wanted to drop a note for anyone interested in this now. We can easily avoid the nested FOR loops by using a map to check if a particular name has already been found in the list so far. And for this, we would not need to sort the list as well.

    Following is the approach:
    While looping through the duplicates list,
    1. Convert the last name to lowercase and use it as a key for the map. The value for the map will be the actual last name.
    2. Check if the map already contains a value for this key.
    3. If not, then add this value to the map.
    4. Finally after the loop ends, the values stored in the map are distinct values.

    Map mapDistinctLastNames = new Map();
    List distinctLastnames = new List();
    for(String lastName : duplicates){
    String key = lastName.toLowerCase();
    if(mapDistinctLastNames.get(key) == null){
    mapDistinctLastNames.put(key, lastName);
    }
    }

    distinctLastnames = mapDistinctLastNames.values();

    Reply

Leave a Comment