The Silver Lining

Lessons & Learnings from a salesforce certified technical architect.

SOQL ‘Distinct’ Keyword

with 16 comments

It doesn’t exist. Sad but true.

In SQL there is a clause that ensures a duplicate free result set is returned by any query you issue e.g. You want to retrieve a duplicate free list of first names from the table clients. The column holds the following values: Alice, Bob, Alice. In Oracle SQL you would use the following query (and it would return the names Alice and Bob each listed once)

SELECT DISTINCT firstname
FROM clients;

Unfortunately SOQL doesn’t have this capability although there is an Apex trick that can be employed as a work-around. Let’s imagine you have a similar situation as the one above except that now you wish to retrieve the distinct lastnames of contacts. The code would be something like this

public static List<String> getDistinctLastnames(List<String> duplicates){

List<String> distinctLastnames = new List<String>();

for(String lastname: duplicates){

Boolean found = false;

for(Integer i=0; i< distinctLastnames.size(); i++){

if(lastname.equalsIgnoreCase(distinctLastnames[i])){ //Check if current lastname has been added yet

found=true;

break;

}

}

if(!found)

distinctLastnames.add(lastname);

}

return distinctLastnames;

}

You might be thinking, ‘Wes you silly chap, why not simply use a Set to store the lastnames?’. Well my friend a Set would consider Strings of different cases to be unique and thus ‘Wes’ and ‘wes’ would be stored as two values. The code above ensures that lastnames that have different cases are considered to be the same value. With some dynamic Apex and perhaps a tweak to the method signature you could make the above method generic, and use it within an utility class thereby having this functionality available whenever necessary. In fact I recommend it.

Of course the situation becomes slightly more complex should you require to distinct two or more fields at a time. Let’s say you now want distinct fullnames (a combination of firstname and lastname) for all contacts. In this case I would suggest using a Map<Id,String> where the String value would be a concatenation of the firstname and lastname and the key the Id of the object. The solution in this case would be slightly less ‘clean’ but at the time of writing I don’t think there’s any cleaner alternative.

Advertisement

Written by Wes

June 28, 2009 at 3:54 pm

Posted in Apex, SOQL

Tagged with , , , , ,

16 Responses

Subscribe to comments with RSS.

  1. Hi Wes

    Can you drop me an email please? The comment form on your website doesn’t work.

    Thanks
    Jon

    Jon Mountjoy

    June 30, 2009 at 12:39 pm

  2. Nice pattern, i like it and think this would be very useful on my next project,

    Thanks

    Ron Hess

    July 8, 2009 at 4:34 pm

  3. How would this perform on a very large list? I’m guessing this algorithm is non-linear in execution times, i.e. double the list size and you quadruple the execution time.
    Wouldn’t keeping the output list in alphabetical order and using a binary-chop type of finding a match perform better with large lists? (like the bubble-sort algorithm is optimal for sorting)
    If the max list size in SF is small then I guess it isn’t an issue. Also if the time-cost of adding an item to the middle of a large list increases with the size of the list then it may be a non-starter.

    Thoughts anyone?

    Colin Kenworthy

    October 1, 2009 at 7:14 pm

    • Interesting.. but we’d still have to parse the entire list to ensure we didn’t miss any values, in this example at least. The advantage of a bubble-sort algorithm is realised because at least sometimes you won’t need to look through the entire list of artifacts to find your special one. Does this make sense? Or have a misunderstood your proposition?

      Wes

      October 1, 2009 at 7:24 pm

      • Agreed, you would have to parse ‘duplicates’ entirely. However if you maintained ‘distinctLastnames’ in alphabetical order you would be able to use a binary-chop lookup to determine if the lastname was already in the list. e.g. for a list size of 1000 you would be able to determine if your lastname was present in a maximum of 10 lookups. In your code you would average 500 lookups for the same result.

        e.g. current lastname=’Smith’ and ‘distinctLastnames’ now has 1000 lastnames in it.

        1) I could just go through ‘distinctLastnames’ 1 by 1 until I found a match or exhausted the list. On average this would take 500 comparisons.

        2) If ‘distinctLastnames’ is maintained in ascending order I could check ‘distinctLastnames’ entry 500, if it was < 'Smith' then I would check entry 750, if it was ‘Smith’ then I would look at entry 812, and so on until I had a match or ran out of a range to divide. By this method I would know within 10 comparisons *every time*. If not present I also have the index to insert the lastname into ‘distinctLastnames’

        distinctLastnames.add(834,lastname)

        Does that make sense?

        Colin Kenworthy

        October 1, 2009 at 11:10 pm

      • It does indeed. A marvelous suggestion. The ‘searching’ part could definitely do with some tinkering and I suppose could be swapped out quite easy with a more efficient search algorithm. Nice work!

        Wes

        October 1, 2009 at 11:17 pm

  4. Lets say we create a set of Strings. In the set, before adding any value, we convert it into Uppercase. So all my values in the set would be in Uppercase. Similarly before comparing a new string, I would convert it to Uppercase and then compare.
    I guess using this, the algorithm can be made linear.

    Rajesh Shah

    December 1, 2009 at 10:08 pm

  5. great info

    waqas

    February 2, 2010 at 2:11 pm

  6. My situation is a little different …

    List opps = new List();
    for(Opportunity opp: [select name, Account.name ID limit 10000]){
    //here I want to only add an opp if this account is //not in my list.
    // ideally opps.contains(account.Name);
    opps.add(opp);
    return opps;
    }

    —-

    How do I accomplish this without using another for loop, (O(n^2)) worst case is 10K * 10K records …

    blissme

    April 1, 2010 at 6:24 pm

    • You could maintain a set of the account names (or better ids) e.g.

      // This is all pseudocode I’m assuming
      Set accs = new Set();
      List opps = new List();
      for(Opportunity opp: [select name, Account.name ID limit 10000]){
      if(!accs.contains(account.name) { // or if not contains account.id
      opps.add(opp);
      accs.add(Account.name);
      }
      }
      return opps;

      Wes

      April 1, 2010 at 8:14 pm

  7. does anyone have a good example in vb?

    I am pulling data from an api into a list and the query language does not have a distinct command, so I will have to do this after the list has been created..any ideas would be helpful (in vb please)

    Zach

    April 22, 2010 at 1:10 am

  8. Thing that still sucks is that you have to query all records. You may end up querying hundreds of rows only to get the 10 values you need. Off to the idea exchange… http://sites.force.com/ideaexchange/apex/ideaview?id=08730000000Brr2AAC

    Jason

    September 3, 2010 at 9:29 pm

  9. Just a food for thought – Can’t we put it this way:

    List lastnames = new List();
    for(AggregatedResult ar : [select lastname, count(Id) from Contact group by lastname])
    {
    lastnames.add(string.valueof(ar.get(‘lastname’)));
    }

    I think this way can eliminate the bulk of the processing in apex if not the number or rows fetched by query – just a food for thought.

    Mustafa

    Mustafa

    December 27, 2011 at 8:56 am

    • Great solution 🙂 I wrote this code before aggregate queries existed so yes you’re solution will in most cases be better.

      Wes

      December 27, 2011 at 12:32 pm

      • Brilliant solution thanks.

        Only thing that I had to change was AggregatedResult (This gave an invalid Type error) –> AggregateResult.

        JamesGally

        March 2, 2012 at 1:11 pm

  10. […] this possible? I’ve seen other links, such as this one by Wes Nolte, but am not sure it applies to this situation. Should I use a map for this scenario? […]


Leave a Reply to Ron Hess Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: