SQL to find World Cup matches with comebacks

Lucas Jellema is writing a great series on SQL applied to the World Cup of football / soccer. In the article about finding matches with comebacks, he challenged readers to “find ‘dramatic comebacks’ where a team was two goals behind at some stage and yet managed to win the game.” Here is my reply, since replying directly on his blog was too challenging for me!

Here is a copy of Lucas’ DDL and test data. I added three rows to his data, one with a draw, one with a scoreless draw and one with a “dramatic comeback”.

drop table match_results;
CREATE TABLE "MATCH_RESULTS" 
   (    "GROUP1" VARCHAR2(1 BYTE), 
    "HOME" NUMBER(1,0), 
    "AWAY" NUMBER(1,0), 
    "HOME_GOALS" NUMBER(2,0), 
    "AWAY_GOALS" NUMBER(2,0), 
    "SDM_ID" NUMBER(2,0), 
    "LOCAL_START_TIME" DATE, 
    "SCORING_PROCESS" VARCHAR2(20 BYTE), 
    "WEATHER_CATEGORY" VARCHAR2(20 BYTE), 
    "ID" NUMBER(2,0) 
   ); 
Insert into MATCH_RESULTS values 
  ('A',1,2,3,1,5,to_date('12-JUN-14','DD-MON-RR'),'1000',null,1); 
Insert into MATCH_RESULTS values
  ('A',3,4,1,0,3,to_date('13-JUN-14','DD-MON-RR'),'0',null,2); 
Insert into MATCH_RESULTS values 
  ('A',1,3,0,0,6,to_date('17-JUN-14','DD-MON-RR'),'0',null,17); 
Insert into MATCH_RESULTS values 
  ('A',4,2,0,4,2,to_date('18-JUN-14','DD-MON-RR'),'1111',null,18); 
Insert into MATCH_RESULTS values 
  ('B',1,2,1,5,12,to_date('13-JUN-14','DD-MON-RR'),'011111',null,3); 
Insert into MATCH_RESULTS values 
  ('B',3,4,3,1,7,to_date('13-JUN-14','DD-MON-RR'),'0010',null,4); 
Insert into MATCH_RESULTS values 
  ('B',1,3,0,2,1,to_date('18-JUN-14','DD-MON-RR'),'11',null,19); 
Insert into MATCH_RESULTS values 
  ('B',4,2,2,3,10,to_date('18-JUN-14','DD-MON-RR'),'10011',null,20); 
Insert into MATCH_RESULTS values 
  ('B',4,2,2,3,10,to_date('18-JUN-14','DD-MON-RR'),'1001',null,21); 
Insert into MATCH_RESULTS values 
  ('B',4,2,2,3,10,to_date('18-JUN-14','DD-MON-RR'),null,null,22); 
Insert into MATCH_RESULTS values 
  ('B',4,2,2,3,10,to_date('18-JUN-14','DD-MON-RR'),'00011111',null,23); 
commit;

The SCORING_PROCESS column shows who scored in what order: for example ‘100’ means one team scored the first goal, then the other team scored 2 goals.

Who won?

Conceptually, the first thing I do is determine who won the game, by counting the number of ‘1’ and the number of ‘0’. When the ‘1’s are ahead, I say the winner is 1; when there are more ‘0’s, I say the winner is -1. For draws, the “winner” is 0; I then remove draws in my WHERE clause.

select id, scoring_process, 
sign(length(replace(scoring_process||'01','0','')) 
   - length(replace(scoring_process||'01','1',''))) 
winner 
from match_results
where sign(length(replace(scoring_process||'01','0','')) 
         - length(replace(scoring_process||'01','1',''))) 
  != 0;
ID SCORING_PROCESS WINNER
1 1000 -1
2 0 -1
17 0 -1
18 1111 1
3 011111 1
4 0010 -1
19 11 1
20 10011 1
23 00011111 1

How far behind did the winner get?

For each row, I go through the SCORING_PROCESS column , turning ‘1’ into the number 1 and ‘0’ into the number -1. This makes it easy to use the SUM() analytic function to calculate who led by how much after each goal. Here is an example using a hard-coded value.

select sum(
  decode(to_number(substr('00011111',level,1)),0,-1,1) 
) over (order by level)
cum_score
from dual 
connect by level <= length('00011111');
CUM_SCORE
-1
-2
-3
-2
-1
0
1
2

In this case, I know the winner was team 1, so I can see that they came back from a 3-goal deficit. Putting it all together, here is a solution that shows all the comebacks and how far behind the winning team got.

select id, scoring_process, 
max(-winner*column_value) max_behind 
from ( 
  select id, scoring_process, 
  sign(length(replace(scoring_process||'01','0','')) 
     - length(replace(scoring_process||'01','1',''))) 
  winner 
  from match_results
  where sign(length(replace(scoring_process||'01','0','')) 
           - length(replace(scoring_process||'01','1',''))) 
    != 0
) mr, 
table(cast(multiset(
  select sum( 
    decode(to_number(substr(mr.scoring_process,level,1)),0,-1,1) 
  ) over (order by level) 
  from dual 
  connect by level <= length(mr.scoring_process)
) as sys.odcinumberlist)) l
group by id, scoring_process
having max(-winner*column_value) > 0;
ID SCORING_PROCESS MAX_BEHIND
23 00011111 3
3 011111 1
1 1000 1
20 10011 1
About these ads

One thought on “SQL to find World Cup matches with comebacks

  1. Hi Stewart,

    I really like your approach. It is elegant and straightforward, just as SQL should be (much better than my own approach, although I can see some similarities). Thanks for posting this article.

    Lucas

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s