Longest Consecutive Sequence

October 16, 2019

Given an array of inte­gers, find the length of the longest sequence that con­tains con­sec­u­tive inte­gers.

The array is unsort­ed, and may con­tain dupli­cates and neg­a­tive inte­gers.

For exam­ple: the array [100, 5, 1001, 4, 7, 6] should return the answer 4, since the longest sequence is [4, 5, 6, 7].

This is an exam­ple of a book­keep­ing prob­lem - there isn’t any­thing spe­cial to do except to keep track of the posi­tions of cur­rent sequences.

Con­sid­er the ele­ments as laid out on the inte­ger number line. As we con­sid­er each ele­ment from left to right, imag­ine mark­ing each ele­men­t’s posi­tion on the line. Note that we can ignore dupli­cates with­out loss of gen­er­al­i­ty, since dupli­cates do not con­tribute to the final answer.

There are four cases to con­sid­er. When mark­ing the cur­rent ele­ment on the number line:

  1. nei­ther its left or right neigh­bour has been marked.
  2. its right neigh­bour has been marked, but not its left neigh­bour.
  3. its left neigh­bour has been marked, but not its right neigh­bour.
  4. both neigh­bours have been marked.

In cases 2 and 3, the cur­rent ele­ment extends an exist­ing sequence by 1. In case 4, the cur­rent ele­ment con­nects two exist­ing sequences togeth­er.

Case 2: considering 2, while 3 and 4 have been marked previously
  * ~ ~
1 2 3 4

Case 3: considering 3, while 1 and 2 have been marked previously
~ ~ * 
1 2 3 4

Case 4: considering 3, while 1, 2, and 4 have been marked previously
~ ~ * ~
1 2 3 4

We main­tain a map whose keys are the inte­gers to be con­sid­ered, and a 2-tuple con­tain­ing the start­ing and ending inte­gers of a sequence. Since we are only con­sid­er­ing con­sec­u­tive sequences, we can infer the size of a sequence by its start and end.

def longest_consecutive_sequence(nums):
    dic = {}
    output = 0
    for i in range(len(nums)):
        curr = nums[i]
        # ignore duplicates
        if curr in dic:
            continue
        # case 1
        if curr - 1 not in dic and curr + 1 not in dic:
            dic[curr] = [curr, curr]
        # case 2
        elif curr - 1 not in dic:
            right = dic[curr + 1]
            dic[curr] = [curr, right[1]]
            right_edge = dic[right[1]]
            right_edge[0] = curr
        # case 3
        elif curr + 1 not in dic:
            left = dic[curr - 1]
            dic[curr] = [left[0], curr]
            left_edge = dic[left[0]]
            left_edge[1] = curr
        # case 4
        else:
            left = dic[curr - 1]
            right = dic[curr + 1]
            dic[curr] = [left[0], right[1]]

            left_edge = dic[left[0]]
            # we update the leftmost entry's ending integer 
            # to be the right neighbour's ending integer
            left_edge[1] = dic[curr][1]

            right_edge = dic[right[1]]
            # we update the rightmost entry's starting integer 
            # to be the left neighbour's starting integer
            right_edge[0] = dic[curr][0]

        # update the length of the longest sequence
        # the length is just the end - start + 1, since it's consecutive
        output = max(output, dic[curr][1] - dic[curr][0] + 1)
    return output

O(n) in both time and space.

Alter­na­tive­ly, one can also solve it by sort­ing the array and main­tain­ing a con­stant number of vari­ables to keep track of the longest sequence seen so far in one pass from left to right. That will be O(nlogn) in time and O(1) space.