Longest Consecutive Sequence

October 16, 2019

Given an array of integers, find the length of the longest sequence that contains consecutive integers.

The array is unsorted, and may contain duplicates and negative integers.

For example: the array [100, 5, 1001, 4, 7, 6] should return the answer 4, since the longest sequence is [4, 5, 6, 7].

This is an example of a bookkeeping problem - there isn’t anything special to do except to keep track of the positions of current sequences.

Consider the elements as laid out on the integer number line. As we consider each element from left to right, imagine marking each element’s position on the line. Note that we can ignore duplicates without loss of generality, since duplicates do not contribute to the final answer.

There are four cases to consider. When marking the current element on the number line:

  1. neither its left or right neighbour has been marked.
  2. its right neighbour has been marked, but not its left neighbour.
  3. its left neighbour has been marked, but not its right neighbour.
  4. both neighbours have been marked.

In cases 2 and 3, the current element extends an existing sequence by 1. In case 4, the current element connects two existing sequences together.

Case 2: considering 2, while 3 and 4 have been marked previously
  * ~ ~
1 2 3 4

Case 3: considering 3, while 1 and 2 have been marked previously
~ ~ * 
1 2 3 4

Case 4: considering 3, while 1, 2, and 4 have been marked previously
~ ~ * ~
1 2 3 4

We maintain a map whose keys are the integers to be considered, and a 2-tuple containing the starting and ending integers of a sequence. Since we are only considering consecutive sequences, we can infer the size of a sequence by its start and end.

def longest_consecutive_sequence(nums):
    dic = {}
    output = 0
    for i in range(len(nums)):
        curr = nums[i]
        # ignore duplicates
        if curr in dic:
            continue
        # case 1
        if curr - 1 not in dic and curr + 1 not in dic:
            dic[curr] = [curr, curr]
        # case 2
        elif curr - 1 not in dic:
            right = dic[curr + 1]
            dic[curr] = [curr, right[1]]
            right_edge = dic[right[1]]
            right_edge[0] = curr
        # case 3
        elif curr + 1 not in dic:
            left = dic[curr - 1]
            dic[curr] = [left[0], curr]
            left_edge = dic[left[0]]
            left_edge[1] = curr
        # case 4
        else:
            left = dic[curr - 1]
            right = dic[curr + 1]
            dic[curr] = [left[0], right[1]]

            left_edge = dic[left[0]]
            # we update the leftmost entry's ending integer 
            # to be the right neighbour's ending integer
            left_edge[1] = dic[curr][1]

            right_edge = dic[right[1]]
            # we update the rightmost entry's starting integer 
            # to be the left neighbour's starting integer
            right_edge[0] = dic[curr][0]

        # update the length of the longest sequence
        # the length is just the end - start + 1, since it's consecutive
        output = max(output, dic[curr][1] - dic[curr][0] + 1)
    return output

O(n) in both time and space.

Alternatively, one can also solve it by sorting the array and maintaining a constant number of variables to keep track of the longest sequence seen so far in one pass from left to right. That will be O(nlogn) in time and O(1) space.