Understanding the Shortest Common Supersequence of Two Strings Problem

Problem: Shortest Common Supersequence of Two Strings

What is it?

Given two strings, str1 and str2, find the shortest string that contains both str1 and str2 as subsequences.

Definition: A string S is a subsequence of another string T if you can remove zero or more characters from T (without changing the order of the remaining characters) to get S. In particular, every string is a subsequence of itself.
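The definition above can be checked with a single left-to-right scan. The following is a minimal sketch, not part of the original problem statement; the class name SubsequenceCheck and the method name isSubsequence are hypothetical names chosen for illustration.

```java
final class SubsequenceCheck {
    // Returns true if s can be obtained from t by deleting zero or more
    // characters of t without reordering the rest.
    // (Hypothetical helper, introduced only for illustration.)
    static boolean isSubsequence(String s, String t) {
        int i = 0; // index of the next character of s that still needs a match
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
            }
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("abac", "cabac")); // true
        System.out.println(isSubsequence("cab", "cabac"));  // true
        System.out.println(isSubsequence("ba", "abc"));     // false
    }
}
```

A helper like this is handy for validating any candidate answer: a string is a common supersequence exactly when both inputs pass this check against it.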

Key Points

  • The resulting string must include both input strings as subsequences.
  • There may be more than one correct answer; any valid one is acceptable.
  • A useful relation is:
    Length of SCS = len(str1) + len(str2) - len(LCS(str1, str2))
    where LCS stands for the longest common subsequence.
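As a rough illustration of the length relation, the sketch below computes the LCS length with the standard dynamic-programming table and then applies the formula. The class name ScsLength and the method names lcsLength and scsLength are assumptions made for this example only.

```java
final class ScsLength {
    // Length of the longest common subsequence of a and b.
    // dp[i][j] = LCS length of the first i chars of a and the first j chars of b.
    static int lcsLength(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] dp = new int[m + 1][n + 1]; // row 0 and column 0 stay 0
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[m][n];
    }

    // Applies: len(SCS) = len(a) + len(b) - len(LCS(a, b)).
    static int scsLength(String a, String b) {
        return a.length() + b.length() - lcsLength(a, b);
    }

    public static void main(String[] args) {
        System.out.println(scsLength("abac", "cab"));       // 5
        System.out.println(scsLength("AGGTAB", "GXTXAYB")); // 9
    }
}
```

This only yields the length; building an actual supersequence additionally requires walking back through the table.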

Examples

Example 1

Input:

str1 = "abac"
str2 = "cab"

Output:

"cabac"

Explanation:

  • "abac" is a subsequence of "cabac" (for example, remove the first "c").
  • "cab" is also a subsequence of "cabac" (for example, remove the last "ac").

Example 2

Input:

str1 = "aaaaaaaa"
str2 = "aaaaaaaa"

Output:

"aaaaaaaa"

Explanation:
Both strings are identical; therefore, the SCS is the same as each string.

Example 3

Input:

str1 = "geek"
str2 = "eke"

Possible Output:

"geeke"

Explanation:

  • "geek" is a subsequence of "geeke" (remove the final "e").
  • "eke" is also a subsequence of "geeke" (remove the leading "g" and one of the first two "e"s).

Example 4

Input:

str1 = "AGGTAB"
str2 = "GXTXAYB"

Possible Output:

"AGGXTXAYB"

Explanation:
The output "AGGXTXAYB" is one valid SCS that contains both "AGGTAB" and "GXTXAYB" as subsequences. Its length is 9 = 6 + 7 - 4, because the LCS of the two strings ("GTAB") has length 4.

Summary

  1. Understand subsequences: The characters of each input string must appear in the output in their original order, though not necessarily next to each other.
  2. Leverage the LCS: The length of the SCS can be computed as:
    len(str1) + len(str2) - len(LCS(str1, str2))
    
  3. Return any valid SCS: There may be several shortest common supersequences that satisfy the conditions.

How the DP Array Is Useful

The DP array (or table) plays a crucial role in solving the Shortest Common Supersequence (SCS) problem by first helping us compute the Longest Common Subsequence (LCS) of the two input strings.

Key Points

  • DP Array Construction:

    • We create a 2D array dp with dimensions (m+1) x (n+1), where m and n are the lengths of the two strings.
    • dp[i][j] stores the length of the LCS between the first i characters of str1 and the first j characters of str2.
    • Row 0 and column 0 stay 0, because the LCS with an empty prefix is empty. Every other cell is computed using the following rules:
      • If str1[i-1] == str2[j-1], then dp[i][j] = dp[i-1][j-1] + 1.
      • Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
  • Why LCS Matters:

    • The LCS is the longest sequence of characters that appears in both strings in the same order. These are exactly the characters the SCS can include once instead of twice.
    • The formula len(SCS) = len(str1) + len(str2) - len(LCS) shows that the more characters the strings share (in order), the shorter the SCS can be.
  • Backtracking Using the DP Array:

    • Once the DP array is filled, we backtrack from dp[m][n] to reconstruct the SCS.
    • Backtracking Strategy:
      • If the characters at str1[i-1] and str2[j-1] are the same, that character is part of the LCS, so we append it to the result once and move diagonally (i.e., i--, j--).
      • If the characters differ, we compare dp[i-1][j] and dp[i][j-1] and move toward the larger value, so that no LCS character is lost:
        • If dp[i-1][j] is at least as large, we append str1[i-1] and decrement i.
        • Otherwise, we append str2[j-1] and decrement j.
        • On a tie either direction is valid, which is one reason several correct answers can exist.
      • When one string is exhausted (i or j reaches 0), the remaining characters of the other string are appended.
      • Since we construct the SCS in reverse order during backtracking, we reverse the result at the end.
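The table construction and backtracking steps above can be sketched in Java as follows. This is one possible implementation under the stated rules, not necessarily the solution originally attached to this page; the class name ScsSketch and the helper isSubsequence are hypothetical, and ties are broken in favor of str1, so other valid answers are possible.

```java
final class ScsSketch {
    static String shortestCommonSupersequence(String str1, String str2) {
        int m = str1.length(), n = str2.length();

        // dp[i][j] = LCS length of str1[0..i) and str2[0..j); row/column 0 stay 0.
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Backtrack from dp[m][n], building the answer in reverse.
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                sb.append(str1.charAt(i - 1)); // shared character, emitted once
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                sb.append(str1.charAt(i - 1)); // tie-break: consume str1 first (assumption)
                i--;
            } else {
                sb.append(str2.charAt(j - 1));
                j--;
            }
        }
        // One string is exhausted; append what is left of the other.
        while (i > 0) sb.append(str1.charAt(--i));
        while (j > 0) sb.append(str2.charAt(--j));

        return sb.reverse().toString();
    }

    // Hypothetical helper used only to validate results.
    static boolean isSubsequence(String s, String t) {
        int k = 0;
        for (int p = 0; p < t.length() && k < s.length(); p++) {
            if (s.charAt(k) == t.charAt(p)) k++;
        }
        return k == s.length();
    }

    public static void main(String[] args) {
        String scs = shortestCommonSupersequence("abac", "cab");
        System.out.println(scs + " (length " + scs.length() + ")");
        System.out.println(isSubsequence("abac", scs) && isSubsequence("cab", scs));
    }
}
```

Both the table fill and the table itself take O(m * n) time and space, and the backtracking pass adds only O(m + n) steps.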

Summary

The DP array is useful because it:

  1. Stores Intermediate Results: It holds the LCS length for every pair of prefixes of the two strings, which is essential for determining where the two strings overlap.
  2. Guides the Reconstruction: It helps decide which character to include next when building the SCS, ensuring that the order of characters is preserved.
  3. Minimizes Redundancy: By using the LCS information, we avoid repeating common characters, thus ensuring the supersequence is as short as possible.

In essence, the DP array is the backbone of the solution: it provides the length of the LCS (which directly determines the minimum SCS length) and also guides the process of merging the two strings into one minimal supersequence.

Feel free to use these examples to test and understand your solution!

Category:
  • Leetcode Problem of the Day
  • Arrays
  • Strings
Programming Language:
  • Java
Reference Link:

https://leetcode.com/problems/shortest-common-supersequence/description/
