Understand Shortest Common Supersequence of two strings Problem

Problem Name: Shortest Common Supersequence of two strings
Problem Description:

What is it?

Given two strings, str1 and str2, find the shortest string that contains both str1 and str2 as subsequences.

Definition: A string S is a subsequence of another string T if you can remove some characters from T (without changing the order of the remaining characters) to get S.
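The definition can be checked mechanically with a single left-to-right scan. The following is a minimal sketch, not part of the original solution; the class name SubsequenceCheck and the method name isSubsequence are chosen here only for illustration.

```java
class SubsequenceCheck {
    // Returns true if s can be obtained from t by deleting zero or more
    // characters of t without reordering the remaining ones.
    static boolean isSubsequence(String s, String t) {
        int i = 0; // index of the next character of s that still needs a match
        for (int k = 0; k < t.length() && i < s.length(); k++) {
            if (s.charAt(i) == t.charAt(k)) {
                i++; // matched one more character of s, in order
            }
        }
        return i == s.length(); // every character of s was matched
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("abac", "cabac")); // true
        System.out.println(isSubsequence("cab", "cabac"));  // true
        System.out.println(isSubsequence("cab", "abac"));   // false
    }
}
```

A helper like this is handy for validating any candidate supersequence, since several different answers can be correct.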

Key Points

  • The resulting string must include both input strings as subsequences.
  • There may be more than one correct answer; any valid one is acceptable.
  • A useful relation is:
    Length of SCS = len(str1) + len(str2) - len(LCS(str1, str2))
    where LCS stands for the longest common subsequence.
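The length relation can be checked numerically before building any supersequence. The sketch below is an illustration under assumptions: the class name ScsLength and its method names are hypothetical, and it computes the LCS length with two rolling rows (a space-saving variant of the full DP table described later on this page), which is enough when only the length is needed.

```java
class ScsLength {
    // LCS length via the standard DP recurrence, keeping only the previous
    // and current rows instead of the whole (m+1) x (n+1) table.
    static int lcsLength(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // The row just filled becomes "previous" for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Length of SCS = len(a) + len(b) - len(LCS(a, b))
    static int scsLength(String a, String b) {
        return a.length() + b.length() - lcsLength(a, b);
    }

    public static void main(String[] args) {
        System.out.println(scsLength("abac", "cab"));       // 4 + 3 - 2 = 5
        System.out.println(scsLength("AGGTAB", "GXTXAYB")); // 6 + 7 - 4 = 9
    }
}
```

For "abac" and "cab" the LCS is "ab" (length 2), so the shortest supersequence has 5 characters, which matches "cabac".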

Examples

Example 1

Input:

str1 = "abac"
str2 = "cab"

Output:

"cabac"

Explanation:

  • "abac" is a subsequence of "cabac" (for example, remove the first "c").
  • "cab" is also a subsequence of "cabac" (for example, remove the last "ac").

Example 2

Input:

str1 = "aaaaaaaa"
str2 = "aaaaaaaa"

Output:

"aaaaaaaa"

Explanation:
Both strings are identical; therefore, the SCS is the same as each string.

Example 3

Input:

str1 = "geek"
str2 = "eke"

Possible Output:

"geeke"

Explanation:

  • "geek" is a subsequence of "geeke" (remove the final "e").
  • "eke" is also a subsequence of "geeke".

Example 4

Input:

str1 = "AGGTAB"
str2 = "GXTXAYB"

Possible Output:

"AGGXTXAYB"

Explanation:
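The output "AGGXTXAYB" is one valid SCS that contains both "AGGTAB" and "GXTXAYB" as subsequences. Its length is 9 = 6 + 7 - 4, since the LCS of the two strings ("GTAB") has length 4. Other answers of the same length, such as "AGXGTXAYB", are equally valid.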

Summary

  1. Understand subsequences: The characters of each input string must appear in the output in their original relative order, though not necessarily next to each other.
  2. Leverage the LCS: The length of the SCS can be computed as:
    len(str1) + len(str2) - len(LCS(str1, str2))
    
  3. Return any valid SCS: There may be several shortest common supersequences that satisfy the conditions.

How the DP Array Is Useful

The DP array (or table) plays a crucial role in solving the Shortest Common Supersequence (SCS) problem by first helping us compute the Longest Common Subsequence (LCS) of the two input strings.

Key Points

  • DP Array Construction:
    • We create a 2D array dp with dimensions (m+1) x (n+1), where m and n are the lengths of the two strings.
    • dp[i][j] stores the length of the LCS between the first i characters of str1 and the first j characters of str2.
    • Row 0 and column 0 stay 0, because an empty prefix has an empty LCS. Every other entry is computed using the following rules:
      • If str1[i-1] == str2[j-1], then dp[i][j] = dp[i-1][j-1] + 1.
      • Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
  • Why LCS Matters:
    • The LCS represents the sequence of characters common to both strings (in order) and helps us identify overlapping parts.
    • The formula len(SCS) = len(str1) + len(str2) - len(LCS) shows that the more characters the strings share (in order), the shorter the SCS can be.
  • Backtracking Using the DP Array:
    • Once the DP array is filled, we backtrack from dp[m][n] to reconstruct the SCS.
    • Backtracking Strategy:
      • If the characters at str1[i-1] and str2[j-1] are the same, that character is part of the LCS, so we include it in the SCS and move diagonally (i.e., i--, j--).
      • If the characters differ, we compare dp[i-1][j] and dp[i][j-1] and step toward the larger value, so that the longest remaining common subsequence stays reachable:
        • If dp[i-1][j] > dp[i][j-1], we append str1[i-1] and decrement i.
        • Otherwise, we append str2[j-1] and decrement j.
      • Once one string is exhausted (i or j reaches 0), the remaining characters of the other string are appended.
      • Since we construct the SCS in reverse order during backtracking, we reverse the result at the end.
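To make the table concrete, the construction rules above can be sketched as a small standalone program that fills dp and prints it for Example 1. The class name LcsTable and the method name build are assumptions made for this illustration; they are not part of the original solution.

```java
import java.util.Arrays;

class LcsTable {
    // Builds the (m+1) x (n+1) table where dp[i][j] is the LCS length of the
    // first i characters of str1 and the first j characters of str2.
    static int[][] build(String str1, String str2) {
        int m = str1.length(), n = str2.length();
        int[][] dp = new int[m + 1][n + 1]; // row 0 and column 0 stay 0
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        // Rows correspond to prefixes of "abac", columns to prefixes of "cab".
        for (int[] row : build("abac", "cab")) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [0, 0, 0, 0]
        // [0, 0, 1, 1]
        // [0, 0, 1, 2]
        // [0, 0, 1, 2]
        // [0, 1, 1, 2]
    }
}
```

The bottom-right entry, dp[4][3] = 2, is the LCS length, and backtracking starts from that cell.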

Summary

The DP array is useful because it:

  1. Stores Intermediate Results: It calculates the LCS lengths for all possible prefixes, which is essential for determining where the two strings overlap.
  2. Guides the Reconstruction: It helps decide which character to include next when building the SCS, ensuring that the order of characters is preserved.
  3. Minimizes Redundancy: By using the LCS information, we avoid repeating common characters, thus ensuring the supersequence is as short as possible.

In essence, the DP array is the backbone of the solution: it provides the length of the LCS (which determines the minimum SCS length) and guides the merging of the two strings into one shortest supersequence.

Feel free to use these examples to test and understand your solution!

Category:
  • Leetcode Problem of the Day
  • Arrays
  • Strings
Programming Language:
  • Java
Reference Link:

https://leetcode.com/problems/shortest-common-supersequence/description/

Helper Function:

INPUT: str1 = "AGGTAB" str2 = "GXTXAYB"

Output: "AGGXTXAYB"

public String shortestCommonSupersequence1(String str1, String str2) {
    int m = str1.length(), n = str2.length();

    // dp[i][j] = LCS length of the first i chars of str1 and the first j chars of str2.
    int[][] dp = new int[m + 1][n + 1];
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }

    // Backtrack from dp[m][n], building the SCS in reverse order.
    StringBuilder sb = new StringBuilder();
    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
            // Shared character: emit it once and move diagonally.
            sb.append(str1.charAt(i - 1));
            i--;
            j--;
        } else if (dp[i - 1][j] > dp[i][j - 1]) {
            // The longer LCS lies above: emit the str1 character and move up.
            sb.append(str1.charAt(i - 1));
            i--;
        } else {
            // Otherwise emit the str2 character and move left.
            sb.append(str2.charAt(j - 1));
            j--;
        }
    }

    // One string is exhausted; append what remains of the other.
    while (i > 0) {
        sb.append(str1.charAt(i - 1));
        i--;
    }
    while (j > 0) {
        sb.append(str2.charAt(j - 1));
        j--;
    }

    // The characters were collected back to front.
    return sb.reverse().toString();
}

Utility Functions and Global variables:

Utility Function is not required.