Character deletion visualization

3545: Minimum Deletions for At Most K Distinct Characters

Difficulty: Easy
Topics: Strings, Greedy Algorithms, Frequency Counting
Companies: Google, Meta, Amazon
Key Insight: Optimal solution requires keeping the most frequent characters and removing others. Frequency prioritization is crucial for minimal deletions.

Problem Statement

Given a string and maximum allowed distinct characters (k), determine the minimum deletions needed to achieve at most k unique characters.

Example 1

Input: s = "abc", k = 2
Output: 1
Explanation: Remove one character (e.g., 'c')

Example 2

Input: s = "aabb", k = 2
Output: 0
Explanation: Already has 2 distinct characters
Problem Link: View on LeetCode ↗

Approach 1: Frequency-Based Greedy Algorithm

Prioritize keeping characters with highest frequencies to minimize deletions.

Algorithm Steps

  1. Calculate character frequencies
  2. Sort frequencies in descending order
  3. Keep top k frequencies, sum remaining

Complexity Analysis

Time Complexity Space Complexity
O(n + m log m) O(m)
Optimal Solution
class Solution {
public:
    int minDeletions(string s, int k) {
        unordered_map<char, int> freq;
        for(char c : s) freq[c]++;
        
        vector<int> counts;
        for(auto& [ch, cnt] : freq) counts.push_back(cnt);
        
        sort(counts.rbegin(), counts.rend());
        
        int total = 0;
        for(int i = k; i < counts.size(); ++i) {
            total += counts[i];
        }
        return total;
    }
};
class Solution {
    public int minDeletions(String s, int k) {
        Map<Character, Integer> freq = new HashMap<>();
        for(char c : s.toCharArray()) {
            freq.put(c, freq.getOrDefault(c, 0) + 1);
        }
        
        List<Integer> counts = new ArrayList<>(freq.values());
        Collections.sort(counts, Collections.reverseOrder());
        
        int total = 0;
        for(int i = k; i < counts.size(); i++) {
            total += counts.get(i);
        }
        return total;
    }
}
from collections import Counter

class Solution:
    def minDeletions(self, s: str, k: int) -> int:
        freq = Counter(s)
        counts = sorted(freq.values(), reverse=True)
        return sum(counts[k:]) if len(counts) > k else 0
Optimization Insight: Sorting frequencies in descending order ensures we keep the most impactful characters first.

Approach 2: Combinatorial Subset Selection

Brute-force approach suitable for small input constraints (n ≤ 16).

Algorithm Steps

  1. Generate all possible character subsets
  2. Filter subsets with size ≤ k
  3. Calculate deletions for each valid subset
  4. Track minimum deletions across all subsets

Complexity Analysis

Time Complexity Space Complexity
O(2m * m) O(m)
Brute Force Solution
class Solution {
public:
    int minDeletions(string s, int k) {
        unordered_map<char, int> freq;
        for(char c : s) freq[c]++;
        
        vector<char> chars;
        for(auto& [ch, _] : freq) chars.push_back(ch);
        
        int min_del = INT_MAX;
        int m = chars.size();
        
        for(int mask = 0; mask < (1 << m); ++mask) {
            int bits = __builtin_popcount(mask);
            if(bits > k) continue;
            
            int sum = 0;
            for(int i = 0; i < m; ++i) {
                if(mask & (1 << i)) sum += freq[chars[i]];
            }
            min_del = min(min_del, (int)s.size() - sum);
        }
        return min_del;
    }
};
class Solution {
    public int minDeletions(String s, int k) {
        Map<Character, Integer> freq = new HashMap<>();
        for(char c : s.toCharArray()) {
            freq.put(c, freq.getOrDefault(c, 0) + 1);
        }
        
        List<Character> chars = new ArrayList<>(freq.keySet());
        int m = chars.size();
        int minDel = Integer.MAX_VALUE;
        
        for(int mask = 0; mask < (1 << m); mask++) {
            int bits = Integer.bitCount(mask);
            if(bits > k) continue;
            
            int sum = 0;
            for(int i = 0; i < m; i++) {
                if((mask & (1 << i)) != 0) {
                    sum += freq.get(chars.get(i));
                }
            }
            minDel = Math.min(minDel, s.length() - sum);
        }
        return minDel;
    }
}
from itertools import combinations
from collections import Counter

class Solution:
    def minDeletions(self, s: str, k: int) -> int:
        freq = Counter(s)
        distinct = list(freq.keys())
        min_del = float('inf')
        
        for r in range(0, k+1):
            for subset in combinations(distinct, r):
                current = sum(freq[c] for c in subset)
                min_del = min(min_del, len(s) - current)
                
        return min_del
Use Case: This approach is feasible due to small constraints (n ≤ 16), demonstrating brute-force optimization potential.

Approach Comparison

Approach Time Space Best For
Greedy Frequency O(n + m log m) O(m) General case, large inputs
Combinatorial O(2m * m) O(m) Small inputs (m ≤ 16)
Important: Greedy approach is exponentially faster for larger character sets, while combinatorial works for constraint-limited cases.

Edge Case Analysis

1. All Characters Identical

Input: "aaaaa", k = 1 → Output: 0

2. k Equal to Distinct Count

Input: "abcde", k = 5 → Output: 0

3. Empty String Requirement

Input: "aab", k = 0 → Impossible (per constraints)
Critical Note: Always verify if distinct count ≤ k before processing.

Frequently Asked Questions

1. Why does sorting frequencies guarantee optimal solution?

Sorting lets us prioritize characters contributing most to the string length, minimizing needed deletions.

2. How does the combinatorial approach handle duplicates?

It considers distinct characters only, as duplicates are handled through frequency counts.

3. What if k exceeds distinct character count?

Immediate return 0 since no deletions needed (already satisfies condition).

4. Why use bitmask in combinatorial approach?

Efficiently represents character subsets through binary encoding (each bit = character presence).

5. Can we use dynamic programming here?

Possible but unnecessary due to effective greedy solution. DP would add complexity without benefit.

6. How to handle frequency ties?

Arbitrary selection works since tied frequencies contribute equally to total length.

7. What's the worst-case scenario for greedy approach?

When all characters have equal frequency (still O(m log m) sorting time).

8. Why calculate sum(counts[k:])?

Represents frequencies of characters excluded after keeping top k frequent ones.

9. How does input size affect approach choice?

Greedy works for all sizes, combinatorial only viable for m ≤ 16 due to exponential growth.

10. Can we optimize space further?

Yes, by storing frequencies in-place without sorting, but time efficiency would suffer.

Final Analysis: This problem demonstrates the power of frequency analysis in string manipulation problems. The greedy approach provides an optimal O(n) space solution while the combinatorial method offers insight into brute-force possibilities for constrained inputs.