187. Repeated DNA Sequences

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.

  • For example, “ACGAATTCCG” is a DNA sequence.
    When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

哈希表 + 滑动窗口 + 位操作。
将四个字符映射到四个二进制字符上。这样字符串就可以用20bit表示。
这样就可以用一个整数来表示这四个字符。
然后采用哈希表记录出现次数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution {
static final int L = 10;
public List<String> findRepeatedDnaSequences(String s) {
List<String> ret = new ArrayList<String>();
if(s.length() < L) return ret;
HashMap<Integer, Integer> map = new HashMap<>();
int left = 0;
int right = 10;

int[] bin = new int[26];
bin['A'-'A'] = 0;
bin['T'-'A'] = 1;
bin['C'-'A'] = 2;
bin['G'-'A'] = 3;

int x = 0;
for(int i = 0; i < L-1; i++){
x = (x << 2) | bin[s.charAt(i)-'A'];
}

for(int i = 0; i <= s.length() - L; i++){
x = ((x << 2) | bin[s.charAt(i + L -1)-'A']) & ((1 << L * 2) - 1);
map.put(x, map.getOrDefault(x, 0)+1);
if(map.get(x) == 2) ret.add(s.substring(i, i+L));
}
return ret;
}
}

遍历将字字符串加入哈希表并记录出现次数,然后返回出现次数大于1的字符串。
注意在循环时就可以直接添加结果到列表,这样可以减少操作。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Solution {
public List<String> findRepeatedDnaSequences(String s) {
List<String> ret = new ArrayList<String>();
HashMap<String, Integer> map = new HashMap<>();
int left = 0;
int right = 10;

while(right <= s.length()){
String sub = s.substring(left, right);
map.put(sub, map.getOrDefault(sub, 0)+1);
if(map.get(sub) == 2) ret.add(sub);
left++;
right++;
}
return ret;
}
}
Author

Xander

Posted on

2022-04-24

Updated on

2022-04-24

Licensed under

Comments