The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.
For example, “ACGAATTCCG” is a DNA sequence. When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
哈希表 + 滑动窗口 + 位操作。 将四个字符映射到四个二进制字符上。这样字符串就可以用20bit表示。 这样就可以用一个整数来表示这四个字符。 然后采用哈希表记录出现次数。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 class Solution { static final int L = 10 ; public List<String> findRepeatedDnaSequences (String s) { List<String> ret = new ArrayList <String>(); if (s.length() < L) return ret; HashMap<Integer, Integer> map = new HashMap <>(); int left = 0 ; int right = 10 ; int [] bin = new int [26 ]; bin['A' -'A' ] = 0 ; bin['T' -'A' ] = 1 ; bin['C' -'A' ] = 2 ; bin['G' -'A' ] = 3 ; int x = 0 ; for (int i = 0 ; i < L-1 ; i++){ x = (x << 2 ) | bin[s.charAt(i)-'A' ]; } for (int i = 0 ; i <= s.length() - L; i++){ x = ((x << 2 ) | bin[s.charAt(i + L -1 )-'A' ]) & ((1 << L * 2 ) - 1 ); map.put(x, map.getOrDefault(x, 0 )+1 ); if (map.get(x) == 2 ) ret.add(s.substring(i, i+L)); } return ret; } }
遍历将字字符串加入哈希表并记录出现次数,然后返回出现次数大于1的字符串。 注意在循环时就可以直接添加结果到列表,这样可以减少操作。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class Solution { public List<String> findRepeatedDnaSequences (String s) { List<String> ret = new ArrayList <String>(); HashMap<String, Integer> map = new HashMap <>(); int left = 0 ; int right = 10 ; while (right <= s.length()){ String sub = s.substring(left, right); map.put(sub, map.getOrDefault(sub, 0 )+1 ); if (map.get(sub) == 2 ) ret.add(sub); left++; right++; } return ret; } }