- trim(): allows to remove the whitespaces in a string
- string to int()
- int / float to string:
string word = to_string(variable_to_convert);
- Char: is a single character
- A char in C/C++ uses 8 bits in memory, because Cand C++ uses the ASCII value;
- Java stores each char in UTF-16 encoding, usign 16 bits to store a single characters.
Create and use a string in C:
// Create string in run-time; We cannot create the aeray of chars iwthout initialize and then assign the full string
char str[] = "davide"
cout << str;
cout << sizeof(str); //Output wil be 6 and not 7, because in C, the compiler put a \0 at the end of the string;
cout << strlen(str)<<endl;
// Initialie string after char[] is created (without generate erros)
char surname[10];
strcpy(surname,"pollicino");
string in C++ is a class; where a class has a list of methods that are accessible by the '.' (dot) operator.\
- substr():
- find():
- getline() : getline(cin, full_name); Getline Allows to store all the string included whitespaces; We can also use a third optional parameter in the getline used to indicated when the line must be stopped to be read. Suppose that we want to read our line until we do not read the dollar sign '$'; getline(cin, full_name, '$'). By default, the third parameter is the new line.
for (char letter : full_name) cout <<letter;
In the english alphabet there are 26 letters (characters).
cout << (int)'a';
Everytime we heard the word frequency, we automatically think about hashmap. It is correct, but in this case, we can simply exploit an array the ASCII value of each caracter to solve this problem in O(N) time and O(1) space; We we will see that we will use always in this case a support array made by 26 elemements in any case, independently from the input size. For this reason we consider it as constant.
Suppose to have the string "ddaviideee", the output will be: a : 1 d : 2 e : 3 i : 2 v : 1
string name = "ddaviideee";
// Creat an array of 26 elements and set them to 0.
int count[26] = {0};
for(int i=0; i < name.size(); i++)
{
// A will be at the index 0, b at the index 1 and so on
count[name[i]-'a']++;
}
for(int i=0; i< 26; i++)
{
// If the character is present at least once in the string (so has frequency >0)
if(count[i] >0)
cout << char(i+'a') << " " << count[i]<<endl;
}
Given two strings, check if these two strings are anagrams of each other. This means that nay characters present in the string1, with a frequency F, must be present also in the string string2 with the same frequency F.
// O(NlogN) time (time required by the sort function)
bool are_anagrams(string s1, string s2)
{
if(s1.size() != s2.size())
return false;
sort(s1.begin(), s1.end());
sort(s2.begin(), s2.end());
return (s1==s2);
}
A secondf better solution is the standard counting character approach, comparing the frequencies of each character in both string
const int CHAR = 256;
bool areAnagram(string &word1, string &word2)
{
int len1 = word1.size();
int len2 = word2.size();
if(len1 != len2)
return false;
int count[CHAR] = {0};
for(int i =0; i< len1; i++)
{
count[word1[i]]++;
count[word2[i]]--;
}
for(int i =0; i < CHAR; i++)
{
if(count[i] != 0)
return false;
}
return true;
}
// Source: GeeksForGeeks
Efficient Solution:
const int CHAR =256;
int IndexLeftMostRepeatingChar(string word)
{
bool visite[CHAR] = {false};
// Default result values is -1: means that there are NO repeating chars;
int result = -1;
// we traverse from right to the left, for find the LEFTMOST;
for(int i = word.size()-1; i >=0; i--)
{
if(visited[word[i]])
result = i;
else
visited[word[i]] = true;
}
return result;
}
int index_leftmost_repeating_char(string s)
{
unordered_map<char, int>mp;
int i, len = s.size();
for(i=0; i < len; i++)
mp[s[i]]++;
// Return index of the most left repating char
for(i=0; i < s; i++)
{
if(mp[s[i] > 1])
return i;
}
// All elements are unique
return -1;
}
- The non repeating character is a character with frequency =1;
Naive / easy solution; O(N^2)
int indexNonRepeatingChar (string word)
{
int len = word.size();
for(int i =0; i < len; i++)
{
bool flag = false;
for(int ij = 0; j < len; j++)
{
if( i!= j && word[i] == word[j])
{
flag = true;
break;
}
}
if(flag == false)
return i;
}
return -1;
}
Optimized Solution: O(N)
int GetIndeNonRepeatingChar (string word)
{
cont int CHAR = 256;
int counter_frequency[CHAR] = {0};
int len = word.size();
for(int i =0; i < len; i++)
counter_frequency[word[i]]++;
for(int i =0; i < len; i++)
if(counter_frequency[word[i]] == 1)
return i;
return -1;
}
/* Function checks if passed strings (str1
and str2) are rotations of each other */
bool areRotations(string str1, string str2)
{
/* Check if sizes of two strings are same */
if (str1.length() != str2.length())
return false;
string temp = str1 + str1;
return (temp.find(str2) != string::npos);
}
This is a quite popular problem, where we have to reverse the order of the words in a sentence, without actually recerve the order of the characters of any word; Suppose to have the sentence: "This is Competitive Programming", the output will be: "Programming Competitive is This". Sometimes, you will also find a small variant of this problem, where each word will be divided by a dot '.';
A first/naive approach is:
- Creat a stack
- Iterate the string and push each word in a stack
- Pop each element from the stack and append it to an initially empty string (adding a whitespace after each pop);
- Print the result.
An other approach (more efficient) is to:
- Reverse all the single words inside the string
- Reverse at the end ALL the string (and not just the subwords);
void reverse(char word[], int start, int end)
{
while(start <= end)
{
swap(word[start], word[end]);
start++;
end--;
}
}
void reverseSingleWords(char sentence[], int len)
{
int start = 0;
for(int end =0; end< len; end++)
{
if(sentence[end] != ' ')
{
reverse(sentence, start, end-1);
start = end +1;
}
}
// Reverse last word (because it does not have any whitespace)
reverse(sentence, start, len-1);
// Reverse ALL the full sentence
reverse(sentence, 0, len-1);
}
- Naive Pattern Searching: we slide the pattern over text one by one and check for a match. If a match is found, then slides by 1 again to check for subsequent matches.
n: length of the biggest string m: length of the substring to search
There are many algorithms for pattern search, few of them are:
- Naive: O((n-m+1)*m). In Naive approach, there is no backtracking.
// The vector will contains the first index of where a valid pattern is found
vector<int> search_with_naive(string word, string sub)
{
int n = word.size();
int m = sub.sizea();
vector<int>solution;
int j, i;
for(i =0; i <= n-m; i++)
{
for(j=0; j < m; j++)
{
if(word[i+j] != sub[j])
{
break;
}
}
if(j==m)
solution.push_back(i);
}
return solution;
}
We can actually optimize the Native solution to O(N) int he follwing way:
// The vector will contains the first index of where a valid pattern is found
// THIS IS PSEUDOCODE
vector<int> search_with_naive(string word, string sub)
{
int n = word.size();
int m = sub.sizea();
vector<int>solution;
int j, i;
for(i =0; i <= n-m; i++)
{
// We manage the increment
for(j=0; j < m;)
{
if(word[i+j] != sub[j])
{
break;
}
}
if(j==m)
solution.push_back(i);
// We increment I in according with with the match ror non-match of the pattern
if(j==0)
i+=1;
else
i += i+j;
}
return solution;
}
- Rabin karp: same time complexity of the Naive Pattern search, but easier to implement. It uses the idea or rolling hash to optimize the algorithm.
It's worst case is N^2, but in general, Rabin Karp is faster than Naive search and allows us to save lots of tiem when we have a long sentence and a long substring to compare, too see if there will be one or more occurrency of the same pattern.
Like the Naive algorithm it slide the pattern one by one, but instead to compare each character, the Rabin kARP compare the has values of the two string windows, saving lots fo comparison, especially when we have long patterns. If the two hashes values matches, we compare the characteres otherwise, we skip to the next window.
- Spurius hit: it's when the two substring have same hash value but do not match;
- rolling hash: means that we calculate the current hash value of our window based on the last hashvalue calculated.
It actually looks like that it has the same time complexity of Naive but it doesn't, because we save up time to calculate the has of a window string using also the past hashes (this is the concept of roll hashing).
- KMP : O(n). It is one of the most popular algorithm in computer Science.
- Suffix tree : O(m). Suffix tree is not an algorithms but a data-structure.
RK algorithm is also used not just to find in a specific substring exists in another string, but also used to get all the starting index where all the occurrencies of our pattern exists in the main string;
- If the string is empty, the result is 0;
- If all the characters are equals, the answer is going to be 1.
For this problem, I will provide you different solutions with different time complexity but same Space complexity.
Considering that the longest_distinct function has a time complexity O(n^2), and it is calling the function are_distinct(), that has O(N) complexity, the total total time complexity of this Naive approach is: O(N^3).
bool are_distinct(string word, int i, int j)
{
// Create a visited vector of bools
vector<bool>visited(256);
for(int k = i; k <=j; k++)
{
if(visited[str[i] == true)
return false;
visited[str[i]] = true;
}
return true;
}
// Return the length of the longest substring with all differnt characters
int longest_distinct(string word)
{
int len = word.size();
int result = 0;
// Iterate thought all the possible substrings
for(int i =0; i < len; i++)
{
for(int j=0; j < len; j++)
{
if(are_distinct(word, i, j))
result = max(result, j-i+1);
}
}
return result;
}
Let's see a faster approach to search the lenght of the longest string with different characters.
int longestDistinct(string sentence)
{
int i, j, result = 0;
int len = sentence.size();
for(int i =0; i < len ; i++)
{
vector<bool>visited(256);
for(int j=i; j < len; j++)
{
if(visited[sentence[j]] == true)
break;
else
{
result = max(result, j-i+1);
visited[sentence[j]] = true;
}
}
}
return result;
}
Find the number of different chars between equa-length strins.
String Alignment problem definition:
Suffix array is a Data Structure created by Udi MAnber in 1991. Udi Manber is now one of the Google vice presidents of enginnering.
The suffix Array has similar functionalties to the Suffix Tree but much simpler to implement and even faster. Suffix Tree has O(N) construction time while Suffix Array construction time is O(n log n).
Suffix Tree:
Applications:
BOye-Moore's
- We have a string a and a pattern b to search in a;
- We know that a.siz() > b.size();
- Rabin-arp matched the has val;ues pf the pattern with the has value of current substring of text. If the has values match then only it start matching indiviidual characters.
-
Naive pattern search is not fast enough when we have many matching characters followed by mismatching characters (false alarms).
-
With the KMP algorithm, we can have a worst case time complexity of O(N).
-
The optimization of KMP stands in the number of n characters that are skipped for the pattern search. A foundamental question is, how many characters do we need to skip to guarantee that a correct check is done? To get this answer, we pre-process pattern and prepare an integer array lps[] that tells us the count of characters to be skipped
-
Size of lps(temp array) is = to the size of the pattern.
We can use C++ STL map<string, int> to keep track the frequency of each word. Every time we encounter a new token (which is a string), we increase the corresponding frequency of that token by one. Finally, we scan through all tokens and determine the one with the highest frequency.