Feature #6: Most Common Token
Implementing the "Most Common Token" feature for our "Language Compiler" project.
We'll cover the following...
Description
For this feature of the language compiler, the program statements are given to us as a string. We want to determine the variable or function that is most commonly referred to in a program. In this process, we want to ignore the language keywords. For example, a keyword may be used more frequently than any variable or function. The language keywords are given as an array of strings.
Let’s say we are given the following program as input:
int main() {int value = getValue();int sum = value + getRandom();int subs = value - getRandom();return 0;}
The list of keywords given to us is ["int", "main", "return"]
. In this example, our function will return "value"
. Note that, our functions should ignore syntax, such as parentheses, operators, semicolons, etc.
Solution
We can solve this problem by normalizing the code string and processing it step by step. The complete algorithm is as follows:
-
First, we replace all the syntax, including parentheses, operators, semicolons, etc., with spaces. Now, the string only contains alphanumeric tokens.
-
Then, we split the code obtained from the previous step into
tokens
. -
We iterate through the tokens to count the appearance of each unique token as keys, excluding the keywords.
-
We create a Dictionary
count
, which has tokens as keys and occurrences as values. -
In the end, we check the tokens in
count
to find the token with the highest frequency.
import Swiftimport Foundationfunc mostCommonToken(code: String, keywords: [String]) -> String {// Replacing the syntax with spacesvar normalizedCode: String = ""// Convert all the characters other than Alphanumeric into spaces// and insert them into normalizedCode variablenormalizedCode = String(code.map {!($0.isLetter || $0.isNumber) ? " " : $0})// Split based on the spaceslet delim: String = " "var tokens: [String] = normalizedCode.components(separatedBy: delim)tokens = tokens.filter {$0 != ""}//split(normalizedCode, delim, tokens);var count: [String : Int] = [:]var bannedWords: Set<String> = Set<String>()for keyword in keywords { bannedWords.insert(keyword) }// Count occurence of each token, excluding the keywordsfor token in tokens {if !(bannedWords.contains { $0 == token }) {if count[token] == nil {count[token] = 0}count[token] = count[token]! + 1}}// return the maximum valued keyvar pair: (String, Int) {var max_pair: (String, Int) = ("", Int.min)for cn in count {if max_pair.1 < cn.1 {max_pair.0 = cn.0max_pair.1 = cn.1}}return max_pair}return pair.0}// // Driver codelet code: String = "int main() {\nint value = getValue();\nint sum = value + getRandom();\nint subs = value - getRandom();\nreturn 0;\n}";let keywords: [String] = ["int", "main", "return"]print(mostCommonToken(code: code, keywords: keywords))
Complexity measures
Time complexity | Space complexity |
---|---|