One of the first things you learn to do with Ruby is to evaluate a mathematical expression in irb
. But what's the magic behind this seemingly trivial operation? Let's create our own evaluation engine and find out!
Note: You could cheat and just use
eval
, but that would defeat the point of this exercise.
Tokenizing Our Input
We are given a string as input, but what we need are numbers and operators. We can use the StringScanner class and 'tag' the different parts of the input so we can recognize them.
This is the main method of the Tokenizer
class:
def parse(input) @buffer = StringScanner.new(input) until @buffer.eos? skip_spaces read_tokens end @tokens end
In this method, I made a StringScanner
object with the input string as an argument, then I have a loop which keeps going until the end of the string is reached. Inside this loop, I call two methods: skip_spaces
and read_tokens
.
The skip_spaces
method is very simple:
def skip_spaces @buffer.skip(/\s+/) end
The /\s+/
part is a regular expression that matches spaces (including tabs and newlines). Then we have read_tokens
, which is where most of the work happens:
RULES.each do |regex, type| token = @buffer.scan(regex) if token @tokens << Token.new(type, token) end end
In this method, we try to match part of the input string against one of our parsing rules, which are just regular expressions. When a match is found, we create a new token object with the type and value. For example, for the number '5' we would have Token.new(:int, 5)
.
RULES = { /\d+/ => :int, /[\/+\-*]/ => :op }
And that's all we need to convert our input string into tokens. Now let's see how we can turn those tokens into something more useful.
The Shunting Yard Algorithm
Now that we have our tokens, we need to convert them into a format we can work with. The normal format for a math expression is called 'infix'. The problem with this format is that it's hard to evaluate because of operation precedence. There is another way to represent a mathematical expression: the 'postfix' notation.
Examples:
Infix ___________
Postfix
3 + 4
3 4 +
9 * 4
9 4 *
2 * 5 + 1
2 5 * 1 +
We are going to use the Shunting-yard algorithm to convert from the infix
notation to postfix
. The way this algorithm works is by using two stacks, one for numbers and another for operators.
Follow these steps:
Read a token.
If it's a number, put it in the numbers stack.
If it's an operator, put it in the operators stack, but first, check the precedence of the operator on the top of the stack. Move it into the numbers stack if the precedence is higher or equal.
When you've read all the tokens, pop all the operators and put them into the numbers stack.
Done!
Let's take a look at the code, starting with the initialize
method:
def initialize(tokens) @tokens = tokens @output = [] @operators = [] end
In this method, we do our initial setup by using the tokens from the previous phase (tokenization) and creating two arrays that will serve as our stacks. Let's continue our exploration of the code:
@tokens.each do |token| push_number(token) if token.type == :int process_operator(token) if token.type == :op end
Inside the run
method, we iterate over our tokens array and put them in the correct stacks.
def operator_priority_is_less_or_equal_than(token) @operators.last && (token.priority <= @operators.last.priority) end
This method helps us handle operator precedence. The priority
is a method defined in the Token
class.
def priority return 1 if value == '+' || value == '-' return 2 if value == '*' || value == '/' end
When we're done, we just dump the remaining operators into our output stack.
@output += @operators.reverse
We need to call reverse
here. Putting things into a stack and taking them out reverses them, but in this case, we want the original order.
And that's all it takes to convert to postfix notation (also know as Reverse Polish Notation or RPN).
Evaluation
Now that we have our expression in postfix notation, it's very easy to evaluate using a single stack.
Algorithm:
We put all the numbers we find on the numbers stack.
Whenever we find an operator, we pop two numbers from the stack.
Then we calculate the result and put the result back on the top of the stack.
Repeat until done. Last number will be the final result.
tokens.each do |token| @numbers << token if token.type == :int if token.type == :op right_num = @numbers.pop left_num = @numbers.pop raise 'Invalid postfix expression' unless right_num && left_num result = evaluate(right_num.value, left_num.value, token) @numbers << Token.new(:int, result) end end
This is the main loop for the evaluation method. Once we find an operation, we call the evaluate
method and create a new token object with the results.
def evaluate(right_num, left_num, operation) case operation.value when '+' then left_num + right_num when '-' then left_num - right_num when '*' then left_num * right_num when '/' then left_num / right_num end end
There is nothing special about this method; we just apply the operation and return the result.
@numbers.last.value
Once all the operators have been processed, we'll have one token left on our numbers stack. This token is the final result.
Conclusion
In this post, you learned about tokenization, which lets you break a string into its component values. This is what most programming languages (like Ruby or JavaScript) do behind the scenes to your source code so they can turn it into computer instructions.
Then you learned about two different ways to arrange a mathematical expression, 'infix' and 'postfix', and how to convert between them by using the 'Shunting-Yard algorithm'. You've also learned about the 'stack' data structure and what it can be useful for.
I hope you enjoyed this thought exercise! You can find the final code here: https://github.com/matugm/math-eval