Some notes I took while watching Rebuilding a Web Server, a brief walkthrough by Marc-André Cournoyer on writing a simple Rack-compliant web server. The code for the class is here.
#Concurrency
The entire stack looks like this:
Browser -> Socket -> HTTP Parser -> Rack -> Your App
There’s also a scheduler running alongside, handling concurrent connections. Such a scheduler can be implemented in different ways: threads, pre-forked processes, or an event loop.
Threads
A naive implementation would look like this, spawning a new thread for each incoming socket connection:
# inside the server's class definition
...
def start
  loop do
    socket = @server.accept
    # naive approach: one new thread per accepted connection
    Thread.new do
      connection = Connection.new(socket, @app)
      connection.process
    end
  end
end
...
Web servers like Puma use threads. Spawning a thread is fairly expensive, and doing it for every connection adds that cost to every request, so web servers that use threads for concurrency will usually spawn a fixed number of threads (a thread pool) on bootup and reuse them.
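To make the thread pool idea concrete, here is a minimal sketch. ThreadPool, schedule and shutdown are names I made up for illustration; this is not how Puma implements its pool, just the general shape: a fixed set of workers pulling jobs off a shared queue.

```ruby
# Hypothetical minimal thread pool: a fixed number of worker threads
# pull jobs off a shared queue instead of being spawned per connection.
class ThreadPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; :stop tells the worker to exit.
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  # Enqueue a block to be run by whichever worker is free next.
  def schedule(&block)
    @jobs << block
  end

  # Ask every worker to stop after the queued jobs, then wait for them.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

# Small demo: 10 jobs shared between 4 threads.
pool = ThreadPool.new(4)
results = Queue.new
10.times { |i| pool.schedule { results << i * 2 } }
pool.shutdown
```

In the server's start method, the Thread.new call would become something like pool.schedule { Connection.new(socket, @app).process }. A real pool would also need to rescue exceptions inside the worker so that one failing job does not kill its thread.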
Pre-forked Processes
Preforking is a popular concurrency model used by servers such as Unicorn and Nginx. fork creates a copy of the current process, and this child process is attached to its parent process. The two of them share the same socket [1].
# inside the server's class definition
...
def initialize(port, app)
  @server = TCPServer.new(port)
  @app = app
end

def prefork(workers)
  workers.times do
    fork do
      start
    end
  end
  Process.waitall # the parent blocks here until every worker exits
end

def start
  loop do
    socket = @server.accept
    connection = Connection.new(socket, @app)
    connection.process # goes on to process the raw socket data
  end
end
...
server.prefork(5) # for 5 child worker processes
Worker processes are forked beforehand, and all of them share the same listening socket. Each free worker blocks in accept on that socket, and the OS hands the next incoming connection to one of them, so the server needs no dispatching logic of its own. Presumably, relying on the OS scheduler like this is really efficient.
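A small self-contained experiment shows the shared listening socket at work. This is only a sketch under my own assumptions: port 0 (let the OS pick a free port), two workers, and each worker answering a single connection with its pid are choices made for the demo, not part of the server from the class.

```ruby
require "socket"

# The listening socket is created BEFORE forking, so every child inherits it.
server = TCPServer.new("127.0.0.1", 0) # port 0: the OS picks a free port
port = server.addr[1]

worker_pids = Array.new(2) do
  fork do
    socket = server.accept           # each worker blocks on the same socket
    socket.write(Process.pid.to_s)   # reply with this worker's pid
    socket.close
    exit!(0) # leave the child right away, skipping inherited at_exit handlers
  end
end

# The parent acts as the client: two connections, one per worker.
replies = Array.new(2) do
  client = TCPSocket.new("127.0.0.1", port)
  reply = client.read # reads until the worker closes its end
  client.close
  reply.to_i
end

Process.waitall
server.close
```

Since each worker here accepts exactly one connection and then exits, the two replies come from two different pids. Which worker picks up which connection is up to the OS.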
Event Loop
We can simulate an event loop in Ruby using a gem called eventmachine. eventmachine is a feature-packed gem, and comes with helper methods that handle accepting connections, and reading from and writing to sockets for us.
# inside the server's class definition
...
def start_event_machine
  EM.run do
    EM.start_server "localhost", 3000, EMConnection do |conn|
      conn.app = @app
    end
  end
end

class EMConnection < EM::Connection
  attr_accessor :app

  # called by eventmachine once the connection is set up
  def post_init
    @parser = Http::Parser.new(self)
  end

  # called by eventmachine whenever data arrives on the socket
  def receive_data(data)
    @parser << data
  end
  ...
end
...
server.start_event_machine
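To get a feel for what an event loop does under the hood, here is a tiny single-threaded loop built on IO.select. It is a sketch of the general idea only, not how eventmachine is implemented; TinyReactor and watch are hypothetical names, and a pipe stands in for a client socket so the example runs without a network.

```ruby
# Hypothetical miniature event loop: one thread waits on many IO objects
# and calls a callback whenever one of them has data to read.
class TinyReactor
  def initialize
    @handlers = {} # IO object => callback
  end

  def watch(io, &on_data)
    @handlers[io] = on_data
  end

  def run
    until @handlers.empty?
      # Block until at least one watched IO is readable.
      readable, = IO.select(@handlers.keys)
      readable.each do |io|
        begin
          @handlers[io].call(io.read_nonblock(1024))
        rescue IO::WaitReadable
          next # not actually ready; try again on the next pass
        rescue EOFError
          @handlers.delete(io) # the peer closed: stop watching this IO
          io.close
        end
      end
    end
  end
end

# Demo with a pipe standing in for a client connection.
reader, writer = IO.pipe
received = String.new
reactor = TinyReactor.new
reactor.watch(reader) { |data| received << data }
writer.write("GET / HTTP/1.1\r\n")
writer.close
reactor.run
```

The callback passed to watch plays roughly the same role as receive_data in the eventmachine version: the loop owns the waiting, and our code only reacts to data.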
#readpartial
readpartial is an instance method of the IO class in Ruby which allows us to read data off a socket as soon as data is available. The APIDock entry on readpartial elaborates further:
readpartial is designed for streams such as pipe, socket, tty, etc. It blocks only when no data immediately available. This means that it blocks only when following all conditions hold.
- the byte buffer in the IO object is empty.
- the content of the stream is empty.
- the stream is not reached to EOF.
Using the readpartial method, we can read off a socket like this:
data = socket.readpartial(1024) # reads at most 1024 bytes from the I/O stream
puts data
# do other things with data
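sysread is a method with similar functionality. It reads with a low-level read that bypasses Ruby's IO buffer, so it should not be mixed with the buffered read methods on the same IO object.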
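The blocking behaviour is easy to try out with a pipe instead of a socket (my substitution, to keep the example self-contained): readpartial returns whatever is available even if it is less than the requested maximum, and raises EOFError once the stream has ended.

```ruby
reader, writer = IO.pipe

writer.write("hello")
# Only 5 bytes are available, so readpartial returns them immediately
# instead of waiting for 1024 bytes or for EOF.
first = reader.readpartial(1024)

writer.close
# With the write end closed and nothing left to read, the stream is at EOF.
hit_eof = begin
  reader.readpartial(1024)
  false
rescue EOFError
  true
end
reader.close
```

In a server, that EOFError is the usual signal that the client has closed its side of the connection.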
#http_parser.rb
http_parser.rb is a gem that wraps around Node’s HTTP parser.
#Rack
Rack is a specification that web servers, middleware, and application frameworks must adhere to. Rack apps must have a single point of entry, a method named call, which takes the request environment (a hash) and must return an array containing the status code, the headers, and the body of the response.
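A minimal sketch of such an app, with no gems involved. HelloApp is a made-up name, the env hash is a hand-written stand-in for what a real server would build from the request, and the lowercase header keys follow the newer Rack convention (older apps commonly use "Content-Type").

```ruby
# Hypothetical minimal Rack-style app: any object that responds to
# call(env) and returns [status, headers, body].
class HelloApp
  def call(env)
    body = "Hello from #{env["PATH_INFO"]}\n"
    headers = {
      "content-type" => "text/plain",
      "content-length" => body.bytesize.to_s
    }
    # The body must respond to each, so it is wrapped in an array.
    [200, headers, [body]]
  end
end

# Calling the app directly, the way a server would after parsing a request.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hi" }
status, headers, body = HelloApp.new.call(env)
```

Because the contract is this small, a plain lambda such as ->(env) { [200, {}, ["ok"]] } qualifies as a Rack app too.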
Things which behave exactly like Rack tells them to (e.g. Unicorn, Rails) are Rack-compliant. The benefit is that Rack-compliant things can be used in conjunction, layered on top of each other, or swapped out and replaced, without each having knowledge of the other (yep, abstraction).
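The layering can be sketched with a tiny middleware. Timing and the x-runtime header are illustrative choices of mine; the point is only that a middleware is itself something with call(env) that wraps another such thing.

```ruby
# Hypothetical middleware: wraps any Rack-style app and adds a header
# with the time the inner app took to respond.
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [status, headers.merge("x-runtime" => format("%.6f", elapsed)), body]
  end
end

# The inner app knows nothing about the middleware, and vice versa.
inner = ->(env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
stack = Timing.new(inner)

status, headers, body = stack.call({})
```

Swapping inner for a different app, or wrapping stack in yet another middleware, requires no changes to Timing.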
Noah Gibbs’s nice book Rebuilding Rails offers an excellent practical tutorial on Rack. The book covers more than just Rack, but the chapters on Rack are particularly illuminating.
KIV: Notes on Rebuilding Rails
#Footnotes
1. More explicitly, the reason why they share the same socket is the file descriptor inheritance that happens in fork. According to Linux’s man pages: "The child inherits copies of the parent’s set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes."
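The shared file offset from the man page can be observed with a temporary file. This is a sketch of my own: the file name prefix and contents are arbitrary, and sysread is used so that Ruby's read buffering does not get in the way.

```ruby
require "tempfile"

rest = nil
Tempfile.create("offset-demo") do |file|
  file.write("hello world")
  file.flush
  file.rewind # back to offset 0 before forking

  pid = fork do
    file.sysread(5) # the child consumes "hello" through the inherited descriptor
    exit!(0)        # leave the child without running inherited at_exit handlers
  end
  Process.wait(pid)

  # The parent never read anything itself, yet its offset has moved,
  # because both descriptors refer to the same open file description.
  rest = file.sysread(6)
end
```

Here rest ends up as " world": the child's read advanced the offset that the parent sees. A listening socket inherited across fork is shared in the same way, which is why every worker can call accept on it.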