Posts tagged code



Runder is a rudimentary clone of CodePen that I wrote to see how much I could accomplish in a span of 24 hours. It leverages Google Caja’s cajoling to “tame” user-submitted code before it is executed. Syntax highlighting and autocomplete are provided by Ace. All in all, it was a rewarding experience. I also got to present Runder at NUS Hackers last Friday! Looking Forward I’d definitely look into the option of including libraries, such as Bootstrap and jQuery. This might be useful for people who want to get acquainted with libraries without having to set anything up locally. Also, could it be possible to do client-side generation of encoded URLs so that users can share their code? Runder on GitHub

Things I Wish Someone Had Told Me When I Was Learning How to Code


Link. When you get to this stage, most of the tutorials and online resources available to you are much less useful because they assume you’re already an experienced and comfortable programmer. The difficulty is further compounded by the fact that “you don’t know what you don’t know.” Even trying to figure out what to learn next is a puzzle in itself. This really hit home for me.

Goodbye Mou, Hello StackEdit


When studying, I find that writing notes helps me process information better, even if I never go back to those notes. Since I’m such a Markdown fanatic, my workflow for writing notes with Markdown has changed many times over the past few years. Basic requirements for my workflow include live preview, as well as customisable HTML/CSS stylesheets for Markdown conversion. iA Writer was my first experience writing with Markdown, and I thought that the iPad and iPhone apps would be useful for writing on the go. My usage patterns proved otherwise, though, and I quickly moved away from it because it was too minimalistic for me (oh, the horror). Previewing was clumsy and there was no support for user-defined stylesheets (at least not in the non-Pro version). I moved to Mou, which was perfect because I could finally define styles, even for the writing area. Live preview was quite buggy (there was always an offset when synchronised scrolling was used, which was annoying), but I was happy with it. The local Markdown files were synchronised with Dropbox. I used Mou for the longest time, but gradually grew sick of the Solarized Dark theme that I made. (It’s great for coding, but not so great for long-form reading or writing.) I switched to Marxico, a Chrome app. Marxico allowed me to sync the converted rich text to Evernote, which was really nice. However, the inability to export or sync the raw Markdown files was completely unacceptable to me. If Marxico disappeared overnight, I’d be left with my Evernote notes and no other recourse. Finally, I came across StackEdit, which is a Markdown editor that lives purely in the browser. That put me off slightly at first, until I saw the rest of the feature set. Instead of the more commonly used text editor Ace (which I used when building Runder, and which Dillinger, a StackEdit competitor, also uses), StackEdit uses PageDown, the same editor used by Stack Overflow.
It’s WYSIWYG, which I strangely appreciate after years of writing in monospace. Remember the annoying scrolling offset problem I experienced in Mou? StackEdit fixes that too, with its so-called Scroll Sync extension. It also supports synchronization and collaboration through Google Drive and Dropbox, and has some in-house mechanism for the non-trivial task of “merging the modifications”. (I haven’t tried this feature out for myself yet.) StackEdit even supports custom extensions, and I already have a few in mind which I want to write, which I’ll…

Blogging Workflow


In this post, I want to talk about some of the technical details behind this site and the workflow that I have for writing. This site is built with Jekyll and hosted on a private GitHub repository, with two branches. The branch is where the published site lives (basically, the contents of the folder), while the branch is where the working files are kept, under version control. “Doesn’t GitHub Pages build your site for you automatically?” Yes, it’s supposed to, but not if you use custom plugins that are not supported by GitHub Pages. I have a custom Ruby plugin that generates category pages, so there’s that. I use fish, so I wrote a function that acts on the surface like a command-line binary. For the purposes of this post, I’ve renamed the directories. is the directory where my website source files live, and is the directory where the published files live. There is also a folder which contains a backup of the previous version of the published files. (Yes, everything is managed under VC, but still.) This is how it looks, in its entirety: Let’s go through each option one by one. , when used with the flag, opens the website directory in Sublime Text and runs , which opens a local copy for development preview. Straightforward enough. The option builds the source files in and copies them to , but not before it makes a copy of the original published files in . There’s a lot of potentially dangerous -ing here, and I’ll take another crack at refactoring it when I have the time. The option is where it starts to get interesting. In the description, it says: takes three (or more) options: , , and . It takes these options and creates a new Markdown file in the appropriate location in the directory, with the necessary front matter prepended. For example, if the following command is run (today): A new post will be created in with the following front matter: Finally, this file is opened in Sublime Text for immediate editing. is by far the most complex bit of .
Its arguments are similar to ’s, except it takes in an additional folder argument. Its main purpose is to generate HTML markup for the photo galleries on the site, by acting as a wrapper for a Python script that I wrote, . Its first argument is the full location of a folder, located anywhere, containing the images to be used in the post. The images in the folder must be arranged in a specific manner: Inline images live in the root of the folder. Gallery images should be placed in a subfolder called…
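As a rough illustration of the post-creation helper described above — written here in plain POSIX shell rather than fish, with a hypothetical function name, output path, and front matter fields (the author's actual script is not shown in this excerpt):

```shell
# Hypothetical sketch: create a dated Jekyll post with front matter.
new_post() {
  title="$1"
  date=$(date +%Y-%m-%d)
  # lowercase the title and replace spaces with hyphens for the slug
  slug=$(echo "$title" | tr 'A-Z ' 'a-z-')
  file="/tmp/_posts/$date-$slug.md"
  mkdir -p /tmp/_posts
  {
    echo "---"
    echo "layout: post"
    echo "title: \"$title\""
    echo "date: $date"
    echo "---"
  } > "$file"
  echo "$file"
}

new_post "Hello World"
```

A real version would live in `~/.config/fish/functions/` and end by opening the file in Sublime Text.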

Updating EOL Ubuntu Releases


I was handed an EC2 instance running 12.10 (Quantal Quetzal), which had reached EOL. For various reasons, upgrading wasn’t an option. When I tried to , snapped back at me with multiple 404s along the lines of: It turns out Ubuntu moves sources for EOL releases to a separate subdomain , so all I had to do was change the subdomain of all the sources in , like so (note that you need for it, and it’s also a good idea to back up your original file before making any changes): Digital Ocean also hosts a mirror of these: If you have an EOL Ubuntu distro that you need to maintain for any reason, you can head over to to see if your distro is covered.
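A sketch of the rewrite, shown here against a sample file instead of the real `/etc/apt/sources.list` (on a real box you would run the `sed` with `sudo` against the actual file, after backing it up):

```shell
# Sample sources file standing in for /etc/apt/sources.list
cat > /tmp/sources.list <<'EOF'
deb http://archive.ubuntu.com/ubuntu quantal main restricted
deb http://security.ubuntu.com/ubuntu quantal-security main restricted
EOF

# Back up before editing, then point both subdomains at old-releases
cp /tmp/sources.list /tmp/sources.list.bak
sed -i -E 's/(archive|security)\.ubuntu\.com/old-releases.ubuntu.com/g' /tmp/sources.list

cat /tmp/sources.list
```

After this, `apt-get update` should resolve packages from the old-releases mirror instead of 404ing.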

Monitoring Processes with Monit and Slack


(This post was originally posted on Cubicle Rebel’s engineering blog. Check it out!) Our clients rely on us to provide rock-solid long-term stability for their projects, and one critical aspect of maintaining uptime is providing realtime logging and monitoring of long-running services (also called daemons) on our servers. We use Slack exclusively for communication and notifications regarding technical matters. Because of Slack’s great webhook support, we decided to hook it up to Monit and have Monit automatically notify us and attempt to restart the process if anything goes down. Here’s how we went about it. For the purposes of demonstration, let’s assume that we have an application called Beerist (a social network for beer drinkers). It happens to be a Rails app, and we want to use Unicorn as our Rack server so that we can serve requests concurrently with forked OS processes. 1 - Set Slack up The first thing you want to do is to get the Webhook URL from Slack by visiting this page and signing in. You will be prompted to choose a channel to post to, but it doesn’t really matter which one you pick because you will be able to override the (default) channel in your payload. After you get the URL, don’t close the page just yet - it contains useful information we need for Step 3. 2 - Install Monit on your server Monit is available on most Unix distributions. If you’re running Ubuntu, the usual and will suffice. You can verify that Monit is working by starting it: . will print out Monit’s status in verbose mode, which will come in handy. 3 - Familiarize yourself with Slack’s incoming webhooks endpoint Go back to the page with the Webhook URL, or visit the equivalent documentation page here, and documentation on message attachments here. Read through them, but I’ll give a brief overview here anyway, as well as the options we went with. We’ll be sending serialized JSON in the request body. 
The JSON payload structure looks like this: 4 - Configuring Monit The hardest part of this tutorial is this step. Monit can monitor all sorts of things, and depending on what you want to monitor, this can either be very painful or very easy. In the case of processes, Monit will require a pidfile as well as the shell command for starting (and stopping) the process, if you want Monit to be able to automatically restart the process for you. Let’s say our app happens to be located at , and our unicorn pidfile is located at . Open the Monit configuration file at . It should be…
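For reference, a minimal incoming-webhook payload might look like the following (the channel name, username, and attachment text here are illustrative, not the original post's values):

```json
{
  "channel": "#ops",
  "username": "monit",
  "icon_emoji": ":warning:",
  "attachments": [
    {
      "fallback": "unicorn is down on beerist-prod",
      "color": "danger",
      "title": "unicorn is down on beerist-prod",
      "text": "Monit will attempt a restart."
    }
  ]
}
```

All of these fields are part of Slack's incoming-webhook API; the attachment's `fallback` is what notification popups show when the rich attachment can't be rendered.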

Aurora and Learning Erlang


Aurora’s GitHub repo TL;DR: I learned Erlang and wrote a JSON API server called Aurora for a school project. Introduction I recently concluded a school project involving writing an Android app with “some elements of concurrency”. My group wrote a chat messaging app - a stripped-down clone of WhatsApp, but with some additional features such as message tagging, conversation notes (think Google Docs, but on a per-conversation basis), and event polling. I volunteered to write the API server that would serve the Android front-end application. I chose Erlang because I heard it was cool and I wanted to learn functional programming. It seems any blog post about a project written in Erlang should include performance benchmarks and an eventual (and completely obvious) conclusion as to why Erlang provides superior performance under heavily concurrent loads. Instead, I will write about my journey picking up Erlang and some other things I learned from writing Aurora. Learning Erlang, and Why People Think That Learning Functional Programming is Difficult Because it is, especially if you’re used to thinking imperatively. The soundbite I use most when talking to people is: “Imagine programming without loops”. Of course, it’s not as bad as it sounds, and when you get the hang of it, loops will seem completely stone-age. , and will be your new best friends. The difficulty is compounded by what I call the “I Think I Know What’s Going On Wait What” effect. Because functional programs (or at least well-written ones) can be more terse and/or expressive than their imperative counterparts, they sometimes provide a false sense of understanding. When it comes time to write something from scratch, the programmer is completely paralyzed. I was stuck in many ruts as I made my way through Fred Hebert’s excellent Erlang book, Learn You Some Erlang. The chapters on socket programming in Erlang (chapter 23), ETS (chapter 25), and Mnesia (chapter 29) were particularly illuminating.
The Importance of Ripping Other People’s Code Off It wasn’t until I found some fairly basic code for a TCP server in a GitHub repo that I slowly began to understand how to write my own. And even then, I experimented a lot by copying and pasting and changing each line, line by line, to get a sense of how everything came together. Beyond the initial scaffolding stage, I continued to find code samples indispensable - often finding alternative ways to achieve the same functionality, corroborating and finding the…

Single Table Inheritance in Rails


Single table inheritance (STI) in Rails allows you to store Ruby subclasses in the same database table. Let’s get started with a brand-new Rails project to learn what that means. I’ll be using Rails 4.2.3 and SQLite as the database. First, an empty User model: In : Generate a migration for it. To implement STI, we add a column called type, of type string, to the class. Let’s also have a name column: Migrate the SQLite database: Fire up the Rails console: Let’s create a new user: No problems so far. Now, let’s say we want to differentiate between regular users and power users. At this juncture, let’s pretend we don’t know about STI. Since we’ve already created the column, let’s use it, so we can find different types of users by writing something like this: Go ahead and create such a user: Uh oh. What just happened? Why is Active Record complaining about not being a subclass of ? It turns out that Active Record, upon seeing a column named type, automatically assumes you’re implementing STI. Into Rails Down The Stack Let’s take a dive into the Rails source code to find out how Active Record automatically assumes this behaviour, by finding out step by step what happens when we attempt to type to create a new power user. The source code below is as of commit on the branch. Sean Griffin explains briefly in his comments in Active Record’s module. In : method in : We start our investigation from Active Record’s module, which contains the method definitions of some of the most commonly used Active Record methods, such as , , , and (protip: use in Sublime Text to search by method definition). We first check if the supplied argument is an array. If so, we recursively call for each member of the array.
This means you can do something like: and Active Record will create three users, and even return them to you in the same array structure you specified: If we supply a hash, then the method is called: method in : Notice the splat operator, which converts all but the last argument into an : We take the first member of the array and run the method on it, which is located in the same file, some ways down: method in : This method checks through all of the attributes in the model to see if any of their names match the specified inheritance column, which in this case is . Therefore, returns . Let’s see where and come from. method in : where is: method in : I’m going to stop here because steps into whole new territory: namely, into the scary module…
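The subclass check at the heart of the error above can be sketched in plain Ruby, without Rails. This is not Active Record's actual implementation — the class names, method name, and error message are illustrative — but it captures the rule: the stored class name must resolve to a subclass of the model the table belongs to.

```ruby
# Pure-Ruby sketch of Active Record's STI type check (illustrative).
class User; end
class PowerUser < User; end   # a valid STI subclass
class Admin; end              # unrelated class, not a subclass of User

def find_sti_class(base, type_name)
  klass = Object.const_get(type_name)
  # Class#<= is true when klass is base or a descendant of base
  unless klass <= base
    raise TypeError,
      "Invalid single-table inheritance type: #{type_name} is not a subclass of #{base}"
  end
  klass
end

puts find_sti_class(User, "PowerUser")   # => PowerUser
```

Calling `find_sti_class(User, "Admin")` raises, which mirrors the error we hit in the console session above.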

Managing Nginx Processes


The second in a series of Nginx posts. Nginx runs two different types of processes: the master process and worker processes. Master Process This should be started as , because this will allow Nginx to open sockets below 1024 (it needs to be able to listen on port 80 for HTTP and 443 for HTTPS). Worker Processes These are spawned by the master process, and their user and group will be as specified. The Nginx binary accepts command-line arguments, a full list of which can be obtained by: tells you about Nginx as well as the options it was built with: Starting To start Nginx, simply run the binary without any switches: Stopping There are two ways to stop Nginx - immediately using the signal, or gracefully using the signal: Reloading Configuration All of the above commands will verify the configuration file each time they are run, even when you’re trying to stop Nginx. When you’re unable to stop Nginx using , you may use or : Testing Your Configuration Optionally, you can specify a path with so that you can test another configuration file: Nginx as a system service Adding Nginx as a system service allows us to: Control it with standard commands Have it launch and quit automatically at system startup and shutdown To add Nginx as a system service, we simply need to include a script in called . There are many resources out there covering Nginx init scripts. This one seems quite popular and well-documented. Install it into your and make the script executable: Then, we need to associate the script with the operating system’s default so that the system runs this script on startup: At this point, you should be able to start, stop, restart, or poll for Nginx status (assuming you used the script above): Don’t forget to make it executable! () Linux-based operating systems have seven runlevels which correspond to different system states (0 means the system is shut down, 3 means the system is in multiuser mode, etc). Each state is associated with a folder in called .
Each folder contains symbolic links to scripts located in . A daemon called is responsible for running the scripts associated with each state. Thus, what does is create a symbolic link to the script within the that is associated with the OS’s default state upon startup (for Ubuntu it’s 3).
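For quick reference, the operations covered in this post map onto commands along these lines (the binary and pid file paths are illustrative; they vary by installation):

```
sudo /usr/local/nginx/sbin/nginx                    # start
sudo /usr/local/nginx/sbin/nginx -t                 # test the configuration
sudo /usr/local/nginx/sbin/nginx -t -c /tmp/test.conf   # test another config file
sudo kill -QUIT $(cat /var/run/nginx.pid)           # stop gracefully
sudo kill -TERM $(cat /var/run/nginx.pid)           # stop immediately
sudo service nginx restart                          # once installed as a service
```

The `-t` runs only validate the configuration; they don't affect the running master process.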

Downloading and Installing Nginx


The first in a series of Nginx posts. The information here is heavily inspired by my reading of Clément Nedelcu’s excellent book Nginx HTTP Server and can be considered my own personal reference. There are five main steps involved in installing Nginx, as outlined below: Install dependencies Download Configure Compile Install Nginx should be compiled from source, because: It may not be available in the enabled repositories of your Linux distro; even if it is, it’s often an outdated version; and certain options and flags can only be configured at compile time. 1. Install dependencies Nginx dependencies vary according to the modules you require. Core dependencies include: GNU Compiler Collection (GCC) Nginx is written in C, so we need a C compiler. It’s quite likely that you already have GCC installed on your system. Test if is installed: Success: Failure: To install it: Perl Compatible Regular Expressions (PCRE) The Rewrite and HTTP Core modules need PCRE for parsing their regular expressions. We need to install both the library and its source: and : To install it: zlib The library contains compression algorithms and is required in Nginx for gzip compression. Like PCRE, we need both the library and its source. To install it: OpenSSL Nginx relies on OpenSSL to serve secure pages. To install it: 2. Download Nginx has three main branches: stable, mainline and legacy. It’s generally fine to use the mainline branch. Download and extract Nginx into a directory of your choice: 3. Configure The configuration process consists of appending switches to the command. Some of these options affect the project binaries and cannot be changed post-compilation. contains configuration error logs. Below is a brief overview of the available configuration switches. Path Options Default: Specify an absolute path to install Nginx to. If a relative path is specified, it will be taken as relative to .
Paths can also be specified for the Nginx binary (), the main configuration file (), the pid file (), the lock file (), fallback log files (, ), as well as paths where Nginx should look for its dependencies. Since the default path does not include a version number, upgrading Nginx will overwrite the existing directory. It is recommended to override to include a version number, like this: , then create a symbolic link pointing to the latest versioned Nginx folder. Make sure the path specified by exists and is read/writable by the user running the configuration…
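A sketch of the versioned-prefix approach described above (the version number and flag selection are illustrative, not prescriptive):

```
./configure --prefix=/usr/local/nginx-1.8.0 \
            --conf-path=/etc/nginx/nginx.conf \
            --pid-path=/var/run/nginx.pid \
            --with-http_ssl_module
make
sudo make install
# Symlink so scripts can refer to a stable path across upgrades
sudo ln -s /usr/local/nginx-1.8.0 /usr/local/nginx
```

Upgrading then means configuring with a new versioned `--prefix` and repointing the symlink, leaving the old install intact as a rollback.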

Nginx Configuration Syntax


The third post in the Nginx series. Nginx configuration consists essentially of key-value pairs called directives, which can be organised and grouped in blocks. Directives A directive is a key-value(s) pair that looks something like this: Directives may accept more than one value, like the directive. Directives may also have type restrictions on their values - the directive only accepts a single integer as a value. Directives are terminated with semicolons. Each module may introduce its own directives and blocks (discussed below) that can be set. Importing Nginx configuration files can be imported with the directive: The effect of this import is that the contents of the file are inserted at the exact location of the directive. directives are processed recursively. The directive supports filename globbing: where may match any number (>0) of consecutive characters. This will import all of the files in the folder. If a file specified by an directive cannot be found, Nginx’s configuration check will fail, unless the directive path includes a wildcard: Blocks Modules may introduce blocks, which are logical structures that group module-specific directives together. Many directives can only be used within their associated blocks. The root of the main configuration file is also known as the main block. Blocks may be nested. Below we define a block as introduced by the Nginx HTTP module. The block accepts multiple blocks defining Nginx virtual hosts, and each block itself accepts multiple blocks which contain directives specific to certain locations of the website: In nested blocks, directives are inherited by child blocks and can be overridden. In this example, logging will be disabled just for the path of . Units One may use units: or : Kilobytes or : Megabytes : Milliseconds : Seconds : Minutes : Hours : Days : Weeks : Months (30 days) : Years (365 days) Variables Variables in Nginx start with $. Some modules introduce variables that can be used when setting directives.
Some directives do not support variables: will actually log to a file called . String values Strings may be written without quotes unless they include blank spaces, semicolons or curly braces; otherwise, those characters need to be escaped with backslashes, or the whole string enclosed in single/double quotes. Variables in quoted strings are expanded normally unless the $ is escaped. Comments Comments are Ruby-like, with lines prepended by a #:
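Pulling the pieces of this post together, a hypothetical fragment illustrating directives, nested blocks, a globbed import, units, and a comment (the hostname and paths are made up):

```nginx
# comments are prefixed with a hash, Ruby-style
worker_processes 2;                      # directive with a single integer value

http {
    include /etc/nginx/sites/*.conf;     # globbed import (hypothetical path)

    server {
        server_name example.com;         # hypothetical virtual host
        access_log logs/access.log;
        client_max_body_size 10m;        # size unit: megabytes
        send_timeout 30s;                # time unit: seconds

        location /admin/ {
            access_log off;              # overrides the inherited setting
        }
    }
}
```

Note how `access_log` is inherited from the `server` block and overridden only inside the `location` block.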

Core Nginx Configuration and HTTP Load Testing with Autobench


The Core module controls essential Nginx features, some of which will have a direct impact on performance, such as the number of worker processes. It also includes some directives that are useful for debugging. Official Nginx docs on the Core module Below, I cover the directives that I think have the greatest impact on performance or are critical to change from default values for security purposes. Core directives Syntax: Accepted values for : , , , , , and Default: This directive can be placed in , , , and blocks to indicate specific rules for logging. Nginx file importing. See Nginx Configuration Syntax. Syntax: Default: Defined at compile time Defines the path of the pid file. Syntax: Default: off Defines whether worker processes will accept all new connections (on), or one new connection at a time (off). Syntax: Default: Nginx will automatically choose the fastest one Specifies the connection processing method to use. The available methods are , , , , , , and . For Linux systems, seems to yield the best performance. Here is an interesting post comparing and . Syntax: Defines the user that will be used to start the worker processes. It’s dangerous to set the user and group of worker processes to . Instead, create a new user specifically for Nginx worker processes ( is canonical). Syntax: Default: 1024 This sets the number of connections that can be received by each worker process. If you have 4 worker processes that can accept 1024 connections each, your system can accept a total of 4096 simultaneous connections. Related to below. Syntax: Allows you to assign worker processes to CPU cores. For example, if you’re running 3 worker processes on a dual-core CPU (which you shouldn’t, see below), you can configure the directive to assign 2 worker processes to the first CPU core and 1 to the second CPU core: There are 3 blocks for 3 worker processes, and each block has 2 digits for 2 CPU cores. Syntax: Default: 0 Adjusts the priority level of worker processes. 
Decrease this number if your system is running other processes simultaneously and you want to micromanage their priority levels. Syntax: Default: 1 This number should match the number of physical CPU cores on your system. Syntax: Default: None, system determined sets the limit on the number of file descriptors that Nginx can open. You can see the OS limit by using the command. Check out this excellent post for more on . An excerpt: When any program opens a file, the operating…
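A hypothetical fragment combining the core directives discussed in this post (the values are illustrative for a dual-core machine, not recommendations):

```nginx
user  nginx nginx;            # never run workers as root
worker_processes     2;       # match the number of physical cores
worker_cpu_affinity  01 10;   # pin one worker to each core
worker_priority      0;
worker_rlimit_nofile 10240;   # raise the file descriptor ceiling

events {
    worker_connections 1024;  # 2 workers x 1024 = 2048 simultaneous connections
    use epoll;                # connection processing method (Linux)
    multi_accept off;         # accept one new connection at a time
}
```

Note that `worker_connections`, `use`, and `multi_accept` live inside the `events` block, while the `worker_*` directives sit in the main block.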

Nginx HTTP Module and Configuration


Nginx’s HTTP module comes with its own set of directives and structure blocks. Structure Blocks The block acts as the overarching block for all directives and sub-blocks. A block defines a website, as identified by a hostname (or many), or in Nginx parlance, a virtual host. blocks cannot be placed outside of a block. A block allows you to define settings for specific locations in a website. blocks can be placed in a block or nested within another block. A location block is specified with a pattern that will be matched against the requested URI. This pattern can be quite complex, involving location modifiers. modifier This modifier means the URI must match the pattern exactly. Only simple strings can be specified (no regular expressions). No modifier This modifier means the URI must begin with the pattern. Only simple strings can be specified (no regular expressions). modifier This modifier means the URI must match the pattern, and is case sensitive. Regular expressions are allowed. modifier This modifier means the URI must match the pattern, and is case insensitive. Regular expressions are allowed. modifier This modifier behaves the same as no modifier, except that Nginx will stop searching for other patterns if it has matched a location block with this modifier (see below for search order and priority). modifier This modifier is used to define a named location block, which can only be accessed internally by or (see below). As an example, given a pattern of or : will match all modifiers will match modifier, as well as and no modifier if your OS is case-insensitive will match all modifiers (query strings are ignored) will match only no modifier, due to the trailing slash will match only no modifier Search order and priority Different modifiers have different priorities when Nginx is searching for a location block that matches a particular URI. 
In order from most important to least important: Matching on the modifier Matching exactly on no modifier Matching on the modifier Matching on the and modifiers Matching on no modifier Module Directives Socket and host configuration Context: Sets the address and port for IP, or path for a UNIX-domain socket. If the address is left out, then a wildcard (*) is assumed. If the port is left out, port 80 is assumed (unless the master process is run without root privileges, in which case port 8000 will be assumed). Also accepts and parameters for SSL and SPDY (SPDY practically requires SSL…
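The modifiers can be summarised in a sketch like this (the patterns are illustrative; bodies are left empty):

```nginx
location = /exact          { }   # "=": exact match, simple strings only
location /docs/            { }   # no modifier: URI must begin with the pattern
location ^~ /static/       { }   # "^~": like no modifier, but stops the search
location ~ \.php$          { }   # "~": case-sensitive regular expression
location ~* \.(png|jpe?g)$ { }   # "~*": case-insensitive regular expression
location @fallback         { }   # "@": named location, internal use only
```

The order blocks appear in the file does not decide which one wins; the modifier priority described above does.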

Notes on "Rebuilding a Web Server"


Some notes I took while watching Rebuilding a Web Server, a brief walkthrough by Marc-André Cournoyer on writing a simple Rack-compliant web server. The code for the class is here. Concurrency The entire stack looks like this: There’s also a scheduler running alongside, handling concurrent connections. Such a scheduler can be implemented in different ways: threads, pre-forked processes, or an event loop. Threads A naive implementation would look like this, spawning a new thread for each incoming socket connection: Web servers like Puma use threads. Thread spawning is quite expensive, so web servers that use threads for concurrency will usually spawn a number of threads (a thread pool) on bootup and reuse them. Pre-forked Processes Preforking is a popular concurrency model used by servers such as Unicorn and Nginx. creates a copy of the current process, and this child process is attached to its parent process. The two of them share the same socket. Worker processes are forked beforehand, and all of them share the same listening socket. Whichever process is free will be scheduled by the OS scheduler to handle the next incoming connection on the socket. Presumably, leveraging the OS scheduler is really efficient. Event Loop We can simulate an event loop in Ruby using a gem called . is a feature-packed gem, and comes with helper methods that handle accepting, reading and writing to and from socket connections for us. readpartial is an instance method of Ruby’s IO class which allows us to read data off a socket as soon as data is available. The APIDock entry on readpartial elaborates further: readpartial is designed for streams such as , , , etc. It blocks only when no data is immediately available. This means that it blocks only when all of the following conditions hold: the byte buffer in the IO object is empty. the content of the stream is empty. the stream has not reached EOF. Using the method, we can read off a socket like this: is a method with similar functionality.
is a gem that wraps Node’s HTTP parser. Rack Rack is a set of specifications that web servers, middleware applications, and application frameworks must adhere to. Rack apps must have a single point of entry named call, which must return an array containing the status code, the headers, and the body of the response. Things which behave exactly like Rack tells them to (e.g. Unicorn, Rails) are Rack-compliant, and the benefit of this is that Rack-compliant things can be used in conjunction, layered on top of…
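A minimal thread-per-connection sketch in Ruby, in the spirit of the naive implementation mentioned above (this is not the course's code; the port, payload, and single-connection shape are mine, chosen so the sketch terminates):

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)   # port 0 = pick any free port
port = server.addr[1]

# Naive model: spawn a new thread for each accepted connection.
acceptor = Thread.new do
  conn = server.accept
  Thread.new(conn) do |s|
    s.write('hello')   # a real server would write an HTTP response here
    s.close
  end.join
end

client = TCPSocket.new('127.0.0.1', port)
reply = client.read     # reads until the server closes the socket
client.close
acceptor.join

puts reply   # => hello
```

A production threaded server would wrap the `accept` in a loop and draw worker threads from a pool instead of spawning one per connection.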

Rails Boot Sequence (Part 1)


Today, we investigate Rails’ boot sequence by observing what happens when we run . Part 2 will look at . GitHub links to relevant files are provided as necessary. Our journey begins inside the binary, which is executed by : It calls , which corresponds to : In : is in charge of executing the inside your Rails application. It will look for it recursively, meaning that you can call anywhere in your application directory. In fact, or is equivalent to calling or See the abridged contents of below: Next, we turn our focus temporarily to your Rails application. In , two files are required: (in your app directory) determines the location of the Gemfile and allows Bundler to configure the load path for your Gemfile’s dependencies. parses options passed in as command-line arguments, including alias mapping ( for , for , etc.) is in charge of throwing errors in the case of invalid commands, or delegating valid commands to the respective methods, themselves split into files in the directory: For example, if is run, the method in requires and runs the class method from the class, passing it your application as the first argument ( is made aware of your application by requiring , which you’ve kindly provided previously in ): In , you can see the class method instantiating itself and calling the new instance’s instance method: As it is instantiated, is set as your Rails application, and is set to if present, or defaults to : Let’s see if the above code actually works by setting your application config to use as the console instead: Great success! Now let’s look at the actual instance method, whose code is relatively self-explanatory: Finally, boots the console. Next, we’ll look at the code path taken by . As indicated in the comments, this file is auto-generated by RubyGems. How does it know to load Rails, as in the last line ()? Taking a look in gives us the answer: What does the above mean?
RubyGem’s documentation: Take a look inside the directory - its contents will be very familiar soon :) The binary is defined by the sha-bang to be executed by , which is a thin wrapper that allows RubyGems to run initialization hooks () before Ruby runs the actual code (). This is what the actual binary looks like: You can test this out for yourself with just 3 lines of code. Create a file with the following: Run it and see what happens:

Apply, Apply


While going through Jasmine’s source code, I came across a weird idiom: What in the world is ? I decided to find out for myself. As a refresher, let’s write a function that simply prints out its and . When invoked as a method on a function, executes its caller with its first parameter as the caller’s , and its second parameter (an array) as the caller’s argument list. The parameters to can be anything, even primitives, as demonstrated below: prints: To preserve our sanity later on, let’s see in more detail by defining a wrapper around : Running with this modified version of prints: So what does it mean when you call again on itself, as in ? Calling on means that will execute , and since needs to be executed in the context of a function, the first parameter must be a function. (I only know this from working backwards from the interpreter’s errors, go figure.) In this case, since in is the primitive , it is obviously not a function since it doesn’t inherit from . Now that we know that expects a function as its first parameter, let’s try putting there: Okay, so at least it’s a different error now. Why is complaining that its argument list (the second parameter) has the wrong type? That’s because it’s trying to call ! Don’t forget that expects an array as its second parameter. So I guess we can wrap the second in an array: So it finally works! Wait, doesn’t this look… familiar? It should, because what it’s doing is exactly what is doing (see above) - executing with as and its arguments as . And isn’t it stupid that we’re mentioning twice? What exactly does the first even do? Let’s get rid of it: It turns out that it doesn’t matter which function calls , because the only function that is going to be executed is ’s. ’s original context is irrelevant (imagine it as behaving like a static method). It’s just there to facilitate this process by helping “promote” the first parameter (the function) into the executing function itself.
Maybe it’s a little clearer if you remove from the equation: So there you have it. And after all this trouble, I still don’t actually know why Jasmine uses this idiom. Instead of It seems the following is equivalent: Bonus What happens if we chain more than 2 s together? Looks like doesn’t even get run at all. This is puzzling at first, but makes sense when you think about it - ’s original , , has been “diverted” to (remember when I said above that ’s original context is irrelevant?) Thus, when…
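To make the walkthrough above concrete, here is a minimal sketch of the idiom (the function and variable names are mine, not Jasmine’s):

```javascript
function greet(greeting, name) {
  return greeting + ", " + name + "!";
}

// The idiom: calling apply on apply itself. The first argument becomes
// the function that actually runs; the array supplies that function's
// `this` value and its argument list.
var result = Function.prototype.apply.apply(greet, [null, ["Hello", "world"]]);
console.log(result); // "Hello, world!"

// ...which is equivalent to the plain form:
var same = greet.apply(null, ["Hello", "world"]);
console.log(same); // "Hello, world!"
```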

Setting Up A Test-Driven React Project From Scratch - Part 1: webpack


You will learn how to scaffold a React project from scratch with Karma and webpack. I’ve tried to make it as unopinionated as possible. Hopefully, once you understand the rationale behind each of the moving parts, you can roll your own or use one of the many templates available elsewhere. At the end of this part, you will have set up webpack with: Source maps with the flag Let’s create a simple project that allows users to create reviews of beer and populate a list. We’ll call it Beerist. Let’s create a project directory: Initialize a by filling in the fields as appropriate: Create a directory that will hold all our source files. Within , create a directory and a that will act as an entry point for the app. Our directory now looks like this: In , let’s just write a simple one-liner for the purposes of testing: webpack webpack was born as a agnostic module loader. Its Motivation page introduces the project quite nicely. Since its inception, webpack has grown to encompass some elements of build tooling as well, which greatly reduces the need for Gulp or Grunt. We’ll install webpack first so that we can write our React files in ES6 (or ES2015, or ES6+, or ES7, or whatever they call it nowadays) from the get-go and have webpack run the files through Babel to transpile our ES6 down to ES5. We’ll also install webpack’s dev server so that we can serve the files locally using a built-in Express server and sockets (for hotloading): This creates a directory if it hasn’t previously been created, and downloads the latest version of all the modules specified from . saves it as a development dependency entry within your : Create a file called , which tells webpack what to do: The config starts off by requiring and , which is a built-in Node library that provides utility methods for manipulating file path strings. In particular, we use together with the “global” variable to resolve relative file paths to absolute ones. Now how do we run webpack and/or its dev server? 
If you’ve installed webpack and/or webpack-dev-server globally, you can simply run or on the command line to fire it up. Else, you can call it from an npm script inside : Now, we can run or (check out the difference between the two for yourself). Our directory should look like this: If you open , you’ll see some crazy stuff that webpack is doing to tie all your dependencies together. Of course, you’ll see your source code - all the way at the bottom: To see it in action in the browser though…

Setting Up A Test-Driven React Project From Scratch - Part 2: webpack niceties


This post will explore some more niceties that using webpack provides. ESLint ESLint functionality is provided by way of : To ensure that linting only runs on your un-transpiled ES6 source files, specify the loader as a : ESLint may appear to work right out of the box, but in order for it to function effectively, you need to configure ESLint to suit your coding style/needs. takes in a config file whose path is specified in : In your (the above setting specifies it in your project root), you may wish to include at least the following sane barebones config: This barely scratches the surface of how ESLint can be configured. The ESLint docs are excellent. ESLint React is an ESLint plugin that provides React-specific linting rules. To install: Then, modify your file to add the following: Hot Reloading TODO

Topick - JavaScript NLP library to extract keywords from HTML documents


I recently wrote Topick, a library for extracting keywords from HTML documents. Check it out here! The initial use case for it was to be used as part of a Telegram bot which would archive shared links by allowing the user to tag the link with keywords and phrases: This blog post details how it works. HTML parsing Topick uses for HTML parsing. By default, Topick will pick out content from , , , and tags, and concatenate them into a single document. Cleaning That document is then sent for cleaning, using a few utility functions from the library to: Expand contractions (e.g. from I’ll to I will) Remove interpunctuation (e.g. ? and !) Remove excess whitespace between words Remove stop words using the default stop word dictionary Remove stop words specified by the user Stop words are common words that are unlikely to be classified as keywords. The stop word dictionary used by Topick is a set union of all six English collections found here. Generating keywords Finally, the cleaned document can be used as input for generating keywords. Topick includes three methods of doing so, which all rely on different combinations of library functions to generate the final output: The method relies solely on the method to generate keywords/phrases based on frequency. The generated words or phrases are then sorted by frequency and filtered (those with frequency 1 are discarded). The method relies on the method to guess keywords or phrases that are capitalized/don’t belong in the English language/are unique phrases. There’s also a frequency-based criterion here. The method combines both by running both and and merging their output together before sorting and filtering them. This method is the slowest but generally produces the best and most consistent output. Custom options Topick includes a few options for the user to customize. ngram The method defines options for n-gram generation. 
min_count is the minimum number of times a particular n-gram should appear in the document before being considered. There should be no need to change this number. max_size is the maximum size of n-grams that should be generated (defaults to generating unigrams). progressiveGeneration This option defaults to true. If set to true, progressiveGeneration will progressively generate n-grams with weaker settings until the specified number of keywords set in maxNumberOfKeywords is hit. For example: if, for a min_count of 3 and maxNumberOfKeywords of 10, Topick only generates…
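The frequency-based generation and filtering described above can be sketched roughly like this (my own illustration of the idea, not Topick’s actual implementation):

```javascript
// Count unigram frequencies in a cleaned document, then keep only terms
// that appear more than once, sorted by descending frequency.
function frequentTerms(cleanedDoc, maxNumberOfKeywords) {
  var counts = {};
  cleanedDoc.split(/\s+/).forEach(function (word) {
    if (!word) return;
    counts[word] = (counts[word] || 0) + 1;
  });
  return Object.keys(counts)
    .filter(function (w) { return counts[w] > 1; }) // discard frequency-1 terms
    .sort(function (a, b) { return counts[b] - counts[a]; })
    .slice(0, maxNumberOfKeywords);
}

console.log(frequentTerms('node node node express server server', 2));
// ['node', 'server']
```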

Load Balancing Graylog2 with HAProxy


This post covers quick and dirty TCP load balancing with HAProxy, and some specific instructions for Graylog2. (As an aside, if you’re looking for a gem that can log Rails applications to Graylog2, the current official gem only supports UDP. I’ve forked the repo and merged @zsprackett’s pull request in, which adds TCP support by adding as an option. I’ll remove this message when the official maintainer for merges @zsprackett’s pull request in.) Technical context: Ubuntu 14.04, CentOS 7 1. Install HAProxy On Ubuntu 14.04: On CentOS 7: 2. Configure HAProxy You’ll probably need root privileges to configure HAProxy: There will be a whole bunch of default configuration settings. You can delete those that are not relevant to you, but there’s no need to at this moment if you just need to get started. Simply append to the file the settings that we need: This directive block named graylog tells HAProxy to: Listen on port 12203 - you can change this if you want Operate in TCP (layer 4) mode Enable TCP logging (more info here) Use round robin load balancing, in which connections are distributed to the servers in turn. You can even specify weights for different servers with different hardware configurations. More on the different load balancing algorithms that HAProxy supports here Proxy requests to these three backend Graylog2 servers through port 12202, and check their health periodically 3. Create a TCP input on Graylog2 Creating a TCP input on Graylog2 through the web interface is trivial. We’ll use port 12202 here as an example: 4. Start HAProxy You can test if HAProxy is proxying the requests successfully by sending TCP packets through to HAProxy and checking the number of active connections on Graylog2’s input page. You should see something like: 5. Change HAProxy’s health check to Graylog2’s REST API The last thing to do, and really, the only part of HAProxy that’s specific to Graylog2, is to change the way HAProxy checks the health of its backend Graylog2 servers. 
Normally, HAProxy defaults to simply establishing a TCP connection. However, HAProxy accepts a directive called , in which HAProxy will send an HTTP request to some specified URL and check for the status of the response. 2xx and 3xx responses are good, anything else is bad. For Graylog2, they’ve exposed a REST API for the express purpose of allowing load balancers like HAProxy to check its health: The status knows two different states, and , which is also the response of the resource…
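Putting the whole section together, such a directive block might look something like the sketch below (the IPs, ports, and health-check path are illustrative; check the Graylog2 docs for the exact load-balancer status endpoint):

```
listen graylog
    bind *:12203
    mode tcp
    option tcplog
    balance roundrobin
    # Health-check each backend over HTTP instead of a bare TCP connect
    option httpchk GET /system/lbstatus
    server graylog1 10.0.0.1:12202 check
    server graylog2 10.0.0.2:12202 check
    server graylog3 10.0.0.3:12202 check
```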

Setting Up Advanced Policy Firewall (APF) and Brute Force Detection (BFD)


This post is a fairly comprehensive reference to Advanced Policy Firewall (), a user-friendly interface of . We will also cover BFD (), a script that automates IP blocking using APF. Technical Context: Ubuntu 14.04, APF v9.7, BFD v1.5-2 Installation Basic Usage What actually does is add the IP entry to the file. does the same thing for . removes the IP entry from either or , if it exists. All three commands will call as well. APF supports CIDR notation for specifying rules for IP blocks, as well as fully qualified domain names (FQDN). There are basically three ways to use APF: Restrict on a per-IP basis Restrict on a per-port basis Restrict on an IP-port combination basis Restrict on a per-IP basis The most straightforward way to do this is, as mentioned earlier, by using , and . Of course, you can edit or directly as well (specify each IP address on a new line). Restrict on a per-port basis By default, APF blocks a number of known malicious ports (see the main config file for an exhaustive list). To allow all incoming or outgoing connections on a per-port basis, we can edit the or setting respectively in APF’s main config file : Notably, these settings are overridden by rules in and . Restrict on an IP-port combination basis The and are very well commented regarding the syntax for specifying granular restrictions, so I’ll cover them only briefly here: For example: in will allow incoming connections from to port 22. Multiple IPs to the same port need to be specified on separate lines: APF Configuration Some other noteworthy APF configuration settings in that you should change: Development Mode When set to , APF will deactivate itself after every 5 minutes. This prevents you from setting stupid rules and locking yourself out of a remote machine. Remember to set this to once APF is determined to be functioning as desired. Monokernel It might be an issue in situations where is installed into the kernel rather than as a package. 
In those cases, you’ll see something like: or Setting it to will fix the problem. Ban Duration I recommend setting this a lot higher than the default of 300 seconds. 21600 (6 hours), maybe? Reactive Address Blocking Set this to “1” to activate APF’s reactive address blocking. Subscriptions APF can subscribe to known lists of bad IP addresses. The below is an abridged portion of the config file that deals with this: BFD Configuration BFD barely has any configuration (which is A Good Thing™). The below is pretty much it…
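For reference, a granular rule along the lines described above might look like this sketch (the IP is illustrative; double-check the syntax against the comments in your rules files):

```
# proto:flow:d=<dest port>:s=<source IP>
tcp:in:d=22:s=192.168.1.100
```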

Setting Up a Second Graylog2 Server Node


Technical Context: Ubuntu 14.04, first Graylog2 IP: , second Graylog2 IP: 1. Install Graylog2 Instructions here. (Note that installing the Graylog2 web interface, , is optional). 2. MongoDB If your MongoDB instance already runs on a separate machine from any of your Graylog2 nodes, all you have to do is adjust your firewall rules for that machine (if any exist) to allow the IP address of the new Graylog2 server node to connect to port 27017 (or whatever custom port you’ve defined for your MongoDB instance). Otherwise If your MongoDB instance lives on the same machine as an existing Graylog2 node, that means your current configuration () will look something like this (it should, or you’re in big trouble): This means that your MongoDB instance is only accessible to other processes running on the same machine. If so, you may or may not have authentication set up on your MongoDB instance - it doesn’t really matter. You will need to change your MongoDB configuration to listen on a publicly accessible interface. Change by either commenting it out, or changing it to . Now that your MongoDB instance is publicly accessible, we’re going to have to take necessary security measures. MongoDB authentication Here, I’ll cover authentication in MongoDB very quickly. Open a MongoDB shell, make sure that you’re using the correct database, then create a new user with read and write privileges: Once that’s done, we can tell Graylog2 to use these credentials when connecting to MongoDB. In recent versions of Graylog2, the MongoDB connection is recommended to be specified using the MongoDB connection string URI format, which may look something like this: Firewall After setting up authentication, you’d also want to set up appropriate firewall policies. Specifically, you should allow only the second Graylog2 server node to connect to MongoDB. I wrote a comprehensive guide to using APF and BFD here, which you should read. The APF rule for allowing to connect to port 27017 looks like this: 3. 
Graylog2 Most of these instructions come straight from the official docs: Change to false: Copy the from the existing Graylog2 server node: Change the MongoDB connection credentials (see above). Change the Elasticsearch settings to match your first Graylog2 server node’s (most importantly, the setting, which tells Graylog2 which Elasticsearch nodes to connect to) 4. Graylog2 Web Interface The web interface runs independently of any Graylog2 server nodes, so all we have to do now is inform it…
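As an illustration of the connection string URI format mentioned above, the setting in the Graylog2 server config might look something like this (user, password, host, and database name are all placeholders):

```
mongodb_uri = mongodb://graylogUser:secretPassword@10.10.10.10:27017/graylog2
```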

JavaScript Object Creation and Prototype Chains


There are 4 ways to create new objects in JavaScript: Object initializers, also known as literal notation Constructors ES6 classes Depending on which method you choose, the newly created object will have a different prototype chain. 1. Object initializers Objects created in this manner will have as their top-level prototype: Arrays and functions also have their own literal notation: In these cases, ’s and ’s prototype chains will be and respectively. 2. takes in an arbitrary object (or ) as a first argument, which will be the prototype of the new object. Thus, ’s prototype chain is: is actually quite special because any arbitrary object can be specified as the prototype, so we can do otherwise nonsensical things such as: In this case, ’s prototype chain will be: 3. Constructors When a function is invoked with the keyword, as in , it behaves as a constructor function, which means the following things will happen: A new, empty object is created, whose prototype is Thing.prototype (the prototype object of the function object) The body of the function is executed, with its set to the new empty object The result of the expression is the function’s return value if it returns an object; otherwise, the new object is returned To highlight the fact that the property object is distinct from the object to which it belongs, notice the following: If we think of as simply an object, this shouldn’t come as a surprise. In fact, if we were to do something like this: Thus, ’s prototype chain looks like: and not 4. ES6 Classes Prototype chains in ES6 classes behave almost exactly like constructors (that is because classes are syntactic sugar around constructors): Thus, ’s prototype chain is: And of course, as mentioned earlier, classes really are just syntactic sugar for constructors: See footnote for a little more detail on how subclassing with actually works and how it affects the prototype chain between the subclass and the superclass. 
I’ve used here for better readability, but you can also substitute it for its inverse, , like When I mean “a”, I actually mean any arbitrary function. Of course, functions meant to be used as useful constructors should look a certain way. And another optional object as a second argument that specifies property descriptors. Part of Babel’s transpiled output for includes a function, the full body of which is below: explicitly creates the subclass’s prototype object using , specifying the super class’s prototype as…
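The chains described in the post can be verified directly with Object.getPrototypeOf; here is a small sketch covering all four creation methods (the names are mine):

```javascript
// 1. Object literal: the prototype is Object.prototype.
var literal = {};
console.log(Object.getPrototypeOf(literal) === Object.prototype); // true

// 2. Object.create: the first argument becomes the prototype.
var proto = { greet: function () { return 'hi'; } };
var created = Object.create(proto);
console.log(Object.getPrototypeOf(created) === proto); // true
console.log(created.greet()); // 'hi', found via the prototype chain

// 3. Constructor: the new object's prototype is Thing.prototype,
// which is itself a plain object whose prototype is Object.prototype.
function Thing() {}
var thing = new Thing();
console.log(Object.getPrototypeOf(thing) === Thing.prototype); // true
console.log(Object.getPrototypeOf(Thing.prototype) === Object.prototype); // true

// 4. ES6 class: same shape as the constructor case.
class Widget {}
var widget = new Widget();
console.log(Object.getPrototypeOf(widget) === Widget.prototype); // true
```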

The Browser DOM


October 10, 2019: I haven’t gotten around to porting tables over, yet, so some content may look broken. The DOM, as implemented in browsers, is a model to represent an HTML or XML document as a tree of nodes, and presents an API to access and manipulate these nodes. There are twelve kinds of nodes, but two of them are used the most - element nodes and text nodes. HTML tags are represented as element nodes, while the content inside these tags is represented as text nodes. In a typical browser environment, every node is represented as an object (with properties). The object contains properties to access these DOM node objects. The root node can be accessed with . Another example is the body, which can be accessed with . In the DOM API, elements which are not found, or referenced before they are rendered, are :

DOM Traversal

There are several ways to move around the DOM tree.

| Direction | Method |
|-|-|
| Up | |
| Down | , which includes all node types, and , which includes only element nodes |
| Left/Right | , |

DOM Search

Aside from traversing, there are also methods to access nodes in the DOM tree directly. They are called on the object, as in :

| Method | Description |
|-|-|
| | Find a node by its |
| | Find all nodes with the given |
| | Find all nodes with the matching HTML attribute |
| | Find all nodes with the given . Understands multiple classes |
| , | Find the first node/all nodes that matches the given CSS3 query, respectively |

and offer more powerful queries, but are less performant than the other methods. All these methods that return multiple nodes in a , except , are live, meaning that they are automatically updated when the document is changed. XPath Another way to search through the DOM is using XPath. DOM Node Attributes and Properties Since DOM nodes are represented as objects, they have properties, just like JavaScript objects. These properties include implementations of the various standard DOM API interfaces, which are common across all nodes. 
Some properties are read-only, while others are modifiable. Some important ones are listed below:

| Property | Description | Read-only |
|-|-|-|
| | ID of node type | Yes |
| , | Name of element nodes | Yes |
| | Contents of element nodes | No |
| | Contents for other types of nodes except element nodes | No |

Additionally, each type of HTML DOM node has its own set of standard properties. MDN maintains a comprehensive reference of all these DOM object types, for example, the DOM object. HTML…

JavaScript Timers


Timers in JavaScript are used to schedule functions to be called at a later time. It does so by setting up a timer which, when it’s done counting down, places its associated function into the environment’s event loop. As such, guarantees that the function won’t be fired before its specified time, but it cannot guarantee that it will be fired at exactly that time either - if the event queue has other functions waiting to be executed, then it will be delayed. These timer functions are methods on the object and can be invoked as is: Functions scheduled with can be cancelled with : and have cousins which execute their functions at regular intervals, called and . Debouncing Timers in JavaScript can be used for debouncing to prevent event handlers from being invoked too frequently. There are two effects that can be achieved here: To invoke the event handler after a pause in event firing To invoke the event handler at slower periodic intervals An example of the first effect: We clear the timeout to reset the timer. An example of the second effect: We check if is true first before creating a scheduled function. When the scheduled function fires, it resets to false.
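A minimal sketch of the first effect described above (invoke the handler only after a pause in event firing); the function and variable names are mine:

```javascript
// Returns a wrapped handler that only fires after `delay` ms have passed
// without another call. Each new call resets the countdown.
function debounce(fn, delay) {
  var timer = null;
  return function () {
    var args = arguments;
    var self = this;
    clearTimeout(timer);               // reset the timer
    timer = setTimeout(function () {
      fn.apply(self, args);            // fire once the events go quiet
    }, delay);
  };
}

// Usage: rapid successive calls collapse into a single invocation.
var hits = 0;
var onScroll = debounce(function () { hits++; }, 50);
onScroll(); onScroll(); onScroll();
setTimeout(function () {
  console.log(hits); // 1
}, 150);
```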

new-Agnostic JavaScript Constructors


Constructors in JavaScript have gotten a bad rep because there’s nothing stopping the unwitting programmer from doing something like this: I wrote comprehensively about object creation here, which may be useful to read as a prerequisite to this post. This tip from Item 33 of Effective JavaScript by David Herman makes your constructor behave as intended, regardless of whether the keyword is present. The first method: An alternative method to save that extra recursive function call:
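The two methods described above can be sketched like this (the constructor names are illustrative, not from the book):

```javascript
// Method 1: re-invoke with `new` if the caller forgot it.
function User(name) {
  if (!(this instanceof User)) {
    return new User(name); // the extra recursive call when `new` is omitted
  }
  this.name = name;
}

// Method 2: avoid the recursion with Object.create.
function Account(owner) {
  var self = this instanceof Account
    ? this
    : Object.create(Account.prototype);
  self.owner = owner;
  return self; // returning an object overrides `new`'s default result
}

console.log(new User('ada').name);     // 'ada'
console.log(User('ada').name);         // 'ada'
console.log(new Account('bob').owner); // 'bob'
console.log(Account('bob').owner);     // 'bob'
```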

Installing Composer Globally


Technical Context: Composer 1.0-dev, PHP 5.5.27, OS X Yosemite Install Composer: This will create a PHP archive as well as a directory. To keep things cleaner, I move inside : Then, simply alias to by adding some variation of the following to your shell config (e.g. or ):

Liquid Tag Parameters


This is a nifty one. I wanted to create manual categories for my coding posts, so I ended up doing something like this in my (a simplified excerpt): … which was really ripe for DRYing up. The problem was that the partial would need to accept a parameter of some sort (for the category name). The official Liquid documentation leaves much to be desired, but there’s a tiny part in the Templates section of the Jekyll documentation that saved the day: You can also pass parameters to an include. Omit the quotation marks to send a variable’s value. Liquid curly brackets should not be used here: These parameters are available via Liquid in the include: So I abstracted the above into a partial, naturally so: and the resulting : Neato!
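The pattern from the Jekyll docs looks roughly like this sketch (the include filename and parameter name are illustrative):

```liquid
{% comment %} In the page or layout: pass the category in {% endcomment %}
{% include category_posts.html category="code" %}

{% comment %} Inside _includes/category_posts.html, the parameter is available as: {% endcomment %}
<h2>{{ include.category }}</h2>
```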

Daily brew with launchd


It’s a good idea to run every day. Instead of , I use . has better integration with system behavior, such as running scripts on system wake. (Admittedly, the XML syntax is quite horrid, but we’ll live with it.) Create a file in : This will run , , , and at midnight every day. Load the file To load the file, there are two options available: is a wrapper around :
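A plist along these lines might look like the sketch below (the label and the brew path are illustrative; adjust for your install):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.brew-daily</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/sh</string>
    <string>-c</string>
    <string>/usr/local/bin/brew update &amp;&amp; /usr/local/bin/brew upgrade</string>
  </array>
  <!-- Fire at 00:00 every day -->
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>0</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>
```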

Viterbi's Algorithm for first-order and second-order hidden Markov models


I originally wrote this post to help me understand and internalize HMMs and the Viterbi algorithm, but school got too busy and I couldn’t finish the post. Still, there’s a fair amount of material here so I decided to publish it anyway. I hope it helps you at least a little bit, as it did for me. Hidden Markov models (HMMs) are representations of many kinds of real observations in real life. For example, when a baby is crying, you don’t actually know why she’s crying - it could be that she is hungry, or that she is tired, or any other multitude of reasons. As a parent who is also a machine learning practitioner, you decide to use data to predictably determine why, so that you can soothe her as effectively as possible. Firstly, you want to establish the parameters of the model. This means establishing 3 kinds of parameters: Set of (hidden) states For simplicity’s sake, we’re just going to assume that the baby can only be in states “normal”, “tired”, and “hungry”. These states are not directly observable by us, or put another way, they are hidden from us - thus the “hidden” in hidden Markov models. This is what we want to be able to predict. Set of observable outputs In this case, the set would consist of and . This output is an effect of the hidden state that the baby is in, and is said to be “emitted” from the current state. For each state, there is a chance of emitting a certain output. Transition probabilities The transition probabilities are a set of numbers that describe the probability of transiting from any state to another. In this case, this would be described by: So, given that the baby is currently crying, there’s a 0.8 probability that she will continue to cry, a 0.1 probability that she will stop, and a 0.1 probability that the observed sequence will end (this is when you pass the baby to your spouse and question, for the 124th time, why you became a parent). 
Emission probabilities The emission probabilities are a set of numbers that describe the probability of an observable output being “emitted” given a current state. So, given that a baby is currently not crying, there is a 0.6 probability that she is feeling okay, a 0.2 probability that she is tired, and a 0.2 probability that she is hungry. ( and represent the start and end of an observed sequence, so an example would look something like ) The question is: If I observed my baby to be , what’s the most likely sequence of states my baby was feeling throughout that period of observation? Could it be…
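For reference, this question is exactly what the Viterbi recurrence answers. In standard HMM notation (my choice of symbols, not the post’s): write $a_{u,v}$ for the transition probability from state $u$ to state $v$, $b_v(x_j)$ for the probability of state $v$ emitting observation $x_j$, and $\pi(j, v)$ for the probability of the most likely state sequence ending in state $v$ after the first $j$ observations. Then:

```latex
\pi(1, v) = a_{\mathrm{START},\, v}\; b_v(x_1)
\qquad
\pi(j, v) = \max_{u}\; \pi(j-1, u)\; a_{u,v}\; b_v(x_j)
```

The most likely final state for a sequence of length $n$ is $\arg\max_v \pi(n, v)\, a_{v,\mathrm{END}}$, and the full state sequence is recovered by backtracking through the argmaxes recorded at each step.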

Initial Form Values with redux-form


Caveat emptor: I’m currently using of . As with most things on the bleeding edge, this post will probably be irrelevant in a few months (or weeks). There are a few reasons one may need to have form values initialized upon render - edit forms, or default values. In vanilla HTML, one would simply do: It’s a little different in JSX. In JSX, Facebook recommends the following way. To set initial form values in that are actually initialized in the Redux state, one needs to pass in an object with a main key and field key-value pairs as props to the component: If all goes well, you’ll see the initial values being loaded into Redux state upon initialization, which can be submitted (to ) right away:

Route Organisation with Express 4+


Caveat Emptor: As far as I can tell, this routing functionality will continue to work in Express 5, which at this time of writing is still in early alpha stage. Read the Express 5 Upgrading Guide here. Also, in the Nested Routes section below, I mention passing the option the method - this was only introduced in Express 4.5.0. The class was introduced in Express 4 to allow developers to instantiate routers as standalone modules with their own middleware and routing systems. For me, the greatest benefit is that this affords more flexibility and makes it easier to manage different combinations of middleware of varying specificity. This simple example gives a bird’s eye view of the different components of a working router implementation: At this point, the active routes would be: How I Organise my Routes Now, with a big app, we can split its many routes into different files, and instantiate/mount them neatly. This is great if you’re intending to be RESTful, because the organisation will come quite intuitively. I’m assuming a directory structure like this, where apples, oranges and pears are our resources: In , all we have to do is bootstrap the routes: In , we require each route file iteratively. Take note that usual routing precedence rules apply - both the file order as well as the route order inside a file matters. We can DRY it up even more by getting the file path from the mount point (or the other way around): There are many different ways to specify middleware for one or many route files. Here are some ideas: We can run multiple separate loops: We can even use regex (regex for flex!): Within a route file like , we can specify a middleware specific to the routes inside the file: And last but not least, if you only want to apply middleware to a single endpoint, you can do that too: Well, I did mention flexibility, didn’t I? Nested Routes Since we’re on the topic of RESTful, how can we do nested routes? 
Let’s say pineapples is a resource nested within apples, so we’ll want to have routes that look like: and so on. You can add nested routers as middleware. The key is to add to the Router factory so that the nested router can access the params from its parent router, which in this case is . In , it’ll look like this: Of course, if your app is really complex, you can put nested route files inside folders: and change the code above to reflect this file structure.

MySQL Native Bindings in OS X El Capitan


Booting up a Django project, I ran into the following problem: Googling around for solutions was really confusing because there seemed to be a mix of old and recent problems and solutions, like this StackOverflow question which was asked 4 years ago regarding OS X 10.6, but with answers from this year addressing El Capitan (yes, I know, wtf). Another one exactly like it. I installed MySQL with , but wherever it is, you can find the file: It seemed the cleanest way was to use , so I tried it: But that didn’t work. After a quick google and wading through tons of non-starters, I finally found a fix: all you have to do (inside the or ) is set the environment variable like so: It’s not the most elegant solution, but it’s the simplest and least intrusive one.

OpenSSL woes with Ruby 2.2.3 and rvm


If anyone is getting any errors along the lines of the following: for any reason at all (/, , etc.) after upgrading to Ruby 2.2.3 with , reinstalling it from source fixes it: Another one of those supremely obscure bugs. I hope this saves a few hours/days of desperate googling/stackoverflowing.

Adding MiniMagick support to carrierwave-video-thumbnailer


TLDR: I forked and added MiniMagick post-processing support. Using good ol’ Carrierwave to upload video files to S3, I came across a need to generate video thumbnails. A quick Google revealed , which worked perfectly. However, I needed to rotate the thumbnails (90 degrees clockwise) before uploading them. I thought it would be trivial, but it turned out that Carrierwave’s DSL made using multiple processes in a version block intractable. I couldn’t find a reasonable way to proceed from here, and after spending 2-3 hours trying different permutations of the above, wrangling Carrierwave’s DSL and trying to get things to work, I finally dug into the gem to see if I could get MiniMagick to work inside of the call. To expose as much of MiniMagick’s API as possible, I opted for the user to pass in a Proc. To use it, simply pass in a Proc as : You can take a look at my commit here for more detail on how I implemented the change. For some reason, the thumbnails that were being generated from portrait iOS videos were rotated 90 degrees counter-clockwise. Some others have faced this issue as well.

Setting up Golang with Fish and Homebrew


Install Golang with Homebrew Install Mercurial (optional) I’m not sure if Mercurial is strictly required, but may be useful for fetching packages from certain Mercurial repositories: Set $GOPATH and add to path In the world of Golang, all your Go projects live in a single directory which is specified by the environment variable. This is quite different from development environments in other languages, but you should get used to it. If you’re not sure where to put your , I’d suggest putting it in the home directory: Create the directory: Then add these lines: to your (or create one in if it doesn’t exist). Check Boot up a new shell and type to check your environment variables. Your path should contain both and 👍
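The lines in question are along these lines (assuming $HOME/go as the GOPATH; fish syntax, not bash):

```
set -x GOPATH $HOME/go
set -x PATH $PATH $GOPATH/bin
```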

Cathode, or, My Favourite Music from 2015


As a school holiday project, I made a Hugo site to document all of my favourite music from 2015, and released the theme separately as a standalone called Cathode. Check the site out here, and the theme here. Note: I’m writing this post concurrently, so it’ll kinda be stream-of-consciousness. Objectives: Learn something new Document my favourite music I have about 4-5 days. Learn something new I looked through a list of static site generators over at and picked the highest-ranking one that was written in a language I wasn’t already familiar with. If I have to pick up a new language, this will be a great motivator. I picked Hugo. I also wanted to learn PostCSS, having heard many great things in the past. I’ve become very comfortable with SASS, Compass, and Susy. A little too comfortable. Document my favourite music I could have done it the traditional way: 1/2-column layout, top 20 albums, top to bottom, that’s it. Easy to do, a no-brainer with a blog-aware generator like Hugo. Also, at around this point I realize how ridiculously fast Hugo is. I want to replicate iTunes’s album view interface. I think it’s possible - might be quite challenging, but it’ll be fun. A well-defined end goal is also a good thing. Does it make sense for the web? Performance The site will be image-heavy, because album art. But it needs to be as fast as I can make it in the time I have. Image optimisation, yup, ImageOptim - lazy loading, yeah, I can do that I think, but I’ll leave the difficult parts to the last. jQuery jQuery is like fast food. It gets you your nutrition, and fast. It’s not good for you in the long run, but I only have 4 days; I need to pick my battles. Also, I don’t want to waste one whole day setting up Webpack or any dev workflow, so no ES6. I hate ES5 so this will force me to write as little JavaScript as possible. 
This is a good thing because performance and because I’m working on a Redux and Express project concurrently and I’m finding out that it’s possible to OD on JavaScript, even with ES6 and fancy experimental ES7 features. Templating with Go Hugo uses Go’s html/template library. Having used my fair share of templating languages - ERB, Handlebars, DTL (Django template language), Liquid, Slim, Jbuilder - Go templates use by far the weirdest syntax I’ve ever seen. This is an actual example from their docs: I’m sure part of it is because I don’t actually know much Go. I really like the template functions though, because they…
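To give a flavour of that syntax, here is a small illustrative program (my own example, not the one from the docs): `{{range .}}` iterates over the data, `{{.}}` is the current element, and `{{end}}` closes the block.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderAlbums renders a list of album names through a Go template.
func renderAlbums(albums []string) string {
	tmpl := template.Must(template.New("albums").Parse(
		"{{range .}}* {{.}}\n{{end}}"))
	var buf bytes.Buffer
	tmpl.Execute(&buf, albums)
	return buf.String()
}

func main() {
	fmt.Print(renderAlbums([]string{"Carrie & Lowell", "Currents"}))
}
```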

Invalid Gemspec


A minor but frustrating bug. While working on Rogger, I came across the following problem while bundling my local Rogger into a test project: I was quite confused. I checked the gem file but nothing inside seemed like a reference to . I tried building the gem: The same error happened. It turns out this happens because the gets the list of files from the command. I happened to check the gem file in, which was why this error was occurring. Removing it, then, fixed it:
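To illustrate the failure mode (a sketch; Rogger’s real gemspec fields differ): the gemspec builds its file list from whatever git tracks, so a stray checked-in .gem build artifact leaks into the file list and invalidates the gemspec.

```ruby
# What `git ls-files` might return if a build artifact was checked in:
tracked_files = [
  "lib/rogger.rb",
  "rogger.gemspec",
  "pkg/rogger-0.1.1.gem",  # the stray artifact causing the invalid gemspec
]

# The fix is to remove the artifact from version control; defensively,
# the gemspec can also filter the pkg/ directory out of its file list:
gem_files = tracked_files.reject { |f| f.start_with?("pkg/") }
```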

Rogger 0.1.2 Released


Rogger v0.1.2 has been released - a patch that makes logging to Graylog2 even easier. For example, in a Rake task you can do something like: Notably, this is also the first time ever that I’m using metaprogramming in actual code. The actual bit is below: This bit defines the , etc. module-level methods that you see above using , encapsulating the singleton’s methods as module methods instead. These levels correspond to Rails log levels and are used elsewhere in as well, so Rogger takes advantage of that. The user, then, does not have to deal with the actual GELF logger instance, as is already the case pre (Rogger hooks onto the Rails logger by using ).
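The general shape of that metaprogramming pattern looks like this (a sketch, not Rogger’s actual source — here a plain stdlib Logger stands in for the GELF logger):

```ruby
require "logger"
require "stringio"

module Rogger
  # the Rails-style log levels the post refers to
  LEVELS = %i[debug info warn error fatal]

  def self.logger
    @logger ||= Logger.new(StringIO.new)  # stand-in for the GELF logger singleton
  end

  # Define debug/info/warn/... as module-level methods that delegate to the
  # singleton, so users never touch the logger instance directly.
  LEVELS.each do |level|
    define_singleton_method(level) do |message|
      logger.public_send(level, message)
    end
  end
end

Rogger.info("task started")
```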

Logging and Notifications with Rogger, Graylog2 and Twilio


Rogger 0.1.2 was released yesterday, so check it out if you need a way to log your Rails applications / Rake tasks on Graylog2! Anyway, this post is a showcase of how we’re using Rogger in production to log exceptions raised in Rake tasks and notify us by SMS using Twilio. We have a few Rake tasks that run daily. Some of them are pretty sizable, and we want to know if any of them fails and throws an exception. Using Rogger’s , we can wrap the task like so: If the task does throw, Rogger will log it in Graylog2: Notice the level. Level 4 corresponds to the level (level 7 for ), which is what sends exception messages with. Graylog2 Streams We then create a stream in Graylog2 with the following rules: The source name corresponds to the setting as configured in , and defaults to . Twilio Notifications We install the official Graylog2 Twilio SMS plugin on all our Graylog2 nodes. The instructions are in the README but it’s really straightforward - just copy the file into your Graylog2 plugins folder and restart the Graylog2 server process (). Once it’s successfully installed on your Graylog2 nodes, you can add the alert as a callback, and add your Twilio account details in: We can test if the alert is working by using “Send Test Alert”, and even do a full integration test by writing a sure-fail test (something like will do nicely): Conclusion Now, every time any of our aforementioned daily Rake tasks throw an exception, we will be notified by SMS - thankfully we set it up to run daily at a sane time (no, not in the middle of the night).
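The wrapping pattern itself is simple: run the task body, log any exception, and re-raise so the task still fails loudly. The helper name below is a stand-in (the actual Rogger method name is elided in the post):

```ruby
require "logger"
require "stringio"

# Stand-in for Rogger's exception-capturing helper: log, then re-raise
# so the Rake task still exits non-zero.
def with_exception_logging(logger)
  yield
rescue => e
  logger.error("#{e.class}: #{e.message}")
  raise
end

io = StringIO.new
log = Logger.new(io)

begin
  with_exception_logging(log) { raise "sure-fail test" }
rescue RuntimeError
  # the exception propagates after being logged
end
```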

How Ruby Objects Are Implemented


Next post: How Ruby Classes are Implemented I’m currently reading Pat Shaughnessy’s excellent book Ruby Under a Microscope, and these are notes that I’ve summarized from the chapters I’m currently going through. This post, and the next, are notes from Chapter 6, Objects and Classes. It can be a bit confusing to describe the content purely in words, but the book itself contains many helpful diagrams, so pick it up if you’re interested! Every Ruby object is the combination of a class pointer and an array of instance variables. Pat Shaughnessy A user-defined Ruby object is represented by a structure called an , and is referred to by a pointer called . Inside , there is another structure called , which all Ruby values will have. Aside from the structure, also contains , a count of how many instance variables the object has, , a pointer to an array of values of the instance variables, and , which is a pointer to a hash table stored in the object’s associated structure that maps the name/identity of each instance variable to its position in the array. In this case, the RObject representing will have a of 2, and its will be a pointer to an array containing the values and . The RObject representing will have a of 1, and its will be a pointer to an array containing just . Both and will have a structure whose pointer references the same Fruit structure. Generic Objects (, …) Generic objects such as strings and arrays are represented by more specialized versions of , called , , etc. Their internal representations are more optimized for the kind of values they store. An example of this optimization is the presence of , which is an array of a certain fixed size. This array will be used to store the values of the instance variables in the structure itself if they fit, instead of allocating memory for and referencing an external array. They also contain the structure. contains a few internally-used flags and a pointer to its associated class, called . 
Classes are represented by a structure, which is discussed in the next post. Simple Values Simple values like (small) integers, , and do not have an associated -like structure. Instead, their value is stored directly in itself. The identity of these values is indicated by different flags in (do not confuse the flags in with those in , they are different). For example, if the is 1, then Ruby knows to interpret the rest of as an integer value instead of a pointer address to its associated (or etc…

MVC Organisation with Express and Thinky


I wrote an example to demonstrate a way to organise your Express app in a MVC manner without resorting to complicated frameworks. Check it out here. This particular example is as barebones as possible. In particular, there’s no explicit view layer to speak of - we’re simply returning JSON. We could, for example, have added server-side rendered React/Redux to this example, and I intend to do that as a separate example. There are a few concepts I want to talk about here: Model files Controller files First, we import so that we can use , which greatly simplifies working with promises and is the main reason behind the leanness of the code. Then, we import and to create the main Express object, and to parse the body of requests that are sent in either or JSON format. Then, we require the file, which contains the connection details for RethinkDB. We inject the config object as a dependency to , which creates the connection to RethinkDB and bootstraps our database schema and table objects. I will be using this pattern a lot more throughout the example. In , we define the entire schema, just like in Rails. I will defer the details of the syntax to thinky’s documentation. Note that this file can be swapped out for any other ORM if you want (Sequelize, etc). After the schema is defined, we pass the thinky database objects to the model files: As a matter of convenience, I chose to pass all the models so that model methods can reference other database objects as well. Model Methods Model methods serve as a way to encapsulate ORM logic and make it reusable. Even though ORMs themselves already offer a layer of abstraction on top of the database, I prefer having model methods as an additional layer to meet common needs in your application’s business logic. Model methods are also handy in case you ever want to swap out the ORM for another one down the road. As an example, let’s look at the User model: A user has and methods, which behave like and in Rails. 
The method is simply a thin wrapper around the thinky method, while wraps around the and thinky calls. This is especially notable because using constructor functions () is generally a bad idea (a questionable design decision on thinky’s part), but wrapping it around our model method limits the potential damage it can cause. Routing Back to , the routing is initialized by passing in the Express object and our models. The routes are found in , and is designed in a way as to mimic…
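The dependency-injection flow (config → db module → models) can be sketched in plain JavaScript like this — the function names are my assumptions, and the ORM is stubbed out to keep the sketch self-contained:

```javascript
const config = { host: "localhost", db: "test" };

function createDb(cfg) {
  // in the real app this would initialise thinky with the RethinkDB config;
  // here it's a stub that just records what it was given
  return { cfg, tables: {} };
}

function createUserModel(db) {
  // model methods encapsulate ORM calls, so swapping the ORM later
  // only touches this file
  let nextId = 1;
  return {
    create(attrs) {
      const user = { id: nextId++, ...attrs };
      db.tables[user.id] = user;
      return user;
    },
    find(id) {
      return db.tables[id];
    },
  };
}

const db = createDb(config);
const User = createUserModel(db);
```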

Pretty Form Validation with Redux Form


I made a nice and unobtrusive animation for form validation with redux-form. To implement it, I made a component: The CSS for the component is surprisingly simple: The style is for overriding the browser’s default border styling, and can be left out if you already have existing border styles for your inputs. The gives a bit of spacing between the indicator and the text in the input. It’s important to note that, for the width of the input to stay fixed even as the border is transitioned in, you need to make sure that your is set to . Paul Irish’s CSS snippet works like a charm: Now we can use it in a redux-form component like so: 🙌
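A sketch of what the indicator CSS could look like (class names and values are my assumptions — the post’s actual snippet is not shown):

```css
/* Paul Irish's box-sizing snippet: keeps the input width fixed
   as the border is transitioned in. */
*, *:before, *:after {
  box-sizing: border-box;
}

.input--invalid {
  border: 2px solid crimson;      /* overrides the browser default border */
  padding-left: 8px;              /* spacing between indicator and text */
  transition: border-color 0.3s ease;
}
```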

Preprocessing in Searchkick



Submit Behaviour in the button HTML Element


Not sure how I only found out about this today, but a button HTML element without a type attribute defaults to the submit value. Test it yourself with the test rig: So if you still want to have buttons in the form that do not trigger the submit event, you have to explicitly give them a type of button. This is confirmed by the W3C spec: The missing value default is the Submit Button state.
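A minimal test rig to see the behaviour (a sketch of my own, as the original rig isn’t shown): the first button has no type attribute and will submit the form; the second is explicitly opted out.

```html
<form onsubmit="alert('submitted!'); return false;">
  <input name="q" placeholder="type something">
  <!-- no type attribute: defaults to submit -->
  <button>Implicit submit</button>
  <!-- explicit type="button": does not trigger the submit event -->
  <button type="button">Does nothing</button>
</form>
```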

Agnostic HTTP Endpoint Testing with Jasmine and Chai


In this post, I’m going to share my strategy for endpoint testing. It has a few cornerstones: It should test against a running server by sending HTTP requests to it, instead of hooking onto the server instance directly, like supertest does. This way, the strategy becomes agnostic and portable - it can be used to test any endpoint server, even servers written in other languages, as long as they communicate through HTTP. Each suite should be written as a narrative. To this end, BDD-style testing is very suitable. As an example, consider the narrative describing the authentication flow for an app: I register as a user, providing a suitable email and password. The server should return a 200 response and an authentication token. Then, I login using the same email and password as before. The server should return a 200 response and an authentication token. I login using a different email and password. This time, the server should return a 401 response. If I register with the same email as before, the server should return a 422 response and an error message in the response body indicating that the email has been taken. A few points to take note of: Even though the strategy is meant to be as agnostic as possible, you need to find a way to run the server with an empty test database, and then have some (hopefully scripted) way to drop it once the tests are complete. This part will depend on what database adapter/ORM you are using. I will share my solution for an Express server backed by RethinkDB later. Remember that the database is a giant, singular hunk of state. If you’re going to be adopting this style of testing, there is no way around this. You’re not just going to be running GET requests - you’re going to be running POST and PUT and DELETE requests as well. This means that you need to be very careful about tests running concurrently or in parallel. 
It’s great to have performant tests, but don’t trade away tests that are easy to reason about, and which reveal clearly which parts of your app are breaking, for performance. I tried Ava first, and was actually halfway through writing the test suite for a project with it. I really liked it, but Ava was built for running tests concurrently and in parallel. There came a point where the test suite would fail unpredictably depending on the order in which the tests were run. Although it’s possible to run Ava tests in serial, I felt like I was fighting against the framework. I also considered Tape, but I consider Ava to be…

Partial Functions in Haskell Prelude


Partial functions are functions that are not defined for all possible arguments of their specified type. The most common example of a partial function is . It has the innocent-looking type signature of , but fails when an empty list is given: Because of Haskell’s type erasure, doesn’t even know what the supplied type is, so there’s no way for it to return a value of type when there are no values in the list (besides the obvious fact that there are no values in the list). In the words of Brent Yorgey’s CIS 194 Haskell course: head is a mistake! It should not be in the Prelude. Other partial Prelude functions you should almost never use include tail, init, last, and (!!). Haskell’s official wiki provides a complete list of partial functions in Prelude. Note that some of these functions are considered partial functions because they do not terminate if given an infinite list. Given all that, I think using something like is okay if composed with other functions that guarantee non-emptiness of the list, or if the function type signature is , a type class which guarantees a list with at least one element. For example, consider , a function which returns the last item in a list: In addition, consider , a function which, when given an value, returns a list of its constituent digits: A function like guarantees a non-empty list, and when combined with , we can get the last digit of an value: Or consider , which accesses an element in a list by index. If the index provided is too large, an exception is thrown: An idea is to wrap with the data type in a function like this: I’m surprised that such commonly used functions in the Prelude are so dangerous, so it’s good to pay attention when using them. Partial functions like are easy to replace with pattern matching, but others may be harder to supplant.
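As a sketch of the Maybe-wrapping idea (function names are my own; the post’s originals are elided), total replacements for (!!) and last could look like:

```haskell
-- Total indexing: Nothing instead of an exception for bad indices.
safeIndex :: [a] -> Int -> Maybe a
safeIndex xs n
  | n < 0     = Nothing
  | otherwise = go xs n
  where
    go []     _ = Nothing
    go (y:_)  0 = Just y
    go (_:ys) k = go ys (k - 1)

-- Total last: folds the list, remembering the most recent element.
safeLast :: [a] -> Maybe a
safeLast = foldl (\_ y -> Just y) Nothing
```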

Importing .rdb files into Redis with Protocol Streams


As a preface, read this for the motivation and concept behind Redis protocol streams (as usual, Redis documentation is amazingly well written and a joy to read). redis-rdb-tools is a utility that creates a protocol stream from a file. However, the original repo has a Unicode decoding bug that prevents it from working properly. Thankfully, someone forked it and patched it, and I can confirm that the patch works for me. To install (make sure you’re on Python 2.x, not 3.x): (I’m installing the specific patch commit because it’s a fork and who knows what’ll happen to it in the future.) Then, to import the file, just a simple one-liner: If successful, you’ll see something like: as well as this in your Redis server logs: Notes: Make sure you already have a Redis server running. The flag is only available for Redis 2.6 onwards. If you want to inspect the protocol stream visually before importing, you can leave out the piping to and it will pipe the stream to (or you could pipe it to a text file). The alternative way to import is to copy the file to the location specified in your (or modify to point to your file). However, I think using protocol streams is a cooler solution. 😎
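To see what such a protocol stream looks like on the wire, here is the RESP encoding of a single `SET key value` built by hand (illustrative — redis-rdb-tools emits exactly this format from a .rdb file):

```shell
# RESP: *3 = array of 3 bulk strings; $N = string of N bytes, each CRLF-terminated
printf '*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n' > /tmp/stream.bin
# To import into a running server (Redis >= 2.6 for --pipe):
#   cat /tmp/stream.bin | redis-cli --pipe
wc -c < /tmp/stream.bin   # the stream is 33 bytes
rm /tmp/stream.bin
```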

Exporting ES6 Library Modules as Global Scripts with Webpack


After authoring a library (Remerge) using ES6 style modules, aside from consumption with , I also wanted to release it as a standalone script so that users could use it by simply including it with a simple script tag - well, if they wanted to, at least (appealing to the lowest common denominator here heh). The Scenario So I wanted to be able to do something like this: Notice that both scripts have a slightly different export pattern. In my code, I wrote such that it was a default export: and as named exports, spread across a number of files (tackling the general case here): The Problem The problem is that, as it is, bundling these ES6 modules will result in objects that look like this: which is technically usable, but to use it would look like: which is a real PITA for ensuring a consistent API. The Fix The fix for this is simple: simply write a file that imports those ES6 modules the ES6 way, but exports them in nodeJS style. Webpack will then handle them correctly! For the top-level function export, simply export the function as is: For the object whose properties should be utility functions, the solution is similarly straightforward - simply create an (in this case anonymous) object whose properties are those functions: For my library, I’ve placed these “helper” files in a folder, as their sole purpose is for Webpack to generate the script files. You can place them anywhere it makes sense. The Webpack configuration for handling this is very simple (as far as build configs go): Then you can use them as shown in the HTML snippet above! For an actual live example of this in action, check out my library Remerge.
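The Webpack side of this can be sketched as follows (file names and the library name are assumptions based on the post; not Remerge’s actual build config). The helper file re-exports the ES6 default export CommonJS-style, e.g. `module.exports = require("../src/remerge").default;`, and the config points at it:

```javascript
// webpack.config.js (sketch): expose the bundle as a browser global
const config = {
  entry: "./helpers/remerge.js",   // the CommonJS-style wrapper
  output: {
    filename: "remerge.js",
    library: "remerge",            // becomes window.remerge in a script tag
    libraryTarget: "umd",          // also usable via require/AMD
  },
};

module.exports = config;
```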

Circular References in GraphQL Type Definitions


When defining GraphQL types, it’s common to run into situations when two types reference each other. This is a problem because one type will be undeclared or undefined when the other is evaluated. To fix this, the reference JavaScript implementation allows us to indicate the fields using a function that returns an object, instead of a plain object. This function is lazily evaluated during runtime, so we will not run into problems with the interpreter.
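The thunk pattern, stripped down to plain JavaScript (type names are invented for illustration): because `fields` is a function, neither definition evaluates the other until both exist.

```javascript
const AuthorType = {
  name: "Author",
  fields: () => ({
    posts: { type: PostType },   // PostType isn't defined yet -- fine, lazy
  }),
};

const PostType = {
  name: "Post",
  fields: () => ({
    author: { type: AuthorType },
  }),
};

// Only when the thunks are called are the references resolved:
console.log(AuthorType.fields().posts.type.name);  // "Post"
```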

Exposing React Performance Tools in the Browser Console


The React docs on Performance Tools is quite lacking as to how it should be used, so I had to do some digging around to figure out how it worked. It turns out doesn’t work in component lifecycles, raising an ugly internal error that hasn’t been properly dealt with yet (see this issue for more details). This prevents us from doing some things like: Calling and measuring performance repeatedly. Using to measure between arbitrary points in deeper parts of your component hierarchy. I solved these problems by exposing React’s performance tools in the browser. There are a few ways to do this. If you’re using webpack, you can use , which allows us to attach modules to the global object: You can then start measuring in the console at any time: and you can call in a component lifecycle method: or in the browser console: and the rest of the printing methods do.

Migrating to Visual Studio Code


I’ve been testing out Visual Studio Code for writing Node stuff, and I really like what I’m seeing so far. I’d like to confess that 99.9% of the reason why I even considered trying it out in the first place was its ability to debug Node apps. I tried Intellij IDEA previously and was so put off by the IDE-like interface (my Eclipse days are still not quite past me) that I couldn’t bring myself to migrate over. On first boot I was immediately taken aback by how similar to Sublime Text the interface was. I might even add that it feels somewhat snappier, although I can’t pin it down to a specific thing. A comparison of VSC vs ST, listing only the features that I’ve used and enjoyed and the features I’m missing: What VSC has Drag and dropping in the sidebar Node debugging that actually works consistently and without fuss If this was the only thing going for it I would still have used it IntelliSense features First class support for TypeScript Nifty features What VSC doesn’t have Tabs Sublime Text has the same intuitive tab handling as Chrome, and I was very surprised to see VSC depart from this pervasive pattern, instead opting for a strange “Working Files” way of handling, well, working files. It’s not completely bad, just strange. I’d love to see the usual tab handling functionality introduced into VSC as an option in the future. Find in Files I use “Find in Files” very regularly, and Sublime Text’s way of presenting the search results is still more familiar for me. VSC’s search results are tucked into the criminally narrow sidebar, which makes them very difficult to inspect. No Emmet/Zen Coding package Not that applicable to Node development, but I still use it occasionally for writing JSX Babel/ES6 Support Of course, debugging my apps in VSC would mean that it has to be set up to run ES6 code, and I’m happy to report that this is very possible and very easy. 
In , simply set the to the binary in your project’s : Configure the rest of your as per VSC’s documentation. Note: There seems to be no good way of launching script commands from - will be investigating this in more detail. Key Bindings Most of the key bindings I use are the same. These are the only key bindings I’ve had to change so far to make it consistent with ST’s: Theming I use Afterglow in ST, and I’m happy to report that there is also Afterglow for VSC (just search for “Afterglow” in Extensions). Packages I’ve also found the following packages to be useful so far: ES6 Snippets Not…
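A sketch of what that launch configuration could look like (the program path is an assumption for illustration; the key idea is pointing runtimeExecutable at babel-node in node_modules/.bin):

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch with babel-node",
      "type": "node",
      "request": "launch",
      "program": "${workspaceRoot}/index.js",
      "runtimeExecutable": "${workspaceRoot}/node_modules/.bin/babel-node"
    }
  ]
}
```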

Data Structures in TypeScript


Took some time off this weekend to learn some TypeScript, and reimplemented some data structures as practice. I don’t think I’ll continue to use much TypeScript moving forward, because it seems to promote the classical inheritance style of thinking in a language that doesn’t need or want it. TL;DR: Prototypical/concatenative inheritance is much nicer than classical inheritance. That said, first-class support for TypeScript in Visual Studio Code is quite awesome. While implementing the data structures, I came up with a TypeScript/prototypical hybrid style for writing object factories. In this hybrid style, the only place I’m writing any TypeScript is using interfaces for prototypes. Let’s take a class for example: I return a object whose prototype is using , and its initial state using . This style allows VSC to shout at me if the prototype doesn’t implement the interface correctly. It’s nice to have, but not worth switching over to a TypeScript workflow from the Babel workflow that I’m already used to.
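The hybrid style can be sketched like this (a Stack example of my own; the post’s original class is elided). The interface is the only TypeScript; construction is plain Object.create/Object.assign, and VSC flags the factory if the prototype doesn’t satisfy the interface:

```typescript
interface Stack<T> {
  items: T[];
  push(item: T): void;
  pop(): T | undefined;
}

// behaviour lives on the prototype object
const stackProto = {
  push<T>(this: Stack<T>, item: T): void {
    this.items.push(item);
  },
  pop<T>(this: Stack<T>): T | undefined {
    return this.items.pop();
  },
};

// factory: prototype via Object.create, initial state via Object.assign
function createStack<T>(): Stack<T> {
  return Object.assign(Object.create(stackProto), { items: [] as T[] });
}

const s = createStack<number>();
s.push(1);
s.push(2);
```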

Custom Liquid Tags


Coming back to Jekyll from Hugo, I’d grown accustomed to shortcodes, which are awesome to keep your Markdown source as HTML-free as possible. You can emulate shortcodes with custom liquid tags! In your folder, create a new file, and then write a class that inherits from . Let’s see an example below of a tag for embedding Youtube videos: The general thing to take note of here is that whatever appears after the tag name will be stringified and sent to the second argument of , so for example, a custom tag that looks like: in the method above will be (note the trailing whitespace). There’s no further treatment, so you will have to parse the string yourself somehow to identify what is what, and then assign what you need to instance variables: The class must also implement a method, which returns a string representation of the HTML, so in this case of Youtube embeds: And voila! You can now use your Youtube tag like so: Great success 👍
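A sketch of such a tag (class name, tag markup, and the example video id are my own; in a real _plugins/youtube.rb you would inherit from the real Liquid::Tag, which is stubbed here so the sketch runs standalone):

```ruby
# Stand-in for Liquid::Tag so this runs outside Jekyll:
module Liquid
  class Tag
    def initialize(tag_name, text, tokens); end
  end
end

class YoutubeTag < Liquid::Tag
  def initialize(tag_name, text, tokens)
    super
    # everything after the tag name arrives as one raw string (note the
    # trailing whitespace) -- we parse it ourselves
    @video_id = text.strip
  end

  # Liquid calls render to get the HTML output for the tag
  def render(context = nil)
    %(<iframe src="https://www.youtube.com/embed/#{@video_id}" ) +
      %(frameborder="0" allowfullscreen></iframe>)
  end
end

tag = YoutubeTag.new("youtube", "dQw4w9WgXcQ ", nil)
```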

Jekyll Variable in SCSS Files


You can use Jekyll site variables in top-level SCSS files. For example: These site variables will also be visible in partials that are imported after it:
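A sketch of how this looks (the variable name is an assumption): the SCSS file needs front matter — even an empty one — so that Jekyll runs Liquid over it before Sass compiles it.

```scss
---
---
// site.accent_color comes from _config.yml
$accent: {{ site.accent_color }};

a { color: $accent; }
```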

0-1 Knapsack Problem in Python


The 0-1 Knapsack problem is a variation on the knapsack problem with one condition: there is only one copy of each item. You may choose to pick it or not. Given this condition, it’s possible to iterate through the items and memoize the decisions sequentially. At each iteration of and : Check if the current item is larger than the capacity. If it is, ignore the item. In this context, ignoring the item means that for this particular combination of and , the maximum value is the same as the previous value of . If the current item can fit in the knapsack, then the maximum value for this combination of and is the larger of the values resulting from both outcomes. represents the value if you choose to ignore the item, and represents the value if you choose to put the item in your knapsack.
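The description above can be sketched as a DP table (variable names are assumptions): `dp[i][c]` is the best value using the first `i` items at capacity `c`.

```python
def knapsack(weights, values, capacity):
    n = len(weights)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            if weights[i - 1] > c:
                # item doesn't fit: carry the previous row's value down
                dp[i][c] = dp[i - 1][c]
            else:
                # max of ignoring the item vs. putting it in the knapsack
                dp[i][c] = max(dp[i - 1][c],
                               dp[i - 1][c - weights[i - 1]] + values[i - 1])
    return dp[n][capacity]
```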

Iterative Tree Traversals in Python


Tree traversals are most naturally expressed in recursion, but iterative versions are cool too, plus they take only O(1) space. Inorder traversal: Visit the left subtree first, then the node, and the right subtree. Preorder traversal: Visit the node first, then the left subtree, then the right subtree. Postorder traversal: Visit the left subtree, then the right subtree, then the node. The concept behind the iterative versions is as follows. There are three states a traversal can be in: You’ve just visited the left or right child of a parent node. You’ve just gone back to a parent node from its left child. You’ve just gone back to a parent node from its right child. Keeping three pointers: to designate the previous node, to designate the current node, and to designate the next node, we can codify the above conditions like so: With that in mind, I present the three different traversals, whose function signatures take a as the first argument and a function to operate on the tree nodes as the second argument.
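A sketch of the inorder version of this scheme (my own rendering of the three-state idea, assuming each node carries a parent pointer; preorder and postorder differ only in where the visit happens):

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

def inorder(root, visit):
    prev, curr = None, root
    while curr is not None:
        if prev is curr.parent:
            # just arrived from the parent: descend left if possible
            if curr.left is not None:
                nxt = curr.left
            else:
                visit(curr)
                nxt = curr.right if curr.right is not None else curr.parent
        elif prev is curr.left:
            # came back up from the left child: visit, then go right
            visit(curr)
            nxt = curr.right if curr.right is not None else curr.parent
        else:
            # came back up from the right child: retreat to the parent
            nxt = curr.parent
        prev, curr = curr, nxt
```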

An Interesting Exercise in Dynamic Programming


Given an array of building heights (with unit width), calculate the area of the largest rectangle that “fits” within these buildings. For example, for an array : the correct answer is 10. You might want to attempt the question first before proceeding further. The brute force method takes time by taking each pair and finding the minimum, which is the “height”, and multiplying it by the “width”. However, there’s actually a linear time solution. Let us iterate through the building array from left to right. For some building , we want to find out the left and right boundaries of the largest rectangle whose height is the height of building and which includes building itself. Let’s talk about finding the left boundary first. To do this, we can, for each index , iterate leftwards and check. This results in a quadratic time solution. However, we can do better than this. The insight to achieving linear time is the fact that, when looking for the boundary of the rectangle, we can “throw away” buildings to the left of which are higher than building itself. In effect, we are only looking for the building that will “prevent” the rectangle from extending further. The reason we can do this safely is because, for future calculations (of buildings to the right of building ), these buildings won’t be considered in any case because the (current) building is shorter than them and would be “bottlenecking” them. We can use a stack to do this. For a building , we push it onto the stack if it’s higher than the building on top of the stack. If it’s not, we continuously pop buildings off the stack until the building on the top of the stack is shorter than building . Since each building is pushed on and popped off the stack at most once, this results in an amortized constant time check for each building . We repeat this linear-time procedure twice, one in each direction of the array, to obtain the left and right rectangle indices for each building in the array. 
At the end, we can calculate the largest rectangle by iteratively taking the difference in matching indices in both the left and right indices table and multiplying it by the height of the building. Now, a different question: Given a 2-dimensional by matrix with only 0s and 1s, calculate the area of the largest rectangle that contains only 1s. For example, for the matrix below: The correct answer is 8. You can give it a try too, before proceeding. Surprisingly, there is also a linear time solution for this problem. The…
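Hedged sketches of both problems (the input arrays are examples of my own, since the post’s are elided; this is also a compact single-pass variant of the stack idea, rather than the two-pass version described above):

```python
def largest_rectangle(heights):
    # stack of (start_index, height), kept in increasing height order
    stack, best = [], 0
    for i, h in enumerate(heights + [0]):  # sentinel 0 flushes the stack
        start = i
        while stack and stack[-1][1] > h:
            idx, height = stack.pop()
            # the popped building can't extend past i on the right
            best = max(best, height * (i - idx))
            start = idx  # the current building extends left to here
        stack.append((start, h))
    return best

def maximal_rectangle(matrix):
    # reduce each row to a histogram of consecutive 1s ending at that row,
    # then reuse the histogram solution per row
    best, hist = 0, [0] * (len(matrix[0]) if matrix else 0)
    for row in matrix:
        hist = [h + 1 if cell else 0 for h, cell in zip(hist, row)]
        best = max(best, largest_rectangle(hist))
    return best
```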

Building a Dynamic UI System at Carousell (Part 1)


For an e-commerce platform past a certain scale and size, having some sort of a dynamic UI system becomes a necessity due to the complexity of dealing with the real world and the sheer variety of physical products. It becomes even more complex at Carousell because of the following two reasons: Since Carousell is primarily a classifieds platform, many listings are user-generated (as opposed to being created by businesses/inventory managers) which means that standardization is an issue. Inventory management concepts such as SKU have limited bearing. Carousell doesn’t just deal with physical products — our ambition of becoming a marketplace platform means that we need to have the ability to support listings that are non-physical in nature, e.g. jobs, services, insurance. In this 3-part series, we will talk about a dynamic, server-driven UI system we’ve built to cater for and scale along with our business needs. This first part will talk about some of the problems we faced before we had such a system. The second part will go into more depth about the system itself, and the final part will address some miscellaneous parts of the system in a Q&A format. We hope this will be a useful reference for anyone who is either: wondering if they need a system like this at some point in the future, or already knows that they need a system like this and are looking for inspiration on how to start Some time ago, when you wanted to list something for sale on Carousell, all you had to do was fill in 5 things: Pictures Category Title Description Asking price And off you’d be to the races. To everyone’s surprise, the real world wasn’t quite this simple. Having only the description meant that sellers had to get creative, using prior experience or guesswork to decide what information to fill in the title and description. 
New sellers who were not privy to the emergent behaviours of the dizzyingly busy marketplace would leave the platform disheartened, discouraged from not having their listings sell. The problems didn’t end with our sellers. We found that high-intent buyers often needed to narrow down their search to listings matching some very specific criteria. This behaviour was particularly evident in categories such as cars and properties, where the limitation of having only a textual search was thrown into sharp relief. Imagine a user searching for 4-room flats, where possible listing titles could look like this: Super rare 4bedrm, walking dist to Bishan MRT or Cheap! 3+1 Blk…

Building a Dynamic UI System at Carousell (Part 2)


This is the second part in a trilogy of articles about a dynamic UI system we’ve developed at Carousell over the past one and a half years to solve a series of related problems. In the first part, we talked about these problems. This second part will go into more depth about what fieldsets are and some of the properties of the fieldset system. What is a fieldset? A fieldset is a document (in the same sense as an HTML document) with a fixed schema. This schema can be expressed as a hierarchy of entities: The lowest level of this hierarchy consists of entities called fields. Several fields make up a group, several groups make up a screen, and several screens make up a fieldset. To draw a connection between the abstract concept of a fieldset and what it represents visually, let’s take a look at the sell form for the iPhone category: Here is the same sell form, annotated: Now let’s dive into the annotation to get a better handle on things. A fieldset, coloured in blue, represents the entire user journey of creating a listing. In this particular journey of selling an iPhone, there is only one screen, which is coloured in red. Within the single screen exists multiple groups, coloured in green. In the context of the sell form, each group represents the grouping of related information — information pertaining to product details are grouped together, as is information about various deal methods such as meetup or mailing and delivery. Groups are not just containers for fields either; they have their own data as well. Fields, coloured in purple, represent an atomic piece of information to be filled in by the seller. Fields form the bulk of the information within the fieldset, containing information such as UI rules and validation rules that determine the look and behaviour of the component. We will discuss fields in more detail soon, as they represent the core of the Fieldset system. 
In the following image, we apply the same annotations to the listing details page: On the listing details page, we have a top-level fieldset with a single screen. Within that screen, we can see groups being marked out quite clearly, with the largest group containing the bulk of the listing information. Within this group, each listing attribute such as title, description, model, and storage is represented as a field. Fieldsets are canonically represented and stored as JSON documents, a decision we made early on due to its portability across client and server and relative ease of…
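The hierarchy described above (fields within groups, groups within screens, screens within a fieldset) can be sketched as a document. Here is a hypothetical rendition in Ruby; the key names and field contents are assumptions for illustration, not Carousell's actual schema:

```ruby
require 'json'

# A hypothetical fieldset document illustrating the hierarchy:
# fieldset -> screens -> groups -> fields.
fieldset = {
  "screens" => [
    {
      "groups" => [
        {
          "label"  => "Product details",
          "fields" => [
            # Each field carries UI rules and validation rules that
            # determine the look and behaviour of its component.
            { "name" => "title", "component" => "text",
              "validation" => { "required" => true, "max_length" => 80 } },
            { "name" => "storage", "component" => "picker",
              "ui_rules" => { "options" => ["64GB", "256GB"] } }
          ]
        }
      ]
    }
  ]
}

# Fieldsets are canonically stored as JSON documents.
puts JSON.pretty_generate(fieldset)
```

The nesting mirrors the annotated sell form: one screen in blue-red terms, one green group, and two purple fields.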

Building a Dynamic UI System at Carousell (Part 3)


This is the third and final part in a trilogy of articles about a dynamic UI system we’ve developed at Carousell over the past one and a half years to solve a series of related problems. In the first part, we talked about some problems we faced at Carousell, and in the second part, we introduced fieldsets and how the system works by showing some examples of client-server interactions. This third part will cover an assortment of topics related to the Fieldset system: Were fieldsets inspired by other similar systems? What did the prior art look like? How do we manage (the large number of) fieldsets? How we decide the fieldset markup for a component. What the development and deployment process looks like. How the testing suites work. How we do localization. Were fieldsets inspired by other similar systems? What did the prior art look like? Back during the early planning phase of fieldsets in April 2017, we were unable to find much literature on developing such systems, and none of us in the team had experience building anything like it, so at that time the blank slate was both challenging and daunting. Fieldsets were born not just out of what we needed then, but out of how we reasonably predicted Carousell evolving over the next few years. Since then, Airbnb has published something which seems to be very similar in spirit. How do we manage (the large number of) fieldsets? Because we couldn’t predict how fieldsets would be used, we decided to go with the easiest and most straightforward approach that we could think of, and evolve it as needed. That approach was to keep the fieldset definitions under version control with Git. While Git provides a nice GUI and version control, it doesn’t provide anything in the way of fieldset-specific features. 
Something we’ve since observed a lot is mass modification of fieldsets, such as updating a component shared by many fieldsets with a new field (see the section on fieldset markup below). We’re working on an internal tool that makes certain classes of mass modifications trivial, while at the same time exposing an escape hatch for scripting more complex changes. How we decide the fieldset markup for a component. The markup for a component refers to the actual definition of the component. For a text component, it might look like this: One can think of this markup as the “latitude” of the component…
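To make the mass-modification idea above concrete, here is a hedged sketch of what a scripted change might look like. The document schema, the group label, and the helper name are all invented for illustration; this is not Carousell's actual tooling:

```ruby
# Hypothetical sketch: add a field to every group with a given label
# across a fieldset document. Running it twice is safe (idempotent),
# which matters when sweeping a change across many fieldset files.
def add_field_to_groups(doc, group_label, new_field)
  doc["screens"].each do |screen|
    screen["groups"].each do |group|
      next unless group["label"] == group_label
      fields = group["fields"]
      # Skip documents that already carry the field, so reruns are no-ops.
      fields << new_field unless fields.any? { |f| f["name"] == new_field["name"] }
    end
  end
  doc
end

# A toy fieldset document (schema is an assumption).
doc = { "screens" => [{ "groups" => [{ "label" => "Details", "fields" => [] }] }] }
add_field_to_groups(doc, "Details", { "name" => "condition", "component" => "picker" })
```

An internal tool would presumably wrap scripts like this one, iterating over every fieldset file in the Git repository and committing the result.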

Longest Consecutive Sequence


Given an array of integers, find the length of the longest sequence that contains consecutive integers. The array is unsorted, and may contain duplicates and negative integers. For example: the array should return the answer 4, since the longest sequence is . This is an example of a bookkeeping problem: there isn’t anything special to do except to keep track of the positions of current sequences. Consider the elements as laid out on the integer number line. As we consider each element from left to right, imagine marking each element’s position on the line. Note that we can ignore duplicates without loss of generality, since duplicates do not contribute to the final answer. There are four cases to consider when marking the current element on the number line: neither its left nor its right neighbour has been marked; its right neighbour has been marked, but not its left neighbour; its left neighbour has been marked, but not its right neighbour; both neighbours have been marked. In cases 2 and 3, the current element extends an existing sequence by 1. In case 4, the current element connects two existing sequences together. We maintain a map whose keys are the integers to be considered, and whose values are 2-tuples containing the starting and ending integers of the sequence each key belongs to. Since we are only considering consecutive sequences, we can infer the size of a sequence from its start and end. This runs in O(n) in both time and space. Alternatively, one can also solve it by sorting the array and maintaining a constant number of variables to keep track of the longest sequence seen so far in one pass from left to right. That will be O(n log n) in time and O(1) in space.
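The map-based bookkeeping described above can be sketched in Ruby. The original post's snippet isn't reproduced here, so this is my own rendition: the map stores a [start, end] pair for each marked integer, and only the entries at the endpoints of a run need to stay accurate:

```ruby
def longest_consecutive(nums)
  span = {}   # integer => [start, end] of the run containing it
  best = 0
  nums.each do |x|
    next if span.key?(x)        # duplicates can be ignored (case analysis above)
    left  = span[x - 1]         # run ending at x - 1, if its neighbour is marked
    right = span[x + 1]         # run starting at x + 1, if its neighbour is marked
    lo = left  ? left[0]  : x   # cases 1-4 collapse into choosing the new bounds
    hi = right ? right[1] : x
    run = [lo, hi]
    span[x]  = run              # mark x itself
    span[lo] = run              # only the endpoints of the merged run
    span[hi] = run              # need to remain accurate for future merges
    best = [best, hi - lo + 1].max
  end
  best
end

longest_consecutive([100, 4, 200, 1, 3, 2])  # => 4 (the run 1, 2, 3, 4)
```

Each element is visited once and does O(1) hash work, giving the O(n) time and space bound stated above.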

A Redis recipe for lists of recency


A nifty recipe for maintaining a list by recency, for example a list of recently viewed page ids, maintained per user. We use Redis sorted sets. ZADD: when inserting a new entry, ZADD adds an entry into the sorted set whose member is the page id, and whose score is an epoch timestamp or some other sufficiently precise and monotonic source. If an entry with an existing member is added, its score is updated. This has the effect of removing the entry from the list and appending it to the front. ZREMRANGEBYRANK: we then trim the sorted set to some specified length. This step is optional, and can be tweaked to your requirements. If performance is a concern, this command can also be run probabilistically (flip a coin, trim the set if heads). EXPIRE: we set a TTL on the entire sorted set. This step is also optional. ZREVRANGE: we fetch the entire sorted set in reverse order in the read path. This can be tweaked to have a limit. The trimming can also be done in the read path instead of the write path, although this may consume a lot more space if writes are proportionately higher than reads (as it was in my case). Each series of operations is idempotent, which is a property that can be exploited to simplify implementation. Note, however, that operations are not pair-wise commutative: an entry with a lower timestamp will override an entry with a higher timestamp. Memory usage: some back-of-the-envelope calculations. Assuming a key format like and a page id with 9 characters, and extrapolating from the numbers above, a sorted set with 30 entries occupies (69-12) + (30 * 12) + 12 = 429 bytes. This seems to correspond with my numbers in production so far (μ = 225 bytes).
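The semantics of the recipe (not the actual Redis calls) can be modelled in a few lines of pure Ruby, which is useful for reasoning about the ZADD/ZREMRANGEBYRANK/ZREVRANGE interplay without a server. The class and method names here are my own invention:

```ruby
# A pure-Ruby model of the recipe: a sorted set maps member => score,
# where a higher score means more recently touched.
class RecencyList
  def initialize(max_len)
    @max_len = max_len
    @scores  = {}  # member => score, mimicking a Redis sorted set
  end

  # ZADD: re-adding an existing member just updates its score, which
  # effectively moves it to the front of the recency list.
  def touch(member, score)
    @scores[member] = score
    trim
  end

  # ZREMRANGEBYRANK: keep only the @max_len most recent members.
  def trim
    return if @scores.size <= @max_len
    @scores = @scores.sort_by { |_, s| -s }.first(@max_len).to_h
  end

  # ZREVRANGE: members ordered from most to least recent.
  def fetch
    @scores.sort_by { |_, s| -s }.map(&:first)
  end
end
```

Touching an existing member with a newer timestamp reorders it rather than duplicating it, which is the whole point of using a sorted set instead of a plain list.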

How Ruby Classes Are Implemented


Previous post: How Ruby Objects are Implemented. Ruby classes are represented by the RClass structure. It’s a fairly big structure, so I’ve broken it down according to what a Ruby class actually does and what it contains. “A Ruby class is a Ruby object that also contains method definitions, attribute names, a superclass pointer, and a constants table.” — Pat Shaughnessy. A Ruby class contains method definitions. Method definitions are stored in a method table. In Fruit, this will be a table whose keys are Fruit’s method names, and whose values are pointers to the actual method definitions and their YARV instructions. A Ruby class contains instance-level attribute names. As mentioned in the previous post, an RClass contains a pointer to a hash table that maps attribute names to their positions in each instance’s attribute array. In Fruit, this will be a hash table whose keys are Fruit’s attribute names, and whose values are their indices in that array. A Ruby class is also a Ruby object. If you print out Fruit’s class, it shows Fruit as being an instance of the Class class. As a Ruby object, a Ruby class will also have: a klass pointer that references the RClass of its class; its own methods (called class methods, those that you define with def self.method_name). These methods are not stored in the class itself, but in its metaclass, sometimes called the singleton class. This singleton class is what klass is set to, not Class. It also has a table of its own class-level instance variables. A Ruby class has a pointer to a superclass, which allows it to implement inheritance. A Ruby class will have a super pointer that references its superclass. In this case, Pear will have a klass pointer referencing Class, and a super pointer referencing Fruit. There is a distinction between the two: klass refers to the class from which the Pear class object is instantiated, whereas super refers to Pear’s logical superclass. The Pear class object is not instantiated from Fruit, but Class. A Ruby class contains constants. 
Constants are stored in a constants table. Ruby classes are also objects: when a class is created, Ruby actually creates two objects, the class object itself as well as a metaclass object. Class Variables. In Ruby, there are class variables, which are distinct from the class-level instance variables mentioned above. Class-level instance variables are defined with a single @ prepended to the variable name and are scoped to each individual class…
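The klass-versus-super distinction and the class-variable-versus-class-level-instance-variable distinction can both be observed from plain Ruby, using the post's Fruit and Pear names (the variable names below are my own):

```ruby
class Fruit
  @kind  = "class-level ivar"  # single @ at class scope: belongs to Fruit alone
  @@note = "class variable"    # double @@: shared down the inheritance hierarchy

  def self.kind; @kind;  end
  def self.note; @@note; end
end

class Pear < Fruit; end

Pear.superclass  # => Fruit (the super pointer: Pear's logical superclass)
Pear.class       # => Class (the klass side: what the Pear object is an instance of)
Fruit.kind       # => "class-level ivar"
Pear.kind        # => nil (Pear has its own, separate, class-level ivar table)
Pear.note        # => "class variable" (class variables are shared with subclasses)
```

This is exactly why class variables are often considered surprising: a subclass silently shares (and can overwrite) its parent's @@ variables, while class-level instance variables stay scoped to each individual class.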

How Closures Actually Work


Like objects, closures are a mechanism for containing state. In JavaScript, a closure is created whenever a function accesses a variable defined outside the immediate function scope. It’s easy to create closures: Simply define a function inside another function, and expose the inner function, either by returning it, or passing it into another function. The variables used by the inner function will be available to it, even after the outer function has finished running. Eric Elliott, The Two Pillars of JavaScript — Pt 2: Functional Programming Most online articles on closures start with a definition that resembles something like the above. It’s a description of its behaviour and how to create one. What most of them (read: all of them) fail to do is explain how closures are actually implemented, and then, why they behave the way they do. I believe this is important for truly understanding closures, and eventually, why closures are as important as they are. This is a post on how closures are implemented in Ruby, and it is directly inspired by Pat Shaughnessy’s excellent book Ruby Under a Microscope. I also wrote two other posts as notes while going through the book: How Ruby Classes Are Implemented and How Ruby Objects Are Implemented. A closure is a data structure that contains a lambda expression, which is a function that takes a set of arguments, and an environment to be used when the lambda expression is invoked. Closures in Ruby can be created in a few different ways. Here, a closure is created using a block: When the first line is executed, Ruby first creates an RString structure representing the string on the heap, and then pushes a reference to it onto its internal stack, in the current stack frame. 
At the same time, the current rb_control_frame_t on the Ruby call stack also has an EP pointer (EP stands for Environment Pointer) referencing the current stack frame: When the second line is executed, the Ruby tokenizer has already determined that a block is present to be passed to the method as an argument. YARV pushes a C structure representing the block, called rb_block_t, onto the call stack. rb_block_t contains a few things: self is the value the self pointer in the block’s environment will have. klass refers to the class of the current object. EP is identical to the EP pointer found in the preceding rb_control_frame_t, which references the current internal stack frame. iseq is a pointer to the compiled YARV instructions that correspond to the Ruby code inside the block, which in this case is: If we disassemble the line above, we can see the…
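The post's original snippet isn't reproduced above, so here is a minimal stand-in exhibiting the behaviour being explained: because the closure's environment pointer references the enclosing frame's locals, those locals survive after the method returns.

```ruby
def make_counter
  count = 0           # a local in make_counter's environment
  -> { count += 1 }   # the lambda's EP keeps that environment alive
end

counter = make_counter  # make_counter has returned, yet...
counter.call            # => 1
counter.call            # => 2 (count persists between calls)
```

Each call to make_counter produces a fresh environment, so two counters never interfere with each other; this is the "data structure containing a lambda expression and an environment" definition made concrete.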