Deploying a Load-Balanced To-Do List App with Docker, AWS, and Nginx

The following is a very indulgent exercise (I wanted to spend some more time working with AWS, Docker, and DevOps in general).

Background:

Let’s imagine a world in which everyone uses pen and paper (and maybe sticky notes) to keep track of their to-do lists, but no one has thought to write an app for that.

But you’re a smart guy/gal and decide to build an app to do just that. After a couple of hours, you have a pretty barebones to-do list running locally on your MacBook Air. When you’re done with a task, you can check it off, or you can delete it. There’s no database (the app relies on localStorage in the user’s browser). You haven’t deployed it yet: it only exists on your local machine and in a private GitHub repo.

Screen Shot 2016-05-17 at 5.02.01 PM

At some point, a buddy of yours sees you using this to-do list — let’s call it Tudoo. He wants to use it, too. So you end up deploying Tudoo, serving it up on a single machine. Now your friend (and the rest of the office) can use it. This works for about a week.

Word quickly spreads, and now you’re getting thousands upon thousands of users simultaneously going to Tudoo. Your single-machine deployment can’t handle that much traffic. There’s a lot of downtime. People are getting annoyed.

Screen Shot 2016-05-17 at 5.07.15 PM

But this is good – this means there is high demand for what you’ve built. You may need to start thinking more long-term with your development goals. How do you allow for more concurrent connections to your site? How do you continue to build without worrying that what worked in development might not work in production? Where do you get started with all this if you’re a young developer?

Roadmap:

In the following post, I’ll take some basic steps toward resolving the issues above (or at least start resolving them). These steps will generally look like this:

  1. Setting up development on an EC2 instance
  2. Dockerizing the app
  3. Deploying the app to a load-balanced cluster of machines
  4. Stress-testing the app

Of course, I’m not saying this is the 100% correct way to do this, but it is one way I’ve seen it done for apps operating at a much larger scale than Tudoo. You wouldn’t actually do something like this for something as trivial as a to-do list (unless, of course, you needed to).

I chose to develop on a remote EC2 instance to maintain a consistent, predictable development environment, and so I wouldn’t have to use up any of the disk space of my local machine, but you definitely don’t have to do this. (I just wanted to get familiar with doing things this way).

We’re containerizing our app using Docker so that we don’t run into any incompatibilities between development and production. If you’re on a Mac, installing Docker Toolbox sets you up with a lightweight Docker virtual machine, the Docker daemon and client, and Docker Compose. Since I’m developing on an Amazon Linux machine, things will be slightly different.

And we’re load-balancing our app with an Nginx reverse proxy so that all traffic doesn’t end up hitting just one server (and thus overwhelming it).

Taking these measures will ultimately speed up further development and allow us to extend Tudoo’s functionality more reliably.

OK, let’s begin.

Developing on an EC2 Instance

Reasoning

To reiterate, this part is overkill — I just wanted to get practice doing this. Why is it overkill? Because the very reason someone would actually do this is basically taken care of by using Docker. Docker virtualizes the operating system so that, regardless of whether your local development machine is a Mac or Windows or whatever, the container in which your app runs sits atop a Linux distro.

Some valid reasons why someone would use a remote dev machine, even if they’ve taken care of the above by using Docker?

  • Now your local computer doesn’t eat up any more of the hard drive installing global dependencies! You just install everything on that remote EC2 instance.
  • You also now happen to have backups in case anything goes weird. Your source code is on GitHub, your environment is on AWS, your images are on Docker Hub, and you can keep local copies of the code on your machine if you choose.
  • I’ve spoken to freelancers who use this setup when working with multiple clients with varying tech needs.

Launching an EC2 instance

(I’m assuming you already have a non-root user set up in Identity & Access Management aka IAM. If you don’t, check out this article on how to do that!).

(1) Log in to your non-root admin user account’s AWS console (https://console.aws.amazon.com).

Screen Shot 2016-05-20 at 5.18.52 PM

(2) Set up a security group with rules that expose ports 22 (SSH), 80 (HTTP), 443 (HTTPS), and 8080 (custom TCP rule). This security group, when associated with an EC2 instance, will let AWS know to allow inbound requests to those ports. Security Groups are located in the Network/Security section of the EC2 Management console.

Screen Shot 2016-05-20 at 5.24.07 PM

(3) Configure a new EC2 instance to run a t2.micro Amazon Linux AMI box, associating with it the security group you created in step 2. You can start the process of launching a new instance by going to the Instances section of the EC2 Management console.

Screen Shot 2016-05-20 at 5.26.53 PM

Screen Shot 2016-05-20 at 5.27.55 PM

Screen Shot 2016-05-20 at 5.28.49 PM

(4) Launch the instance, making sure to select a key pair whose private .pem file you have (you can create a new key pair at this step). Take note of your instance’s IP address and/or public DNS! These will come in handy later when preparing to work off of the instance. Now you have an EC2 instance launched (it might take a couple of minutes for it to get up and running)!

create-key-pair
From Yevgeniy Brikman’s awesome blog

(5) SSH into your EC2 instance from your terminal.

# Assuming this is where you saved the .pem file
# for the EC2 key pair you chose when launching the instance.

cd ~/.ssh

# Make the .pem file readable by its owner only
# (ssh refuses to use private keys with looser permissions).

chmod 400 YOUR_AWS_CREDENTIALS.pem

# SSH into your EC2 instance, passing your .pem file in as the
# identity file that authenticates the connection.
# Now is the time to go dig up your EC2 instance's
# IP address or public DNS.

ssh -i YOUR_AWS_CREDENTIALS.pem ec2-user@YOUR_EC2_IP_ADDRESS

After entering that last command you should see something like this:

Screen Shot 2016-05-20 at 5.50.42 PM

(6) Install dependencies on your EC2 instance and set up the proper user permissions.

# YUM is the package manager on Red Hat-based distros such as
# Amazon Linux (similar to apt-get on Ubuntu). We'll be installing
# emacs, git, and docker, then firing up the Docker service.

sudo yum update -y
sudo yum install -y emacs
sudo yum install -y git
sudo yum install -y docker
sudo service docker start

# Add ec2-user to the docker group so it can run docker
# commands without sudo (this takes effect on your next login).

sudo usermod -a -G docker ec2-user

# Now exit! This will end your connection to your EC2 instance.
# When you SSH back in, the new group membership applies.

exit

 

Dockerizing the App

Again, the above sequence is not necessary at all, so I won’t focus too much on it moving forward. But if you were ever interested or are planning on developing this way, now you know 🙂

If you’re developing on your local Mac, make sure that you have the Docker Toolbox installed and that you know how to start up your Docker Machine. If not, here’s a useful resource to figure out how.

If you’re working on a remote EC2 instance, I assume you know your way around emacs. If not, here’s a good place to start.

Reasoning for Docker

docker-containers-vms

Docker takes advantage of Linux containers, virtualizing the operating system and thus providing a much more lightweight way to simulate multiple app instances running on one machine. For more on this, here’s a great resource for beginners to Docker.

What I’ve found particularly awesome about Docker is that, in development, you can run multiple containers that communicate with each other (via Docker networking).

So if you wanted to load-balance your app, putting 2 app instances behind an Nginx reverse proxy, you can set up 3 different containers (1 container running Nginx and 2 containers running instances of your app).

You can now test a nice simulation of the very infrastructure you’d be using in production while in development. And then when preparing this load-balanced setup for production, you don’t have to change much. (At least, as far as I’ve encountered.)

But I’m getting ahead of myself. Let’s get started on building a Docker image, running a Docker image, and then pushing that Docker image up to Docker Hub.

Building a Docker Image

A Docker image is basically a blueprint for how you want your container to be built. To build an image, we first have to “sketch” the image using a Dockerfile. This is a good example of “Infrastructure as Code” — you have a DevOps-related setup file that gets parsed and executed, just like any other script.

Our first task is to build a Docker image for our NodeJS app. When we run this Docker image, we want it to be akin to typing “npm start” or “node server.js” as we usually do it (except our app will be running in a container).

(1) Review app’s directory structure

Just so we’re on the same page, my files are organized as such:

todo-app
-- node
   -- public (folder where all my assets/scripts go)
   -- server.js
   -- package.json
-- nginx
   -- nginx.conf (config file for nginx)
-- .dockerignore (similar to gitignore)
-- .gitignore

(2) Create a Dockerfile in the node directory

# Make sure you're in the node directory,
# then create a Dockerfile for node.
# You can just call it 'Dockerfile', but
# I like giving it a more descriptive name
# so that I don't get confused later.

cd node
touch node.dockerfile

Then open this Dockerfile in your favorite text editor and get coding.

Screen Shot 2016-05-20 at 6.27.43 PM

Let me walk through this.

FROM signifies which existing image to set as the base for everything else that follows. In this case, I’m using “node:argon”, the Node.js 4.x Long-Term Support (“Argon”) release, but you can also use “node:latest” or any specific release you’d like.

RUN executes a command while the image is being built. Here it creates a directory called /usr/src/app, and WORKDIR then sets that directory as the working directory for the instructions that follow and for the running container.

COPY copies my package.json into the image’s working directory, after which we RUN npm install, which installs all our dependencies into the image. Neato.

A second COPY then copies all of our app files in node/ to the working directory (/usr/src/app).

EXPOSE (you guessed it) declares that the container listens on port 8080. It doesn’t publish the port by itself; we still map it to a host port with -p when we run the container.

Finally, CMD sets the default command the container runs when it starts. In this case, we just fire up our server with npm start.

To recap, the Dockerfile is an instruction set. When we build from the Dockerfile, we get a full-fledged blueprint called a Docker image. Then when we run that image, Docker creates a container and runs our app in it according to the blueprint that is the Docker image.

Dockerfile to build images_0

(3) Build a Docker image from the Node app’s Dockerfile!

# Make sure you're still in the node app's directory.
# The -t flag tags the resulting image with a particular name.
# I suggest naming it your-dockerhub-name/todo-app.
#
# That dot at the very end is important! It is the build
# "context": the directory whose files the Dockerfile's COPY
# instructions can see, and the base for their relative paths.
# In this case, just our current node/ directory.
#
# The trailing backslash continues the command onto the next line.

docker build -f node.dockerfile \
-t <user-namespace>/<image-name> .

# You should now see this image listed by your Docker client.

docker images

(4) Running a Docker image

# Now to run the image we've created, we do three things:
#
# 1) Run it in the background (detached mode, '-d').
# 2) Use '-p 80:8080' to map the host's port 80 (the HTTP port
#    opened in the EC2 instance's security group) to the
#    container's port 8080 (the port our app listens on, which
#    we EXPOSEd in the Dockerfile).
# 3) Specify which image to run.

docker run -d -p 80:8080 <user-namespace>/<image-name>

# We should see this running!

docker ps

# And this isn't just a theoretical "running". Navigate to
# YOUR_IP_ADDRESS:80 in Google Chrome, and your app will be live.
# If you're on Mac/Windows and aren't sure what your Docker IP is,
# run docker-machine ip to check. If you're developing on an EC2
# instance, this is the IP address you used to SSH in.

# To see more data about your container, run either command below
# followed by the first few characters of the running container's
# ID (which can be found with 'docker ps').

docker logs <ID>     # displays the server's logs
docker inspect <ID>  # displays low-level details such as mounts and network settings

(5) Tearing down a Docker container

# Had enough and want to shut down the container? Use the first
# few characters of the container's ID with the commands below.

docker stop <ID>

docker ps     # should show nothing
docker ps -a  # should show the container we just stopped

docker rm <ID>  # removes the stopped container

docker ps -a  # now should show nothing

(6) Pushing an image up to DockerHub

An added benefit of Docker is DockerHub, its own GitHub-like service. Once you've built your custom Docker images and pushed them up, they become accessible from anywhere. Just as GitHub provides a single source of truth for version-controlled code repositories, DockerHub does the same for versioned image repositories.

This is huge for a couple of reasons.

First, you can push new layers of your Docker image up, and anyone on your team can then pull the latest images down.

Second, you can pull down the latest images of almost anything, ranging from Ubuntu and Node to Nginx and any other convenient image that anyone has publicly published.

I assume you already have a DockerHub account. If not, go ahead and create one. Then follow along with the code below.

# Log in and push your image up. Done.
#
# You can now go to hub.docker.com and see your image!
#
# If you run into any issues, it's probably because you didn't
# namespace your image to match your DockerHub account name, in
# which case rebuild your Docker image with the appropriate
# namespace and try this again.

docker login  # prompts for your DockerHub credentials
docker push <user-namespace>/<image-name>

# What if you don't want to keep the images on your machine? They
# can take up some space (500+ MB).

docker images    # note the first few characters of the image's ID
docker rmi <ID>  # deletes the image
docker images    # the image should now be gone

# What if you're working on a new machine and don't have the image
# locally? Now that it's on DockerHub, you can just pull it down.

docker pull <user-namespace>/<image-name>

At this point, if you try to visit the app at the IP address and port, it will no longer connect. We have stopped and removed our container.

Moving on to greener pastures, we are now going to start the process of deploying the app to a load-balanced cluster of machines.

Running Load-Balanced App Instances Locally

Reasoning

Short version: just to show you how we can do this using Docker and AWS. A todo list doesn't need this at all.

Longer version: because a large app with many users trying to simultaneously access the same resources will drop to its knees if there isn't a way to manage the load.

So these are the steps we'll take to get this app properly load-balanced. I'm using Nginx because it is a lightweight and super powerful reverse proxy server. You could also have used Amazon's managed Elastic Load Balancer (ELB), which can sit in front of services running on Amazon's ECS.

  1. Set up Nginx configuration
  2. Write and build an image for a container to run the Nginx server
  3. Create two different containers that run separate instances of the to-do list app
  4. Create a container for the Nginx service and link it to the two app containers that are already running. This will put the load-balancing action in motion.

(1) Set up Nginx config file

Screen Shot 2016-05-21 at 1.17.18 AM.png

I won't go through all of this, but I'll cover the main points. I'm specifying an http block with an upstream group containing node1 and node2, each serving on port 8080. I'm also indicating that this Nginx server will listen on port 80.

In other words, this Nginx server will listen on port 80 for requests, then forward them to either 'node1' or 'node2'. These are hostnames, not variables; Docker will resolve them to our app containers later on.

The rest of the information here is important too, but it covers things like how to distribute requests across the upstream nodes, whether to gzip responses, and what to do with request/response headers.
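Unless the config selects a different method, nginx distributes requests across an upstream group using weighted round-robin. With equal weights, that simply alternates between the servers. The toy model below shows that default behavior with the two upstream servers described above. It uses a hypothetical `createRoundRobin` helper written only for illustration, not nginx code:

```javascript
// Toy model of round-robin load balancing (hypothetical helper, not
// nginx's implementation). Each call to pick() returns the next upstream
// server in order, wrapping back to the first after the last one.
function createRoundRobin(upstreams) {
  if (upstreams.length === 0) {
    throw new Error("an upstream group needs at least one server");
  }
  let next = 0; // index of the server that gets the next request
  return function pick() {
    const chosen = upstreams[next];
    next = (next + 1) % upstreams.length; // advance and wrap around
    return chosen;
  };
}

// The same two upstream servers as in the nginx config.
const pick = createRoundRobin(["node1:8080", "node2:8080"]);
const picks = [];
for (let i = 0; i < 4; i++) picks.push(pick());
console.log(picks.join(" -> "));
// prints node1:8080 -> node2:8080 -> node1:8080 -> node2:8080
```

Real nginx also accounts for server weights and marks failed servers as unavailable. This model ignores both.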

(2) Write and build an image for a container to run the Nginx server

The next step is to build a Docker image that will set us up to run this Nginx server the way we need it to. So obviously we need a Dockerfile. Below, I've created one in the nginx/ directory called nginx.dockerfile:

Screen Shot 2016-05-21 at 1.24.02 AM.png

This should be pretty straightforward.

FROM sets the base image to nginx, which Docker uses from your local image cache if it's there and otherwise pulls down from DockerHub.

RUN creates space for error log files in the container.

COPY copies the config file over to the location where the container expects to find it.

EXPOSE declares port 80 as the port the container listens on for inbound requests; we still publish it with -p when we run the container.

Next step is to build an image for our Nginx server:

cd nginx  # if you're not already in that folder

# The below should all be one continuous command.
# Don't forget that trailing dot to set the context
# for the image to build from.

docker build -f nginx.dockerfile \
-t <user-namespace>/my-nginx .

docker images  # <user-namespace>/my-nginx should be there now

docker push <user-namespace>/my-nginx  # let's get this up to DockerHub

(3) Run the load-balancer against two app instances on your development machine

# First, we run two separate instances of the to-do NodeJS app.
# By giving these containers specific names, we can refer to
# them later. I didn't specify a host port, just the container
# port to publish (8080), so Docker picks a random host port.
#
# Two things to note:
# - the container names match the upstream names in our nginx config file
# - the ports for these 'upstream nodes' are also the same

docker run -d --name node1 -p 8080 <name-of-app-image>
docker run -d --name node2 -p 8080 <name-of-app-image>

docker ps  # these two nodes should be running

# Notice what the command below does:
# - Maps the host's port 80 to the nginx container's port 80.
# - Uses legacy container links on Docker's default bridge network:
#   '--link node1:node1' makes the hostname 'node1' inside the nginx
#   container resolve to our running 'node1' container (and sets
#   related env variables). Same for 'node2'. Each link needs its
#   own --link flag.

docker run -d --name nginx -p 80:80 --link node1:node1 \
--link node2:node2 <name-of-nginx-image>

docker ps  # should see 3 running containers: node1, node2, nginx

If all went as planned, you can go to the IP address of where you're developing (either your dev EC2's IP address or your Docker Machine's IP) and see the app running on port 80!

All HTTP requests to the local machine's port 80 are handled by the Nginx reverse proxy, which then passes each request off to either node1 or node2. To see this in action, a fun experiment is to make node2 serve a completely different app than node1. You'll see that Nginx sometimes forwards you to node1 and other times to node2.

Screen Shot 2016-05-17 at 5.19.20 PM

This is all happening on one machine, but from Nginx's point of view it is forwarding requests to two different servers (really, two containers running side by side).

Note: It was pretty annoying to have to manually run each of those containers. Docker Toolbox ships with something called Docker Compose that actually is kind of like a meta-Dockerfile. It abstracts all these individual commands into one file. I chose not to use it here, but it's definitely far more productive to use Docker Compose, which lets us do the above in a simple 'docker-compose up' and 'docker-compose down'.

The last step is to stop and remove these running containers.

docker ps  # note the first few characters of each ID
docker stop <nginx-id> <node1-id> <node2-id>  # stops all three

docker ps     # should see nothing
docker ps -a  # should see node1, node2, and nginx recently stopped

docker rm <nginx-id> <node1-id> <node2-id>  # removes all three
docker ps -a  # should see nothing

Now when we actually want to deploy this across a cluster of different machines, this same abstraction doesn't have to be changed much.

Deploying the Load-Balanced App to AWS Using Docker Cloud

Because we've pushed our Docker images up to DockerHub, we can use a great tool called Docker Cloud. Docker Cloud lets you provision EC2 instances (or their Azure equivalents) and deploy your Docker images onto them, running in containers.

This part is fairly straightforward thanks to Docker Cloud's awesome UI.

(1) Go to cloud.docker.com (log in if you haven't yet)

Screen Shot 2016-05-21 at 11.56.30 AM

(2) Go to 'Cloud Settings' and input your AWS credentials under Cloud Providers.

Screen Shot 2016-05-21 at 11.59.55 AM

(3) Launch a new 'Node Cluster'. Name it, pick your provider + instance type, and choose to launch 3 nodes (1 for Nginx, 2 for the app instances)

Screen Shot 2016-05-21 at 12.02.41 PM

(4) Create and deploy a new 'Stack' of services. This is Docker Cloud's version of Docker Compose! It defines which services we want to run in our containers and deploys them across the available nodes in our node cluster.

Screen Shot 2016-05-21 at 12.04.44 PM

Here's the Stackfile we need to add here.

Screen Shot 2016-05-21 at 12.05.49 PM.png

I am defining aliases for three services -- nginx, node1, and node2. For each, I specify which Docker image to associate with it, as well as ports to publish. I also link the nginx service to the two app services.

It looks a lot like the commands we typed in manually a few steps ago:

docker run -d --name node1 -p 8080 <name-of-app-image>
docker run -d --name node2 -p 8080 <name-of-app-image>

docker run -d --name nginx -p 80:80 --link node1:node1 \
--link node2:node2 <name-of-nginx-image>

The Stackfile handles the above code for us in a more automated fashion.

Go ahead and create the Stack & deploy it. After a couple minutes, you'll see that our stack of services has been deployed across the available nodes in the cluster we defined earlier.

Screen Shot 2016-05-21 at 12.22.44 PM

If you scroll down, you'll see an endpoint for your nginx service. Go visit it. This is what I see now:

Screen Shot 2016-05-21 at 12.27.11 PM

Woohoo! We have successfully deployed our todo list that is being load balanced by an Nginx service.

Screen Shot 2016-05-17 at 5.09.42 PM

The advantage of this is that, as our needs grow, our infrastructure can easily grow with them. Using Docker Cloud, we can easily scale the number of deployed nodes up and down, or even set up node clusters in different regions for better availability.

The next steps here would be to use a custom domain name, but I think this is a good stopping point.

Conclusion

This is by no means comprehensive, and I've definitely flubbed some explanations above, but I hope this helps you get over the "deployment jitters" and return your focus to building great stuff.


Recursive Solutions to Flattening an Array

Given a nested array of integers, write a function that returns a flattened array – all nesting is removed, each element is an integer, and order is maintained.

Example:

let nested = [1, [10, 12, 1], [4, [2, 0]], 30];
let flattened = flatten(nested);
console.log(flattened) // [1, 10, 12, 1, 4, 2, 0, 30]

The problem here doesn’t require anything crazy. It does, however, elicit a discussion on how to generally approach problem solving. Successfully solving a problem using recursion usually requires some of this upfront (as opposed to playing around with a while-block and pointers until you get what you want).

First, what are the cases? What could the inputs look like, given an input nested array of integers?

[] // empty
[3] // one-element array (int)
[[1,2,3]] // one-element array (another array)
[1, 2, 3, 4, 5] // array of ints
[1, 2, [3,4,5]] // mix of ints and arrays of ints
[1, 2, [3, [4,5]], 6] // arrays within arrays within arrays

Second, what’s the roadblock?

The complication arises when encountering arrays within arrays (within arrays). If we were dealing with just integers, this would be easy (and pointless) because the input would already be flat.

Third, given the cases and the roadblock, how do we break up this “flattening” problem into smaller subproblems?

Each element in the nested array is either an integer or another array. If it’s an integer, we just need to add it to our flattened array. If it’s an array, though, we need to find a way to “extract” its integer elements and append them to our output array.

So that’s how our problem gets subdivided: as we go through the input array, if the element is an integer, we add it to our output array. If the element is another array, though, we repeat this same procedure locally on the subarray.

Basic Recursive Solution

This conversational routine forms the basis of a recursive solution.

flattenRecursive

If the current input is just a number, the function returns that number as the sole element of an array.

If the input is an array and has no length, the function returns an empty array.

Otherwise, the function concatenates the return value of a recursive call on the “tail” of the array (list.slice(1)) onto a recursive call on the “head” of the array (list[0]).
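The original code is only available as a screenshot. Here is a sketch reconstructed from the three rules above, so treat it as an approximation rather than the exact original. The name `flattenRecursive` comes from the screenshot's caption, and the code assumes every non-array element is a number:

```javascript
// Basic recursive flatten, reconstructed from the description above.
function flattenRecursive(list) {
  // Base case 1: a bare integer becomes a one-element array.
  if (typeof list === "number") return [list];

  // Base case 2: an empty array flattens to an empty array.
  if (list.length === 0) return [];

  // Recursive case: flatten the head, flatten the tail, then join them.
  return flattenRecursive(list[0]).concat(flattenRecursive(list.slice(1)));
}

const nestedExample = [1, [10, 12, 1], [4, [2, 0]], 30];
console.log(JSON.stringify(flattenRecursive(nestedExample)));
// prints [1,10,12,1,4,2,0,30]
```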

Basic Recursion – Pros and Cons

This is just one solution. It’s pretty transparent in how it handles the cases and roadblocks we established. But this solution, depending on the Javascript engine/optimizer being used, may not be the most memory-efficient or time-efficient.

For instance, each time Array.prototype.concat() is called, a new array is created; depending on the browser or environment, concat() and slice() may or may not be optimized.

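Additionally, this solution is irksome because it makes two recursive calls per function call and can't return until both have finished. The recursion on the tail goes one level deeper for every element, so the call stack grows linearly with the length of the array (a large enough input will overflow it). And because slice() and concat() each copy up to n elements at every level, the total work grows roughly quadratically, O(n²), in the worst case.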

Tangentially, the recursive call is not in tail position. Tail-positioned recursion is when the return value of that call is the final desired return value (aka we don’t need to otherwise add/divide/multiply/concat/transform it after getting the return value).

This matters because ES6 (ES2015) specifies proper tail calls, often called tail-call optimization (TCO). When a recursive call is in tail position, the engine can reuse the current stack frame instead of allocating a new one for every call, because the caller has nothing left to do but return that call's value. That would let us express ourselves in JavaScript using recursion without blowing up the call stack. In practice, support is limited: Safari's JavaScriptCore implements proper tail calls in strict mode, while V8 (Chrome and Node) and SpiderMonkey (Firefox) do not, so deep recursion can still overflow the stack in most environments.
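For comparison, here is a hypothetical `flattenTail`, not one of the solutions from this post, in which every recursive call is in tail position. Pending work lives in an explicit `stack` array and the output accumulates in `acc`, so nothing is left to do after each call returns. Only engines that implement proper tail calls will run it in constant stack space. Elsewhere it still grows the stack, and the array spreads copy elements on each call, so treat it as an illustration of tail position rather than a faster solution:

```javascript
"use strict"; // proper tail calls (where supported) only apply in strict mode

// Hypothetical tail-recursive flatten: `stack` holds elements not yet
// processed, `acc` holds the integers emitted so far, in order.
function flattenTail(stack, acc = []) {
  if (stack.length === 0) return acc; // nothing left to process: done

  const [head, ...rest] = stack;
  if (Array.isArray(head)) {
    // Unpack the sub-array in front of the remaining work (keeps order).
    return flattenTail([...head, ...rest], acc); // tail call
  }
  acc.push(head);
  return flattenTail(rest, acc); // tail call
}

console.log(JSON.stringify(flattenTail([1, [10, 12, 1], [4, [2, 0]], 30])));
// prints [1,10,12,1,4,2,0,30]
```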

Phew, hopefully I didn’t botch anything up too badly in that explanation. The great part is that, if we aren’t happy with this solution, since we’ve used recursive thinking to really understand the problem, we won’t have much trouble trying out different approaches to solving it.

Divide-and-Conquer Solution

Another solution to consider takes the divide-and-conquer approach. D&C solves problems by first breaking up the problem into a set of smaller subproblems, then solving and “conquering” each of those back up into one merged solution.

Implementation-wise, this is very similar to the way mergeSort works. You divide the input array into left and right subarrays, and then break each of those down into their own left and right subarrays, until they are at the size we want (either empty or with one element in them).

Once we get to that point, we merge each left piece with its right piece by concatenating them. We then concatenate that merged array with its complementary merged array one level up, and so on, until all of these subarrays have been merged back up into one flattened array.

flattenDC
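Here is a sketch of the divide-and-conquer approach, reconstructed from the description above rather than copied from the `flattenDC` screenshot. It splits the array in half, flattens each half, and merges the results by concatenation. As in mergeSort, the base cases are an empty array and a single element, which is unwrapped if it is itself an array:

```javascript
// Divide-and-conquer flatten in the spirit of mergeSort
// (a sketch reconstructed from the prose, not the original code).
function flattenDC(list) {
  if (!Array.isArray(list)) return [list];          // an integer: wrap it
  if (list.length === 0) return [];                 // empty: nothing to add
  if (list.length === 1) return flattenDC(list[0]); // one element: unwrap it

  // Divide: split into left and right halves and flatten each.
  const mid = Math.floor(list.length / 2);
  const left = flattenDC(list.slice(0, mid));
  const right = flattenDC(list.slice(mid));

  // Conquer: merge the flattened halves (left before right keeps order).
  return left.concat(right);
}

console.log(JSON.stringify(flattenDC([1, [10, 12, 1], [4, [2, 0]], 30])));
// prints [1,10,12,1,4,2,0,30]
```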

Let’s walk through the time complexity of this solution.

Divide: The recursion bottoms out once for every integer in our flattened array; call that number k. We can't know k at call time, because we don't know how deeply nested the input array is.

For many divide-and-conquer algorithms, division is a function of the number of elements in the input (n). Each step halves the array (n, n/2, n/4, n/8, etc), so the recursion is O(log n) levels deep. If the input array isn’t deeply nested, this solution behaves similarly, because the input size is then a good proxy for the amount of work.

Conquer: Each level of merging concatenates, and therefore copies, at most k integers in total. With roughly log n levels for a shallowly nested input, merging costs about O(k log n), just as merging does in mergeSort.

Iterative Solution:

While we’re at it, let’s see what a less “recursive” solution would look like.

This time, we’ll use a for-loop to take care of most of the iterating across the array. The only time we’ll make recursive calls is if we run into an element that is not a ‘number’ primitive.

flattenLoop
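Here is a loop-based sketch matching that description. Again, it is reconstructed from the prose, not the screenshot's exact code. Numbers are pushed straight onto `result`, and only non-number elements, which are assumed to be arrays, trigger a recursive call:

```javascript
// Loop-based flatten: iterate with a for-loop and recurse only when an
// element is not a number (a sketch, assuming non-numbers are arrays).
function flattenLoop(list) {
  const result = [];
  for (let i = 0; i < list.length; i++) {
    const item = list[i];
    if (typeof item === "number") {
      result.push(item); // amortized O(1) append
    } else {
      // Only sub-arrays trigger recursion; copy their flattened values over.
      const inner = flattenLoop(item);
      for (let j = 0; j < inner.length; j++) result.push(inner[j]);
    }
  }
  return result;
}

console.log(JSON.stringify(flattenLoop([1, [10, 12, 1], [4, [2, 0]], 30])));
// prints [1,10,12,1,4,2,0,30]
```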

This reads very nicely, and we only make recursive calls if we absolutely have to. Otherwise, we simply push the current element onto the result array, which is an O(1) operation.

I like the reducer best though:

flattenReduce
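The reducer itself only survives as a screenshot. A reduce-based version in the same spirit might look like the following, assuming integers and arrays are the only element types:

```javascript
// Reducer-based flatten: fold each element into an accumulator array.
// Array.prototype.concat spreads array arguments one level and appends
// non-array arguments as-is, so it handles both cases.
function flattenReduce(list) {
  return list.reduce(
    (acc, item) => acc.concat(Array.isArray(item) ? flattenReduce(item) : item),
    []
  );
}

console.log(JSON.stringify(flattenReduce([1, [10, 12, 1], [4, [2, 0]], 30])));
// prints [1,10,12,1,4,2,0,30]
```

In engines that support ES2019, the built-in `nested.flat(Infinity)` flattens arbitrarily deep arrays without any hand-written recursion.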

There are many ways to solve this simple problem, each with its pros and cons (and I definitely missed out on some other good ones!). Solving small problems like this in many different ways is reminiscent of that phrase “Make it work, make it right, make it fast.” The most important piece is understanding the problem and coming up with something that gets the job done, even if it’s suboptimal, ugly, or hard to follow. Then, as your needs for its performance increase, the need to refactor the solution also increases.

Madlibs and Thomas Piketty

“To put it bluntly, the discipline of economics has yet to get over its childish passion for mathematics and for purely theoretical and often highly ideological speculation, at the expense of historical research and collaboration with the other social sciences. Economists are all too often preoccupied with petty mathematical problems of interest only to themselves.” – Thomas Piketty

Let’s reimagine this as a madlib:

To put it bluntly, the discipline of __________ has yet to get over its childish passion for ___________ and for purely theoretical and often highly ideological speculation, at the expense of historical research and collaboration with the other social sciences. __________ are all too often preoccupied with petty ____________ of interest only to themselves.

Fill in the blanks with “marketing,” “social media,” “political science,” “innovation,” “branding,” etc.

Differentiation

Paraphrasing a cool thought I recently heard:

Products that are extremely similar in quality compete on brand. Think beer and soda. Corona v. Coors v. your-favorite-craft-brew. Coke v. Pepsi.

Products that have truly unique value propositions don’t have the same need to compete on brand. The product itself is the brand. Think Apple v. Samsung.

Connection

How do we get people to care?

Meeting people at the intersection of “How does this very concretely affect me?” and “How does this change my conception of how the world works?” Very tricky to strike this balance.

It’s easy to find the concrete effects — just look at BuzzFeed articles like “25 signs that you’re a ’90s kid” or blog posts like “25 proven tactics to xyz”.

It’s easy to find the paradigm shifters — read anything by Daniel Kahneman or Nassim Taleb.

Concrete effects are excessively accessible. But because they are immensely accessible, they probably don’t have enduring value. Quick popcorn bites.

Paradigm shifters are much less accessible — most of them are locked up in time-intensive books and experiences.

Finding a way to meet both of these halfway may be the key to really getting people to care. Most “concrete effects” pieces vastly underestimate how smart people are, while most “paradigm shifters” underestimate how scatterbrained people are.

Guidelines and Trimming the Hedges

I was recently reminded of a scene from a pretty average movie (Stealing Harvard) that shouldn’t be as funny as it is, but it still cracks me up.

John and his goofy landscaper friend Duff are chatting. Duff has a rope pulled taut across the top of a row of hedges he’s trimming. Presumably to guide him as he trims them.

Duff instead goes to town on the hedges with the landscape world’s equivalent of a chainsaw.

John: Uh, Duff? Don’t you think you’re taking too much off the top?
Duff: Look, John, I know what I’m doing– I have to taper it so the sunlight will reach the lower leaves during the growing season.
John: Then what about this string?
Duff: The string is a guide, John. It’s just a guide.

Revisiting this scene reminded me of the strange “guides” that steer our own decision-making. We adhere to some of them too strictly and veer from others too often. There’s a great thread on Quora that addresses a similar question of guidelines and frameworks, where writer Venkatesh Rao offered the following as a decent cross-section of principles worth treating as guides:

  1. “Premature optimization is the root of all evil” — Knuth, computer science

  2. “Follow the information” or “what are the primitive random variables?” — information theory, control theory, statistics

  3. Procrastination principle: “Most problems confronting a network can be solved later by others… don’t do anything that can be done later by users.” (an idea from a 1984 paper by Clark, David Reed and Jerry Saltzer).

  4. Never design a law with the worst case in mind — law/legislation

  5. Release early and often — software engineering

  6. Think aspirin, not vitamins — marketing

  7. Start with the simplest problem that you don’t know how to solve — general advice in technical PhDs

  8. Start with a contradiction: “X but also Y” where X and Y are in conflict: many artistic fields such as script writing (this is a useful method for creating character-driven plots, by defining the central tension that drives a character for example)