
JAVASCRIPT  Chapter 3

Topics: (Strict mode, Testing, Debugging, Error propagation, Exceptions, Selective catching, Assertions, Regular expressions, Matches and groups, The Date class, Choice pattern, Backtracking, Greed, Search method, Parsing an INI file, International characters, Modules, Packages, CommonJS, Module design, Asynchronous programming, Promises, Failure, Network flooding, Async functions, Generators, Asynchronous bugs)

Bugs and Errors

Flaws in computer programs are usually called bugs. It makes programmers feel good to imagine them as little things that just happen to crawl into our work. In reality, of course, we put them there ourselves. If a program is crystallized thought, you can roughly categorize bugs into those caused by the thoughts being confused and those caused by mistakes introduced while converting a thought to code. The former type is generally harder to diagnose and fix than the latter.

Language

Many mistakes could be pointed out to us automatically by the computer, if it knew enough about what we’re trying to do. But here JavaScript’s looseness is a hindrance. Its concept of bindings and properties is vague enough that it will rarely catch typos before actually running the program. And even then, it allows you to do some clearly nonsensical things without complaint, such as computing true * "monkey".

There are some things that JavaScript does complain about. Writing a program that does not follow the language’s grammar will immediately make the computer complain. Other things, such as calling something that’s not a function or looking up a property on an undefined value, will cause an error to be reported when the program tries to perform the action.

But often, your nonsense computation will merely produce NaN (not a number) or an undefined value, while the program happily continues, convinced that it’s doing something meaningful. The mistake will manifest itself only later, after the bogus value has traveled through several functions. It might not trigger an error at all but silently cause the program’s output to be wrong. Finding the source of such problems can be difficult. The process of finding mistakes—bugs—in programs is called debugging.
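To see this in action, here is a small illustration of our own; the multiplication mentioned above quietly yields NaN instead of an error:

console.log(true * "monkey");
// → NaN
console.log("five" * 2);
// → NaN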

Strict mode

JavaScript can be made a little stricter by enabling strict mode. This is done by putting the string “use strict” at the top of a file or a function body. Here’s an example:

function canYouSpotTheProblem() {
  "use strict";
  for (counter = 0; counter < 10; counter++) {
    console.log("Happy happy");
  }
}

canYouSpotTheProblem();
// → ReferenceError: counter is not defined

Normally, when you forget to put let in front of your binding, as with counter in the example, JavaScript quietly creates a global binding and uses that. In strict mode, an error is reported instead. This is very helpful. It should be noted, though, that this doesn’t work when the binding in question already exists as a global binding. In that case, the loop will still quietly overwrite the value of the binding.
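For example (a sketch of that situation, with happyLoop as an illustrative name), when counter already exists as a global binding, the strict-mode function silently clobbers it:

let counter = "important data";
function happyLoop() {
  "use strict";
  for (counter = 0; counter < 10; counter++) {
    console.log("Happy happy");
  }
}
happyLoop();
console.log(counter);
// → 10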

Another change in strict mode is that the this binding holds the value undefined in functions that are not called as methods. When making such a call outside of strict mode, this refers to the global scope object, which is an object whose properties are the global bindings. So if you accidentally call a method or constructor incorrectly in strict mode, JavaScript will produce an error as soon as it tries to read something from this, rather than happily writing to the global scope. For example, consider the following code, which calls a constructor function without the new keyword so that its this will not refer to a newly constructed object:

function Person(name) { this.name = name; }
let ferdinand = Person("Ferdinand"); // oops
console.log(name);
// → Ferdinand

So the bogus call to Person succeeded but returned an undefined value and created the global binding name. In strict mode, the result is different.

"use strict";
function Person(name) { this.name = name; }
let ferdinand = Person("Ferdinand"); // forgot new
// → TypeError: Cannot set property 'name' of undefined

We are immediately told that something is wrong. This is helpful. Fortunately, constructors created with the class notation will always complain if they are called without new, making this less of a problem even in non-strict mode. Strict mode does a few more things. It disallows giving a function multiple parameters with the same name and removes certain problematic language features entirely. In short, putting "use strict" at the top of your program rarely hurts and might help you spot a problem.
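To illustrate the duplicate-parameter rule (sum is just an example name), a definition like this is rejected in strict mode, whereas non-strict code accepts it silently:

"use strict";
function sum(a, a) { return a + a; }
// → SyntaxError (duplicate parameter name)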

Types

Some languages want to know the types of all your bindings and expressions before even running a program. They will tell you right away when a type is used in an inconsistent way. JavaScript considers types only when actually running the program, and even there often tries to implicitly convert values to the type it expects, so it’s not much help. Still, types provide a useful framework for talking about programs. A lot of mistakes come from being confused about the kind of value that goes into or comes out of a function. If you have that information written down, you’re less likely to get confused. You could add a comment like the following before the goalOrientedRobot function from the previous chapter to describe its type:

// (VillageState, Array) → {direction: string, memory: Array}
function goalOrientedRobot(state, memory) {
  // ...
}

There are a number of different conventions for annotating JavaScript programs with types. One thing about types is that they need to introduce their own complexity to be able to describe enough code to be useful. What do you think would be the type of the randomPick function that returns a random element from an array? You’d need to introduce a type variable, T, which can stand in for any type, so that you can give randomPick a type like ([T])→ T (function from an array of Ts to a T).
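A sketch of what such a randomPick might look like, with its type written as a comment in the same style as above (the body here is our own illustration, not code from an earlier chapter):

// ([T]) → T
function randomPick(array) {
  let choice = Math.floor(Math.random() * array.length);
  return array[choice];
}

console.log(randomPick(["a", "b", "c"]));
// → one of "a", "b", or "c"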

When the types of a program are known, it is possible for the computer to check them for you, pointing out mistakes before the program is run. There are several JavaScript dialects that add types to the language and check them. The most popular one is called TypeScript. If you are interested in adding more rigor to your programs, we recommend you give it a try. We’ll continue using raw, dangerous, untyped JavaScript code.

Testing

If the language is not going to do much to help us find mistakes, we’ll have to find them the hard way: by running the program and seeing whether it does the right thing. Doing this by hand, again and again, is a really bad idea. Not only is it annoying, it also tends to be ineffective since it takes too much time to exhaustively test everything every time you make a change.

Computers are good at repetitive tasks, and testing is the ideal repetitive task. Automated testing is the process of writing a program that tests another program. Writing tests is a bit more work than testing manually, but once you’ve done it, you gain a kind of superpower: it takes you only a few seconds to verify that your program still behaves properly in all the situations you wrote tests for. When you break something, you’ll immediately notice, rather than randomly running into it at some later time. Tests usually take the form of little labeled programs that verify some aspect of your code. For example, a set of tests for the (standard, probably already tested by someone else) toUpperCase method might look like this:

function test(label, body) {
  if (!body()) console.log(`Failed: ${label}`);
}

test("convert Latin text to uppercase", () => {
  return "hello".toUpperCase() == "HELLO";
});
test("convert Greek text to uppercase", () => {
  return "Χαίρετε".toUpperCase() == "ΧΑΊΡΕΤΕ";
});
test("don't convert case-less characters", () => {
  return "   ".toUpperCase() == "   ";
});

Writing tests like this tends to produce rather repetitive, awkward code. Fortunately, there exist pieces of software that help you build and run collections of tests (test suites) by providing a language (in the form of functions and methods) suited to expressing tests and by outputting informative information when a test fails. These are usually called test runners.

Some code is easier to test than other code. Generally, the more external objects that the code interacts with, the harder it is to set up the context in which to test it. The style of programming shown in the previous chapter, which uses self-contained persistent values rather than changing objects, tends to be easy to test.

Debugging

Once you notice there is something wrong with your program because it misbehaves or produces errors, the next step is to figure out what the problem is. Sometimes it is obvious. The error message will point at a specific line of your program, and if you look at the error description and that line of code, you can often see the problem. But not always. Sometimes the line that triggered the problem is simply the first place where a flaky value produced elsewhere gets used in an invalid way. If you have been solving the exercises in earlier chapters, you will probably have already experienced such situations.

The following example program tries to convert a whole number to a string in a given base (decimal, binary, and so on) by repeatedly picking out the last digit and then dividing the number to get rid of this digit. But the strange output that it currently produces suggests that it has a bug.

function numberToString(n, base = 10) {
  let result = "", sign = "";
  if (n < 0) {
    sign = "-";
    n = -n;
  }
  do {
    result = String(n % base) + result;
    n /= base;
  } while (n > 0);
  return sign + result;
}

console.log(numberToString(13, 10));
// → 1.5e-3231.3e-3221.3e-3211.3e-3201.3e-3191.3e-3181.3…

Even if you see the problem already, pretend for a moment that you don’t. We know that our program is malfunctioning, and we want to find out why. This is where you must resist the urge to start making random changes to the code to see whether that makes it better. Instead, think. Analyze what is happening and come up with a theory of why it might be happening. Then, make additional observations to test this theory—or, if you don’t yet have a theory, make additional observations to help you come up with one.

Putting a few strategic console.log calls into the program is a good way to get additional information about what the program is doing. In this case, we want n to take the values 13, 1, and then 0. Let’s write out its value at the start of the loop.

13
1.3
0.13
0.013
…
1.5e-323

Right. Dividing 13 by 10 does not produce a whole number. Instead of n /= base, what we actually want is n = Math.floor(n / base) so that the number is properly “shifted” to the right. An alternative to using console.log to peek into the program’s behavior is to use the debugger capabilities of your browser. Browsers come with the ability to set a breakpoint on a specific line of your code. When the execution of the program reaches a line with a breakpoint, it is paused, and you can inspect the values of bindings at that point. We won’t go into details, as debuggers differ from browser to browser, but look in your browser’s developer tools or search the Web for more information. Another way to set a breakpoint is to include a debugger statement (consisting of simply that keyword) in your program. If the developer tools of your browser are active, the program will pause whenever it reaches such a statement.
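For completeness, here is the repaired function with that fix applied; apart from the Math.floor line it is identical to the version above:

function numberToString(n, base = 10) {
  let result = "", sign = "";
  if (n < 0) {
    sign = "-";
    n = -n;
  }
  do {
    result = String(n % base) + result;
    n = Math.floor(n / base); // keep n a whole number
  } while (n > 0);
  return sign + result;
}

console.log(numberToString(13, 10));
// → 13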

Error propagation

Not all problems can be prevented by the programmer, unfortunately. If your program communicates with the outside world in any way, it is possible to get malformed input, to become overloaded with work, or to have the network fail. If you’re programming only for yourself, you can afford to just ignore such problems until they occur. But if you build something that is going to be used by anybody else, you usually want the program to do better than just crash. Sometimes the right thing to do is take the bad input in stride and continue running. In other cases, it is better to report to the user what went wrong and then give up. But in either situation, the program has to actively do something in response to the problem.

Say you have a function promptInteger that asks the user for a whole number and returns it. What should it return if the user inputs “orange”? One option is to make it return a special value. Common choices for such values are null, undefined, or -1.

function promptNumber(question) {
  let result = Number(prompt(question));
  if (Number.isNaN(result)) return null;
  else return result;
}

console.log(promptNumber("How many trees do you see?"));

Now any code that calls promptNumber must check whether an actual number was read and, failing that, must somehow recover—maybe by asking again or by filling in a default value. Or it could again return a special value to its caller to indicate that it failed to do what it was asked.

In many situations, mostly when errors are common and the caller should be explicitly taking them into account, returning a special value is a good way to indicate an error. It does, however, have its downsides. First, what if the function can already return every possible kind of value? In such a function, you’ll have to do something like wrap the result in an object to be able to distinguish success from failure.

function lastElement(array) {
  if (array.length == 0) {
    return {failed: true};
  } else {
    return {element: array[array.length - 1]};
  }
}

The second issue with returning special values is that it can lead to awkward code. If a piece of code calls promptNumber 10 times, it has to check 10 times whether null was returned. And if its response to finding null is to simply return null itself, callers of the function will in turn have to check for it, and so on.
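For example, a hypothetical promptSum that adds two user-supplied numbers has to thread the null check through every step:

function promptSum() {
  let a = promptNumber("First number");
  if (a == null) return null;
  let b = promptNumber("Second number");
  if (b == null) return null;
  return a + b;
}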

Exceptions

When a function cannot proceed normally, what we would like to do is just stop what we are doing and immediately jump to a place that knows how to handle the problem. This is what exception handling does.

Exceptions are a mechanism that makes it possible for code that runs into a problem to raise (or throw) an exception. An exception can be any value. Raising one somewhat resembles a super-charged return from a function: it jumps out of not just the current function but also its callers, all the way down to the first call that started the current execution. This is called unwinding the stack. You may remember the stack of function calls that was mentioned before. An exception zooms down this stack, throwing away all the call contexts it encounters.

If exceptions always zoomed right down to the bottom of the stack, they would not be of much use. They’d just provide a novel way to blow up your program. Their power lies in the fact that you can set “obstacles” along the stack to catch the exception as it is zooming down. Once you’ve caught an exception, you can do something with it to address the problem and then continue to run the program. Here’s an example:

function promptDirection(question) {
  let result = prompt(question);
  if (result.toLowerCase() == "left") return "L";
  if (result.toLowerCase() == "right") return "R";
  throw new Error("Invalid direction: " + result);
}

function look() {
  if (promptDirection("Which way?") == "L") {
    return "a house";
  } else {
    return "two angry bears";
  }
}

try {
  console.log("You see", look());
} catch (error) {
  console.log("Something went wrong: " + error);
}

The throw keyword is used to raise an exception. Catching one is done by wrapping a piece of code in a try block, followed by the keyword catch. When the code in the try block causes an exception to be raised, the catch block is evaluated, with the name in parentheses bound to the exception value. After the catch block finishes—or if the try block finishes without problems—the program proceeds beneath the entire try/catch statement.

In this case, we used the Error constructor to create our exception value. This is a standard JavaScript constructor that creates an object with a message property. In most JavaScript environments, instances of this constructor also gather information about the call stack that existed when the exception was created, a so-called stack trace. This information is stored in the stack property and can be helpful when trying to debug a problem: it tells us the function where the problem occurred and which functions made the failing call.

Note that the look function completely ignores the possibility that promptDirection might go wrong. This is the big advantage of exceptions: error-handling code is necessary only at the point where the error occurs and at the point where it is handled. The functions in between can forget all about it.

Cleaning up after exceptions

The effect of an exception is another kind of control flow. Every action that might cause an exception, which is pretty much every function call and property access, might cause control to suddenly leave your code. This means when code has several side effects, even if its “regular” control flow looks like they’ll always all happen, an exception might prevent some of them from taking place. Here is some really bad banking code.

const accounts = {
  a: 100,
  b: 0,
  c: 20
};

function getAccount() {
  let accountName = prompt("Enter an account name");
  if (!accounts.hasOwnProperty(accountName)) {
    throw new Error(`No such account: ${accountName}`);
  }
  return accountName;
}

function transfer(from, amount) {
  if (accounts[from] < amount) return;
  accounts[from] -= amount;
  accounts[getAccount()] += amount;
}

The transfer function transfers a sum of money from a given account to another, asking for the name of the other account in the process. If given an invalid account name, getAccount throws an exception. But transfer first removes the money from the account and then calls getAccount before it adds it to another account. If it is broken off by an exception at that point, it’ll just make the money disappear.

That code could have been written a little more intelligently, for example by calling getAccount before it starts moving money around. But often problems like this occur in more subtle ways. Even functions that don’t look like they will throw an exception might do so in exceptional circumstances or when they contain a programmer mistake.
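That more careful ordering might look like this (a sketch reusing accounts and getAccount from the example above):

function transfer(from, amount) {
  if (accounts[from] < amount) return;
  let to = getAccount(); // may throw, but no balance has been touched yet
  accounts[from] -= amount;
  accounts[to] += amount;
}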

One way to address this is to use fewer side effects. Again, a programming style that computes new values instead of changing existing data helps. If a piece of code stops running in the middle of creating a new value, no one ever sees the half-finished value, and there is no problem. But that isn’t always practical. So there is another feature that try statements have. They may be followed by a finally block either instead of or in addition to a catch block. A finally block says “no matter what happens, run this code after trying to run the code in the try block.”

function transfer(from, amount) {
  if (accounts[from] < amount) return;
  let progress = 0;
  try {
    accounts[from] -= amount;
    progress = 1;
    accounts[getAccount()] += amount;
    progress = 2;
  } finally {
    if (progress == 1) {
      accounts[from] += amount;
    }
  }
}

This version of the function tracks its progress, and if, when leaving, it notices that it was aborted at a point where it had created an inconsistent program state, it repairs the damage it did. Note that even though the finally code is run when an exception is thrown in the try block, it does not interfere with the exception. After the finally block runs, the stack continues unwinding. Writing programs that operate reliably even when exceptions pop up in unexpected places is hard. Many people simply don’t bother, and because exceptions are typically reserved for exceptional circumstances, the problem may occur so rarely that it is never even noticed. Whether that is a good thing or a really bad thing depends on how much damage the software will do when it fails.

Selective catching

When an exception makes it all the way to the bottom of the stack without being caught, it gets handled by the environment. What this means differs between environments. In browsers, a description of the error typically gets written to the JavaScript console (reachable through the browser’s Tools or Developer menu). Node.js, the browserless JavaScript environment we will discuss later, is more careful about data corruption. It aborts the whole process when an unhandled exception occurs.

For programmer mistakes, just letting the error go through is often the best you can do. An unhandled exception is a reasonable way to signal a broken program, and the JavaScript console will, on modern browsers, provide you with some information about which function calls were on the stack when the problem occurred. For problems that are expected to happen during routine use, crashing with an unhandled exception is a terrible strategy. Invalid uses of the language, such as referencing a nonexistent binding, looking up a property on null, or calling something that’s not a function, will also result in exceptions being raised. Such exceptions can also be caught.

When a catch body is entered, all we know is that something in our try body caused an exception. But we don’t know what did or which exception it caused. JavaScript (in a rather glaring omission) doesn’t provide direct support for selectively catching exceptions: either you catch them all or you don’t catch any. This makes it tempting to assume that the exception you get is the one you were thinking about when you wrote the catch block. But it might not be. Some other assumption might be violated, or you might have introduced a bug that is causing an exception. Here is an example that attempts to keep on calling promptDirection until it gets a valid answer:

for (;;) {
  try {
    let dir = promtDirection("Where?"); // ← typo!
    console.log("You chose ", dir);
    break;
  } catch (e) {
    console.log("Not a valid direction. Try again.");
  }
}

The for (;;) construct is a way to intentionally create a loop that doesn’t terminate on its own. We break out of the loop only when a valid direction is given. But we misspelled promptDirection, which will result in an “undefined variable” error. Because the catch block completely ignores its exception value (e), assuming it knows what the problem is, it wrongly treats the binding error as indicating bad input. Not only does this cause an infinite loop, it “buries” the useful error message about the misspelled binding.

As a general rule, don’t blanket-catch exceptions unless it is for the purpose of “routing” them somewhere—for example, over the network to tell another system that our program crashed. And even then, think carefully about how you might be hiding information. So we want to catch a specific kind of exception. We can do this by checking in the catch block whether the exception we got is the one we are interested in and rethrowing it otherwise. But how do we recognize an exception?

We could compare its message property against the error message we happen to expect. But that’s a shaky way to write code—we’d be using information that’s intended for human consumption (the message) to make a programmatic decision. As soon as someone changes (or translates) the message, the code will stop working. Rather, let’s define a new type of error and use instanceof to identify it.

class InputError extends Error {}

function promptDirection(question) {
  let result = prompt(question);
  if (result.toLowerCase() == "left") return "L";
  if (result.toLowerCase() == "right") return "R";
  throw new InputError("Invalid direction: " + result);
}

The new error class extends Error. It doesn’t define its own constructor, which means that it inherits the Error constructor, which expects a string message as argument. In fact, it doesn’t define anything at all—the class is empty. InputError objects behave like Error objects, except that they have a different class by which we can recognize them. Now the loop can catch these more carefully.

for (;;) {
  try {
    let dir = promptDirection("Where?");
    console.log("You chose ", dir);
    break;
  } catch (e) {
    if (e instanceof InputError) {
      console.log("Not a valid direction. Try again.");
    } else {
      throw e;
    }
  }
}

This will catch only instances of InputError and let unrelated exceptions through. If you reintroduce the typo, the undefined binding error will be properly reported.

Assertions

Assertions are checks inside a program that verify that something is the way it is supposed to be. They are used not to handle situations that can come up in normal operation but to find programmer mistakes. If, for example, firstElement is described as a function that should never be called on empty arrays, we might write it like this:

function firstElement(array) {
  if (array.length == 0) {
    throw new Error("firstElement called with []");
  }
  return array[0];
}

Now, instead of silently returning undefined (which you get when reading an array property that does not exist), this will loudly blow up your program as soon as you misuse it. This makes it less likely for such mistakes to go unnoticed and easier to find their cause when they occur. We do not recommend trying to write assertions for every possible kind of bad input. That’d be a lot of work and would lead to very noisy code. You’ll want to reserve them for mistakes that are easy to make (or that you find yourself making).

Regular Expressions

Programming tools and techniques survive and spread in a chaotic, evolutionary way. It’s not always the pretty or brilliant ones that win but rather the ones that function well enough within the right niche or that happen to be integrated with another successful piece of technology. We will discuss one such tool, regular expressions. Regular expressions are a way to describe patterns in string data. They form a small, separate language that is part of JavaScript and many other languages and systems. Regular expressions are both terribly awkward and extremely useful. Their syntax is cryptic, and the programming interface JavaScript provides for them is clumsy. But they are a powerful tool for inspecting and processing strings. Properly understanding regular expressions will make you a more effective programmer.

Creating a regular expression

A regular expression is a type of object. It can be either constructed with the RegExp constructor or written as a literal value by enclosing a pattern in forward slash (/) characters.

let re1 = new RegExp("abc");
let re2 = /abc/;

Both of those regular expression objects represent the same pattern: an a character followed by a b followed by a c.

When using the RegExp constructor, the pattern is written as a normal string, so the usual rules apply for backslashes. The second notation, where the pattern appears between slash characters, treats backslashes somewhat differently. First, since a forward slash ends the pattern, we need to put a backslash before any forward slash that we want to be part of the pattern. In addition, backslashes that aren’t part of special character codes (like \n) will be preserved, rather than ignored as they are in strings, and change the meaning of the pattern. Some characters, such as question marks and plus signs, have special meanings in regular expressions and must be preceded by a backslash if they are meant to represent the character itself.

let eighteenPlus = /eighteen\+/;
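The same pattern written with the RegExp constructor needs a doubled backslash, because the string notation consumes one level of escaping first (the binding name here is just for illustration):

let eighteenPlusToo = new RegExp("eighteen\\+");
console.log(eighteenPlusToo.test("she is eighteen+"));
// → true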

Testing for matches

Regular expression objects have a number of methods. The simplest one is test. If you pass it a string, it will return a Boolean telling you whether the string contains a match of the pattern in the expression.

console.log(/abc/.test("abcde"));
// → true
console.log(/abc/.test("abxde"));
// → false

A regular expression consisting of only nonspecial characters simply represents that sequence of characters. If abc occurs anywhere in the string we are testing against (not just at the start), test will return true.

Sets of characters

Finding out whether a string contains abc could just as well be done with a call to indexOf. Regular expressions allow us to express more complicated patterns. Say we want to match any number. In a regular expression, putting a set of characters between square brackets makes that part of the expression match any of the characters between the brackets. Both of the following expressions match all strings that contain a digit:

console.log(/[0123456789]/.test("in 1992"));
// → true
console.log(/[0-9]/.test("in 1992"));
// → true

Within square brackets, a hyphen (-) between two characters can be used to indicate a range of characters, where the ordering is determined by the character’s Unicode number. Characters 0 to 9 sit right next to each other in this ordering (codes 48 to 57), so [0-9] covers all of them and matches any digit. A number of common character groups have their own built-in shortcuts.

Digits are one of them: \d means the same thing as [0-9].

\d    Any digit character
\w    An alphanumeric character ("word character")
\s    Any whitespace character (space, tab, newline, and similar)
\D    A character that is not a digit
\W    A nonalphanumeric character
\S    A nonwhitespace character
.     Any character except for newline

So you could match a date and time format like 01-30-2003 15:20 with the following expression:

let dateTime = /\d\d-\d\d-\d\d\d\d \d\d:\d\d/;
console.log(dateTime.test("01-30-2003 15:20"));
// → true
console.log(dateTime.test("30-jan-2003 15:20"));
// → false

That looks completely awful, doesn’t it? Half of it is backslashes, producing a background noise that makes it hard to spot the actual pattern expressed. We’ll see a slightly improved version of this expression later.

These backslash codes can also be used inside square brackets. For example, [\d.] means any digit or a period character. But the period itself, between square brackets, loses its special meaning. The same goes for other special characters, such as +.

To invert a set of characters—that is, to express that you want to match any character except the ones in the set—you can write a caret (^) character after the opening bracket.

let notBinary = /[^01]/;
console.log(notBinary.test("1100100010100110"));
// → false
console.log(notBinary.test("1100100010200110"));
// → true

Repeating parts of a pattern

We now know how to match a single digit. What if we want to match a whole number—a sequence of one or more digits? When you put a plus sign (+) after something in a regular expression, it indicates that the element may be repeated more than once. Thus, /\d+/ matches one or more digit characters.

console.log(/'\d+'/.test("'123'"));
// → true
console.log(/'\d+'/.test("''"));
// → false
console.log(/'\d*'/.test("'123'"));
// → true
console.log(/'\d*'/.test("''"));
// → true

The star (*) has a similar meaning but also allows the pattern to match zero times. Something with a star after it never prevents a pattern from matching— it’ll just match zero instances if it can’t find any suitable text to match. A question mark makes a part of a pattern optional, meaning it may occur zero times or one time. In the following example, the u character is allowed to occur, but the pattern also matches when it is missing.

let neighbor = /neighbou?r/;
console.log(neighbor.test("neighbour"));
// → true
console.log(neighbor.test("neighbor"));
// → true

To indicate that a pattern should occur a precise number of times, use braces. Putting {4} after an element, for example, requires it to occur exactly four times. It is also possible to specify a range this way: {2,4} means the element must occur at least twice and at most four times. Here is another version of the date and time pattern that allows both single- and double-digit days, months, and hours. It is also slightly easier to decipher.

let dateTime = /\d{1,2}-\d{1,2}-\d{4} \d{1,2}:\d{2}/;
console.log(dateTime.test("1-30-2003 8:45"));
// → true

You can also specify open-ended ranges when using braces by omitting the number after the comma. So, {5,} means five or more times.
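For example (a small test of our own):

console.log(/\d{5,}/.test("123456"));
// → true
console.log(/\d{5,}/.test("1234"));
// → false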

Grouping subexpressions

To use an operator like * or + on more than one element at a time, you have to use parentheses. A part of a regular expression that is enclosed in parentheses counts as a single element as far as the operators following it are concerned.

let cartoonCrying = /boo+(hoo+)+/i;
console.log(cartoonCrying.test("Boohoooohoohooo"));
// → true

The first and second + characters apply only to the second o in boo and hoo, respectively. The third + applies to the whole group (hoo+), matching one or more sequences like that. The i at the end of the expression in the example makes this regular expression case insensitive, allowing it to match the uppercase B in the input string, even though the pattern is itself all lowercase.

Matches and groups

The test method is the absolute simplest way to match a regular expression. It tells you only whether it matched and nothing else. Regular expressions also have an exec (execute) method that will return null if no match was found and return an object with information about the match otherwise.

let match = /\d+/.exec("one two 100");
console.log(match);
// → ["100"]
console.log(match.index);
// → 8

An object returned from exec has an index property that tells us where in the string the successful match begins. Other than that, the object looks like (and in fact is) an array of strings, whose first element is the string that was matched. In the previous example, this is the sequence of digits that we were looking for. String values have a match method that behaves similarly.

console.log("one two 100".match(/\d+/));
// → ["100"]

When the regular expression contains subexpressions grouped with parentheses, the text that matched those groups will also show up in the array. The whole match is always the first element. The next element is the part matched by the first group (the one whose opening parenthesis comes first in the expression), then the second group, and so on.

let quotedText = /'([^']*)'/;
console.log(quotedText.exec("she said 'hello'"));
// → ["'hello'", "hello"]

When a group does not end up being matched at all (for example, when followed by a question mark), its position in the output array will hold undefined. Similarly, when a group is matched multiple times, only the last match ends up in the array.

console.log(/bad(ly)?/.exec("bad"));
// → ["bad", undefined]
console.log(/(\d)+/.exec("123"));
// → ["123", "3"]

Groups can be useful for extracting parts of a string. If we don’t just want to verify whether a string contains a date but also extract it and construct an object that represents it, we can wrap parentheses around the digit patterns and directly pick the date out of the result of exec. But first we’ll take a brief detour, in which we discuss the built-in way to represent date and time values in JavaScript.

The Date class

JavaScript has a standard class for representing dates—or, rather, points in time. It is called Date. If you simply create a date object using new, you get the current date and time.

console.log(new Date());
// → Mon Nov 13 2017 16:19:11 GMT+0100 (CET)

You can also create an object for a specific time.

console.log(new Date(2009, 11, 9));
// → Wed Dec 09 2009 00:00:00 GMT+0100 (CET)
console.log(new Date(2009, 11, 9, 12, 59, 59, 999));
// → Wed Dec 09 2009 12:59:59 GMT+0100 (CET)

JavaScript uses a convention where month numbers start at zero (so December is 11), yet day numbers start at one. This is confusing and silly. Be careful. The last four arguments (hours, minutes, seconds, and milliseconds) are optional and taken to be zero when not given. Timestamps are stored as the number of milliseconds since the start of 1970, in the UTC time zone. This follows a convention set by “Unix time”, which was invented around that time. You can use negative numbers for times before 1970. The getTime method on a date object returns this number. It is big, as you can imagine.

console.log(new Date(2013, 11, 19).getTime());
// → 1387407600000
console.log(new Date(1387407600000));
// → Thu Dec 19 2013 00:00:00 GMT+0100 (CET)

If you give the Date constructor a single argument, that argument is treated as such a millisecond count. You can get the current millisecond count by creating a new Date object and calling getTime on it or by calling the Date.now function. Date objects provide methods such as getFullYear, getMonth, getDate, getHours , getMinutes, and getSeconds to extract their components. Besides getFullYear there’s also getYear, which gives you the year minus 1900 (98 or 119) and is mostly useless. Putting parentheses around the parts of the expression that we are interested in, we can now create a date object from a string.

function getDate(string) {
  let [_, month, day, year] =
    /(\d{1,2})-(\d{1,2})-(\d{4})/.exec(string);
  return new Date(year, month - 1, day);
}

console.log(getDate("1-30-2003"));
// → Thu Jan 30 2003 00:00:00 GMT+0100 (CET)

The _ (underscore) binding is ignored and used only to skip the full match element in the array returned by exec.

Word and string boundaries

Unfortunately, getDate will also happily extract the nonsensical date 00-1-3000 from the string “100-1-30000”. A match may happen anywhere in the string, so in this case, it’ll just start at the second character and end at the second-to-last character. If we want to enforce that the match must span the whole string, we can add the markers ^ and $. The caret matches the start of the input string, whereas the dollar sign matches the end. So, /^\d+$/ matches a string consisting entirely of one or more digits, /^!/ matches any string that starts with an exclamation mark, and /x^/ does not match any string (there cannot be an x before the start of the string).
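Anchoring the date pattern this way rejects the bogus match (a sketch using the pattern from getDate with ^ and $ added):

console.log(/^(\d{1,2})-(\d{1,2})-(\d{4})$/.test("1-30-2003"));
// → true
console.log(/^(\d{1,2})-(\d{1,2})-(\d{4})$/.test("100-1-30000"));
// → false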

If, on the other hand, we just want to make sure the date starts and ends on a word boundary, we can use the marker \b. A word boundary can be the start or end of the string or any point in the string that has a word character (as in \w) on one side and a nonword character on the other.

console.log(/cat/.test("concatenate"));
// → true
console.log(/\bcat\b/.test("concatenate"));
// → false

Note that a boundary marker doesn’t match an actual character. It just enforces that the regular expression matches only when a certain condition holds at the place where it appears in the pattern.

Choice patterns

Say we want to know whether a piece of text contains not only a number but a number followed by one of the words pig, cow, or chicken, or any of their plural forms. We could write three regular expressions and test them in turn, but there is a nicer way. The pipe character (|) denotes a choice between the pattern to its left and the pattern to its right. So I can say this:

let animalCount = /\b\d+ (pig|cow|chicken)s?\b/;
console.log(animalCount.test("15 pigs"));
// → true
console.log(animalCount.test("15 pigchickens"));
// → false

Parentheses can be used to limit the part of the pattern that the pipe operator applies to, and you can put multiple such operators next to each other to express a choice between more than two alternatives.
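For example (a small example of our own), a choice between three color words:

console.log(/\b(red|green|blue)\b/.test("a pale green wall"));
// → true
console.log(/\b(red|green|blue)\b/.test("a greenish wall"));
// → false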

The mechanics of matching

Conceptually, when you use exec or test, the regular expression engine looks for a match in your string by trying to match the expression first from the start of the string, then from the second character, and so on, until it finds a match or reaches the end of the string. It’ll either return the first match that can be found or fail to find any match at all. To do the actual matching, the engine treats a regular expression something like a flow diagram. This is the diagram for the livestock expression in the previous example:

Our expression matches if we can find a path from the left side of the diagram to the right side. We keep a current position in the string, and every time we move through a box, we verify that the part of the string after our current position matches that box. So if we try to match “the 3 pigs” from position 4, our progress through the flow chart would look like this:

  • At position 4, there is a word boundary, so we can move past the first box.
  • Still at position 4, we find a digit, so we can also move past the second box.
  • At position 5, one path loops back to before the second (digit) box, while the other moves forward through the box that holds a single space character. There is a space here, not a digit, so we must take the second path.
  • We are now at position 6 (the start of pigs) and at the three-way branch in the diagram. We don’t see cow or chicken here, but we do see pig, so we take that branch.
  • At position 9, after the three-way branch, one path skips the s box and goes straight to the final word boundary, while the other path matches an s. There is an s character here, not a word boundary, so we go through the s box.
  • We’re at position 10 (the end of the string) and can match only a word boundary. The end of a string counts as a word boundary, so we go through the last box and have successfully matched this string.

Backtracking

The regular expression /\b([01]+b|[\da-f]+h|\d+)\b/ matches either a binary number followed by a b, a hexadecimal number (that is, base 16, with the letters a to f standing for the digits 10 to 15) followed by an h, or a regular decimal number with no suffix character. This is the corresponding diagram:

When matching this expression, it will often happen that the top (binary) branch is entered even though the input does not actually contain a binary number. When matching the string “103”, for example, it becomes clear only at the 3 that we are in the wrong branch. The string does match the expression, just not the branch we are currently in.

So the matcher backtracks. When entering a branch, it remembers its current position (in this case, at the start of the string, just past the first boundary box in the diagram) so that it can go back and try another branch if the current one does not work out. For the string “103”, after encountering the 3 character, it will start trying the branch for hexadecimal numbers, which fails again because there is no h after the number. So it tries the decimal number branch. This one fits, and a match is reported after all.

The matcher stops as soon as it finds a full match. This means that if multiple branches could potentially match a string, only the first one (ordered by where the branches appear in the regular expression) is used. Backtracking also happens for repetition operators like + and *. If you match /^.*x/ against “abcxe”, the .* part will first try to consume the whole string. The engine will then realize that it needs an x to match the pattern. Since there is no x past the end of the string, the star operator tries to match one character less. But the matcher doesn’t find an x after abcx either, so it backtracks again, matching the star operator to just abc. Now it finds an x where it needs it and reports a successful match from positions 0 to 4.
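You can see the result of that backtracking in the match that gets reported (a quick check of our own):

console.log(/^.*x/.exec("abcxe"));
// → ["abcx"]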

It is possible to write regular expressions that will do a lot of backtracking. This problem occurs when a pattern can match a piece of input in many different ways. For example, if we get confused while writing a binary-number regular expression, we might accidentally write something like /([01]+)+b/.

If that tries to match some long series of zeros and ones with no trailing b character, the matcher first goes through the inner loop until it runs out of digits. Then it notices there is no b, so it backtracks one position, goes through the outer loop once, and gives up again, trying to backtrack out of the inner loop once more. It will continue to try every possible route through these two loops. This means the amount of work doubles with each additional character. For even just a few dozen characters, the resulting match will take practically forever.
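The nested repetition buys nothing here; the simpler /[01]+b/ accepts exactly the same strings without the exponential backtracking (a quick comparison of our own):

console.log(/([01]+)+b/.test("000111b"));
// → true
console.log(/[01]+b/.test("000111b"));
// → true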

The replace method

String values have a replace method that can be used to replace part of the string with another string.

console.log("papa".replace("p", "m"));
// → mapa

The first argument can also be a regular expression, in which case the first match of the regular expression is replaced. When a g option (for global) is added to the regular expression, all matches in the string will be replaced, not just the first.

console.log("Borobudur".replace(/[ou]/, "a"));
// → Barobudur
console.log("Borobudur".replace(/[ou]/g, "a"));
// → Barabadar

It would have been sensible if the choice between replacing one match or all matches was made through an additional argument to replace or by providing a different method, replaceAll. But for some unfortunate reason, the choice relies on a property of the regular expression instead. The real power of using regular expressions with replace comes from the fact that we can refer to matched groups in the replacement string. For example, say we have a big string containing the names of people, one name per line, in the format Lastname, Firstname. If we want to swap these names and remove the comma to get a Firstname Lastname format, we can use the following code:

console.log(
  "Liskov, Barbara\nMcCarthy, John\nWadler, Philip"
    .replace(/(\w+), (\w+)/g, "$2 $1"));
// → Barbara Liskov
//   John McCarthy
//   Philip Wadler

The $1 and $2 in the replacement string refer to the parenthesized groups in the pattern. $1 is replaced by the text that matched against the first group, $2 by the second, and so on, up to $9. The whole match can be referred to with $&.

It is possible to pass a function—rather than a string—as the second argument to replace. For each replacement, the function will be called with the matched groups (as well as the whole match) as arguments, and its return value will be inserted into the new string. Here’s a small example:

let s = "the cia and fbi";
console.log(s.replace(/\b(fbi|cia)\b/g,
  str => str.toUpperCase()));
// → the CIA and FBI

Here’s a more interesting one:

let stock = "1 lemon, 2 cabbages, and 101 eggs";
function minusOne(match, amount, unit) {
  amount = Number(amount) - 1;
  if (amount == 1) { // only one left, remove the 's'
    unit = unit.slice(0, unit.length - 1);
  } else if (amount == 0) {
    amount = "no";
  }
  return amount + " " + unit;
}

console.log(stock.replace(/(\d+) (\w+)/g, minusOne));
// → no lemon, 1 cabbage, and 100 eggs

This takes a string, finds all occurrences of a number followed by an alphanumeric word, and returns a string wherein every such occurrence is decremented by one. The (\d+) group ends up as the amount argument to the function, and the (\w+) group gets bound to unit. The function converts amount to a number— which always works since it matched \d+—and makes some adjustments in case there is only one or zero left.

Greed

It is possible to use replace to write a function that removes all comments from a piece of JavaScript code. Here is a first attempt:

function stripComments(code) {
  return code.replace(/\/\/.*|\/\*[^]*\*\//g, "");
}

console.log(stripComments("1 + /* 2 */3"));
// → 1 + 3
console.log(stripComments("x = 10;// ten!"));
// → x = 10;
console.log(stripComments("1 /* a */+/* b */ 1"));
// → 1  1

The part before the or operator matches two slash characters followed by any number of non-newline characters. The part for multiline comments is more involved. We use [^] (any character that is not in the empty set of characters) as a way to match any character. We cannot just use a period here because block comments can continue on a new line, and the period character does not match newline characters. But the output for the last line appears to have gone wrong. Why?

The [^]* part of the expression, as we described in the section on backtracking, will first match as much as it can. If that causes the next part of the pattern to fail, the matcher moves back one character and tries again from there. In the example, the matcher first tries to match the whole rest of the string and then moves back from there. It will find an occurrence of */ after going back four characters and match that. This is not what we wanted—the intention was to match a single comment, not to go all the way to the end of the code and find the end of the last block comment.

Because of this behavior, we say the repetition operators (+, *, ?, and {} ) are greedy, meaning they match as much as they can and backtrack from there. If you put a question mark after them (+?, *?, ??, {}?), they become nongreedy and start by matching as little as possible, matching more only when the remaining pattern does not fit the smaller match. And that is exactly what we want in this case. By having the star match the smallest stretch of characters that brings us to a */, we consume one block comment and nothing more.

function stripComments(code) {
  return code.replace(/\/\/.*|\/\*[^]*?\*\//g, "");
}

console.log(stripComments("1 /* a */+/* b */ 1"));
// → 1 + 1

A lot of bugs in regular expression programs can be traced to unintentionally using a greedy operator where a nongreedy one would work better. When using a repetition operator, consider the nongreedy variant first.

Dynamically creating RegExp objects

There are cases where you might not know the exact pattern you need to match against when you are writing your code. Say you want to look for the user’s name in a piece of text and enclose it in underscore characters to make it stand out. Since you will know the name only once the program is actually running, you can’t use the slash-based notation. But you can build up a string and use the RegExp constructor on that. Here’s an example:

let name = "harry";
let text = "Harry is a suspicious character.";
let regexp = new RegExp("\\b(" + name + ")\\b", "gi");
console.log(text.replace(regexp, "_$1_"));
// → _Harry_ is a suspicious character.

When creating the \b boundary markers, we have to use two backslashes because we are writing them in a normal string, not a slash-enclosed regular expression. The second argument to the RegExp constructor contains the options for the regular expression—in this case, “gi” for global and case insensitive. But what if the name is “dea+hl[]rd” because our user is a nerdy teenager? That would result in a nonsensical regular expression that won’t actually match the user’s name. To work around this, we can add backslashes before any character that has a special meaning.

let name = "dea+hl[]rd";
let text = "This dea+hl[]rd guy is super annoying.";
let escaped = name.replace(/[\\[.+*?(){|^$]/g, "\\$&");
let regexp = new RegExp("\\b" + escaped + "\\b", "gi");
console.log(text.replace(regexp, "_$&_"));
// → This _dea+hl[]rd_ guy is super annoying.

The search method

The indexOf method on strings cannot be called with a regular expression. But there is another method, search, that does expect a regular expression. Like indexOf, it returns the first index on which the expression was found, or -1 when it wasn’t found.

console.log("  word".search(/\S/));
// → 2
console.log("    ".search(/\S/));
// → -1

Unfortunately, there is no way to indicate that the match should start at a given offset (like we can with the second argument to indexOf), which would often be useful.
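If you do need that, one hedged workaround is to slice the string yourself and correct the resulting index (searchFrom is a hypothetical helper, not a standard method):

function searchFrom(string, regexp, offset) {
  let index = string.slice(offset).search(regexp);
  return index == -1 ? -1 : index + offset;
}

console.log(searchFrom("one two three", /t\w+/, 5));
// → 8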

The lastIndex property

The exec method similarly does not provide a convenient way to start searching from a given position in the string. But it does provide an inconvenient way. Regular expression objects have properties. One such property is source, which contains the string that expression was created from. Another property is lastIndex, which controls, in some limited circumstances, where the next match will start. Those circumstances are that the regular expression must have the global (g) or sticky (y) option enabled, and the match must happen through the exec method. Again, a less confusing solution would have been to just allow an extra argument to be passed to exec, but confusion is an essential feature of JavaScript’s regular expression interface.

let pattern = /y/g;
pattern.lastIndex = 3;
let match = pattern.exec("xyzzy");
console.log(match.index);
// → 4
console.log(pattern.lastIndex);
// → 5

If the match was successful, the call to exec automatically updates the lastIndex property to point after the match. If no match was found, lastIndex is set back to zero, which is also the value it has in a newly constructed regular expression object. The difference between the global and the sticky options is that, when sticky is enabled, the match will succeed only if it starts directly at lastIndex, whereas with global, it will search ahead for a position where a match can start.

let global = /abc/g;
console.log(global.exec("xyz abc"));
// → ["abc"]
let sticky = /abc/y;
console.log(sticky.exec("xyz abc"));
// → null

When using a shared regular expression value for multiple exec calls, these automatic updates to the lastIndex property can cause problems. Your regular expression might be accidentally starting at an index that was left over from a previous call.

let digit = /\d/g;
console.log(digit.exec("here it is: 1"));
// → ["1"]
console.log(digit.exec("and now: 1"));
// → null

Another interesting effect of the global option is that it changes the way the match method on strings works. When called with a global expression, instead of returning an array similar to that returned by exec, match will find all matches of the pattern in the string and return an array containing the matched strings.

console.log("Banana".match(/an/g));
// → ["an", "an"]

So be cautious with global regular expressions. The cases where they are necessary—calls to replace and places where you want to explicitly use lastIndex —are typically the only places where you want to use them.

Looping over matches

A common thing to do is to scan through all occurrences of a pattern in a string, in a way that gives us access to the match object in the loop body. We can do this by using lastIndex and exec.

let input = "A string with 3 numbers in it... 42 and 88.";
let number = /\b\d+\b/g;
let match;
while (match = number.exec(input)) {
  console.log("Found", match[0], "at", match.index);
}
// → Found 3 at 14
//   Found 42 at 33
//   Found 88 at 40

This makes use of the fact that the value of an assignment expression (=) is the assigned value. So by using match = number.exec(input) as the condition in the while statement, we perform the match at the start of each iteration, save its result in a binding, and stop looping when no more matches are found.

Parsing an INI file

We’ll look at a problem that calls for regular expressions. Imagine we are writing a program to automatically collect information about our enemies from the Internet. (We will not actually write that program here, just the part that reads the configuration file. Sorry.) The configuration file looks like this:

searchengine=https://duckduckgo.com/?q=$1
spitefulness=9.7

; comments are preceded by a semicolon...
; each section concerns an individual enemy
[larry]
fullname=Larry Doe
type=kindergarten bully
website=http://www.geocities.com/CapeCanaveral/11451

[davaeorn]
fullname=Davaeorn
type=evil wizard
outputdir=/home/marijn/enemies/davaeorn

The exact rules for this format (which is a widely used format, usually called an INI file) are as follows:

  • Blank lines and lines starting with semicolons are ignored.
  • Lines wrapped in [ and ] start a new section.
  • Lines containing an alphanumeric identifier followed by an = character add a setting to the current section.
  • Anything else is invalid.

Our task is to convert a string like this into an object whose properties hold strings for settings written before the first section header and subobjects for sections, with those subobjects holding the section’s settings. Since the format has to be processed line by line, splitting up the file into separate lines is a good start. We used string.split(“\n”) to do this before. Some operating systems, however, use not just a newline character to separate lines but a carriage return character followed by a newline (“\r\n”). Given that the split method also allows a regular expression as its argument, we can use a regular expression like /\r?\n/ to split in a way that allows both “\n” and “\r\n” between lines.

function parseINI(string) {
  // Start with an object to hold the top-level fields
  let result = {};
  let section = result;
  string.split(/\r?\n/).forEach(line => {
    let match;
    if (match = line.match(/^(\w+)=(.*)$/)) {
      section[match[1]] = match[2];
    } else if (match = line.match(/^\[(.*)\]$/)) {
      section = result[match[1]] = {};
    } else if (!/^\s*(;.*)?$/.test(line)) {
      throw new Error("Line '" + line + "' is not valid.");
    }
  });
  return result;
}

console.log(parseINI(`
name=Vasilis
[address]
city=Tessaloniki`));
// → {name: "Vasilis", address: {city: "Tessaloniki"}}

The code goes over the file’s lines and builds up an object. Properties at the top are stored directly into that object, whereas properties found in sections are stored in a separate section object. The section binding points at the object for the current section. There are two kinds of significant lines—section headers or property lines. When a line is a regular property, it is stored in the current section. When it is a section header, a new section object is created, and section is set to point at it. Note the recurring use of ^ and $ to make sure the expression matches the whole line, not just part of it. Leaving these out results in code that mostly works but behaves strangely for some input, which can be a difficult bug to track down.

The pattern if (match = string.match(…)) is similar to the trick of using an assignment as the condition for while. You often aren’t sure that your call to match will succeed, so you can access the resulting object only inside an if statement that tests for this. To not break the pleasant chain of else if forms, we assign the result of the match to a binding and immediately use that assignment as the test for the if statement. If a line is not a section header or a property, the function checks whether it is a comment or an empty line using the expression /^\s*(;.*)?$/. Do you see how it works? The part between the parentheses will match comments, and the ? makes sure it also matches lines containing only whitespace. When a line doesn’t match any of the expected forms, the function throws an exception.
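To make the comment check and the error branch concrete, here is a small usage sketch (not from the original) against the parseINI function above:

console.log(parseINI("; enemy list\nsearchengine=https://duckduckgo.com/?q=$1"));
// → {searchengine: "https://duckduckgo.com/?q=$1"}
try {
  parseINI("this makes no sense");
} catch (error) {
  console.log(error.message);
  // → Line 'this makes no sense' is not valid.
}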

International characters

Because of JavaScript’s initial simplistic implementation and the fact that this simplistic approach was later set in stone as standard behavior, JavaScript’s regular expressions are rather dumb about characters that do not appear in the English language. For example, as far as JavaScript’s regular expressions are concerned, a “word character” is only one of the 26 characters in the Latin alphabet (uppercase or lowercase), decimal digits, and, for some reason, the underscore character. Things like é or ß, which most definitely are word characters, will not match \w (and will match uppercase \W, the nonword category).

By a strange historical accident, \s (whitespace) does not have this problem and matches all characters that the Unicode standard considers whitespace, including things like the nonbreaking space and the Mongolian vowel separator. Another problem is that, by default, regular expressions work on code units, not actual characters. This means characters that are composed of two code units behave strangely.

console.log(/🍎{3}/.test("🍎🍎🍎"));
// → false
console.log(/<.>/.test("<🌹>"));
// → false
console.log(/<.>/u.test("<🌹>"));
// → true

The problem is that the 🍎 in the first line is treated as two code units, and the {3} part is applied only to the second one. Similarly, the dot matches a single code unit, not the two that make up the rose emoji. You must add a u option (for Unicode) to your regular expression to make it treat such characters properly. The wrong behavior remains the default, unfortunately, because changing that might cause problems for existing code that depends on it.

Though this was only just standardized and is, at the time of writing, not widely supported yet, it is possible to use \p in a regular expression (that must have the Unicode option enabled) to match all characters to which the Unicode standard assigns a given property.

console.log(/\p{Script=Greek}/u.test("α"));
// → true
console.log(/\p{Script=Arabic}/u.test("α"));
// → false
console.log(/\p{Alphabetic}/u.test("α"));
// → true
console.log(/\p{Alphabetic}/u.test("!"));
// → false

Unicode defines a number of useful properties, though finding the one that you need may not always be trivial. You can use the \p{Property=Value} notation to match any character that has the given value for that property. If the property name is left off, as in \p{Name}, the name is assumed to be either a binary property such as Alphabetic or a category such as Number.
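As a sketch of how such properties might be used (the scriptOf name and the short list of scripts are ours, and this requires an engine with \p support), one could test a character against a few scripts in turn:

function scriptOf(char) {
  // Try a handful of Unicode script properties and report the first match.
  for (let script of ["Latin", "Greek", "Arabic", "Han"]) {
    if (new RegExp(`\\p{Script=${script}}`, "u").test(char)) return script;
  }
  return "unknown";
}
console.log(scriptOf("α"));
// → Greek
console.log(scriptOf("a"));
// → Latin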

Modules

The ideal program has a crystal-clear structure. The way it works is easy to explain, and each part plays a well-defined role. A typical real program grows organically. New pieces of functionality are added as new needs come up. Structuring—and preserving structure—is additional work. It’s work that will pay off only in the future, the next time someone works on the program. So it is tempting to neglect it and allow the parts of the program to become deeply entangled.

This causes two practical issues. First, understanding such a system is hard. If everything can touch everything else, it is difficult to look at any given piece in isolation. You are forced to build up a holistic understanding of the entire thing. Second, if you want to use any of the functionality from such a program in another situation, rewriting it may be easier than trying to disentangle it from its context. The phrase “big ball of mud” is often used for such large, structureless programs. Everything sticks together, and when you try to pick out a piece, the whole thing comes apart, and your hands get dirty.

Modules

Modules are an attempt to avoid these problems. A module is a piece of program that specifies which other pieces it relies on and which functionality it provides for other modules to use (its interface). Module interfaces have a lot in common with object interfaces. They make part of the module available to the outside world and keep the rest private. By restricting the ways in which modules interact with each other, the system becomes more like LEGO, where pieces interact through well-defined connectors, and less like mud, where everything mixes with everything.

The relations between modules are called dependencies. When a module needs a piece from another module, it is said to depend on that module. When this fact is clearly specified in the module itself, it can be used to figure out which other modules need to be present to be able to use a given module and to automatically load dependencies. To separate modules in that way, each needs its own private scope.

Just putting your JavaScript code into different files does not satisfy these requirements. The files still share the same global namespace. They can, intentionally or accidentally, interfere with each other’s bindings. And the dependency structure remains unclear. We can do better, as we’ll see later in the chapter. Designing a fitting module structure for a program can be difficult. In the phase where you are still exploring the problem, trying different things to see what works, you might want to not worry about it too much since it can be a big distraction. Once you have something that feels solid, that’s a good time to take a step back and organize it.

Packages

One of the advantages of building a program out of separate pieces, and being actually able to run those pieces on their own, is that you might be able to apply the same piece in different programs. But how do you set this up? Say we want to use the parseINI function from earlier  in another program. If it is clear what the function depends on (in this case, nothing), we can just copy all the necessary code into our new project and use it. But then, if we find a mistake in that code, we’ll probably fix it in whichever program we are working with at the time and forget to also fix it in the other program. Once you start duplicating code, you’ll quickly find yourself wasting time and energy moving copies around and keeping them up-to-date.

That’s where packages come in. A package is a chunk of code that can be distributed (copied and installed). It may contain one or more modules and has information about which other packages it depends on. A package also usually comes with documentation explaining what it does so that people who didn’t write it might still be able to use it. When a problem is found in a package or a new feature is added, the package is updated. Now the programs that depend on it (which may also be packages) can upgrade to the new version. Working in this way requires infrastructure. We need a place to store and find packages and a convenient way to install and upgrade them. In the JavaScript world, this infrastructure is provided by NPM (https://npmjs.org).

NPM is two things: an online service where one can download (and upload) packages and a program (bundled with Node.js) that helps you install and manage them. At the time of writing, there are more than half a million different packages available on NPM. A large portion of those are rubbish, I should mention, but almost every useful, publicly available package can be found on there. For example, an INI file parser, similar to the one we built before, is available under the package name ini. Having quality packages available for download is extremely valuable. It means that we can often avoid reinventing a program that 100 people have written before and get a solid, well-tested implementation at the press of a few keys.

Software is cheap to copy, so once someone has written it, distributing it to other people is an efficient process. But writing it in the first place is work, and responding to people who have found problems in the code, or who want to propose new features, is even more work. By default, you own the copyright to the code you write, and other people may use it only with your permission. But because some people are just nice and because publishing good software can help make you a little bit famous among programmers, many packages are published under a license that explicitly allows other people to use it.

Most code on NPM is licensed this way. Some licenses require you to also publish code that you build on top of the package under the same license. Others are less demanding, just requiring that you keep the license with the code as you distribute it. The JavaScript community mostly uses the latter type of license. When using other people’s packages, make sure you are aware of their license.

Improvised modules

Until 2015, the JavaScript language had no built-in module system. Yet people had been building large systems in JavaScript for more than a decade, and they needed modules. So they designed their own module systems on top of the language. You can use JavaScript functions to create local scopes and objects to represent module interfaces. This is a module for going between day names and numbers (as returned by Date’s getDay method). Its interface consists of weekDay.name and weekDay.number, and it hides its local binding names inside the scope of a function expression that is immediately invoked.

const weekDay = function() {
  const names = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"];
  return {
    name(number) { return names[number]; },
    number(name) { return names.indexOf(name); }
  };
}();

console.log(weekDay.name(weekDay.number("Sunday")));
// → Sunday

This style of modules provides isolation, to a certain degree, but it does not declare dependencies. Instead, it just puts its interface into the global scope and expects its dependencies, if any, to do the same. For a long time this was the main approach used in web programming, but it is mostly obsolete now. If we want to make dependency relations part of the code, we’ll have to take control of loading dependencies. Doing that requires being able to execute strings as code. JavaScript can do this.

Evaluating data as code

There are several ways to take data (a string of code) and run it as part of the current program. The most obvious way is the special operator eval, which will execute a string in the current scope. This is usually a bad idea because it breaks some of the properties that scopes normally have, such as it being easily predictable which binding a given name refers to.

const x = 1;
function evalAndReturnX(code) {
  eval(code);
  return x;
}

console.log(evalAndReturnX("var x = 2"));
// → 2
console.log(x);
// → 1

A less scary way of interpreting data as code is to use the Function constructor. It takes two arguments: a string containing a comma-separated list of argument names and a string containing the function body. It wraps the code in a function value so that it gets its own scope and won’t do odd things with other scopes.

let plusOne = Function("n", "return n + 1;");
console.log(plusOne(4));
// → 5

This is precisely what we need for a module system. We can wrap the module’s code in a function and use that function’s scope as module scope.

CommonJS

The most widely used approach to bolted-on JavaScript modules is called CommonJS modules. Node.js uses it, and it is the system used by most packages on NPM. The main concept in CommonJS modules is a function called require. When you call this with the module name of a dependency, it makes sure the module is loaded and returns its interface. Because the loader wraps the module code in a function, modules automatically get their own local scope. All they have to do is call require to access their dependencies and put their interface in the object bound to exports.

This example module provides a date-formatting function. It uses two packages from NPM—ordinal to convert numbers to strings like “1st” and “2nd”, and date-names to get the English names for weekdays and months. It exports a single function, formatDate, which takes a Date object and a template string. The template string may contain codes that direct the format, such as YYYY for the full year and Do for the ordinal day of the month. You could give it a string like “MMMM Do YYYY” to get output like “November 22nd 2017”.

const ordinal = require("ordinal");
const {days, months} = require("date-names");

exports.formatDate = function(date, format) {
  return format.replace(/YYYY|M(MMM)?|Do?|dddd/g, tag => {
    if (tag == "YYYY") return date.getFullYear();
    if (tag == "M") return date.getMonth();
    if (tag == "MMMM") return months[date.getMonth()];
    if (tag == "D") return date.getDate();
    if (tag == "Do") return ordinal(date.getDate());
    if (tag == "dddd") return days[date.getDay()];
  });
};

The interface of ordinal is a single function, whereas date-names exports an object containing multiple things—days and months are arrays of names. Destructuring is very convenient when creating bindings for imported interfaces. The module adds its interface function to exports so that modules that depend on it get access to it. We could use the module like this:

const {formatDate} = require("./format-date");

console.log(formatDate(new Date(2017, 9, 13), "dddd the Do"));
// → Friday the 13th

We can define require, in its most minimal form, like this:

require.cache = Object.create(null);

function require(name) {
  if (!(name in require.cache)) {
    let code = readFile(name);
    let module = {exports: {}};
    require.cache[name] = module;
    let wrapper = Function("require, exports, module", code);
    wrapper(require, module.exports, module);
  }
  return require.cache[name].exports;
}

In this code, readFile is a made-up function that reads a file and returns its contents as a string. Standard JavaScript provides no such functionality—but different JavaScript environments, such as the browser and Node.js, provide their own ways of accessing files. The example just pretends that readFile exists. To avoid loading the same module multiple times, require keeps a store (cache) of already loaded modules. When called, it first checks if the requested module has been loaded and, if not, loads it. This involves reading the module’s code, wrapping it in a function, and calling it.

The interface of the ordinal package we saw before is not an object but a function. A quirk of the CommonJS modules is that, though the module system will create an empty interface object for you (bound to exports), you can replace that with any value by overwriting module.exports. This is done by many modules to export a single value instead of an interface object. By defining require, exports, and module as parameters for the generated wrapper function (and passing the appropriate values when calling it), the loader makes sure that these bindings are available in the module’s scope.
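For example, a module that exports a single function in the style of ordinal might look roughly like this (a simplified sketch, not the actual package source):

module.exports = function(number) {
  // Attach "st", "nd", "rd", or "th" to a number (simplified).
  let rest = number % 100;
  if (rest < 11 || rest > 13) {
    if (number % 10 == 1) return number + "st";
    if (number % 10 == 2) return number + "nd";
    if (number % 10 == 3) return number + "rd";
  }
  return number + "th";
};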

The way the string given to require is translated to an actual filename or web address differs in different systems. When it starts with “./” or “../”, it is generally interpreted as relative to the current module’s filename. So “./format-date” would be the file named format-date.js in the same directory. When the name isn’t relative, Node.js will look for an installed package by that name. In the example code in this chapter, we’ll interpret such names as referring to NPM packages. We’ll go into more detail on how to install and use NPM modules later. Now, instead of writing our own INI file parser, we can use one from NPM.

const {parse} = require("ini");

console.log(parse("x = 10\ny = 20"));
// → {x: "10", y: "20"}

ECMAScript modules

CommonJS modules work quite well and, in combination with NPM, have allowed the JavaScript community to start sharing code on a large scale. But they remain a bit of a duct-tape hack. The notation is slightly awkward—the things you add to exports are not available in the local scope, for example. And because require is a normal function call taking any kind of argument, not just a string literal, it can be hard to determine the dependencies of a module without running its code.

This is why the JavaScript standard from 2015 introduces its own, different module system. It is usually called ES modules, where ES stands for ECMAScript. The main concepts of dependencies and interfaces remain the same, but the details differ. For one thing, the notation is now integrated into the language. Instead of calling a function to access a dependency, you use a special import keyword.

import ordinal from "ordinal";
import {days, months} from "date-names";

export function formatDate(date, format) { /* ... */ }

Similarly, the export keyword is used to export things. It may appear in front of a function, class, or binding definition (let, const, or var). An ES module’s interface is not a single value but a set of named bindings. The preceding module binds formatDate to a function. When you import from another module, you import the binding, not the value, which means an exporting module may change the value of the binding at any time, and the modules that import it will see its new value.

When there is a binding named default, it is treated as the module’s main exported value. If you import a module like ordinal in the example, without braces around the binding name, you get its default binding. Such modules can still export other bindings under different names alongside their default export. To create a default export, you write export default before an expression, a function declaration, or a class declaration.

export default ["Winter", "Spring", "Summer", "Autumn"];
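Importing that default binding (assuming the export lives in a hypothetical file called seasons.js) would look like this:

import seasons from "./seasons.js";

console.log(seasons[2]);
// → Summer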

It is possible to rename imported bindings using the word as.

import {days as dayNames} from "date-names";

console.log(dayNames.length);
// → 7

Another important difference is that ES module imports happen before a module’s script starts running. That means import declarations may not appear inside functions or blocks, and the names of dependencies must be quoted strings, not arbitrary expressions.

At the time of writing, the JavaScript community is in the process of adopting this module style. But it has been a slow process. It took a few years, after the format was specified, for browsers and Node.js to start supporting it. And though they mostly support it now, this support still has issues, and the discussion on how such modules should be distributed through NPM is still ongoing. Many projects are written using ES modules and then automatically converted to some other format when published. We are in a transitional period in which two different module systems are used side by side, and it is useful to be able to read and write code in either of them.

Building and bundling

In fact, many JavaScript projects aren’t even, technically, written in JavaScript. There are extensions, such as the type-checking dialect mentioned earlier, that are widely used. People also often start using planned extensions to the language long before they have been added to the platforms that actually run JavaScript. To make this possible, they compile their code, translating it from their chosen JavaScript dialect to plain old JavaScript—or even to a past version of JavaScript—so that old browsers can run it.

Including a modular program that consists of 200 different files in a web page produces its own problems. If fetching a single file over the network takes 50 milliseconds, loading the whole program takes 10 seconds, or maybe half that if you can load several files simultaneously. That’s a lot of wasted time. Because fetching a single big file tends to be faster than fetching a lot of tiny ones, web programmers have started using tools that roll their programs (which they painstakingly split into modules) back into a single big file before they publish it to the Web. Such tools are called bundlers.

And we can go further. Apart from the number of files, the size of the files also determines how fast they can be transferred over the network. Thus, the JavaScript community has invented minifiers. These are tools that take a JavaScript program and make it smaller by automatically removing comments and whitespace, renaming bindings, and replacing pieces of code with equivalent code that takes up less space.

So it is not uncommon for the code that you find in an NPM package or that runs on a web page to have gone through multiple stages of transformation— converted from modern JavaScript to historic JavaScript, from ES module format to CommonJS, bundled, and minified. We won’t go into the details of these tools since they tend to be boring and change rapidly. Just be aware that the JavaScript code you run is often not the code as it was written.

Module design

Structuring programs is one of the subtler aspects of programming. Any nontrivial piece of functionality can be modeled in various ways. Good program design is subjective—there are tradeoffs involved and matters of taste. The best way to learn the value of well-structured design is to read or work on a lot of programs and notice what works and what doesn’t. Don’t assume that a painful mess is “just the way it is”. You can improve the structure of almost everything by putting more thought into it.

One aspect of module design is ease of use. If you are designing something that is intended to be used by multiple people—or even by yourself, in three months when you no longer remember the specifics of what you did—it is helpful if your interface is simple and predictable. That may mean following existing conventions. A good example is the ini package. This module imitates the standard JSON object by providing parse and stringify (to write an INI file) functions, and, like JSON, converts between strings and plain objects. So the interface is small and familiar, and after you’ve worked with it once, you’re likely to remember how to use it.

Even if there’s no standard function or widely used package to imitate, you can keep your modules predictable by using simple data structures and doing a single, focused thing. Many of the INI-file parsing modules on NPM provide a function that directly reads such a file from the hard disk and parses it, for example. This makes it impossible to use such modules in the browser, where we don’t have direct file system access, and adds complexity that would have been better addressed by composing the module with some file-reading function. This points to another helpful aspect of module design—the ease with which something can be composed with other code. Focused modules that compute values are applicable in a wider range of programs than bigger modules that perform complicated actions with side effects. An INI file reader that insists on reading the file from disk is useless in a scenario where the file’s content comes from some other source.

Relatedly, stateful objects are sometimes useful or even necessary, but if something can be done with a function, use a function. Several of the INI file readers on NPM provide an interface style that requires you to first create an object, then load the file into your object, and finally use specialized methods to get at the results. This type of thing is common in the object-oriented tradition, and it’s terrible. Instead of making a single function call and moving on, you have to perform the ritual of moving your object through various states. And because the data is now wrapped in a specialized object type, all code that interacts with it has to know about that type, creating unnecessary interdependencies. Often defining new data structures can’t be avoided—only a few basic ones are provided by the language standard, and many types of data have to be more complex than an array or a map. But when an array suffices, use an array. There is no single obvious way to represent a graph in JavaScript. Earlier, we used an object whose properties hold arrays of strings—the other nodes reachable from that node.

There are several different pathfinding packages on NPM, but none of them uses this graph format. They usually allow the graph’s edges to have a weight, which is the cost or distance associated with it. That isn’t possible in our representation. For example, there’s the dijkstrajs package. A well-known approach to pathfinding, quite similar to our findRoute function, is called Dijkstra’s algorithm, after Edsger Dijkstra, who first wrote it down. The js suffix is often added to package names to indicate the fact that they are written in JavaScript. This dijkstrajs package uses a graph format similar to ours, but instead of arrays, it uses objects whose property values are numbers—the weights of the edges. So if we wanted to use that package, we’d have to make sure that our graph was stored in the format it expects. All edges get the same weight since our simplified model treats each road as having the same cost (one turn).

const {find_path} = require("dijkstrajs");

let graph = {};
for (let node of Object.keys(roadGraph)) {
  let edges = graph[node] = {};
  for (let dest of roadGraph[node]) {
    edges[dest] = 1;
  }
}

console.log(find_path(graph, "Post Office", "Cabin"));
// → ["Post Office", "Alice's House", "Cabin"]

This can be a barrier to composition—when various packages are using different data structures to describe similar things, combining them is difficult. Therefore, if you want to design for composability, find out what data structures other people are using and, when possible, follow their example.

Asynchronous Programming

The central part of a computer, the part that carries out the individual steps that make up our programs, is called the processor. The programs we have seen so far are things that will keep the processor busy until they have finished their work. The speed at which something like a loop that manipulates numbers can be executed depends pretty much entirely on the speed of the processor. But many programs interact with things outside of the processor. For example, they may communicate over a computer network or request data from the hard disk—which is a lot slower than getting it from memory. When such a thing is happening, it would be a shame to let the processor sit idle—there might be some other work it could do in the meantime. In part, this is handled by your operating system, which will switch the processor between multiple running programs. But that doesn’t help when we want a single program to be able to make progress while it is waiting for a network request.

Asynchronicity

In a synchronous programming model, things happen one at a time. When you call a function that performs a long-running action, it returns only when the action has finished and it can return the result. This stops your program for the time the action takes. An asynchronous model allows multiple things to happen at the same time. When you start an action, your program continues to run. When the action finishes, the program is informed and gets access to the result (for example, the data read from disk).

We can compare synchronous and asynchronous programming using a small example: a program that fetches two resources from the network and then combines results. In a synchronous environment, where the request function returns only after it has done its work, the easiest way to perform this task is to make the requests one after the other. This has the drawback that the second request will be started only when the first has finished. The total time taken will be at least the sum of the two response times. The solution to this problem, in a synchronous system, is to start additional threads of control. A thread is another running program whose execution may be interleaved with other programs by the operating system—since most modern computers contain multiple processors, multiple threads may even run at the same time, on different processors. A second thread could start the second request, and then both threads wait for their results to come back, after which they resynchronize to combine their results.

Picture the difference as a diagram in which the thick lines represent time the program spends running normally and the thin lines represent time spent waiting for the network. In the synchronous model, the time taken by the network is part of the timeline for a given thread of control. In the asynchronous model, starting a network action conceptually causes a split in the timeline. The program that initiated the action continues running, and the action happens alongside it, notifying the program when it is finished.

Another way to describe the difference is that waiting for actions to finish is implicit in the synchronous model, while it is explicit, under our control, in the asynchronous one. Asynchronicity cuts both ways. It makes expressing programs that do not fit the straight-line model of control easier, but it can also make expressing programs that do follow a straight line more awkward. We’ll see some ways to address this awkwardness later in the chapter. Both of the important JavaScript programming platforms—browsers and Node.js—make operations that might take a while asynchronous, rather than relying on threads. Since programming with threads is notoriously hard (understanding what a program does is much more difficult when it’s doing multiple things at once), this is generally considered a good thing.

Crow tech

Most people are aware of the fact that crows are very smart birds. They can use tools, plan ahead, remember things, and even communicate these things among themselves. What most people don’t know is that they are capable of many things that they keep well hidden from us.

For example, many crow cultures have the ability to construct computing devices. These are not electronic, as human computing devices are, but operate through the actions of tiny insects, a species closely related to the termite, which has developed a symbiotic relationship with the crows. The birds provide them with food, and in return the insects build and operate their complex colonies that, with the help of the living creatures inside them, perform computations. Such colonies are usually located in big, long-lived nests. The birds and insects work together to build a network of bulbous clay structures, hidden between the twigs of the nest, in which the insects live and work.

To communicate with other devices, these machines use light signals. The crows embed pieces of reflective material in special communication stalks, and the insects aim these to reflect light at another nest, encoding data as a sequence of quick flashes. This means that only nests that have an unbroken visual connection can communicate. Our friend the corvid expert has mapped the network of crow nests in the village of Hières-sur-Amby, on the banks of the river Rhône, along with the connections between them.

In an astounding example of convergent evolution, crow computers run JavaScript. In this chapter we’ll write some basic networking functions for them.

Callbacks

One approach to asynchronous programming is to make functions that perform a slow action take an extra argument, a callback function. The action is started, and when it finishes, the callback function is called with the result. As an example, the setTimeout function, available both in Node.js and in browsers, waits a given number of milliseconds (a second is a thousand milliseconds) and then calls a function.

setTimeout(() => console.log("Tick"), 500);

Waiting is not generally a very important type of work, but it can be useful when doing something like updating an animation or checking whether something is taking longer than a given amount of time. Performing multiple asynchronous actions in a row using callbacks means that you have to keep passing new functions to handle the continuation of the computation after the actions.

Most crow nest computers have a long-term data storage bulb, where pieces of information are etched into twigs so that they can be retrieved later. Etching, or finding a piece of data, takes a moment, so the interface to long-term storage is asynchronous and uses callback functions. Storage bulbs store pieces of JSON-encodable data under names. A crow might store information about the places where it’s hidden food under the name “food caches”, which could hold an array of names that point at other pieces of data, describing the actual cache. To look up a food cache in the storage bulbs of the Big Oak nest, a crow could run code like this:

import {bigOak} from "./crow-tech";

bigOak.readStorage("food caches", caches => {
  let firstCache = caches[0];
  bigOak.readStorage(firstCache, info => {
    console.log(info);
  });
});

(All binding names and strings have been translated from crow language to English.) This style of programming is workable, but the indentation level increases with each asynchronous action because you end up in another function. Doing more complicated things, such as running multiple actions at the same time, can get a little awkward. Crow nest computers are built to communicate using request-response pairs.

That means one nest sends a message to another nest, which then immediately sends a message back, confirming receipt and possibly including a reply to a question asked in the message. Each message is tagged with a type, which determines how it is handled. Our code can define handlers for specific request types, and when such a request comes in, the handler is called to produce a response. The interface exported by the “./crow-tech” module provides callback-based functions for communication. Nests have a send method that sends off a request. It expects the name of the target nest, the type of the request, and the content of the request as its first three arguments, and it expects a function to call when a response comes in as its fourth and last argument.

bigOak.send("Cow Pasture", "note", "Let's caw loudly at 7PM",
            () => console.log("Note delivered."));

But to make nests capable of receiving that request, we first have to define a request type named “note”. The code that handles the requests has to run not just on this nest-computer but on all nests that can receive messages of this type. We’ll just assume that a crow flies over and installs our handler code on all the nests.

import {defineRequestType} from "./crow-tech";

defineRequestType("note", (nest, content, source, done) => {
  console.log(`${nest.name} received note: ${content}`);
  done();
});

The defineRequestType function defines a new type of request. The example adds support for “note” requests, which just sends a note to a given nest. Our implementation calls console.log so that we can verify that the request arrived. Nests have a name property that holds their name.

The fourth argument given to the handler, done, is a callback function that it must call when it is done with the request. If we had used the handler’s return value as the response value, that would mean that a request handler can’t itself perform asynchronous actions. A function doing asynchronous work typically returns before the work is done, having arranged for a callback to be called when it completes. So we need some asynchronous mechanism—in this case, another callback function—to signal when a response is available. In a way, asynchronicity is contagious. Any function that calls a function that works asynchronously must itself be asynchronous, using a callback or similar mechanism to deliver its result. Calling a callback is somewhat more involved and error-prone than simply returning a value, so needing to structure large parts of your program that way is not great.

Promises

Working with abstract concepts is often easier when those concepts can be represented by values. In the case of asynchronous actions, you could, instead of arranging for a function to be called at some point in the future, return an object that represents this future event. This is what the standard class Promise is for. A promise is an asynchronous action that may complete at some point and produce a value. It is able to notify anyone who is interested when its value is available. The easiest way to create a promise is by calling Promise.resolve. This function ensures that the value you give it is wrapped in a promise. If it’s already a promise, it is simply returned—otherwise, you get a new promise that immediately finishes with your value as its result.

let fifteen = Promise.resolve(15);

fifteen.then(value => console.log(`Got ${value}`));

// → Got 15

To get the result of a promise, you can use its then method. This registers a callback function to be called when the promise resolves and produces a value. You can add multiple callbacks to a single promise, and they will be called, even if you add them after the promise has already resolved (finished). But that’s not all the then method does. It returns another promise, which resolves to the value that the handler function returns or, if that returns a promise, waits for that promise and then resolves to its result.
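For instance, each then call in a chain wraps its handler’s return value in a new promise:

Promise.resolve(15)
  .then(value => value * 2)
  .then(value => console.log(`Doubled to ${value}`));
// → Doubled to 30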

It is useful to think of promises as a device to move values into an asynchronous reality. A normal value is simply there. A promised value is a value that might already be there or might appear at some point in the future. Computations defined in terms of promises act on such wrapped values and are executed asynchronously as the values become available. To create a promise, you can use Promise as a constructor. It has a somewhat odd interface—the constructor expects a function as argument, which it immediately calls, passing it a function that it can use to resolve the promise. It works this way, instead of for example with a resolve method, so that only the code that created the promise can resolve it. This is how you’d create a promise-based interface for the readStorage function:

function storage(nest, name) {
  return new Promise(resolve => {
    nest.readStorage(name, result => resolve(result));
  });
}

storage(bigOak, "enemies")
  .then(value => console.log("Got", value));

This asynchronous function returns a meaningful value. This is the main advantage of promises—they simplify the use of asynchronous functions. Instead of having to pass around callbacks, promise-based functions look similar to regular ones: they take input as arguments and return their output. The only difference is that the output may not be available yet.

Failure

Regular JavaScript computations can fail by throwing an exception. Asynchronous computations often need something like that. A network request may fail, or some code that is part of the asynchronous computation may throw an exception. One of the most pressing problems with the callback style of asynchronous programming is that it makes it extremely difficult to make sure failures are properly reported to the callbacks. A widely used convention is that the first argument to the callback is used to indicate that the action failed, and the second contains the value produced by the action when it was successful. Such callback functions must always check whether they received an exception and make sure that any problems they cause, including exceptions thrown by functions they call, are caught and given to the right function.

Promises make this easier. They can be either resolved (the action finished successfully) or rejected (it failed). Resolve handlers (as registered with then) are called only when the action is successful, and rejections are automatically propagated to the new promise that is returned by then. And when a handler throws an exception, this automatically causes the promise produced by its then call to be rejected. So if any element in a chain of asynchronous actions fails, the outcome of the whole chain is marked as rejected, and no success handlers are called beyond the point where it failed.

Much like resolving a promise provides a value, rejecting one also provides one, usually called the reason of the rejection. When an exception in a handler function causes the rejection, the exception value is used as the reason. Similarly, when a handler returns a promise that is rejected, that rejection flows into the next promise. There’s a Promise.reject function that creates a new, immediately rejected promise. To explicitly handle such rejections, promises have a catch method that registers a handler to be called when the promise is rejected, similar to how then handlers handle normal resolution. It’s also very much like then in that it returns a new promise, which resolves to the original promise’s value if it resolves normally and to the result of the catch handler otherwise. If a catch handler throws an error, the new promise is also rejected.

As a shorthand, then also accepts a rejection handler as a second argument, so you can install both types of handlers in a single method call. A function passed to the Promise constructor receives a second argument, alongside the resolve function, which it can use to reject the new promise. The chains of promise values created by calls to then and catch can be seen as a pipeline through which asynchronous values or failures move. Since such chains are created by registering handlers, each link has a success handler or a rejection handler (or both) associated with it. Handlers that don’t match the type of outcome (success or failure) are ignored. But those that do match are called, and their outcome determines what kind of value comes next—success when it returns a non-promise value, rejection when it throws an exception, and the outcome of a promise when it returns one of those.

new Promise((_, reject) => reject(new Error("Fail")))
  .then(value => console.log("Handler 1"))
  .catch(reason => {
    console.log("Caught failure " + reason);
    return "nothing";
  })
  .then(value => console.log("Handler 2", value));
// → Caught failure Error: Fail
// → Handler 2 nothing

Much like an uncaught exception is handled by the environment, JavaScript environments can detect when a promise rejection isn’t handled and will report this as an error.

Networks are hard

Occasionally, there isn’t enough light for the crows’ mirror systems to transmit a signal or something is blocking the path of the signal. It is possible for a signal to be sent but never received. As it is, that will just cause the callback given to send to never be called, which will probably cause the program to stop without even noticing there is a problem. It would be nice if, after a given period of not getting a response, a request would time out and report failure. Often, transmission failures are random accidents, like a car’s headlight interfering with the light signals, and simply retrying the request may cause it to succeed. So while we’re at it, let’s make our request function automatically retry the sending of the request a few times before it gives up.

And, since we’ve established that promises are a good thing, we’ll also make our request function return a promise. In terms of what they can express, callbacks and promises are equivalent. Callback-based functions can be wrapped to expose a promise based interface, and vice versa. Even when a request and its response are successfully delivered, the response may indicate failure—for example, if the request tries to use a request type that hasn’t been defined or the handler throws an error. To support this, send and defineRequestType follow the convention mentioned before, where the first argument passed to callbacks is the failure reason, if any, and the second is the actual result. These can be translated to promise resolution and rejection by our wrapper.

class Timeout extends Error {}

function request(nest, target, type, content) {
  return new Promise((resolve, reject) => {
    let done = false;
    function attempt(n) {
      nest.send(target, type, content, (failed, value) => {
        done = true;
        if (failed) reject(failed);
        else resolve(value);
      });
      setTimeout(() => {
        if (done) return;
        else if (n < 3) attempt(n + 1);
        else reject(new Timeout("Timed out"));
      }, 250);
    }
    attempt(1);
  });
}

Because promises can be resolved (or rejected) only once, this will work. The first time resolve or reject is called determines the outcome of the promise, and any further calls, such as the timeout arriving after the request finishes or a request coming back after another request finished, are ignored. To build an asynchronous loop, for the retries, we need to use a recursive function—a regular loop doesn’t allow us to stop and wait for an asynchronous action. The attempt function makes a single attempt to send a request. It also sets a timeout that, if no response has come back after 250 milliseconds, either starts the next attempt or, if this was the fourth attempt, rejects the promise with an instance of Timeout as the reason.

Retrying every quarter-second and giving up when no response has come in after a second is definitely somewhat arbitrary. It is even possible, if the request did come through but the handler is just taking a bit longer, for requests to be delivered multiple times. We’ll write our handlers with that problem in mind—duplicate messages should be harmless. In general, we will not be building a world class, robust network today. But that’s okay—crows don’t have very high expectations yet when it comes to computing. To isolate ourselves from callbacks altogether, we’ll go ahead and also define a wrapper for defineRequestType that allows the handler function to return a promise or plain value and wires that up to the callback for us.

function requestType(name, handler) {
  defineRequestType(name, (nest, content, source, callback) => {
    try {
      Promise.resolve(handler(nest, content, source))
        .then(response => callback(null, response),
              failure => callback(failure));
    } catch (exception) {
      callback(exception);
    }
  });
}

Promise.resolve is used to convert the value returned by handler to a promise if it isn’t already. Note that the call to handler had to be wrapped in a try block to make sure any exception it raises directly is given to the callback. This nicely illustrates the difficulty of properly handling errors with raw callbacks—it is easy to forget to properly route exceptions like that, and if you don’t do it, failures won’t get reported to the right callback. Promises make this mostly automatic and thus less error-prone.

Collections of promises

Each nest computer keeps an array of other nests within transmission distance in its neighbors property. To check which of those are currently reachable, you could write a function that tries to send a “ping” request (a request that simply asks for a response) to each of them and see which ones come back. When working with collections of promises running at the same time, the Promise.all function can be useful. It returns a promise that waits for all of the promises in the array to resolve and then resolves to an array of the values that these promises produced (in the same order as the original array). If any promise is rejected, the result of Promise.all is itself rejected.

requestType("ping", () => "pong");

function availableNeighbors(nest) {
  let requests = nest.neighbors.map(neighbor => {
    return request(nest, neighbor, "ping")
      .then(() => true, () => false);
  });
  return Promise.all(requests).then(result => {
    return nest.neighbors.filter((_, i) => result[i]);
  });
}

When a neighbor isn’t available, we don’t want the entire combined promise to fail since then we still wouldn’t know anything. So the function that is mapped over the set of neighbors to turn them into request promises attaches handlers that make successful requests produce true and rejected ones produce false. In the handler for the combined promise, filter is used to remove those elements from the neighbors array whose corresponding value is false. This makes use of the fact that filter passes the array index of the current element as a second argument to its filtering function (map, some, and similar higher order array methods do the same).

Network flooding

The fact that nests can talk only to their neighbors greatly inhibits the usefulness of this network. For broadcasting information to the whole network, one solution is to set up a type of request that is automatically forwarded to neighbors. These neighbors then in turn forward it to their neighbors, until the whole network has received the message.

import {everywhere} from "./crow-tech";

everywhere(nest => {
  nest.state.gossip = [];
});

function sendGossip(nest, message, exceptFor = null) {
  nest.state.gossip.push(message);
  for (let neighbor of nest.neighbors) {
    if (neighbor == exceptFor) continue;
    request(nest, neighbor, "gossip", message);
  }
}

requestType("gossip", (nest, message, source) => {
  if (nest.state.gossip.includes(message)) return;
  console.log(`${nest.name} received gossip '${message}' from ${source}`);
  sendGossip(nest, message, source);
});

To avoid sending the same message around the network forever, each nest keeps an array of gossip strings that it has already seen. To define this array, we use the everywhere function—which runs code on every nest—to add a property to the nest’s state object, which is where we’ll keep nest-local state. When a nest receives a duplicate gossip message, which is very likely to happen with everybody blindly resending them, it ignores it. But when it receives a new message, it excitedly tells all its neighbors except for the one who sent it the message. This will cause a new piece of gossip to spread through the network like an ink stain in water. Even when some connections aren’t currently working, if there is an alternative route to a given nest, the gossip will reach it through there. This style of network communication is called flooding—it floods the network with a piece of information until all nodes have it.

Message routing

If a given node wants to talk to a single other node, flooding is not a very efficient approach. Especially when the network is big, that would lead to a lot of useless data transfers. An alternative approach is to set up a way for messages to hop from node to node until they reach their destination. The difficulty with that is it requires knowledge about the layout of the network. To send a request in the direction of a faraway nest, it is necessary to know which neighboring nest gets it closer to its destination. Sending it in the wrong direction will not do much good.

Since each nest knows only about its direct neighbors, it doesn’t have the information it needs to compute a route. We must somehow spread the information about these connections to all nests, preferably in a way that allows it to change over time, when nests are abandoned or new nests are built. We can use flooding again, but instead of checking whether a given message has already been received, we now check whether the new set of neighbors for a given nest matches the current set we have for it.

requestType("connections", (nest, {name, neighbors}, source) => {
  let connections = nest.state.connections;
  if (JSON.stringify(connections.get(name)) ==
      JSON.stringify(neighbors)) return;
  connections.set(name, neighbors);
  broadcastConnections(nest, name, source);
});

function broadcastConnections(nest, name, exceptFor = null) {
  for (let neighbor of nest.neighbors) {
    if (neighbor == exceptFor) continue;
    request(nest, neighbor, "connections", {
      name,
      neighbors: nest.state.connections.get(name)
    });
  }
}

everywhere(nest => {
  nest.state.connections = new Map();
  nest.state.connections.set(nest.name, nest.neighbors);
  broadcastConnections(nest, nest.name);
});

The comparison uses JSON.stringify because ==, on objects or arrays, will return true only when the two are the exact same value, which is not what we need here. Comparing the JSON strings is a crude but effective way to compare their content. The nodes immediately start broadcasting their connections, which should, unless some nests are completely unreachable, quickly give every nest a map of the current network graph. A thing you can do with graphs is find routes in them. If we have a route toward a message’s destination, we know which direction to send it in.
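To see why a plain comparison won’t do here:

console.log(["Big Oak", "Cow Pasture"] == ["Big Oak", "Cow Pasture"]);
// → false (two distinct array objects)
console.log(JSON.stringify(["Big Oak", "Cow Pasture"]) ==
            JSON.stringify(["Big Oak", "Cow Pasture"]));
// → true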

This findRoute function, which greatly resembles the findRoute we wrote earlier, searches for a way to reach a given node in the network. But instead of returning the whole route, it just returns the next step. That next nest will itself, using its current information about the network, decide where it sends the message.

function findRoute(from, to, connections) {
  let work = [{at: from, via: null}];
  for (let i = 0; i < work.length; i++) {
    let {at, via} = work[i];
    for (let next of connections.get(at) || []) {
      if (next == to) return via;
      if (!work.some(w => w.at == next)) {
        work.push({at: next, via: via || next});
      }
    }
  }
  return null;
}

Now we can build a function that can send long-distance messages. If the message is addressed to a direct neighbor, it is delivered as usual. If not, it is packaged in an object and sent to a neighbor that is closer to the target, using the “route” request type, which will cause that neighbor to repeat the same behavior.

function routeRequest(nest, target, type, content) {
  if (nest.neighbors.includes(target)) {
    return request(nest, target, type, content);
  } else {
    let via = findRoute(nest.name, target,
                        nest.state.connections);
    if (!via) throw new Error(`No route to ${target}`);
    return request(nest, via, "route",
                   {target, type, content});
  }
}

requestType("route", (nest, {target, type, content}) => {
  return routeRequest(nest, target, type, content);
});

We’ve constructed several layers of functionality on top of a primitive communication system to make it convenient to use. This is a nice (though simplified) model of how real computer networks work. A distinguishing property of computer networks is that they aren’t reliable— abstractions built on top of them can help, but you can’t abstract away network failure. So network programming is typically very much about anticipating and dealing with failures.

Async functions

To store important information, crows are known to duplicate it across nests. That way, when a hawk destroys a nest, the information isn’t lost. To retrieve a given piece of information that it doesn’t have in its own storage bulb, a nest computer might consult random other nests in the network until it finds one that has it.

requestType("storage", (nest, name) => storage(nest, name));

function findInStorage(nest, name) {
  return storage(nest, name).then(found => {
    if (found != null) return found;
    else return findInRemoteStorage(nest, name);
  });
}

function network(nest) {
  return Array.from(nest.state.connections.keys());
}

function findInRemoteStorage(nest, name) {
  let sources = network(nest).filter(n => n != nest.name);
  function next() {
    if (sources.length == 0) {
      return Promise.reject(new Error("Not found"));
    } else {
      let source = sources[Math.floor(Math.random() * sources.length)];
      sources = sources.filter(n => n != source);
      return routeRequest(nest, source, "storage", name)
        .then(value => value != null ? value : next(), next);
    }
  }
  return next();
}

Because connections is a Map, Object.keys doesn’t work on it. It has a keys method, but that returns an iterator rather than an array. An iterator (or iterable value) can be converted to an array with the Array.from function. Even with promises this is some rather awkward code. Multiple asynchronous actions are chained together in non-obvious ways. We again need a recursive function (next) to model looping through the nests. And the thing the code actually does is completely linear—it always waits for the previous action to complete before starting the next one. In a synchronous programming model, it’d be simpler to express.
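A small sketch of that conversion (the map contents here are made up for illustration):

let connections = new Map([["Big Oak", "Cow Pasture"],
                           ["Cow Pasture", "Big Oak"]]);
console.log(Object.keys(connections));
// → [] (a Map's entries are not regular object properties)
console.log(Array.from(connections.keys()));
// → ["Big Oak", "Cow Pasture"]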

The good news is that JavaScript allows you to write pseudo-synchronous code to describe asynchronous computation. An async function is a function that implicitly returns a promise and that can, in its body, await other promises in a way that looks synchronous. We can rewrite findInStorage like this:

async function findInStorage(nest, name) {
  let local = await storage(nest, name);
  if (local != null) return local;

  let sources = network(nest).filter(n => n != nest.name);
  while (sources.length > 0) {
    let source = sources[Math.floor(Math.random() * sources.length)];
    sources = sources.filter(n => n != source);
    try {
      let found = await routeRequest(nest, source, "storage", name);
      if (found != null) return found;
    } catch (_) {}
  }
  throw new Error("Not found");
}

An async function is marked by the word async before the function keyword. Methods can also be made async by writing async before their name. When such a function or method is called, it returns a promise. As soon as the body returns something, that promise is resolved. If it throws an exception, the promise is rejected. Inside an async function, the word await can be put in front of an expression to wait for a promise to resolve and only then continue the execution of the function.
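A minimal sketch, unrelated to the crow network, showing both outcomes:

async function five() {
  return 5;                  // resolves the returned promise with 5
}
async function broken() {
  throw new Error("nope");   // rejects the returned promise
}
five().then(console.log);
// → 5
broken().catch(error => console.log(error.message));
// → nope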

An async function no longer runs from start to completion in one go the way a regular JavaScript function does. Instead, it can be frozen at any point that has an await and resumed at a later time. For non-trivial asynchronous code, this notation is usually more convenient than using promises directly. Even if you need to do something that doesn't fit the synchronous model, such as performing multiple actions at the same time, it is easy to combine await with the direct use of promises.
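For instance, this sketch (reusing the storage function from above; the two storage names are made up) starts two retrievals at once and then awaits both promises together:

async function foodAndEnemies(nest) {
  // Both requests start immediately; await Promise.all waits for both.
  let [food, enemies] = await Promise.all([
    storage(nest, "food caches"),
    storage(nest, "enemies")
  ]);
  return {food, enemies};
}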

Generators

This ability of functions to be paused and then resumed again is not exclusive to async functions. JavaScript also has a feature called generator functions. These are similar, but without the promises. When you define a function with function* (placing an asterisk after the word function), it becomes a generator. When you call a generator, it returns an iterator.

function* powers(n) {
  for (let current = n;; current *= n) {
    yield current;
  }
}

for (let power of powers(3)) {
  if (power > 50) break;
  console.log(power);
}
// → 3
// → 9
// → 27

Initially, when you call powers, the function is frozen at its start. Every time you call next on the iterator, the function runs until it hits a yield expression, which pauses it and causes the yielded value to become the next value produced by the iterator. When the function returns (the one in the example never does), the iterator is done. Writing iterators is often much easier when you use generator functions. The iterator for the Group class can be written with this generator:

Group.prototype[Symbol.iterator] = function*() {
  for (let i = 0; i < this.members.length; i++) {
    yield this.members[i];
  }
};

There's no longer a need to create an object to hold the iteration state, since generators automatically save their local state every time they yield. Such yield expressions may occur only directly in the generator function itself, not in an inner function you define inside of it. The state a generator saves, when yielding, is only its local environment and the position where it yielded.

An async function is a special type of generator. It produces a promise when called, which is resolved when it returns (finishes) and rejected when it throws an exception. Whenever it yields (awaits) a promise, the result of that promise (value or thrown exception) is the result of the await expression.
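To see the pausing and resuming directly, you can drive the powers iterator from earlier by hand:

let iter = powers(2);
console.log(iter.next());
// → {value: 2, done: false}
console.log(iter.next());
// → {value: 4, done: false}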

The event loop

Asynchronous programs are executed piece by piece. Each piece may start some actions and schedule code to be executed when the action finishes or fails. In between these pieces, the program sits idle, waiting for the next action. So callbacks are not directly called by the code that scheduled them. If we call setTimeout from within a function, that function will have returned by the time the callback function is called. And when the callback returns, control does not go back to the function that scheduled it.

Asynchronous behavior happens on its own empty function call stack. This is one of the reasons that, without promises, managing exceptions across asynchronous code is hard. Since each callback starts with a mostly empty stack, your catch handlers won't be on the stack when they throw an exception.

try {
  setTimeout(() => {
    throw new Error("Woosh");
  }, 20);
} catch (_) {
  // This will not run
  console.log("Caught!");
}

No matter how closely together events—such as timeouts or incoming requests—happen, a JavaScript environment will run only one program at a time. You can think of this as it running a big loop around your program, called the event loop. When there's nothing to be done, that loop is stopped. But as events come in, they are added to a queue, and their code is executed one after the other. Because no two things run at the same time, slow-running code can delay the handling of other events. This example sets a timeout but then dallies until after the timeout's intended point of time, causing the timeout to be late.

let start = Date.now();
setTimeout(() => {
  console.log("Timeout ran at", Date.now() - start);
}, 20);
while (Date.now() < start + 50) {}
console.log("Wasted time until", Date.now() - start);
// → Wasted time until 50
// → Timeout ran at 55

Promises always resolve or reject as a new event. Even if a promise is already resolved, waiting for it will cause your callback to run after the current script finishes, rather than right away.

Promise.resolve("Done").then(console.log);
console.log("Me first!");
// → Me first!
// → Done

In later chapters we’ll see various other types of events that run on the event loop.

Asynchronous bugs

When your program runs synchronously, in a single go, there are no state changes happening except those that the program itself makes. For asynchronous programs this is different—they may have gaps in their execution during which other code can run. Let’s look at an example. One of the hobbies of our crows is to count the number of chicks that hatch throughout the village every year. Nests store this count in their storage bulbs. The following code tries to enumerate the counts from all the nests for a given year:

function anyStorage(nest, source, name) {
  if (source == nest.name) return storage(nest, name);
  else return routeRequest(nest, source, "storage", name);
}

async function chicks(nest, year) {
  let list = "";
  await Promise.all(network(nest).map(async name => {
    list += `${name}: ${
      await anyStorage(nest, name, `chicks in ${year}`)
    }\n`;
  }));
  return list;
}

The async name => part shows that arrow functions can also be made async by putting the word async in front of them. The code doesn't immediately look suspicious: it maps the async arrow function over the set of nests, creating an array of promises, and then uses Promise.all to wait for all of these before returning the list they build up. But it is seriously broken. It'll always return only a single line of output, listing the nest that was slowest to respond.

The problem lies in the += operator, which takes the current value of list at the time when the statement starts executing and then, when the await finishes, sets the list binding to that value plus the added string. But between the time when the statement starts executing and the time when it finishes, there's an asynchronous gap. The map expression runs before anything has been added to the list, so each of the += operators starts from an empty string and ends up, when its storage retrieval finishes, setting list to a single-line list—the result of adding its line to the empty string.

This could have easily been avoided by returning the lines from the mapped promises and calling join on the result of Promise.all, instead of building up the list by changing a binding. As usual, computing new values is less error-prone than changing existing values.

async function chicks(nest, year) {
  let lines = network(nest).map(async name => {
    return name + ": " +
      await anyStorage(nest, name, `chicks in ${year}`);
  });
  return (await Promise.all(lines)).join("\n");
}

Mistakes like this are easy to make, especially when using await, and you should be aware of where the gaps in your code occur. An advantage of JavaScript’s explicit asynchronicity (whether through callbacks, promises, or await) is that spotting these gaps is relatively easy.

Summary

Mistakes and bad input are facts of life. An important part of programming is finding, diagnosing, and fixing bugs. Problems can become easier to notice if you have an automated test suite or add assertions to your programs. Problems caused by factors outside the program’s control should usually be handled gracefully. Sometimes, when the problem can be handled locally, special return values are a good way to track them. Otherwise, exceptions may be preferable.

Throwing an exception causes the call stack to be unwound until the next enclosing try/catch block or until the bottom of the stack. The exception value will be given to the catch block that catches it, which should verify that it is actually the expected kind of exception and then do something with it. To help address the unpredictable control flow caused by exceptions, finally blocks can be used to ensure that a piece of code always runs when a block finishes.
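A compact sketch of that pattern (the InputError class and promptDirection function here are only illustrative stand-ins):

class InputError extends Error {}

function promptDirection(question) {
  // Stand-in for some operation that may fail with a known error type.
  throw new InputError("Invalid direction: " + question);
}

try {
  promptDirection("Which way?");
} catch (e) {
  if (e instanceof InputError) {
    console.log("Not a valid direction, try again.");
  } else {
    throw e;  // rethrow anything we did not expect
  }
} finally {
  console.log("This always runs.");
}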

Regular expressions are objects that represent patterns in strings. They use their own language to express these patterns.

/abc/        A sequence of characters
/[abc]/      Any character from a set of characters
/[^abc]/     Any character not in a set of characters
/[0-9]/      Any character in a range of characters
/x+/         One or more occurrences of the pattern x
/x+?/        One or more occurrences, nongreedy
/x*/         Zero or more occurrences
/x?/         Zero or one occurrence
/x{2,4}/     Two to four occurrences
/(abc)/      A group
/a|b|c/      Any one of several patterns
/\d/         Any digit character
/\w/         An alphanumeric character ("word character")
/\s/         Any whitespace character
/./          Any character except newlines
/\b/         A word boundary
/^/          Start of input
/$/          End of input

A regular expression has a method test to test whether a given string matches it. It also has a method exec that, when a match is found, returns an array containing all matched groups. Such an array has an index property that indicates where the match started. Strings have a match method to match them against a regular expression and a search method to search for one, returning only the starting position of the match. Their replace method can replace matches of a pattern with a replacement string or function.
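A few small examples of those methods (the sample string is arbitrary):

let match = /\d+/.exec("one two 100");
console.log(match[0], match.index);
// → 100 8
console.log(/\d+/.test("one two 100"));
// → true
console.log("one two 100".search(/\d+/));
// → 8
console.log("one two 100".replace(/\d+/, "many"));
// → one two many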

Regular expressions can have options, which are written after the closing slash. The i option makes the match case insensitive. The g option makes the expression global, which, among other things, causes the replace method to replace all instances instead of just the first. The y option makes it sticky, which means that it will not search ahead and skip part of the string when looking for a match. The u option turns on Unicode mode, which fixes a number of problems around the handling of characters that take up two code units. Regular expressions are a sharp tool with an awkward handle. They simplify some tasks tremendously but can quickly become unmanageable when applied to complex problems. Part of knowing how to use them is resisting the urge to try to shoehorn things that they cannot cleanly express into them.
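As a small illustration of the i and g options described above:

console.log("Borobudur".replace(/[ou]/, "a"));
// → Barobudur
console.log("Borobudur".replace(/[ou]/g, "a"));
// → Barabadar
console.log(/abc/i.test("ABC"));
// → true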

Modules provide structure to bigger programs by separating the code into pieces with clear interfaces and dependencies. The interface is the part of the module that’s visible from other modules, and the dependencies are the other modules that it makes use of. Because JavaScript historically did not provide a module system, the CommonJS system was built on top of it. Then at some point it did get a built-in system, which now coexists uneasily with the CommonJS system. A package is a chunk of code that can be distributed on its own. NPM is a repository of JavaScript packages. You can download all kinds of useful (and useless) packages from it.
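As a quick reminder of what the two styles look like (assuming an installed ini package; readSettings is just an illustrative name), a CommonJS module might read:

const {parse} = require("ini");
exports.readSettings = text => parse(text);

while the built-in ES module system expresses the same interface like this:

import {parse} from "ini";
export const readSettings = text => parse(text);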

Asynchronous programming makes it possible to express waiting for long- running actions without freezing the program during these actions. JavaScript environments typically implement this style of programming using callbacks, functions that are called when the actions complete. An event loop schedules such callbacks to be called when appropriate, one after the other, so that their execution does not overlap. Programming asynchronously is made easier by promises, objects that represent actions that might complete in the future, and async functions, which allow you to write an asynchronous program as if it were synchronous.
