Extracting Parameters from URL Paths

CSCI-UA.0480-008

We're going to be talking about URLs…

URLs are Important!

URLs determine how your content / site is organized. They should be designed with clarity and longevity (we want URLs that don't disappear or change) in mind! What makes a good URL?

  • it makes it obvious what content or resource you're retrieving!
    • a list of blog posts may go under /posts or /post
    • a single blog post may go in /post/[date]/[name-of-post]
    • it helps identify a specific resource: order/762190
  • a URL should be human readable when possible:
    • post/how-to-make-good-urls
    • this is an example of a slug, a unique, short name with special characters (such as spaces) replaced by hyphens
  • sometimes you may want to include an action in your url
    • form pages or URLs that you post to may belong under: post/create
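
As a rough illustration of the slug idea above, here is a minimal sketch of turning a title into a slug. The slugify function is a hypothetical helper (it isn't part of Express or of these slides), and it assumes plain ASCII titles:

```javascript
// Hypothetical helper: turn a post title into a URL slug.
// Assumes ASCII input; accented characters would simply be replaced by hyphens.
function slugify(title) {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, '-') // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');    // drop leading and trailing hyphens
}

console.log(slugify('How to Make Good URLs!')); // how-to-make-good-urls
```

The result could then be used as the last part of a path such as post/how-to-make-good-urls.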

Some Technical Considerations and Conventions

In general, good URLs are meaningful (their relevance to your content helps site usability and even search engine optimization)!

Some additional best practices that should be followed when creating URLs include:

  • never expose technical details (for example, extensions that reveal the technology stack that you use, such as .asp or .php)
  • avoid meaningless information (although we used it in a previous homework, /home as a prefix to other URLs, as in /home/about, is superfluous when about can just come off of root)
  • avoid needlessly long URLs
  • be consistent with word separators (hyphen seems to be an accepted convention)
  • never use whitespace or untypable characters
  • use lowercase for your URLs

Ok, great. We have a meaningful URL… how do we deal with pesky clients that ask for that URL?

Routing

One of the advantages of using a web framework is that most frameworks come with routing.

Routing is the mechanism by which requests (as specified by a URL and an HTTP method) are routed to the code that handles the request.

How does routing usually work for URLs in our application? How about routing for static files?

  • a url is requested…
  • and we map those urls to callback functions in our app or router objects (app.get, app.post or router.get and router.post)
  • or the path that's specified is used by the static files middleware to retrieve the contents of the file matching that path, starting from some folder on the file system
  • (of course, other middleware may provide responses as well)
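
To make the idea of mapping a method and URL to a callback concrete, here is a toy dispatcher in plain JavaScript. It is only a sketch of the concept, not how Express is implemented; addRoute and dispatch are made-up names, and the handlers just return strings:

```javascript
// A toy dispatcher, NOT Express itself: maps "METHOD path" to a callback.
const routes = new Map();

function addRoute(method, path, handler) {
  routes.set(method + ' ' + path, handler);
}

function dispatch(method, path) {
  const handler = routes.get(method + ' ' + path);
  // no matching route: fall back to a 404-style response
  return handler ? handler() : '404 Not Found';
}

addRoute('GET', '/about', () => 'About page');
addRoute('POST', '/post/create', () => 'Post created');

console.log(dispatch('GET', '/about'));        // About page
console.log(dispatch('POST', '/post/create')); // Post created
console.log(dispatch('GET', '/post/create'));  // 404 Not Found
```

Notice that the same path can lead to different results depending on the HTTP method, which is why a route is specified by both.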

Routers Gonna Route

Which brings us to… what's a router again (an actual router object)?

A router is an isolated instance of route handlers and middleware. It's an object that's essentially a mini-application:

  • you can define routes (or route handlers - the HTTP verb methods, path and callback)
  • you can also use middleware in a router


A few other notes:

  • routers are just middleware … so to load a Router, just pass it into app.use
  • you can't do other things that the application object can do, like listen.

Defining Paths

In our previous examples of route handlers, we've matched paths exactly (well, with the exception of trailing slashes and casing):

/about

Sometimes, an exact match isn't what we want, though. In some cases we may want a single route handler for multiple, similar, paths (for example posts/some-post-title may always map to a route handler that retrieves a post with some-post-title).

Paths and Regular Expressions

Route handlers can use regular expressions to capture incoming requests that are all going to similar paths. We can specify patterns to match URLs.

A regular expression is a series of characters that define a pattern. These patterns can be made up of:

  • simple characters - characters that you want to match directly
  • special characters - characters that specify a pattern rather than a direct match

Regular Expressions

What are some examples of regular expression special characters?

  • . - any character
  • \w - any word character
  • \d - any digit character
  • [xyz] - one of any of these characters
  • [^xyz] - any character that's not in this set of characters
  • ^ - beginning of line
  • $ - end of line
  • {n} - n of the preceding
  • {n,m} - at least n and at most m of the preceding
  • ? - 0 or 1 of the preceding
  • * - 0 or more of the preceding
  • + - 1 or more of the preceding
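
A quick way to try the special characters above is a regular expression's test method, which returns true or false. The patterns and sample strings below are made up for illustration:

```javascript
// Each pair is [pattern, input]; test() reports whether the pattern matches.
const samples = [
  [/c.t/, 'cut'],        // . matches any character            -> true
  [/^\w+$/, 'post_1'],   // letters, digits and _ are "word"   -> true
  [/^\d+$/, '12a'],      // 'a' is not a digit                 -> false
  [/^[xyz]$/, 'y'],      // y is in the set                    -> true
  [/^[^xyz]$/, 'y'],     // y is excluded by the negated set   -> false
  [/^\d{3}$/, '123'],    // exactly 3 digits                   -> true
  [/^\d{2,3}$/, '1234'], // 4 digits is more than the max of 3 -> false
  [/^colou?r$/, 'color'] // the u is optional                  -> true
];

const results = samples.map(([pattern, input]) => pattern.test(input));
console.log(results);
```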

Regular Expressions in JavaScript

In JavaScript, regular expressions are bounded by forward slashes (they're not strings, so no quotation marks).

Here are a few examples of regular expressions using a String's match method (searches for regular expression in string):


'hello'.match(/ell/) // exactly ell; matches 'ell'
'swell'.match(/.ell/) // any character and ell; matches 'well'
'hello'.match(/^.ell/) // starts with any character and ell; matches 'hell'
'swell'.match(/^.ell/) // starts with any character and ell; no match (null)

// these all demonstrate how to specify number of matches
'hello'.match(/el*/) // e, then 0 or more l's; matches 'ell'
'he'.match(/el*/) // e, then 0 or more l's; matches 'e'
'hello'.match(/el+/) // e, then 1 or more l's; matches 'ell'
'he'.match(/el+/) // e, then 1 or more l's; no match (null)
'helllllo'.match(/el+/) // e, then 1 or more l's; matches 'elllll'
'hello'.match(/el{1,2}/) // e, then at least one l, at most 2 l's; matches 'ell'
'helllllo'.match(/el{1,2}/) // e, then at least one l, at most 2 l's; matches 'ell'

Some More Examples!

  • /\d\d\d/ - 3 digits
  • /h.*$/ - h followed by 0 or more of any character up to the end of the line
  • /^\w\d?\d?$/ - one word character (a letter, digit or underscore) at the beginning of a line followed by 0, 1 or 2 digits, and then the end of the line
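
To check patterns like these, it can help to run them against a few strings. The inputs below are invented for illustration:

```javascript
const threeDigits = /\d\d\d/;
const hToEnd = /h.*$/;
const wordThenDigits = /^\w\d?\d?$/;

// match() returns an array whose first element is the matched text
console.log('abc12345'.match(threeDigits)[0]); // 123
console.log('oh hello'.match(hToEnd)[0]);      // h hello

console.log(wordThenDigits.test('a12'));  // true
console.log(wordThenDigits.test('a123')); // false: 3 digits is too many
console.log(wordThenDigits.test('_7'));   // true: underscore counts as a word character
```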

Let's See This Work

In one of our route handlers, let's try to use a regular expression that matches the following URLs:


/class01
/class02
...
etc.

// notice that the first argument, the path, is a regular expression
router.get(/class\d\d/, function(req, res) {
	res.send('All the classes!');
});
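
One thing worth checking outside of the router: the pattern above isn't anchored, so it matches any path that contains class followed by two digits. The sketch below assumes the router tests the regular expression against the request path as-is; the stricter anchored pattern is a suggested variation, not something from the example above:

```javascript
const loose = /class\d\d/;       // the pattern used in the route above
const strict = /^\/class\d\d$/;  // hypothetical stricter variation, anchored at both ends

console.log(loose.test('/class01'));     // true
console.log(loose.test('/class1'));      // false: only one digit
console.log(loose.test('/myclass012'));  // true: class01 appears inside the path
console.log(strict.test('/class01'));    // true
console.log(strict.test('/myclass012')); // false
```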

Regular Expressions Continued

What path would you specify in your router to make all of these URLs match?

  • /jam
  • /jem
  • /jaaaaam
  • /jingoism


But doesn't match:

  • /jm
  • /ajam



router.get(/^\/j.+m$/, function(req, res) {
	res.send('Matched');
});
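
We can double-check that pattern without starting a server by testing it against each of the paths listed above:

```javascript
const pattern = /^\/j.+m$/;
const paths = ['/jam', '/jem', '/jaaaaam', '/jingoism', '/jm', '/ajam'];

// keep only the paths that the pattern accepts
const matched = paths.filter((path) => pattern.test(path));
console.log(matched); // [ '/jam', '/jem', '/jaaaaam', '/jingoism' ]
```

/jm fails because .+ needs at least one character between the j and the final m, and /ajam fails because the path has to start with /j.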

That Was Neat and All, But…

What if the path you're responding to has some meaningful information trapped in the URL? For example, maybe we want to take the class number out of our class\d\d URL?

Or perhaps you've seen a URL like this:


posts/2015-10-27/paths-are-great

What are some bits of this URL that may be important to our applications?

  • the date
  • the slug

Extracting Values From Paths

We can capture the values in a path by:

  • specifying a path - as a string - with a part that's prefixed by a colon for every value we want to capture
  • using req.params to access that variable



'/some/other/parts/:var1/:var2'

var1 and var2 can be accessed through:


req.params.var1
req.params.var2

A Full Example of Extracting Parameters from a URL


router.get('/some/other/parts/:var1/:var2', function(req, res) {
	res.send(req.params.var1 + ', ' + req.params.var2);
});

In your browser (the response should be hello, world):


http://localhost:3000/some/other/parts/hello/world
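
To get a feel for what happens with the colon-prefixed parts, here is a simplified sketch that converts such a path into a regular expression and collects the captured values by name. This is an illustration only: matchPath is a made-up function, Express's real matching handles many more cases, and the sketch assumes the path contains no other regular expression special characters.

```javascript
// Simplified illustration only; not Express's actual implementation.
function matchPath(pattern, path) {
  const names = [];
  // turn each ":name" part into a capture group that matches one path segment
  const source = pattern.replace(/:(\w+)/g, (whole, name) => {
    names.push(name);
    return '([^/]+)';
  });
  const match = new RegExp('^' + source + '$').exec(path);
  if (!match) {
    return null;
  }
  const params = {};
  names.forEach((name, i) => {
    params[name] = match[i + 1]; // group 0 is the whole match, so offset by 1
  });
  return params;
}

console.log(matchPath('/some/other/parts/:var1/:var2', '/some/other/parts/hello/world'));
// { var1: 'hello', var2: 'world' }
console.log(matchPath('/posts/:date/:slug', '/posts/2015-10-27/paths-are-great'));
// { date: '2015-10-27', slug: 'paths-are-great' }
```

The second call shows how the date and slug from the earlier posts URL could be pulled out with a path like /posts/:date/:slug.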

Capturing Bits of a Regular Expression

We can also group parts of a regular expression so that they're captured in params as well!

  • surround the part you'd like to capture with parentheses: /class(\d\d)/ (captures the 2 digits after class)
  • reference that part by indexing into req.params, with 0 being the first group, 1 the next, etc. … req.params[0]


Using our previous /class\d\d/ example… to grab just the digits, we could:


router.get(/class(\d\d)/, function(req, res) {
  res.send(req.params[0]);
});
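
The same grouping can be tried with a String's match method. One difference to watch for: in the array that match returns, index 0 is the entire matched text and the groups start at index 1, whereas (as described above) req.params starts counting groups at 0. The path /class07 is just a sample value:

```javascript
const match = '/class07'.match(/class(\d\d)/);

console.log(match[0]); // class07 (the whole match)
console.log(match[1]); // 07 (the first group, which is what req.params[0] holds in the route above)
```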