MongoDB

CSCI-UA.0480-008

So You Want to Persist Data

In our homework and class examples, where did we store our data? →

  • in a global variable in our Express application
  • and where does that application live?
  • in memory!


What are some downsides to storing data as part of our application in memory?

  • when you restart the server, you lose that data!

Storing Data

What are some other options… let's list as many as we can.

  • on the filesystem
  • in the cloud (S3, firebase, parse, SalesForce)
  • in a database
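
As a rough illustration of the filesystem option above, here's a minimal sketch that writes a list to a JSON file and reads it back. The file name, the helper names (saveNames, loadNames) and the sample names are all made up for this example; it only assumes Node's built-in fs, os and path modules.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical location for the data file; a real app would pick its own path.
const DATA_FILE = path.join(os.tmpdir(), 'cat-names-demo.json');

// Write the whole array to disk as JSON.
function saveNames(names) {
  fs.writeFileSync(DATA_FILE, JSON.stringify(names));
}

// Read the array back; start with an empty list if the file doesn't exist yet.
function loadNames() {
  if (!fs.existsSync(DATA_FILE)) {
    return [];
  }
  return JSON.parse(fs.readFileSync(DATA_FILE, 'utf8'));
}

saveNames(['bill furry', 'paw newman']);

// Nothing is kept in a global variable here, so the data would
// still be on disk after the server restarts.
console.log(loadNames());
```

This survives a restart, but every read and write handles the whole file, and there's no querying. That's part of why we'll reach for a database instead.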


We'll be using a database in this part of the class…

SO MANY DATABASES

We can categorize databases as:

  • relational
  • nosql (also non-relational)


nosql databases can be further categorized by the data model they use:

  • key-value
  • document
  • column
  • graph

Warning: Broad Generalizations Coming Up!

So, the following slides include high level overviews of different kinds of databases.

  • I'm going to use broad generalizations
  • but there are always exceptions
  • for example:
    • although PostgreSQL is considered a relational database, it has a built-in data type for key-value storage!
    • many document stores can be used as key-value stores
    • some document stores are also relational!

Relational Databases

Relational databases organize data in a collection of tables (relations). Can you describe characteristics of a relational database?

  • each table has named columns… with the actual data that populates the table in separate rows
  • each table row has a primary key that:
    • uniquely identifies that row
    • allows data in one table to be related to data in another (via foreign key relationships)

Relational Databases Continued

Regarding additional relational database features…

  • relational databases are typically pretty rigid:
    • highly structured
    • you have to define the columns and the types of columns before inserting rows
    • they have a lot of features for maintaining data integrity (such as user-defined data constraints, foreign keys, etc.)
  • some relational databases guarantee that transactions (or changes in the database) are reliable
  • see ACID compliance - Atomicity, Consistency, Isolation, Durability
  • this database consultant has a pretty good write-up on relational databases

Aside on ACID

  • Atomicity - each transaction (a series of operations) is all or nothing
  • Consistency - every transaction ensures that the resulting database state is valid (goes from one valid state to another)
  • Isolation - concurrent transactions should not interfere with each other (for example, a failed or in-progress transaction should have no effect on other transactions)
  • Durability - once a transaction / operation is done, the results will remain persistent even through crash, power loss, etc.
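
To make "all or nothing" concrete, here's a small sketch of atomicity using a plain in-memory object. This is not how a real database implements transactions; the accounts object and the runTransaction, withdraw and deposit helpers are invented for illustration.

```javascript
// Hypothetical in-memory "database" of account balances.
let accounts = { alice: 100, bob: 50 };

// Run a series of operations as one unit: either all apply or none do.
function runTransaction(operations) {
  // Work on a copy so a failure part-way through leaves the original untouched.
  const draft = { ...accounts };
  try {
    for (const operation of operations) {
      operation(draft);
    }
  } catch (err) {
    return false; // roll back: the draft is simply thrown away
  }
  accounts = draft; // commit: all changes become visible at once
  return true;
}

function withdraw(name, amount) {
  return (state) => {
    if (state[name] < amount) {
      throw new Error('insufficient funds');
    }
    state[name] -= amount;
  };
}

function deposit(name, amount) {
  return (state) => {
    state[name] += amount;
  };
}

// Succeeds: both steps apply.
const firstResult = runTransaction([withdraw('alice', 30), deposit('bob', 30)]);

// Fails on the second step, so the deposit in the first step is discarded too.
const secondResult = runTransaction([deposit('bob', 500), withdraw('alice', 500)]);

console.log(firstResult, secondResult, accounts);
```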

Quick Demo of Designing a Data Model for a Relational Database

Maybe we want to store these fields:

  • first name
  • last name
  • street address
  • city
  • state
  • zip


Let's get to it!
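
One possible outcome of that exercise, sketched with plain JavaScript arrays standing in for tables. Splitting the fields into a people table and an addresses table is an assumption made for this example (so that one person can have several addresses), and the rows are made-up sample data.

```javascript
// Each "table" is an array of rows; every row has a primary key (id).
const people = [
  { id: 1, first: 'Alice', last: 'Smith' },
  { id: 2, first: 'Bob', last: 'Jones' },
];

// personId is a foreign key: it refers to the id of a row in people.
const addresses = [
  { id: 10, personId: 1, street: '1 Main St', city: 'New York', state: 'NY', zip: '10003' },
  { id: 11, personId: 2, street: '2 Oak Ave', city: 'Newark', state: 'NJ', zip: '07102' },
  { id: 12, personId: 1, street: '3 Elm Rd', city: 'Albany', state: 'NY', zip: '12207' },
];

// A very small "join": find every address row related to one person row.
function addressesFor(personId) {
  return addresses.filter((address) => address.personId === personId);
}

console.log(addressesFor(1).map((address) => address.city));
```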

Examples of Relational Databases

What are some examples of relational databases?

  • MySQL
  • PostgreSQL
  • Oracle
  • Microsoft SQL Server


These are all great choices for storing highly structured, related data.

They are all in common usage for conventional web applications. However, there's a bit of a learning curve, and some are difficult to set up.

NoSQL Databases

NoSQL databases can be categorized by how they store their data:

  • key-value
  • document
  • column
  • graph
  • there are others
  • note that nosql databases can have reliable transactions as well, but this is usually not the focus of a nosql database


We'll focus on key-value and document stores

Key Value Store

Probably the simplest conceptually… data is stored in key/value pairs. This should sound similar to some data structures that you've seen before.

  • maybe a hash
  • or a dictionary
  • or an associative array


They're typically good at scaling to handle large amounts of data and dealing with high volumes of changes in data.


What may be some good applications for key value stores?

  • caching
  • storing sessions!
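
Here's a minimal sketch of the idea, using a JavaScript Map as the key-value store and giving each entry an expiry time, the way a cache or session store might. TinyKeyValueStore is a made-up class for illustration, not the API of Redis or any other real product; the now parameter exists only so the example doesn't depend on the real clock.

```javascript
// A tiny key-value store with expiring entries, backed by a Map.
class TinyKeyValueStore {
  constructor() {
    this.entries = new Map();
  }

  // Store a value under a key for ttlMs milliseconds.
  set(key, value, ttlMs, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  // Look a value up by key; expired or missing keys give undefined.
  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (now >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const sessions = new TinyKeyValueStore();
// Pretend the current time is 0 and the session lasts 1000ms.
sessions.set('session:abc123', { username: 'alice' }, 1000, 0);

console.log(sessions.get('session:abc123', 500));  // still valid at t=500
console.log(sessions.get('session:abc123', 1500)); // expired at t=1500
```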

Key Value Store Examples

Some key value databases include:

  • Redis (a popular backend for queuing)
  • Memcached (as the name implies, typically used for caching)
  • Riak
  • many others

Document Stores

As you might guess by the name, document stores organize data as semi-structured documents.

  • think JSON (but there are many possible formats, such as XML, YAML, etc.)
  • or… a richer key-value store (there's meta data within the document… the keys are usually meaningful)
  • typically, no schema is required (that is, data types of values are inferred from values)
  • typically, semi structured (documents, property names, etc… do not have to be pre-defined)
  • some document stores are particularly featureful when it comes to high availability and scaling (through replication/redundancy and sharding/separating large databases into smaller ones)


They're particularly good for applications where flexible data storage or constantly changing data storage is required.

Document Store Examples

Two of the most popular document stores are:

  • MongoDB
  • CouchDB


Of course, there are a bunch of others


Some use cases for document stores include:

  • applications that require semi-structured data / data that does not have rigid requirements (perhaps a resume)
  • again, large volumes of data
  • fluid data or data whose structure is prone to change

So Which One are We Using?

We're using MongoDB. Not for all of the reasons we previously mentioned, though… We're using it because…

  • it uses a JSON like data structure (we know JSON)
  • its query language is JavaScript (we know JavaScript syntax)
  • it's not very rigid when it comes to dealing with data (we don't have to be so precise/exacting)
  • it's fairly straightforward to set up, usually with little / no configuration required
  • (to the point where the default installation doesn't even require a username/password to connect to the database – wait, that's not so good!?)


All this can pretty much be summed up by saying that it's easy to use! (As an aside, I'm a bit biased toward relational databases, specifically Postgres)

MongoDB

  • MongoDB will be our data store that we use for this part of the course
  • It's a nosql database…
  • Specifically, it's a document store
    • a single record in Mongo is a document (a user, a bird in the case of our homework!)
    • a document is a bunch of key value pairs…
    • hey… that sounds like…
    • documents are similar to JSON objects (actually BSON, a binary encoding of JSON-like documents)

Documents and Collections

A couple of terms to remember (yay, definitions again!)

  • key - a field name - analogous to a column in a relational database
  • value - obvs, a value
  • document - a single object or record in our database,
    • consists of key value pairs
    • similar to a single row in a relational database
  • collection - a group of documents
    • analogous to tables in relational databases
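
To tie these terms together, here's a plain JavaScript sketch: an array stands in for a collection, and each object in it stands in for a document. The bird data is invented; the point is that documents in the same collection don't have to share the same keys.

```javascript
// A "collection": a group of documents.
const birds = [
  // A "document": a set of key-value pairs.
  { name: 'blue jay', sightings: 3 },
  // Another document in the same collection, with an extra key.
  { name: 'cardinal', sightings: 1, colors: ['red', 'black'] },
];

// List the keys (field names) that each document actually has.
const keysPerDocument = birds.map((doc) => Object.keys(doc));

console.log(keysPerDocument);
```

In a relational table, adding colors would mean changing the table definition for every row. Here, only the documents that need the key have it.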

Data Types

Although MongoDB doesn't require you to pre-define the types of values that your documents will have, it does have data types. These types are inferred from the value. Some available types include:

  • string - an empty string or an ordered sequence of characters
  • numeric types - such as integer, double (float)
  • boolean - true / false
  • array - a list of values
  • timestamp - 64 bit value where first 32 bits are seconds since the Unix epoch
  • Object ID - every MongoDB document must have an Object ID, which is unique


More about Object ID: a 12-byte binary value that is very unlikely to be duplicated; it consists of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter (newer versions of MongoDB replace the machine id and process id with a 5-byte random value)
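
Since the first 4 bytes are a timestamp, you can recover roughly when an Object ID was generated from its hexadecimal string form. A minimal sketch, where objectIdToDate is a made-up helper name and the id is just an example value:

```javascript
// An Object ID is usually shown as 24 hex characters (12 bytes).
// The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
function objectIdToDate(hexString) {
  const seconds = parseInt(hexString.slice(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

const created = objectIdToDate('57ff86a14639d0fd263f87a0');
console.log(created.toISOString());
```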

Installation

Comprehensive docs are here

  • basically, just use the appropriate installer from their downloads page
  • if you use a package manager, do that instead
    • they have .debs for Debian and Ubuntu
    • since I'm on OSX, and I use homebrew, I used brew install mongodb
  • starting will vary based on OS
  • you may need to create and/or specify a directory where your data will be stored, so if mongo doesn't start up, check whether it's missing its data directory

A Whirlwind Tour

Working with MongoDB on the commandline…

If MongoDB isn't started automatically on your OS, you can run:


mongod

To connect via the commandline MongoDB client and connect to a locally running instance:


mongo

This drops you into the MongoDB shell (yay… more shell). You can issue commands that

  • inspect the database
  • modify and create documents and collections
  • find documents

Some Commands

The following commands can be used to navigate, create and remove databases and collections

  • show databases - show available databases (remember, there can be more than one database)
  • use db - work with a specific database (if unspecified, the default database connected to is test)
  • show collections - once a db is selected, show the collections within the database
  • db.dropDatabase() - drop (remove) the database that you're currently in
  • db.collectionName.drop() - drop (remove) the collection named collectionName

To get some inline help:

  • help - get help on available commands

Starting Out

To begin using the commandline client to inspect your data:

  1. make sure that mongod is running in a different window (or running in the background or as a daemon)
  2. start up the commandline client with mongo
  3. type in use databaseName to switch to the database that you're looking through

From there, you can start querying for data, inserting documents, etc. These basic create, read, update, and delete operations are called CRUD operations…

CRUD!?

(C)reate, (R)ead, (U)pdate, and (D)elete operations:

  • db.[collection].insert(obj)
    • db.Person.insert({'first':'bob', 'last':'bob'})
  • db.[collection].find(queryObj)
    • db.Person.find({'last':'bob'})
    • db.Person.find() // finds all!
  • db.[collection].update(queryObj, updateObj)
    • db.Person.update({'first':'foo'}, {$set: {'last':'bar'}})
    • updateObj describes the change to make (here, $set assigns a new value to last)
  • db.[collection].remove(queryObj)
    • db.Person.remove({'last':'bob'})


Where queryObj is a name value pair that represents the property you're searching on… with a value that matches the value you specify
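
To get a feel for what these four operations do, here's a rough in-memory imitation in plain JavaScript. It is only a sketch of the semantics, not MongoDB's implementation: the people array and the insert, find, updateFirst and remove functions are invented for this example, and only exact-match queries are supported.

```javascript
// A hypothetical in-memory stand-in for a collection.
const people = [];

// A document matches when every key in queryObj has the same value in the document.
// An empty queryObj matches everything.
function matches(doc, queryObj) {
  return Object.keys(queryObj).every((key) => doc[key] === queryObj[key]);
}

// (C)reate
function insert(doc) {
  people.push({ ...doc });
}

// (R)ead
function find(queryObj = {}) {
  return people.filter((doc) => matches(doc, queryObj));
}

// (U)pdate: set fields on the first matching document, similar in spirit to $set
function updateFirst(queryObj, fieldsToSet) {
  const doc = people.find((candidate) => matches(candidate, queryObj));
  if (doc) {
    Object.assign(doc, fieldsToSet);
  }
}

// (D)elete: remove every matching document
function remove(queryObj) {
  for (let i = people.length - 1; i >= 0; i--) {
    if (matches(people[i], queryObj)) {
      people.splice(i, 1);
    }
  }
}

insert({ first: 'bob', last: 'bob' });
insert({ first: 'foo', last: 'baz' });
updateFirst({ first: 'foo' }, { last: 'bar' });
remove({ last: 'bob' });

console.log(find());
```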

More Examples

As prep for the next part, some insert and finds (with a test for greater than!)

Inserting, finding all, then finding by exact number of lives:


> db.Cat.insert({name:'foo', lives:9})
WriteResult({ "nInserted" : 1 })
> db.Cat.find()
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }
> db.Cat.find({lives:9})
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }

Inserting more, then using greater than!


> db.Cat.insert({name:'bar', lives:2})
WriteResult({ "nInserted" : 1 })
> db.Cat.insert({name:'qux', lives:5})
WriteResult({ "nInserted" : 1 })
> db.Cat.find({lives: {$gt: 4}})
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }
{ "_id" : ObjectId("57ff86c14639d0fd263f87a2"), "name" : "qux", "lives" : 5 }
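
The query object for greater than nests another object as the value. Here's a small sketch of how such a query could be evaluated against plain JavaScript objects. It handles only exact matches and $gt, the function names are made up, and it is an illustration rather than how MongoDB works internally.

```javascript
const cats = [
  { name: 'foo', lives: 9 },
  { name: 'bar', lives: 2 },
  { name: 'qux', lives: 5 },
];

// Compare one field's value against one condition from the query object.
function matchesCondition(value, condition) {
  // { $gt: 4 } means "greater than 4"
  if (condition !== null && typeof condition === 'object' && '$gt' in condition) {
    return value > condition.$gt;
  }
  // anything else is treated as an exact match
  return value === condition;
}

// Keep the cats for which every key in the query object matches.
function findCats(queryObj) {
  return cats.filter((cat) =>
    Object.keys(queryObj).every((key) => matchesCondition(cat[key], queryObj[key]))
  );
}

console.log(findCats({ lives: { $gt: 4 } }).map((cat) => cat.name));
console.log(findCats({ lives: 9 }).map((cat) => cat.name));
```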

Using MongoDB in Express

As with everything else we've done in node, there's a module for our specific task. If we'd like to use MongoDB in our application, there are a few options:

  • mongodb - the officially supported driver from MongoDB; optimized for simplicity
  • mongoose - lots of features, more complex, based on mongodb
  • monk - somewhere between mongoose and mongodb in terms of features and complexity (for example, no models)


We'll be using mongoose, as it seems to have the most traction out of the three. (But it's a bit more complicated than it needs to be).

ORM / ODM

Has anyone heard of ORM or ODM before?

  • ORM - object relational mapper
  • ODM - object document mapper


Both map objects in your application to their counterparts in your database (tables, collections). Mongoose is our ODM.

Mongoose Concepts

  • schema - analogous to a collection
  • model - the actual constructors that we use to create objects
  • object - a single document
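
A loose analogy for how these three pieces relate, in plain JavaScript. The defineModel helper below is invented for this sketch and is not Mongoose's API; it only shows the general idea that a schema describes the fields, a model is a constructor built from that description, and each object the constructor makes corresponds to one document.

```javascript
// Hypothetical helper (not part of Mongoose): build a constructor from a schema-like object.
function defineModel(schema) {
  return function Model(values) {
    for (const key of Object.keys(schema)) {
      if (values[key] !== undefined) {
        // Use the type named in the schema (String, Number, ...) to convert the value.
        this[key] = schema[key](values[key]);
      }
    }
    // Keys that aren't in the schema are ignored.
  };
}

// schema: describes the fields and their types
const catSchema = { name: String, lives: Number };

// model: the constructor we use to create objects
const Cat = defineModel(catSchema);

// object: a single document
const cat = new Cat({ name: 'foo', lives: '9', color: 'orange' });

console.log(cat.name, cat.lives, cat.color);
```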

A Quick Example of Storing Cat Names!

Let's use MongoDB and Mongoose to store our classic list of cat names.

Install Mongoose


npm install --save mongoose

Set Up Your Connection

For simplicity, we'll dump everything in a file called [PROJECT ROOT]/db.js for now. We'll see other ways of laying things out.


In db.js:


// as always, require the module
const mongoose = require('mongoose'); 

// some extra stuff goes here...

// connect to the database (catdb)
mongoose.connect('mongodb://localhost/catdb');

Create a Schema

Between your require and connect… create a schema. A schema represents a MongoDB collection. Here we're specifying a collection for Cats.

  • the cat schema will allow us to read objects from the collection
  • as well as modify and add objects to the collection



// define the data in our collection
const Cat = new mongoose.Schema({
	name: String,
	updated_at: Date
});

// "register" it so that mongoose knows about it
mongoose.model('Cat', Cat);

Using Our db Module

In app.js, simply:


require( './db' );


  • this will initialize our connection to the database when our application runs
  • it also sets up our schemas, so we can use them in our routes

Using Schemas

Presumably, we would want to create, update, read or delete data based on what page (path/url/etc.) we're on. Let's start by adding some setup to our index.js routes.


const mongoose = require('mongoose');
const Cat = mongoose.model('Cat');

Cats. Because Why Not?

Let's create a simple site that saves a bunch of cat names.

(for when we adopt three new adorable cats for our class)

What pages do you think we should have?

  • minimally…
  • a list of cat names
  • a form that allows you to add cat names

URLs and Routes

So what kind of routes will we need?

  • GET a form (perhaps /cat/create)
  • accept POSTs to that form (/cat/create again)
  • GET a list of cat names (how about… just /cats? yeeaah.)

/cats

Use the model's find method to read objects (of course, we have to define a callback that gets triggered when the read is done)!


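router.get('/cats', function(req, res) {
	// find with no query object reads every document in the collection
	Cat.find(function(err, cats) {
		if (err) {
			// the read failed, so don't try to render the list
			return res.status(500).send('could not read cats');
		}
		res.render( 'cats', {
			cats: cats
		});
	});
});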

/cat/create

We'll also need a form. We can handle this one pretty easily.



router.get('/cat/create', function(req, res) {
  res.render('create');
});

/cat/create (POST)

Let's accept posts to /cat/create. We'll use our model (the Cat constructor) to create an object:

  • create a new Cat object
  • set its properties
  • call save
  • tell save what to do when it finishes saving (callback, ftw)

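router.post('/cat/create', function(req, res) {
	console.log(req.body.catName);
	new Cat({
		name: req.body.catName,
		updated_at : Date.now()
	}).save(function(err, cat) {
		if (err) {
			// the save failed, so report it instead of redirecting
			return res.status(500).send('could not save cat');
		}
		res.redirect('/cats');
	});
});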
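
Because the name comes straight from a form, it may be worth checking it before creating a Cat. A minimal sketch of one way to do that, where parseCatName is a hypothetical helper (it isn't part of Express or Mongoose) that a POST handler could call on req.body:

```javascript
// Hypothetical helper: returns a cleaned-up cat name,
// or null if nothing usable was submitted.
function parseCatName(body) {
  // body may be missing entirely, or catName may not be a string
  if (!body || typeof body.catName !== 'string') {
    return null;
  }
  const name = body.catName.trim();
  return name.length > 0 ? name : null;
}

console.log(parseCatName({ catName: '  bill furry ' }));
console.log(parseCatName({ catName: '   ' }));
console.log(parseCatName({}));
```

A handler could then save only when the result isn't null, and show the form again otherwise.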

And Some Templates….

Our form is essentially the same as all of the previous forms we've had


<form method="POST" action="">
cat name plz
<input type="text" name="catName">
<input type="submit">
</form>

And a Template for the List

As with our previous templates, we'll just loop through all of the objects that we retrieve.

  • note that because we have a list of objects…
  • we can reference each property name, rather than using this
  • for our example, name is a property of each object

<ul>
{{#each cats}}
<li>{{name}}</li>
{{/each}}
</ul>
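
For comparison, here's roughly what that template does, written as a plain JavaScript function. This is only a sketch: renderCatList and escapeHtml are made-up names, and the escaping is a simplified version of the HTML escaping that Handlebars-style templates apply to double-curly expressions.

```javascript
// Replace characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// Loop over the cats and build one list item per object,
// using each object's name property.
function renderCatList(cats) {
  const items = cats.map((cat) => `<li>${escapeHtml(cat.name)}</li>`);
  return ['<ul>', ...items, '</ul>'].join('\n');
}

console.log(renderCatList([{ name: 'foo' }, { name: '<b>bar</b>' }]));
```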