MongoDB

CSCI-UA.0480-008

So You Want to Persist Data

In our homework and class examples, where did we store our data? →

in a global variable in our Express application
and where does that application live? →
in memory!

What are some downsides to storing data as part of our application in memory? →

when you restart the server, you lose that data!

Storing Data

What are some other options… let's list as many as we can. →

on the filesystem
in the cloud (S3, firebase, ~~parse~~, SalesForce)
in a database
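
As a rough illustration of the first option (the filesystem), here's a minimal sketch that writes a list to a JSON file and reads it back. The file name and the saveCats / loadCats helpers are made up for this example, and it assumes the data is small enough to rewrite in full on every save.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// hypothetical location for our data file (a temp directory, for the demo)
const dataFile = path.join(os.tmpdir(), 'cats-demo.json');

// write the whole array to disk as JSON
function saveCats(cats) {
  fs.writeFileSync(dataFile, JSON.stringify(cats));
}

// read it back; start with an empty list if the file doesn't exist yet
function loadCats() {
  if (!fs.existsSync(dataFile)) {
    return [];
  }
  return JSON.parse(fs.readFileSync(dataFile, 'utf8'));
}

saveCats([{name: 'foo'}, {name: 'bar'}]);

// pretend the server restarted here... the data is still on disk
const catsAfterRestart = loadCats();
console.log(catsAfterRestart);
```

This survives a restart, but searching, concurrent writes and partial updates are all left to us, which is part of why we'll reach for a database instead.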

We'll be using a database in this part of the class…

SO MANY DATABASES

We can categorize databases as: →

relational
nosql (also non-relational)

nosql databases can be further categorized by the data model they use: →

key-value
document
column
graph

Warning: Broad Generalizations Coming Up!

So, the following slides include high level overviews of different kinds of databases.

I'm going to use broad generalizations
but there are always exceptions →
for example:
- although PostgreSQL is considered a relational database, it has a built-in data type for key-value storage!
- many document stores can be used as key-value stores
- some document stores are also relational!

Relational Databases

Relational databases organize data in a collection of tables (relations). Can you describe characteristics of a relational database? →

each table has named columns… with the actual data that populates the table in separate rows
each table row has a primary key that:
- uniquely identifies that row
- allows data in one table to be related to data in another (via foreign key relationships)
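
To make primary and foreign keys a bit more concrete, here's a sketch that fakes two tables with plain arrays. The table and column names (people, addresses, personId) are invented for this example; a real relational database would do the lookup for us with a join.

```javascript
// two "tables", each an array of rows; id is the primary key of each row
const people = [
  {id: 1, first: 'alice', last: 'smith'},
  {id: 2, first: 'bob', last: 'jones'}
];

// personId is a foreign key: it refers to the id of a row in people
const addresses = [
  {id: 10, personId: 1, city: 'New York', state: 'NY'},
  {id: 11, personId: 2, city: 'Hoboken', state: 'NJ'},
  {id: 12, personId: 1, city: 'Albany', state: 'NY'}
];

// a hand-rolled "join": for each address, look up the person it points to
const joined = addresses.map(function(address) {
  const person = people.find(function(p) {
    return p.id === address.personId;
  });
  return {first: person.first, city: address.city};
});

console.log(joined);
```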

Relational Databases Continued

Regarding additional relational database features… →

relational databases are typically pretty rigid:
- highly structured
- you have to define the columns and the types of columns before inserting rows
- has a lot of features for maintaining data integrity (such as user-defined data constraints, foreign keys, etc.)
some relational databases guarantee that transactions (or changes in the database) are reliable
see ACID compliance - Atomicity, Consistency, Isolation, Durability
this database consultant has a pretty good write-up on relational databases

Aside on ACID

Atomicity - each transaction (a series of operations) is all or nothing
Consistency - every transaction ensures that the resulting database state is valid (goes from one valid state to another)
Isolation - concurrent transactions don't interfere with each other (the result is the same as if they had run one after another), so a failed or in-progress transaction has no effect on other transactions
Durability - once a transaction / operation is done, the results will remain persistent even through crash, power loss, etc.
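
Atomicity is probably the easiest of these to sketch in code. The example below is a toy, in-memory imitation: the accounts object and the transaction / transfer helpers are all hypothetical, and working on a copy is only a stand-in for what a real database does (which is considerably more sophisticated).

```javascript
// hypothetical in-memory "database" of account balances
let accounts = {alice: 100, bob: 50};

// run every operation against a copy; only keep the copy if all of them
// succeed (all or nothing)
function transaction(operations) {
  const working = Object.assign({}, accounts);
  try {
    operations.forEach(function(op) { op(working); });
    accounts = working; // "commit"
    return true;
  } catch (e) {
    return false; // "rollback": accounts was never touched
  }
}

// a transfer is two operations; the credit deliberately comes first so
// that a failed debit would otherwise leave a half-finished change behind
function transfer(from, to, amount) {
  return [
    function(db) { db[to] += amount; },
    function(db) {
      if (db[from] < amount) { throw new Error('insufficient funds'); }
      db[from] -= amount;
    }
  ];
}

const ok = transaction(transfer('alice', 'bob', 30));
const failed = transaction(transfer('bob', 'alice', 500));
console.log(ok, failed, accounts);
```

The second transfer fails partway through, and the balances are left exactly as the first transfer set them.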

Quick Demo of Designing a Data Model for a Relational Database

Maybe we want to store these fields:

first name
last name
street address
city
state
zip

Let's get to it! →

Examples of Relational Databases

What are some examples of relational databases? →

MySQL
PostgreSQL
Oracle
Microsoft SQL Server

These are all great choices for storing highly structured, related data.

They are all in common usage for conventional web applications. However, there's a bit of a learning curve, and some are difficult to set up.

NoSQL Databases

NoSQL databases can be categorized by how they store their data:

key-value
document
column
graph
there are others
- (such as object, tuple store, etc.)
- check out a whole list
note that nosql databases can have reliable transactions as well, but this is usually not the focus of a nosql database

We'll focus on key-value and document stores…

Key Value Store

Probably the simplest conceptually… data is stored in key/value pairs. This should sound similar to some data structures that you've seen before. →

maybe a hash
or a dictionary
or an associative array

They're typically good at scaling to handle large amounts of data and dealing with high volumes of changes in data.

What may be some good applications for key value stores? →

caching
storing sessions!
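
If the idea still feels abstract, a JavaScript Map gives a decent feel for the operations a key value store offers. This is only an in-memory analogy (the session id and its contents are made up); a real key value store adds persistence, networking, expiry and so on.

```javascript
// a Map behaves like a tiny in-memory key-value store
const store = new Map();

// set: associate a value with a key (here, a made-up session id)
store.set('session:abc123', {username: 'alice', visits: 3});

// get: look up by key... there's no searching through the values themselves
const session = store.get('session:abc123');
const missing = store.get('session:nope'); // no such key, so undefined

// delete: remove the key (for example, when the session expires)
store.delete('session:abc123');

console.log(session, missing, store.has('session:abc123'));
```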

Key Value Store Examples

Some key value databases include:

Redis (a popular backend for queuing)
Memcached (as the name implies, typically used for caching)
Riak
many others

Document Stores

As you might guess by the name, document stores organize data as semi-structured documents.

think JSON (but there are many possible formats, such as XML, YAML, etc.)
or… a richer key-value store (there's metadata within the document… the keys are usually meaningful)
typically, no schema is required (that is, data types of values are inferred from values)
typically, semi-structured (documents, property names, etc… do not have to be pre-defined)
some document stores are particularly featureful when it comes to high availability and scaling (through replication/redundancy and sharding/separating large databases into smaller ones)

They're particularly good for applications where flexible data storage or constantly changing data storage is required.
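
Here's a small sketch of what "semi-structured" means in practice, using plain JavaScript objects as stand-ins for documents. The resumes collection and its property names are invented for this example.

```javascript
// two documents in the same hypothetical "resumes" collection;
// note that they don't share the same set of properties
const resumes = [
  {name: 'alice', skills: ['javascript', 'sql'], education: {school: 'NYU', year: 2016}},
  {name: 'bob', skills: ['python'], publications: ['Cats and Databases']}
];

// documents serialize naturally to JSON
const asJson = JSON.stringify(resumes[0]);

// the flip side of flexibility: code that reads documents has to cope
// with properties that may be missing
const withEducation = resumes.filter(function(doc) {
  return doc.education !== undefined;
});

console.log(asJson, withEducation.length);
```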

Document Store Examples

Two of the most popular NoSQL databases are:

MongoDB
CouchDB

Of course, there are a bunch of others

Some use cases for document stores include:

applications that require semi-structured data / data that does not have rigid requirements (perhaps a resume)
again, large volumes of data
fluid data or data whose structure is prone to change

So Which One are We Using?

We're using MongoDB. Not for all of the reasons we previously mentioned, though… We're using it because… →

it uses a JSON like data structure (we know JSON)
its query language is JavaScript (we know JavaScript syntax)
it's not very rigid when it comes to dealing with data (we don't have to be so precise/exacting)
it's fairly straightforward to set up, usually with little / no configuration required
(to the point where the default installation doesn't even require a username/password to connect to the database – wait, that's not so good!?)

All this can pretty much be summed up by saying that it's easy to use! (As an aside, I'm a bit biased toward using relational databases, specifically Postgres)

MongoDB

MongoDB will be our data store that we use for this part of the course
It's a nosql database…
Specifically, it's a document store
- a single record in Mongo is a document (a user, a bird in the case of our homework!)
- a document is a bunch of key value pairs…
- hey… that sounds like… →
- documents are similar to JSON objects (actually BSON?)

Documents and Collections

A couple of terms to remember (yay, definitions again!)

key - a field name - analogous to a column in a relational database
value - obvs, a value
document - a single object or record in our database,
- consists of key value pairs
- similar to a single row in a relational database
collection - a group of documents
- analogous to tables in relational databases

Data Types

Although MongoDB doesn't require you to pre-define the types of values that your documents will have, it does have data types. These types are inferred from the value. Some available types include:

string - an empty string or an ordered sequence of characters
numeric types - such as integer, double (float)
boolean - true / false
array - a list of values
timestamp - 64 bit value where first 32 bits are seconds since the Unix epoch
Object ID - every MongoDB document has an _id property that is unique within its collection; by default, its value is an Object ID

More about Object ID: a 12-byte binary value which has a very rare chance of duplication; consists of a 4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter (newer versions of MongoDB replace the machine and process ids with a single 5-byte random value)
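
Since the timestamp occupies the first 4 bytes, we can pull it out of an Object ID ourselves. A minimal sketch, assuming the id is given as its usual 24 character hex string (the sample id is one that shows up in the shell output in these slides):

```javascript
// an Object ID as a 24 character hex string (12 bytes)
const id = '57ff86a14639d0fd263f87a0';

// the first 4 bytes (8 hex characters) are seconds since the Unix epoch
const seconds = parseInt(id.slice(0, 8), 16);

// Date wants milliseconds, so multiply by 1000
const created = new Date(seconds * 1000);

// the last 3 bytes (6 hex characters) are the counter
const counter = parseInt(id.slice(18), 16);

console.log(seconds, created.toISOString(), counter);
```

For this id, the timestamp works out to a moment in October 2016.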

Installation

Comprehensive docs are here

basically, just use the appropriate installer from their downloads page
if you use a package manager, do that instead
- they have .debs for Debian and Ubuntu
- since I'm on OSX, and I use homebrew, I used brew install mongodb
starting will vary based on OS
you may need to create and/or specify a directory where your data will be stored, so if mongo doesn't start up, it may be missing its data directory

A Whirlwind Tour

Working with MongoDB on the commandline…

If your OS doesn't start MongoDB automatically, you can run:


mongod

To connect to a locally running instance via the commandline MongoDB client:


mongo

This drops you into the MongoDB shell (yay… more shell). You can issue commands that

inspect the database
modify and create documents and collections
find documents

Some Commands

The following commands can be used to navigate, create and remove databases and collections →

show databases - show available databases (remember, there can be more than one database)
use db - work with a specific database (if unspecified, the default database connected to is test)
show collections - once a db is selected, show the collections within the database
db.dropDatabase() - drop (remove) the database that you're currently in
db.collectionName.drop() - drop (remove) the collection named collectionName

To get some inline help:

help - get help on available commands

Starting Out

To begin using the commandline client to inspect your data: →

make sure that mongod is running in a different window (or running in the background or as a daemon)
start up the commandline client with mongo
type in use databaseName to switch to the database that you're looking through

From there, you can start querying for data, inserting documents, etc. These basic create, read, update, and delete operations are called CRUD operations…

CRUD!?

(C)reate, (R)ead, (U)pdate, and (D)elete operations: →

db.[collection].insert(obj)
- db.Person.insert({'first':'bob', 'last':'bob'})
db.[collection].find(queryObj)
- db.Person.find({'last':'bob'})
- db.Person.find() // finds all!
db.[collection].update(queryObj, updateObj)
- db.Person.update({'first':'foo'}, {$set: {'last':'bar'}})
db.[collection].remove(queryObj)
- db.Person.remove({'last':'bob'})

Where queryObj is an object of name value pairs representing the properties you're searching on (a document matches when its values equal the values you specify), and updateObj describes the changes to make (for example, $set with the properties to change)
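
To see what "matching a query object" amounts to, here's an in-memory imitation of find and update over a plain array. It's only a sketch of the idea: the matches, find and update functions are our own, they handle nothing beyond exact equality and $set, and they are not how MongoDB itself is implemented.

```javascript
// a hypothetical in-memory stand-in for the Person collection
const people = [
  {first: 'bob', last: 'bob'},
  {first: 'foo', last: 'baz'}
];

// true if every property in queryObj has an equal value in doc
function matches(doc, queryObj) {
  return Object.keys(queryObj).every(function(key) {
    return doc[key] === queryObj[key];
  });
}

// rough equivalent of find(queryObj); no query object means "find all"
function find(collection, queryObj) {
  return collection.filter(function(doc) {
    return matches(doc, queryObj || {});
  });
}

// rough equivalent of update(queryObj, {$set: ...}) on the first match
function update(collection, queryObj, updateObj) {
  const doc = collection.find(function(d) { return matches(d, queryObj); });
  if (doc) {
    Object.assign(doc, updateObj.$set);
  }
}

update(people, {first: 'foo'}, {$set: {last: 'bar'}});
console.log(find(people, {last: 'bar'}), find(people).length);
```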

More Examples

As prep for the next part, some insert and finds (with a test for greater than!) →

Inserting, finding all, then finding by exact number of lives:


> db.Cat.insert({name:'foo', lives:9})
WriteResult({ "nInserted" : 1 })
> db.Cat.find()
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }
> db.Cat.find({lives:9})
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }

Inserting more, then using greater than!


> db.Cat.insert({name:'bar', lives:2})
WriteResult({ "nInserted" : 1 })
> db.Cat.insert({name:'qux', lives:5})
WriteResult({ "nInserted" : 1 })
> db.Cat.find({lives: {$gt: 4}})
{ "_id" : ObjectId("57ff86a14639d0fd263f87a0"), "name" : "foo", "lives" : 9 }
{ "_id" : ObjectId("57ff86c14639d0fd263f87a2"), "name" : "qux", "lives" : 5 }
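
The $gt query above nests an object where a plain value would normally go. Here's a rough, in-memory sketch of how such a query could be evaluated. The valueMatches and find functions are hypothetical helpers written for this example (supporting only exact matches and $gt), not MongoDB's actual implementation.

```javascript
const cats = [
  {name: 'foo', lives: 9},
  {name: 'bar', lives: 2},
  {name: 'qux', lives: 5}
];

// compare one value against a condition: either a plain value (exact match)
// or an object containing the $gt operator
function valueMatches(value, condition) {
  if (condition !== null && typeof condition === 'object' && '$gt' in condition) {
    return value > condition.$gt;
  }
  return value === condition;
}

// keep the documents where every property in queryObj matches
function find(collection, queryObj) {
  return collection.filter(function(doc) {
    return Object.keys(queryObj).every(function(key) {
      return valueMatches(doc[key], queryObj[key]);
    });
  });
}

const result = find(cats, {lives: {$gt: 4}}).map(function(cat) {
  return cat.name;
});
console.log(result);
```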

Using MongoDB in Express

As with everything else we've done in node, there's a module for our specific task. If we'd like to use MongoDB in our application, there are a few options:

mongodb - the officially supported driver from MongoDB; optimized for simplicity
mongoose - lots of features, more complex, based on mongodb
monk - somewhere between mongoose and mongodb in terms of features and complexity (for example, no models)

We'll be using mongoose, as it seems to have the most traction out of the three. (But it's a bit more complicated than it needs to be).

ORM / ODM

Has anyone heard of ORM or ODM before? →

ORM - object relational mapper
ODM - object document mapper

Both map objects in your application to their counterparts in your database (tables, collections). Mongoose is our ODM.
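
To give a feel for what "mapping" means, here's a tiny hand-rolled imitation of an ODM. Everything in it is made up for illustration (the catCollection array stands in for the database, and this Cat constructor is not a Mongoose model); it only shows objects being turned into documents and back.

```javascript
// hypothetical in-memory "collection" standing in for the database
const catCollection = [];

// a hand-rolled "model": a constructor for objects that can save themselves
function Cat(props) {
  this.name = props.name;
  this.lives = props.lives;
}

// object -> document: store a plain copy of the object's data
Cat.prototype.save = function() {
  catCollection.push({name: this.name, lives: this.lives});
};

// document -> object: wrap each stored document in a Cat again
Cat.find = function() {
  return catCollection.map(function(doc) {
    return new Cat(doc);
  });
};

new Cat({name: 'foo', lives: 9}).save();
const found = Cat.find();
console.log(found[0] instanceof Cat, found[0].name);
```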

Mongoose Concepts

schema - analogous to a collection
model - the actual constructors that we use to create objects
object - a single document

A Quick Example of Storing Cat Names!

Let's use MongoDB and Mongoose to store our classic list of cat names.

Install Mongoose


npm install --save mongoose

Set Up Your Connection

For simplicity, we'll dump everything in a file called [PROJECT ROOT]/db.js for now. We'll see other ways of laying things out.

In db.js:


// as always, require the module
const mongoose = require('mongoose'); 

// some extra stuff goes here...

// connect to the database (catdb)
mongoose.connect('mongodb://localhost/catdb');

Create a Schema

Between your require and connect… create a schema. A schema represents a MongoDB collection. Here we're specifying a collection for Cats.

the cat schema will allow us to read objects from the collection
as well as modify and add objects to the collection


// define the data in our collection
const Cat = new mongoose.Schema({
	name: String,
	updated_at: Date
});

// "register" it so that mongoose knows about it
mongoose.model('Cat', Cat);
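
What does declaring property types buy us? Roughly: values get converted to the declared type, and undeclared properties don't get stored. The sketch below is a simplified, hand-rolled imitation of that idea (applySchema is a made-up helper, and Mongoose's real casting and validation rules are more involved than this).

```javascript
// a stand-in "schema": property names mapped to type constructors
const catSchema = {name: String, updated_at: Date};

// keep only declared properties, converting each value to its declared type
function applySchema(schema, raw) {
  const doc = {};
  Object.keys(schema).forEach(function(key) {
    if (raw[key] === undefined) {
      return; // nothing supplied for this property
    }
    const Type = schema[key];
    // Date has to be constructed with new; String can be called directly
    doc[key] = Type === Date ? new Date(raw[key]) : Type(raw[key]);
  });
  return doc;
}

// name is a number and color isn't in the schema at all
const doc = applySchema(catSchema, {name: 42, updated_at: 0, color: 'orange'});
console.log(doc);
```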

Using Our db Module

In app.js, simply:


require( './db' );

this will initialize our connection to the database when our application runs
it also sets up our schemas, so we can use them in our routes

Using Schemas

Ostensibly, we would want to create, update, read or delete data based on what page (path/url/etc.) we're on. Let's start by adding some setup to our index.js routes. →


const mongoose = require('mongoose');
const Cat = mongoose.model('Cat');

Cats. Because Why Not?

Let's create a simple site that saves a bunch of cat names. →

(for when we adopt three new adorable cats for our class)

What pages do you think we should have? →

minimally…
a list of cat names
a form that allows you to add cat names

URLs and Routes

So what kind of routes will we need? →

GET a form (perhaps /cat/create)
accept POSTs to that form (/cat/create again)
GET a list of cat names (how about… just /cats? yeeaah.)

/cats

Use the model's find method to read objects (of course, we have to define a callback that gets triggered when the read is done)!


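router.get('/cats', function(req, res) {
	// find with no query object retrieves every cat
	Cat.find(function(err, cats) {
		if (err) {
			// don't try to render if the read failed
			return res.status(500).send('could not retrieve cats');
		}
		res.render( 'cats', {
			cats: cats
		});
	});
});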

/cat/create

We'll also need a form. We can handle this one pretty easily. →



router.get('/cat/create', function(req, res) {
  res.render('create');
});

/cat/create

Let's accept posts to /cat/create. We'll use our model to create an object:

create a new Cat object
set its properties
call save
tell save what to do when it finishes saving (callback, ftw)


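router.post('/cat/create', function(req, res) {
	console.log(req.body.catName);
	new Cat({
		name: req.body.catName,
		updated_at : Date.now()
	}).save(function(err, cat, count){
		if (err) {
			return res.status(500).send('could not save cat');
		}
		// saved successfully, go back to the list
		res.redirect('/cats');
	});
});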
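
Where does req.body.catName come from? A sketch of the idea, assuming the form is submitted with the browser's default (urlencoded) encoding: the body of the POST is a string of name=value pairs, and body parsing middleware turns it into an object along these lines. The cat name is made up, and the parsing here is done by hand with URLSearchParams rather than by any particular middleware.

```javascript
// what the browser sends as the body of the POST when the form is submitted
// (the cat name here is made up)
const rawBody = 'catName=Paw+Newman';

// parse the name=value pairs... body parsing middleware does something
// similar and puts the result in req.body
const params = new URLSearchParams(rawBody);
const body = {};
params.forEach(function(value, name) {
  body[name] = value;
});

console.log(body.catName);
```

The property name (catName) comes from the name attribute of the form's text input.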

And Some Templates….

Our form is essentially the same as all of the previous forms we've had →


<form method="POST" action="">
cat name plz
<input type="text" name="catName">
<input type="submit">
</form>

And a Template for the List

As with our previous templates, we'll just loop through all of the objects that we retrieve. →

note that because we have a list of objects…
we can reference each property name, rather than using this
for our example, name is a property of each object


<ul>
{{#each cats}}
<li>{{name}}</li>
{{/each}}
</ul>
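
If the looping feels magical, here's approximately what the template does, written as plain JavaScript. It's a sketch only: the sample cats are made up, and a real templating engine also escapes special characters in each name, which this version skips.

```javascript
// the objects we would have passed to res.render as cats
const cats = [{name: 'foo'}, {name: 'bar'}];

// for each object, produce a list item using its name property
const items = cats.map(function(cat) {
  return '<li>' + cat.name + '</li>';
});

// wrap the items in the surrounding list markup
const html = '<ul>\n' + items.join('\n') + '\n</ul>';
console.log(html);
```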