Exploring the data model
The internals of DocumentDB can themselves be described in a JSON document itself. The following figure displays a hierarchy of DocumentDB and its entities. This is called the DocumentDB resource model and all the entities are called resources.
DocumentDB account
The main entry point is obviously a DocumentDB account. You need to have an account before you can start working with it. An account can contain databases and, as part of a preview feature, an amount of blob storage for attachments.
All resources are accessible through a logical URI, for example, the database with the name persons
can be addressed through the logical URI /dbs/persons
and the document describing the person John Doe, which has an ID of 12345
, can be addressed by the local URI /dbs/persons/<collectionid>/docs/12345
.
Creating databases
A database is a container where documents are stored inside collections. Part of the database also forms a user's container. The user's container has a set of permissions, and the permissions to access collections, UDFs, triggers and SP are set on a database level. We can grant access to users in a fine-grained manner for accessing collections and documents.
A database is scaled elastically and does not need any interference from the account owner. It can scale from megabytes to petabytes. The data is stored on an SSD disk, providing fast access to your documents.
Databases are spanned implicitly across different underlying machines in order to provide the level of scaling we need.
Administering users
A user stored in DocumentDB is an abstraction of the concept user. A user is not a person logging in, but a set of permissions. Eventually, a DocumentDB user can be mapped to an Active Directory user or some third-party identity management system.
A simple, straightforward way of designing the user model is to create exactly one user per tenant or customer. That user only has permission to access collections and documents inside one database. This is the database belonging to the designated tenant and/or customer. This is a flat user model, but it is also possible to create user identities corresponding to actual users representing certain personas. This can give you more fine-grained control but will also increase the burden of user administration.
Users can be managed through the Azure portal (portal.azure.com) and also via the rich REST API or client SDKs that are provided by Microsoft.
Setting permissions
Implicitly, DocumentDB contains two types of roles: an administrator and a user. The administrator is the one that has the permission to manage and manipulate database accounts, databases, users, and permissions. These are considered the administrative resources, analogous to the metadata describing the full DocumentDB ecosystem. The administrator is provided with a master key. The master key is part of the DocumentDB account and is provided to the one setting up the account.
A user, on the other hand, is the person who manipulates actual data inside collections and documents, or changes UDFs (application resources). A user gets a resource key that provides access to specific application resources like databases and collections.
Managing collections
A collection can be described as a container for all the documents, but it is also a unit of scale. Adding collections will result in some SSD storage to be allocated for that particular collection. As we saw before, setting the performance level is done on a collection level. Inside your database, you can have multiple collections, each having their own performance level (S1, S2, or S3). For example, we could have a UserProfile
collection containing documents with specific profile information like addresses, images, UI preferences, and so on. This collection is queried once a user logs in to your system and the profile information is retrieved from the UserProfile
collection. Imagine we have another collection containing all the products that can be ordered. This collection will be accessed more frequently, hopefully, and therefore we can set an S3 level on this collection to provide the best performance for our potential buyers.
Collections grow and shrink implicitly when documents are added or removed. There is no need to allocate space or do other preconfiguration steps.