
RavenDB Staleness consistency when projecting

When performing a query, RavenDB lets you return whole documents, or just a subset of the data from each document (a projection query). I won’t demonstrate this feature, as you can easily learn about it from this source.

Whenever you issue a projection query, all the documents matching your query filters are loaded internally by RavenDB, then the fields you requested are projected out of the documents and returned to you. The time needed for this operation depends on the number and size of your documents, and on many other factors (e.g. whether the documents are already present in RavenDB’s internal cache). To speed things up, RavenDB allows you to designate fields to be stored in the index. This way, the database doesn’t have to load the documents at all, as it finds all the necessary fields in the index used for querying, and retrieving them from the index is extremely fast. However, storing more fields in the index increases the index size, and after a certain point, I think it is better to just let RavenDB load the whole document and apply the projection itself, rather than storing all the fields of a document in the index.

To sum it all up, this is how projections are handled by RavenDB:

  1. It will try to fetch the projected fields from the index-stored fields in the index used for querying.
  2. If not all desired fields could be resolved from the index-stored fields, it will proceed to load the whole document, and project the remaining fields from it.

Now, let’s go through an example where the behavior of combining index-stored fields with document projected fields can lead to unexpected results. Let’s say we have this class/document:

public class Invoice
{
   public string Id { get; private set; }
   public decimal Price { get; private set; }
   public decimal Fees { get; private set; }
   public decimal TotalValue { get; private set; }

   public void SetPrices(decimal price, decimal fees)
   {
      Price = price;
      Fees = fees;

      TotalValue = Price + Fees;
   }
}

Please note that TotalValue is computed by adding Price and Fees together.

Let’s define an index so that we can query our invoices.

public class InvoiceIndex : AbstractIndexCreationTask<Invoice>
{
   public InvoiceIndex()
   {
      Map = invoices => from invoice in invoices
                        select new
                        {
                           invoice.Price,
                           invoice.Fees,
                           invoice.TotalValue
                        };

      Store(x => x.TotalValue, FieldStorage.Yes);
   }
}

Please note that TotalValue is stored in the index.

Let’s issue a projection query against this index.

var invoices = session.Query<Invoice, InvoiceIndex>()
                      .Select(x => new
                                    {
                                       x.Price,
                                       x.Fees,
                                       x.TotalValue
                                    })
                     .ToArray();

Now, the question of the day:
After performing the above projection query, will summing Price and Fees give us TotalValue? It should, right? There is only one way to change Price and Fees (they have private setters; let’s keep it simple and not think about reflection), and every time that happens, TotalValue gets updated too.

What if I told you that the following test fails?

[Fact]
public void ProjectedQueryHasTheSameStalenessLevel()
{
   SetupTestInvoiceAndIndex();

   StartChangingTheTestInvoiceLikeCrazy();

   while (true)
   {
      using (var session = _store.OpenSession())
      {
         var invoices = session.Query<Invoice, InvoiceIndex>()
                               .Select(x => new
                                 {
                                    x.Price,
                                    x.Fees,
                                    x.TotalValue
                                 })
                               .ToArray();

         var targetInvoice = invoices.Single();

         var expected = targetInvoice.Fees + targetInvoice.Price;
         var actual = targetInvoice.TotalValue;

         Assert.Equal(expected, actual);
      }
   }
}

What the test does is pretty simple. It creates one invoice (SetupTestInvoiceAndIndex), then spawns a few threads that continuously load the invoice, call SetPrices with random values, and store the invoice (StartChangingTheTestInvoiceLikeCrazy). Next, it repeatedly issues a projection query and verifies that Fees + Price == TotalValue. You can find the complete source code here.

This happens because index-stored fields are subject to staleness, while the document is always up-to-date. If the index is stale, you may therefore get data from two different points in the lifetime of the document. The picture below depicts the value of the TotalValue index-stored field across document changes and index operations.

Index-stored fields evolution across document updates

A step-by-step description of what the test does:

  1. The SetupTestInvoiceAndIndex method introduces the first phase in the above picture – it creates an invoice and waits until it gets indexed.
  2. The StartChangingTheTestInvoiceLikeCrazy method introduces the second phase – it makes a lot of changes to the invoice, thus making the index return stale results.
  3. The test runs a query projecting Price, Fees, and TotalValue.
  4. Because the query accepts stale results, the projection occurs as follows:
    1. Price – projected from the document – always up-to-date
    2. Fees – projected from the document – always up-to-date
    3. TotalValue – projected from the index – subject to staleness
  5. Therefore, Price + Fees might not always equal TotalValue
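The steps above can be mimicked without a database. What follows is only a minimal sketch of the resolution order described in this post, not RavenDB’s actual implementation: ProjectionSketch, Project, and the dictionaries standing in for the index entry and the document are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ProjectionSketch
{
    // Hypothetical helper (not RavenDB code): take each requested field from the
    // index-stored fields when present, otherwise fall back to the document.
    public static Dictionary<string, object> Project(
        IEnumerable<string> requestedFields,
        IDictionary<string, object> indexStoredFields,
        Func<IDictionary<string, object>> loadDocument)
    {
        var result = new Dictionary<string, object>();
        IDictionary<string, object> document = null; // loaded lazily, at most once

        foreach (var field in requestedFields)
        {
            object value;
            if (!indexStoredFields.TryGetValue(field, out value))
            {
                // Not stored in the index: load the whole document and read it from there.
                if (document == null) document = loadDocument();
                document.TryGetValue(field, out value); // stays null if the field is missing
            }
            result[field] = value;
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // The index entry was written before the latest SetPrices call, so it is stale.
        var staleIndexEntry = new Dictionary<string, object> { { "TotalValue", 110m } };
        // The document already reflects the latest SetPrices(120, 10).
        var currentDocument = new Dictionary<string, object>
        {
            { "Price", 120m }, { "Fees", 10m }, { "TotalValue", 130m }
        };

        var projected = ProjectionSketch.Project(
            new[] { "Price", "Fees", "TotalValue" }, staleIndexEntry, () => currentDocument);

        var sum = (decimal)projected["Price"] + (decimal)projected["Fees"];
        Console.WriteLine("Price + Fees = {0}, TotalValue = {1}", sum, projected["TotalValue"]);
    }
}
```

In this sketch, Price and Fees come from the current document while TotalValue comes from the older index entry, so the two printed numbers differ, which is the same mismatch the failing test detects.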

Of course, you can work around this by:

  • not accepting stale results
  • retrieving the whole document
  • adding Price and Fees as index-stored fields
  • removing TotalValue from the index-stored fields
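Two of these workarounds could look roughly like the sketch below. It reuses the Invoice and InvoiceIndex types from above; InvoiceAllStoredIndex and InvoiceQueries are hypothetical names, and the namespaces and the WaitForNonStaleResults customization are written as I know them from the 2.x/3.x client, so treat them as assumptions and check them against the client version you use.

```csharp
using System.Linq;
using Raven.Abstractions.Indexing; // FieldStorage (assumed namespace, 2.x/3.x client)
using Raven.Client;                // IDocumentSession
using Raven.Client.Indexes;        // AbstractIndexCreationTask<T>

// Workaround: store every projected field in the index, so that all three
// values of a result are read from the same index entry.
public class InvoiceAllStoredIndex : AbstractIndexCreationTask<Invoice>
{
    public InvoiceAllStoredIndex()
    {
        Map = invoices => from invoice in invoices
                          select new
                          {
                             invoice.Price,
                             invoice.Fees,
                             invoice.TotalValue
                          };

        Store(x => x.Price, FieldStorage.Yes);
        Store(x => x.Fees, FieldStorage.Yes);
        Store(x => x.TotalValue, FieldStorage.Yes);
    }
}

public static class InvoiceQueries
{
    // Workaround: do not accept stale results. The query waits for the index
    // to catch up before the projection is resolved.
    public static bool AllTotalsAreConsistent(IDocumentSession session)
    {
        var invoices = session.Query<Invoice, InvoiceIndex>()
                              .Customize(x => x.WaitForNonStaleResults())
                              .Select(x => new
                                            {
                                               x.Price,
                                               x.Fees,
                                               x.TotalValue
                                            })
                              .ToArray();

        return invoices.All(x => x.Price + x.Fees == x.TotalValue);
    }
}
```

Storing all three fields trades a larger index for results that are internally consistent (though possibly stale as a whole), while waiting for non-stale results trades query latency for freshness.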

While I can see the benefits of trying to resolve projections directly from the index-stored fields, I don’t think mixing stale data with up-to-date data is a good idea. In my opinion, normal projections should always be performed from the document, and you should use ProjectFromIndexFieldsInto<T> when you want to project from the index-stored fields. There should not be any fallback: if some fields cannot be found among the index-stored fields, the query should fail.

Notifications from RavenDB server

Introduction

Nowadays, all too often we need to make data available to our clients as soon as we get our hands on it (that is, as soon as it is stored in the database). Depending on the technology stack and the way we receive the data, this can be hard to achieve. RavenDB, however, features the Changes API, which notifies you that something has made its way into the database (or got changed). There are several types of notifications that get pushed to you, and while their names are self-explanatory, we can talk about their usage scenarios:

  • ForAllDocuments
    • This is a general purpose notification, and well suited for a rather static database, that doesn’t suffer frequent changes;
  • ForDocumentsStartingWith
    • This can be used for general change notification of documents of a type (ex: Trades, Users, etc);
  • ForDocument
    • The most fine grained document change notification – it allows you to track a single document for changes (for example: it gives you the ability to notify the current user that the document has been changed server-side, and he or she should refresh);
  • ForAllIndexes
    • General purpose notification, similar to ForAllDocuments, except that the target of monitoring is the indexes and not the documents;
    • Note! You will receive notifications only for indexes defined by you – thus, you will not get a notification for the Raven/DocumentsByEntityName index;
  • ForIndex
    • The most fine grained index change notification, triggered every time an indexed document is changed;
    • Note! If you have an index which targets documents of type People, then every time a People document gets changed you will receive an index changed notification (this happens regardless of whether the document ends up in the index or not);
  • ForBulkInsert
    • Lets you know when a bulk insert operation starts (DocumentChangeTypes.BulkInsertStarted), finishes (DocumentChangeTypes.BulkInsertEnded), or errors (DocumentChangeTypes.BulkInsertError)
  • ForAllReplicationConflicts
    • Lets you know when a replication conflict happens. The payload tells you the replication type that failed (ReplicationConflictTypes.DocumentReplicationConflict or ReplicationConflictTypes.AttachmentReplicationConflict), and the attempted operation (ReplicationOperationTypes.Put or ReplicationOperationTypes.Delete)

Subscribe, handle, and unsubscribe

There are two ways of subscribing to these notifications. The first is to create a type implementing IObserver<T> (where T is the notification type published by the notification you are subscribing to):

var connectionTask = await _store.Changes().ForIndex("MyIndex").Task;

connectionTask.Subscribe(new IndexChangeObserver());

//IndexChangeObserver is your custom type that implements IObserver<IndexChangeNotification>
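For completeness, such an observer could look like the sketch below. The Raven.Abstractions.Data namespace and the Name and Type properties of IndexChangeNotification are written from memory of the 2.x/3.x client and should be treated as assumptions; the console output is only a placeholder for your own handling.

```csharp
using System;
using Raven.Abstractions.Data; // IndexChangeNotification (assumed namespace, 2.x/3.x client)

public class IndexChangeObserver : IObserver<IndexChangeNotification>
{
    // Called for every notification pushed for the subscribed index.
    public void OnNext(IndexChangeNotification notification)
    {
        // Name and Type are assumed property names; adjust to your client version.
        Console.WriteLine("Index {0} changed: {1}", notification.Name, notification.Type);
    }

    // Called if the notification stream reports an error.
    public void OnError(Exception error)
    {
        Console.WriteLine("Changes subscription failed: {0}", error.Message);
    }

    // Called when the notification stream completes.
    public void OnCompleted()
    {
        Console.WriteLine("Changes subscription completed");
    }
}
```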

Or, and I recommend this approach, by adding a reference to Reactive Extensions, and handling notifications in a Rx manner:

var connectionTask = await _store.Changes().ForIndex("MyIndex").Task;

connectionTask.Subscribe(
   notification =>
   {
      //notification received
   },
   error =>
   {
      //error happened
   });

If you opt for the second approach, you can easily add more behaviour to your subscription. For instance, let’s say that your database changes a lot, and you do not want to be notified of each change, but only when a batch of changes finishes:

var connectionTask = await _store.Changes().ForIndex("MyIndex").Task;

connectionTask
   .Throttle(TimeSpan.FromSeconds(1))
   .Subscribe(
      notification =>
      { 
         //notification received
      },
      error =>
      {
         //error happened
      });

Note the Throttle method: it causes notifications to be ignored until there is a gap of 1 second between notifications; only then is the latest notification delivered to the subscriber.

Unsubscribing from the notifications follows the unsubscribing pattern from reactive extensions, and simply consists of calling Dispose on the subscription token received when subscribing:

var connectionTask = await _store.Changes().ForIndex("MyIndex").Task;

var subscriptionToken = connectionTask.Subscribe(notification => { });

subscriptionToken.Dispose(); //unsubscribing from notifications

Error handling

When it comes to error handling, RavenDB makes your life easier in the following ways:

  • If the server is offline when you try to connect to it, the client will silently keep trying to connect, indefinitely
  • If the connection between the client and the server goes down, the client will, as mentioned above, continuously attempt to reestablish it. Ayende describes in a blog post a behavior where the server saves the notifications your client missed for about a minute, and delivers them once the connection is reestablished. I think that post is obsolete: I wasn’t able to reproduce this behavior, nor do I see anything in the RavenDB codebase that would achieve it. After further investigation, I found out that it was removed in this commit, due to this issue. In other words, notifications raised while the connection is down are lost.

All in all, the Changes API is a neat feature of RavenDB that can make some tasks incredibly easy to achieve. However, care must be taken when using it, as the client might miss server-side events. In practice, this means: do not use this feature to implement your own caching. RavenDB already has several caches in place, so there is no need to roll your own.