
RavenDB Staleness consistency when projecting

Upon performing a query, RavenDB allows you to return whole documents, or just a subset of data from each document (aka projection query). I won’t demonstrate this feature, as you can easily learn about it from this source.

Whenever you issue a projection query, RavenDB internally loads all the documents matching your query filters, projects the requested fields out of them, and returns the result. The time this takes depends on the number and size of your documents, among other factors (e.g. whether the documents are already present in RavenDB’s internal cache). To speed things up, RavenDB lets you designate fields to be stored in the index. The database then doesn’t have to load the documents at all, because it finds the necessary fields in the index used for querying, and retrieving them from there is very fast. However, every stored field increases the index size, and past a certain point I think it is better to let RavenDB load the whole document and apply the projection itself, rather than storing all the fields of a document in the index.

To sum it all up, this is how projections are handled by RavenDB:

  1. It will try to fetch the projected fields from the index-stored fields in the index used for querying.
  2. If not all desired fields could be resolved from the index-stored fields, it will proceed to load the whole document, and project the remaining fields from it.

Now, let’s go through an example where the behavior of combining index-stored fields with document projected fields can lead to unexpected results. Let’s say we have this class/document:

public class Invoice
{
   public string Id { get; private set; }
   public decimal Price { get; private set; }
   public decimal Fees { get; private set; }
   public decimal TotalValue { get; private set; }

   public void SetPrices(decimal price, decimal fees)
   {
      Price = price;
      Fees = fees;

      TotalValue = Price + Fees;
   }
}

Please note that TotalValue is computed by adding Price and Fees together.

Let’s define an index so that we can query our invoices.

public class InvoiceIndex : AbstractIndexCreationTask<Invoice>
{
   public InvoiceIndex()
   {
      Map = invoices => from invoice in invoices
                        select new
                        {
                           invoice.Price,
                           invoice.Fees,
                           invoice.TotalValue
                        };

      Store(x => x.TotalValue, FieldStorage.Yes);
   }
}

Please note that TotalValue is stored in the index.

Let’s issue a projection query against this index.

var invoices = session.Query<Invoice, InvoiceIndex>()
                      .Select(x => new
                                    {
                                       x.Price,
                                       x.Fees,
                                       x.TotalValue
                                    })
                     .ToArray();

Now, the question of the day:
After performing the above projection query, will Price plus Fees always equal TotalValue? It should, right? There is only one way to change Price and Fees (they have private setters; let’s keep it simple and not think about reflection), and every time that happens, TotalValue gets updated too.

What if I told you that the following test fails?

[Fact]
public void ProjectedQueryHasTheSameStalenessLevel()
{
   SetupTestInvoiceAndIndex();

   StartChangingTheTestInvoiceLikeCrazy();

   while (true)
   {
      using (var session = _store.OpenSession())
      {
         var invoices = session.Query<Invoice, InvoiceIndex>()
                               .Select(x => new
                                 {
                                    x.Price,
                                    x.Fees,
                                    x.TotalValue
                                 })
                               .ToArray();

         var targetInvoice = invoices.Single();

         var expected = targetInvoice.Fees + targetInvoice.Price;
         var actual = targetInvoice.TotalValue;

         Assert.Equal(expected, actual);
      }
   }
}

What the test does is pretty simple. It creates one invoice (SetupTestInvoiceAndIndex), then spawns a few threads that continuously load the invoice, call SetPrices with random values, and store the invoice (StartChangingTheTestInvoiceLikeCrazy). Next, it repeatedly issues a projection query and verifies that Fees + Price == TotalValue. You can find the complete source code here.

The test fails because index-stored fields are subject to staleness, while the document is always up-to-date. If the index is stale, you may therefore get data from two different points in the lifetime of the document. The picture below depicts the value of the TotalValue index-stored field across document changes and indexing operations.

Index-stored fields evolution across document updates

Step by step, this is what the test does:

  1. SetupTestInvoiceAndIndex introduces the first phase in the above picture – it creates an invoice and waits until it gets indexed.
  2. StartChangingTheTestInvoiceLikeCrazy introduces the second phase – it makes a lot of changes to the invoice, so the index starts returning stale results.
  3. The test runs a query projecting Price, Fees, and TotalValue.
  4. Because the query accepts stale results, the projection occurs as follows:
    1. Price – projected from the document – always up-to-date
    2. Fees – projected from the document – always up-to-date
    3. TotalValue – projected from the index – subject to staleness
  5. Therefore, Price + Fees might not always equal TotalValue
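
The mixing described in the steps above can be modeled without RavenDB at all. The following is a minimal in-memory sketch, not RavenDB code: ProjectionModel and Project are hypothetical names, and the dictionaries merely stand in for an index entry and a document. It resolves each requested field from the index-stored fields first and falls back to the document for the rest, which is enough to reproduce the mismatch once the index entry lags behind.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model; none of these types are part of the RavenDB API.
public static class ProjectionModel
{
    // Resolves each requested field from the index-stored fields first,
    // then falls back to the current document for whatever is missing.
    public static Dictionary<string, decimal> Project(
        IReadOnlyDictionary<string, decimal> indexStoredFields,
        IReadOnlyDictionary<string, decimal> document,
        params string[] requestedFields)
    {
        var result = new Dictionary<string, decimal>();
        foreach (var field in requestedFields)
        {
            result[field] = indexStoredFields.TryGetValue(field, out var stored)
                ? stored            // may lag behind the document
                : document[field];  // always the latest saved value
        }
        return result;
    }

    public static void Main()
    {
        // Index entry built when the invoice was Price = 100, Fees = 10.
        var indexStoredFields = new Dictionary<string, decimal> { ["TotalValue"] = 110m };

        // The document has since been saved with Price = 200, Fees = 20,
        // but the index has not caught up yet.
        var document = new Dictionary<string, decimal>
        {
            ["Price"] = 200m, ["Fees"] = 20m, ["TotalValue"] = 220m
        };

        var projected = Project(indexStoredFields, document, "Price", "Fees", "TotalValue");

        Console.WriteLine(projected["Price"] + projected["Fees"]); // 220
        Console.WriteLine(projected["TotalValue"]);                // 110
    }
}
```

Price and Fees reflect the latest save, while TotalValue still reflects the state the index last saw, so the sum no longer matches.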

Of course, you can work around this by:

  • not accepting stale results
  • retrieving the whole document
  • adding Price and Fees as index-stored fields
  • removing TotalValue from the index-stored fields
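
As a rough sketch of two of these workarounds, assuming the Invoice and InvoiceIndex classes from above: the first part is a variant of the index that stores every projected field, so all three values come from the same index entry (possibly stale, but consistent with each other). The second part is a query that does not accept stale results. InvoiceIndexWithStoredFields, InvoiceTotals, InvoiceQueries, and NonStaleTotals are hypothetical names introduced for illustration, and the Raven namespaces in the using lines are my assumption for the client version this post targets; they may differ in yours.

```csharp
using System.Linq;
using Raven.Abstractions.Indexing; // FieldStorage (assumed namespace)
using Raven.Client;                // IDocumentSession (assumed namespace)
using Raven.Client.Indexes;        // AbstractIndexCreationTask<T> (assumed namespace)

// Hypothetical variant of InvoiceIndex that stores every projected field,
// so the projection never has to fall back to the document.
public class InvoiceIndexWithStoredFields : AbstractIndexCreationTask<Invoice>
{
    public InvoiceIndexWithStoredFields()
    {
        Map = invoices => from invoice in invoices
                          select new
                          {
                              invoice.Price,
                              invoice.Fees,
                              invoice.TotalValue
                          };

        Store(x => x.Price, FieldStorage.Yes);
        Store(x => x.Fees, FieldStorage.Yes);
        Store(x => x.TotalValue, FieldStorage.Yes);
    }
}

// Hypothetical projection target, used only so the helper below has a return type.
public class InvoiceTotals
{
    public decimal Price { get; set; }
    public decimal Fees { get; set; }
    public decimal TotalValue { get; set; }
}

public static class InvoiceQueries
{
    // Hypothetical helper: asks RavenDB to wait until the index is no longer
    // stale before returning results from the original InvoiceIndex.
    public static InvoiceTotals[] NonStaleTotals(IDocumentSession session)
    {
        return session.Query<Invoice, InvoiceIndex>()
                      .Customize(x => x.WaitForNonStaleResults())
                      .Select(x => new InvoiceTotals
                      {
                          Price = x.Price,
                          Fees = x.Fees,
                          TotalValue = x.TotalValue
                      })
                      .ToArray();
    }
}
```

The trade-offs differ: storing all three fields grows the index, while waiting for non-stale results can block the query while the index catches up.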

While I can see the benefits of trying to resolve projections directly from the index-stored fields, I don’t think mixing stale data with up-to-date data is a good idea. My opinion is that normal projections should always be performed from the document, and that you should use ProjectFromIndexFieldsInto<T> when you want to project from the index-stored fields. In that case there should be no fallback: if some fields cannot be found among the index-stored fields, the query should fail.