Introduction to Tectonic

Learn about the gap between different distributed file systems and the reasons for creating Tectonic.

Storage systems—specialization vs. generality

Over the years, organizations have built large distributed storage systems to meet their evolving needs. Such systems are often optimized for specific use cases and might not be a good fit for a general storage workload. The operational complexity of evolving and maintaining many storage systems takes its toll in terms of monetary cost and duplicated work. As operational experience with specialized systems builds, system designers often gain new insights into how a single generalized system could meet the needs of many use cases.

Note: In system design, we often start with a specialized system optimized for a specific use case. Over time, it might be possible to consolidate many such specialized systems into one general system, until a new use case emerges that the general system cannot meet. Design activity thus swings like a pendulum between specialized and general systems.

The Facebook service is a canonical example whose data needs are diverse in terms of workload and whose overall data size is huge and growing. In the following lesson, we'll discuss Facebook's storage systems to better understand specialization versus generalization in the context of storage systems.

Facebook: From a constellation of storage systems to Tectonic

There are numerous different tenants (a tenant can be considered an organizational division or group with well-defined and coherent business and technical objectives) and hundreds of use cases/applications per tenant, with a variety of storage needs. Blob storage and data warehousing are two major storage applications with different workload characteristics and storage needs.

For a blob store, data access patterns change over time. Some proportion of the data is heavily accessed, and such a workload needs a substantial number of input/output operations per second (IOPS) to serve clients well. Over time, as new hot data comes in, older data starts to cool off, and fewer read/write requests arrive for it. Such data has much lower needs in terms of IOPS but an always growing ...
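The hot/cold distinction above can be sketched as a simple classification policy. This is a minimal illustration with hypothetical names and thresholds (`Blob`, `classify`, a 100-read hotness cutoff, a one-week cool-down period), not Tectonic's actual logic:

```python
from dataclasses import dataclass

@dataclass
class Blob:
    """Hypothetical blob record tracking recent access activity."""
    blob_id: str
    last_access: float     # seconds since epoch of the most recent read/write
    recent_reads: int = 0  # reads observed in the current window

def classify(blob: Blob, now: float, hot_reads: int = 100,
             cold_after_s: float = 7 * 24 * 3600) -> str:
    """Label a blob 'hot' (IOPS-bound) or 'cold' (capacity-bound).

    The thresholds here are illustrative, not Tectonic's policy: a blob
    is hot while it still sees many reads in the window; once reads
    taper off and it has sat untouched for a week, it is cold.
    """
    if blob.recent_reads >= hot_reads:
        return "hot"
    if now - blob.last_access > cold_after_s:
        return "cold"
    return "cooling"
```

A storage system could use labels like these to place hot blobs on IOPS-rich media while migrating cold blobs to cheaper, capacity-optimized storage.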