How modern IT services store data, and what the cloud has to do with it

How do modern applications store petabytes of data? Here is what you need to know to speak the same language as your IT department, and how data storage is connected to the cloud.

Modern applications generate huge amounts of information that has to be stored somewhere. This much data does not fit on a single server or disk, so more complex technical solutions became necessary. As a result, several basic approaches to storing application data have emerged. Let’s look at what distinguishes them.

Modern data storage methods are closely tied to the cloud: there, you can grow the amount of storage almost indefinitely without buying additional hardware, and make full use of the resources you pay for.

Block storage as the basic level of data storage in the cloud

The basic level of data storage in the cloud is block storage. Much like a physical disk in a computer, a “block” is a virtual disk attached to a virtual machine, so almost all cloud services have block storage “under the hood.”

Block storage is the equivalent of hard disks for virtual machines

Block storage lies at the core of the cloud storage types that I’m going to talk about in this article.

Databases for storing application data

Databases are a building block of almost any application. They store and process data that may take up a lot of space. An item stored in a database can be a number, a piece of text, or even a file. There are many types of databases: for transaction processing, for storing unstructured data, for caching, and for many other tasks.

Databases are well suited to routine, repetitive operations. For example, information about orders received by an online store is written to a database and then used by the application to issue an invoice automatically.
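
The online store example can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module and an in-memory database; the table layout and the function names `record_order` and `issue_invoice` are made up for this sketch and do not describe any particular product.

```python
import sqlite3


def record_order(conn, customer, amount_cents):
    """Write one order to the database and return its generated id."""
    cur = conn.execute(
        "INSERT INTO orders (customer, amount_cents) VALUES (?, ?)",
        (customer, amount_cents),
    )
    conn.commit()
    return cur.lastrowid


def issue_invoice(conn, order_id):
    """Read the stored order back and format a one-line invoice."""
    customer, amount_cents = conn.execute(
        "SELECT customer, amount_cents FROM orders WHERE id = ?",
        (order_id,),
    ).fetchone()
    return f"Invoice #{order_id}: {customer} owes {amount_cents / 100:.2f}"


# An in-memory database is used only to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)

order_id = record_order(conn, "Alice", 2599)
invoice = issue_invoice(conn, order_id)
print(invoice)
```

The point of the sketch is the fixed format: every order has the same columns, so the application can rely on them when it builds the invoice.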

Databases store application data in a specific format and order.

Databases are deployed either on your own servers or as a cloud service, generally called DBaaS (Database as a Service). Cloud databases perform the same functions as regular databases, but they offer several benefits:

  • Virtually any amount of storage: it can be expanded at any time without purchasing hardware.
  • Fault tolerance: data backup is built in, so if hardware fails, the data can be restored from the backups.
  • Advanced security: DBaaS runs in reliable, secure environments, protected by dedicated technical safeguards and security experts.

PostgreSQL, MySQL, ClickHouse, Redis, and Arenadata DB can all be deployed as cloud databases in Mail.ru Cloud Solutions.

File storage: for a small office

As the name implies, file storage works with files. Files are arranged into directories and subdirectories (folders), and each file can be found by its name and the path through the nested directories. Files can be added, deleted, overwritten, read, or executed. We are used to seeing such file systems on our PCs and laptops.

Nested directories and files in file storage. The directory hierarchy is the same as on a personal computer.

File storage is convenient for people who work directly with files, since many are used to navigating a directory tree. Some applications also use file storage when they mainly operate on files and directories.
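
To make "found by its name and path" concrete, here is a small sketch using Python's standard pathlib and tempfile modules. The directory and file names are hypothetical, and a temporary directory stands in for a real file share.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for the root of a file storage.
with tempfile.TemporaryDirectory() as root_dir:
    root = Path(root_dir)

    # The file is identified by the chain of nested directories plus its name.
    report = root / "office" / "reports" / "archive" / "summary.txt"
    report.parent.mkdir(parents=True)
    report.write_text("Quarterly summary")

    # Reading goes through the same path.
    content = report.read_text()
    relative = report.relative_to(root).as_posix()

    # Searching means walking the directory tree.
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))

print(relative)
print(found)
```

Note that the search at the end has to visit every directory under the root, which hints at why very large trees become slow to work with.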

When there are not many files, file storage handles them well. However, as the number of files grows, the directory structure becomes cumbersome, and both searching for and accessing files slow down. File storage is therefore suitable for simple office tasks: collaboration among a small number of employees, file sharing, and archive storage. It is not suitable for large arrays of heterogeneous data that must be processed quickly.

Like databases, file storage can be deployed both on physical hardware and in the cloud on virtual disks. Cloud file storage can accommodate more files, and it is easier to give employees remote access to it. It can be rented from cloud providers, and usually you are charged only for the amount of file storage you reserve.

Object storage is intended for an unlimited amount of any data

In object storage, data is placed without the folder hierarchy used in file storage. All data is stored as objects, and anything can be an object: a document, an image, a heavy video source file, or a code snippet.

Object storage is cloud-based by design, either rented from public cloud providers or built within a private cloud. In both cases, the advantage over file storage is that the speed of access to any object does not depend on the number of objects. Storing a billion objects does not slow access down, which is inconceivable for file storage.
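
The flat layout can be pictured with a toy model. The sketch below is an assumption-laden illustration, not how any real object storage is implemented: a Python dictionary stands in for the storage, and the class `ObjectStore` with its `put`, `get`, and `head` methods is invented for this example.

```python
class ObjectStore:
    """Toy flat object store: one namespace of keys, no directories."""

    def __init__(self):
        # key -> (data bytes, metadata dict)
        self._objects = {}

    def put(self, key, data, **metadata):
        """Store an object under a key, replacing any previous version."""
        self._objects[key] = (data, metadata)

    def get(self, key):
        """Return the object's data, looked up directly by key."""
        return self._objects[key][0]

    def head(self, key):
        """Return a copy of the object's metadata without its data."""
        return dict(self._objects[key][1])


store = ObjectStore()
# A slash in a key is just a character; it does not create a folder.
store.put("reports/summary.txt", b"Quarterly summary", content_type="text/plain")
store.put("video/source.mov", b"\x00\x01", content_type="video/quicktime")

print(store.get("reports/summary.txt"))
print(store.head("video/source.mov"))
```

Because each object is reached directly by its key rather than by walking a tree, a lookup in this model takes roughly the same time whether the store holds two objects or many more.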

As a public cloud service, object storage offers several additional benefits:

  1. The amount of storage can change without limit and without reconfiguring the applications that use it. For comparison, if an application uses a regular physical or cloud disk, that disk eventually fills up. You then need not only to add a new disk, but also to configure the application so that it knows when to access which disk. Object storage has no such problem: applications work with any amount of storage by the same rules.
  2. Built-in data replication. Data is copied automatically, and the copies are stored on different servers in different data centers. This keeps the data safe and allows quick recovery even if one of the copies becomes unavailable.
  3. Seamless access to stored objects for any number of users. You can place a video clip in the storage, and even if tens of thousands of people watch it at the same time, their requests will not cause problems.

In a private cloud, you have to provide all of these properties for object storage yourself.
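
The replication idea from the list above can be shown with a deliberately simplified model. It assumes that every write is copied to all replicas and that a read falls back to the next available copy. The class `ReplicatedStore` and the replica names are hypothetical; real services replicate in more sophisticated ways.

```python
class ReplicatedStore:
    """Toy model: each object is copied to every replica on write."""

    def __init__(self, replica_names):
        # One independent dictionary per replica (for example, per data center).
        self.replicas = {name: {} for name in replica_names}
        self.offline = set()

    def put(self, key, data):
        """Copy the object to all replicas."""
        for objects in self.replicas.values():
            objects[key] = data

    def get(self, key):
        """Read from the first replica that is online and has the object."""
        for name, objects in self.replicas.items():
            if name not in self.offline and key in objects:
                return objects[key]
        raise KeyError(key)


store = ReplicatedStore(["dc-1", "dc-2", "dc-3"])
store.put("clip.mp4", b"video bytes")

# Simulate losing access to one copy: the read still succeeds.
store.offline.add("dc-1")
print(store.get("clip.mp4"))
```

The application calling `get` does not need to know which copy answered, which is the property that makes a failed replica invisible to users.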

The speed of access to objects in the storage does not decrease as the number of requests grows

Object storage is designed to be embedded into applications, so its main interface is an application programming interface (API), that is, the set of commands that the storage and applications exchange. One of the most common APIs is S3 (Simple Storage Service), and object storage that supports it is called S3 storage. Object storage also has a “human-friendly” user interface (UI) that lets you, for example, upload an object and configure access to it.

Examples of using object storage:

  1. Storing an archive of documents: files, letters, archived data, and regulatory documentation. This is especially relevant for “heavy” files that keep accumulating when you do not want to keep buying more hardware for them, such as source media files or complete DNA sequences.
  2. Storing unstructured data that has no fixed format and consists of objects of different sizes, types, and structures. Big data is often accumulated in object storage and then analyzed to produce forecasts and support business decisions.
  3. Content distribution: video hosting, photo banks, galleries, game code, and even static (i.e., unchanging) website pages. Here it matters not only that you can store an almost unlimited number of objects, but also that any number of users can access them.
  4. Storing backup copies (backups) of data. Object storage can be integrated with systems that back up folders, disks, databases, and entire infrastructures, with automatic backups and version retention.

Sometimes object storage is still used as a file system, when you need to work with files in a directory hierarchy but want to keep the reliability and scalability benefits of object storage. In this case, you can use additional utilities, such as Disk-O, to create a directory tree and work with your storage objects as if they were files. These utilities are not difficult to use, so in some cases object storage can replace file storage.
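
One common way to present flat object keys as a directory tree is to treat the "/" character in a key as a separator and group keys by prefix. The sketch below illustrates that general idea only; it does not describe how Disk-O or any other utility works internally, and the function `list_directory` and the sample keys are invented for the example.

```python
def list_directory(keys, prefix=""):
    """Return the immediate 'subdirectories' and 'files' under a prefix."""
    dirs, files = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # Something follows a slash, so 'head' acts as a subdirectory.
            dirs.add(head + "/")
        elif head:
            # No further slash: this key is a 'file' at the current level.
            files.add(head)
    return sorted(dirs), sorted(files)


# Flat object keys, as they might be stored in an object storage bucket.
keys = [
    "docs/contract.pdf",
    "docs/2023/report.pdf",
    "media/intro.mp4",
    "readme.txt",
]

print(list_directory(keys))
print(list_directory(keys, "docs/"))
```

S3-style listing requests accept a prefix and a delimiter for the same purpose, so a tool can build the tree view without the storage itself having directories.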

You can deploy object storage in your own data center as part of a private cloud, or use the services of cloud providers. Providers usually charge for the amount of storage, traffic, and requests to the storage, and prices may differ between tariffs.
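
A bill made up of these three components can be estimated with simple arithmetic. In the sketch below, all prices are invented purely for illustration and do not reflect any provider's tariff; the function `monthly_cost` and the tariff keys are likewise hypothetical.

```python
from decimal import Decimal


def monthly_cost(stored_gb, egress_gb, requests, tariff):
    """Sum the three typical charges: storage, traffic, and requests."""
    storage = Decimal(stored_gb) * tariff["per_gb_stored"]
    traffic = Decimal(egress_gb) * tariff["per_gb_egress"]
    request_fee = Decimal(requests) / 1000 * tariff["per_1000_requests"]
    return storage + traffic + request_fee


# Hypothetical prices in an arbitrary currency, not a real tariff.
tariff = {
    "per_gb_stored": Decimal("0.02"),
    "per_gb_egress": Decimal("0.01"),
    "per_1000_requests": Decimal("0.005"),
}

# 500 GB stored, 200 GB downloaded, one million requests in a month.
cost = monthly_cost(500, 200, 1_000_000, tariff)
print(cost)
```

Running the same function against two tariffs shows quickly which component dominates: storage for archives, traffic and requests for content distribution.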

Cloud or physical storage?

In one form or another, most modern applications use databases, file storage, or object storage.

Storing data on your own physical hardware can be justified if your future storage needs are predictable and you have enough specialists to maintain the storage.

For running small applications and testing business hypotheses, it is better to use cloud storage, which can easily be shut down if the application does not succeed and scaled up if the client base starts to grow rapidly.