Provisioning Storage

Once we have selected the appropriate storage type, we need to plan our provisioning. This means we determine precisely how to make the storage we need available. It starts by answering some key questions. What data requires storage, now that we know what the data is and its format and structure? In other words, will it work with an object store or file store, such as buckets, or do we need to store it in a block storage solution? How much data is there, and how much space do we need to store it? How sensitive is it, and what kind of security will we have to implement? What policies and regulations relate to storing the data? We might have business rules of our own, and there might be regulations that impose rules on us, such as those covering personally identifiable information (PII). There are often regulations that control how and what we store.

Regarding provisioning, we have two concepts that we want to explore: thick and thin storage.

Thick storage allocates virtual storage at the time of the request. Let us say we determine we want two terabytes of storage. Then we get all two terabytes at once, and it may cost more because we have allocated it ahead of time. However, it potentially provides improved performance because the storage does not have to grow over time as we need it. It is mainly used when creating virtual direct-attached storage (virtual DAS). For example, in AWS, we create an EBS (Elastic Block Store) volume, and we may pre-provision it to the size we need.

Then there is thin storage provisioning, which allocates storage as needed. It typically costs less, but it may not perform as well, particularly for virtual DAS. It is primarily used for file and object stores, though. An example would be an S3 bucket: as we put more into the bucket, the storage we are using grows, and we only pay for what we are using.
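To make the difference concrete, here is a minimal sketch of how allocation and cost diverge between the two approaches. The `Volume` class, the 2,000 GB size, and the price per gigabyte are all hypothetical values chosen for illustration; they do not reflect any provider's API or pricing.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy model of a provisioned volume (hypothetical, not a real cloud API)."""
    requested_gb: int
    thick: bool
    used_gb: int = 0

    @property
    def allocated_gb(self) -> int:
        # Thick: the full requested size is reserved up front.
        # Thin: only the space actually written is reserved.
        return self.requested_gb if self.thick else self.used_gb

    def write(self, gb: int) -> None:
        if self.used_gb + gb > self.requested_gb:
            raise ValueError("write exceeds the provisioned size")
        self.used_gb += gb

    def monthly_cost(self, price_per_gb: float) -> float:
        # We are billed for what is allocated, not for what is requested.
        return self.allocated_gb * price_per_gb


PRICE_PER_GB = 0.10  # assumed price, for illustration only

thick = Volume(requested_gb=2000, thick=True)
thin = Volume(requested_gb=2000, thick=False)

# Both volumes receive the same 500 GB of data.
for vol in (thick, thin):
    vol.write(500)

print("thick:", thick.allocated_gb, "GB allocated, cost", thick.monthly_cost(PRICE_PER_GB))
print("thin: ", thin.allocated_gb, "GB allocated, cost", thin.monthly_cost(PRICE_PER_GB))
```

With the same data written to both, the thick volume is billed for everything it reserved, while the thin volume is billed only for what it holds so far.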

We also have to think about the sensitivity of that data and, therefore, encryption. When we think about encryption, we think about it in a couple of ways: at rest or in transit.

At-rest encryption is used on storage media, so it has to do with where the data is stored. It protects against unauthorized local access and against data access after media theft. We have probably all watched those police or investigative shows where they recover a drive and then say, oh, but it is encrypted. If someone steals the actual disk drive but the data is encrypted, it is much harder for them to get the data off it. Then someone on the show asks how long it will take to break the encryption, and the answer is about 20 minutes. That is the biggest joke in the history of time, because most encryption people would use on a drive would take 20 weeks to 20 years to break. Nevertheless, they have a little fun with it at our expense. The reality is that if we encrypt a drive using solid encryption, like the Advanced Encryption Standard (AES) with 128-bit keys, and attackers cannot get access to our keys, it will take much longer than 20 minutes, even using special government technology. The critical thing to know is that encryption at rest protects our data when someone steals our drive or can walk up to it locally and try to get access.
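For a rough sense of scale, the arithmetic below estimates how long a pure brute-force search of a 128-bit key space would take. The guess rate is an assumption picked for illustration, not a measured figure. Real attacks usually go after weak passwords or stolen keys rather than the key space itself, so this shows only why guessing the key directly is not a 20-minute job.

```python
KEY_BITS = 128
GUESSES_PER_SECOND = 10**12  # assumed attacker speed (hypothetical)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Number of possible 128-bit keys.
keyspace = 2**KEY_BITS

# On average, a brute-force search succeeds after trying half the keys.
expected_guesses = keyspace // 2

expected_years = expected_guesses / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)
print(f"Expected brute-force time: {expected_years:.3e} years")
```

Even if the assumed guess rate were off by many orders of magnitude, the result would remain far beyond any practical time frame, which is why protecting the keys matters more than the raw strength of the cipher.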

In-transit encryption is about protecting the data while it moves across our network. It protects against eavesdropping and man-in-the-middle attacks. Eavesdropping means an attacker simply captures the frames as they cross the network. Frames are the bundles of information that traverse our network. If attackers capture them and they are not encrypted, they can read them.

Man in the middle means an attacker places themselves between us and the service we are using. Everything we think we are sending to the service actually goes to the attacker, who then passes it on to the service. The service responds to the attacker, and the attacker responds to us. So the attacker gets everything in the middle. However, if the traffic is encrypted, the attacker gains no advantage. Those are some key things to think about when deciding whether to use encryption.
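As one possible illustration, the sketch below uses Python's standard `ssl` module to build a client-side TLS context for in-transit encryption. The helper name `make_client_context` is our own, and the choice of TLS 1.2 as the minimum version is an assumption, not something the lesson prescribes. Note that the default context also verifies the server's certificate and hostname, which is what keeps a party in the middle from simply posing as the server.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context (hypothetical helper for illustration)."""
    # The default context enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_context()
print("certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", ctx.check_hostname)
```

To use it, we would wrap a connected socket with `ctx.wrap_socket(sock, server_hostname=...)`, passing the hostname of the storage endpoint we intend to reach, so that everything sent over the connection is encrypted.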

We also have this concept of tokenization. A tokenization system allows sensitive data to be stored in a more secure storage solution. In other words, a token is stored in place of the data in the typical data storage system, and it is then used to retrieve the actual data from the more secure storage. Highly secure storage is usually more expensive per megabyte. So what we can do is take our supersensitive data, put it in that high-security system, and keep a token in our less secure system. Every time our system runs into that token, it knows it has to retrieve the data from the vault. Examples would be data like personally identifiable information (PII) or protected health information (PHI). There might be regulations in place that say we have to protect it, so we store a token in place of the Social Security number, for example. Then, when we need the Social Security number, we can retrieve it. A strategic use could be making the last four digits of the Social Security number part of the token. Then the only thing available to us without going into the vault is the last four, and we could always go to the vault if we needed the entire Social Security number. That is a general example of the concept. It is often used for payment processing, though, so that a credit card number does not have to be readily available in the standard access system. So tokenization is a way to keep the data but not always have it front and center for everyone who can access our typical data store.
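The idea can be sketched in a few lines. Everything here is hypothetical: the `TokenVault` class, the `tok_..._NNNN` token format, and the in-memory dictionary standing in for the high-security store. The Social Security number is a made-up placeholder. A real vault would be a separate, access-controlled, encrypted system.

```python
import secrets


class TokenVault:
    """Stand-in for the more secure storage system (illustrative only)."""

    def __init__(self) -> None:
        # Maps token -> original sensitive value.
        self._store = {}

    def tokenize(self, ssn: str) -> str:
        digits = ssn.replace("-", "")
        # Random token body plus the last four digits, as described above.
        token = f"tok_{secrets.token_hex(8)}_{digits[-4:]}"
        self._store[token] = ssn
        return token

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the full value.
        return self._store[token]


def last_four(token: str) -> str:
    # Available from the token alone, with no trip to the vault.
    return token.rsplit("_", 1)[1]


vault = TokenVault()

# The everyday data store holds only the token, never the full number.
record = {"name": "Pat Example", "ssn_token": vault.tokenize("123-45-6789")}

print("stored in typical system:", record["ssn_token"])
print("last four without the vault:", last_four(record["ssn_token"]))
print("full value from the vault:", vault.detokenize(record["ssn_token"]))
```

The random part of the token carries no information about the original value, so someone who can read only the everyday data store learns nothing beyond the last four digits.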
