Today, I'm starting with OpenStack Object Storage, also known as Swift:
1. What are the benefits of OpenStack Object Storage?
- Uses commodity hardware (lower price per GB)
- HDD/node failure agnostic (data redundancy)
- Unlimited storage (large and flat namespace, serves content directly from storage system)
- Multi-dimensional scalability (You can scale vertically by adding disks to a single node and horizontally by adding nodes)
- Account/container/object structure (no nesting, not a traditional file system, scales to multiple petabytes and billions of objects)
- Built-in, configurable replication (default 3x, giving more data redundancy than the 2x of RAID)
- Easily add data capacity via elastic data scaling (versus RAID resize)
- No central database (no bottleneck)
- RAID not required (handles lots of small random reads/writes efficiently)
- Built-in management tools (Account Management, Container Management, Monitoring)
- Drive auditing (detects drive failures, preempting data corruption)
- Expiring objects (Users can set an expiration time or a TTL on an object)
- Direct object access (Enables direct browser access to content)
- Realtime visibility into client requests
- Supports S3 API
- Restricts container usage per account
- Supports NetApp, Nexenta, and SolidFire storage systems for block volumes
- Includes snapshot and backup API for block volumes
- Standalone volume API (integrates with other compute systems)
- Integration with OpenStack Compute (fully integrated for attaching block volumes and reporting usage)
2. How do you delete a container or objects within a container from the command line?
- swift delete
3. How do you download objects from containers?
- swift download
4. How do you list the containers for an account?
- swift list
5. How do you list objects inside a container?
- swift list CONTAINER
6. How do you update metadata for an account, container, or object, or create a container if it is not present?
- swift post
7. How do you display info for an account, container or object?
- swift stat
8. How do you get the full StorageURL for an account?
- swift stat --verbose
9. How do you display info for an object?
- swift stat CONTAINER OBJECT-FILENAME
10. How do you upload files or directories?
- swift upload CONTAINER FILE-OR-DIRECTORY
11. What are the 4 layers of swift server processes?
- PROXY LAYER - the public face of Swift. It communicates with external clients and is the first and last process to handle an API request. It determines the storage nodes responsible for the data (based on a hash of the object name) and sends the request to those servers concurrently. If a primary storage node is unavailable, the proxy chooses an appropriate hand-off node.
- ACCOUNT LAYER - handles metadata requests for individual accounts or the list of containers within each account (this info is stored in a SQLite database on disk)
- CONTAINER LAYER - handles requests regarding container metadata or the list of objects within each container. This list does not record where objects are located; it only records that an object belongs to a particular container. Like account data, this information is stored in a SQLite database.
- OBJECT LAYER - actually stores objects on the drives of its node. Objects are stored as binary files using a path made up in part of the object's associated partition and the operation's timestamp. This allows multiple versions of an object to be stored. Metadata is stored in the file's extended attributes (xattrs), which means the data and metadata are stored together and copied as a single unit.
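A minimal sketch of how an object server could turn an object name and a request timestamp into a file path, loosely modeled on a `<root>/<device>/objects/<partition>/<suffix>/<hash>/<timestamp>.data` layout. The mount root `/srv/node`, the device name `sdb1`, the hash prefix/suffix values and the partition power are all hypothetical, and metadata in xattrs is not modeled. The point is that two writes of the same object at different timestamps share a directory but get different file names.

```python
import hashlib

# Hypothetical cluster settings for illustration; a real cluster keeps its own
# secret hash path prefix/suffix in its configuration.
HASH_PATH_PREFIX = b""
HASH_PATH_SUFFIX = b"changeme"
PART_POWER = 10  # hypothetical: 2**10 = 1024 partitions


def object_hash(account: str, container: str, obj: str) -> str:
    """Hex MD5 of the salted /account/container/object path."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    return hashlib.md5(HASH_PATH_PREFIX + path + HASH_PATH_SUFFIX).hexdigest()


def object_data_path(device: str, account: str, container: str, obj: str,
                     timestamp: float, root: str = "/srv/node") -> str:
    """Build the on-disk path of one version of an object."""
    name_hash = object_hash(account, container, obj)
    # The top PART_POWER bits of the hash select the partition directory.
    partition = int(name_hash[:8], 16) >> (32 - PART_POWER)
    # Hash directories are grouped under a 3-character suffix directory.
    suffix = name_hash[-3:]
    # The operation's timestamp becomes the file name, so a newer PUT of the
    # same object lands next to the older one instead of overwriting it.
    filename = f"{timestamp:016.5f}.data"
    return f"{root}/{device}/objects/{partition}/{suffix}/{name_hash}/{filename}"


old = object_data_path("sdb1", "AUTH_test", "photos", "cat.jpg", 1401234567.5)
new = object_data_path("sdb1", "AUTH_test", "photos", "cat.jpg", 1401239999.0)
print(old)
print(new)
```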
12. How does swift provide data durability?
- By writing multiple (typically 3) complete copies of the data.
13. What are the 3 rings in OpenStack Object Storage (Swift)?
- ACCOUNT RING - used to determine where account data is located
- CONTAINER RING - used to determine where container data is located
- OBJECT RING - used to determine where object data is located
14. What is a ring in OpenStack Object Storage (Swift)?
- Each ring is a modified consistent hashing ring distributed to every node in the cluster. Basically, a modified consistent hashing ring contains a pair of lookup tables that swift processes and services use to determine data locations. One table has info about the drives in the cluster. The other table is used to look up where any piece of account, container, or object data should be placed.
15. What's a hashing ring?
- A hashing ring is the full range of possible hash values that can be calculated when hashing storage locations.
16. What makes Swift's hashing ring consistent?
- The ring is chopped up into a number of parts, each of which gets a small range of the hash values associated with it. These chopped-up ring parts are Swift's partitions. The number of partitions is fixed, and every partition covers a range of the same size.
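The partitioning described above can be sketched by hashing a storage location and keeping only the top bits of the hash. The sketch assumes MD5 and a 32-bit hash prefix, which are illustrative choices, and it leaves out the secret prefix/suffix a real cluster adds. Because the shift is the same for every partition, each partition owns an equally wide slice of the hash range.

```python
import hashlib
from typing import Tuple


def get_partition(path: str, part_power: int) -> int:
    """Map a storage location to one of 2**part_power partitions."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    top32 = int.from_bytes(digest[:4], "big")  # first 32 bits of the hash
    return top32 >> (32 - part_power)           # keep only the top part_power bits


def partition_range(part: int, part_power: int) -> Tuple[int, int]:
    """Half-open range of 32-bit hash prefixes [start, end) owned by `part`."""
    width = 1 << (32 - part_power)  # every partition covers the same width
    return part * width, (part + 1) * width


part = get_partition("/AUTH_test/photos/cat.jpg", 8)
start, end = partition_range(part, 8)
print(part, start, end)
```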
17. What do you mean by modified in "modified consistent hashing ring"?
- As a ring is built, partitions are assigned to drives in the cluster. The partition is just a directory sitting on a disk with a corresponding hash table of what it contains.
18. The more drives in a cluster, the fewer the ...
- ... partitions per drive. For example, if you have 150 partitions and 2 drives, each drive will have 75 partitions mapped to it. If you add a third drive, each of the 3 drives will have 50 partitions.
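The 150-partition example can be reproduced with a toy rebalancer that moves only the surplus partitions off over-full drives. This is a simplified stand-in, not Swift's ring builder. It assumes the partition count divides evenly among the drives, and 150 is used only because it matches the example (real partition counts are powers of two). Adding the third drive moves exactly 50 partitions, and every other partition stays where it was.

```python
from collections import Counter
from typing import Dict, List


def rebalance(assignment: Dict[int, str], drives: List[str]) -> Dict[int, str]:
    """Give every drive an equal share while moving as few partitions as possible.

    Simplified: assumes the partition count divides evenly among the drives
    and that every currently assigned drive is still in `drives`.
    """
    target = len(assignment) // len(drives)
    counts = Counter({d: 0 for d in drives})
    counts.update(assignment.values())

    # Collect only the surplus partitions from drives holding more than target.
    surplus = []
    for part in sorted(assignment):
        drive = assignment[part]
        if counts[drive] > target:
            counts[drive] -= 1
            surplus.append(part)

    # Hand each surplus partition to the drive that currently has the fewest.
    new_assignment = dict(assignment)
    for part in surplus:
        drive = min(drives, key=lambda d: counts[d])
        new_assignment[part] = drive
        counts[drive] += 1
    return new_assignment


two = rebalance({p: "A" for p in range(150)}, ["A", "B"])
three = rebalance(two, ["A", "B", "C"])
moved = sum(two[p] != three[p] for p in two)
print(Counter(two.values()), Counter(three.values()), moved)
```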
19. What is the smallest unit of data that Swift works with?
- Partitions are the smallest unit of data that Swift works with. As data is added to partitions, consistency processes check the partitions, and partitions are moved to new drives as needed. Because many actions happen at the partition level, Swift keeps CPU and network traffic low. As the system scales, its behavior remains predictable because the number of partitions stays fixed.
20. What is a Swift replica?
- A replica is one copy of a partition. During the initial creation of the Swift rings, every partition is replicated, and each replica is placed as uniquely as possible across the cluster. Each subsequent rebuild of the rings calculates which, if any, of the replicated partitions need to be moved to a different drive.
21. What happens when a drive fails?
- When a drive fails, Swift's replication/auditing processes notice and push the missing data to handoff locations/drives. The probability that all replicated partitions across the system become corrupt before the cluster notices and is able to push the data to handoff locations is very small. This is why Swift is called durable.
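Choosing where a missing copy goes can be sketched as follows. The primary devices come from the partition's column in the lookup table, and any failed primary is swapped for the next healthy device not already holding a replica. The small table, the device ids and the simple "first unused device" hand-off order are hypothetical. Real Swift orders hand-offs so they still spread across zones and regions.

```python
from typing import Iterable, List, Set


def choose_nodes(replica2part2dev: List[List[int]], part: int,
                 all_devs: Iterable[int], failed: Set[int]) -> List[int]:
    """Return one healthy device per replica of `part`.

    Primary devices come from the lookup table column for `part`; any failed
    primary is swapped for the next unused, healthy device (a hand-off).
    """
    primaries = [row[part] for row in replica2part2dev]
    handoffs = [d for d in all_devs if d not in primaries and d not in failed]
    chosen = []
    for dev in primaries:
        if dev in failed:
            if not handoffs:
                raise RuntimeError("no hand-off device available")
            chosen.append(handoffs.pop(0))
        else:
            chosen.append(dev)
    return chosen


# Hypothetical cluster: 5 devices, 3 replicas, 4 partitions.
table = [
    [0, 1, 2, 3],
    [1, 2, 3, 4],
    [2, 3, 4, 0],
]
print(choose_nodes(table, 1, range(5), failed={2}))  # [1, 0, 3]
```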
22. What are the 2 internal data structures of the ring?
- THE DEVICES LIST - populated with all of the devices that have been added to a special ring building file. Each entry for a drive includes ID#, zone, weight, IP, port, and device name.
- THE DEVICES LOOKUP TABLE - contains 1 row per replica and 1 column per partition. This generates a table that is typically 3 rows and thousands of columns. When a ring is built, Swift calculates the best drive to place each partition replica using drive weights and the unique-as-possible placement algorithm.
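The two structures above can be sketched as a list of device records plus a table with one row per replica and one column per partition. The device values (IPs, port 6200, device name) are hypothetical. The `build_lookup_table` helper uses a plain rotation as a placeholder instead of Swift's weight-aware, unique-as-possible placement. Looking up a partition means reading one column and resolving each id through the devices list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    # One entry of the devices list; fields follow the list above.
    id: int
    zone: int
    weight: float
    ip: str
    port: int
    device: str


def build_lookup_table(devs: List[Device], replicas: int,
                       part_power: int) -> List[List[int]]:
    """One row per replica, one column per partition, cells hold device ids.

    Placeholder placement: rotate through the devices so a partition's
    replicas land on different devices (needs len(devs) >= replicas).
    """
    parts = 2 ** part_power
    return [[devs[(p + r) % len(devs)].id for p in range(parts)]
            for r in range(replicas)]


def get_nodes(table: List[List[int]], devs: List[Device], part: int) -> List[Device]:
    """Read one column of the lookup table and resolve ids via the devices list."""
    by_id = {d.id: d for d in devs}
    return [by_id[row[part]] for row in table]


# Hypothetical four-drive cluster.
devs = [Device(i, zone=i + 1, weight=100.0, ip=f"10.0.0.{i + 1}", port=6200,
               device="sdb1") for i in range(4)]
table = build_lookup_table(devs, replicas=3, part_power=4)
for dev in get_nodes(table, devs, part=5):
    print(dev.ip, dev.port, dev.device)
```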
23. What does the hash value of a storage location map to?
- The partition value.
24. Where is the partition value used by the proxy server process?
- As an index into the devices lookup table: the proxy reads that partition's column to find the drives holding each replica.
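The proxy's path from a storage location to drives can be sketched end to end as hash, then partition, then a lookup-table column. The tiny 4-partition ring, the table contents and the plain MD5 (without a cluster's secret prefix/suffix) are hypothetical simplifications.

```python
import hashlib
from typing import List, Tuple

PART_POWER = 2  # hypothetical tiny ring: 2**2 = 4 partitions

# Hypothetical devices lookup table: 3 replica rows x 4 partition columns.
REPLICA2PART2DEV: List[List[int]] = [
    [0, 1, 2, 3],
    [1, 2, 3, 0],
    [2, 3, 0, 1],
]


def locate(account: str, container: str, obj: str) -> Tuple[int, List[int]]:
    """Storage location -> hash -> partition -> device ids for each replica."""
    path = f"/{account}/{container}/{obj}".encode("utf-8")
    top32 = int.from_bytes(hashlib.md5(path).digest()[:4], "big")
    part = top32 >> (32 - PART_POWER)                    # hash value -> partition
    devices = [row[part] for row in REPLICA2PART2DEV]    # partition -> table column
    return part, devices


print(locate("AUTH_test", "photos", "cat.jpg"))
```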
25. When a ring is built, how is the total number of partitions calculated?
- From a value called the partition power: the ring has 2^(partition power) partitions. The partition power should not be changed after the ring is built, so that the total number of partitions in the cluster stays the same.
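A short sketch of why the partition power is chosen up front. The total number of partitions is 2 raised to the power (1024, 65536 and 262144 for powers 10, 16 and 18). Adding one bit of power changes an object's partition number from p to 2p or 2p + 1, so existing data would be looked up under different partition directories. The MD5-based `partition` helper is an illustrative assumption.

```python
import hashlib


def partition(path: str, part_power: int) -> int:
    """Top `part_power` bits of the first 32 bits of the MD5 hash."""
    top32 = int.from_bytes(hashlib.md5(path.encode("utf-8")).digest()[:4], "big")
    return top32 >> (32 - part_power)


# The total number of partitions grows as a power of two.
for power in (10, 16, 18):
    print(f"partition power {power}: {2 ** power} partitions")

# With one extra bit of power, an object's partition number becomes 2*p or
# 2*p + 1, so nearly every object would map to a different partition directory.
path = "/AUTH_test/photos/cat.jpg"
p10, p11 = partition(path, 10), partition(path, 11)
print(p10, p11)
```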
26. What is the default replica lock time (the period during which a partition's replicas cannot be moved again, so data stays available while a partition move completes)?
- 24 hours (configured by the ring's min_part_hours setting).
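The lock can be sketched as a filter a rebalance would apply before moving anything. The per-partition `last_moved` bookkeeping and the timestamps are hypothetical. Real Swift applies the lock so that no more than one replica of a partition moves within the window, which this sketch simplifies to locking the whole partition.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MIN_PART_HOURS = 24  # lock time from the answer above


def movable_partitions(last_moved: Dict[int, datetime], now: datetime,
                       min_part_hours: int = MIN_PART_HOURS) -> List[int]:
    """Partitions whose replicas have not been moved within the lock window."""
    window = timedelta(hours=min_part_hours)
    return sorted(p for p, moved_at in last_moved.items() if now - moved_at >= window)


now = datetime(2024, 1, 2, 12, 0)
last_moved = {
    0: datetime(2024, 1, 1, 0, 0),   # 36 hours ago: movable
    1: datetime(2024, 1, 2, 0, 0),   # 12 hours ago: still locked
    2: datetime(2024, 1, 1, 12, 0),  # exactly 24 hours ago: movable
}
print(movable_partitions(last_moved, now))  # [0, 2]
```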
27. How does drive weight affect how many partitions are assigned to a drive?
- The higher the weight, the greater the number of partitions.
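A sketch of weight-proportional placement. Each drive's target share of all partition replicas is its fraction of the total weight. This is similar in spirit to how a ring builder sizes each device's share, but the function and the drive names and weights are hypothetical. With 2^4 partitions and 3 replicas there are 48 partition replicas to place, so a drive with double weight gets double the share.

```python
from typing import Dict


def parts_wanted(weights: Dict[str, float], part_power: int,
                 replicas: int) -> Dict[str, float]:
    """Share of all partition replicas each drive should hold, by weight."""
    total_replicas = (2 ** part_power) * replicas
    total_weight = sum(weights.values())
    return {dev: total_replicas * w / total_weight for dev, w in weights.items()}


# Hypothetical drives: sdc is twice the size of the others, so twice the weight.
print(parts_wanted({"sda": 100, "sdb": 100, "sdc": 200}, part_power=4, replicas=3))
# {'sda': 12.0, 'sdb': 12.0, 'sdc': 24.0}
```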
28. How does unique-as-possible placement work?
- The algorithm looks for the least used place in the cluster, starting with the least used region. If all regions already contain a replica of the partition, it looks for the least used zone, then the least used server (IP:port), and finally the least used disk. Throughout, it tries to place replicas as far from each other as possible.
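A minimal sketch of unique-as-possible placement under simplifying assumptions. Each candidate device is scored by how many already-chosen replicas share its region, zone, server and disk, compared in that order, with overall usage as the tie-break. Servers are identified by IP only here, and the `Dev` records describe a hypothetical cluster. This is an illustration of the idea, not Swift's actual placement code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dev:
    id: int
    region: int
    zone: int
    ip: str
    parts: int = 0  # partition replicas already assigned to this device


def _score(d: Dev, chosen: List[Dev]) -> Tuple[int, ...]:
    """Lower is better: prefer the least used region, then zone, server, disk."""
    return (
        sum(c.region == d.region for c in chosen),
        sum((c.region, c.zone) == (d.region, d.zone) for c in chosen),
        sum((c.region, c.zone, c.ip) == (d.region, d.zone, d.ip) for c in chosen),
        sum(c.id == d.id for c in chosen),
        d.parts,  # tie-break on overall usage
        d.id,     # deterministic final tie-break
    )


def place_replicas(devs: List[Dev], replicas: int) -> List[int]:
    """Pick one device per replica, spreading them as widely as possible."""
    chosen: List[Dev] = []
    for _ in range(replicas):
        best = min(devs, key=lambda d: _score(d, chosen))
        best.parts += 1
        chosen.append(best)
    return [d.id for d in chosen]


# Hypothetical cluster: region 1 has two zones, region 2 has one.
devs = [
    Dev(0, region=1, zone=1, ip="10.0.0.1"),
    Dev(1, region=1, zone=1, ip="10.0.0.1"),
    Dev(2, region=1, zone=2, ip="10.0.0.2"),
    Dev(3, region=2, zone=1, ip="10.1.0.1"),
]
result = place_replicas(devs, 3)
print(result)  # [0, 3, 2]
```

After one replica is in each region, the third goes to the unused zone in region 1 instead of a second disk on an already used server.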
29. What parts make up a request sent to Swift?
- HTTP verb (GET, PUT, DELETE)
- Auth info
- Storage URL (swift.example.com/v1/account)
- Cluster location (swift.example.com/v1)
- Storage locations for an object (/account/container/object)
- Optional metadata or data
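The parts of a request listed above can be assembled with Python's standard library without sending anything. The host, account name and token are hypothetical. `X-Auth-Token` carries the auth info, and `X-Object-Meta-*` headers carry optional object metadata. Note that `urllib` normalizes header names to forms like `X-auth-token`, which is harmless because HTTP header names are case-insensitive.

```python
import urllib.request
from typing import Dict, Optional

CLUSTER = "https://swift.example.com/v1"  # cluster location (hypothetical host)
ACCOUNT = "AUTH_test"                      # hypothetical account
TOKEN = "AUTH_tk_example"                  # hypothetical token from the auth service


def build_request(verb: str, container: str, obj: str,
                  data: Optional[bytes] = None,
                  metadata: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Assemble (but do not send) a Swift object request."""
    storage_url = f"{CLUSTER}/{ACCOUNT}"               # storage URL
    url = f"{storage_url}/{container}/{obj}"           # + /container/object
    headers = {"X-Auth-Token": TOKEN}                  # auth info
    for key, value in (metadata or {}).items():
        headers[f"X-Object-Meta-{key}"] = value        # optional metadata
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


req = build_request("PUT", "photos", "cat.jpg", data=b"<image bytes>",
                    metadata={"Color": "blue"})
print(req.get_method(), req.full_url)
print(req.header_items())
```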